--11
Evolutionary Feature Extraction for SAR Air to Ground Moving Target Recognition – a Statistical Approach
Evolving Hardware
Dr. Janusz Starzyk, Ohio University
--22
Neural Network Data Classification
Concept of “Logic Brain”
Random learning data generation
Multiple space classification of data
Feature function extraction
Dynamic selectivity strategy
Training procedure for data identification
FPGA implementation for fast training process
--33
Neural Network Data Classification
Concept of “Logic Brain”
– Threshold setup converts the analog world to the digital world
– A “Logic Brain” is possible based on an artificial neural network
Random learning data generation
– Gaussian-distributed random multidimensional data generation
– Half of the data sets are prepared for the learning procedure
– The other half is used later for the training procedure
Abdulqadir Alaqeeli and Jing Pang
--44
Neural Network Data Classification
Multiple space classification of data
– Each space can be represented by a set of minimum base vectors
Feature function extraction and dynamic selecting strategy
– Conditional entropy extracts information in each subspace
– Different combinations of base vectors compose the redundant sets of new subspaces (expansion strategy)
– Minimum function selection (shrinking strategy)
--55
Neural Network Data Classification
FPGA implementation for fast training process
– Learning results are saved on board
– Testing data sets are generated on board and sent through the artificial neural network generated on board to test the successful data classification rate
– The results are displayed on board
Promising application
– Especially useful for feature extraction of large data sets
– Catastrophic circuit fault detection
--66
Information Index: Background
• A priori class probabilities are known
• Entropy measure based on conditional probabilities
$$H(X;Y) = P_{1w}\log P_{1w} + P_{2w}\log P_{2w} + P_{12w}\log P_{12w} + P_{21w}\log P_{21w} - P_1\log P_1 - P_2\log P_2$$
[Figure: scatter plot of two groups of points, marked X and O, labeled Class A and Class B]
--77
Information Index: Background
• P1 and P2 are a priori class probabilities
• P1w and P2w are conditional probabilities of correct
classification for each class
• P12w and P21w are conditional probabilities of misclassification given a test signal
• P1w , P2w, P12w and P21w are calculated using Bayesian estimates of their probability density functions
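A minimal sketch of the entropy measure built from these six probabilities: the four conditional terms enter as +P*log(P) and the two a priori terms as -P*log(P). The logarithm base (2 here) and the convention 0*log(0) = 0 are assumptions, and the function names are illustrative only.

```python
import math


def plogp(p):
    """Return p * log2(p), using the convention 0 * log(0) = 0 (assumed)."""
    return 0.0 if p == 0.0 else p * math.log2(p)


def information_index(p1, p2, p1w, p2w, p12w, p21w):
    """Entropy measure: sum of P*log(P) over the four classification
    probabilities minus the same sum over the two a priori probabilities."""
    classified = plogp(p1w) + plogp(p2w) + plogp(p12w) + plogp(p21w)
    prior = plogp(p1) + plogp(p2)
    return classified - prior


# Perfect separation of two equally likely classes versus pure guessing.
perfect = information_index(0.5, 0.5, 0.5, 0.5, 0.0, 0.0)
guessing = information_index(0.5, 0.5, 0.25, 0.25, 0.25, 0.25)
```

With this sign convention the value is highest (zero) when nothing is misclassified and drops as P12w and P21w grow.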
--88
Information Index: Background
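• probability density functions of P1w, P2w, P12w, P21w
$$pdf_{w1} = \frac{pdf_1 \cdot pdf_1 \cdot P_1}{pdf_1 \cdot P_1 + pdf_2 \cdot P_2}$$
$$pdf_{w2} = \frac{pdf_2 \cdot pdf_2 \cdot P_2}{pdf_1 \cdot P_1 + pdf_2 \cdot P_2}$$
$$pdf_{w12} = \frac{pdf_1 \cdot pdf_2 \cdot P_2}{pdf_1 \cdot P_1 + pdf_2 \cdot P_2}$$
$$pdf_{w21} = \frac{pdf_2 \cdot pdf_1 \cdot P_1}{pdf_1 \cdot P_1 + pdf_2 \cdot P_2}$$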
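The four weighted densities can be evaluated pointwise. The sketch below assumes each one is a class pdf multiplied by the Bayesian posterior of the deciding class, for example pdf_w1 = pdf_1 * pdf_1 * P_1 / (pdf_1 * P_1 + pdf_2 * P_2). The Gaussian class densities and all names are illustrative assumptions, not part of the slides.

```python
import math


def gauss_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian (assumed class model)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def weighted_pdfs(pdf1, pdf2, p1, p2):
    """Weighted densities at one point, given the two class density values
    pdf1, pdf2 at that point and the a priori probabilities p1, p2."""
    denom = pdf1 * p1 + pdf2 * p2
    return {
        "w1": pdf1 * pdf1 * p1 / denom,   # class 1, classified correctly
        "w2": pdf2 * pdf2 * p2 / denom,   # class 2, classified correctly
        "w12": pdf1 * pdf2 * p2 / denom,  # class 1 taken for class 2
        "w21": pdf2 * pdf1 * p1 / denom,  # class 2 taken for class 1
    }


f1 = gauss_pdf(0.3, 0.0, 1.0)
f2 = gauss_pdf(0.3, 1.0, 1.0)
w = weighted_pdfs(f1, f2, 0.4, 0.6)
```

Under this reading the correct and incorrect parts of each class add back up to the class density: w1 + w12 = pdf_1 and w2 + w21 = pdf_2.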
--99
Direct Integration
$$\sum_{i=1}^{m} pdf_i \, S_i$$
For n dimensions, m^n grid points are needed to estimate the pdf
[Figure: two panels, x-axis 0 to 500, y-axis 0 to 1.2. Uniform grid: S_i = S_k. Nonuniform grid: S_i < S_k.]
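A minimal sketch of direct integration on a uniform grid, where every cell has the same size S and the integral is approximated by the sum of pdf_i * S_i. The midpoint sampling, the Gaussian test density, and the function names are assumptions made for illustration.

```python
import math


def gauss_pdf(x, mean=0.0, std=1.0):
    """Density of a one-dimensional Gaussian (illustrative test density)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def integrate_uniform(pdf, lo, hi, m):
    """Sum pdf_i * S_i over m equal cells of width S, sampling each cell
    at its midpoint."""
    s = (hi - lo) / m
    return sum(pdf(lo + (i + 0.5) * s) * s for i in range(m))


def grid_points(m, n):
    """Number of grid points when each of n dimensions uses m points."""
    return m ** n


area = integrate_uniform(gauss_pdf, -6.0, 6.0, 200)
points_3d = grid_points(200, 3)
```

The second function shows why the approach scales poorly: 200 points per axis already means 8,000,000 grid points in three dimensions.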
--1010
Monte Carlo Integration
[Figure: pdf1 and pdf2 plotted for x from -100 to 700, with a sample point x_i and its weight W(X_i)]
P_1 and P_{1w} are estimated from m sample points X_i generated with pdf_1:
$$P_{1w} = P_1 \cdot \frac{1}{m} \sum_{i=1}^{m} W(X_i)$$
--1111
Information Index: probability density functions
[Figure: pdfs of the dominating feature for BMP2 and BTR60; curve label: P2w; x-axis from -0.15 to 0.25, y-axis from 0 to 1]
--1212
Information Index: weighted pdfs
[Figure: misclassification pdfs of the dominating feature for BMP2 and BTR60; curve label: P2w; x-axis: feature value; y-axis: relative density]
--1313
Information Index: Monte Carlo Integration
• To integrate the probability density function
– generate random points x_i with pdf_1
– weight generated points according to
$$w_1(x_i) = \frac{pdf_1 \cdot P_1}{pdf_1 \cdot P_1 + pdf_2 \cdot P_2}$$
– estimate the conditional probability P_{1w} using
$$P_{1w} = P_1 \sum_{i=1}^{m} w_1(x_i) / m$$
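The three steps can be sketched as follows: draw points from pdf_1, weight each by w_1(x_i) = pdf_1 * P_1 / (pdf_1 * P_1 + pdf_2 * P_2), and average. One-dimensional Gaussian class densities, the fixed random seed, and the function names are assumptions made for illustration.

```python
import math
import random


def gauss_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian (assumed class model)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def estimate_p1w(m, p1, p2, mean1, std1, mean2, std2, seed=0):
    """Monte Carlo estimate of P1w = P1 * sum(w1(x_i)) / m."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        x = rng.gauss(mean1, std1)              # x_i generated with pdf_1
        f1 = gauss_pdf(x, mean1, std1)
        f2 = gauss_pdf(x, mean2, std2)
        total += f1 * p1 / (f1 * p1 + f2 * p2)  # weight w1(x_i)
    return p1 * total / m


# Identical classes cannot be told apart; distant classes never overlap.
same = estimate_p1w(1000, 0.5, 0.5, 0.0, 1.0, 0.0, 1.0)
far = estimate_p1w(1000, 0.5, 0.5, 0.0, 1.0, 100.0, 1.0)
```

The cost grows with the number of samples m rather than with m^n, which is what makes the method attractive in N dimensions.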
--1414
Information Index and Probability of Misclassification
[Figure: conditional probability P12w (y-axis, 0 to 0.14) versus information (x-axis, 0 to 1)]
--1515
Standard Deviation of Information in MC Simulation
[Figure: 3-D surface plot; axes: average information level, log2 of the number of the MC points, log2 of the information error]
--1616
Normalized Standard Deviation of Information
[Figure: 3-D surface plot; axes: average information level, log2 of the number of the MC points, log2 of the normalized information error]
--1717
Information Index: Status
• MIIFS was generalized to continuous distributions
• N-dimensional information index was developed
• Efficient N-dimensional integration was used
• Information error analysis was performed
• Information index can be used with non-Gaussian distributions
• For small training sets and a low information index, the information error is larger than the information
--1818
Optimum Transformation: Background
• Principal Component Analysis (PCA) based on Mahalanobis distance suffers from scaling
• PCA assumes Gaussian distributions and estimates covariance matrices and mean values
• PCA is sensitive to outliers
• Wavelets provide compact data representation and improve recognition
• Improvement shows no statistically significant difference in recognition for different wavelets
• Need for a specialized transformation
--1919
Optimum Transformation: Haar Wavelet
• Example
$$H_i = \frac{a_{2i} + a_{2i+1}}{2}, \quad i = 0 \ldots (N/2) - 1 \quad \text{(average)}$$
$$H_{N/2+i} = a_{2i} - a_{2i+1}, \quad i = 0 \ldots (N/2) - 1 \quad \text{(difference)}$$
                    a0    a1    a2    a3    a4    a5    a6    a7
Input Signal        0.0   0.5   1.0   0.5   0    -0.5  -1    -0.5
Haar coefficients   0.25  0.75 -0.25 -0.75 -0.5   0.5   0.5  -0.5
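A minimal sketch of one average-and-difference step, reading the slide's formulas as average = (a_2i + a_2i+1) / 2 and difference = a_2i - a_2i+1 (the difference is not halved, which is what the example values imply). The function name is illustrative.

```python
def haar_step(a):
    """One Haar step: pairwise averages in the first half of the output,
    pairwise differences in the second half. len(a) must be even."""
    half = len(a) // 2
    averages = [(a[2 * i] + a[2 * i + 1]) / 2 for i in range(half)]
    differences = [a[2 * i] - a[2 * i + 1] for i in range(half)]
    return averages + differences


signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
coefficients = haar_step(signal)
```

Applied to the input signal of the example, this reproduces the row of Haar coefficients.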
--2020
Optimum Transformation: Haar Wavelet
• Repeat average and difference log2(n) times
Level 0 (Input Signal)   0     0.5   1     0.5   0    -0.5  -1    -0.5
Level 1                  0.25  0.75 -0.25 -0.75 -0.5   0.5   0.5  -0.5
Level 2                  0.5  -0.5  -0.5   0.5   0     0    -1     1
Level 3                  0     1     0    -1     0     0     0    -2
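The level table is reproduced if, at each level, the average-and-difference step is applied separately to every block of the previous level, with the block size halving each time. That block-wise reading is inferred from the numbers in the table, and the function names are illustrative.

```python
def haar_step(a):
    """Pairwise averages followed by pairwise differences."""
    half = len(a) // 2
    averages = [(a[2 * i] + a[2 * i + 1]) / 2 for i in range(half)]
    differences = [a[2 * i] - a[2 * i + 1] for i in range(half)]
    return averages + differences


def haar_levels(signal):
    """Return [level 0, level 1, ..., level log2(n)] for a signal whose
    length n is a power of two."""
    n = len(signal)
    levels = [list(signal)]
    block = n
    while block > 1:
        previous = levels[-1]
        current = []
        for start in range(0, n, block):
            # Transform each block of the previous level on its own.
            current.extend(haar_step(previous[start:start + block]))
        levels.append(current)
        block //= 2
    return levels


levels = haar_levels([0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5])
```

For the 8-sample example this yields levels 0 through 3, that is, log2(8) = 3 repetitions.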
--2121
Optimum Transformation: Haar Wavelet
• Waveform interpretation
--2222
Optimum Transformation: Haar Wavelet
• Matrix interpretation
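• b = W*a, where (for 4 input samples)
$$W = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
.5 & .5 & 0 & 0 \\
0 & 0 & .5 & .5 \\
1 & -1 & 0 & 0 \\
0 & 0 & 1 & -1 \\
.25 & .25 & .25 & .25 \\
.5 & .5 & -.5 & -.5 \\
.5 & -.5 & .5 & -.5 \\
1 & -1 & -1 & 1
\end{bmatrix}$$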
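Because every level is a linear function of the input, W can be built column by column by transforming unit vectors. The sketch below stacks the input itself and all levels, which gives 12 rows for 4 samples; the block-wise repetition of the average-and-difference step and the function names are assumptions.

```python
def haar_step(a):
    """Pairwise averages followed by pairwise differences."""
    half = len(a) // 2
    averages = [(a[2 * i] + a[2 * i + 1]) / 2 for i in range(half)]
    differences = [a[2 * i] - a[2 * i + 1] for i in range(half)]
    return averages + differences


def haar_levels(signal):
    """All levels, transforming each block of the previous level separately."""
    n = len(signal)
    levels = [list(signal)]
    block = n
    while block > 1:
        previous = levels[-1]
        current = []
        for start in range(0, n, block):
            current.extend(haar_step(previous[start:start + block]))
        levels.append(current)
        block //= 2
    return levels


def haar_matrix(n):
    """Matrix W such that b = W * a stacks every level of the transform."""
    columns = []
    for j in range(n):
        unit = [1.0 if i == j else 0.0 for i in range(n)]
        columns.append([c for level in haar_levels(unit) for c in level])
    rows = len(columns[0])
    return [[columns[j][r] for j in range(n)] for r in range(rows)]


W = haar_matrix(4)
```

The last four rows of `W` are the level-2 coefficients, from the overall average (.25 .25 .25 .25) to the difference of differences (1 -1 -1 1).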
--2323
Optimum Transformation: Haar Wavelet
• Matrix interpretation for the class of signals B=W*A
• where A is (n x m) input signal matrix
• Selection of n best coefficients performed using the information index
Bs1=S1*W*A
• where S1 is (n x n*log2(n)) selection matrix
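A selection matrix of this kind has a single 1 in each row, so multiplying by it simply picks rows of W*A. The sketch below builds one from a list of per-coefficient scores; in the slides the scores would come from the information index, while the numbers and names used here are placeholders.

```python
def selection_matrix(scores, k):
    """Build a (k x len(scores)) matrix that picks the k highest-scoring
    coefficients, best first."""
    order = sorted(range(len(scores)), key=lambda r: scores[r], reverse=True)
    return [[1.0 if c == r else 0.0 for c in range(len(scores))]
            for r in order[:k]]


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


# Four candidate coefficients (rows) measured on two signals (columns).
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
scores = [0.1, 0.9, 0.3, 0.7]   # placeholder information values
S = selection_matrix(scores, 2)
selected = matmul(S, B)
```

Keeping the selection as a matrix, rather than as an index list, is what lets it be folded into a single transformation matrix later.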
--2424
Optimum Transformation: Evolutionary Iterations
• Iterating on the selected result: Bs2 = S2*W*Bs1
• where S2 is a selection matrix, or Bs2 = S2*W*S1*W*A
• after k iterations: Bsk = Sk*W* ... S2*W*S1*W*A
• So the optimized transformation matrix T = Sk*W* ... S2*W*S1*W
• can be obtained from the Haar wavelet
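The product T = Sk*W* ... S2*W*S1*W can be accumulated one iteration at a time. The sketch below uses a toy 2-sample W (one average row and one difference row) and hand-picked selection matrices; both are illustrative assumptions, not the matrices evolved in the slides.

```python
def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def evolve(w, selections):
    """Compose T = Sk*W* ... *S2*W*S1*W from the selection matrices
    [S1, S2, ..., Sk], applied in that order."""
    t = None
    for s in selections:
        step = matmul(s, w)                       # one iteration: S_i * W
        t = step if t is None else matmul(step, t)
    return t


W = [[0.5, 0.5], [1.0, -1.0]]     # toy Haar matrix for 2 samples
S1 = [[1.0, 0.0], [0.0, 1.0]]     # keep both coefficients in order
S2 = [[0.0, 1.0], [1.0, 0.0]]     # keep both, difference first
T = evolve(W, [S1, S2])
a = [[2.0], [4.0]]                # one input signal as a column
b = matmul(T, a)
```

Once T is known, classifying a new signal costs one matrix-vector product, regardless of how many iterations produced T.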
--2525
Optimum Transformation: Evolutionary Iterations
• Learning with the evolved features
[Figure: Selected coefficients in two class ATR problem; x-axis: most selective coefficients in decreasing order (0 to 140); y-axis: information value (0 to 0.8)]
--2626
Optimum Transformation: Evolutionary Iterations
• Waveform interpretation of T rows
[Figure: four panels, each showing one row of T as a waveform; x-axis 0 to 150, y-axis -2 to 2]
--2727
Optimum Transformation: Evolutionary Iterations
• Mean values and the evolved transformation
[Figure: Original Signals and the evolved transformation; x-axis: Bin Index (0 to 140); y-axis: Signal Value (-1.5 to 1.5)]
--2828
Two Class Training
• Training on HRR signals: 17° depression angle profiles of BMP2 and BTR60
[Figure: Information for BTR60 and BMP2; x-axis: number of features (0 to 15); y-axis: information and error (0 to 1)]
--2929
Wavelet-Based Reconfigurable FPGA for Classification
[Diagram: a window over time t delivers m 8-bit samples (Sample # 1 to Sample # m) to the Haar-Wavelet Transform; its outputs 1 to k feed the N.N., and the input signal is recognized]
Note: k m
--3030
Block Diagram of The Parallel Architecture
8 Input Samples (indexed 0 to 7)
R R R R R R R R
A A A A D D D D
A A D D D D A A
A D A D D A D A
8 Output Samples
D = DIFFERENCE with registered output
A = AVERAGE with registered output
R = REGISTER
Example label on the first average unit: (0+1)/2
--3131
Simplified Block Diagram of The Serial Architecture
[Diagram: stages of R, A, and D units, top to bottom: "R R", "A R D R", "D R A R D R A R", "A D A D A D A D"]
R: register using CLBs; A: registered average; D: registered difference
R: register using IOBs
Example labels: (0+1)/2, (0-1), (2+3)/2, (2-3)
Processing order: first the blue, second the green
--3232
RAM-Based Wavelet
[Diagram: five RAM16x8 blocks connected by four PEs; each RAM has WA and RA address inputs driven by four Control blocks; signals: Start, Done, DataIn]
--3333
The Processing Element
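[Diagram: 8-bit inputs A and B feed an average unit (A + B)/2 and a difference unit A - B, each followed by a REGISTER; a multiplexer M selects the 8-bit output Dout; two blocks are annotated "5 CLBs"; a numeric example traces sample values through the element]
Note: A = Dt and B = Dt+1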
--3434
Results: For One Iteration of Haar Wavelet
• For 8 samples:
– Parallel arch.: 120 CLBs, 128 IOBs, 58 ns.
– Serial arch.: 98 CLBs*, 72 IOBs, 148 ns*.
– The parallel architecture wins for larger numbers of samples.
• For 16 samples:
– Parallel arch.: 320 CLBs, 256 IOBs, 233 ns.
– RAM-based arch.: 136 CLBs, 16 IOBs, ~1 µs.
– The RAM-based architecture wins, since 1 µs is not so slow.
* These values increase very quickly as the number of samples increases, and the delay becomes extremely high.
--3535
Reconfigurable Haar-Wavelet-Based Architecture
[Diagram: Data In passes through four PEs; blocks: ROM#1 to ROM#3, RAM#1 to RAM#5, Temporary RAM, Controllers for Selecting Coefficients; signals: Feedback, Selected Coefficients]
--3737
Test Results
• Testing on HRR signals: 15° depression angle profiles of BMP2 and BTR60
• With 15 features selected, correct classification is 69.3% for BMP2 data and 82.6% for BTR60
• Comparable results in the SHARP confusion matrix are 56.7% for BMP2 data and 67% for BTR60
--3838
Problem Issues
• BTR60 signals with 17° and 15° depression angles do not have compatible statistical distributions
--3939
Problem Issues
• BMP2 and BTR60 signal distributions are not Gaussian
[Figure: Two-dimensional projection of transformed two-class HRR data; x-axis -0.5 to 2.5, y-axis -2 to 2.5]
--4040
Work Completed
• Information index and its properties
• Multidimensional MC integration
• Information as a measure of learning quality
• Information error
• Wavelets and their effect on pattern recognition
• Haar wavelet as a linear matrix operator
• Evolution of the Haar wavelet
• Statistical support for classification
--4141
Recommendations and Future Work
• Training data must represent a statistical sample of all signals, not a hand-picked subset
• Probability density functions will be approximated using a parametric or NN approach
• Information measure will be extended to k-class problems
• Training and testing will be performed on 12-class data
• Dynamic clustering will prepare the decision tree structure
• A hybrid evolutionary classifier will be developed