Spring 2019: Venue: Haag 315, Time: M/W 4-5:15pm
ECE 5582 Computer Vision
Lec 08: Feature Aggregation II
Zhu Li
Dept of CSEE, UMKC
Office: FH560E, Email: [email protected], Ph: x 2346.
http://l.web.umkc.edu/lizhu
Z. Li: ECE 5582 Computer Vision, 2019. p.1
slides created with WPS Office Linux and EqualX LaTex equation editor
Outline
• ReCap of Lecture 07
  • Image Retrieval System
  • BoW
  • VLAD
• Dense SIFT
• Fisher Vector Aggregation
• AKULA
• Summary
Z. Li, Image Analysis & Retrv. Spring 2018 p.2
Precision, Recall, F-measure
• Precision = TP/(TP + FP)
• Recall (TPR) = TP/(TP + FN)
• FPR = FP/(FP + TN)
• F-measure = 2*(precision*recall)/(precision + recall)
Precision: the probability that a retrieved document is relevant.
Recall: the probability that a relevant document is retrieved in a search.
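A minimal Python sketch of these measures, using the standard definitions precision = TP/(TP + FP), recall = TP/(TP + FN), and FPR = FP/(FP + TN). The function name and the counts are hypothetical, chosen only for illustration.

```python
def retrieval_metrics(tp, fp, fn, tn):
    """Return (precision, recall, fpr, f_measure) from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, fpr, f_measure

# Hypothetical counts: 8 relevant items retrieved, 2 irrelevant items retrieved,
# 4 relevant items missed, 86 irrelevant items correctly left out.
precision, recall, fpr, f_measure = retrieval_metrics(tp=8, fp=2, fn=4, tn=86)
print(precision, recall, fpr, f_measure)
```

Note that precision and FPR have different denominators: precision is measured over what was retrieved, FPR over all irrelevant items.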
Why Aggregation?
• Curse of Dimensionality
• Decision Boundary / Indexing
Bag-of-Words: Histogram Coding
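As a reminder of what histogram coding does, here is a small Python sketch: each of the n descriptors votes for its nearest of the k codewords, and the votes are normalized. The toy codebook, the descriptors, and the L1 normalization are assumptions made for illustration.

```python
def nearest_codeword(x, codebook):
    """Index of the codeword closest to descriptor x (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
    return dists.index(min(dists))

def bow_histogram(descriptors, codebook):
    """L1-normalized histogram of hard codeword assignments."""
    hist = [0.0] * len(codebook)
    for x in descriptors:
        hist[nearest_codeword(x, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Toy 2-D data (hypothetical): k = 3 codewords, n = 4 descriptors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
bow = bow_histogram(descriptors, codebook)
print(bow)
```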
Kernel Code Book Soft Encoding
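A hedged Python sketch of soft encoding: instead of one hard vote, each descriptor spreads its vote over all codewords with Gaussian-kernel weights. The kernel width sigma, the function names, and the toy data are assumptions, not values from the lecture.

```python
import math

def soft_assign(x, codebook, sigma):
    """Gaussian-kernel weights of descriptor x over all codewords (sum to 1)."""
    w = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2 * sigma ** 2))
         for c in codebook]
    s = sum(w)
    return [v / s for v in w]

def kernel_codebook_histogram(descriptors, codebook, sigma=0.5):
    """Average the soft-assignment weights of all descriptors."""
    hist = [0.0] * len(codebook)
    for x in descriptors:
        for k, w in enumerate(soft_assign(x, codebook, sigma)):
            hist[k] += w / len(descriptors)
    return hist

# Toy data (hypothetical): the first descriptor lies halfway between the codewords.
codebook = [(0.0, 0.0), (2.0, 0.0)]
descriptors = [(1.0, 0.0), (0.0, 0.0)]
hist = kernel_codebook_histogram(descriptors, codebook, sigma=0.5)
print(hist)
```

A descriptor that sits on the boundary between two cells contributes equally to both, which is the ambiguity that hard assignment discards.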
VLAD: Vector of Locally Aggregated Descriptors
① assign each descriptor x to its nearest centroid μi
② compute the residual x − μi
③ vi = sum of the residuals x − μi over the descriptors in cell i
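The three steps can be sketched in plain Python as follows. The final L2 normalization is a common choice assumed here, and the toy centroids and descriptors are hypothetical.

```python
import math

def vlad(descriptors, centroids):
    """Concatenate per-cell sums of residuals x - mu_i, then L2-normalize."""
    dim = len(centroids[0])
    v = [[0.0] * dim for _ in centroids]
    for x in descriptors:
        # Step 1: nearest centroid (squared Euclidean distance).
        d = [sum((a - b) ** 2 for a, b in zip(x, mu)) for mu in centroids]
        i = d.index(min(d))
        # Steps 2-3: accumulate the residual x - mu_i in cell i.
        for j in range(dim):
            v[i][j] += x[j] - centroids[i][j]
    flat = [c for cell in v for c in cell]
    norm = math.sqrt(sum(c * c for c in flat))
    # L2 normalization (an assumption, not stated on the slide).
    return [c / norm for c in flat] if norm else flat

# Toy example (hypothetical): K = 2 centroids in 2-D, so the VLAD has K * D = 4 values.
centroids = [(0.0, 0.0), (4.0, 0.0)]
descriptors = [(1.0, 0.0), (0.0, 1.0), (5.0, 0.0)]
code = vlad(descriptors, centroids)
print(code)
```

Unlike a BoW histogram, which only counts assignments, the VLAD code keeps the direction in which the descriptors deviate from each centroid.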
VLAD on SIFT
• Example of aggregating SIFT with VLAD
• K=16 codebook entries
• Each cell is a SIFT visualized as centroids in blue, and VLAD difference in red
• Top row: left image, bottom row: right image, red: code book, blue: encoded VLAD
One more trick
• Recall that SIFT is a powerful descriptor
• VL_FEAT: vl_dsift
  • A dense description of the image, computing SIFT descriptors (no spatial-scale space extrema detection) on a predetermined grid
• Supplement HoG as an alternative texture descriptor
VL_FEAT: vl_dsift
• Compute dense SIFT as a texture descriptor for the image
    [f, dsift] = vl_dsift(single(rgb2gray(im)), 'step', 2);
• There's also a FAST option
    [f, dsift] = vl_dsift(single(rgb2gray(im)), 'fast', 'step', 2);
• A huge amount of SIFT data will be generated
Fisher Vector
• Fisher Vector and variations:
  • Winning in image classification
  • Winning in the MPEG object re-identification:
    o SCFV (Scalable Compressed Fisher Vector) in CDVS
Codebook: Gaussian Mixture Model (GMM)
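For reference, a K-component GMM with diagonal covariances (the form that vl_gmm fits) and the resulting soft assignment of a descriptor x_i to component k can be written as below. The symbols π_k, μ_k, σ_k and q_ik are notation assumed here, not taken from the slide.

```latex
p(x \mid \theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(x;\ \mu_k, \operatorname{diag}(\sigma_k^2)\right),
\qquad \sum_{k=1}^{K} \pi_k = 1

q_{ik} = \frac{\pi_k \, \mathcal{N}\!\left(x_i;\ \mu_k, \operatorname{diag}(\sigma_k^2)\right)}
              {\sum_{l=1}^{K} \pi_l \, \mathcal{N}\!\left(x_i;\ \mu_l, \operatorname{diag}(\sigma_l^2)\right)}
```

Compared with a k-means codebook, each codeword now carries a weight and a per-dimension variance in addition to its center.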
A bit of Theory: Fisher Kernel
Fisher Vector
• K_FK(X, Y) is a measure of similarity w.r.t. the generative model
• Similar to the Mahalanobis distance case, we can decompose this kernel as,
• That gives us a kernel feature mapping of X to the Fisher Vector
• For observed image features {xt}, it can be computed as,
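The standard Fisher kernel formulation that these bullets refer to can be sketched as follows. The notation (u_λ for the generative model, F_λ for its Fisher information matrix, L_λ for the factor of its inverse) follows the common Fisher vector literature and is an assumption about what the slide showed; the last line additionally assumes the T features are independent.

```latex
G_\lambda^X = \nabla_\lambda \log u_\lambda(X), \qquad
F_\lambda = \mathbb{E}_{x \sim u_\lambda}\!\left[ \nabla_\lambda \log u_\lambda(x)\, \nabla_\lambda \log u_\lambda(x)^{\top} \right]

K_{FK}(X, Y) = {G_\lambda^X}^{\top} F_\lambda^{-1}\, G_\lambda^Y

F_\lambda^{-1} = L_\lambda^{\top} L_\lambda
\;\Rightarrow\;
K_{FK}(X, Y) = {\mathcal{G}_\lambda^X}^{\top} \mathcal{G}_\lambda^Y,
\qquad \mathcal{G}_\lambda^X = L_\lambda G_\lambda^X

\mathcal{G}_\lambda^X = \sum_{t=1}^{T} L_\lambda \nabla_\lambda \log u_\lambda(x_t)
```

The practical consequence is that the kernel becomes a plain dot product between the normalized gradient vectors, so they can be used as ordinary feature vectors.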
GMM Fisher Vector
GMM parameters (figure labels): weight, mean, variance
GMM Fisher Vector VL_FEAT implementation
• FV encoding
  • Gradient w.r.t. the mean and variance, for GMM component k, j=1..D
• In the end, we have a 2K x D aggregation of the derivatives w.r.t. the means and variances
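A plain-Python sketch of this encoding, following the mean and variance gradient formulas given in the VLFeat documentation for a diagonal-covariance GMM. The function name, the toy parameters, and the output ordering (all mean gradients first, then all variance gradients) are assumptions; this is not the VLFeat implementation itself, which also offers optional normalizations.

```python
import math

def fisher_vector(data, means, variances, priors):
    """Fisher vector of `data` (list of D-dim points) under a diagonal GMM.

    Returns 2*K*D values: mean gradients for all components, then
    variance gradients for all components.
    """
    n, K, D = len(data), len(means), len(means[0])
    u = [[0.0] * D for _ in range(K)]
    v = [[0.0] * D for _ in range(K)]
    for x in data:
        # Posterior q_k of each component for this point (log domain for stability).
        logp = []
        for k in range(K):
            lp = math.log(priors[k])
            for j in range(D):
                lp -= 0.5 * (math.log(2 * math.pi * variances[k][j])
                             + (x[j] - means[k][j]) ** 2 / variances[k][j])
            logp.append(lp)
        m = max(logp)
        w = [math.exp(lp - m) for lp in logp]
        s = sum(w)
        q = [wk / s for wk in w]
        for k in range(K):
            for j in range(D):
                z = (x[j] - means[k][j]) / math.sqrt(variances[k][j])
                u[k][j] += q[k] * z                # gradient w.r.t. the mean
                v[k][j] += q[k] * (z * z - 1.0)    # gradient w.r.t. the variance
    fv = []
    for k in range(K):
        fv += [val / (n * math.sqrt(priors[k])) for val in u[k]]
    for k in range(K):
        fv += [val / (n * math.sqrt(2 * priors[k])) for val in v[k]]
    return fv

# Toy GMM (hypothetical): K = 2 components in D = 2 dimensions.
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
priors = [0.5, 0.5]
data = [(0.1, -0.2), (2.9, 3.3), (0.4, 0.1)]
fv = fisher_vector(data, means, variances, priors)
print(len(fv))  # 2 * K * D values
```

The output length depends only on the GMM (2·K·D), not on how many descriptors the image produced, which is what makes the result usable as a fixed-length image signature.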
VL_FEAT GMM/FV API
• Compute GMM model with VL_FEAT
• Prepare data:
    numPoints = 1000 ;
    dimension = 2 ;
    data = rand(dimension, numPoints) ;
• Call vl_gmm:
    numClusters = 30 ;
    [means, covariances, priors] = vl_gmm(data, numClusters) ;
• Visualize:
    figure ;
    hold on ;
    plot(data(1,:), data(2,:), 'r.') ;
    for i = 1:numClusters
        % covariances(:,i) holds the diagonal variances of component i
        vl_plotframe([means(:,i)' covariances(1,i) 0 covariances(2,i)]) ;
    end
VL_FEAT API
• FV encoding
    encoding = vl_fisher(data_to_Be_Encoded, means, covariances, priors);
• Bonus points:
  • Encode HoG features with Fisher Vector?
  • Randomly collect 2~3 images from each class
  • Stack all HoG features together into an n x 36 data matrix
  • Compute its GMM
  • Use this GMM to encode all image HoG features (rather than averaging them)
Super Vector Aggregation – Speaker ID
• Fisher Vector: Aggregates Features against a GMM
• Super Vector: Aggregates GMM against GMM
• Ref:
  o William M. Campbell, Douglas E. Sturim, Douglas A. Reynolds: Support vector machines using GMM supervectors for speaker verification. IEEE Signal Process. Lett. 13(5): 308-311 (2006)
Super Vector from MFCC
• Motivated by Speaker ID work
• Speech is a continuous evolution of the vocal tract
• Need to extract a sequence of spectra or a sequence of spectral coefficients
• Use a sliding window: 25 ms window, 10 ms shift
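The sliding-window step can be sketched in Python as below. The 16 kHz sample rate and the silent test signal are assumptions for illustration; only the 25 ms window and 10 ms shift come from the slide. Each resulting frame would then go through the spectral analysis that produces one MFCC vector.

```python
def frame_signal(samples, sample_rate, window_ms=25, shift_ms=10):
    """Split a sample sequence into overlapping frames (no padding)."""
    win = int(sample_rate * window_ms / 1000)
    hop = int(sample_rate * shift_ms / 1000)
    return [samples[s:s + win]
            for s in range(0, len(samples) - win + 1, hop)]

# Hypothetical input: 1 second of silence sampled at 16 kHz (assumed rate).
sample_rate = 16000
frames = frame_signal([0.0] * sample_rate, sample_rate)
print(len(frames), len(frames[0]))
```

Because the shift is shorter than the window, consecutive frames overlap by 15 ms, so one second of speech yields roughly a hundred feature vectors.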
MFCC pipeline (figure): log|X(ω)| → DCT → MFCC
GMM Model from MFCC
• GMM on MFCC feature
• The acoustic vectors (MFCC) of speaker s are modeled by a probability density function parameterized by
• Gaussian mixture model (GMM) for speaker s:
Universal Background Model
• UBM GMM Model:
• The acoustic vectors of a general population are modeled by another GMM called the universal background model (UBM):
• Parameters of the UBM
MAP Adaptation
• Given the UBM GMM, how is the model for a new observation derived?
• The adapted mean is given by:
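A hedged sketch of the commonly used relevance-MAP mean update from the GMM-UBM speaker verification literature: each adapted mean is a blend of the UBM mean and the posterior-weighted data mean, with the blend weight α_k = n_k / (n_k + r). The relevance factor r = 16, the function name, and the toy data are assumptions; the slide's own formula may use different notation.

```python
def map_adapt_means(data, posteriors, ubm_means, relevance=16.0):
    """Relevance-MAP update of GMM means.

    data:       list of T feature vectors (each of length D)
    posteriors: T x K responsibilities of the UBM components
    ubm_means:  K x D UBM means
    relevance:  relevance factor r (assumed value)
    """
    K, D = len(ubm_means), len(ubm_means[0])
    adapted = []
    for k in range(K):
        n_k = sum(p[k] for p in posteriors)      # soft count of component k
        alpha = n_k / (n_k + relevance)          # adaptation weight
        new_mean = []
        for j in range(D):
            # Posterior-weighted data mean; falls back to the UBM mean if n_k == 0.
            e_kj = (sum(p[k] * x[j] for p, x in zip(posteriors, data)) / n_k
                    if n_k else ubm_means[k][j])
            new_mean.append(alpha * e_kj + (1 - alpha) * ubm_means[k][j])
        adapted.append(new_mean)
    return adapted

# Toy case (hypothetical): 16 frames at (2, 2), all assigned to component 0.
ubm_means = [[0.0, 0.0], [10.0, 10.0]]
data = [(2.0, 2.0)] * 16
posteriors = [(1.0, 0.0)] * 16
adapted = map_adapt_means(data, posteriors, ubm_means)
print(adapted)
```

Components that see little data stay close to the UBM, while well-observed components move toward the speaker's data. Stacking the adapted means gives the supervector.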
Supervector Distance
Supervector Performance in NIST Speaker ID
• System 5: Gaussian SV
• DCF (Detection Cost Function)
m31491
AKULA – Adaptive KLUster Aggregation
2013/10/25
Abhishek Nagar, Zhu Li, Gaurav Srivastava and Kyungmo Park
Outline
• Motivation
• Adaptive Aggregation
• Results with TM7
• Summary
Motivation
• Better Aggregation
  • Fisher Vector and VLAD type aggregation depend on a global model
  • AKULA removes this dependence and directly codes the cluster centroids and SIFT count
  • SCFV/RVD both have situations where clusters are turned off due to no assignment; this can be avoided in AKULA
SIFT detection & selection → K-means → AKULA description
Motivation
• Better Subspace Choice
  • Both SCFV and RVD do fixed normalization and PCA projection based on heuristics.
  • What is the best possible subspace to do the aggregation?
  • Use a boosting scheme to keep adding subspaces and aggregations in an iterative fashion, and tune TPR-FPR to the desired operating points on FPR.
CE2: AKULA – Adaptive KLUster Aggregation
• AKULA Descriptor: cluster centroids + SIFT count
  $A_1 = \{yc^1_1, yc^1_2, \dots, yc^1_k;\ pc^1_1, pc^1_2, \dots, pc^1_k\}$,
  $A_2 = \{yc^2_1, yc^2_2, \dots, yc^2_k;\ pc^2_1, pc^2_2, \dots, pc^2_k\}$
• Distance metric:
  • Min centroids distance, weighted by SIFT count
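One way to read this metric is sketched below: for each centroid of the first image, take the distance to the closest centroid of the second image, and weight it by the first image's SIFT count for that cluster. Normalizing the counts into weights, the Euclidean distance, and the function name are assumptions; the exact weighting used in the proposal is not given on the slide.

```python
import math

def akula_distance(centroids1, counts1, centroids2):
    """One-way distance: for each centroid of image 1, the distance to the
    closest centroid of image 2, weighted by image 1's normalized SIFT count."""
    total = float(sum(counts1))
    dist = 0.0
    for yc, pc in zip(centroids1, counts1):
        d_min = min(math.dist(yc, other) for other in centroids2)
        dist += (pc / total) * d_min   # weight normalization is an assumption
    return dist

# Toy descriptors (hypothetical): k = 2 centroids per image, in 2-D.
centroids1, counts1 = [(0.0, 0.0), (3.0, 4.0)], [3, 1]
centroids2 = [(0.0, 0.0), (0.0, 4.0)]
d12 = akula_distance(centroids1, counts1, centroids2)
print(d12)
```

The measure is one-way, so swapping the two images can give a different value; a symmetric score would need both directions.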
AKULA implementation in TM7
• Inner loop aggregation
  • Dimension is fixed at 8
  • Number of clusters nc = 8, 16, 32, to hit 64, 128, and 256 bytes
  • Quantization: scale by ½ and quantize to int8; SIFT count is 8 bits; total (nc+1)*dim bytes per aggregation
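A small worked sketch of the quantization and the byte budget. The rounding and clamping behaviour and the helper names are assumptions. With dim = 8, the 64, 128, and 256 byte figures correspond to nc*dim int8 centroid values, and the slide's (nc+1)*dim total is evaluated separately.

```python
def quantize_int8(values, scale=0.5):
    """Scale, round, and clamp values to the int8 range [-128, 127]."""
    return [max(-128, min(127, int(round(v * scale)))) for v in values]

def centroid_bytes(nc, dim=8):
    """Bytes needed for nc int8 centroids of dimension dim."""
    return nc * dim

def total_bytes(nc, dim=8):
    """Total size per aggregation using the slide's (nc+1)*dim formula."""
    return (nc + 1) * dim

quantized = quantize_int8([100.0, -300.0, 6.0])
sizes = [(nc, centroid_bytes(nc), total_bytes(nc)) for nc in (8, 16, 32)]
print(quantized)
print(sizes)
```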
AKULA implementation in TM7
• Outer loop subspace optimization by boosting
  • Initial set of subspace models {Ak} computed from MIR FLICKR data set SIFT extractions by k-means clustering the space into 4096 clusters
  • Iterative search on subspaces to generate AKULA aggregations that can improve precision-recall performance
  • Notice that aggregation is de-coupled in subspace iteration, to allow more DoF in aggregation and to find subspaces that provide complementary info.
• The algorithm is still being debugged, hence only 1st iteration results are available in TM7
• Indexing/Hashing is required for AKULA; it involves nc x dim multiplications and additions at this time. A binarization scheme will be considered once its performance is optimized in non-binary form.
GD Only TPR-FPR: AKULA vs SCFV
• Data set 1:
  • AKULA (128 bytes, dim=8, nc=16) distance is just 1-way dmin1.*wt
  • Forcing a weighted sum on SCFV (512 bytes) Hamming distances without 2D decision fitting, i.e., count the Hamming distance between common active clusters and sum up their distances
GD Only TPR-FPR: AKULA vs SCFV
• Data set 2, 3:
  • AKULA distance is just 1-way dmin1.*wt
  • AKULA = 128 bytes, SCFV = 512 bytes.
3D object set: 4, 5
• Data set 4, 5:
AKULA in PM
• FPR performance:
• AKULA rates:
    PM rate   m    AKULA rate (bytes)
    512       8    64
    1K        16   128
    2K        16   128
    1K_4K     16   128
    2K_4K     16   128
    4K        16   128
    8K        32   256
    16K       32   256
TPR@1% FPR
AKULA Localization
• Noticeable improvement: 2.7%
AKULA Summary
• Benefits:
  • Allows more DoF in aggregation optimization,
    o by an outer loop boosting scheme for subspace projection optimization
    o and an inner loop adaptive clustering without the constraint of the global GMM model
  • Simple weighted distance sum metric, with no need to tune a multi-dimensional decision boundary
  • The overall pairwise matching matches TM7 SCFV with its 2-dimensional decision boundary
  • In GD-only matching, outperforms the TM7 GD
  • Good improvements to the localization accuracy
  • Light in extraction, but still heavy in pairwise matching; needs a binarization scheme and/or indexing scheme to work for retrieval
• Future Improvements:
  • Supervector AKULA?
Lec 08 Summary
• Fisher Vector
  • Aggregate features {Xk} in R^D against GMM
• Super Vector
  • Aggregate GMM against a global GMM (UBM)
• AKULA
  • Direct Aggregation, non-indexable