adaptive skew-sensitive ensembles for face recognition in ... · adaptive skew-sensitive ensembles...

Adaptive skew-sensitive ensembles for face recognitionin video surveillance

Miguel De-la-Torre a,b,n, Eric Granger a, Robert Sabourin a, Dmitry O. Gorodnichy c

a Laboratoire d'imagerie de vision et d'intelligence artificielle, École de technologie supérieure, Université du Québec, Montréal, Canadab Centro Universitario de Los Valles, Universidad de Guadalajara, Ameca, Mexicoc Science and Engineering Directorate, Canada Border Services Agency, Ottawa, Canada

a r t i c l e i n f o

Article history:Received 25 November 2014Received in revised form29 March 2015Accepted 8 May 2015Available online 19 May 2015

Keywords:Adaptive classifier ensemblesBoolean combinationImbalance estimationVideo-to-video face recognitionVideo surveillanceAdaptive multiple classifier systems

a b s t r a c t

Decision support systems for surveillance rely more and more on face recognition (FR) to detect targetindividuals of interest captured with video cameras. FR is a challenging problem in video surveillance due tovariations in capture conditions, to camera interoperability, and to the limited representativeness of targetfacial models used for matching. Although adaptive classifier ensembles have been applied for robust facematching, it is often assumed that the proportions of faces captured for target and non-target individuals arebalanced, known a priori, and do not change over time. Recently, some techniques have been proposed toadapt the fusion function of an ensemble according to class imbalance of the input data stream. For instance,Skew-Sensitive Boolean combination (SSBC) is a active approach that estimates target vs. non-targetproportions periodically during operations using Hellinger distance, and adapts its ensemble fusion functionto operational class imbalance. Beyond the challenges of estimating class imbalance, such techniquescommonly generate diverse pools of classifiers by selecting balanced training data, limiting the potentialdiversity produced using the abundant non-target data. In this paper, adaptive skew-sensitive ensembles areproposed to combine classifiers trained by selecting data with varying levels of imbalance and complexity, tosustain a high level the performance for video-to-video FR. Faces captured for each person in the scene aretracked and regrouped into trajectories. During enrollment, captures in a reference trajectory are combinedwith selected non-target captures to generate a pool of 2-class classifiers using data with various levels ofimbalance and complexity. During operations, the level of imbalance is periodically estimated from the inputtrajectories using the HDx quantification method, and pre-computed histogram representations of imbalanceddata distributions. This approach allows one to adapt pre-computed histograms and ensemble fusion functionsbased on the imbalance and complexity of operational data. Finally, the ensemble scores are accumulated oftrajectories for robust spatio-temporal recognition. Results on synthetic data show that adapting the fusionfunction of ensemble trained with different complexities and levels of imbalance can significantly improveperformance. Results on the Face in Action video data show that the proposed method can outperformreference techniques (including SSBC and meta-classification) in imbalanced video surveillance environments.Transaction-based analysis shows that performance is consistently higher across operational imbalances.Individual-specific analysis indicates that goat- and lamb-like individuals can benefit the most from adaptationto the operational imbalance. Finally, trajectory-based analysis shows that a video-to-video FR system based onthe proposed approach can maintain, and even improve overall system discrimination.

& 2015 Elsevier Ltd. All rights reserved.

1. Introduction

Video surveillance systems commonly rely on spatio-temporalface recognition (FR) to detect the presence of target individuals of

interest in live or archived videos, either for watchlist screening orsearch and retrieval applications. Video-to-video FR systemscommonly match input facial trajectories1 from videos againstthe facial models of all target individuals enrolled to the system,and raise a warning in the case of positive detection. In thischallenging scenario several persons may appear before a camera

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/pr

Pattern Recognition

http://dx.doi.org/10.1016/j.patcog.2015.05.0080031-3203/& 2015 Elsevier Ltd. All rights reserved.

n Corresponding author. Tel.: þ1152 375 75 80 500x47291;fax: þ1152 375 75 80 500x47218.

E-mail addresses: [email protected] (M. De-la-Torre),[email protected] (E. Granger), [email protected] (R. Sabourin),[email protected] (D.O. Gorodnichy).

1 A trajectory is a set of facial regions of interest (ROIs) captured in video thatcorrespond to a same (high quality) track of a person appearing across consecutiveframes.

Pattern Recognition 48 (2015) 3385–3406

www.sciencedirect.com/science/journal/00313203

www.elsevier.com/locate/pr

http://dx.doi.org/10.1016/j.patcog.2015.05.008



http://crossmark.crossref.org/dialog/?doi=10.1016/j.patcog.2015.05.008&domain=pdf



mailto:[email protected]





view point, and their appearance varies either abruptly or gradu-ally due to, e.g., changes in illumination and pose. Changes in thecapture conditions are associated with changes in the repre-sentation of the underlying class distribution of data in the facematching space. Uneven proportions between target and non-target individuals are related to the prior probability of occurr-ence for a given individual, and are commonly referred to as classimbalance or skew.

Facial models used for matching are composed of a set ofreference samples (for template matching), or a statistical modelestimated during training with reference samples (for statistical orneural classification). For instance, some recent systems for facere-identification applications successfully employ adaptive ens-embles of 2-class (target vs. non-target) classifiers to design andupdate facial models based on new reference trajectories, yetavoiding the knowledge corruption [1,2]. And approaches to add-ress the class imbalance problem in face recognition have alsobeen proposed [3,4]. This paper focuses on the design of facialmodels based on adaptive skew-sensitive ensembles of 2-classclassifiers.

The effects of class imbalance on classifier performance have beenshown by several authors [5–8], and pattern recognition literaturepresents several ensemble-based methods to train ensembles onimbalanced data [9]. Algorithms designed for environments withdata distributions that change over time can be categorized accord-ing to the use of a mechanism to detect concept drift or change [10].Approaches with active detection of changes in prior probabilitiesseek explicitly to determine whether and when a change hasoccurred in the prior probability before taking a corrective action[3,4,10]. Conversely, passive approaches assume that a change mayoccur at any time, or is continuously occurring, and hence classifica-tion systems are updated every time new data becomes available[10,11]. The advantage of active approaches mainly consists in theavoidance of unnecessary updates. However, they are prone to bothfalse positive and false negative drift detections, with the respectivefalse updates and false no-updates. Passive approaches avoid theseproblems at an increased computational cost due to the constantupdate.

A representative example of active approaches for changingimbalances is the skew-sensitive Boolean combination (SSBC) thatcontinuously estimates the class proportions using the Hellingerdistance between histogram representations of operational andvalidation samples [4]. Every time the operational imbalancechanges, SSBC selects one of the pre-calculated fusion functionsthat correspond to a set of prefixed imbalances. However, thelimited number of validation imbalance levels that can be used toapproximate the imbalance in operations is a limiting factor forthe estimation of operational imbalance. Rather than selecting theclosest imbalanced histogram representations, more sophisticatedestimation methods may be employed for accurate estimation ofthe class proportions. Moreover, although it is scarcely exploited,the abundant non-target samples in video surveillance allow oneto produce training sets with different complexities and imbal-ances, and use them to generate diverse pools. A specializedcombination and selection scheme of these diversified pools maylead to robust ensembles, considering both the different levels ofcomplexity and imbalance [8].

In this paper, adaptive skew-sensitive classifier ensembles areproposed for video surveillance applications. The proposed approachallows to select training data with varying levels of imbalance andcomplexity to design ensembles of classifiers that provide enhancedaccuracy and robustness. Face captures of each person in the sceneare tracked and regrouped into trajectories, and a decision thresholdis applied to the accumulation of positive predictions from baseclassifiers for robust spatio-temporal recognition. During enrollment,facial captures from a reference trajectory are combined with selected

non-target captures from the universal and cohort models2 togenerate a pool of 2-class classifiers using data with various levelsof imbalance and complexity (class overlap and dispersion). Training/validation sets with different imbalances and complexities are builtthrough random undersampling, and cover a range of imbalancesfrom 1:1 to a maximum imbalanced estimated according to experi-ence 1 : λmax. During operations, the operational level of imbalance isperiodically estimated from the input data stream using the HDxquantification method, and pre-computed histogram representationsof imbalanced data distributions. The HDx quantification allows oneto estimate the prior probability of operational data based on theHellinger distance between histogram representations of class dis-tributions in the feature space, and employ a single validation set thatis not required to provide a specific imbalance [12]. Finally, theproposed approach allows one to adapt pre-computed histogramsand ensemble fusion functions based on the imbalance and complex-ity of operational data.

The proposed approach is validated with synthetic and videodata, and compared against reference adaptive ensembles usingBC, meta-kNN fusion and score-level average fusion. The syntheticproblem was designed to observe the impact of different theore-tical probabilities of error as well as distinct imbalance levels inthe performance of the system (Gaussian distributions in a two-dimensional feature space). The Carnegie Mellon University FaceIn Action (FIA) video database was used to emulate face re-identification applications. The transaction-based performanceevaluates face matching of the system using the ROC and preci-sion–recall spaces, and individual-specific characterization allowsone to analyze specific cases. Finally, trajectory-based analysis isemployed to show the overall system performance over time.

The rest of this paper is organized as follows. Section 2 presents abrief review of techniques for ensemble design (generation, selectionand fusion) techniques, and specifically ensemble techniques pro-posed to address the problem of class imbalance. Section 3 describesthe adaptive skew-sensitive ensembles proposed for FR in imbal-anced environments. Section 4 provides synthetic experiments thatmotivated the proposed approach. Section 5 presents the experi-mental methodology and results with the FIA video data for valida-tion of the proposed approach in face re-identification applications.

2. Ensemble methods for class imbalance

Ensemble-learning techniques combine classifiers with diversityof opinions to increase classification performance. The design processcan be divided into three main steps – generation of a pool of baseclassifiers, selection and fusion of classifiers [13–16]. The first stepallows one to train base classifiers with diversity of opinions, and thelast two take advantage of this diversity to produce more accuratepredictions. Diversity can be created by employing distinct classifiers,train distinct instances of a classifier with different initial conditions(parameters), or using different training sets [14].

Representative examples of ensemble methods are bagging,boosting, random subspaces, which employ different training sets ofdata or features from the training set to build distinct base classifiers[14,17]. An example of diversity generation by various parameters isthe work of Connolly et al. [18], which takes advantage of diversity inthe hyperparameter space of classifiers to produce useful diversity ofopinions. Examples of selection strategies are greedy search,clustering-based methods and ranking-based methods, and examples

2 In this paper, a universal model (UM) is defined as a database containing ROIpatterns from selected unknown people appearing in scene, and the cohort model(CM) is defined as a database with ROI patterns from other target individualsenrolled to the system.

M. De-la-Torre et al. / Pattern Recognition 48 (2015) 3385–34063386

of fusion strategies can be divided into feature-based, score-based anddecision-based [19].

The algorithms designed for environments with changes in theprobability distribution of data in general, and particularly in theclass priors, can be categorized according to the use of a mechanismto detect changes in prior probabilities [10]. Approaches with activedetection of changes in prior probabilities seek explicitly to deter-mine whether and when a change has occurred in the priorprobability before taking a corrective action [3,4,10]. Conversely,approaches with passive change detection assume that a changemay occur at any time, or is continuously occurring, and hence theclassifiers are updated every time new data becomes available[10,11]. The rest of this section describes representative approachesof passive and active ensembles for changing priors.

2.1. Passive approaches

Passive ensemble-based methods for class imbalance can becategorized into cost-sensitive ensembles, boosting-based, bagging-based and hybrids [9]. In cost-sensitive approaches, the combinationof classifiers (i.e. weights) is designed to consider the cost of classindependent errors. Examples of these approaches include theAdaCost, CSB, RareBoost, AdaC1, AdaC2 and AdaC3 algorithms[20,21]. Boosting-based ensembles include techniques that use datapreprocessing embedded into boosting algorithms. These methodsbias the data distribution towards the minority class before theclassifier generation step. Examples of these approaches are theLearnþþ .CDS, Learnþþ .NIE, SMOTEBoost, MSMOTEBoost, RUS-Boost and DataBoost-IM algorithms [10,22]. Bagging-based ensem-bles integrate bagging with data preprocessing techniques, andhence, they do not require to update any kind of weights. Thesetechniques address the class imbalance by the way they collect thetraining samples, using oversampling and/or undersampling techni-ques to generate training sets of different sizes. Examples of thesetechniques are the OverBagging, UnderBagging, UnderOverBaggingand Imbalanced IVotes [23,24]. Finally, hybrid ensembles combine apre-processing technique with a bagging and a boosting technique.Techniques in this category are also called exploratory undersam-pling, and basically include EasyEnsemble and BalanceCascade [25].

Although the aforementionedmethods account for class imbalancethrough adaptation every time new reference samples become avail-able, they are passive since they do not perform an estimation of theimbalance before adaptation. The advantage of passive approaches liesin the avoidance of false positive and false negative change detections,at the cost of the increased complexity of continuous adaptation.

2.2. Active approaches

Active methods for adaptation to class imbalance employ amechanism to estimate the class priors of the input data, and adaptthe algorithm to the estimated class proportions when a changeoccurs. Hence, these approaches avoid the assumption of continuouschanges and the complexity of continuous adaptations, with thepotential disadvantage of false positive and false negative changedetections. Several examples of active approaches that employensembles for classification in imbalanced environments appear inthe literature [3,4,26]. In general, passive approaches for changingimbalance can be modified by adding a mechanism to detect changesin prior probabilities. Some examples of such mechanisms are basedin Hellinger distance [4], Kullback Leibler divergence [27], or account-ing for class-specific performance measures like recall [26,28].

A recently proposed active approach employed in face recognitionin video surveillance is the skew-sensitive Boolean combination(SSBC), which estimates the imbalance using the Hellinger distancebetween the distributions of validation data and the most resentunlabeled operational samples [4]. During training, SSBC assumes that

a diversified pool of binary classifiers P ¼ fp1;…; png, and operates atthe combination level to take advantage of the diversity of opinions inthe ensemble. To do that, validation data with different levels ofimbalance is used to estimate the operations points of the Booleancombination function (covering the whole ROC space). Two validationsets with that imbalances are as follows: the first (OPT) employed toestimate the operational imbalance, and the other (VAL) to select theoperation point with the proper estimated imbalance. During opera-tions, the imbalance is estimated using the Hellinger distance, and theoperation points are selected from the predefined imbalances. Theknown levels of class imbalance used by the approach form the setΛ¼ fλbal ¼ 1 : 1;…; λmaxg. A subset of class imbalances ΛBC � Λ isselected from Λ to optimize a subset of BCs E. The subset ofimbalances ΛBC should contain evenly distributed intermediate classimbalance levels between the minimum λbal and the maximum levelof imbalance λmax inclusively. The sets OPT and VAL are generatedfrom imbalanced reference data that follows λmax. Different data setswith the levels of class imbalance are defined in Λ, in which theamount of target samples remains fixed, while the amount of non-target samples are added to the set through random under sampling.

The classification system processes streams of input patterns.The operational histogram opd corresponding to these operationalsamples is accumulated over time, and the closest level of classimbalance λnAΛ is estimated by comparing opd to the data sets inOPT using the Hellinger distance. The estimated operational classimbalance λn corresponds to the imbalance of the closest set inOPT to opd in terms of Hellinger distance. Then, λn is used to selectthe BC that corresponds to that imbalance, and in the case λn is notavailable on ΛBC, the BCs for the two closest imbalances aremerged, and the convex hull is estimated.

The strength of the SSBC algorithm lies in the adaptive selection ofsuitable fusion functions (ROC operations points) according to theestimated operational imbalance. However, this technique assumesthat the generation of a pool of classifiers, where each classifier istrained using balanced target and non-target data, provides enoughdiversity of opinions to discriminate when input operational data isimbalanced. Another issue is related to the precision of the methodused by SSBC to estimate the class imbalance limited by the amountand sampling strategy used to create the set of imbalances Λ.Specialized methods to quantify the class priors of unlabeled (opera-tional) data have been proposed in the literature [12], and two ofthem are summarized in the next section.

2.3. Estimation of class imbalance

Quantification (i.e. estimation of the class distribution in Bayesianterms) is the task that deals with the estimation of the number ofsamples belonging to each class in an unlabeled set [7,29,30]. In theliterature, different quantification methods appear and are basedeither on the classifier confusion matrix [7,31], the posterior prob-ability estimates provided by a classifier [29], or the comparison ofclass conditional probability densities of data sets with known andunknown proportions [4,12,30,32]. Regarding the estimation taskfrom the point of view of a classification algorithm, two levels can beidentified to estimate the class imbalance of a distribution repre-sented by a set of unlabeled (operational) samples. Data-levelestimation operates in the feature space, employing the probabilitydistribution of samples for each feature [3,4,12]. On the other hand,score-level allows one to employ the probability distribution of thescores generated by a probabilistic classifier.

Two representative quantification methods were recently pro-posed to use the Hellinger distance to estimate the prior probabilityof unlabeled data, either using the features (HDx quantification) orscores from a classifier (HDy quantification) [12]. Given an unlabeleddataset U ¼ fðanÞ;n¼ 1;…;Ng and a labeled validation datasetV ¼ fðam; lmÞ;m¼ 1;…;Mg, the Hellinger distance between these

M. De-la-Torre et al. / Pattern Recognition 48 (2015) 3385–3406 3387

two sets can be computed according to

HDðV ;UÞ ¼ 1nf

Xnf

f ¼ 1

HDf ðV ;UÞ; ð1Þ

where the feature-specific Hellinger distance is given by

HDf ðV ;UÞ ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiXbi ¼ 1

ffiffiffiffiffiffiffiffiffiffiffiffijVf ;i jjV j

s�

ffiffiffiffiffiffiffiffiffiffiffiffijUf ;i jjU j

s !2vuuut ; ð2Þ

where nf is the number of features, b is the number of bins used toconstruct the feature-specific histogram representation of the prob-ability density functions of the datasets. jU j is the number ofsamples in U and jUf ;i j is the number of samples whose feature fbelongs to the bin i, similarly with jV j and jVf ;i j for the validationset V. The Hellinger distance between the probability densities of theunlabeled and validation sets can be computed by making theassumption

jVf ;i jjV j ¼ j S�

f ;i jj S� jPvð�Þþ jSþ

f ;i jjSþ jPvðþÞ; ð3Þ

where jS� j is the number of non-target training samples and jS�f ;i j is

the number of non-target samples whose feature f belongs to bin i inthe histogram representation of the probability distribution of thetraining data S. Similarly, jSþ j and jSþ

f ;i j are equivalent measures forthe target class. The prior probability PvðþÞ (and similarly Pvð�Þ) canbe manually assigned by the quantification method (see Algorithm 1).Algorithm 1 summarizes the process followed by the HDx quantifica-tion method.

Algorithm 1. Quantification HDx [12].

Input: Labeled data S; operational data U (non-labeled);number of bins b;

Output: Estimated target prior probability for U: P̂ ðþÞCompute jSþ j , jS� j and jU jfor f ¼ 1…nf do

for i¼ 1…b doCompute jSþ

f ;i j ; j S�f ;i j and jUf ;i j

j$

for PvðþÞ ¼ 0…1 in small steps dofor f ¼ 1…nf do

Compute HDf according to ð2Þ; using ð3Þ with PvðþÞ�HD½PvðþÞ� ¼ 1

nf

Pnff ¼ 1 HDf ½PvðþÞ�

666664P̂ ðþÞ ¼ arg minðHDÞP̂ ð�Þ ¼ 1� P̂ ðþÞ

For HDy, the Hellinger distance between the distributions ofclassifier outputs is estimated as

HDðV ;UÞ ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiXbi ¼ 1

ffiffiffiffiffiffiffiffiffiffiffiffijVy;i jjV j

s�

ffiffiffiffiffiffiffiffiffiffiffiffijUy;i jjU j

s !2vuuut ð4Þ

where jUy;i j and jVy;i j are the number of unlabeled and validationsamples whose output y belongs to the bin i¼ 1…b, respectively.Similar to the HDx method, the substitution to avoid subsamplingand/or oversampling is given by

jVy;i jjV j ¼ jS�

y;i jjS� jPvð�Þþ jSþ

y;i jjSþ jPvðþÞ; ð5Þ

where jSþy;i j and jS�

y;i j represent the number of non-target sampleswhose output y belongs to bin i in the histogram representation ofthe probability distribution of the scores. Algorithm 2 summarizes

the process followed by HDy quantification to obtain the priorprobability based.

Algorithm 2. Quantification HDy [12].

Input: Labeled data S; operational data U (non-labeled);classifier Cw; number of bins b;

Output: Estimated target prior probability for U: P̂ðþÞCompute jSþ j , jS � j and jU jCompute classifier outputs for S as fyk ¼ Cwð

akÞ; k¼ 1;…;Kg

for i¼ 1…b do

Compute jSþy;i j ; jS�

y;i j and jUy;i jj

for PvðþÞ ¼ 0…1 in small steps doCompute HD½PvðþÞ� according to ð2Þ;�

using ð3Þ with PvðþÞ:P̂ ðþÞ ¼ arg minðHDÞP̂ ð�Þ ¼ 1� P̂ðþÞ

2.4. Challenges

Exploiting imbalance to adapt a classifier system has beenstudied in the literature, and is a consequent option regarding theimminent imbalance in face based video surveillance. Althoughthe algorithms like SSBC have successfully used imbalancedvalidation data to update an ensemble fusion function to theoperational imbalance, two issues are still to be addressed inpractice. The first is related to the source of diversity of opinionsamong experts, where classifiers may be trained on data withdifferent imbalances and complexities. In this way, the baseclassifiers trained on diverse levels of imbalance would provideincreased useful diversity in the ensemble. Even more, trainingimbalance specific classifiers on data with different complexitieswould provide even more diversity, leading to a more accurate androbust ensemble under such an imbalanced environment.

The second issue is related to the resolution needed to reliablyestimate the operational imbalance. For example, SSBC estimationrelies on the measurement of the Hellinger distance between thehistogram representation of a set with the most recent operationalsamples and validation sets with pre-defined imbalance levels (Λ). Ifthe operational imbalance is not considered in the set Λ, thecombination functions corresponding to closest adjacent imbalancesare considered, but the exact level of imbalance is never estimated.More accurate candidate quantification methods like HDx and HDymay be used, where all the validation samples are employed for amore precise estimation, avoiding the subsampling requirement.Moreover, the prior probabilities PvðþÞ and Pvð�Þ are explicit – inother words, the step size in Algorithms 1 and 2. The optimal size ofeach “small step” can be easily deducted by considering the max-imum expected imbalance λmax, which can be used to estimate theoptimal size for these steps (See Section 4.2.4).

3. Adaptive skew-sensitive ensembles for video-to-video facerecognition

The proposed architecture for skew-sensitive video-to-video FRis depicted in Fig. 1. It consists of a tracker, a skew-sensitiveclassification system with individual-specific parameters, a spatio-temporal fusion module, a sample selection and a classifier design/update systems. It is inspired on the framework proposed in [2],and incorporates the functionality provided by skew-sensitiveensembles to adapt the individual-specific ensembles to the most


recent operational imbalance. In order to adopt this functionality,some of the original blocks were modified, and others related tothe operation skew-sensitive ensembles were added. The systemworks in two different phases that separate normal operation fromthe design and update of facial models of enrolled individuals.

In the operational phase, the tracker follows the position of thesegmented faces in video, building a face trajectory composed ofsequential ROIs. Simultaneously, features for classification are extra-cted and selected from segmented ROIs to form feature vectors (a),which are sequentially feed to all the individual-specific ensembles ofclassifiers. Each skew-sensitive ensemble k – corresponding to theenrolled individual k – produces a sequence of predictions accordingto the input order of the ROIs belonging to a face trajectory. In order toadapt the fusion function to the most recent operational imbalance,the feature specific histogram representation of the distribution of theoperational data (opd) from facial captures of the last predefined timeperiod (e.g. 15 min) is computed. The most recent distribution storedin opd is employed to estimate the operational imbalance λn (seeSection 3.1). Then, the combination function corresponding to theestimated operational imbalance λn is approximated, and the opera-tions point (op) in each individual-specific ensemble is selected.Finally, the spatio-temporal fusion module accumulates ensemblepredictions over a fixed size window of face detections. When theaccumulation of predictions from an individual-specific ensemblethat corresponds to a trajectory surpasses a pre-defined detectionthreshold γdk, the individual of interest k is detected in scene. For ifself-update is required, the accumulation is compared to a secondupdate threshold γuk that triggers the adaptation process using all theROIs belonging to the face trajectory (see [2]).

The design/update phase is triggered when a new referencetrajectory becomes available. Target samples are combined withnon-target samples from UM and CM to form a learning data setDk (for training and validation). The learning set Dk follows themaximum predefined imbalance λmax, which is selected a priori inaccordance with the experience in the field. An individual-specificselection strategy is employed to select the amount of non-targetsamples that accomplishes with the maximum expected imbalanceλmax. The learning data set Dk is evenly divided for imbalancedgeneration (Dk

gen) and validation of fusion functions (Dkval). The

imbalance-based generation of classifiers allows one to generate apool P0

k of classifiers, which are incorporated to the previous poolfollowing a learn-and-combine strategy (see Section 3.2). A long termmemory (LTM) is employed to store individual-specific referencesamples and avoid knowledge corruption [33]. Then, the validationsamples used for combination are stored in the datasets optmax foroperational imbalance estimation (see Section 3.1) and the approx-imation of imbalanced BC. Finally, the skew-sensitive combinationallows one to select the operations point with validation data withthe approximated imbalance λn (see Section 2.2).

3.1. Approximation of operational imbalance

During operations, the classification system in test mode startsconsidering a balanced operational environment. Feature vectorscorresponding to input facial regions feed the data set with themost recent operational samples ops. The set ops is renewed withnew input samples every certain prefixed period of time, let us sayevery 15 min. The operational feature histogram is estimatedbased on the evidence accumulated on the feature distributionsof input facial regions during that period of time. Then, the priorprobability of the most recent target class distribution PnðþÞ ofoperational samples is estimated using HDx quantization, basedon the feature histograms from unlabeled operational (ops) andreference validation ðoptmaxÞ samples (Algorithm 1).

Let jV þ j be the number of target samples in a validation data setV (e.g., optmax). The number of non-target samples required toaccomplish with the estimated class distribution PnðþÞ is given by

jV � j ¼ jV þ j 1PnðþÞ�1� �

; ð6Þ

and the estimated class imbalance λn can be represented assumingjV þ j ¼ 1 and substituting in the notation given by Eq. (11).

The HDx quantification method requires a single validation setðoptmaxÞ, which preserves the useful abundance of non-targetsamples that provide information from both imbalance and com-plexity in the feature space. The procedure for imbalance estima-tion is summarized in Algorithm 3.

Fig. 1. Adaptive skew-sensitive MCS for video-to-video FR.


Algorithm 3. Estimation of the level of imbalance λn from refer-ence data optmax and operational data ops.

Input: Data set optmax, operational samples ops, number ofbins b

Output: Imbalance estimation λn

Estimate prior probability using Algorithm 1Assume jV þ j ¼1Compute jV � j using Eq. (6)Compute imbalance λn using Eq. (11)

The adaptation of the combination function for the new approxi-mated class imbalance λn is performing in accordance to the skew-sensitive algorithm, either updating the combination weights(weighted voting or meta-classification combiners) or selecting theimbalance-specific operations point (SSBC). The advantage of using anestimation of the prior probability as given by HDx provides a goodestimation of the class imbalance, and the selection of the correctimbalance in validation set VAL reduces the error propagation inducedby some algorithms for imbalance estimation (see Section 3.1).

3.2. Design and adaptation of ensembles

The imbalance-based generation strategy proposed in this sectionallows one to generate useful diversity of opinions, which can besuccessfully exploited with other skew-sensitive combination strate-gies. The operational imbalance in a real scenario suffers fromconstant changes, and it is inaccurate to assume a single imbalance.Active skew-sensitive ensembles allow one to estimate the opera-tional imbalance, and select and combine the classifiers from a pool.Robustness of the ensembles may be enhanced with base classifierstrained on different levels of imbalance and complexity.

Limitations in resources make impractical to train a classifierfor every possible imbalance, and a number of training imbalancesshould be fixed before training. The combination function isresponsible for the selection of the classifiers with the properimbalances according to the estimated operational imbalance. Inthis way, given predefined minimum and maximum imbalancesdenoted by λmin and λmax respectively, a fixed number of imbal-ances is chosen between them. Two issues appear from thisaffirmation, i.e. how to estimate the number imbalance levelsare enough for the application, and how close should be from eachother. The first question is equivalent to estimate the number ofclassifiers in the ensemble that allow the fusion function toprovide a high level of performance under distinct operationalimbalances. The second question can be re-stated as whichimbalances between the maximum and minimum should be usedto train the base classifiers.

Algorithm 4. Generation of diversified classifiers based on differ-ent levels of imbalance and complexity.

Input: Training data Dt, maximum imbalance λGENmax, levels of

imbalance jΛGEN j , size of subpools sp.Output: P Pool of jΛGEN j�sp

diversified classifiers.Generate ΛGEN by sampling the levels of imbalance with a log

scaleGenerate the imbalanced training sets DImb

i according to theimbalances in ΛGEN

for i¼ 1…jΛGEN j do

Train a new subpool with sp classifier Pi using DImbi

j

and a source of diversityP ( P [ Pi

The proposed procedure for imbalance-based generation ofdiversified classifiers is shown in Algorithm 4. In order to generatemore diversity, the subpools of classifiers for each specific imbal-ance can be generated employing typical sources of diversity likedifferent subsets of data, presentation orders, distinct hyperpara-meters, or other techniques (e.g. boosting, use different classifica-tion algorithms to train base classifiers, and DPSO generation).

According to the results described in Section 4, jΛGEN j ¼ 7 levelsof imbalance are a good choice to train base classifiers, assuming thatFR problems present high probability of classification error betweentarget and non-target individuals. And the parameter that controlsthe size of the subpools may consider a small number of classifiers(e.g. sp¼2 or 3) to take advantage of complexity as a source ofensemble diversity, and train robust ensembles avoiding an excessiveincrease in memory requirements.

4. Synthetic experiments

Consider a modular system used for matching in FR [34], whereindividual-specific ensembles of 2-class classifiers are trainedindependently. The scenario is replicated employing a Gaussiandistribution to generate samples for the minority target class, anda second Gaussian distribution to draw samples for majority class(the rest of the individuals in the world).

The objective of these experiments is to characterize the perfor-mance of skew-sensitive ensembles with imbalance-generation ofclassifier ensembles in five axis. First, to show the capacity of theproposed imbalance-based generation of classifiers to produceensemble diversity, since this affects positively the performance(accuracy and robustness) of ensembles. Second, a sensitivity analysisto decide the number of classifiers trained on different levels ofimbalance levels that provide useful diversity to the ensemble. Third,to provide evidence of the effectiveness of the skew-sensitiveensembles in imbalanced environments compared to other ensembletechniques. Fourth, the generation of more than one classifier for eachlevel of imbalance, bringing to the table the concept of imbalance-specific sub-pools. This approach provides combined sources ofdiversity from imbalanced training sets and different complexities.A sensitivity analysis allows one to define size of the subpools thatprovide the best classification performance and robustness. And fifth,to provide a deep analysis of the behavior of the data- and score-levels employed in the approximation of imbalance employingquantification methods based on the Hellinger distance.

4.1. Experimental protocol

The synthetic problemwas designed in the 2 dimensional featurespace, and the two overlapping multivariate Gaussian distributionswith simple linear decision boundaries are shown in Fig. 2a. Targetand non-target data distributions are characterized by a fixed centerof mass μ1 ¼ ½0;0� and μ2 ¼ ½3:29;3:29� respectively, and the degreeof overlap was variated by adjusting the covariance matrix σ of bothdistributions at the same time. The degree of overlap, and thus thetotal probability of error between classes are variated according to sixdifferent levels, permitting the analysis of the impact of the overlapand imbalance level in the performance. The variances and levels ofoverlap of the class distributions used in these experiments areshown in Fig. 2b.

Ten different levels of imbalance were used to train 2-class PFAMclassifiers ðΛGEN ¼ fλ1GEN ;…; λmax

GENg ¼ f1 : 1;1 : 2;1 : 5;1 : 10;1 : 22;1 :


46;1 : 100;1 : 215;1 : 464;1 : 1000gÞ. These imbalances correspondto a logarithmic sampling between balanced and the maximum levelof imbalance λmax ¼ 1 : 1000. This sampling scheme was selectedaccording to the following reasoning: First, it is recalled that thediversity of opinions between the base classifiers in an ensemble isan important characteristic for enhanced classification performance,and a good scheme should favor this diversity. Assuming no othersource of diversity in an ensemble but the class imbalance, twosimilar classifiers (same algorithm and parameters) trained on datawith different imbalance levels should produce different decisionboundaries. Then, the scheme to subsample the space between λGEN

1

and λGENmax should maximize the diversity of opinions, and hence

produce distinct decision boundaries for each ensemble member.Fig. 3 shows the cross-cut of the overlapping data distributions for

target (right curves) and non-target (left curves) samples. Thisexample illustrates the linear and a logarithmic schemes, and thedifferent theoretical optimal decision boundaries. It can be seen thata logarithmic scale produced a more even distribution of the decisionboundaries along the feature space, thus generating a greaterdiversity of opinions between each classifier compared to the linearscheme. For this reason, the logarithmic scheme was chosen, allow-ing for enhanced diversity of opinions whereas evenly covering thespace of decision boundaries for different imbalances.

The standard hyperparameters of the PFAM classifiers were used(e.g. ½α¼ 0:001; β¼ 1; ϵ¼ 0:001; ρ ¼ 0; r¼ 0:60�), and a hold-out vali-dation process was employed to optimize the number of trainingepochs with different orders in the presentation of training samples.A constant number of 10 positive (target) samples was maintained inthe training and validation sets, which is typical of applicationswhere a limited amount of training samples is available. Similarly,the number of negative samples was variated according to thedesired imbalances in ΛGEN, assuming the wide availability of non-target samples, which is typical of surveillance scenarios where facialcaptures from non-target individuals may be easily retrieved fromevery day operational videos (the UM). The level of imbalance (priorprobability) is internally estimated by the PFAM classifiers, based onthe amount of positive and negative samples in the training data.

4.2. Results

4.2.1. Classification on imbalanced problemsFig. 4 shows the decision boundaries estimated by the 10 classifiers

trained on the imbalances in ΛGEN. The test data set with 100 positivesand the highest imbalance is plotted behind ðλmax

GEN ¼ 1 : 1000Þ, withblue for target and red for non-target samples. The differences in thedecision boundaries is the agent that produces the diversity ofopinions that can be exploited by ensemble techniques for increasedrobustness and accuracy.

The cost curves are graphical representations of the expectedcost (or error rate) of 2-class classifiers over the full range of

possible probability costs (class distributions or misclassificationcosts) [36]. In order to find the relation with the representations inthe ROC and PROC spaces, the error rate can be defined as thedifference between the false negative rate (fnr) and false positiverate (fpr) multiplied by the prior probability of a sample beingfrom positive class pðþÞ (see Eq. (7)). In Eq. (7), the quantities of a2-class confusion matrix are represented as FP for false positives,TP for true positives, FN for false negatives and FP for false positivepredictions:

error rate¼ ðFN�FPÞnpðþÞþFP

¼ ð1�TPÞnPðþÞ ð7Þ

The extreme values of the x-axis in the cost curves represent thesituations where all samples are classified as belonging to thesame class. A point in the left extreme represents the probability ofpositives pðxÞ ¼ 0 (all samples are negative), and a point in theright extreme represents pðxÞ ¼ 1 (all samples are positive). Thus, atrivial classifiers can be represented with a cost curve that startson the lower left corner (0, 0), grows linearly up to the point withequal positive and negative probabilities with error rate of 50%(0.5, 0.5), and ends at the lower right corner (1, 0). And a costcurve that corresponds to a perfect classifier should be drawn as aflat horizontal line at zero expected cost. On the other hand, themore commonly used receiver operating characteristics (ROC)curves plot the fpr in the x-axis and the tpr in the y-axis, with apoint for each confusion matrix corresponding to an operationalpoint [37]. Similarly, the precision–recall (PROC) curves representthe recall (tpr) in the x-axis against the precision in the y-axis,although inverted axis can be employed to compare ROC and PROCspaces with the tpr in the y-axis. Examples of test ROC, PROC andcost curves [36] for the 10 PFAM classifiers trained with differentlevels of imbalance in the training set are shown in Fig. 5 for anoverlap of 20%. The curves were obtained on a common test setwith 100 positives and the highest imbalance used in the experi-ments (1:1000), the same as shown in Fig. 4. These results confirmthat there is a significant difference in the performance of thedifferent classifiers, which constitutes a different view of thediversity of opinions provided by the distinct classifiers. Thisdifference confirms the diversity of opinions that can be exploitedusing ensemble techniques, and that is related to the differentlevels of imbalance used in the training data.

In order to show the impact of training the classifiers with thesame imbalance as the appearing in operations (test), each classifierwas tested on a test set composed of 100 positive samples, and thenecessary negative samples to complete the imbalance used fortraining. The experiment was repeated 10 times, and for eachrepetition the training data was randomly re-generated to design acompletely independent experiment. After that, the 10 classifierswere combined using skew-sensitive ensembles (SSBC), and the test

Fig. 2. (a) Representation of the synthetic overlapping data set used for simulations and (b) covariance matrices used to control the degree of overlap between distributions(I is the 2�2 identity matrix). The covariance matrix allows one to change the degree of overlap, and thus the total probability of error between classes. These parameterswere extracted from [35].


set with the maximum imbalance (λmaxGEN ¼ 1 : 1000) was used to

compare its performance with the single classifier approach.Fig. 6 presents the AUC performance for each of the classifiers

and the skew-sensitive ensemble. It can be seen that the classifierstrained with the same level of imbalance that appears in test showa higher performance in terms of AUC than a classifier that learnsfrom a balanced training set. Skew-sensitive ensembles estimatethe level of imbalance in test and adapts the fusion function to the

operational class proportions, providing the highest level of AUCperformance and smaller standard error, as shown at the veryright of Fig. 6. A similar tendency was seen on the six levels ofoverlap, being more evident with higher probability of error. Ingeneral, as the probability of error increases, the problem is moredifficult and the classifiers present lower performance, but theAUC performance of the ensemble was lower bounded by theperformance of the best classifier in the ensemble.

4.2.2. Ensemble generationIn order to define the levels of imbalance that provide the highest

useful diversity to the imbalance-based ensemble, a sensitivityexperiment was designed. The aim of this experiment is to explorehow many of the 10 classifiers are more useful for the ensemble,providing the best performance after selecting the operations pointat a target fpr¼1%. This scenario provides a situation where theensemble is deployed and the operations point gives the finaldecisions, evaluating together the accuracy of the classifiers afterthe selection of the operations point, and not only a range of valuesin the ROC space. The number of PFAM classifiers was variated from2 to 10, adding to the pool the classifiers in descendant orderaccording to the ROC AUC accuracy evaluated on an independentvalidation set. In this way, the two classifiers that present the highestperformance are first combined, then the third most accurate and soon, until the ensembles contain the 10 classifiers trained according tothe imbalances in ΛGEN.

The five combination strategies used in the comparison are themax rule, average rule, meta-kNN, BC and SSBC. The max ruleselects the maximum target score produced by the base classifiers

Fig. 3. Linear scheme (a) with imbalances ΛGEN ¼ f1 : 1;1 : 2;1 : 3;1 : 4;1 : 5g and logarithmic scheme (b) with imbalances ΛGEN ¼ f1 : 20 ;1 : 21 ;1 : 22 ;1 : 23;1 : 24g.

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Feature 1

Feat

ure

2

PFAM (1:1)PFAM (1:2)PFAM (1:5)PFAM (1:10)PFAM (1:22)PFAM (1:46)PFAM (1:100)PFAM (1:215)PFAM (1:464)PFAM (1:1000)

Fig. 4. Test set characterized by a 1:1000 imbalance, and the decision lines drawnby the 10 PFAM classifiers trained with different levels of imbalance in ΛGEN.Classifiers and test samples correspond to the problem with a total probability oferror corresponding to 20%. (For interpretation of the references to color in thisfigure caption, the reader is referred to the web version of this paper.)


in the pool. The avg rule estimates the mean of the target scoresproduced by the base classifiers in the pool. In meta-kNN, the 1NNclassifier was trained on independent score-level validation data,and it was employed in test to produce output distance-basedscores. In Boolean combination (BC), the 10 Boolean functions areapplied to different pairs of classifiers, and the BC algorithm wasrun on an independent validation set to find the operation pointsthat maximize the ROC convex hull [38]. Finally, the SSBC wasapplied with a validation set containing a profile with the sameimbalance as the expected in test [4].

In all cases, the operations point for a target fpr¼1% wasselected using an independent validation set. The performanceof all the approaches was evaluated on a same test set withimbalance λmax

GEN ¼ 1 : 1000, using precision and F1 measure in thecomparison, together with the ambiguity that measures theensemble diversity. Formally, the ambiguity is defined by Zenobiand Cunningham in [15], and includes the responses of the baseclassifiers as well as the responses produced by the ensemble:

Ens: Ambiguity¼ 1MN

XMm ¼ 1

XNn ¼ 1

ambðan; dm; dnÞ; ð8Þ

where M is the number of classifiers in the ensemble and N is theamount of test samples. The ambiguity for an independent samplean, given the decision dm of the classifier cm in the ensemble, isgiven by

ambðan; dm; dnÞ ¼ 0 if dm ¼ dn

1 otherwise:

(ð9Þ

Fig. 7 presents the resulting F1 measure and ambiguity for theensembles in the scenarios with total probability error of 15% and20%. Regarding the F1 measure, the maximum, average, BC andSSBC combinations perform better than meta-kNN at all times,and a significant superiority in performance is shown by SSBCwhen the ensemble contains between 5 and 8 classifiers. Thephenomenon was repeated for the other overlaps (1%, 5%, 10% and25%), becoming more evident as the total probability error grows.The ambiguity of the meta-kNN combination stays at a highcompared to the other four approaches, which combined withthe low performance shown in terms of F1 measure allows one tosee that this approach is the one that exploits the diversity ofopinions in a less efficient way. On the other hand, the ambiguityshown by SSBC remains low compared to the meta-kNN, reinfor-cing that useful diversity of opinions is correctly exploited by thisapproach.

Regarding the F1 measure in Fig. 7, it can be observed that thelast value in the curve for SSBC in Fig. 7(a), corresponding to 10classifiers in the ensemble, is significantly higher than its startingpoint (2 classifiers). The same phenomenon was observed for theproblems with total probability of error lower than 15%. However,in Fig. 7(d) the same point in the curve presents an F1 level that isonly slightly higher than the starting point (2 classifiers). Similarly,this decrease in performance was observed in the problem with25% total probability of error. This is related to the order used toadd the base classifiers in the ensemble, in which the classifierwith lowest level of performance is added in the last moment. Thislast classifier negatively affects the diversity of opinions in theensemble, and thus, the global performance. This tendency is moreevident in problems with a high level of total probability of error,

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

True

pos

itive

rate

(tpr

)

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Precision

Rec

all (

tpr)

0.5

0.45

0.4

0.35

0.3

0.25

0.2

0.15

0.1

0.05

0.01.00.90.80.70.60.50.40.30.20.10.0

Probability Cost

Nor

mal

ized

Exp

ecte

d C

ost

Fig. 5. (a) ROC, (b) PROC and (c) cost curves corresponding to the seven PFAM classifiers trained on different imbalances, for the problem with a theoretical total probabilityerror (overlap) of 20%.

1:1 1:2 1:5 1:10 1:22 1:46 1:100 1:215 1:464 1:10000.55

0.6

0.65

0.7

0.75

0.8

Test Imbalance

AU

C

Fig. 6. Average AUC estimated over 10 replications of the synthetic experiment with overlap between distributions of 20%. The left bar for each pair (blue) corresponds to theaverage AUC for the PFAM classifier trained on a balanced set (1:1), estimated on the test set with the imbalance indicated in the abscissa axis. The right bar for each pair isthe average ROC AUC for the PFAM classifier trained on the same imbalance appearing in test. (For interpretation of the references to color in this figure caption, the reader isreferred to the web version of this paper.)


Total probability of error: 15%

2 3 4 5 6 7 8 9 10

0.01

0.02

0.03

0.04

0.05

0.06

Number of Classifiers

F1Max ruleAverage ruleMeta kNNBoolean CombinationSSBC

2 3 4 5 6 7 8 9 100.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

0.22

0.24

0.26


Am

bigu

ity

Total probability of error: 20%

2 3 4 5 6 7 8 9 100

0.005

0.01

0.015

0.02

0.025

0.03


F1

Max ruleAverage ruleMeta kNNBoolean CombinationSSBC

2 3 4 5 6 7 8 9 100.05

0.1

0.15

0.2

0.25

0.3


Am

bigu

ity

Fig. 7. Average performance of the systems with a growing number of classifiers in the ensemble, using different combination strategies and adding the classifiers indeceasing order of AUCs evaluated on validation data. The most accurate classifiers are the first added to the ensemble: (a,c) F1 measure; (b,d) ambiguity

Table 1Average performance of the different combination methods, the ensembles are composed of 7 base classifiers. The bold numbers represent the performance valuessignificantly higher than other approaches.

Imbalanced PFAM Average Meta-kNN SSBC

fprð↓Þ tprð↑Þ precð↑Þ F 1ð↑Þ fprð↓Þ tprð↑Þ precð↑Þ F 1ð↑Þ fprð↓Þ tprð↑Þ precð↑Þ F 1ð↑Þ fprð↓Þ tprð↑Þ precð↑Þ F 1ð↑Þ

Total probability error: 1%13.26% 94.30% 1.61% 0.0314 11.74% 99.90% 1.57% 0.0307 1.37% 97.70% 16.04% 0.2496 0.81% 58.50% 6.85% 0.1219(4.19) (4.94) (0.40) (0.0078) (3.06) (0.10) (0.39) (0.0075) (0.51) (0.56) (5.03) (0.0623) (0.07) (5.53) (0.45) (0.0077)Total probability error: 5%13.92% 50.30% 0.79% 0.0153 16.62% 92.30% 0.93% 0.0183 8.86% 87.40% 2.37% 0.0441 0.93% 57.50% 6.17% 0.1102(3.70) (11.06) (0.32) (0.0061) (4.64) (2.21) (0.18) (0.0034) (2.16) (2.33) (0.99) (0.0174) (0.08) (5.51) (0.73) (0.0118)Total probability error: 10%12.32% 39.50% 0.75% 0.0140 13.71% 75.80% 1.40% 0.0267 15.67% 81.70% 0.62% 0.0122 1.24% 36.80% 3.50% 0.0625(4.48) (10.02) (0.30) (0.0054) (4.16) (5.32) (0.47) (0.0087) (2.11) (3.80) (0.10) (0.0019) (0.20) (4.07) (0.66) (0.0106)Total probability error: 15%14.52% 42.00% 0.38% 0.0075 10.44% 49.10% 1.35% 0.0234 23.12% 78.20% 0.39% 0.0078 1.13% 21.80% 2.16% 0.0390(3.55) (9.58) (0.10) (0.0020) (3.97) (10.47) (0.38) (0.0059) (2.50) (2.72) (0.06) (0.0013) (0.13) (2.50) (0.34) (0.0059)Total probability error: 20%19.00% 51.50% 0.28% 0.0057 11.99% 54.50% 0.74% 0.0144 27.88% 75.00% 0.28% 0.0056 1.12% 14.20% 1.32% 0.0240(3.06) (9.33) (0.04) (0.0007) (3.77) (5.68) (0.13) (0.0024) (2.30) (2.56) (0.02) (0.0004) (0.10) (2.13) (0.22) (0.0038)Total probability error: 25%12.62% 32.40% 0.47% 0.0083 10.92% 42.20% 0.60% 0.0115 31.27% 68.60% 0.23% 0.0046 1.22% 8.10% 0.67% 0.0123(3.78) (6.52) (0.15) (0.0020) (3.30) (7.13) (0.09) (0.0017) (2.56) (2.55) (0.02) (0.0003) (0.10) (1.03) (0.07) (0.0012)


in which the classifiers with less performance bias the ensembletowards the erroneous decisions. And the classifiers with lowerlevel of performance are commonly those trained with lowerimbalance levels. For instance, regarding the problem with 20%total probability of error, in 8 of the 10 replications of theexperiment, the classifier with less performance was trained witha training set with an imbalance lower than 1:50, and theimbalance used in test was 1:1000. In general, the approachesthat show a higher diversity tend to produce a lower performance,showing that there is a limit in the useful diversity, and beyondthat limit, it damages the ensemble accuracy.

Table 1 allows for a more deep comparison between thedifferent combination strategies, by considering 7 levels of imbal-ance in the ensembles. The empirical fpr and tpr were obtainedafter predictions for the selected operations point, together withthe precision and F1 measures. According to these results the SSBCprovides the most accurate fpr in all cases, remaining always closeto the desired fpr¼1% regardless of the total probability of errorbetween classes. On the other hand, the average rule and meta-kNN provide the highest tpr at the expenses of increased fpr,which is a costly trade off in video surveillance due to the amountof false alarms in an environment full of non-target individuals, orin other words, the operational imbalance. Comparing the F1measure for the different combination methods, it reflects thatthe SSBC significantly outperforms all other approaches, and isonly the problem with an overlap of 1% that seems to be betteraddressed by the meta-kNN. From this it can be said that tradi-tional combination methods are suitable to be used in imbalancedenvironments when the classification problems are easy enough –

e.g. with lower total probability of error between classes, andsimple decision boundaries. However, as the total probability oferror grows, the superiority of the SSBC becomes more evident.

4.2.3. Using several classifiers per imbalanceUp to here, a single classifier was trained for each imbalance

level in ΛGEN. However, using a single classifier per imbalance isnot the only option to generate useful ensemble diversity andincrease the robustness of the ensemble. Adding more than oneclassifier for each level of imbalance is a possibility that can beexplored by generating a sub-pool instead of single classifiers. Inthis experiment, the number of classifiers in the ensembles wasaugmented by training more classifiers per imbalance, introducingvariations in the classifiers by changing the presentation order inthe training sets. A sensitivity analysis was conduced to observethe performance variations of the ensembles after changing thesize of these sub-pools from 1 to 3 classifiers per imbalance,resulting in pools of 7, 14 and 21 classifiers. The test set was keptwith the maximum imbalance ðλmax

GEN ¼ 1 : 1000Þ, and the sampleswere taken from the data distributions with 20% total probabilityof error.

Table 2 shows the average performance of the skew-sensitiveensemble with 7 levels of imbalance using the three different sizesof sub-pools, at the operations point for fpr¼1%. It can be seen thatthe best performance is achieved by the ensemble with 21classifiers, at the expenses of an increased memory complexity,presenting the need to store three times more classifiers thanusing a single classifier per imbalance. The difference betweenusing 7 and 14 classifiers is evident from the numbers in Table 2,showing that the ensemble with 14 classifiers presents a higheraverage performance and lower standard error. This is also truecomparing the cases of 14 and 21 classifiers, but with a smallerdifference in average performance and standard error. This con-firms that more robust ensembles can be obtained by adding moreclassifiers to the sub-pools, and the trade-off between resourcesand accuracy should be considered at the deployment stage.

Fig. 8 presents the box plots for the F1 measure achieved by theskew-sensitive ensemble with different sizes of pools of classifiers. Itcan be seen that the median is higher as the number of classifiersincreases, but there is also evidence of wide variations representedby the distance between upper and lower bars, which becomenarrower as the number of classifiers augments. The difference inthe performance between the second (7�2 classifiers) and the third(7�3 classifiers) boxes is small, and other criteria like spatialcomplexity may be used to decide the size of the sub-pools.

4.2.4. Approximation of imbalance through quantificationThe level of class imbalance in the proportions of a set of

samples is related to the prior probability of target (and equiva-lently non-target) samples. Given an imbalanced validation set Vwith jV j samples, this relationship follows the definition of priorprobability given by

PðþÞ ¼ 1�Pð�Þ ¼ jV þ jjV j ¼ jV þ j

jV þ j þ jV � j ; ð10Þ

where jV þ j and jV � j correspond to the number of target andnon-target samples in V respectively. In the notation followed inthis paper, the level of imbalance is represented as

Imbalance¼ jV þ jjV þ j :

jV � jjV þ j ; ð11Þ

and the number of target samples jV þ j is given by the context. Bysimple algebraic substitution it is easy to see that both arerepresentations of the same quantity. Hence, the HDx and HDyquantification methods provide an estimate of posterior probabil-ity PðþÞ, and equivalently, an estimate of the class imbalance.

Table 2Average performance measures for the skew-sensitive ensemble with a pool ofclassifiers with 7 imbalances, problem with 20% total probability of error. A sub-pool for each of the imbalances was growth from one to three classifiers, resultingin pools of 7, 14 and 21 classifiers.

Performance SS ensemble (7�1) SS ensemble (7�2) SS ensemble (7�3)

fprð↓Þ 1.12% 0.97% 0.96%(0.10) (0.04) (0.04)

tprð↑Þ 14.20% 17.20% 17.80%(2.13) (1.16) (1.01)

precð↑Þ 1.32% 1.75% 1.83%(0.22) (0.11) (0.09)

F1ð↑Þ 0.0240 0.0317 0.0331(0.0038) (0.0020) (0.0017)

F1 measure

SSBC (7x1) SSBC (7x2) SSBC (7x3)

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04

0.045

Fig. 8. Box plots for the F1 measure for the skew-sensitive ensemble with a pool ofclassifiers with 7 imbalances, problem with 20% total probability of error. A sub-pool for each of the imbalances was growth from one to three classifiers, resultingin pools of 7, 14 and 21 classifiers.


The estimation of imbalance based on representations at feature(HDx) and score (HDy) spaces are characterized employing theGaussian 2-class problem with different probabilities of error (seeFig. 2b). The underlying probability densities employed to generatesamples for the target ðPðx; þÞÞ and non-target ðPðx; �ÞÞ wereprovided with prior probabilities PðþÞ ¼ 0:4 and Pð�Þ ¼ 0:6 respec-tively. Binned distributions (histograms) for the test data wereestimated after generating 1000 samples for the joint distribution(400 target samples and 600 for non-target samples). Following thisprocedure, the original synthetic experiment with the shift datasetwas replicated [12], with the variant of a customizable overlapbetween distributions. The target prior probability PðþÞ of thevalidation set kept a fixed amount of 100 target samples, whereasnon-target samples were added one at the time to cover the differentpossible class prior probabilities. The probabilistic classifier employedto estimate the Hellinger distance at score level is the PFAM trainedwith balanced data.

The resulting Hellinger distance in feature and score spaces,corresponding to the low and high overlaps with single and multiplefeatures, is shown in Fig. 9. In general, the HDy provides a softercurve for easy problems (e.g., small overlap between classes and fewfeatures). But as the standard deviation is increased and the overlapbetween probability densities grows, both curves present irregula-rities. Irregularities in the HDx curve are more evident with lessfeatures and a higher overlap between classes, but still using thedistance based Hellinger distance provides a good estimation of theprior probability. Irregularities in the HDy curve increase with boththe number of features and the overlap between classes, but still iscapable of a good estimation of the prior probability. These irregula-rities in HDy are highly dependent on the complexity of the problem,and at the same time the accuracy of the classifiers employed togenerate the estimated posterior probabilities (scores). Furthermore,the methods have been compared for a small fixed imbalance (1:2.5),but in video surveillance applications the imbalance is generallyhigher and changes over time.

The accuracy of the quantification methods was evaluated usingdata sets with 15 levels of imbalance, including 7 levels distinct tothose used for training and validation. The samples were drawn fromthe overlapping Gaussian distributions described in this section. Testimbalances that appear in Λ are {1:5, 1:7, 1:10, 1:15, 1:22, 1:32, 1:46,1:68, 1:100, 1:147, 1:215, 1:316, 1:464, 1:681, 1:1000}. Equivalently,the target prior probabilities of these datasets can be computed as

15þ1¼ 0:1667; 1

7þ1¼ 0:1250;…n o

. A single validation set with themaximum level of imbalance was used with the quantificationmethods, avoiding the requirement of using several validation setswith different levels of imbalance. The size of the “small steps” inAlgorithms 1 and 2 is set in accordance to the minimum possibleprobability, or equivalently, the maximum expected imbalanceλmax ¼ 1 : 1000. The STEPSIZE employed in experiments was definedusing the validation set V, and is given by

STEPSIZE¼ PminðþÞ ¼ V þ

V þ þV � ð12Þ

The computational complexity of Algorithm 1 is bounded by thecomplexity of the algorithm employed to estimate the prior prob-ability. In the proposed scheme the HDx quantification method isemployed (Algorithm 1), and the other operations in Algorithm 3 canbe considered as elementary.

The amount of operations in Algorithm 1 is related to theexecution of two pairs of nested for cycles. The complexity of thefirst double for cycle depends on three quantities: the dimensionof the feature space (nf), the number of bins (b) employed in thehistogram representations of the probability distributions, and thesize of the biggest set Sþ , S� and the set of unlabeled operationalsamples U. The complexity of the second pair of nested for cyclesdepends on these three quantities, plus the additional quantity

defined by the term “small steps” that can be computed as theinverse of STEPSIZE defined in Eq. (12). Formally, the (worst case)computational complexity of Algorithm 3 can be expressed as

O nf bmaxðjSþ j ; jS� j ; jU j Þþ 1STEPSIZE

nf b maxðjSþ j ; jS� j ; jU j Þ� �

ð13ÞSince the inverse of STEPSIZE is typically greater than 1, the

expression can be simplified by means of the maximum rule, andrewritten as

Onf b maxðjSþ j ; jS� j ; jU j Þ

STEPSIZE

!ð14Þ

Regarding the accuracy of the quantification methods, the averagemean squared error between true prior probabilities and theestimations obtained with the HDx and HDy methods are shownin Fig. 10. Comparing Fig. 10(a)–(c), it can be seen that the HDyquantification outperforms HDx when the total probability of error issmall, and HDx outperforms HDy as the classifiers are less accurate.This is consistent with the affirmation that HDy is more reliablewhen classifiers are more accurate, as stated in [12]. We canreformulate and affirm that according to the results shown inFig. 10, HDy is more reliable when the target and non-target samplesare easily separable, but HDx is preferable for problems with highertotal probability of error (e.g. overlap between class distributions).

According to the observations in this section, the estimation of theclass imbalance should be guided by the characteristics of the dataemployed in the application and the particular algorithm used forclassification. In this paper, an experiment was conduced to selectthe proper method, and the results are analyzed in Section 5.4.

4.3. Discussion

In conclusion, the following affirmations should be considered inthe choice of parameters for systems that will operate in environ-ments with changing class imbalance. First, the design of classifiersconsidering the imbalance expected in test allows the classifiers tooutperform those classifiers that are trained with balanced data.Second, according to the simulations, a generation strategy thatconsiders 7 imbalances to train base classifiers is a good choice,specially for problems that present high probability of error betweenclasses. In general, the approaches that present a higher diversitytend to produce a lower performance, showing that there is a limit inthe useful diversity, and beyond that limit it damages the ensembleaccuracy. Third, from the combination methods analyzed in thissection, skew-sensitive ensembles provide the highest level ofperformance in terms of F1 measure in environments with differentlevels of imbalance. Fourth, the use of several classifiers per imbal-ance is an option to increase the performance of the ensemble andreduce the standard error, and the advantage has to be contrastedwith the significant increase of the pool size at deployment time.Finally, quantification methods may be used within skew-sensitiveapproaches to obtain a more precise estimation of the operationalimbalance, and HDx quantification is a good candidate speciallywhen the total probability of error is high.

5. Experiments on video data


This section presents the methodology used in simulations,following a video surveillance scenario using real data to demon-strate the effectiveness of the proposed imbalance-based generationmethod, and the characterization of this method when combinedwith SSBC. The video-based FR system that was used as a model in


the experiments is depicted in Fig. 11. A single IP camera continu-ously captures the scene and feeds the segmentation module thatisolates the facial regions of interest (ROIs) in each consecutive frame.After a first ROI is captured from an individual in scene, the trackingand classification modules are triggered in parallel. The trackingmodule starts following the individual's face and regrouping ROIsfrom a same individual in trajectories, whereas the classificationmodule produces consecutive identity predictions for each ROI.Finally, the spatio-temporal decision fusion module allows one to

accumulate target predictions, and applies individual-specific thresh-olds for enhanced spatio-temporal FR, as described in [2].

In this particular implementation, the popular Viola–Jones facedetector was used to extract grayscale ROIs [39]. Pixel intensities areconcatenated with multi-block local binary patterns (MBLBP) features,and the 32 principal components are selected after application of PCA.Training feature vectors a are used to design the biometric database,and the pixels of never seen ROIs are projected to the 32 dimensionalfeature space employed for face matching. Face tracking was

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Target class prior probability

Hel

linge

r Dis

tanc

e

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7


Hel

linge

r Dis

tanc

e

FeaturesScores

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7


Hel

linge

r Dis

tanc

e

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7


Hel

linge

r Dis

tanc

e

Fig. 9. HDx and HDy quantification examples related to the comparison between target and non-target distributions for the different cases: (a) 1 feature, 1% prob. error; (b) 1feature, 20% prob. error; (c) 16 features, 1% prob. error; and (d) 16 features, 20% prob. error.

10−3 10−2 10−10

0.020.040.060.080.1

0.120.140.160.180.2

Prior Probability in Test

MS

E

HDx QuantificationHDy Quantification

10−3 10−2 10−10

0.020.040.060.080.1

0.120.140.160.180.2


MS

E

10−3 10−2 10−10

0.020.040.060.080.1

0.120.140.160.180.2


MS

E

Fig. 10. Average mean squared error (MSE) between the true prior probability in test and the estimation produced using the quantification methods HDx and HDy: (a) 1%probability of error; (b) 5% probability of error; and (c) 20% probability of error.


implemented using the incremental visual tracking (IVT) algorithm,which incrementally learns the low-dimensional subspace represen-tation (Eigen basis) by efficiently adapting online to changes in theappearance of the face model [40].

The classification architecture used for matching is composedof an ensemble of 2-class ARTMAP classifiers for each individual.This architecture has been widely used for face matching in theliterature, which models the general recognition problem in termsof individual-specific detection problems [41,34,4]. In the refer-ence system used in comparison, the individual-specific EoDs areco-jointly trained using a DPSO learning strategy, which allows forthe generation of a diversified pool of Probabilistic Fuzzy ARTMAPclassifiers. The proposed approach preserves the same architec-ture, but the base classifiers are trained independently on differentimbalances, using DPSO to optimize the hyperparameters and theglobal best is added to the pool.

5.2. Video surveillance data

Videos from the Carnegie Mellon University – Face in Action (FIA)database are used in experiments [42]. These videos correspond to20 s sequences for 244 individuals that act simulating a passportchecking scenario. Six cameras capture the scene at a resolution of640�480 pixels, at a frame rate of 30 frames per second. Data wascaptured over three different capture sessions separated by a threemonths interval. The six cameras were distributed in three pairs withfocal lengths of 2.8 mm (unzoomed) and 4.8 mm (zoomed), andpositioned in horizontal positions with frontal, left and right orienta-tions corresponding to 01, and 7721. In the experiment, a videostream from a single IP camera is formed using the frontal zoomedand unzoomed cameras along the three capture sessions.

Ten individuals of interest were selected from the FIA databasefor enrollment (FIA IDs 2, 58, 72, 92, 147, 151, 176, 188, 190 and209), and the rest was divided into two independent subsets ofnon-target classes appearing in training and test. For each indivi-dual of interest, 100 non-target individuals are selected for train-ing (UM and CM), and 100 different individuals are selected fortest, providing a maximum class imbalance of λmax ¼ 1 : 100. Thecohort and universal models (CM and UM) allow one to train 2-class ensembles with improved discrimination between target andnon-target classes, training one EoD for each individual of interestas the target class, as described in [34].


For enrollment, an adaptive skew-sensitive ensemble of classifierswas trained for each one of the selected individuals of interest. In theinitial step, a pool of PFAM classifiers [43,44] was generated usingseven different imbalances for training. The DPSO learning strategywas used to co-jointly optimize the hyperparameters of a PFAM

neural network for each imbalance in ΛGEN, using training andvalidation data that follows the corresponding imbalances in ΛGEN.The DPSO algorithm was initialized with a population size of 20particles, a maximum of 6 subswarms of 5 particles maximum, and amaximum of 10 iterations [35]. At the end of the DPSO learningprocess, the global best classifier was selected as the classifier thelevel of imbalance that corresponds to each the training levels in ΛGEN.

Let λ1 and λmax be the minimum and maximum possible imbal-ances in the classification environment, respectively. λmax can bemanually set according to the amount of detectable faces that can fitin a frame captured by the camera. The range of possible imbalanceshas to be sampled in as many imbalances as classifiers are required inthe pool. Having established a maximum imbalance of λmax ¼ 1 : 100,five subdivisions were established in a logarithmic scale in orderto obtain seven different imbalances between λ1 ¼ 1 : 1 andλmax ¼ 1 : 100. The resulting imbalances used are Λ¼ f1 : 1;1 : 101=3;1 : 102=3;1 : 10;1 : 104=3;1 : 105=3;1 : 100g.

Learning is performed following a 4�6-fold cross-validationprocess for 24 independent trials. Positive samples from theincoming sequence are randomly split according to a uniformdistribution, in 6 folds of the same size. The first two foldscombined in a training set (Dt), and the rest of the folds aredistributed in validation sets used to stop training epochs (De),fitness evaluation (Df), estimation of combination points in theROC space (Dc) and selection of the operational point (Ds). Fourimbalance levels are produced for each training and validation set,picking different number of negative samples from the CM andUM. The levels of class imbalance used for the different test blocksare 1/20, 1/35, 1/80, 1/55, 1/100, 1/70, 1/50 and 1/15, for t¼1, …,8 respectively. The changes in the class imbalance of the test setsare obtained by randomly removing individuals from each block of30-min.

The proposed system is evaluated at transaction-level using theROC and PROC spaces after selecting the operations point for a fixed ”

fpr¼1%. The operational measures used in the characterization arethe fpr, tpr (or recall), precision and the F1 measure. The ambiguity isused as a measure of the diversity of opinions generated by the baseclassifiers trained on different imbalances [15].

Individual specific analysis is employed following the Dodding-ton's Zoo taxonomy [45,46], with the thresholds shown in Table 3.Finally, time-based analysis is employed to see the adaptation ofthe system to the operational class imbalance over time.

5.4. Results

This section presents the results obtained after computer simula-tions, divided into four different levels of analysis. The first levelpresents transaction-based analysis, which corresponds to the eva-luation of the classification system after the presentation of eachsingle facial region, and its evolution as the system adapts to the

Fig. 11. Generic video-based FR system used in video surveillance applications.


imbalance in the environment. It is known that biometric systemshave different performance depending on each specific case, and thesecond level of analysis presents the individual-specific characteriza-tion of the system. The third level of analysis is related to thefunctionality of the system to perform operational imbalance estima-tion. Finally, as a video-based FR system, the trajectory level analysispresents the overall evaluation of the system for trajectories from thedifferent individuals of interest.

5.4.1. Transaction-based analysisTable 4 shows the average performance of the system for the

different approaches after selecting the operations point for adesired fpr¼1%. The first two approaches are the referencesystems that use the baseline balanced DPSO generation method,either using BC or the proposed approach. These two approachespresent the same initial performance, the F1 score presented bythe proposed approach after adaptation to each block of test datais higher than DPSOþBC. This superiority is product of the betterestimation of the operations point when the fusion functionconsiders the class imbalance in the environment, which resultsin a more accurate combination than employing balanced trainingwithout imbalance estimation. It is remarkable to observe that theproposed approach preserves a fpr closer to the desired fpr¼1%,which is an evidence of the correct exploitation of the imbalanceto select a more accurate operations point.

The last two approaches presented in Table 4 correspond to thesame approaches as the first two, but replacing the balancedgeneration by the proposed imbalanced generation scheme. Asimilar trend can be observed when the combination methods arecompared. The proposed approach overcomes the performance ofimbalanced trainingþBC in terms of F1 score, and the rejection offalse positives (fpr) obtained by the proposed approach is moreaccurate than using imbalanced trainingþBC. This trend confirmsthat the adaptive capacity of the proposed approach provides apowerful tool for combination in environments with changingimbalance, regardless of the generation method.

In conclusion, skew-sensitive ensembles are benefited by con-sidering different levels of imbalance and complexities for trainingthe pool of base classifiers. And adapting the fusion function to themost recent operational imbalance employing the proposedscheme allows to provide a higher level of performance, mainlyin the capacity of the system to preserve a low fpr.

5.4.2. Individual-specific analysisFollowing an individual-specific analysis, Table 5 shows the

average fpr, tpr, precision and F1 performance measures for two ofindividuals enrolled to the system. The performance of the eighttest blocks with different imbalance levels are included, followingthe same structure as Table 4. The levels of imbalance for eachblock are shown in the first row of Table 7.

According to the initial performance presented by the systemfor individual 58, it can be categorized as a goat-like individual(see Table 3). For this individual, the tpr is initially low ðtpro55%Þand maintained at that level for all test blocks except for t¼5. Andthe initially low fpr level that is very close to the desired 1%, is also

maintained low through the operation of the system. This evi-dences that the performance for this goat-like individual isbenefited by the adaptation to the operational imbalance, but itremains in the same Doddington category with a low tpr, but alsoa low fpr regardless of the adaptation. It can also be seen thatadapting the fusion function to the operational imbalance canpotentially increase the tpr of the system at certain imbalances, asit happened for t¼3 and t¼5.

Similarly, according to the initial performance shown forindividual 209, it can be categorized as a lamb-like individual.Although a high level of positive detections is presented by thisindividual-specific ensemble, a high level of negative acceptancesis also shown in Table 5. This high fpr is significantly reduced afterthe system is adapted to the operational imbalance, becomingmore accurate to discard the non-target samples. On the otherhand, the tpr for this module is initially high tpr480%, and ismaintained or increased when the system is adapted to theoperational imbalance. This shows the effectiveness of the systemto maintain or even increase the amount of correct positivedetections when the operational imbalance is taken into account.It also confirms the difficulty faced by the BC algorithm in theestimation of detection thresholds with balanced validation data.

5.4.3. Approximation of operational imbalanceAs the operational imbalance changes over time, the system

produces an estimate of such imbalance, and the accuracy of thisestimation directly depends on the levels of imbalance consideredin the initial set of imbalances Λ. A sensitivity analysis wasperformed by variating the amount of imbalance levels in Λ, fromfive imbalances besides the balanced set, to 100 evenly sampledadding five imbalances at the time. In that manner, the first set iscomposed of the balanced set plus 5 different imbalances ðΛ1 ¼f1 : 1;1 : 20;1 : 40;1 : 60;1 : 80;1 : 100gÞ, the second is composedof balanced and 10 imbalances ðΛ2 ¼ f1 : 1;1 : 10;1 : 20;…;

1 : 100gÞ, the third is composed of balanced and 20 imbalancesðΛ20 ¼ f1 : 1;1 : 2;1 : 3;…;1 : 20gÞ, and the last set contains 50imbalances ðΛ50 ¼ f1 : 1;1 : 2;1 : 3;…;1 : 50gÞ.

Table 6 presents the performance evaluated for the whole systemusing different resolutions of Λ, using a single test set with themaximum level of imbalance used in the experiment, λmax ¼ 1 : 100.Zooming to a more general scope, we can conclude that it may not benecessary to use the highest available resolution in terms of knownlevels of class imbalance (Λ), in order to obtain a good estimation ofthe operational imbalance.

The accuracy of the method based on the Hellinger distancewith respect to a set Λ of validation sets with different levels ofimbalance (employed by SSBC) was compared with the HDx andHDy quantification in the real scenario. Table 7 presents theaverage imbalance estimated with different sizes of Λ for theestimation based on different validation sets (Λ) and the proposedHDx method, for the test blocks t¼2, 5, 7 and 8 (from Table 4).Blocks for t¼2, 8 were selected for its relatively small imbalance(1:35 or less), block for t¼7 presents medium imbalance (1:50),and the block for t¼5 presents the maximum imbalance (1:100).

Results in Table 7 show that an increase in the number ofimbalance levels employed in Λ is related to a more accurateestimation of the operational imbalance. However there is a limitimposed by the difference in the data distributions of the samplesused for validation and those captured in operations. This limitbecomes evident by observing the great difference between the trueimbalance (first row) and the estimated imbalance with the differentmethods. Comparing the approximation based on the Λ validationsets with the HDx quantification, the first shows a higher accuracy forimbalances close to 1:50, although fails for other cases. In the samesense, the HDx quantification is better for small imbalances (close to

Table 3Doddington's zoo taxonomy for binary decisions. False negative rate (fnr) and falsepositive rate (fpr) thresholds are applied to each individual-specific ensemble.

Category Target class Non-target class

Sheep tprZ55% and not a lamb fprr1%Lamb At least 5% of non-target individuals are wolves –

Goat tpro55% and not a lamb –

Wolf – fpr41%


1:15), although it fails to estimate greater operational imbalances.However, a more detailed analysis and comparison is required inorder to evidence which of the methods provide a better estimation,as follows below.

A more deep characterization of the Hellinger distance with realdata is presented in Fig. 12. The Hellinger distance was obtained bycomparing a test set with different (but fixed) imbalances againstrandomly selected validation samples fitting all the possible imbal-ances (prior probabilities), either in the feature or score spaces. Thecurves shown in Fig. 12 evidence the difficulty faced by thequantification methods that employ the Hellinger distance to esti-mate the imbalance of a set of test samples. Fig. 12a–c shows that theimbalance for the goat-like individual 58 is more easily estimatedemploying the Hellinger distance in the feature space, and the scorespace produces less accurate estimations. However, both methods areaccurate when the imbalance is high (target prior close to 0.01). Thiseffect is related to the difficulty of the classification problem, as it wasseen in Section 4. Fig. 12d–f show that the imbalance estimation forthe lamb-like individual 209 is also challenging for both methods,that in all cases fail in finding the true target prior probability.However, the Hellinger distance estimated in the feature space seemsto provide a better estimation of the target prior probability.

Fig. 13 shows the real and estimated imbalances for the sametrajectory, with randomized ROIs for generalization purposes. The Λsets employed in the simulation were Λ1, Λ4, Λ10, with 5, 20 and 50levels of imbalance respectively, and the HDx quantification method.The operational imbalance was estimated every 3 minwith a windowthat considers operational data for the last 15 min of captures andcorresponds to the black dashed line. The true imbalance estimatedover time corresponds to the red solid line. It can be seen that the

estimation of class imbalance for the first minutes falls to zero in thefour cases, which is related to the initial state of the system with anempty buffer of operational samples. The highest peak in the curvefor true class imbalance was chosen for a visual comparison, whichappears close to 140 min. The blue ellipses in the four graphs showthe estimated imbalance levels, showing that the best fit between realand estimated imbalances is given by the HDx quantification, with anarrower peak closer to the solid red graph. However, this tendency isnot always true, as can be seen looking at the peak of the blackdashed line that appears between 90 and 100 min in the four cases,indicating that the estimated imbalance was better with any of the Λsets. This shows that even though the HDx quantification performsbetter than the raw comparison of Hellinger distance betweenoperational and validation histograms, there is a limit in the estima-tion related to the data used in validation. In any case, the superiorityof the HDx quantification is evidenced by the more objectivecomparison shown below.

A numeric estimation of the difference between the real andestimated imbalance curves is the mean squared error (MSE), whichis widely employed in statistics to measure the average of the squaresof the differences between the estimation and the quantity that isestimated. Fig. 14 presents the average of the mean squared errorbetween the true and estimated imbalances for all the 20 differentresolutions used in the experiment using the method that employsdifferent validation sets [4], and the HDy and HDx quantificationmethods [12]. The results involve the 24 replications of the experi-ment and the 10 individuals of interest. In Fig. 14, the mean of theMSE observed in the first bar that corresponds to 5 levels ofimbalance in Λ is close to 0.162, and drops to 0.137 when 20 levelsof imbalance are employed. After using 20 levels of imbalance, the

Table 4Average performance for different approaches for a target 1% fpr on test blocks at different t times, including the different individuals enrolled to the system. The standarderror is detailed between parenthesis.

Approach Measure t¼1 t¼2 t¼3 t¼4 t¼5 t¼6 t¼7 t¼8

Balanced trainingþBC fpr 4.80% 4.14% 5.93% 5.57% 4.35% 4.11% 3.00% 3.19%(0.032) (0.023) (0.030) (0.023) (0.024) (0.022) (0.014) (0.021)

tpr 57.02% 57.63% 58.39% 59.49% 58.09% 56.29% 55.61% 54.70%recall (0.317) (0.327) (0.213) (0.230) (0.223) (0.262) (0.349) (0.342)precision 43.28% 36.82% 20.09% 26.04% 24.11% 27.47% 36.99% 54.96%

(0.190) (0.191) (0.086) (0.082) (0.132) (0.143) (0.194) (0.249)F1 0.436 0.400 0.267 0.328 0.302 0.326 0.394 0.479

(0.225) (0.226) (0.110) (0.117) (0.155) (0.172) (0.248) (0.284)

Balanced trainingþSSBC fpr 4.80% 1.17% 1.61% 1.69% 1.17% 1.08% 0.55% 0.62%(0.032) (0.011) (0.007) (0.009) (0.007) (0.009) (0.005) (0.006)


(0.190) (0.300) (0.163) (0.144) (0.184) (0.208) (0.303) (0.313)F1 0.436 0.428 0.339 0.372 0.339 0.338 0.441 0.453

(0.225) (0.272) (0.154) (0.179) (0.209) (0.233) (0.311) (0.328)

Imbalanced trainingþBC fpr 4.96% 4.25% 5.18% 5.06% 4.30% 4.15% 3.33% 4.03%(0.037) (0.025) (0.025) (0.021) (0.025) (0.022) (0.015) (0.028)


(0.196) (0.180) (0.079) (0.086) (0.089) (0.100) (0.183) (0.239)F1 0.456 0.420 0.302 0.351 0.297 0.320 0.408 0.502

(0.220) (0.209) (0.094) (0.115) (0.111) (0.131) (0.220) (0.261)

Proposed approach fpr 4.96% 1.78% 1.69% 1.92% 1.52% 1.49% 1.06% 1.60%(0.037) (0.018) (0.009) (0.013) (0.008) (0.010) (0.006) (0.013)


(0.196) (0.302) (0.155) (0.182) (0.136) (0.166) (0.263) (0.315)F1 0.456 0.491 0.445 0.467 0.428 0.427 0.510 0.541

(0.220) (0.262) (0.112) (0.180) (0.170) (0.228) (0.300) (0.328)


reduction in the MSE for more levels of imbalance in Λ is notsignificant but consistent, as evidenced by a median of 0.128 and0.125 for 50 and 100 imbalances respectively. Finally, the HDy andHDx quantification methods present a significantly lower averageMSE of 0.117 and 0.101 respectively.

A Kruskal–Wallis analysis on the complete set of results using theoriginal approximation method (first 20 boxplots) throws a p-valueof 6:82� 10�29r0:05, which confirms that the differences in MSEbetween the estimated and real imbalances are significant with a95% confidence interval. The same analysis on the last 17 test casesusing the original method (removing Λ1 with 5, 10 and 15 levels ofimbalance) throws a higher p-value of 0:119340:05, which meansthat there is no significant difference between all the last 17 caseswith a confidence interval of 95%. However, pairwise Kruskal–Wallisanalysis for ðΛ4;Λ20Þ and ðΛ5;Λ20Þ produce p-values of 0:0021o0:05

and 0:0436≮0:05, respectively, confirming a significant difference.Thus, according to these results using more levels of imbalance in Λprovides significantly higher resolution for imbalance estimation.Finally, the Kruskal–Wallis test between the original imbalanceestimation method with Λ20 and the HDx quantification throws ap-value of 3:94� 10�1750:05, showing a significant superiority ofthe HDx quantification method when comparing the method basedon several validation sets.

5.5. Trajectory-level analysis

In this scenario videos were concatenated one after the other,emulating a passport checking scenario where individuals approx-imate to the camera one after the other from the waiting line. Fourblocks of 30 min were obtained (D1, D2, D3 and D4), showing differentimbalances in a realistic scenario. The first two blocks are composedof trajectories from capture session 2, and the last two blocks arecomposed of trajectories from capture session 3. Trajectories fromblocks D1 and D3 were captured with an unzoomed camera, andtrajectories from blocks D2 and D4 were captured with a zoomedcamera. The four blocks were presented to the system in order.

Table 8 shows the average performance of the system usingbalanced BC and SSBC for the passport checking scenario, afterselecting the operations point at fpr¼1%. It can be seen that theperformance of the proposed approach is significantly higher thanthe performance for the reference system. And comparing the fpr forboth systems, it can be seen that the performance superiority of theproposed approach is mainly due to its capacity to keep a lowamount of false alarms after the operations point is adapted to theoperational imbalance. This capacity proposed approach is related to

Table 5Average performance measures for different individuals enrolled to the system, setting a target 1% fpr on test blocks at different t times. The standard error is detailedbetween parenthesis.

Approach Measure t¼1 t¼2 t¼3 t¼4 t¼5 t¼6 t¼7 t¼8

Module for individual 58Imbalanced trainingþBC fpr 1.71% 1.73% 3.18% 2.63% 3.96% 3.27% 2.98% 3.06%

(0.027) (0.023) (0.040) (0.031) (0.041) (0.035) (0.038) (0.032)tpr 32.24% 33.48% 46.23% 31.94% 56.97% 38.89% 36.14% 36.10%recall (0.320) (0.391) (0.360) (0.329) (0.333) (0.351) (0.358) (0.390)precision 56.55% 48.65% 33.62% 29.30% 27.14% 23.99% 24.46% 43.11%

(0.370) (0.284) (0.242) (0.222) (0.179) (0.161) (0.160) (0.263)F1 0.351 0.317 0.331 0.270 0.318 0.255 0.248 0.307

(0.291) (0.295) (0.225) (0.218) (0.163) (0.176) (0.206) (0.242)Proposed approach fpr 1.71% 0.16% 0.36% 0.39% 1.15% 0.74% 0.57% 1.05%


(0.370) (0.311) (0.278) (0.241) (0.178) (0.186) (0.290) (0.361)F1 0.351 0.348 0.530 0.329 0.556 0.362 0.370 0.365

(0.291) (0.366) (0.330) (0.224) (0.184) (0.211) (0.251) (0.343)

Module for individual 209Imbalanced trainingþBC fpr 9.97% 8.04% 5.79% 7.45% 3.64% 4.32% 4.66% 7.93%


(0.225) (0.209) (0.140) (0.136) (0.175) (0.183) (0.223) (0.254)F1 0.489 0.421 0.363 0.386 0.469 0.491 0.507 0.557

(0.233) (0.211) (0.171) (0.162) (0.195) (0.196) (0.230) (0.241)Proposed approach fpr 9.97% 4.19% 2.57% 4.00% 1.77% 2.15% 1.65% 3.12%


(0.225) (0.183) (0.143) (0.173) (0.143) (0.168) (0.157) (0.241)F1 0.489 0.536 0.534 0.529 0.585 0.628 0.694 0.725

(0.233) (0.201) (0.158) (0.209) (0.156) (0.148) (0.166) (0.208)

Table 6Average performance measures for different sizes of Λ, for a target 1% fpr on a testblock with the maximum imbalance λmax ¼ 1 : 100. The standard error is detailedbetween parenthesis.

Measure Λ1 Λ2 Λ20 Λ50

fpr 1.5895% 1.5229% 1.5314% 1.4932%(0.1760) (0.1552) (0.1478) (0.1514)

tpr 58.4093% 58.3262% 58.4746% 57.9274%(5.6170) (5.5146) (5.3191) (5.3279)

precision 37.4892% 38.4638% 38.6255% 38.7287%(2.8750) (2.7846) (2.8232) (2.9378)

F1 0.4223 0.4277 0.4310 0.4286(0.0356) (0.0346) (0.0338) (0.0347)


the employment of the widely available non-target samples toestablish the decision frontier at the combination function, enhan-cing the discrimination between target and non-target classes.

The face trajectories built using the IVT face tracker to regrouptarget facial regions were used for trajectory-based analysis of thesystem in this real passport-checking scenario. The first time a face isfound in the video sequence, the location of the facial region isemployed to initialize the tracker that follows it until the individualleaves the scene. Target predictions produced by the system wereaccumulated over time for full trajectories to provide overall decisions,and the detection threshold was applied to these accumulations.

Fig. 15 presents an example of the accumulation of detectionsproduced by the EoD trained on samples from individual 151, forthe sequence of individuals entered in the scene over time. Twozoomed regions that are representative of the system response arealso shown in the same figure. The accumulation of positivepredictions produced in response to the target trajectory aredrawn in a bold, solid blue line, and the accumulations for non-target trajectories are drawn in bold, dashed red line. The detec-tion threshold is drawn with a dashed black horizontal line. Targetand non-target trajectories produce accumulation levels that maysurpass the detection thresholds, producing true and false positive

Table 7Actual imbalance in test and the average number of ROIs for target individuals, as well as average imbalance estimated with the different lambda values and the HDxmethod.

t¼2 t¼5 t¼7 t¼8

Imbalance in test blocks1:35 1:100 1:50 1:15Average target ROIs per block for 10 individuals65.5000 116.4000 113.3000 95.0000(4.6170) (5.8731) (5.8425) (6.4395)Estimated non-target ROIs jLambdaj ¼ 5 (2 estimations per block – every 15 min)59.5833 69.6667 59.7500 79.5000 74.3333 61.0000 65.9167 60.7500(0.2464) (0.3079) (0.2456) (0.3073) (0.3692) (0.2877) (0.3238) (0.3102)Estimated non-target ROIs jLambdaj ¼ 20 (2 estimations per block – every 15 min)51.0000 61.7708 55.7917 72.8125 69.0625 53.0833 60.7708 54.0833(0.3077) (0.2740) (0.2681) (0.3064) (0.2917) (0.3264) (0.3721) (0.2513)Estimated non-target ROIs jLambdaj ¼ 50 (2 estimations per block – every 15 min)47.2667 57.7417 53.2417 68.8833 64.8917 51.0417 55.5083 51.8667(0.3239) (0.2979) (0.2752) (0.2573) (0.2943) (0.3308) (0.3146) (0.2544)Estimated non-target ROIs HDx (2 estimations per block – every 15 min)9.0667 9.7958 8.5417 10.4583 13.4375 11.7125 11.6875 11.2917(0.4864) (0.4671) (0.1920) (0.4212) (0.6890) (0.5364) (0.3914) (0.3473)

Individual 58

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7


Hel

linge

r Dis

tanc

e

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7


Hel

linge

r Dis

tanc

e

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7


Hel

linge

r Dis

tanc

e

Individual 209

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7


Hel

linge

r Dis

tanc

e

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7


Hel

linge

r Dis

tanc

e

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7


Hel

linge

r Dis

tanc

e

Fig. 12. Hellinger distance between validation and test data from target and non-target distributions across different prior probabilities. The small circles correspond to theglobal minimum of the estimations, and constitute the approximation to the target prior probability. The experiment was realized with data from target individuals 58 and209 and randomly selected non-target samples: Target prior (a, d) 0.4; (b, e) 0.1; and (c, f) 0.01


detections. In the left zoomed area in Fig. 15, the target trajectorywas correctly detected, whereas one of the non-target trajectorieswas incorrectly recognized as belonging to the target individual. Inthe right zoomed area in Fig. 15, the target trajectory was detectedwith a higher accumulation than the initial left zoomed area, andthe non-target trajectories were correctly rejected showing anincreased discrimination after adapting the system to the opera-tional imbalance.

As followed in the protocol, the first adaptation of the fusionfunction is performed after 30 min of operation, where the lastblock of operational samples are used for imbalance estimation.When the first capture session is presented, the discriminationbetween target and non-target trajectories is less clear, evidencedby some false positive detections (see the left zoomed area inFig. 15). When the operation point is adapted – after 30 min –, thesystem increases its capacity to discriminate between target and

0 30 60 90 120 150 180 210 2400

10

20

30

40

50

60

70

80

90

100

time (minutes)

clas

s im

bala

nce

(ske

w le

vel)

Class imbalanceEstimated class imbalance

0 30 60 90 120 150 180 210 2400

10

20

30

40

50

60

70

80

90

100

time (minutes)

clas

s im

bala

nce

(ske

w le

vel)

0 30 60 90 120 150 180 210 2400

10

20

30

40

50

60

70

80

90

100

time (minutes)

clas

s im

bala

nce

(ske

w le

vel)

0 30 60 90 120 150 180 210 2400

10

20

30

40

50

60

70

80

90

100

time (minutes)

clas

s im

bala

nce

(ske

w le

vel)

Fig. 13. Adaptation of the level of class imbalance over time, corresponding to individual 58 at the first experimental trial. Comparison of four different sizes of jΛjcorresponding to 5 (a), 20 (b) and 50 (c) levels of imbalance, for an evenly sampled space of imbalances between 1:1 and 1:100. (For interpretation of the references to colorin this figure caption, the reader is referred to the web version of this paper.)

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 HDy HDx0.09

0.1

0.11

0.12

0.13

0.14

0.15

0.16

0.17

Number of imbalances in Lambda

Mea

n S

quar

ed E

rror

Fig. 14. Average mean squared error between real and estimated operational imbalances for different number of imbalance levels in Λ for the method based on differentvalidation sets, compared to the HDx and HDy quantification (right extreme).


non-target trajectories, as shown in the right zoomed area inFig. 15. This is a clear evidence that selecting the operations pointbased on a validation set with the appropriate class imbalanceallows for a better discrimination between target and non-targetclasses, which is extended to the overall trajectory-based responseof the system.

Table 9 shows the average operational imbalance, as well as theaverage overall AUC for the ROC curves obtained over0r fprr0:05 (AUC-5%). The performance of the system for thefirst test block, when the operational imbalance is not considered,is significantly lower in terms of AUC-5%, compared to theperformance after adapting the fusion function. This is the sametendency as the observed in the transaction-based evaluation,

which confirms that the performance increase of the system usingproposed approach can be extended to the overall system perfor-mance in video-to-video FR.

6. Conclusion

In video surveillance, it is often assumed that the proportions offaces captured for target and non-target individuals are balanced,known a priori and do not change over time. Recently, sometechniques have been proposed to adapt the fusion function of anensemble according to class imbalance measured on operational data.However, skew sensitive ensembles commonly employ balancedtraining data to generate diverse pools of base classifiers, limitingthe potential diversity produced using the abundant non-target data,with multiple levels of imbalance and complexity.

In this paper, skew-sensitive adaptive classifier ensembles havebeen investigated and applied to video-to-video FR in video surveil-lance applications. The proposed scheme allows one to combineclassifiers trained by selecting data with varying levels of imbalanceand complexity, and leads to a significant improvement of system'saccuracy and robustness. In this way, the assumption of a balancedgeneration of classifiers is discarded. During enrollment, target facial

Table 8Average performance measures for different approaches for a target 1% fpr on testblocks at different t times. The standard error is detailed between parenthesis, andbold numbers symbolize significant difference in terms of F1 measure with respectto the other approach.

Approach Measure t¼1 t¼2 t¼3 t¼4

Reference system fpr 5.15% 4.15% 4.71% 3.30%(0.025) (0.024) (0.023) (0.014)

tpr 61.54% 56.94% 59.74% 59.41%recall (0.171) (0.234) (0.283) (0.313)precision 23.19% 24.67% 30.61% 34.43%

(0.077) (0.099) (0.154) (0.171)F1 0.300 0.307 0.363 0.383

(0.094) (0.135) (0.183) (0.217)

Proposed approach fpr 5.15% 1.47% 1.61% 1.11%(0.025) (0.010) (0.013) (0.006)

tpr 61.54% 54.60% 49.79% 54.40%recall (0.171) (0.327) (0.341) (0.354)precision 23.19% 40.82% 48.82% 48.13%

(0.077) (0.158) (0.251) (0.247)F1 0.300 0.422 0.434 0.477

(0.094) (0.204) (0.238) (0.285)

Fig. 15. Examples of target detection accumulations for concatenated input trajectories corresponding to the module trained for individual 151. The left and right zoomedviews of the graph show the target individual entering in the scene, as well as two non-target individuals with ID 174 and 188. (For interpretation of the references to color inthis figure caption, the reader is referred to the web version of this paper.)

Table 9Average operational imbalance and overall AUC-5% for the reference system andthe proposed approach, considering the 10 individuals over 24 trials. The standarderror is shown in parenthesis.

Performance t¼1 t¼2 t¼3 t¼4

Average imbalance 1:15.73 1:16.02 1:10.14 1:15.16Average target ROIs 85.3 102.7 79.3 95.0

(7.07) (6.56) (5.35) (6.44)Reference system (AUC-5%) 67.87 67.67 71.41 73.36

(2.21) (2.40) (2.36) (2.28)Proposed approach (AUC-5%) 67.87 79.45 78.61 74.07

(2.21) (1.98) (2.14) (2.57)


captures from a reference trajectory are combined with selectedcaptures from non-target trajectories to generate a pool of 2-classclassifiers using data with various levels of imbalance and complex-ity. During operations, face captures of each person in the scene aretracked and regrouped into trajectories for video-to-video FR, produ-cing enhanced discrimination between target and non-target trajec-tories. The level of imbalance is periodically estimated from the inputdata stream using the HDx quantification, and pre-computed histo-gram representations of imbalanced data distributions. The HDxquantification method allows one to reduce the problem of discard-ing useful non-target samples to select the appropriate class imbal-ance for validation. Finally, pre-computed histograms and ensemblefusion functions are updated based on the imbalance and complexityof operational data.

Results on synthetic problems show that the combination of theclassifiers trained with different imbalance levels and complexitiesincreases ensemble diversity and robustness, leading to an increase inthe ROC and precision–recall performances. A comparison of imbal-ance quantification based on Hellinger distance in score and featurespaces shows that feature-based estimation is more accurate whenthe probability of error is high. Similarly, results on the CMU-FIAvideo data show that the proposed method can outperform othertechniques in imbalanced environments. In that sense, transaction-based analysis shows a significantly higher performance in terms of F1measure, that is consistently higher for different operational imbal-ances. Individual-specific analysis indicates that goat- and lamb-likeindividuals can benefit the most from adaptation to the operationalimbalance. Trajectory-based analysis shows that the improvementpresented at transaction level is propagated to the overall perfor-mance evaluated in a realistic video-to-video FR scenario.

The future work should consider exploiting the class imbalance atdecision fusion level, setting imbalance-specific thresholds for theestimated test skew. Although HDx quantification method providedthe highest accuracy with respect to the compared methods, there isstill room for further improvement. Further characterization of thesystem in different and more challenging scenarios would be inter-esting, including for instance crowded and outdoor places. Otherapplications like gait-based biometrics may also be benefited from thefindings of this research, since several individuals appear in videos.Finally, adaptation to permanent changes in the probability distribu-tion of data due to changes in facial appearance may be addressedemploying self-update techniques, leading to further improvement inthe performance of the system.

Conflict of interest

None declared.

Acknowledgements

This work was partially supported by the Natural Sciences andEngineering Research Council of Canada, and the Defence Researchand Development Canada's Centre for Security Science PublicSecurity Technical Program (project PSTP-03-401BIOM). This workwas also supported by the Program for the Improvement of theProfessoriate of the Secretariat of Public Education (folio UDG-612), Mexico, the Mexican National Council for Science andTechnology (folio 312337, reg. 42091).

References

[1] M. De-la Torre, E. Granger, P.V.W. Radtke, R. Sabourin, D.O. Gorodnichy,Incremental update of biometric models in face-based video surveillance, in:Proceedings of IJCNN, Brisbane, Australia, 2012, pp. 1–8.

[2] M. De-la Torre, E. Granger, P.V. Radtke, R. Sabourin, D.O. Gorodnichy, Partially-supervised learning from facial trajectories for face recognition in videosurveillance, Inf. Fusion 24 (2015) 31–53. http://dx.doi.org/10.1016/j.inffus.2014.05.006.

[3] P. Radtke, E. Granger, R. Sabourin, D. Gorodnichy, Adaptive ensemble selectionfor face re-identification under class imbalance, in: Z.-H. Zhou, F. Roli, J. Kittler(Eds.), Multiple Classifier Systems, Lecture Notes in Computer Science, vol.7872, Springer, Berlin, Heidelberg, 2013, pp. 95–108.

[4] P.V. Radtke, E. Granger, R. Sabourin, D.O. Gorodnichy, Skew-sensitive booleancombination for adaptive ensembles - an application to face recognition invideo surveillance, Inf. Fusion 20 (2013) 31–48. http://dx.doi.org/10.1016/j.inffus.2013.11.001.

[5] X. Guo, Y. Yin, C. Dong, G. Yang, G. Zhou, On the class imbalance problem, in:2008 Fourth International Conference on Natural Computation, vol. 4, Piscaat-way, NJ, USA, 2008, pp. 192–201.

[6] T.C.W. Landgrebe, et al., Precision–recall operating characteristic (p-roc) curves inimprecise environments, in: Proceedings of ICPR, 2006, pp. 123–127.

[7] G. Forman, Quantifying Trends Accurately Despite Classifier Error and ClassImbalance, vol. 2006, Philadelphia, PA, United states, 2006, pp. 157–166.

[8] V. Lopez, A. Fernandez, S. Garcia, V. Palade, F. Herrera, An insight intoclassification with imbalanced data: empirical results and current trends onusing data intrinsic characteristics, Inf. Sci. 250 (2013) 113–141. http://dx.doi.org/10.1016/j.ins.2013.07.007.

[9] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, F. Herrera, A review onensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches, IEEE Trans. Syst. Man Cybern. 42 (2011) 463–484.

[10] G. Ditzler, R. Polikar, Incremental learning of concept drift from streamingimbalanced data, , IEEE Trans. Knowl. Data Eng. 25 (10) (2013) 2283–2301.http://dx.doi.org/10.1109/TKDE.2012.136.

[11] S. Oh, M.S. Lee, B.-T. Zhang, Ensemble learning with active example selectionfor imbalanced biomedical data classification, IEEE/ACM Trans. Comput. Biol.Bioinform. 8 (2) (2011) 316–325 http://dx.doi.org/ieeecomputersociety.org/10.1109/TCBB.2010.96.

[12] V. Gonzalez-Castro, R. Alaiz-Rodriguez, E. Alegre, Class distribution estimationbased on the Hellinger distance, Inf. Sci. 218 (2013) 146–164.

[13] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd Edition, Wiley,Michigan, U.S., 2001.

[14] L. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, Wiley,New Jersey, 2004.

[15] G. Zenobi, P. Cunningham, Using diversity in preparing ensembles of classifiersbased on different feature subsets to minimize generalization error, in:L. Raedt, P. Flach (Eds.), Machine Learning: ECML 2001, Lecture Notes inComputer Science, vol. 2167, Springer, Berlin, Heidelberg, 2001, pp. 576–587.

[16] A.S. Britto, R. Sabourin, L.E. Oliveira, Dynamic selection of classifiers—acomprehensive review, Pattern Recognit. 47 (2014) 3665–3680. http://dx.doi.org/10.1016/j.patcog.2014.05.003.

[17] J. Kittler, Combining classifiers: a theoretical framework, Pattern Anal. Appl. 1(1998) 18–27.

[18] J.-F. Connolly, E. Granger, R. Sabourin, Evolution of heterogeneous ensemblesthrough dynamic particle swarm optimization for video-based face recogni-tion, Pattern Recognit. 45 (7) (2012) 2460–2477.

[19] Q. Tao, R. Veldhuis, Hybrid fusion for biometrics: combining score-level anddecision-level fusion, in: 2008 IEEE Computer Society Conference on Compu-ter Vision and Pattern Recognition Workshops (CVPR Workshops), Piscataway,NJ, USA, 2008, pp. 1–6.

[20] W. Fan, S.J. Stolfo, J. Zhang, P.K. Chan, Adacost: misclassification cost-sensitiveboosting, in: Proceedings of the Sixteenth International Conference onMachine Learning, ICML '99, Morgan Kaufmann Publishers Inc., San Francisco,CA, USA, 1999, pp. 97–105. ⟨http://dl.acm.org/citation.cfm?id=645528.657651⟩.

[21] F. Wu, Comparing boosting and cost-sensitive boosting with imbalanced data,J. Converg. Inf. Technol. 7 (21) (2012) 1–8. http://dx.doi.org/10.4156/jcit.vol7.issue21.1.

[22] G. Ditzler, R. Polikar, An ensemble based incremental learning framework forconcept drift and class imbalance, in: WCCI 2010 IEEE World Congress onComputational Intelligence, 2010.

[23] S. Wang, X. Yao, Diversity analysis on imbalanced data sets by using ensemblemodels, in: CIDM'09, 2009, pp. 324–331.

[24] R. Barandela, R. Valdovinos, J. Sánchez, New applications of ensembles ofclassifiers, Pattern Anal. Appl. 6 (3) (2003) 245–256. http://dx.doi.org/10.1007/s10044-003-0192-z.

[25] X.-Y. Liu, J. Wu, Z.-H. Zhou, Exploratory undersampling for class-imbalancelearning, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 39 (2) (2009) 539–550.http://dx.doi.org/10.1109/TSMCB.2008.2007853.

[26] S. Wang, L. Minku, D. Ghezzi, D. Caltabiano, P. Tino, X. Yao, Concept driftdetection for online class imbalance learning, in: The 2013 International JointConference on Neural Networks (IJCNN), 2013, pp. 1–10. http://dx.doi.org/10.1109/IJCNN.2013.6706768.

[27] M.C. du Plessis, M. Sugiyama, Semi-supervised learning of class balance underclass-prior change by distribution matching, CoRR abs/1206.4677, 2012, pp. 1–26. http://arxiv.org/abs/1206.4677.

[28] S. Wang, L. Minku, X. Yao, A learning framework for online class imbalancelearning, in: 2013 IEEE Symposium on Computational Intelligence andEnsemble Learning (CIEL), 2013, pp. 36–45. http://dx.doi.org/10.1109/CIEL.2013.6613138.


http://dx.doi.org/10.1016/j.inffus.2014.05.006




http://refhub.elsevier.com/S0031-3203(15)00187-9/sbref3








http://dx.doi.org/10.1016/j.ins.2013.07.007







http://dx.doi.org/10.1109/TKDE.2012.136



dx.doi.org/10.1109/TCBB.2010.96

dx.doi.org/10.1109/TCBB.2010.96




















http://dl.acm.org/citation.cfm?id=645528.657651

http://dl.acm.org/citation.cfm?id=645528.657651

http://dx.doi.org/10.4156/jcit.vol7.issue21.1




http://dx.doi.org/10.1007/s10044-003-0192-z

http://dx.doi.org/10.1007/s10044-003-0192-z

http://dx.doi.org/10.1007/s10044-003-0192-z

http://dx.doi.org/10.1007/s10044-003-0192-z

http://dx.doi.org/10.1109/TSMCB.2008.2007853



dx.doi.org/10.1109/IJCNN.2013.6706768

dx.doi.org/10.1109/IJCNN.2013.6706768

http://arxiv.org/abs/1206.4677

dx.doi.org/10.1109/CIEL.2013.6613138

dx.doi.org/10.1109/CIEL.2013.6613138

[29] A. Bella, C. Ferri, J. Hernandez-Orallo, M.J. Ramirez-Quintana, Quantificationvia Probability Estimators, Sydney, NSW, Australia, 2010, pp. 737–742.

[30] G. Forman, Quantifying counts and costs via classification, Data Min. Knowl.Discov. 17 (2) (2008) 164–206.

[31] Y.S. Chan, H.T. Ng, Estimating class priors in domain adaptation for word sensedisambiguation, in: Proceedings of the 21st International Conference onComputational Linguistics and the 44th Annual Meeting of the Associationfor Computational Linguistics, Stroudsburg, PA, USA, 2006, pp. 89–96.

[32] V. González-Castro, R. Alaiz-Rodríguez, L. Fernández-Robles, R. Guzmán-Martínez, E. Alegre, Estimating class proportions in boar semen analysis usingthe Hellinger distance, in: Proceedings of the 23rd International Conferenceon Industrial Engineering and Other Applications of Applied IntelligentSystems, Part I, Springer-Verlag, Berlin, Heidelberg, 2010, pp. 284–293.

[33] M. De-la Torre, E. Granger, R. Sabourin, D.O. Gorodnichy, An individual-specificstrategy for management of reference data in adaptive ensembles for face re-identification, in: IET (Ed.), 5th International Conference on Imaging for CrimeDetection and Prevention (ICDP), London, U.K., 2013, pp. 1–7.

[34] C. Pagano, E. Granger, R. Sabourin, D.O. Gorodnichy, Detector ensembles forface recognition in video surveillance, in: IJCNN, Brisbane, Australia, 2012,pp. 1–8.

[35] E. Granger, P. Henniges, R. Sabourin, L.S. Oliveira, Supervised learning of fuzzyart map neural networks through particle swarm optimization, J. PatternRecognit. Res. 2 (2007) 27–60.

[36] C. Drummond, R.C. Holte, Cost curves: An improved method for visualizingclassifier performance, Mach. Learn. 65 (1) (2006) 95–130.

[37] T. Fawcett, An introduction to roc analysis, Pattern Recognit. Lett. 27 (8) (2006)861–874.

[38] W. Khreich, E. Granger, A. Miri, R. Sabourin, Adaptive roc-based ensemble ofhmms applied to anomaly detection, Pattern Recognit. 45 (2012) 208–230.

[39] P. Viola, M. Jones, Robust real-time face detection, Int. J. Comput. Vis. 2 (57)(2004) 137–154.

[40] D.A. Ross, J. Lim, R.-S. Lin, M.-H. Yang, Incremental learning for robust visualtracking, Int. J. Comput. Vis. 77 (2008) 125–141 (Special issue: Learning forVision).

[41] M. De-la Torre, P.V.W. Radtke, E. Granger, R. Sabourin, D.O. Gorodnichy, Acomparison of adaptive matchers for screening of faces in video surveillance,in: Symposium on Computational Intelligence for Security and DefenceApplications, Ottawa, Canada, 2012, pp. 1–8.

[42] R. Goh, L. Liu, X. Liu, T. Chen, The CMU face in action database, in: Analysis andModelling of Faces and Gestures, Carnegie Mellon University, 2005, pp. 255–263.

[43] C.P. Lim, R.F. Harrison, Probabilistic fuzzy artmap: an autonomous neuralnetwork architecture for Bayesian probability estimation, in: Fourth Interna-tional Conference on Artificial Neural Networks, 1995, 1995, pp. 148–153.

[44] C.P. Lim, R.F. Harrison, An incremental adaptive network for on-line super-vised learning and probability estimation, Neural Netw. 10 (5) (1997)925–939.

[45] G. Doddington, W. Liggett, A. Martin, M. Przybocki, D. Reynolds, Sheep, goats,lambs and wolves: a statistical analysis of speaker performance, in: Interna-tional Conference on Spoken Language Processing, 1998, pp. 1351–1354.

[46] A. Rattani, G. Marcialis, F. Roli, An experimental analysis of the relationshipbetween biometric template update and the Doddington's zoo: a case study inface verification, in: Proceedings of the 15th International Conference onImage Analysis and Processing, Berlin, Germany, 2009, pp. 434–42.

Miguel De-la-Torre obtained his bachelor's degree in Computer Engineer University of Guadalajara, and received his M.Sc. degree in computer sciences at the Center ofResearch and Advanced Studies (CINVESTAV), Mexico, in 2005. Since 2006, he is a full time teacher at the Universitary Center of Los Valles, University of Guadalajara, and hewas elected the president of the Academy of Engineering in 2007 and 2008. He earned his Ph.D. at the École de technologie supérieure in 2015, under the supervision ofProfessor Eric Granger, with a project on adaptive multiclassification systems for face based video surveillance. His research interests include adaptive biometric systems,adaptive video-to-video face recognition and adaptive ensembles of classifiers for class imbalance.

Eric Granger obtained a Ph.D. in Electrical Engineering from the École Polytechnique de Montréal in 2001, and from 1999 to 2001, he was a Defence Scientist at Defence R&DCanada in Ottawa. Until then, his work was focused primarily on neural networks for fast classification of radar signals in Electronic Surveillance (ES) systems. From 2001 to2003, he worked in R&D with Mitel Networks Inc. on algorithms and electronic circuits to implement cryptographic functions in Internet Protocol (IP) based communicationplatforms. In 2004, Dr. Eric Granger joined the ETS, Université du Québec, where he has developed applied research activities in the areas of patterns recognition, computervision and microelectronics. He presently holds the rank of Full Professor in System Engineering. Since joining ÉTS, he has been a member of the Laboratoire d'imagerie, devision et d'intelligence artificielle (LIVIA), and his main research interests are adaptive classification systems, incremental learning, change detection, and multiclassifiersystems, with applications in biometrics, video surveillance, and intrusion detection in computer and network security.

Robert Sabourin joined the physics department of the Montreal University in 1977 where he was responsible for the design, experimentation and development of scientificinstrumentation for the Mont Mégantic Astronomical Observatory. His main contribution was the design and the implementation of a micro-processor based fine trackingsystem combined with a low light level CCD detector. In 1983, he joined the staff of the École de Technologie Supérieure, Université du Québec, in Montréal where hecofounded the Department of Automated Manufacturing Engineering where he is a currently Full Professor and teaches Pattern Recognition, Evolutionary Algorithms, NeuralNetworks and Fuzzy Systems. In 1992, he joined also the Computer Science Department of the Pontifícia Universidade Católica do Paraná (Curitiba, Brazil) where he wascoresponsible for the implementation in 1995 of a master program and in 1998 a Ph.D. program in applied computer science. Since 1996, he is a senior member of the Centrefor Pattern Recognition and Machine Intelligence (CENPARMI, Concordia University). Since 2012, he is the Research Chair holder specializing in Adaptive SurveillanceSystems in Dynamic Environments. Dr. Sabourin is the author (and coauthor) of more than 300 scientific publications including journals and conference proceeding. He wascochair of the program committee of CIFED'98 (Conférence Internationale Francophone sur l'Écrit et le Document, Québec, Canada) and IWFHR04 (9th InternationalWorkshop on Frontiers in Handwriting Recognition, Tokyo, Japan). He was nominated as Conference cochair of ICDAR'07 (9th International Conference on DocumentAnalysis and Recognition) that has been held in Curitiba, Brazil in 2007. His research interests are in the areas of adaptive biometric systems, adaptive surveillance systems indynamic environments, intelligent watermarking systems, evolutionary computation and biocryptography.

Dmitry O Gorodnichy (Ph.D., University of Alberta) is a Senior Research Scientist and a founding Leader of the Video Surveillance and Biometrics section at the Science andEngineering Directorate of the Canada Border Services Agency, also an Adjunct Professor at the University of Ottawa and the Université du Québec's École de TechnologieSupérieure. Prior to his employment with the agency, he worked for eight years with National Research Council of Canada, where he led the Video Recognition Systemsproject. He is the author of several patents and over a hundred scientific papers, the editor of Special Issue of Image and Vision Computing journal on Face Processing inVideo Sequences, chair of the International workshops on Face Processing in Video and Video Processing and Recognition and the Government of Canada workshops onVideo Technology for National Security. He is the recipient of the Outstanding Scientific Achievement Award from the National Research Council of Canada, the YoungInvestigator Award from the Canadian Image Processing and Pattern Recognition Society, the Cultural Diversity Leadership Award from University of Alberta, named theLeader of Tomorrow by the Canadian Royal Society Academy of Science's Partnership Group for Science and Engineering. His current interests are in automated bordercontrol and development of video recognition and biometric technologies for border security applications.





















adaptive skew-sensitive ensembles for face recognition in ... · adaptive skew-sensitive ensembles...

Documents