7/29/2019 AdaBoost-1
AdaBoost Theory and Application
Ying Qin
AdaBoost, short for Adaptive Boosting, is a machine learning algorithm formulated by Freund and
Schapire. AdaBoost is adaptive in the sense that subsequent classifiers are built to focus on the
instances misclassified by previous classifiers. In this report, I summarize the basic theory of
boosting and AdaBoost and introduce applications of AdaBoost in music information
retrieval (MIR).
1 Boosting
There is an old saying that there is strength in numbers, meaning that the result of a
group can be greater than the simple sum of its parts. This is also, to some extent, true for
machine learning. Boosting is a supervised machine learning approach that builds a strong
classifier from weak ones. Each weak classifier receives an input and returns a positive or
negative vote, and the final strong classifier outputs the weighted vote, where the weights
depend on the quality of the weak classifiers. In this way, every added weak classifier contributes
to and improves the outcome.
The development of boosting algorithms dates back to 1988, when Kearns and Valiant first
explored the potential of boosting a weak classifier (one only slightly better than chance) into a
strong classifier. Later, in 1990, Schapire showed that a learner, even if rough and moderately
inaccurate, could always improve its performance by training two additional classifiers on filtered
versions of the input data stream. The first provable polynomial-time boosting algorithm was
discussed in Schapire's work "The Strength of Weak Learnability". Inspired by Schapire, Freund
proposed in 1995 a far more efficient algorithm that combines a large number of hypotheses.
However, that algorithm has practical drawbacks, for it assumes that each hypothesis has a fixed
error rate. Finally, the AdaBoost algorithm was introduced in 1997 by Freund and Schapire, and it
solved many of the practical difficulties of the previous boosting algorithms (de Haan 2010).
2 AdaBoost
AdaBoost needs no prior knowledge of the accuracies of the weak classifiers. Rather, it iteratively
applies a learning algorithm to the same training data and adds the resulting classifiers to the final
classifier. At each iteration, it generates a confidence parameter that changes according to the
error of the weak hypothesis. This is the basis of its name: Ada is short for adaptive.
2.1 Understanding AdaBoost
Given a binary classification case, the training set will contain both typical and rare samples, and we
usually have no idea of the importance of each sample. Thus, we might simply give them equal
weights to initiate training. The classification error is then calculated to reweight the data for the next
classifier, and the aim of re-weighting is to make the correctly and incorrectly classified samples
each carry 50% of the total weight. Since we assume that the error rate is always smaller than 1/2,
the reweighting will reduce the weights of correct samples and increase the weights of error samples.
In other words, the weak classifier at the first iteration (the first classifier) is good on average
training samples, and the classifier at the second iteration (the second classifier) is good on the
errors of the first classifier. After a number of iterations, the sample weights focus the attention of
the weak learner on the hard examples near the boundary between the two classes.
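The 50/50 re-weighting described above can be checked numerically. A minimal sketch (the six-sample setup and the error pattern are illustrative assumptions, not from the report):

```python
import numpy as np

# Six equally weighted samples; suppose the weak classifier gets
# two of them wrong, so the weighted error is eps = 1/3.
w = np.full(6, 1 / 6)
correct = np.array([True, True, True, True, False, False])

eps = w[~correct].sum()                     # weighted error: 1/3
alpha = 0.5 * np.log((1 - eps) / eps)       # confidence parameter

# Shrink correct samples, grow misclassified ones, then renormalize.
w = w * np.exp(np.where(correct, -alpha, alpha))
w = w / w.sum()

# Correct and misclassified samples now each hold 50% of the weight.
print(w[correct].sum(), w[~correct].sum())
```

Whatever the initial error rate below 1/2, this choice of α makes the next round a maximally hard (50/50) problem for the previous classifier.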
During the whole training process, once the weak classifier h_t has been received, AdaBoost
assigns a confidence parameter α_t to h_t, which is directly related to its error ε_t. In this way, we
give more weight to classifiers with lower error, and this choice decreases the overall error. The
strong classifier results as a weighted linear combination of the weak classifiers, whose weights are
determined by their errors:

    α_t = (1/2) ln((1 - ε_t)/ε_t) > 0.

The algorithm must terminate if α_t ≤ 0, which is equivalent to ε_t ≥ 1/2. A step-by-step
illustration of the AdaBoost algorithm can be formulated as follows.
1. Build the distribution D_1, assuming all samples equally important: D_1(i) = 1/N
2. For t = 1, ..., T (rounds of boosting)
- Select the weak classifier h_t with the lowest weighted error ε_t from a group of candidates
- Check if the error is larger than 1/2
(YES: terminate; NO: go on)
- Calculate the confidence parameter α_t = (1/2) ln((1 - ε_t)/ε_t), the weight of the sub-classifier
- Re-weight the data samples to give poorly classified samples an increased weight:
D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t,
where Z_t is the normalization factor that makes D_{t+1} sum to one
3. At the end (the T-th round), the final strong classifier results:
H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )

When using AdaBoost, the training data must represent reality, and the total number of samples
must be relatively large compared to the number of features. Since AdaBoost is particularly
suited to working with many features, it prefers large databases. Besides, it is important to
remember that the weak classifier must be weak enough; otherwise, the resulting strong learner
might overfit easily. In fact, boosting seems to be especially susceptible to noise in such cases. The
most popular choices for the weak classifier are decision trees or decision stumps (decision trees with
two leaves) if no a priori knowledge is available on the domain of the learning problem.
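The boosting loop above can be sketched in a few lines of Python, using decision stumps on a single feature as the weak learners. The toy dataset and the threshold grid are illustrative choices, not from the report:

```python
import numpy as np

def stump_predict(x, theta, p):
    # Decision stump on one feature: predict p where x < theta, else -p.
    return np.where(x < theta, p, -p)

def adaboost_train(x, y, T=10):
    n = len(x)
    w = np.full(n, 1.0 / n)                 # D_1: all samples equally important
    xs = np.unique(x)
    # Candidate thresholds: midpoints, plus one below/above the data range
    # so constant classifiers are also available.
    thetas = np.concatenate(([xs[0] - 1.0],
                             (xs[:-1] + xs[1:]) / 2.0,
                             [xs[-1] + 1.0]))
    ensemble = []
    for _ in range(T):
        # Select the weak classifier with the lowest weighted error.
        best = None
        for theta in thetas:
            for p in (1, -1):
                err = w[stump_predict(x, theta, p) != y].sum()
                if best is None or err < best[0]:
                    best = (err, theta, p)
        eps, theta, p = best
        if eps >= 0.5:                      # no better than chance: terminate
            break
        eps = max(eps, 1e-12)               # guard against log(0)
        alpha = 0.5 * np.log((1 - eps) / eps)   # confidence parameter
        # Re-weight: raise misclassified samples, then normalize (divide by Z_t).
        w = w * np.exp(-alpha * y * stump_predict(x, theta, p))
        w = w / w.sum()
        ensemble.append((alpha, theta, p))
    return ensemble

def adaboost_predict(ensemble, x):
    # Strong classifier: sign of the weighted vote of the weak classifiers.
    score = sum(a * stump_predict(x, th, p) for a, th, p in ensemble)
    return np.sign(score)

# Toy data that no single stump can separate, but three rounds can:
x = np.array([0.0, 1, 2, 3, 4, 5])
y = np.array([1, 1, -1, -1, 1, 1])
ensemble = adaboost_train(x, y, T=3)
```

On this data the first stump still errs on a third of the samples, yet the three-stump weighted vote classifies every training point correctly.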
2.2 AdaBoost extensions
It is possible to extend the basic AdaBoost algorithm to obtain better performance. Two major
extensions are abstention and regularization. As we have seen, in typical AdaBoost the
binary weak learner h : X → {-1, 1} is forced to give an opinion on every example x. This is not
always desirable, as the weak learner may not be suited to classify every x. The solution to this
problem is the abstention base classifier, which knows when to abstain and has the form
h : X → {-1, 0, 1}. AdaBoost might also overfit in some cases if it is run long enough. To address
this, the general approach is to tune the number of iterations on a validation set, while
regularization introduces an edge offset parameter θ ≥ 0 into the confidence formula:

    α_t = (1/2) ln((1 - ε_t)/ε_t) - (1/2) ln((1 + θ)/(1 - θ)).

This formula shows how the confidence is decreased by a constant term every iteration,
suggesting a mechanism similar to weight decay for reducing the effect of overfitting.
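As a small sketch, the edge-offset confidence can be computed as follows (the function name and the example values ε = 0.2, θ = 0.1 are illustrative assumptions):

```python
import numpy as np

def alpha_regularized(eps, theta=0.0):
    # Confidence with an edge offset theta >= 0: the usual
    # (1/2) ln((1 - eps)/eps) minus a constant penalty term.
    return 0.5 * np.log((1 - eps) / eps) - 0.5 * np.log((1 + theta) / (1 - theta))

# theta = 0 reduces to plain AdaBoost; a positive theta shrinks
# every round's confidence by the same constant amount.
print(alpha_regularized(0.2), alpha_regularized(0.2, theta=0.1))
```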
2.3 Multi-class AdaBoost
Binary AdaBoost is a simple and well understood scenario, but we still need extensions to
deal with multi-class problems. AdaBoost.M1, proposed by Freund and Schapire, is the simplest
and most straightforward approach. Here the weak learner is a full multi-class algorithm
itself, and the AdaBoost algorithm does not need to be modified in any sense. However,
this method fails if the weak learner cannot achieve at least 50% accuracy on all classes when run
on hard problems. In AdaBoost.MH, proposed by Schapire and Singer, the weak learner receives a
distribution of weights w_{i,ℓ} defined over both the data samples and the classes. In general, this
weight expresses how hard it is to classify sample x_i into its correct class. Schapire and Singer also
proposed AdaBoost.MO, which partitions the multi-class problem into a set of binary problems.
This method can be implemented by use of error-correcting output codes (ECOC) decomposition
(Casagrande 2005).
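A single AdaBoost.M1 re-weighting round can be sketched as follows, using the β_t = ε_t/(1 - ε_t) update of Freund and Schapire (the helper name and the toy three-class labels are illustrative assumptions):

```python
import numpy as np

def m1_round(w, y_true, y_pred):
    # One AdaBoost.M1 round: labels may come from any number of classes,
    # but the weak learner must stay below 50% weighted error.
    miss = y_pred != y_true
    eps = w[miss].sum()
    if eps >= 0.5:
        raise ValueError("weak learner no better than chance; stop boosting")
    beta = eps / (1 - eps)
    w = np.where(miss, w, w * beta)        # scale down correct samples
    return w / w.sum(), np.log(1 / beta)   # new weights, vote weight of h_t

# Five samples from three classes, one misclassified (eps = 0.2):
w = np.full(5, 0.2)
y_true = np.array([0, 1, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 2])         # last sample wrong
w_new, vote = m1_round(w, y_true, y_pred)
```

As in the binary case, the update leaves the misclassified mass and the correctly classified mass at 50% each after normalization.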
3 AdaBoost in MIR
AdaBoost has been used in a number of MIR problems in recent years. Dixon et al. presented a
method of genre classification from automatically extracted rhythmic patterns using AdaBoost
(Dixon et al. 2004). Casagrande described an approach using multi-class AdaBoost to classify
audio files based on extracted features (Casagrande 2005). Bergstra et al. presented an algorithm
that predicts musical genre and artist from an audio waveform, using AdaBoost to select from a
set of audio features (Bergstra et al. 2006). Eck et al. proposed a method for predicting social tags
for music recommendation directly from MP3 files using AdaBoost (Eck et al. 2007). Bertin-Mahieux
et al. extended the work of Eck et al. by replacing the AdaBoost batch learning algorithm with
FilterBoost, an online version of AdaBoost (Bertin-Mahieux et al. 2008). Overall, AdaBoost has
been shown to be an effective machine learning algorithm for music classification.
References
1. de Haan, Gerard. Digital Video Post Processing. Eindhoven, 2010.
2. Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer-Verlag New York, Inc., 2006.
3. Bergstra, J., N. Casagrande, D. Erhan, D. Eck, and B. Kégl. "Aggregate Features and AdaBoost for Music Classification." Machine Learning 65, no. 2 (2006): 473-84.
4. Bertin-Mahieux, T., D. Eck, F. Maillet, and P. Lamere. "Autotagger: A Model for Predicting Social Tags from Acoustic Features on Large Music Databases." Journal of New Music Research 37, no. 2 (2008): 115-35.
5. Casagrande, Norman. "Automatic Music Classification Using Boosting Algorithms and Auditory Features." PhD thesis, Department of Computer Science and Operations Research, University of Montreal, 2005.
6. Dixon, S., F. Gouyon, and G. Widmer. "Towards Characterization of Music via Rhythmic Patterns." In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR), 509-16. 2004.
7. Eck, D., P. Lamere, T. Bertin-Mahieux, and S. Green. "Automatic Generation of Social Tags for Music Recommendation." Advances in Neural Information Processing Systems 20, no. 20 (2007): 1-8.