
    AdaBoost Theory and Application

    Ying Qin

AdaBoost, short for Adaptive Boosting, is a machine learning algorithm formulated by Freund and Schapire. AdaBoost is adaptive in the sense that subsequent classifiers are built to focus on the instances misclassified by previous classifiers. In this report, I summarize the basic theory of boosting and AdaBoost and introduce the application of AdaBoost in music information retrieval (MIR).

1 Boosting

There is an old saying that there is strength in numbers: the result of a group can be greater than the simple sum of its parts. This is, to some extent, also true for machine learning. Boosting is a supervised machine learning approach that builds a strong classifier from weak ones. Each weak classifier receives an input and returns a positive or negative vote, and the final strong classifier outputs a weighted vote, where the weights depend on the quality of the weak classifiers. In this way, every added weak classifier contributes to or improves the outcome.

The development of boosting algorithms dates back to 1988, when Kearns and Valiant first explored the potential of boosting a weak classifier (one only slightly better than chance) into a strong classifier. Later, in 1990, Schapire showed that a learner, even if rough and moderately inaccurate, could always improve its performance by training two additional classifiers on filtered versions of the input data stream. The first provable polynomial-time boosting algorithm was discussed in Schapire's work "The Strength of Weak Learnability". Inspired by Schapire, Freund proposed in 1995 a far more efficient algorithm that combines a large number of hypotheses. However, this algorithm has practical drawbacks, for it assumes that each hypothesis has a fixed error rate. Finally, the AdaBoost algorithm was introduced in 1997 by Freund and Schapire, and it solved many of the practical difficulties of the previous boosting algorithms (de Haan 2010).

2 AdaBoost

AdaBoost needs no prior knowledge of the accuracies of the weak classifiers. Rather, it iteratively applies a learning algorithm to the same training data and adds the resulting classifiers to the final classifier. At each iteration, it generates a confidence parameter that changes according to the error of the weak hypothesis. This is the basis of its name: Ada is short for adaptive.

2.1 Understanding AdaBoost

Given a binary classification case, the training set will contain both typical and rare samples, and we usually have no idea about the importance of each sample. Thus, we might simply give all samples equal weights to initiate training. The classification error is then used to reweight the data for the next classifier, and the aim of re-weighting is to make the correctly and incorrectly classified samples each carry half of the total weight. Since we assume that the error rate of a weak classifier is always smaller than 1/2, the reweighting will reduce the weights of correct samples and increase the weights of erroneous samples. In other words, the weak classifier at the first iteration (the first classifier) is good on average training samples, and the classifier at the second iteration (the second classifier) is good on the errors of the first classifier. After a number of iterations, the sample weights focus the attention of the weak learner on the hard examples near the boundary of the two classes.
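
To make the 50/50 re-weighting concrete, here is a minimal numeric sketch of one re-weighting step in Python; the sample count, the correctness mask, and the variable names are illustrative choices, not values from the report.

    import numpy as np

    # One illustrative AdaBoost re-weighting step (data made up for this example).
    # After re-weighting, the misclassified and the correctly classified samples
    # each carry 50% of the total weight, so the next weak classifier is forced
    # to focus on the previous one's mistakes.

    weights = np.full(8, 1.0 / 8)                  # initial uniform distribution
    correct = np.array([1, 1, 1, 1, 1, 1, 0, 0])   # 1 = classified correctly

    eps = weights[correct == 0].sum()              # weighted error of the weak classifier
    alpha = 0.5 * np.log((1 - eps) / eps)          # confidence parameter (requires eps < 1/2)

    # Standard exponential update followed by normalization.
    y_times_h = np.where(correct == 1, 1.0, -1.0)  # y_i * h(x_i): +1 if correct, -1 if not
    new_weights = weights * np.exp(-alpha * y_times_h)
    new_weights /= new_weights.sum()

    print(new_weights[correct == 0].sum())         # 0.5: errors now hold half the mass
    print(new_weights[correct == 1].sum())         # 0.5: correct samples hold the other half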

During the whole training process, once a weak classifier h_t has been obtained, AdaBoost assigns a confidence parameter α_t to it, which is directly related to its error ε_t. In this way, we give more weight to classifiers with lower error, and this choice decreases the overall error. The strong classifier results as a weighted linear combination of the weak classifiers, whose weights are determined by their own errors. The algorithm must terminate if α_t ≤ 0, which is equivalent to ε_t ≥ 1/2.

When using AdaBoost, the training data must represent reality, and the total number of samples must be relatively large compared to the number of features. Since AdaBoost is particularly suited to working with many features, it benefits from large databases. Besides, it is important to remember that the weak classifier must be weak enough, otherwise the resulting strong learner may overfit easily; in fact, boosting seems to be especially susceptible to noise in such cases. The most popular choices for the weak classifier are decision trees or decision stumps (decision trees with two leaves) when no a-priori knowledge of the domain of the learning problem is available.

A step-by-step illustration of the AdaBoost algorithm can be formulated as follows:

1. Build the initial distribution D_1(i) = 1/N, assuming all N training samples are equally important.
2. For t = 1, ..., T (rounds of boosting):
   - Select the weak classifier h_t with the lowest weighted error ε_t from a group of candidates.
   - Check whether the error is larger than 1/2 (YES: terminate; NO: go on).
   - Calculate the confidence parameter, i.e. the weight of the sub-classifier: α_t = ½ ln((1 − ε_t) / ε_t) > 0.
   - Re-weight the data samples to give poorly classified samples an increased weight: D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t, where Z_t is the normalization factor.
3. At the end (after the T-th round), the final strong classifier results: H(x) = sign(Σ_t α_t h_t(x)).
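
As a concrete companion to the steps above, the following is a minimal from-scratch Python sketch of binary AdaBoost using decision stumps (depth-1 scikit-learn decision trees) as the weak classifiers. The toy data, the number of rounds T, and the function names are illustrative assumptions rather than anything prescribed in the report.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier  # depth-1 trees serve as decision stumps

    def adaboost_fit(X, y, T=20):
        """Binary AdaBoost; labels y are assumed to be in {-1, +1}."""
        n = len(y)
        D = np.full(n, 1.0 / n)                 # step 1: all samples equally important
        stumps, alphas = [], []
        for _ in range(T):                      # step 2: rounds of boosting
            # Pick a weak classifier with low weighted error under distribution D.
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=D)
            pred = stump.predict(X)
            eps = D[pred != y].sum()
            if eps >= 0.5:                      # no better than chance: terminate
                break
            eps = max(eps, 1e-12)               # guard against a perfect stump on the training set
            alpha = 0.5 * np.log((1 - eps) / eps)   # confidence parameter, > 0
            stumps.append(stump)
            alphas.append(alpha)
            # Re-weight: misclassified samples gain weight, then renormalize (Z_t is the sum).
            D = D * np.exp(-alpha * y * pred)
            D = D / D.sum()
        return stumps, alphas

    def adaboost_predict(stumps, alphas, X):
        # Step 3: the strong classifier is the sign of the weighted vote.
        votes = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
        return np.sign(votes)

    # Toy usage on made-up data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    stumps, alphas = adaboost_fit(X, y, T=20)
    print((adaboost_predict(stumps, alphas, X) == y).mean())

The early exit when ε_t ≥ 1/2 mirrors the termination check in step 2, and the exponential re-weighting implements the 50/50 redistribution described above.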


2.2 AdaBoost extensions

It is possible to extend the basic AdaBoost algorithm to obtain better performance. Two major extensions are abstention and regularization. As we have seen, in the typical AdaBoost the binary weak learner h: X → {−1, +1} is forced to give an opinion for every example x. This is not always desirable, as the weak learner might not be suited to classify every x. The solution to this problem is the abstention-based classifier, which knows when to abstain and has the form h: X → {−1, 0, +1}. AdaBoost might also overfit in some cases if it is run long enough. The usual way to address this is to validate the number of iterations on a validation set, while regularization instead introduces an edge offset parameter θ ≥ 0 in the confidence formula:

α_t = ½ ln((1 − ε_t) / ε_t) − ½ ln((1 + θ) / (1 − θ)).

This formula shows how the confidence is decreased by a constant term at every iteration, suggesting a mechanism similar to weight decay for reducing overfitting.
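
As a small illustration, and assuming the edge-offset form reconstructed above is the intended one, the following Python sketch compares the plain and regularized confidence values; the error and offset numbers are arbitrary examples.

    import numpy as np

    def alpha_regularized(eps, theta=0.0):
        # Plain AdaBoost confidence minus the constant edge-offset term; theta = 0
        # recovers the standard formula. (Form assumed from the reconstruction above.)
        return 0.5 * np.log((1 - eps) / eps) - 0.5 * np.log((1 + theta) / (1 - theta))

    print(alpha_regularized(0.3))             # standard confidence for error 0.3
    print(alpha_regularized(0.3, theta=0.1))  # same error, smaller (shrunk) confidence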

2.3 Multi-class AdaBoost

The binary AdaBoost is a simple and well-understood scenario, but we still need to extend it to deal with multi-class problems. AdaBoost.M1, proposed by Freund and Schapire, is the simplest and most straightforward way: the weak learner is a full multi-class algorithm itself, and the AdaBoost algorithm does not need to be modified in any sense. However, this method fails if the weak learner cannot achieve at least 50% accuracy on all classes when run on hard problems. In AdaBoost.MH, proposed by Schapire and Singer, the weak learner receives a distribution of weights defined over both the data points and the classes, D_t(i, ℓ). In general, this weight expresses how hard it is to classify example x_i into its correct class (when ℓ = y_i). Schapire and Singer also proposed AdaBoost.MO, which partitions the multi-class problem into a set of binary problems. This method can be implemented by means of an error-correcting output code (ECOC) decomposition (Casagrande 2005), as sketched below.
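
The sketch below illustrates the ECOC idea with off-the-shelf scikit-learn components: the multi-class problem is decomposed into binary problems, each handled by a binary AdaBoost over decision stumps. This uses a generic ECOC wrapper rather than the exact AdaBoost.MO procedure of Schapire and Singer; the dataset and parameters are illustrative.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.multiclass import OutputCodeClassifier

    # Decompose a 3-class toy problem into binary problems via an error-correcting
    # output code; each binary problem is solved by AdaBoost with decision stumps
    # (the default weak learner of AdaBoostClassifier).
    X, y = load_iris(return_X_y=True)
    binary_booster = AdaBoostClassifier(n_estimators=50)
    ecoc = OutputCodeClassifier(binary_booster, code_size=2.0, random_state=0)
    ecoc.fit(X, y)
    print(ecoc.score(X, y))   # training accuracy on the toy data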

3 AdaBoost in MIR

AdaBoost has been used in a number of MIR problems in recent years. Dixon et al. presented a method for genre classification based on automatically extracted rhythmic patterns using AdaBoost (Dixon et al. 2004). Casagrande described an approach that uses multi-class AdaBoost to classify audio files based on extracted features (Casagrande 2005). Bergstra et al. presented an algorithm that predicts musical genre and artist from an audio waveform, using AdaBoost to select from a set of audio features (Bergstra et al. 2006). Eck et al. proposed a method for predicting social tags for music recommendation directly from MP3 files using AdaBoost (Eck et al. 2007). Bertin-Mahieux et al. extended the work of Eck et al. by replacing the AdaBoost batch learning algorithm with FilterBoost, an online version of AdaBoost (Bertin-Mahieux et al. 2008). Overall, AdaBoost has proven to be an effective machine learning algorithm for music classification.

References

1. De Haan, Gerard. Digital Video Post Processing. Eindhoven, 2010.

2. Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer-Verlag New York, Inc., 2006.

3. Bergstra, J., N. Casagrande, D. Erhan, D. Eck, and B. Kégl. "Aggregate Features and AdaBoost for Music Classification." Machine Learning 65, no. 2 (2006): 473-484.

4. Bertin-Mahieux, T., D. Eck, F. Maillet, and P. Lamere. "Autotagger: A Model for Predicting Social Tags from Acoustic Features on Large Music Databases." Journal of New Music Research 37, no. 2 (2008): 115-135.

5. Casagrande, Norman. Automatic Music Classification Using Boosting Algorithms and Auditory Features. PhD thesis, Department of Computer Science and Operations Research, University of Montreal, 2005.

6. Dixon, S., F. Gouyon, and G. Widmer. "Towards Characterization of Music via Rhythmic Patterns." In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR), 509-516. 2004.

7. Eck, D., P. Lamere, T. Bertin-Mahieux, and S. Green. "Automatic Generation of Social Tags for Music Recommendation." Advances in Neural Information Processing Systems 20, no. 20 (2007): 1-8.