
  • Learning Algorithms using Chance-Constrained Programs

    A Thesis

    Submitted For the Degree of

    Doctor of Philosophy

    in the Faculty of Engineering

    by

    Saketha Nath Jagarlapudi

    Computer Science and Automation

    Indian Institute of Science

    BANGALORE – 560 012

    DECEMBER 2007

  • Acknowledgements

I would like to express my sincere gratitude and thanks to my adviser, Dr. Chiranjib Bhattacharyya. With his interesting thoughts and ideas, inspiring ideals and friendly nature,

    he made sure I was filled with enthusiasm and interest to do research all through my

PhD. He was always approachable and spent ample time with me and all my lab members for discussions, though he had a busy schedule. I also thank Prof. M. N. Murty,

    Dr. Samy Bengio (Google Labs, USA) and Prof. Aharon Ben-Tal (Technion, Israel) for

    their help and co-operation.

I am greatly indebted to my parents, wife and other family members for supporting

    and encouraging me all through the PhD years. I thank all my lab members and friends,

    especially Karthik Raman, Sourangshu, Rashmin, Krishnan and Sivaramakrishnan, for

    their useful discussions and comments.

I thank the Department of Science and Technology, India, for supporting me financially during the PhD work. I would also like to take this opportunity to thank all the

    people who directly and indirectly helped in finishing my thesis.


  • Publications based on this Thesis

    1. J. Saketha Nath, Chiranjib Bhattacharyya and M. Narasimha Murty. Clustering

    Based Large Margin Classification: A Scalable Approach using SOCP Formulation.

    Proceedings of the 12th ACM SIGKDD International Conference on Knowledge

    Discovery and Data Mining, 2006.

    2. J. Saketha Nath and Chiranjib Bhattacharyya. Maximum Margin Classifiers with

    Specified False Positive and False Negative Error Rates. Proceedings of the SIAM

International Conference on Data Mining, 2007.

3. Rashmin B, J. Saketha Nath, Krishnan S, Sivaramakrishnan, Chiranjib Bhattacharyya and M. N. Murty. Focused Crawling with Scalable Ordinal Regression Solvers. Proceedings of the International Conference on Machine Learning, 2007.

4. J. Saketha Nath, Aharon Ben-Tal and Chiranjib Bhattacharyya. Robust Classification for Large Datasets. Submitted to Journal of Machine Learning Research.


  • Abstract

    This thesis explores Chance-Constrained Programming (CCP) in the context of learning.

    It is shown that chance-constraint approaches lead to improved algorithms for three

    important learning problems — classification with specified error rates, large dataset

    classification and Ordinal Regression (OR). Using moments of training data, the CCPs

    are posed as Second Order Cone Programs (SOCPs). Novel iterative algorithms for

    solving the resulting SOCPs are also derived. Borrowing ideas from robust optimization

    theory, the proposed formulations are made robust to moment estimation errors.

    A maximum margin classifier with specified false positive and false negative rates is

    derived. The key idea is to employ chance-constraints for each class which imply that

the actual misclassification rates do not exceed the specified values. The formulation is applied

    to the case of biased classification.
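The conversion from chance constraint to cone constraint (detailed in Appendix A) rests on a moment-based tail bound: with class mean μ and covariance Σ, the chance constraint is enforced whenever w⊤μ + b ≥ κ‖Σ½w‖. The sketch below uses the Chebyshev-type constant κ = √((1−η)/η) familiar from the minimax-probability-machine literature; the exact constant, the toy moments and the variable names here are illustrative assumptions, not the thesis's notation or data.

```python
import numpy as np

def cone_constraint_holds(w, b, mu, sigma, eta):
    """SOCP surrogate of the chance constraint Pr(w.x + b >= 0) >= 1 - eta
    for a class with mean mu and covariance sigma, via a Chebyshev-type
    bound (kappa is an assumed constant, not taken from the thesis)."""
    kappa = np.sqrt((1.0 - eta) / eta)
    return w @ mu + b >= kappa * np.sqrt(w @ sigma @ w)

# Toy positive-class moments (illustrative values only)
mu = np.array([2.0, 2.0])
sigma = 0.1 * np.eye(2)
w, b = np.array([1.0, 1.0]), -1.0
eta = 0.1  # tolerate at most a 10% false-negative rate

print(cone_constraint_holds(w, b, mu, sigma, eta))  # → True
```

Because the constraint involves only the first and second moments, it holds for every distribution sharing them, which is what lets the chance constraint be imposed without distributional assumptions.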

    The problems of large dataset classification and ordinal regression are addressed by

    deriving formulations which employ chance-constraints for clusters in training data rather

    than constraints for each data point. Since the number of clusters can be substantially

smaller than the number of data points, the resulting formulations involve far fewer variables and inequalities, and hence scale well to large datasets.
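The cluster-moment idea can be sketched as follows. The clustering routine, the data, and the number of clusters below are stand-ins for illustration, not the thesis's experimental setup; the point is only that one constraint per cluster replaces one constraint per point.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 2))  # 10,000 training points
k = 20                           # but only 20 clusters -> 20 constraints

# A few Lloyd iterations of k-means (any clustering step would do here)
centers = X[rng.choice(len(X), k, replace=False)].copy()
for _ in range(5):
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(1)
    for j in range(k):
        pts = X[labels == j]
        if len(pts):
            centers[j] = pts.mean(0)

# Per-cluster second order moments: these feed the chance constraints
moments = []
for j in range(k):
    pts = X[labels == j]
    if len(pts) > 1:
        moments.append((pts.mean(0), np.cov(pts.T)))

print(len(X), "points reduced to", len(moments), "cluster constraints")
```

Each (mean, covariance) pair then yields one cone constraint of the kind described above, so the optimization problem grows with the number of clusters rather than the dataset size.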

    The scalable classification and OR formulations are extended to feature spaces and

    the kernelized duals turn out to be instances of SOCPs with a single cone constraint.

Exploiting this special structure, fast iterative solvers that outperform generic SOCP solvers are proposed. Compared to state-of-the-art learners, the proposed algorithms achieve a speed-up as high as 10,000 times when the specialized SOCP solvers are employed.

The proposed formulations involve second order moments of data and hence are susceptible to moment estimation errors. A generic way of making the formulations

    robust to such estimation errors is illustrated. Two novel confidence sets for moments

are derived and it is shown that when either of the confidence sets is employed, the

    robust formulations also yield SOCPs.
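To see why robustness to estimation errors can preserve the SOCP form, consider the simplest case: the true mean is only known to lie in a ball around its estimate. This is an illustrative robust-optimization construction, not the thesis's confidence sets; the worst case of w⊤μ over the ball is w⊤μ̂ − ρ‖w‖, so the robust counterpart is again a cone constraint.

```python
import numpy as np

def robust_cone_holds(w, b, mu_hat, sigma, eta, rho):
    """Robust version of the cone constraint when the true mean lies in a
    ball of radius rho around the estimate mu_hat (illustrative sketch).
    Worst case of w @ mu over the ball is w @ mu_hat - rho * ||w||."""
    kappa = np.sqrt((1.0 - eta) / eta)  # assumed Chebyshev-type constant
    slack = w @ mu_hat + b - rho * np.linalg.norm(w)
    return slack >= kappa * np.sqrt(w @ sigma @ w)

# Toy numbers (illustrative, not from the thesis)
w, b = np.array([1.0, 1.0]), -1.0
mu_hat = np.array([2.0, 2.0])
sigma = 0.1 * np.eye(2)
eta = 0.1

print(robust_cone_holds(w, b, mu_hat, sigma, eta, rho=0.2))  # → True
print(robust_cone_holds(w, b, mu_hat, sigma, eta, rho=2.0))  # → False
```

Larger uncertainty budgets ρ make the constraint harder to satisfy, which is the price paid for robustness; the thesis's joint confidence sets for mean and covariance refine this basic picture.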

  • Contents

    Acknowledgements

    Publications based on this Thesis

    Abstract

    Notation and Abbreviations

    1 Introduction
      1.1 Main Contributions
      1.2 Thesis Road-Map

    2 Classification with Specified Error Rates
      2.1 Past Work
      2.2 New Classification Formulation
      2.3 Dual and its Iterative Solver
        2.3.1 Iterative Algorithm for Solving Dual
      2.4 Non-linear Classifiers
      2.5 Numerical Experiments
      2.6 Summary

    3 Scalable Maximum Margin Classification
      3.1 Past Work on Large Dataset Classification
      3.2 Scalable Classification Formulation
      3.3 Dual and Geometrical Interpretation
      3.4 Numerical Experiments
      3.5 Summary

    4 Large-Scale Ordinal Regression and Focused Crawling
      4.1 Past Work
        4.1.1 Ordinal Regression
        4.1.2 Focused Crawling
      4.2 Large-Scale OR Formulation
      4.3 Feature Space Extension
      4.4 Focused Crawling as an OR problem
      4.5 Numerical Experiments
        4.5.1 Scalability of New OR Scheme
        4.5.2 Performance of Focused Crawler
      4.6 Summary

    5 Fast Solver for Scalable Classification and OR Formulations
      5.1 Large-Scale OR Solver
      5.2 Numerical Experiments
      5.3 Summary

    6 Robustness to Moment Estimation Errors
      6.1 Robust Classifiers for Large Datasets
        6.1.1 Separate Confidence Sets for Moments
        6.1.2 Joint Confidence Sets for Moments
      6.2 Numerical Experiments
        6.2.1 Comparison of the Cone Constraints
        6.2.2 Comparison of Original and Robust Formulations
      6.3 Summary

    7 Conclusions

    A Casting Chance-Constraint as Cone-Constraint

    B Fast Solver for Scalable Classification Formulation

    C Dual of Large-Scale OR Formulation

    Bibliography

  • List of Tables

    2.1 Performance of the novel biased classifiers
    2.2 Comparison of error with biased classifiers and SVM
    2.3 Comparison of risk with biased classifiers and SVM

    3.1 Performance of scalable classifier and SVM-Perf

    4.1 Comparison of novel OR scheme and baseline
    4.2 Datasets used for focused crawling
    4.3 Harvest rate comparison with OR-based and baseline crawlers

    5.1 Training times with specialized solver and baseline
    5.2 Scaling experiments with specialized solver and baseline

    6.1 Comparison of original and robust cone constraints
    6.2 Performance of robust constraints with various distributions
    6.3 Comparison of original and robust classification schemes


  • List of Figures

    2.1 Geometric interpretation of biased classifier constraints
    2.2 Geometric interpretation of biased classifier
    2.3 Comparison of biased classifier a
