Learning Algorithms using Chance-Constrained Programs
A Thesis
Submitted For the Degree of
Doctor of Philosophy
in the Faculty of Engineering
by
Saketha Nath Jagarlapudi
Computer Science and Automation
Indian Institute of Science
BANGALORE – 560 012
DECEMBER 2007
Acknowledgements
I would like to express my sincere gratitude and thanks to my adviser, Dr. Chiranjib Bhattacharyya. With his interesting thoughts and ideas, inspiring ideals and friendly nature, he made sure I was filled with enthusiasm and interest for research all through my PhD. Despite his busy schedule, he was always approachable and spent ample time in discussions with me and all my lab members. I also thank Prof. M. N. Murty, Dr. Samy Bengio (Google Labs, USA) and Prof. Aharon Ben-Tal (Technion, Israel) for their help and cooperation.
I am greatly indebted to my parents, wife and other family members for supporting and encouraging me all through the PhD years. I thank all my lab members and friends, especially Karthik Raman, Sourangshu, Rashmin, Krishnan and Sivaramakrishnan, for their useful discussions and comments.
I thank the Department of Science and Technology, India, for supporting me financially during the PhD work. I would also like to take this opportunity to thank all the people who directly and indirectly helped in finishing my thesis.
Publications based on this Thesis
1. J. Saketha Nath, Chiranjib Bhattacharyya and M. Narasimha Murty. Clustering
Based Large Margin Classification: A Scalable Approach using SOCP Formulation.
Proceedings of the 12th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, 2006.
2. J. Saketha Nath and Chiranjib Bhattacharyya. Maximum Margin Classifiers with
Specified False Positive and False Negative Error Rates. Proceedings of the SIAM
International Conference on Data Mining, 2007.
3. Rashmin B, J. Saketha Nath, Krishnan S, Sivaramakrishnan, Chiranjib Bhattacharyya and M N Murty. Focused Crawling with Scalable Ordinal Regression Solvers. Proceedings of the International Conference on Machine Learning, 2007.
4. J. Saketha Nath, Aharon Ben-Tal and Chiranjib Bhattacharyya. Robust Classification for Large Datasets. Submitted to Journal of Machine Learning Research.
Abstract
This thesis explores Chance-Constrained Programming (CCP) in the context of learning. It is shown that chance-constrained approaches lead to improved algorithms for three important learning problems: classification with specified error rates, large-dataset classification, and Ordinal Regression (OR). Using the moments of the training data, the CCPs are posed as Second Order Cone Programs (SOCPs). Novel iterative algorithms for solving the resulting SOCPs are also derived. Borrowing ideas from robust optimization theory, the proposed formulations are made robust to errors in moment estimation.
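The moment-based step rests on a standard result (the multivariate Chebyshev inequality, in the form due to Marshall and Olkin; the exact constants and variables used in the thesis may differ from this sketch). For a linear classifier with weight vector $w$ and bias $b$, enforcing a chance-constraint over every distribution with given mean $\mu$ and covariance $\Sigma$ reduces to a second-order cone constraint:

```latex
\inf_{x \sim (\mu,\Sigma)} \Pr\!\left( w^\top x + b \ge 0 \right) \ge \eta
\quad\Longleftrightarrow\quad
w^\top \mu + b \;\ge\; \kappa \,\bigl\lVert \Sigma^{1/2} w \bigr\rVert_2,
\qquad \kappa = \sqrt{\frac{\eta}{1-\eta}} .
```

The right-hand side is linear in $(\mu, b)$ apart from a Euclidean norm term, which is exactly the form a generic SOCP solver accepts.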
A maximum margin classifier with specified false positive and false negative rates is derived. The key idea is to employ a chance-constraint for each class, ensuring that the actual misclassification rates do not exceed the specified values. The formulation is applied to the case of biased classification.
The problems of large-dataset classification and ordinal regression are addressed by deriving formulations which employ chance-constraints for clusters in the training data rather than constraints for each data point. Since the number of clusters can be substantially smaller than the number of data points, the resulting formulations are small in size and involve few inequalities; hence they scale well to large datasets.
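As a rough illustration of this reduction (a simplified sketch, not the thesis's formulation: random labels stand in for a real clustering step, and the chance-constraints themselves are omitted), the per-cluster first and second moments that would feed the cluster-level constraints can be computed as follows, leaving k summaries in place of n per-point constraints:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: many more points (n) than clusters (k).
n, k = 10_000, 20
X = rng.normal(size=(n, 2))
labels = rng.integers(0, k, size=n)  # stand-in for, e.g., a k-means assignment

# First and second moments per cluster: these k summaries replace
# the n per-point constraints in the scalable formulation.
means = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
covs = np.stack([np.cov(X[labels == j], rowvar=False) for j in range(k)])

print(means.shape, covs.shape)  # k mean vectors, k covariance matrices
```

The optimization problem built from these summaries then has one cone constraint per cluster, so its size is governed by k rather than n.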
The scalable classification and OR formulations are extended to feature spaces, and the kernelized duals turn out to be instances of SOCPs with a single cone constraint. Exploiting this special structure, fast iterative solvers which outperform generic SOCP solvers are proposed. Compared to state-of-the-art learners, the proposed algorithms achieve speed-ups as high as 10,000 times when the specialized SOCP solvers are employed.
The proposed formulations involve second order moments of data and hence are
susceptible to moment estimation errors. A generic way of making the formulations robust to such estimation errors is illustrated. Two novel confidence sets for the moments are derived, and it is shown that when either of the confidence sets is employed, the robust formulations also yield SOCPs.
Contents

Acknowledgements
Publications based on this Thesis
Abstract
Notation and Abbreviations

1 Introduction
  1.1 Main Contributions
  1.2 Thesis Road-Map

2 Classification with Specified Error Rates
  2.1 Past Work
  2.2 New Classification Formulation
  2.3 Dual and its Iterative Solver
    2.3.1 Iterative Algorithm for Solving Dual
  2.4 Non-linear Classifiers
  2.5 Numerical Experiments
  2.6 Summary

3 Scalable Maximum Margin Classification
  3.1 Past Work on Large Dataset Classification
  3.2 Scalable Classification Formulation
  3.3 Dual and Geometrical Interpretation
  3.4 Numerical Experiments
  3.5 Summary

4 Large-Scale Ordinal Regression and Focused Crawling
  4.1 Past Work
    4.1.1 Ordinal Regression
    4.1.2 Focused Crawling
  4.2 Large-Scale OR Formulation
  4.3 Feature Space Extension
  4.4 Focused Crawling as an OR problem
  4.5 Numerical Experiments
    4.5.1 Scalability of New OR Scheme
    4.5.2 Performance of Focused Crawler
  4.6 Summary

5 Fast Solver for Scalable Classification and OR Formulations
  5.1 Large-Scale OR Solver
  5.2 Numerical Experiments
  5.3 Summary

6 Robustness to Moment Estimation Errors
  6.1 Robust Classifiers for Large Datasets
    6.1.1 Separate Confidence Sets for Moments
    6.1.2 Joint Confidence Sets for Moments
  6.2 Numerical Experiments
    6.2.1 Comparison of the Cone Constraints
    6.2.2 Comparison of Original and Robust Formulations
  6.3 Summary

7 Conclusions

A Casting Chance-Constraint as Cone-Constraint

B Fast Solver for Scalable Classification Formulation

C Dual of Large-Scale OR Formulation

Bibliography
List of Tables

2.1 Performance of the novel biased classifiers
2.2 Comparison of error with biased classifiers and SVM
2.3 Comparison of risk with biased classifiers and SVM
3.1 Performance of scalable classifier and SVM-Perf
4.1 Comparison of novel OR scheme and baseline
4.2 Datasets used for focused crawling
4.3 Harvest rate comparison with OR-based and baseline crawlers
5.1 Training times with specialized solver and baseline
5.2 Scaling experiments with specialized solver and baseline
6.1 Comparison of original and robust cone constraints
6.2 Performance of robust constraints with various distributions
6.3 Comparison of original and robust classification schemes
List of Figures
2.1 Geometric interpretation of biased classifier constraints
2.2 Geometric interpretation of biased classifier
2.3 Comparison of biased classifier a