2016-04-12 | Seminar | Extreme Classification | 1
Seminar aus Data Mining und Maschinellem Lernen (Seminar on Data Mining and Machine Learning)
Extreme Classification
Seminar aus Data Mining und Maschinellem Lernen
Time and place: Wednesdays, 17:10 - 18:50, Room E202
First presentation: probably in three weeks
Two talks each Wednesday
Seminar aus Data Mining und Maschinellem Lernen
Students are expected to give a 30-minute talk on the material they are assigned, followed by 15 minutes of questions.
The talk and the slides may be in either English or German, but we strongly encourage students to give the talk in English.
Students are expected to participate in the discussions.
Important! The content of the talk should go beyond the scope of the paper and demonstrate a thorough understanding of the material.
Follow the guidelines on the seminar site and at https://www.ke.tu-darmstadt.de/lehre/arbeiten/giving-a-talk-at-a-ke-seminar-1
Extreme Classification
Hot topic in the last one or two years
Roughly: all types of classification problems where the target space (i.e., the set of categories/classes/labels) is large
In practice, often multilabel classification problems: several classes are assigned to an object instead of only one
Basic tutorial at https://www.ke.tu-darmstadt.de/staff/eneldo
Image annotation
The scene dataset consists of 2407 images, each annotated with a subset of 6 labels
Example images annotated with {Fall foliage, Field} and {Beach, Urban}
Matthew R. Boutell, Jiebo Luo, Xipeng Shen, Christopher M. Brown: Learning Multi-Label Scene Classification. In: Pattern Recognition, vol. 37 (9): pp. 1757–1771, 2004.
EUR-Lex repository
- 19328 (freely accessible) documents from the Directory of Community legislation in force of the European Union
- documents available in several European languages
- multiple classifications of the same documents
- most challenging one: the EUROVOC descriptors associated with each document
  - 3965 descriptors, on average 5.37 labels per document
  - descriptors are organized in a hierarchy with up to 7 levels
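The "5.37 labels per document" figure is what is commonly called the label cardinality, the average size of the label sets; dividing it by the number of labels gives the label density. A minimal Python sketch on a hypothetical toy dataset (not the EUR-Lex data); the function names are illustrative only:

```python
from typing import List, Set


def label_cardinality(label_sets: List[Set[str]]) -> float:
    """Average number of labels per example: (1/m) * sum_i |y_i|."""
    return sum(len(y) for y in label_sets) / len(label_sets)


def label_density(label_sets: List[Set[str]], n_labels: int) -> float:
    """Label cardinality divided by the number n of possible labels."""
    return label_cardinality(label_sets) / n_labels


# Hypothetical toy data: 4 documents, label space of n = 5 labels.
toy = [{"l1", "l5"}, {"l2"}, set(), {"l1"}]
print(label_cardinality(toy))  # 1.0
print(label_density(toy, 5))  # 0.2
```

Plugging in the EUR-Lex figures, a cardinality of 5.37 over 3965 descriptors gives a density of about 0.00135, i.e. each document carries only a tiny fraction of the possible labels.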
Formal definition
Given input:
- a set of training objects $x_1, \dots, x_m$, where each $x_i$ is a vector in $\mathbb{R}^a$
- a set of label mappings $y_1, \dots, y_m$, each a subset of $Y = \{\lambda_1, \dots, \lambda_n\}$
Objective: find a function $h: \mathbb{R}^a \to 2^Y$ (the power set of $Y$) which maps each $x_i$ to $y_i$
- as accurately as possible
- as efficiently as possible
i   x1  x2  x3  ...  xa    y
1   A   1   0   ...  0.1   {λ1, λn}
2   B   2   1   ...  0.3   {λ2}
3   C   3   0   ...  0.5   {}
4   D   4   1   ...  0.6   {λ1}
...
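Since each $y_i$ is a subset of $Y$, the number of distinct label sets that $h$ can output grows exponentially with the number of labels $n$:

$$
|2^{Y}| = 2^{n}, \qquad \text{e.g. for the } n = 3965 \text{ EUROVOC descriptors of EUR-Lex: } 2^{3965} \text{ possible label sets.}
$$

This is why treating every label combination as a separate class is not feasible for large label spaces.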
Formal definition
Alternative view:
- a set of training objects $x_1, \dots, x_m$, where each $x_i$ is a vector in $\mathbb{R}^a$
- $n$ binary target variables $y_1, \dots, y_n$ with $y_j \in \{0, 1\}$
Objective: find a function $h: \mathbb{R}^a \to \{0, 1\}^n$ which maps each $x_i$ to a binary vector
- as accurately as possible
- as efficiently as possible
i   x1  x2  x3  ...  xa    y1  y2  ...  yn
1   A   1   0   ...  0.1   1   0   ...  1
2   B   2   1   ...  0.3   0   1   ...  0
3   C   3   0   ...  0.5   0   0   ...  0
4   D   4   1   ...  0.6   1   0   ...  0
...
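The two views are interchangeable: a label set $y_i \subseteq Y$ corresponds to the binary vector whose $j$-th entry is 1 exactly when $\lambda_j \in y_i$. A minimal Python sketch of the conversion in both directions, assuming labels are represented as strings; the helper names and the 4-label space are hypothetical:

```python
from typing import List, Sequence, Set


def to_binary(label_set: Set[str], labels: Sequence[str]) -> List[int]:
    """Set view -> binary view: entry j is 1 iff labels[j] is in the set."""
    return [1 if lam in label_set else 0 for lam in labels]


def to_set(vector: Sequence[int], labels: Sequence[str]) -> Set[str]:
    """Binary view -> set view: keep the labels whose entry is 1."""
    return {lam for lam, bit in zip(labels, vector) if bit == 1}


# Hypothetical label space with n = 4 labels; rows follow the example tables.
labels = ["lambda_1", "lambda_2", "lambda_3", "lambda_4"]
rows = [{"lambda_1", "lambda_4"}, {"lambda_2"}, set(), {"lambda_1"}]
for y in rows:
    v = to_binary(y, labels)
    assert to_set(v, labels) == y  # round trip recovers the original label set
    print(v)
# prints [1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0] on separate lines
```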
2016-04-12 | Seminar | Extreme Classification | 11
Extreme Classification: Topics
Approaches can be roughly classified into:
- Problem Transformation Approaches
  - Binary Relevance & simplifications
- Hashing
- Decision Trees
- Landmark-based label selection
- Label Space Transformations
- Neural Networks and Embeddings
- Topic Models and Generative Approaches
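Binary Relevance decomposes the multilabel problem into $n$ independent binary problems, one per label, and predicts the set of labels whose classifier fires. A minimal Python sketch, assuming a plain perceptron as the (hypothetical) base classifier; any binary learner could be plugged in instead, and the function names are illustrative, not taken from a particular library:

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]  # (weights, bias)


def train_perceptron(xs: List[Vector], ys: List[int], epochs: int = 20) -> Model:
    """Hypothetical base learner: a plain perceptron on targets in {0, 1}."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            target = 1.0 if y == 1 else -1.0
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if target * score <= 0:  # wrong side of (or on) the boundary: update
                w = [wi + target * xi for wi, xi in zip(w, x)]
                b += target
    return w, b


def train_binary_relevance(
    xs: List[Vector], label_sets: List[Set[str]], labels: Sequence[str]
) -> Dict[str, Model]:
    """Binary Relevance: one independent binary problem per label."""
    return {
        lam: train_perceptron(xs, [1 if lam in y else 0 for y in label_sets])
        for lam in labels
    }


def predict_binary_relevance(models: Dict[str, Model], x: Vector) -> Set[str]:
    """Predicted label set: every label whose classifier scores positive."""
    return {
        lam
        for lam, (w, b) in models.items()
        if sum(wi * xi for wi, xi in zip(w, x)) + b > 0
    }


# Toy data: label "a" iff the first feature is 1, label "b" iff the second is.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
label_sets = [{"a"}, {"b"}, {"a", "b"}, set()]
models = train_binary_relevance(xs, label_sets, ["a", "b"])
print(sorted(predict_binary_relevance(models, [1.0, 1.0])))  # ['a', 'b']
```

Training cost grows linearly with $n$, since one classifier is trained (and evaluated) per label, which is why plain Binary Relevance becomes expensive for label spaces with thousands or millions of labels.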
Extreme Classification: Intro and Basics
1 Arturo Montejo Ráez, Luís Alfonso Ureña López, Ralf Steinberger. Adaptive Selection of Base Classifiers in One-Against-All Learning for Large Multi-labeled Collections.
  I. Katakis, G. Tsoumakas, and I. Vlahavas. Multilabel text classification for automated tag suggestion.
2 G. Tsoumakas, I. Katakis, and I. Vlahavas. Effective and efficient multilabel classification in domains with large number of labels.
  S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks.
Extreme Classification: Intro and Basics
3 Qinfeng Shi, James Petterson, Gideon Dror, John Langford, Alex Smola, Alex Strehl, Vishy Vishwanathan. Hash Kernels. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, 2009.
4 Kilian Weinberger, Anirban Dasgupta, Josh Attenberg, John Langford, Alex Smola. Feature Hashing for Large Scale Multitask Learning. ICML, 2009.
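References 3 and 4 rely on the hashing trick: each feature is mapped by a hash function into one of a fixed number of buckets, and a second hash chooses a ±1 sign, which in Weinberger et al. makes the hashed inner product an unbiased estimate of the original one. A minimal Python sketch, assuming string-named sparse features and `zlib.crc32` with two hypothetical salts as the hash functions (the papers do not prescribe this particular hash):

```python
import zlib
from typing import Dict, List


def _hash(token: str, salt: str) -> int:
    """Deterministic hash; Python's built-in hash() of str is randomized per process."""
    return zlib.crc32((salt + ":" + token).encode("utf-8"))


def hash_features(features: Dict[str, float], dim: int) -> List[float]:
    """Signed hashing trick: map named features into a vector of fixed length dim."""
    vec = [0.0] * dim
    for name, value in features.items():
        bucket = _hash(name, "bucket") % dim  # bucket index h(f)
        sign = 1.0 if _hash(name, "sign") % 2 == 0 else -1.0  # sign xi(f) in {-1, +1}
        vec[bucket] += sign * value  # colliding features are summed with their signs
    return vec


# Hypothetical bag-of-words document with three features.
doc = {"extreme": 1.0, "classification": 2.0, "labels": 1.0}
v = hash_features(doc, dim=8)
print(len(v))  # 8
```

Memory is fixed at `dim` entries no matter how many distinct features occur, at the price of occasional collisions.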
Extreme Classification: Intro and Basics
5 D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-Label Prediction via Compressed Sensing. NIPS, 2009.
6 F. Tai and H. Lin. Multi-label Classification with Principal Label Space Transformation. Neural Computation, 2012.
Extreme Classification: Basics/Decision Trees
7 S. Ji, L. Tang, S. Yu, and J. Ye. Extracting Shared Subspaces for Multi-label Classification. KDD, 2008.
8 C. Vens, J. Struyf, L. Schietgat, S. Džeroski, H. Blockeel. Decision trees for hierarchical multi-label classification. Machine Learning, 2008.
9 R. Agrawal, A. Gupta, Y. Prabhu, and M. Varma. Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages. WWW, 2013.
10 Y. Prabhu and M. Varma. FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning. KDD, 2014.
Extreme Classification: Label Space Transformations
11 Wei Bi, James Tin-Yau Kwok. Multi-Label Classification on Tree- and DAG-Structured Hierarchies. Proceedings of the 28th International Conference on Machine Learning, 2011.
12 Y. Chen and H. Lin. Feature-aware Label Space Dimension Reduction for Multi-label Classification. NIPS, 2012.
13 H. Yu, P. Jain, P. Kar, and I. Dhillon. Large-scale Multi-label Learning with Missing Labels. ICML, 2014.
14 Z. Lin, G. Ding, M. Hu, and J. Wang. Multi-label Classification via Feature-aware Implicit Label Space Encoding. ICML, 2014.
15 M. Cisse, N. Usunier, T. Artieres, and P. Gallinari. Robust Bloom Filters for Large Multilabel Classification Tasks. NIPS, 2013.
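Reference 15 builds on Bloom filters to compress the label space: roughly, each label is mapped to $k$ of $m$ bit positions (with $m$ much smaller than $n$), a label set is encoded as the OR of its labels' bit patterns, and binary classifiers then predict the $m$ bits instead of the $n$ labels. A minimal Python sketch of the plain Bloom-filter encoding and decoding only, assuming salted CRC32 hashes; the robust decoding that gives the paper its name is not reproduced here, and all names are hypothetical:

```python
import zlib
from typing import Iterable, List, Sequence, Set


def label_bits(label: str, m: int, k: int) -> List[int]:
    """The k bit positions (out of m) assigned to a label, from k salted CRC32 hashes."""
    return [zlib.crc32(f"{i}:{label}".encode("utf-8")) % m for i in range(k)]


def encode(label_set: Iterable[str], m: int, k: int) -> List[int]:
    """Bloom-filter code of a label set: OR of the bit patterns of its labels."""
    code = [0] * m
    for lam in label_set:
        for pos in label_bits(lam, m, k):
            code[pos] = 1
    return code


def decode(code: Sequence[int], labels: Sequence[str], k: int) -> Set[str]:
    """All labels whose k bits are set; on an error-free code this never misses
    a true label, but it may return false positives."""
    m = len(code)
    return {lam for lam in labels if all(code[p] for p in label_bits(lam, m, k))}


# Hypothetical label space of n = 1000 labels compressed to m = 64 bits.
labels = [f"label{j}" for j in range(1000)]
y = {"label3", "label42"}
code = encode(y, m=64, k=2)
predicted = decode(code, labels, k=2)
print(y <= predicted)  # True
```

A larger $m$ lowers the false-positive rate of decoding, at the cost of more bits to predict.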
Extreme Classification: Neural Networks and Embeddings
16 J. Weston, S. Bengio, and N. Usunier. WSABIE: Scaling Up To Large Vocabulary Image Annotation. IJCAI, 2011.
17 Jinseok Nam, Eneldo Loza Mencía, and Johannes Fürnkranz. All-in Text: Learning Document, Label, and Word Representations Jointly. Proceedings of the 30th AAAI Conference on Artificial Intelligence, 2016.
18 K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain. Sparse Local Embeddings for Extreme Multi-label Classification. NIPS, 2015.
19 P. Mineiro and N. Karampatziakis. Fast Label Embeddings via Randomized Linear Algebra. Preprint, 2015.
20 N. Karampatziakis and P. Mineiro. Scalable Multilabel Prediction via Randomized Methods. Preprint, 2015.
Extreme Classification: Landmark-based label selection
21 K. Balasubramanian and G. Lebanon. The Landmark Selection Method for Multiple Output Prediction. ICML, 2012.
22 W. Bi and J. Kwok. Efficient Multi-label Classification with Many Labels. ICML, 2013.
Extreme Classification: BR/Topic Models
23 B. Hariharan, S. Vishwanathan, and M. Varma. Efficient max-margin multi-label classification with applications to zero-shot learning. Machine Learning Journal, 2012.
24 Timothy N. Rubin, America Chambers, Padhraic Smyth, Mark Steyvers. Statistical topic models for multi-label document classification. Machine Learning, 2011.
25 Piyush Rai, Changwei Hu, Ricardo Henao, Lawrence Carin. Large-Scale Bayesian Multi-Label Learning via Topic-Based Label Embeddings. NIPS, 2015.