1 unsupervised and transfer learning challenge can machines transfer knowledge from task to task?...

Unsupervised and Transfer Learning Challenge http://clopinet.com/ul

1

Can Machines Transfer Knowledge from Task to Task?

Isabelle Guyon

Clopinet, California

http://clopinet.com/ul


2

Web platform: Server made available by Prof. Joachim Buhmann, ETH Zurich, Switzerland. Computer admin.: Thomas Fuchs, ETH Zurich. Webmaster: Olivier Guyon, MisterP.net, France. Platform: Causality Workbench.

Co-organizers:
• David W. Aha, Naval Research Laboratory, USA.
• Gideon Dror, Academic College of Tel-Aviv Yaffo, Israel.
• Vincent Lemaire, Orange Research Labs, France.
• Graham Taylor, NYU, New York, USA.
• Gavin Cawley, University of East Anglia, UK.
• Danny Silver, Acadia University, Canada.
• Vassilis Athitsos, UT Arlington, Texas, USA.

Protocol review and advising:
• Olivier Chapelle, Yahoo!, California, USA.
• Gerard Rinkus, Brandeis University, USA.
• Urs Mueller, Net-Scale Technologies, USA.
• Yoshua Bengio, Universite de Montreal, Canada.
• David Grangier, NEC Labs, USA.
• Andrew Ng, Stanford Univ., Palo Alto, California, USA.
• Yann LeCun, NYU, New York, USA.
• Richard Bowden, University of Surrey, UK.
• Philippe Dreuw, Aachen University, Germany.
• Ivan Laptev, INRIA, France.
• Jitendra Malik, UC Berkeley, USA.
• Greg Mori, Simon Fraser University, Canada.
• Christian Vogler, ILSP, Athens, Greece.

Data donors:

Handwriting recognition (AVICENNA) -- Reza Farrahi Moghaddam, Mathias Adankon, Kostyantyn Filonenko, Robert Wisnovsky, and Mohamed Chériet (Ecole de technologie supérieure de Montréal, Quebec) contributed the dataset of Arabic manuscripts. The toy example (ULE) is the MNIST handwritten digit database made available by Yann LeCun and Corinna Cortes.

Object recognition (RITA) -- Antonio Torralba, Rob Fergus, and William T. Freeman, collected and made available publicly the 80 million tiny image dataset. Vinod Nair and Geoffrey Hinton collected and made available publicly the CIFAR datasets. See the techreport Learning Multiple Layers of Features from Tiny Images, by Alex Krizhevsky, 2009, for details.

Human action recognition (HARRY) -- Ivan Laptev and Barbara Caputo collected and made publicly available the KTH human action recognition datasets. Marcin Marszałek, Ivan Laptev and Cordelia Schmid collected and made publicly available the Hollywood 2 dataset of human actions and scenes.

Text processing (TERRY) -- David Lewis formatted and made publicly available the RCV1-v2 Text Categorization Test Collection.

Ecology (SYLVESTER) -- Jock A. Blackard, Denis J. Dean, and Charles W. Anderson of the US Forest Service, USA, collected and made available the (Forest cover type) dataset.

CREDITS


3

What is the problem?


4

Can learning about...


5

help us learn about…


6

Can learning about…

publicly available data


7

help us learn about…

[Photos tagged with people's names: Philip, Thomas, Anna, Solene, GM, Omar, Martin, Bernhard]

personal data


8

Transfer learning

[Photos tagged with people's names: Philip, Thomas, Anna, Solene, GM, Omar, Martin, Bernhard]

Common data representation


9

How?


10

Vocabulary

Target task labels

Source task labels


11

Vocabulary

Target task labels

Source task labels


12

Vocabulary

Target task labels

Source task labels

Domains the same?

Labels available?

Tasks the same?


13

Taxonomy of transfer learning

Adapted from: A survey on transfer learning, Pan-Yang, 2010.

Transfer Learning

• Unsupervised TL: no labels in both source and target domains

• Semi-supervised TL: labels available ONLY in source domain
– Transductive TL: same source and target task
– Cross-task TL: different source and target tasks

• Inductive TL: labels available in target domain
– Self-taught TL: no labels in source domain
– Multi-task TL: labels available in source domain


14



[Taxonomy diagram repeated: Transfer Learning: Unsupervised TL; Semi-supervised TL (Transductive TL, Cross-task TL); Inductive TL (Self-taught TL, Multi-task TL)]


15

Unsupervised transfer learning


16

What can you do with NO labels?

• No learning at all:
– Normalization of examples or features
– Construction of features (e.g. products)
– Generic data transformations (e.g. taking the log, Fourier transform, smoothing, etc.)

• Unsupervised learning:
– Manifold learning to reduce dimension (and/or orthogonalize features)
– Sparse coding to expand dimension
– Clustering to construct features
– Generative models and latent variable models
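The "no learning at all" options can be sketched in a few lines of NumPy. This is only an illustration on made-up data: the function names (`standardize`, `log_transform`, `product_features`) and the toy matrix `X` are assumptions, not part of the challenge kit.

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Center each feature and scale it to unit variance (no labels needed)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / (sigma + eps)

def log_transform(X):
    """Generic transformation: log(1 + x) for non-negative, heavy-tailed features."""
    return np.log1p(X)

def product_features(X):
    """Feature construction: append all pairwise products x_i * x_j with i < j."""
    i, j = np.triu_indices(X.shape[1], k=1)
    return np.hstack([X, X[:, i] * X[:, j]])

# Toy non-negative data (hypothetical): 100 examples, 4 features.
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=3.0, size=(100, 4))

# Chain the three label-free steps: 4 original + 6 product features.
Z = standardize(product_features(log_transform(X)))
```

None of these steps looks at a label, so the same pipeline can be applied unchanged to source and target data.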


17


[Diagram, step 1: source domain → P → R]


18


P

1)


19


[Diagram, step 1: P]

[Diagram, step 2: target domain → P → C, with task labels (e.g. John)]


20


[Diagram: target domain → P → C → Emily]


21

Manifold learning

• PCA

• ICA

• Kernel PCA

• Kohonen maps

• Auto-encoders

• MDS, Isomap, LLE, Laplacian Eigenmaps

• Regularized principal manifolds
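As one concrete instance from the list above, PCA can be fitted on unlabeled source data and then applied to target data. The sketch below uses the SVD of the centered data; the names `pca_fit` and `pca_transform` and the random toy matrices are assumptions made for illustration.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the data mean and the top principal directions (rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal directions sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project data onto the learned directions."""
    return (X - mean) @ components.T

# Toy data (hypothetical): correlated source features, a few target examples.
rng = np.random.default_rng(0)
X_source = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))
X_target = rng.normal(size=(50, 10))

mean, W = pca_fit(X_source, n_components=3)
Z_target = pca_transform(X_target, mean, W)  # 3-dimensional representation
```

The projection both reduces dimension and orthogonalizes the features on the data it was fitted on.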


22

Deep Learning

Greedy layer-wise unsupervised pre-training of multi-layer neural networks and Bayesian networks, including:

• Deep Belief Networks (stacks of Restricted Boltzmann machines)

• Stacks of auto-encoders

[Diagram labels: preprocessor, reconstructor]
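A minimal sketch of greedy layer-wise pre-training with a stack of auto-encoders follows. It assumes sigmoid hidden units, a linear reconstructor, squared reconstruction error, and plain full-batch gradient descent; these choices, the function names, and the toy data are illustrative assumptions, not the method of any particular challenge entrant.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=200, seed=0):
    """Train one auto-encoder; return its encoder (preprocessor) and the loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = 0.1 * rng.normal(size=(d, n_hidden)); b1 = np.zeros(n_hidden)  # encoder
    W2 = 0.1 * rng.normal(size=(n_hidden, d)); b2 = np.zeros(d)         # reconstructor
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)           # hidden code
        E = H @ W2 + b2 - X                # reconstruction residual
        losses.append(0.5 * np.sum(E ** 2) / n)
        E = E / n
        dA = (E @ W2.T) * H * (1.0 - H)    # back-propagate through the sigmoid
        W2 -= lr * (H.T @ E); b2 -= lr * E.sum(axis=0)
        W1 -= lr * (X.T @ dA); b1 -= lr * dA.sum(axis=0)
    return (W1, b1), losses

def pretrain_stack(X, layer_sizes):
    """Greedy layer-wise training: each layer encodes the previous layer's codes."""
    encoders, H = [], X
    for k, size in enumerate(layer_sizes):
        (W, b), _ = train_autoencoder(H, size, seed=k)
        encoders.append((W, b))
        H = sigmoid(H @ W + b)
    return encoders

def preprocess(X, encoders):
    """Apply the stacked encoders; the reconstructors are discarded."""
    for W, b in encoders:
        X = sigmoid(X @ W + b)
    return X

# Toy unlabeled data (hypothetical): 200 examples with 8 features in [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 8))
_, losses = train_autoencoder(X, n_hidden=4)
encoders = pretrain_stack(X, layer_sizes=[6, 3])
Z = preprocess(X, encoders)
```

Only the encoder half of each layer is kept as the preprocessor; the reconstructor exists solely to provide a training signal.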


23

Clustering

• K-means and variants with cluster overlap (Gaussian mixtures, fuzzy C-means)

• Hierarchical clustering

• Graph partitioning

• Spectral clustering


24

Example: K-means

• Start with random cluster centers.

• Iterate:
– Assign the examples to their closest center to form clusters.
– Re-compute the centers by averaging the cluster members.

• Create features, e.g. $f_k = \exp(-\|x - x_k\|)$, where $x_k$ is the center of cluster $k$.

[Figure: clusters of ULE validation data after 5 iterations]
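The K-means recipe on this slide can be sketched as below. Two details are assumptions of this sketch: the random initial centers are drawn from the data points themselves, and an empty cluster keeps its previous center. The function names and toy data are hypothetical.

```python
import numpy as np

def pairwise_distances(X, centers):
    """Euclidean distance from every example to every center."""
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

def kmeans(X, k, n_iter=5, seed=0):
    rng = np.random.default_rng(seed)
    # Start with random cluster centers (here: k distinct data points).
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign the examples to their closest center to form clusters.
        labels = pairwise_distances(X, centers).argmin(axis=1)
        # Re-compute the centers by averaging the cluster members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers

def kmeans_features(X, centers):
    """One feature per cluster: f_k = exp(-||x - x_k||)."""
    return np.exp(-pairwise_distances(X, centers))

# Toy unlabeled data (hypothetical): 200 examples, 5 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
centers = kmeans(X, k=4)
F = kmeans_features(X, centers)  # new 4-dimensional representation
```

Each new feature is close to 1 when the example sits near the corresponding center and decays toward 0 with distance.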


25

Results on ULE: do better!

Current best: AUC=1, ALC=0.96

[Two learning-curve plots: AUC vs. log2(number of training examples)]

Raw data (784 features): ALC=0.79

K-means (20 features): ALC=0.84
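The plots report AUC as a function of log2 of the number of training examples and summarize each curve by its area (ALC). A simplified sketch of both quantities follows; it assumes binary 0/1 labels, a trapezoidal area, and normalization by the width of the log2 axis so that a curve with AUC=1 everywhere scores 1. The challenge's exact scoring and normalization may differ, and the numbers below are made up.

```python
import numpy as np

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def alc(n_train, auc_values):
    """Area under the AUC learning curve on a log2 axis, divided by the axis width."""
    x = np.log2(np.asarray(n_train, dtype=float))
    y = np.asarray(auc_values, dtype=float)
    area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))  # trapezoid rule
    return area / (x[-1] - x[0])

# Hypothetical classifier scores and a hypothetical learning curve.
auc_demo = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
alc_demo = alc([1, 2, 4, 8], [0.5, 0.75, 1.0, 1.0])
```

Because the horizontal axis is logarithmic, performance with very few training examples weighs as much as performance with many, which rewards representations that make learning fast.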


26

Unsupervised learning (resources)

• Unsupervised Learning. Z. Ghahramani. http://www.gatsby.ucl.ac.uk/~zoubin/course04/ul.pdf

• Nonlinear dimensionality reduction. http://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction

• Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering. Y. Bengio et al. http://books.nips.cc/papers/files/nips16/NIPS2003_AA23.pdf

• Data Clustering: A Review. Jain et al. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.2720

• Why Does Unsupervised Pre-training Help DL? D. Erhan et al. http://jmlr.csail.mit.edu/papers/volume11/erhan10a/erhan10a.pdf

• Efficient sparse coding algorithms. H. Lee et al. http://www.eecs.umich.edu/~honglak/nips06-sparsecoding.pdf


27



[Taxonomy diagram repeated: Transfer Learning: Unsupervised TL; Semi-supervised TL (Transductive TL, Cross-task TL); Inductive TL (Self-taught TL, Multi-task TL)]


28

Cross-task transfer learning


29

How can you do it?

• Data representation learning:
– Deep neural networks
– Deep belief networks
(re-use the internal representation created by the hidden units and/or output units)

• Similarity or kernel learning:
– Siamese neural networks
– Graph-theoretic methods
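The first option, re-using the internal representation of a network trained on a source task, can be sketched as follows. The one-hidden-layer network, the logistic-regression target classifier, the toy tasks, and all function names are assumptions chosen to keep the example short; no claim is made about the accuracy reached on this toy problem.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_source_net(X, y, n_hidden=8, lr=0.5, epochs=300, seed=0):
    """Step 1: train a small network on the labeled source task; keep its hidden layer."""
    rng = np.random.default_rng(seed)
    W1 = 0.5 * rng.normal(size=(X.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
    w2 = 0.5 * rng.normal(size=n_hidden); b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        g = (sigmoid(H @ w2 + b2) - y) / len(y)  # gradient of mean cross-entropy w.r.t. logits
        dH = np.outer(g, w2) * (1.0 - H ** 2)    # back-propagate through tanh
        w2 -= lr * (H.T @ g); b2 -= lr * g.sum()
        W1 -= lr * (X.T @ dH); b1 -= lr * dH.sum(axis=0)
    return W1, b1

def represent(X, W1, b1):
    """The transferred preprocessor: hidden-unit activations."""
    return np.tanh(X @ W1 + b1)

def train_logistic(Z, y, lr=0.5, epochs=300):
    """Step 2: train a linear classifier on the few labeled target examples."""
    w = np.zeros(Z.shape[1]); b = 0.0
    for _ in range(epochs):
        g = (sigmoid(Z @ w + b) - y) / len(y)
        w -= lr * (Z.T @ g); b -= lr * g.sum()
    return w, b

# Toy tasks (hypothetical): many source labels, only 20 target labels.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(400, 2)); y_src = (X_src[:, 0] * X_src[:, 1] > 0).astype(float)
X_tgt = rng.normal(size=(20, 2));  y_tgt = (X_tgt[:, 0] * X_tgt[:, 1] < 0).astype(float)

W1, b1 = train_source_net(X_src, y_src)
Z_tgt = represent(X_tgt, W1, b1)
w, b = train_logistic(Z_tgt, y_tgt)

X_new = rng.normal(size=(5, 2))
p_new = sigmoid(represent(X_new, W1, b1) @ w + b)  # predictions for unseen target examples
```

The hidden layer is frozen after step 1, so the target classifier has only a handful of parameters to fit from its few labels.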


30

Data representation learning

[Diagram, step 1: source domain → P → C, with source task labels (e.g. Sea)]


31


P

1)


32


[Diagram, step 1: P]

[Diagram, step 2: target domain → P → C, with target task labels (e.g. John)]


33

[Diagram: target domain → P → C → Emily]



34

Kernel learning

[Diagram, step 1: pairs of source domain examples → P, P → S, with source task labels: same or different]


35

Kernel learning

P

1)


36

Kernel learning

[Diagram, step 1: P]

[Diagram, step 2: target domain → P → C, with target task labels (e.g. John)]


37

Kernel learning

[Diagram: target domain → P → C → Emily]


38

Cool results in cross-task transfer learning

NLP (almost) from scratch. Collobert et al. 2011, submitted to JMLR

Source task: genuine or not

Target tasks:
pos = Part-Of-Speech tagging
chunk = Chunking
ner = Named Entity Recognition
srl = Semantic Role Labeling


39

Cross-task transfer (resources)

• A Survey on Transfer Learning. Pan and Yang. http://www1.i2r.a-star.edu.sg/~jspan/publications/TLsurvey_0822.pdf

• Distance metric learning: A comprehensive survey. Yang-Jin. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.91.4732

• Signature Verification using a "Siamese" Time Delay Neural Network. Bromley et al. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.4792

• Learning the kernel matrix with semi-definite programming, Lanckriet et al. http://jmlr.csail.mit.edu/papers/volume5/lanckriet04a/lanckriet04a.pdf

• NLP (almost) from scratch. Collobert et al. 2011, http://leon.bottou.org/morefiles/nlp.pdf.


40



[Taxonomy diagram repeated: Transfer Learning: Unsupervised TL; Semi-supervised TL (Transductive TL, Cross-task TL); Inductive TL (Self-taught TL, Multi-task TL)]


41

Multi-task learning


42

Multi-task learning

[Diagram: source domain (source task labels, e.g. Sea) and target domain (target task labels, e.g. John) → P → C]


43

Multi-task learning

[Diagram: target domain → P → C → Emily]


44

Cool results in multi-task learning

One-Shot Learning with a Hierarchical Nonparametric Bayesian Model, Salakhutdinov-Tenenbaum-Torralba, 2010


45



[Taxonomy diagram repeated: Transfer Learning: Unsupervised TL; Semi-supervised TL (Transductive TL, Cross-task TL); Inductive TL (Self-taught TL, Multi-task TL)]


46

Self-taught learning


47


[Diagram: source domain and target domain (target task labels, e.g. John) → P → C]


48


[Diagram: target domain → P → C → Emily]


49

Cool results in self-taught learning

Source task Target task

Unsupervised

Semi-supervised

Multi-task

Self-taught

Self-taught learning. R. Raina et al. 2007


50

Inductive transfer learning (resources)

• Multitask learning. R. Caruana. http://www.cs.cornell.edu/~caruana/mlj97.pdf

• Learning deep architectures for AI. Y. Bengio. http://www.iro.umontreal.ca/~lisa/pointeurs/TR1312.pdf

• Transfer Learning Techniques for Deep Neural Nets. S. M. Gutstein thesis. http://robust.cs.utep.edu/~gutstein/sg_home_files/thesis.pdf

• One-Shot Learning with a Hierarchical Nonparametric Bayesian Model. R. Salakhutdinov et al. http://dspace.mit.edu/bitstream/handle/1721.1/60025/MIT-CSAIL-TR-2010-052.pdf?sequence=1

• Self-taught learning. R. Raina et al. http://www.stanford.edu/~rajatr/papers/icml07_SelfTaughtLearning.pdf


51

Dec 2010-April 2011

http://clopinet.com/ul

• Goal: Learning data representations or kernels.
• Phase 1: Unsupervised learning (until Feb. 28)
• Phase 2: Cross-task transfer learning (from Mar. 1)
• Prizes: $6000 + free registrations + travel awards
• Dissemination: Workshops at ICML and IJCNN; proceedings in JMLR W&CP.

[Diagram labels: Evaluators; Challenge target task labels; Challenge data; Validation data; Development data; Validation target task labels; Source task labels; Competitors; Data representations]


52

July 2011, ICML - Dec 2011, NIPS

http://clopinet.com/tl

Multi-task learning setting:

- Synthetic, Real-world
- Supervised learning
- Binary classification problems
- 5-10 secondary tasks, 1 primary
- Impoverished primary task data in development set
- Diversity of tasks with varying degree of relatedness to primary task

[Diagram labels: Target task challenge labels; Challenge data (target only); Validation data (target only); Development data (source + target data); Target task validation labels; All task labels; Competitors; Predictions]


53

STEP 1: Develop a “generic” sign language recognition system that can learn new signs with a few examples.

STEP 2: At conference: teach the system new signs.

STEP 3: Live evaluation in front of audience.

June 2011-June 2012

http://clopinet.com/gs (in preparation)

Challenge


54

Conclusion

• Transfer learning algorithms offer solutions to problems in which
– a lot of training samples are available for a source task,
– fewer training samples are available for a similar but different target task.

• We started a program of challenges featuring problems in which transfer learning is applicable.
