Learning Data Representations with “Partial Supervision”
Ariadna Quattoni
Outline

- Motivation: low-dimensional representations
- Principal Component Analysis
- Structural Learning
- Vision applications
- NLP applications
- Joint sparsity
- Vision applications
Semi-Supervised Learning

Core task: learn a function F : X → Y, where X ⊆ R^d is the “raw” feature space and Y = {-1, +1} is the output space.

Classical setting:
- Labeled dataset (small): T = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}
- Unlabeled dataset (large): U = {x_1, x_2, ..., x_u}

Partial supervision setting:
- Labeled dataset (small): T as above
- Partially labeled dataset (large): U = {(x_1, c_1), (x_2, c_2), ..., (x_u, c_u)}
Semi-Supervised Learning: Classical Setting

1. From the unlabeled dataset, learn a representation G : X → X', a dimensionality reduction from X ⊆ R^d to X' ⊆ R^h with h ≪ d, of the form G(x) = θx with θ ∈ R^{h×d}.
2. From the labeled dataset, train a classifier F : G(X) → Y.
Semi-Supervised Learning: Partial Supervision Setting

1. From the unlabeled dataset plus partial supervision, learn a representation G : X → X', a dimensionality reduction from X ⊆ R^d to X' ⊆ R^h with h ≪ d, of the form G(x) = θx with θ ∈ R^{h×d}.
2. From the labeled dataset, train a classifier F : G(X) → Y.
Why is “learning representations” useful?
Infer the intrinsic dimensionality of the data.
Learn the “relevant” dimensions.
Infer the hidden structure.
Example: Hidden Structure

20 symbols: S = {s_1, s_2, ..., s_20}
4 topics: T_1 = {s_1, ..., s_5}, T_2 = {s_6, ..., s_10}, T_3 = {s_11, ..., s_15}, T_4 = {s_16, ..., s_20}

Generate a data point: choose a topic T, then sample 3 symbols from T. A subset of 3 symbols, e.g. {s_1, s_10, s_11}, is encoded as the vector

x = [x_1 = 1/3, x_2 = 0, ..., x_10 = 1/3, x_11 = 1/3, ..., x_20 = 0]

[Figure: the 20×20 data covariance matrix.]
Example: Hidden Structure

Number of latent dimensions = 4. We want a function that maps each x to the topic that generated it: G(x) = θx, where the projection matrix θ is the 4×20 matrix whose i-th row is the indicator of the five symbols of topic T_i:

θ = [ 1 1 1 1 1 | 0 0 0 0 0 | 0 0 0 0 0 | 0 0 0 0 0 ]
    [ 0 0 0 0 0 | 1 1 1 1 1 | 0 0 0 0 0 | 0 0 0 0 0 ]
    [ 0 0 0 0 0 | 0 0 0 0 0 | 1 1 1 1 1 | 0 0 0 0 0 ]
    [ 0 0 0 0 0 | 0 0 0 0 0 | 0 0 0 0 0 | 1 1 1 1 1 ]

Multiplying a data point by θ sums its mass within each topic, so a point whose three 1/3-weighted symbols came from a single topic T_i is mapped to the latent topic vector with a 1 in coordinate i and 0 elsewhere.
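A toy sketch of this generative process and the topic projection (assuming NumPy; the sizes follow the slides, and the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n_symbols, n_topics, per_topic = 20, 4, 5

# Row t of theta is the indicator of topic t's block of 5 symbols.
theta = np.zeros((n_topics, n_symbols))
for t in range(n_topics):
    theta[t, t * per_topic:(t + 1) * per_topic] = 1.0

def sample_point():
    """Choose a topic, sample 3 of its symbols, weight each by 1/3."""
    t = rng.integers(n_topics)
    idx = rng.choice(np.arange(t * per_topic, (t + 1) * per_topic),
                     size=3, replace=False)
    x = np.zeros(n_symbols)
    x[idx] = 1.0 / 3.0
    return t, x

t, x = sample_point()
z = theta @ x                    # latent representation
assert abs(z[t] - 1.0) < 1e-9    # all mass falls in the generating topic
```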
Outline (next: Principal Component Analysis)
Classical Setting: Principal Component Analysis

Treat the rows of θ as a “basis”: an example generated from topic T_i is reconstructed as x' = Σ_{j=1}^{4} z_j θ_j, where z puts weight 1/3 on the generating topic:

z = [1/3, 0, 0, 0] for T_1
z = [0, 1/3, 0, 0] for T_2
z = [0, 0, 1/3, 0] for T_3
z = [0, 0, 0, 1/3] for T_4

This gives a low reconstruction error ||x - x'||₂² = 2(1/3)², since x' = (1/3)θ_i differs from x only on the two unsampled symbols of the topic.
Minimum Error Formulation

Goal: approximate the high-dimensional x with a low-dimensional x'. Expand each point in an orthonormal basis {u_i} (u_i^T u_j = 0 for i ≠ j), keeping only the first m coordinates free:

x'_n = Σ_{i=1}^{m} z_{ni} u_i + Σ_{i=m+1}^{d} b_i u_i

Error: J = (1/|U|) Σ_n ||x_n - x'_n||²

Solution: at the optimum, J = Σ_{i=m+1}^{d} u_i^T S u_i, where S is the data covariance matrix. Minimizing over the basis gives S u_i = λ_i u_i, i.e. the u_i are eigenvectors of S, and the distortion is J = Σ_{i=m+1}^{d} λ_i.
Principal Component Analysis: 2D Example

[Figure: 2D example showing the projection error onto the principal direction; in the rotated basis the variables are uncorrelated.]

When the covariance is already diagonal, u_i = e_i and λ_i = var(x_i): cut dimensions according to their variance. For dimensionality reduction to help, the variables must be correlated.
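The minimum-error view of PCA can be sketched directly (assuming NumPy; the data and variable names are mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
# Correlated 2D data: the second coordinate is a noisy copy of the first.
a = rng.normal(size=500)
X = np.column_stack([a, a + 0.1 * rng.normal(size=500)])

S = np.cov(X.T, bias=True)          # data covariance matrix S
lam, U = np.linalg.eigh(S)          # solves S u_i = lambda_i u_i (ascending)
lam, U = lam[::-1], U[:, ::-1]      # reorder: largest eigenvalue first

m = 1                               # number of principal directions to keep
mu = X.mean(axis=0)
Z = (X - mu) @ U[:, :m]             # low-dimensional coordinates z_ni
X_rec = Z @ U[:, :m].T + mu         # reconstruction x'_n

# Distortion J = mean squared reconstruction error = sum of discarded eigenvalues.
J = np.mean(np.sum((X - X_rec) ** 2, axis=1))
assert np.isclose(J, lam[m:].sum())
```

The final assertion checks the identity from the previous slide: the distortion equals the sum of the eigenvalues of the discarded directions.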
Outline (next: Structural Learning)
Partial Supervision Setting [Ando & Zhang, JMLR 2005]

Unlabeled dataset + partial supervision → create auxiliary tasks → structure learning → representation G : X → X'.
Partial Supervision Setting

Unlabeled data + partial supervision:
- Images with associated natural-language captions.
- Video sequences with associated speech.
- Documents with keywords.

How could the partial supervision help?
- It is a hint for discovering important features.
- Use the partial supervision to define “auxiliary tasks”.
- Discover feature groupings that are useful for these tasks.

Sometimes “auxiliary tasks” can be defined from unlabeled data alone, e.g. an auxiliary task for word tagging: predicting substructures.
Auxiliary Tasks

- Machine learning papers, keywords: machine learning, dimensionality reduction; linear embedding, spectral methods, distance learning.
- Computer vision papers, keywords: object recognition, shape matching, stereo.

Mask the occurrences of the keywords in the documents.
Auxiliary task: predict “object recognition” from the document content.
Core task: is it a vision or a machine learning article?
Auxiliary Tasks

From the partially labeled set U = {(x_1, c_1), ..., (x_u, c_u)}, build for a keyword k the auxiliary dataset

D = {(x_1, y_1), ..., (x_u, y_u)}, with y_i = +1 if keyword k occurs with x_i, and y_i = -1 otherwise.
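Building these auxiliary labels is mechanical; a sketch (the helper name and the toy data are mine, assuming each data point carries a keyword set):

```python
def auxiliary_dataset(partially_labeled, keyword):
    """Turn U = [(x, keywords), ...] into D = [(x, y), ...] with
    y = +1 if the keyword occurs with x, and y = -1 otherwise."""
    return [(x, 1 if keyword in kws else -1) for x, kws in partially_labeled]

U = [([0.1, 0.9], {"object recognition", "stereo"}),
     ([0.8, 0.2], {"machine learning"})]
D = auxiliary_dataset(U, "object recognition")
# D == [([0.1, 0.9], 1), ([0.8, 0.2], -1)]
```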
Structure Learning

Learning with no prior knowledge: pick the best hypothesis from examples alone,
f̂ = argmin_{f ∈ F} L(f, D_j), with hypothesis class F = {f_w(x) = w · x}.

Learning with prior knowledge: restrict the hypothesis space to F(θ) = {f_v(x) = v^T θ x}, a subset of {f_w(x) = w · x}.

Learning from auxiliary tasks: choose F(θ) = {f_v(x) = v^T θ x} using hypotheses learned for related tasks.
Learning Good Hypothesis Spaces

Class of linear predictors: f(v, x) = v^T θ x, where θ is an h-by-d matrix of structural parameters. Goal: find the problem-specific parameters v_j and the shared θ that minimize the joint loss

Σ_{j=1}^{m} [ L(v_j, θ, D_j) + reg(v_j) ] + reg(θ)

where L(v_j, θ, D_j) is the loss on training set D_j, the v_j are problem-specific parameters, and θ is shared across all tasks.
Algorithm Step 1

Train classifiers for the auxiliary tasks:

w*_j = argmin_w Σ_i l(f(w, x_i), y_i) + (C/2) ||w||²
Algorithm Step 2: PCA on the Classifier Coefficients

Stack the auxiliary classifiers into W = [w*_1, w*_2, ..., w*_m]. Compute θ ∈ R^{h×d} by taking the first h eigenvectors of the covariance matrix W W^t.

θ spans a linear subspace of dimension h: a good low-dimensional approximation to the space of coefficients.
Algorithm Step 3: Training on the Core Task

Project the data: q(x) = θx. Then train

v* = argmin_v Σ_i l(f(v, q(x_i)), y_i) + (C/2) ||v||²

This is equivalent to training the core task in the original d-dimensional space with the parameter constraint w*_core = θ^t v*.
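The three steps can be sketched compactly (assuming NumPy; this is illustrative only, with ridge-regularized least squares standing in for the regularized loss of Step 1, and synthetic auxiliary tasks whose true weights share a low-dimensional subspace):

```python
import numpy as np

rng = np.random.default_rng(2)
d, h, n_aux, n_pts = 30, 4, 20, 200

# Synthetic auxiliary tasks: true weights lie in an h-dimensional subspace.
basis = rng.normal(size=(h, d))
W_true = rng.normal(size=(n_aux, h)) @ basis
X = rng.normal(size=(n_pts, d))
Y = np.sign(X @ W_true.T)

# Step 1: train one (ridge least-squares) classifier per auxiliary task.
C = 1.0
A = X.T @ X + C * np.eye(d)
W = np.linalg.solve(A, X.T @ Y).T      # rows are the learned w*_j

# Step 2: PCA on the coefficients: rows of theta are the top-h right
# singular vectors of W (eigenvectors of the coefficient covariance).
_, _, Vt = np.linalg.svd(W, full_matrices=False)
theta = Vt[:h]                          # h x d shared structure

# Step 3: project the core-task data onto q(x) = theta x and train v there;
# the induced weights in the original space are theta^T v.
Q = X @ theta.T                         # n_pts x h projected features
```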
Example
Object = { letter, letter, letter }
An object: abC
Example
The same object seen in a different font
Abc
Example
The same object seen in a different font
ABc
Example
The same object seen in a different font
abC
Example

6 letters (topics), 5 fonts per letter (symbols): 30 symbols, i.e. 30 binary features. An object such as acE is encoded by the indicator vector of its three symbols:

acE → [1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 1]

20 words (objects) are formed from the letters, e.g. the “ABC” object, the “ADE” object, the “BCF” object, the “ABD” object.

Auxiliary task: recognize each word (object).
PCA on the Data Cannot Recover the Latent Structure

[Figure: the 30×30 covariance matrix of the data.]
PCA on the Coefficients Can Recover the Latent Structure

[Figure: the weight matrix W of the auxiliary tasks; rows are the 30 features (fonts), columns are the 20 auxiliary tasks (e.g. the parameter column for the object “BCD”), and the latent topics (letters) appear as row groups.]
PCA on the Coefficients Can Recover the Latent Structure

[Figure: the 30×30 covariance matrix of W, rows and columns indexed by the features (fonts).]

Each block of correlated variables corresponds to a latent topic.
Outline (next: Vision Applications)
News Domain

Dataset: news images from the Reuters website.
Problem: predicting news topics from images, e.g. figure skating, ice hockey, Golden Globes, Grammys.
Learning visual representations using images with captions
The Italian team celebrate their gold medal win during the flower ceremony after the final round of the men's team pursuit speedskating at Oval Lingotto during the 2006 Winter Olympics.
Former U.S. President Bill Clinton speaks during a joint news conference with Pakistan's Prime Minister Shaukat Aziz at Prime Minister house in Islamabad.
Diana and Marshall Reed leave the funeral of miner David Lewis in Philippi, West Virginia on January 8, 2006. Lewis was one of 12 miners who died in the Sago Mine.
Senior Hamas leader Khaled Meshaal (2nd-R), is surrounded by his bodyguards after a news conference in Cairo February 8, 2006.
Jim Scherr, the US Olympic Committee's chief executive officer seen here in 2004, said his group is watching the growing scandal and keeping informed about the NHL's investigation into Rick Tocchet,
U.S. director Stephen Gaghan and his girlfriend Daniela Unruh arrive on the red carpet for the screening of his film 'Syriana' which runs out of competition at the 56th Berlinale International Film Festival.
Auxiliary task: predict “team” from the image content.
Learning Visual Topics

The word “games” might contain the visual topics: medals, people, pavement. The word “demonstrations” might contain the visual topic: people. Auxiliary tasks share visual topics: different words can share topics, and each topic can be observed under different appearances.
Experimental Results
Outline (next: NLP Applications)
Chunking

- Named entity chunking: “Jane lives in New York and works for Bank of New York.” → PER, LOC, ORG.
- Syntactic chunking: “But economists in Europe failed to predict that …” → NP, VP, PP, SBAR.

Data points: word occurrences. Labels: Begin-PER, Inside-PER, Begin-LOC, ..., Outside.
Example Input Vector Representation

For the fragment “... lives in New York ...”, the word “in” is represented by the indicator features curr-“in” = 1, left-“lives” = 1, right-“New” = 1; the word “New” by curr-“New” = 1, left-“in” = 1, right-“York” = 1.

The input vectors X are high-dimensional, and most entries are 0.
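A sketch of building such sparse indicator vectors (the dict-based encoding and helper name are mine):

```python
def word_features(words, i):
    """Sparse indicator features for the word at position i:
    the current word plus its left and right context words."""
    feats = {f'curr-"{words[i]}"': 1}
    if i > 0:
        feats[f'left-"{words[i-1]}"'] = 1
    if i + 1 < len(words):
        feats[f'right-"{words[i+1]}"'] = 1
    return feats

sent = ["lives", "in", "New", "York"]
print(word_features(sent, 1))
# {'curr-"in"': 1, 'left-"lives"': 1, 'right-"New"': 1}
```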
Algorithmic Procedure

1. Create m auxiliary problems.
2. Assign auxiliary labels to the unlabeled data.
3. Compute θ (the shared structure) by joint empirical risk minimization over all the auxiliary problems.
4. Fix θ, and minimize the empirical risk on the labeled data for the target task.

Predictor: f(x) = w^T x + v^T θ x, where θx supplies the additional features.
Example Auxiliary Problems

Auxiliary problems of the form: Is the current word “New”? Is the current word “day”? Is the current word “IBM”? Is the current word “computer”? ...

Split the features into the current word (Φ1) and the context (Φ2: the left and right words). Predict Φ1 from Φ2, compute the shared θ, and add θΦ2 as new features.
Experiments (CoNLL-03 Named Entity)

- 4 classes: LOC, ORG, PER, MISC.
- Labeled data: news documents; 204K words (English), 206K words (German).
- Unlabeled data: 27M words (English), 35M words (German).
- Features: a slight modification of ZJ03. Words, POS, character types, 4 characters at the beginning/ending in a 5-word window; words in a 3-chunk window; labels assigned to the two words on the left; bi-gram of the current word and left label; labels assigned to previous occurrences of the current word.
- No gazetteer. No hand-crafted resources.
Auxiliary Problems

| # of aux. problems | Auxiliary labels | Features used for learning auxiliary problems |
|---|---|---|
| 1000 | Previous words | All but previous words |
| 1000 | Current words | All but current words |
| 1000 | Next words | All but next words |

3,000 auxiliary problems in total.
Syntactic Chunking Results (CoNLL-00)

| Method | Description | F-measure |
|---|---|---|
| supervised baseline | | 93.60 |
| ASO-semi | + unlabeled data | 94.39 (+0.79%) |
| Co/self oracle | + unlabeled data | 93.66 |
| KM01 | SVM combination | 93.91 |
| CM03 | perceptron in two layers | 93.74 |
| ZDJ02 | Reg. Winnow | 93.57 |
| ZDJ02+ | + full parser (ESG) output | 94.17 |

Exceeds the previous best systems.
Other Experiments

Confirmed effectiveness on:
- POS tagging
- Text categorization (2 standard corpora)
Outline (next: Joint Sparsity)
Notation

Collection of tasks: D = {D_1, D_2, ..., D_m}, where

D_k = {(x_1^k, y_1^k), ..., (x_{n_k}^k, y_{n_k}^k)},  x ∈ R^d,  y ∈ {-1, +1}

Joint sparse approximation: collect the per-task weight vectors as the columns of the d×m matrix

W = [ w_{1,1}  w_{1,2}  ...  w_{1,m} ]
    [ w_{2,1}  w_{2,2}  ...  w_{2,m} ]
    [   ...      ...    ...    ...   ]
    [ w_{d,1}  w_{d,2}  ...  w_{d,m} ]
Single-Task Sparse Approximation

Consider learning a single sparse linear classifier of the form f(x) = w · x. We want a few features with non-zero coefficients. Recent work suggests using L1 regularization:

w* = argmin_w Σ_{(x,y)∈D} l(f(x), y) + Q Σ_{j=1}^{d} |w_j|

The first term is the classification error; the L1 term penalizes non-sparse solutions. Donoho [2004] proved (in a regression setting) that the solution with the smallest L1 norm is also the sparsest solution.
Joint Sparse Approximation

Setting: learn one classifier per task, f_k(x) = w_k · x, coupled through a joint regularizer:

[w*_1, ..., w*_m] = argmin_{w_1,...,w_m} Σ_{k=1}^{m} (1/|D_k|) Σ_{(x,y)∈D_k} l(f_k(x), y) + Q R(w_1, ..., w_m)

The first term is the average loss on the training set of each task k; R penalizes solutions that utilize too many features.
Joint Regularization Penalty

How do we penalize solutions that use too many features? The natural choice is

R(W) = # of non-zero rows of W

where row i of W collects the coefficients of feature i across all classifiers, and column k collects the coefficients of classifier k. But this would lead to a hard combinatorial problem.
Joint Regularization Penalty

We will use the L1-∞ norm [Tropp 2006]:

R(W) = Σ_{i=1}^{d} max_k |W_{ik}|

This norm combines:
- An L1 norm over the per-row maxima (the maximum absolute value of each feature's coefficients across tasks), which promotes sparsity: use few features.
- An L∞ norm on each row, which promotes non-sparsity within a row: share features.

The combination results in a solution where only a few features are used, but the features that are used contribute to solving many classification problems.
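The penalty itself is one line to compute; a sketch (assuming NumPy, with W as the d×m coefficient matrix; the toy matrix is mine):

```python
import numpy as np

def l1_inf(W):
    """L1-inf norm: sum over feature rows of the max |coefficient| across tasks."""
    return np.abs(W).max(axis=1).sum()

# Only 2 of the 4 feature rows are non-zero, and each is shared by 3 tasks.
W = np.array([[0.5, -0.2, 0.4],
              [0.0,  0.0, 0.0],
              [0.3,  0.1, -0.3],
              [0.0,  0.0, 0.0]])
print(l1_inf(W))   # 0.5 + 0.3 = 0.8
```

Note how the zero rows contribute nothing: the penalty only charges for each feature once, however many tasks use it.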
Joint Sparse Approximation

Using the L1-∞ norm, we can rewrite our objective function as:

min_W Σ_{k=1}^{m} (1/|D_k|) Σ_{(x,y)∈D_k} l(f_k(x), y) + Q Σ_{i=1}^{d} max_k |W_{ik}|

For any convex loss this is a convex objective. For the hinge loss, l(f(x), y) = max(0, 1 - y f(x)), the optimization problem can be expressed as a linear program.
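For concreteness, the joint objective with the hinge loss can be evaluated as follows (a sketch assuming NumPy; the task data, names, and shapes are mine):

```python
import numpy as np

def hinge(scores, y):
    """Hinge loss l(f(x), y) = max(0, 1 - y f(x)), elementwise."""
    return np.maximum(0.0, 1.0 - y * scores)

def joint_objective(W, tasks, Q):
    """W: d x m coefficients; tasks: list of (X_k, y_k) pairs; Q: reg. weight."""
    loss = sum(hinge(X_k @ W[:, k], y_k).mean()
               for k, (X_k, y_k) in enumerate(tasks))
    return loss + Q * np.abs(W).max(axis=1).sum()   # + Q * L1-inf penalty

rng = np.random.default_rng(3)
tasks = [(rng.normal(size=(10, 5)), rng.choice([-1, 1], size=10))
         for _ in range(3)]
W = np.zeros((5, 3))
print(joint_objective(W, tasks, Q=0.1))   # at W = 0 every margin is violated: 3.0
```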
Joint Sparse Approximation

Linear program formulation (hinge loss). Objective:

min_{W, ε, t} Σ_{k=1}^{m} (1/|D_k|) Σ_{j=1}^{|D_k|} ε_j^k + Q Σ_{i=1}^{d} t_i

Max-value constraints: for k = 1 : m and i = 1 : d,

-t_i ≤ w_i^k ≤ t_i

Slack-variable constraints: for k = 1 : m and j = 1 : |D_k|,

y_j^k f_k(x_j^k) ≥ 1 - ε_j^k  and  ε_j^k ≥ 0
An Efficient Training Algorithm

The LP formulation can be optimized using standard LP solvers. It is feasible for small problems but becomes intractable for larger datasets with thousands of examples and dimensions. We might also want a more general optimization algorithm that can handle arbitrary convex losses. We developed a simple and efficient global optimization algorithm for training joint models with L1-∞ constraints, with total cost on the order of O(md log(md)).
Outline (next: Vision Applications)
Ten topics: SuperBowl, Danish Cartoons, Sharon, Australian Open, Trapped Miners, Golden Globes, Grammys, Figure Skating, Academy Awards, Iraq.

- Learn a representation using labeled data from 9 topics: learn the matrix W using our transfer algorithm.
- Define the set of relevant features to be R = {r : max_k |w_{rk}| > 0}.
- Train a classifier for the 10th held-out topic using only the relevant features R.
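Selecting the relevant feature set R from a learned W is then a single thresholding step; a sketch (assuming NumPy; the toy W is mine, standing in for the matrix learned on the 9 source topics):

```python
import numpy as np

# W: d x m matrix of coefficients learned on the source topics (toy example).
W = np.array([[0.7,  0.0, -0.1],
              [0.0,  0.0,  0.0],
              [0.2, -0.4,  0.0]])

# R = {r : max_k |w_rk| > 0} -- features with any non-zero coefficient.
R = np.flatnonzero(np.abs(W).max(axis=1) > 0)
print(R)   # [0 2]
```

The held-out topic's classifier would then be trained on the columns of the data indexed by R only.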
Results

[Figure: asymmetric transfer, average AUC vs. number of training samples (4 to 140), comparing the baseline representation with the transferred representation; AUC values range from about 0.52 to 0.72.]
Future Directions

- Joint sparsity regularization to control inference time.
- Learning representations for ranking problems.