Latent Space Domain Transfer between High Dimensional Overlapping Distributions


Page 1: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Sihong Xie†, Wei Fan‡, Jing Peng*, Olivier Verscheure‡, Jiangtao Ren†
†Sun Yat-Sen University   ‡IBM T. J. Watson Research Center   *Montclair State University

Main Challenges:
1. Transfer learning
2. High dimensional data (more than 4,000 features)
3. Overlapping feature sets (less than 80% of the features are the same)
4. A solution with performance bounds

Page 2: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Standard Supervised Learning

[Diagram: training (labeled) and test (unlabeled) articles both come from the New York Times; the classifier reaches 85.5% accuracy.]

Page 3: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

In Reality…

[Diagram: labeled New York Times data is not available, so the classifier is trained on labeled Reuters articles and tested on unlabeled New York Times articles; accuracy drops to 64.1%.]

Page 4: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Domain Difference → Performance Drop

- Ideal setting: train on the New York Times, test on the New York Times; classifier accuracy 85.5%.
- Realistic setting: train on Reuters, test on the New York Times; classifier accuracy 64.1%.

Page 5: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

High Dimensional Data Transfer

High dimensional data: text categorization, image classification. The number of features in our experiments is more than 4,000.

Challenges:
- High dimensionality: there are more features than training examples, and Euclidean distances between points become similar.
- Are the feature sets completely overlapping? No: in some tasks less than 80% of the features are the same.
- The domains are not so closely related marginally, which makes it harder to find transferable structures and calls for a proper similarity definition.

Page 6: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Transfer between high dimensional overlapping distributions

• Overlapping Distribution

Data from the two domains may not lie in exactly the same feature space, but at most in an overlapping one ("?" marks a feature that is missing for that example's domain):

      x     y    z     label
  A   ?     1    0.2    +1
  B   0.09  ?    0.1    +1
  C   0.01  ?    0.3    -1

Page 7: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Problems with the Overlapping Distribution

Using only the overlapping features may lack predictive information.

Transfer between high dimensional overlapping distributions:

      f1    f2   f3    label
  A   ?     1    0.2    +1
  B   0.09  ?    0.1    +1
  C   0.01  ?    0.3    -1

With only the overlapping feature available for all three examples, it is hard to predict correctly.

Page 8: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Overlapping Distribution

Use the union of all features and fill in the missing values with zeros?

Transfer between high dimensional overlapping distributions:

      f1    f2   f3    label
  A   0     1    0.2    +1
  B   0.09  0    0.1    +1
  C   0.01  0    0.3    -1

Does it help? No:

  D²(A, B) = 0.0181  >  D²(A, C) = 0.0101

so A is misclassified as belonging to the same class as C instead of B.
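A minimal sketch that reproduces this effect with plain squared Euclidean distance on the zero-filled rows above (the absolute values differ from the slide's by the shared f2 term, but the ordering, which is what matters, is the same):

```python
import numpy as np

# Zero-filled feature vectors (f1, f2, f3) from the table above.
A = np.array([0.00, 1.0, 0.2])   # label +1, f1 was missing
B = np.array([0.09, 0.0, 0.1])   # label +1, f2 was missing
C = np.array([0.01, 0.0, 0.3])   # label -1, f2 was missing

d2_AB = np.sum((A - B) ** 2)     # ~1.0181
d2_AC = np.sum((A - C) ** 2)     # ~1.0101

# The zero-filled feature f2 adds the same large term to both distances,
# and the remaining terms leave A nearer to C than to B, so a nearest
# neighbor rule assigns A the wrong label (-1).
print(d2_AB > d2_AC)             # True: A looks closer to C
```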

Page 9: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Transfer between high dimensional overlapping distributions

When one uses the union of the overlapping and non-overlapping features and leaves the missing values as zero, the distance between the two domains' marginal distributions p(x) can become asymptotically very large as the number of non-overlapping features grows: the non-overlapping features become the dominant factor in the similarity measure.

Page 10: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

High dimensionality can obscure important features.

Transfer between high dimensional overlapping distributions:

[Scatter plot: in the original high dimensional space, the "blue" points appear closer to the "green" points than to the "red" points.]

Page 11: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

LatentMap: a two-step correction

1. Missing value regression
   - brings the marginal distributions closer.
2. Latent space dimensionality reduction
   - further brings the marginal distributions closer;
   - ignores unimportant, noisy, and "error imported" features;
   - identifies transferable substructures across the two domains.

Page 12: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Missing Value Regression

Filling up the missing values (recall the previous example):
1. Project onto the overlapping feature z.
2. Map from z back to x, using a relationship found by a regression model.

With img(A') the image of A under this mapping,

  D(img(A'), B) = 0.0109  <  D(img(A'), C) = 0.0125

so A is correctly classified as belonging to the same class as B.
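A minimal sketch of this regression step, assuming (consistent with the SVR used in the paper's analysis) a support vector regressor trained on the rows where a non-overlapping feature is observed and used to predict it from the overlapping features; the function and variable names are illustrative, not taken from the authors' code.

```python
import numpy as np
from sklearn.svm import SVR

def fill_missing_by_regression(X, overlap_idx, target_idx):
    """Impute one non-overlapping feature (column target_idx) of the
    row-wise data matrix X (np.nan marks unobserved entries), using only
    the overlapping features, which are assumed observed for every row."""
    observed = ~np.isnan(X[:, target_idx])
    reg = SVR(kernel="rbf").fit(X[observed][:, overlap_idx],
                                X[observed, target_idx])
    X_filled = X.copy()
    X_filled[~observed, target_idx] = reg.predict(X[~observed][:, overlap_idx])
    return X_filled

# Repeating this for every non-overlapping feature yields the fully
# filled matrix that the dimensionality reduction step works on.
```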

Page 13: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

[Diagram: the out-domain word vectors and the in-domain word vectors are stacked into one word vector matrix X; the overlapping features are observed in both domains, the remaining entries are missing values that get filled by regression, and the completed matrix is then passed to dimensionality reduction.]

Page 14: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Dimensionality Reduction

• Project the word vector matrix onto its most important, inherent sub-space. With the SVD $X = V \Sigma U^{\top}$ of the $d \times t$ word vector matrix (see the illustration on Page 24), the low dimensional representation is

  $\Sigma_k^{-1} V_k^{\top} X = U_k^{\top}$   (a $k \times t$ matrix),

where $V_k$, $\Sigma_k$, and $U_k$ keep only the top $k$ singular vectors and singular values.
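A minimal NumPy sketch of this projection, assuming the filled word vector matrix has terms as rows and documents as columns (d × t) as in the slides, and naming the factors V, Σ, U as the slides do; k is a tuning parameter.

```python
import numpy as np

def latent_representation(X_filled, k):
    """Map the documents (columns of the d x t matrix X_filled) into a
    k-dimensional latent space via a truncated SVD, X = V * Sigma * U^T."""
    V, sigma, Ut = np.linalg.svd(X_filled, full_matrices=False)
    V_k, sigma_k = V[:, :k], sigma[:k]          # top-k singular vectors / values
    # Sigma_k^{-1} V_k^T X equals U_k^T: a k x t representation of the documents.
    return (V_k.T @ X_filled) / sigma_k[:, None]
```

Both the out-domain and the in-domain documents are columns of X_filled, so they are mapped into the same latent space, where the kNN classifier used in the experiments operates.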

Page 15: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Solution (high dimensionality)

[Two scatter plots, recalling the previous example: before the correction the blue points were closer to the green points than to the red points; after projection into the latent space they are closer to the red points than to the green points.]

Page 16: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Properties

- It brings the marginal distributions of the two domains closer:
  - the marginal distributions are brought closer in the high dimensional space (Section 3.2);
  - the distance between the two marginal distributions is further reduced in the low dimensional space (Theorem 3.2).
- It brings the two domains' conditional distributions closer: nearby instances from the two domains have similar conditional distributions (Section 3.3).
- It reduces the domain transfer risk: the risk of the nearest neighbor classifier can be bounded in the transfer learning setting (Theorem 3.3).

Page 17: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Experiment (I)

Data sets:
- 20 Newsgroups: 20,000 newsgroup articles.
- SRAA (simulated/real auto/aviation): 73,128 articles from 4 discussion groups.
- Reuters: 21,758 Reuters news articles.

Baseline methods:
- naïve Bayes, logistic regression, SVM;
- knnReg: missing values filled, but without SVD;
- pLatentMap: SVD, but with missing values left as 0.
The last two baselines are meant to justify the two steps in our framework.

Procedure: first fill up the "gap", then use a knn classifier to do the classification (see the sketch after this slide).

[Diagram: building tasks from the 20 Newsgroups hierarchy, e.g. comp (comp.sys, comp.graphics) vs. rec (rec.sport, rec.auto), with different sub-categories assigned to the out-domain and the in-domain.]
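A minimal, self-contained sketch of this classify-in-the-latent-space procedure, combining the two steps above with a kNN classifier; the column split into labeled out-domain and unlabeled in-domain documents, and all names, are illustrative assumptions rather than the authors' code.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def classify_in_domain(X_filled, y_out, t_out, k_latent=50, n_neighbors=1):
    """X_filled: d x t word vector matrix with missing values already filled;
    the first t_out columns are labeled out-domain documents (labels y_out),
    the remaining columns are unlabeled in-domain documents to classify."""
    V, sigma, _ = np.linalg.svd(X_filled, full_matrices=False)
    Z = (V[:, :k_latent].T @ X_filled) / sigma[:k_latent, None]  # k x t latent reps
    Z_out, Z_in = Z[:, :t_out].T, Z[:, t_out:].T                 # documents as rows
    knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(Z_out, y_out)
    return knn.predict(Z_in)
```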

Page 18: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Learning Tasks

Page 19: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Experiment (II)

Overall performance: 10 wins, 1 loss.

Page 20: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Experiment (III)

knnReg: missing values filled, but without SVD. Compared with knnReg: 8 wins, 3 losses.

pLatentMap: SVD, but without filling the missing values. Compared with pLatentMap: 8 wins, 3 losses.

Page 21: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Conclusion

Problem: high dimensional overlapping domain transfer --- text and image categorization.

Step 1: filling up the missing values
--- brings the two domains' marginal distributions closer.

Step 2: SVD dimensionality reduction
--- further brings the two marginal distributions closer (Theorem 3.2);
--- clusters points from the two domains, making the conditional distribution transferable (Theorem 3.3).

Code and data are available from the authors' webpage.

Page 24: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Solution (high dimensionality): illustration of SVD

  $X_{d \times t} = V_{d \times d}\, \Sigma_{d \times t}\, U^{\top}_{t \times t}$

The most important, inherent information lies in the singular vectors corresponding to the top $k$ singular values. Keeping the top $k$ singular values and the corresponding top $k$ singular vectors, we can project the data onto this sub-space (Page 14).
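For reference, a compact restatement of the truncated decomposition and the Page 14 projection in the slides' notation; this is standard truncated SVD algebra rather than a formula quoted verbatim from the paper:

$$
X \;=\; V \Sigma U^{\top} \;\approx\; V_k \Sigma_k U_k^{\top},
\qquad
\Sigma_k^{-1} V_k^{\top} X \;=\; U_k^{\top} \in \mathbb{R}^{k \times t},
$$

so each document (a column of $X$) is represented by its $k$ coordinates in the latent space.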

Page 25: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Analysis (I)

SVR (support vector regression) minimizes the distance between the two domains' marginal distributions.

[Derivation on the slide: in the original space, the distance between the two domains' points decomposes into an upper-bound term on the overlapping features and a term that is minimized by the SVR, so filling the missing values by regression brings the marginal distributions closer.]

Page 26: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Analysis (II)

SVD also clusters the data, so that nearby data points share a similar concept.

Minimizing the objective function of k-means is equivalent (up to a data-dependent constant) to a maximization problem whose relaxed optimum is achieved by SVD (see the restatement below).
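A sketch of the standard spectral relaxation argument behind this slide (a well-known result, not a formula quoted from the paper): with documents as the columns $x_i$ of $X$ and $H$ the normalized cluster-indicator matrix, the k-means objective rewrites as

$$
\sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
\;=\; \operatorname{tr}(X^{\top} X) \;-\; \operatorname{tr}\!\left(H^{\top} X^{\top} X\, H\right),
\qquad H^{\top} H = I_k ,
$$

and when $H$ is relaxed from an indicator matrix to an arbitrary orthonormal matrix, the trace term is maximized by the top-$k$ right singular vectors of $X$, i.e. by the columns of $U_k$ in the slides' notation, which up to transposition is exactly the latent representation $\Sigma_k^{-1} V_k^{\top} X = U_k^{\top}$ from Page 14.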

Page 27: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Analysis (III)

• SVD (singular value decomposition) bounds the distance between the two marginal distributions (Theorem 3.2).

The latent space mapping applied to the $d \times t$ matrix $X$ is $T = \Sigma_k^{-1} V_k^{\top}$ (so that $T X = U_k^{\top}$, the low dimensional representation), and its spectral norm is

  $\lVert T \rVert_2 = 1 / \sigma_k$, where $\sigma_k > 1$,

so distances shrink under the projection and the two marginal distributions are brought closer.

Page 28: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Analysis (IV)

Theorem 3.3 bounds the risk R of the nearest neighbor classifier under the transfer learning setting.

The latent space clusters the data so that nearest neighbors have similar conditional distributions:

  $R \propto -\operatorname{cov}(r_1, r_2)$, where $r_i$ is related to the conditional distribution of domain $i$.

The larger the distance between the two conditional distributions (the smaller the covariance), the higher the bound; this justifies the use of SVD, which brings the conditional distributions of nearby points closer.

Page 29: Latent Space Domain Transfer between High Dimensional Overlapping Distributions

Experiment (IV)

Parameter sensitivity:
- the number of neighbors to retrieve;
- the number of dimensions of the latent space.