Semi-supervised and Active Learning
4/22
Amr
Credit: lecture slides
The big picture
• Semi-supervised Learning
• Active Learning
Learning algorithm
Selective labeling
Learning algorithm
How?
• There is no free lunch!
• You need to make assumptions
• Leverage them to construct an algorithm
• If the assumptions are correct, we can improve
Assumption: Overview
• Active learning
– Uncertainty sampling: query the instances the model is least confident about
– Query-by-committee (QBC): use ensembles to rapidly reduce the version space
• Semi-supervised learning
– Self-training, expectation-maximization (EM): propagate confident labelings among unlabeled data
– Co-training, multi-view learning: use ensembles with multiple views to constrain the version space
• Both try to attack the same problem: making the most of unlabeled data U
Semi-supervised
• If x and x’ are similar, then they are likely to have the same label
• Algorithm
– Assume generative model
– Cluster and label
– Regularize the classifier using unlabeled data
– Multi-view learning
• Does it help?
Example
Examples: 1-NN, works!
Example: 1-NN, doesn’t work
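One common reading of the 1-NN examples is "propagating 1-NN": repeatedly take the unlabeled point closest to any labeled point, give it that neighbor's label, and treat it as labeled from then on. The sketch below assumes 1-D points and absolute distance; `propagate_1nn` is a hypothetical helper name, not something defined in the slides.

```python
def propagate_1nn(labeled, unlabeled):
    """Greedy 1-NN label propagation (illustrative sketch).

    labeled:   list of (x, label) pairs
    unlabeled: list of x values
    Returns the list of (x, label) pairs for all points.
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Find the unlabeled point that is closest to any labeled point.
        _, idx, label = min(
            ((abs(u - x), i, y)
             for i, u in enumerate(remaining)
             for x, y in labeled),
            key=lambda t: t[0],
        )
        # Adopt the neighbor's label and use the point as labeled from now on.
        labeled.append((remaining.pop(idx), label))
    return labeled
```

With two well-separated clusters this recovers the clusters from one seed each. If the unlabeled points form a chain from one seed toward the other (for example 1, 2, ..., 6 between seeds at 0 and 10), every point in the chain inherits the first seed's label, which is one way the procedure can fail.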
Can we be more robust?
• So, in general, how do we deal with this problem?
– Generative model
– Regularization
SSL using Mixture Models
• Use all the data, not one point at a time!
SSL using Mixture Models
SSL using Mixture Models
• Inference and learning
– This was your midterm problem!
– You know more than you think you do!
• Is this robust to noise?
– At least you can get the Bayes-optimal classifier if the assumptions are correct
– What if the assumptions are wrong?
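The inference-and-learning step can be sketched with EM. This is a minimal illustration, assuming a 1-D mixture with two Gaussian classes, labels in {0, 1}, and at least two distinct labeled points per class; the function names are hypothetical. Labeled points keep hard (fixed) responsibilities, while unlabeled points get soft responsibilities in each E-step.

```python
import math

def gauss(x, mu, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def m_step(points):
    """points: list of (x, [r0, r1]). Returns (pi, mu, var) as lists."""
    n = len(points)
    pi, mu, var = [], [], []
    for k in (0, 1):
        nk = sum(r[k] for _, r in points)
        m = sum(r[k] * x for x, r in points) / nk
        v = sum(r[k] * (x - m) ** 2 for x, r in points) / nk
        pi.append(nk / n)
        mu.append(m)
        var.append(max(v, 1e-6))  # variance floor to avoid degenerate classes
    return pi, mu, var

def e_step(xs, pi, mu, var):
    """Soft class responsibilities for unlabeled points."""
    out = []
    for x in xs:
        p = [pi[k] * gauss(x, mu[k], var[k]) for k in (0, 1)]
        s = sum(p)
        out.append((x, [p[0] / s, p[1] / s] if s > 0 else [0.5, 0.5]))
    return out

def ssl_em(labeled, unlabeled, n_iter=50):
    """labeled: list of (x, y) with y in {0, 1}; unlabeled: list of x."""
    hard = [(x, [1.0 - y, float(y)]) for x, y in labeled]
    params = m_step(hard)  # initialise from the labeled data only
    for _ in range(n_iter):
        soft = e_step(unlabeled, *params)
        params = m_step(hard + soft)  # re-estimate from all the data
    return params

def predict(x, params):
    pi, mu, var = params
    return int(pi[1] * gauss(x, mu[1], var[1]) > pi[0] * gauss(x, mu[0], var[0]))
```

A typical call would be `params = ssl_em(labeled, unlabeled)` followed by `predict(x, params)`. If the data are not well described by one Gaussian per class, the estimated boundary can be worse than the supervised one, which is the robustness concern raised above.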
Can we be more robust?
• So, in general, how do we deal with this problem?
– Generative model
– Regularization
So why a new method?
• As we said earlier
• A different kind of assumption
• What if data is not Gaussian?
– Remember spectral clustering
Graph Regularization
• Regularized classifier
• Learn a classifier that minimizes
– Loss term + regularizer
– Example: ridge regression
• Can we use unlabeled data for regularization?
Loss on labeled data (mean square, 0-1)
Graph-based smoothness prior on labeled and unlabeled data
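The slide's equation did not survive extraction, so the sketch below assumes one standard form of the objective: squared loss on the labeled nodes plus `lam` times the sum over edges of `w[i][j] * (f[i] - f[j])**2`. It minimizes that objective by coordinate updates; `rbf_graph` and `graph_regularized_fit` are hypothetical names, and the RBF graph is only one possible graph construction.

```python
import math

def rbf_graph(xs, sigma=1.0):
    """Fully connected graph with Gaussian (RBF) edge weights on 1-D points."""
    n = len(xs)
    return [[0.0 if i == j
             else math.exp(-(xs[i] - xs[j]) ** 2 / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def graph_regularized_fit(w, labels, lam=1.0, n_sweeps=200):
    """Minimise  sum_{i labeled} (f_i - y_i)^2 + lam * sum_{i<j} w_ij (f_i - f_j)^2.

    w:      symmetric weight matrix (list of lists)
    labels: dict mapping node index -> +1 or -1 for labeled nodes
    Returns the list of scores f; sign(f_i) is the predicted label.
    """
    n = len(w)
    f = [float(labels.get(i, 0.0)) for i in range(n)]
    for _ in range(n_sweeps):
        for i in range(n):
            has_label = 1.0 if i in labels else 0.0
            # Setting the derivative w.r.t. f_i to zero gives this update.
            num = has_label * labels.get(i, 0.0) + lam * sum(
                w[i][j] * f[j] for j in range(n))
            den = has_label + lam * sum(w[i])
            if den > 0:
                f[i] = num / den
    return f
```

For example, `graph_regularized_fit(rbf_graph(xs), {0: +1, 5: -1})` spreads the two labels along the graph. Both `lam` and the graph itself (here `sigma`) change the result.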
Is it robust?
• You can play with the regularization parameter
• Sensitive to graph construction
Loss on labeled data (mean square, 0-1)
Graph-based smoothness prior on labeled and unlabeled data
The big picture
• Semi-supervised Learning
• Active Learning
Learning algorithm
Selective labeling
Learning algorithm
There is no free lunch
Pay a little than passive
Active Learning
• Passive learning
– Input: a set of labeled examples
– Output: a classifier
• Observation:
– Labels are expensive
– Sometimes you can get the same classifier with a subset of the data
• Example?
Active Learning
• SVM
– Only the support vectors are needed
• Is it that easy?
• What assumption are we making here?
– Noise-free environment
• In general, we need a localized function
Active Learning
• When does it help?
Passive = Active
Active learning is useful if the complexity of the target function is localized: the labels of some data points are more informative than others.
[Castro et al.,’05]
Active Learning setup
• Induce a model
• Inspect unlabeled data
• Select “queries”
• Label new instances, repeat
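The loop above can be sketched as pool-based active learning with uncertainty sampling. The model here is a deliberately tiny stand-in (one centroid per class on 1-D data), and `oracle` is a hypothetical callable that plays the role of the human labeler; the initial labeled set is assumed to contain at least two classes.

```python
def fit_centroids(labeled):
    """Tiny stand-in model: one centroid per class (1-D data)."""
    sums = {}
    for x, y in labeled:
        s, c = sums.get(y, (0.0, 0))
        sums[y] = (s + x, c + 1)
    return {y: s / c for y, (s, c) in sums.items()}

def margin(model, x):
    """Gap between the two closest centroids; a small gap means low confidence."""
    d = sorted(abs(x - m) for m in model.values())
    return d[1] - d[0]

def active_learn(labeled, pool, oracle, budget):
    """Run `budget` rounds of: induce model, inspect pool, query, label."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(budget):
        if not pool:
            break
        model = fit_centroids(labeled)                      # induce a model
        query = min(pool, key=lambda x: margin(model, x))   # least confident point
        pool.remove(query)
        labeled.append((query, oracle(query)))              # label the new instance
    return fit_centroids(labeled), labeled
```

Each round asks for the label of the pool point the current model is least sure about, so the queries concentrate near the current decision boundary.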
Algorithms from Insights
• We need to learn a decision boundary
• Classification uncertainty
– Query the examples closest to the decision boundary
– We become more confident if we get them right
– Somehow these are still local decisions
• Version-space uncertainty
– Somehow makes a global decision
Version Space
• Set of hypotheses consistent with the labeled examples
Version Space
• Our goal: get a single hypothesis
• Select the example that results in the maximum reduction of the hypothesis space
• What is the problem with that?
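In the special case of 1-D threshold classifiers with noise-free labels, the version space is easy to represent: it is the interval between the largest known negative and the smallest known positive. A minimal sketch under those assumptions (the function name and the `oracle` callable are hypothetical): querying the middle pool point halves the version space each time, so about log2(n) labels suffice instead of n.

```python
def active_threshold(pool, oracle):
    """Learn a 1-D threshold by halving the version space.

    pool:   iterable of 1-D points
    oracle: callable returning True for positive points (assumed noise-free,
            with all positives to the right of all negatives)
    Returns (index of first positive in the sorted pool, number of queries).
    """
    xs = sorted(pool)
    lo, hi = 0, len(xs)  # invariant: xs[:lo] are negative, xs[hi:] are positive
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2       # the query that splits the version space in half
        queries += 1
        if oracle(xs[mid]):
            hi = mid               # threshold is at or left of xs[mid]
        else:
            lo = mid + 1           # threshold is right of xs[mid]
    return lo, queries
```

For richer hypothesis classes there is usually no such compact description of the version space, and label noise breaks the "consistent with all labels" invariant that this sketch relies on.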
Version space: Algorithm
• Query by committee
– Keep an ensemble of classifiers to approximate the version space
• Goal: reduce “entropy” over their contributions
• Idea
– Sample from P(parameters| data)
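A minimal query-by-committee sketch, assuming 1-D threshold classifiers, noise-free labels in {0, 1}, and a uniform prior, so that sampling from P(parameters | data) reduces to sampling thresholds uniformly from the interval consistent with the labels. Disagreement is measured with vote entropy; the function names are hypothetical.

```python
import math
import random

def vote_entropy(votes):
    """Entropy of the committee's votes on one point (0 means full agreement)."""
    n = len(votes)
    ent = 0.0
    for label in set(votes):
        p = votes.count(label) / n
        ent -= p * math.log(p)
    return ent

def qbc_query(labeled, pool, committee_size=25, rng=None):
    """Pick the pool point on which sampled threshold classifiers disagree most.

    labeled: list of (x, y) with y in {0, 1}, negatives left of positives
    pool:    list of candidate x values
    """
    rng = rng or random.Random(0)
    lo = max(x for x, y in labeled if y == 0)   # largest known negative
    hi = min(x for x, y in labeled if y == 1)   # smallest known positive
    # Committee: thresholds sampled from the current version space [lo, hi].
    committee = [rng.uniform(lo, hi) for _ in range(committee_size)]

    def disagreement(x):
        return vote_entropy([int(x >= t) for t in committee])

    return max(pool, key=disagreement)
```

Points outside the interval between the known negative and positive get unanimous votes, so they are never preferred over points the committee splits on.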
Case study: SVM
• How to represent the version space?
• This is a slightly re-parameterized SVM objective, but it is equivalent
Case study: SVM
• How to represent the version space?
Given the current labeled data we have an explicit representation of the version space
Query point
• Halving the version space (query point c)
Is it the End?
• Supervised
• Semi-supervised
• Active
• Transductive
– You still get to see unlabeled data
– But these are also your test data
– What can you do with that?
Transductive SVM
• Choose a confident labeling of the unlabeled data
Unlabeled data
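The idea of choosing a confident labeling can be illustrated in 1-D, where it reduces to placing the boundary in the widest gap among all points, labeled and unlabeled. This is only a toy analogue of a transductive SVM, not an SVM solver: it assumes separable data with labels in {-1, +1} and negatives to the left of positives, and the function name is hypothetical.

```python
def transductive_threshold(labeled, unlabeled):
    """Place a 1-D threshold in the widest gap between the classes.

    labeled:   list of (x, y) with y in {-1, +1}
    unlabeled: list of x values (these are also the test points)
    Returns the threshold; points above it are predicted +1.
    """
    neg = max(x for x, y in labeled if y == -1)   # right-most negative
    pos = min(x for x, y in labeled if y == +1)   # left-most positive
    # Only unlabeled points strictly between the classes constrain the margin.
    pts = sorted([neg, pos] + [x for x in unlabeled if neg < x < pos])
    # Pick the largest gap between adjacent points and cut it in the middle.
    gap, left = max((b - a, a) for a, b in zip(pts, pts[1:]))
    return left + gap / 2
```

With labeled points at 0 and 10 only, the threshold is the midpoint 5. Adding unlabeled points at 1 through 6 and at 9 moves it to 7.5, the middle of the low-density region.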
Transductive SVM
• Why does it make sense?
Transductive SVM
• When is it useful?
• News filtering
– Labeled data: news users liked in the past
– Test data (unlabeled): today’s news
– We only need to do well on those test data