Semi-Supervised Learning & Summary
Advanced Statistical Methods in NLP, Ling 572
March 8, 2012
Roadmap
Semi-supervised learning:
- Motivation & perspective
- Yarowsky's model
- Co-training
Summary
Semi-supervised Learning
Motivation
Supervised learning:
- Works really well, but needs lots of labeled training data
Unsupervised learning:
- No labeled data required, but may not work well and may not learn the desired distinctions
- E.g., unsupervised parsing techniques fit the data but don't correspond to linguistic intuition
Solution
Semi-supervised learning:
General idea:
- Use a small amount of labeled training data
- Augment it with a large amount of unlabeled training data
- Use the information in the unlabeled data to improve models
Many different semi-supervised machine learners:
- Variants of supervised techniques: semi-supervised SVMs, CRFs, etc.
- Bootstrapping approaches: Yarowsky's method, self-training, co-training
Label the First Use of "Plant"

Biological example: There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.

Industrial example: The Paulus company was founded in 1938. Since those days the product range has been the subject of constant expansions and is brought up continuously to correspond with the state of the art. We're engineering, manufacturing and commissioning world-wide ready-to-run plants packed with our comprehensive know-how. Our Product Range includes pneumatic conveying systems for carbon, carbide, sand, lime and many others. We use reagent injection in molten metal for the…
Word Sense Disambiguation
An application of lexical semantics.
Goal: given a word in context, identify the appropriate sense
- E.g., plants and animals in the rainforest
Crucial for real syntactic & semantic analysis; the correct sense can determine:
- the available syntactic structure
- the available thematic roles, the correct meaning, etc.
Disambiguation Features
Key: what are the features?
- Part of speech, of the word and its neighbors
- Morphologically simplified form
- Words in the neighborhood
  - Question: how big a neighborhood? Is there a single optimal size? Why?
- (Possibly shallow) syntactic analysis, e.g. predicate-argument relations, modification, phrases
- Collocation vs. co-occurrence features
  - Collocation: words in a specific relation (predicate-argument, 1 word +/-)
  - Co-occurrence: bag of words
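The collocation/co-occurrence distinction can be made concrete in code. A minimal sketch, where the window size, offsets, and feature-name scheme are illustrative choices rather than anything prescribed by the slides:

```python
from collections import Counter

def extract_features(tokens, target_index, window=3):
    """Contrast the two feature types: collocations (words in a specific
    position relative to the target) vs. co-occurrences (an unordered
    bag of nearby words)."""
    features = {}
    # Collocational features: position matters (e.g. word -1, word +1).
    for offset in (-2, -1, 1, 2):
        i = target_index + offset
        if 0 <= i < len(tokens):
            features[f"word{offset:+d}={tokens[i]}"] = 1
    # Co-occurrence features: bag of words within the window, position ignored.
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    bag = Counter(t for j, t in enumerate(tokens[lo:hi], start=lo)
                  if j != target_index)
    for word, count in bag.items():
        features[f"bag={word}"] = count
    return features

tokens = "plants and animals in the rainforest".split()
feats = extract_features(tokens, target_index=0)
```

For "plants" at index 0 this yields positional features like `word+1=and` alongside unordered bag features like `bag=animals`, which is exactly the distinction the slide draws.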
WSD Evaluation
Ideally, end-to-end evaluation with a WSD component:
- Demonstrates the real impact of the technique in a system
- Difficult, expensive, and still application-specific
Typically, intrinsic, sense-based:
- Accuracy, precision, recall
- SENSEVAL/SEMEVAL: all-words and lexical-sample tasks
Baseline: most frequent sense
Topline: human inter-rater agreement: 75-80% on fine-grained senses; 90% on coarse-grained
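The most-frequent-sense baseline is simple to compute. A sketch, where the sense labels for "plant" are purely made-up illustration data:

```python
from collections import Counter

def most_frequent_sense_baseline(train_labels, test_labels):
    """Predict the majority sense from the training data for every test
    instance and report accuracy -- the standard WSD baseline."""
    mfs = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == mfs)
    return mfs, correct / len(test_labels)

# Hypothetical sense annotations for occurrences of "plant".
train = ["factory", "factory", "flora", "factory", "flora", "factory"]
test = ["factory", "flora", "factory", "factory"]
mfs, acc = most_frequent_sense_baseline(train, test)
# mfs == "factory", acc == 0.75
```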
Minimally Supervised WSD
Yarowsky's algorithm (1995)
Bootstrapping approach: use a small labeled seed set to iteratively train.
Builds on two key insights:
- One sense per discourse: a word appearing multiple times in a text keeps the same sense
  - Corpus of 37,232 "bass" instances: always a single sense per document
- One sense per collocation: local phrases select a single sense
  - fish -> Bass1
  - play -> Bass2
Yarowsky's Algorithm
Training decision lists:
1. Pick seed instances & tag them
2. Find collocations: word to the left, word to the right, word within +/- K
   (A) Calculate informativeness on the tagged set and order the rules by the log-likelihood ratio log(P(Sense1 | collocation) / P(Sense2 | collocation))
   (B) Tag new instances with the rules
   (C) Apply one sense per discourse
   (D) If instances are still unlabeled, go to 2
3. Apply one sense per discourse
Disambiguation: first rule matched.
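The training-and-disambiguation steps above can be sketched as follows. This is a toy reduction of Yarowsky's method: the feature names, seed set, and smoothing constant `alpha` are invented for illustration, and a real run would repeat steps 2(A)-(D) over a large unlabeled corpus, applying one-sense-per-discourse between passes.

```python
import math
from collections import defaultdict

def build_decision_list(labeled, alpha=0.1):
    """Score each collocational feature by the smoothed log-likelihood
    ratio log(P(sense1 | f) / P(sense2 | f)) and order the rules by the
    magnitude of the score (informativeness)."""
    counts = defaultdict(lambda: {"s1": 0, "s2": 0})
    for features, sense in labeled:
        for f in features:
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        score = math.log((c["s1"] + alpha) / (c["s2"] + alpha))
        rules.append((f, "s1" if score > 0 else "s2", abs(score)))
    return sorted(rules, key=lambda r: -r[2])

def classify(features, rules, default="s1"):
    """Disambiguation: the first rule matched wins."""
    for f, sense, _ in rules:
        if f in features:
            return sense
    return default

# Tiny invented seed set for "bass": s1 = fish sense, s2 = music sense.
seed = [({"word+1=fishing", "bag=river"}, "s1"),
        ({"bag=river", "bag=caught"}, "s1"),
        ({"word-1=play", "bag=guitar"}, "s2")]
rules = build_decision_list(seed)
classify({"bag=guitar", "bag=stage"}, rules)  # -> "s2"
```

One bootstrapping step then adds such newly labeled instances back to the tagged set and retrains the list.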
Yarowsky Decision List
Iterative Updating
Sense Choice with Collocational Decision Lists
- Create an initial decision list, with rules ordered by informativeness
- Check nearby word groups (collocations):
  - Biology: "animal" within 2-10 words
  - Industry: "manufacturing" within 2-10 words
- Result: correct selection; 95% on pairwise tasks
Self-Training
Basic approach:
- Start with a small labeled training set
- Train a supervised classifier on the training set
- Apply the new classifier to the residual unlabeled training data
- Add the 'best' newly labeled examples to the labeled training set
- Iterate
Self-Training
Simple, right? The devil is in the details: which instances are 'best' to add?
- Highest confidence? Probably accurate, but probably adds little new information to the classifier
- Most different? Probably adds information, but may not be accurate
- In practice: use the most different of the highly confident instances
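The loop is easy to write down. A self-contained sketch with a toy 1-D nearest-centroid classifier standing in for the supervised learner; the data, the margin-based confidence, and the "highest confidence = best" policy are all illustrative simplifications:

```python
class CentroidClassifier:
    """Toy 1-D nearest-centroid model standing in for any supervised
    classifier; confidence = margin between the two centroid distances."""
    def fit(self, xs, ys):
        self.centroids = {label: sum(x for x, y in zip(xs, ys) if y == label)
                                 / sum(1 for y2 in ys if y2 == label)
                          for label in set(ys)}
        return self
    def predict(self, x):
        dists = sorted((abs(x - c), lab) for lab, c in self.centroids.items())
        conf = dists[1][0] - dists[0][0] if len(dists) > 1 else 1.0
        return dists[0][1], conf

def self_train(labeled, unlabeled, rounds=5, per_round=2):
    """Self-training loop from the slide: train, label the residual
    pool, add the most confident newly labeled examples, iterate."""
    labeled, pool = list(labeled), list(unlabeled)
    model = CentroidClassifier()
    for _ in range(rounds):
        if not pool:
            break
        model.fit([x for x, _ in labeled], [y for _, y in labeled])
        preds = []
        for x in pool:
            label, conf = model.predict(x)
            preds.append((conf, x, label))
        # 'Best' = highest confidence here; the slide suggests instead
        # favoring the most *different* of the highly confident instances.
        preds.sort(reverse=True)
        added = preds[:per_round]
        labeled.extend((x, y) for _, x, y in added)
        added_xs = {x for _, x, _ in added}
        pool = [x for x in pool if x not in added_xs]
    return model, labeled

seed = [(0.0, "low"), (10.0, "high")]
model, grown = self_train(seed, [1.0, 2.0, 8.5, 9.0])
```

Starting from two labeled points, the pool of four unlabeled points is absorbed over two rounds, growing the training set to six examples.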
Co-Training
Blum & Mitchell, 1998
Basic intuition: "two heads are better than one"
- Ensemble classifier: uses the results from multiple classifiers
- Multi-view classifier: uses different views of the data (feature subsets)
- Ideally, the views should be:
  - Conditionally independent
  - Individually sufficient: each carries enough information to learn from
Co-training Set-up
Create two views of the data:
- Typically, partition the feature set by type, e.g. for predicting speech emphasis:
  - View 1: acoustics: loudness, pitch, duration
  - View 2: lexicon, syntax, context
- Some approaches use learners of different types instead
- In practice, the views may not truly be conditionally independent, but co-training often works pretty well anyway
Co-training Approach
- Create a small labeled training data set
- Train two (supervised) classifiers on the current training data, using different views
- Use the two classifiers to label the residual unlabeled instances
- Select the 'best' newly labeled data to add to the training data, adding instances labeled by C1 to the training data for C2, and vice versa
- Iterate
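The cross-labeling loop above can be sketched with toy two-view data, where view 1 is the first coordinate of each instance and view 2 the second. The data, the nearest-centroid view classifiers, and the margin-based selection are invented for illustration:

```python
class ViewClassifier:
    """Toy nearest-centroid classifier over a single feature view (one
    coordinate of the instance), standing in for C1 / C2."""
    def __init__(self, view):
        self.view = view
    def fit(self, data):
        self.centroids = {}
        for label in {y for _, y in data}:
            pts = [x[self.view] for x, y in data if y == label]
            self.centroids[label] = sum(pts) / len(pts)
    def predict(self, x):
        dists = sorted((abs(x[self.view] - c), lab)
                       for lab, c in self.centroids.items())
        margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 1.0
        return dists[0][1], margin

def co_train(labeled, unlabeled, rounds=3, per_round=1):
    """Co-training loop from the slide: train C1 and C2 on different
    views, let each label the pool, and add each classifier's most
    confident predictions to the OTHER classifier's training data."""
    train = {0: list(labeled), 1: list(labeled)}
    pool = list(unlabeled)
    c = {v: ViewClassifier(v) for v in (0, 1)}
    for _ in range(rounds):
        if not pool:
            break
        for v in (0, 1):
            c[v].fit(train[v])
        moved = set()
        for v in (0, 1):
            ranked = sorted(pool, key=lambda x: -c[v].predict(x)[1])
            for x in ranked[:per_round]:
                label, _ = c[v].predict(x)
                train[1 - v].append((x, label))   # cross-labeling
                moved.add(x)
        pool = [x for x in pool if x not in moved]
    return c

seed = [((0.0, 0.0), "neg"), ((10.0, 10.0), "pos")]
c = co_train(seed, [(1.0, 2.0), (9.0, 8.0), (2.0, 1.0)])
```

The cross-labeling step is the key design choice: each classifier teaches the other, so an instance that is easy in one view can inform the other view's model.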
Graphically
(Figure from Jeon & Liu '11)
More Devilish Details
Questions for co-training:
- Which instances are 'best' to add to training? Most confident? Most different? Random? Many approaches combine these.
- How many instances to add per iteration? A threshold by count, or by confidence value?
- How long to iterate? A fixed count? A threshold on classifier confidence? Etc.
Co-training Applications
Applied to many language-related tasks:
- Blum & Mitchell's paper: academic home-page classification; 95% accuracy with 12 pages labeled and 788 classified
- Sentiment analysis
- Statistical parsing
- Prominence recognition
- Dialog classification
Learning Curves: Semi-supervised vs. Supervised
(Figure: accuracy, 66-84%, plotted against the number of labeled examples, 9 to 300, for supervised and semi-supervised learning.)
Semi-supervised Learning
Umbrella term for machine learning techniques that:
- Use a small amount of labeled training data
- Augment it with information from unlabeled data
Can be very effective: training on ~10 labeled samples can yield results comparable to training on 1000s.
Can be temperamental:
- Sensitive to the data, the learning algorithm, and design choices
- Hard to predict the effects of the amounts of labeled data, unlabeled data, etc.
![Page 61: Semi-Supervised Learning & Summary Advanced Statistical Methods in NLP Ling 572 March 8, 2012](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649cac5503460f9496d479/html5/thumbnails/61.jpg)
61
Summary
![Page 62: Semi-Supervised Learning & Summary Advanced Statistical Methods in NLP Ling 572 March 8, 2012](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649cac5503460f9496d479/html5/thumbnails/62.jpg)
62
Review
Introduction:
- Entropy, cross-entropy, and mutual information
Classic machine learning algorithms:
- Decision trees, kNN, Naïve Bayes
Discriminative machine learning algorithms:
- MaxEnt, CRFs, SVMs
Other models:
- TBL, EM, semi-supervised approaches
![Page 63: Semi-Supervised Learning & Summary Advanced Statistical Methods in NLP Ling 572 March 8, 2012](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649cac5503460f9496d479/html5/thumbnails/63.jpg)
63
General Methods
Data organization:
- Training, development, and test data splits
Cross-validation:
- Parameter tuning, evaluation
Feature selection:
- Wrapper methods, filtering, weighting
Beam search
![Page 64: Semi-Supervised Learning & Summary Advanced Statistical Methods in NLP Ling 572 March 8, 2012](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649cac5503460f9496d479/html5/thumbnails/64.jpg)
64
Tools, Data, & Tasks
Tools:
- Mallet, libSVM
Data:
- 20 Newsgroups (text classification)
- Penn Treebank (POS tagging)
![Page 65: Semi-Supervised Learning & Summary Advanced Statistical Methods in NLP Ling 572 March 8, 2012](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649cac5503460f9496d479/html5/thumbnails/65.jpg)
65
Beyond 572
Ling 573:
- 'Capstone' project class: integrates material from the 57* classes
- More 'real world': project teams, deliverables, repositories
Ling 575s:
- Speech technology: Michael Tjalve (Th, 4pm)
- NLP on mobile devices: Scott Farrar (T, 4pm)
Ling and other electives
![Page 68: Semi-Supervised Learning & Summary Advanced Statistical Methods in NLP Ling 572 March 8, 2012](https://reader036.vdocuments.net/reader036/viewer/2022062515/56649cac5503460f9496d479/html5/thumbnails/68.jpg)
68
Course Evaluations
https://depts.washington.edu/oeaias/webq/survey.cgi?user=UWDL&survey=1397
Thank you!