Computational Learning Theory: Probably Approximately Correct (PAC) Learning

Machine Learning, Spring 2018

The slides are mainly from Vivek Srikumar
This lecture: Computational Learning Theory
• The Theory of Generalization
• Probably Approximately Correct (PAC) learning
• Positive and negative learnability results
• Agnostic Learning
• Shattering and the VC dimension
Where are we?
• The Theory of Generalization
  – When can we trust the learning algorithm?
  – What functions can be learned?
  – Batch learning
• Probably Approximately Correct (PAC) learning
• Positive and negative learnability results
• Agnostic Learning
• Shattering and the VC dimension
This section
1. Analyze a simple algorithm for learning conjunctions
2. Define the PAC model of learning
3. Make formal connections to the principle of Occam’s razor
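Item 1 refers to the classic elimination algorithm for conjunctions. As a sketch (not taken from these slides; the function and variable names are illustrative), the version for monotone conjunctions starts from the conjunction of all variables and drops every variable that is 0 in some positive example:

```python
def learn_conjunction(examples):
    """examples: list of (x, y), x a tuple of 0/1 values, y in {+1, -1}.
    Returns the set of variable indices kept in the learned conjunction."""
    n = len(examples[0][0])
    kept = set(range(n))  # start with x_1 AND x_2 AND ... AND x_n
    for x, y in examples:
        if y == +1:
            # a positive example rules out every variable that is 0 in it
            kept -= {i for i in kept if x[i] == 0}
    return kept

def predict(kept, x):
    # the learned conjunction fires only if every kept variable is 1
    return +1 if all(x[i] == 1 for i in kept) else -1
```

For example, with target x1 AND x3 over four variables, the single positive example (1, 0, 1, 0) eliminates x2 and x4; negative examples are never inconsistent with the current hypothesis and are ignored.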
Formulating the theory of prediction
In the general case, we have:
– X: the instance space; Y: the output space = {+1, -1}
– D: an unknown distribution over X
– f: an unknown target function X → Y, taken from a concept class C
– h: a hypothesis function X → Y that the learning algorithm selects from a hypothesis class H
– S: a set of m training examples drawn from D and labeled by f
– errD(h): the true error of a hypothesis h
– errS(h): the empirical error (also called the training or observed error) of h
All the notation we have so far on one slide
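The two error notions can be contrasted in a small sketch (the target f, hypothesis h, and distribution D below are invented for illustration): errS(h) is the mistake rate on the m-sample S, while errD(h) is approximated here by a large fresh sample from D:

```python
import random

def f(x):          # "unknown" target: fires when both coordinates agree
    return 1 if x[0] == x[1] else -1

def h(x):          # a hypothesis that looks only at the first coordinate
    return 1 if x[0] == 1 else -1

def draw(rng):     # D: uniform over {0,1}^2
    return (rng.randint(0, 1), rng.randint(0, 1))

def empirical_error(hyp, sample):
    # fraction of examples in the sample that hyp labels differently from f
    return sum(hyp(x) != f(x) for x in sample) / len(sample)

rng = random.Random(0)
S = [draw(rng) for _ in range(20)]            # small training sample: err_S(h)
big = [draw(rng) for _ in range(100_000)]     # large proxy for D: ~err_D(h)
```

For this made-up pair, h disagrees with f on exactly half of the instance space, so `empirical_error(h, big)` concentrates near 0.5 while `empirical_error(h, S)` can fluctuate with the particular small sample drawn.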
Theoretical questions
• Can we describe or bound the true error (errD) given the empirical error (errS)?
• Is a concept class C learnable?
• Is it possible to learn C using only functions from H, within the supervised learning protocol?
• How many examples does an algorithm need to guarantee good performance?
Requirements of Learning
• We cannot expect a learner to learn a concept exactly
  – There will generally be multiple concepts consistent with the available data (which represents a small fraction of the instance space)
  – Unseen examples could potentially have any label
  – We “agree” to misclassify uncommon examples that do not show up in the training set

• We cannot always expect to learn a close approximation to the target concept
  – Sometimes (only in rare learning situations, we hope) the training set will not be representative (it will contain uncommon examples)

• The only realistic expectation of a good learner is that with high probability it will learn a close approximation to the target concept
Probably Approximately Correct learning

• The only realistic expectation of a good learner is that with high probability it will learn a close approximation to the target concept

• In Probably Approximately Correct (PAC) learning, one requires that
  – given small parameters ε and δ,
  – with probability at least 1 - δ, the learner produces a hypothesis with error at most ε

• The only reason we can hope for this is the consistent distribution assumption: training and future examples are drawn from the same distribution D
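To make "probably approximately correct" concrete, here is an illustrative simulation (not from the slides) of learning a threshold concept on [0, 1] under a uniform distribution. The learner that outputs the largest positive example seen has true error above ε = 0.1 only in a small fraction of runs, consistent with the (1 - ε)^m failure bound one can derive for this particular case:

```python
import random

def trial(rng, theta=0.7, m=50):
    """One learning run: target concept is f(x) = +1 iff x <= theta.
    The learner outputs the largest positive example as its threshold.
    Returns the true error, which here is the mass of [theta_hat, theta]."""
    xs = [rng.random() for _ in range(m)]
    positives = [x for x in xs if x <= theta]
    theta_hat = max(positives, default=0.0)
    return theta - theta_hat

rng = random.Random(1)
eps = 0.1
# fraction of 2000 runs where the learned hypothesis has error > eps;
# the analysis predicts at most (1 - eps)**50, roughly 0.005 here
failures = sum(trial(rng) > eps for _ in range(2000)) / 2000
```

The point of the simulation is the shape of the guarantee: no single run is guaranteed to be accurate, but the probability of drawing a sample that misleads the learner by more than ε shrinks rapidly with m.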
PAC Learnability
Consider a concept class C defined over an instance space X (containing instances of length n), and a learner L using a hypothesis space H.

The concept class C is PAC learnable by L using H if, for all f ∈ C, for all distributions D over X, and for all fixed 0 < ε, δ < 1: given m examples sampled independently according to D, the algorithm L produces, with probability at least (1 - δ), a hypothesis h ∈ H that has error at most ε, where m is polynomial in 1/ε, 1/δ, n, and size(H).

The concept class C is efficiently learnable if L can produce the hypothesis in time that is polynomial in 1/ε, 1/δ, n, and size(H).
Recall that errD(h) = PrD[f(x) ≠ h(x)]
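As a preview of the Occam's-razor connection this section promises: for a finite hypothesis class H and a learner that outputs a hypothesis consistent with its sample, m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice for the (ε, δ) guarantee. This is the standard consistent-learner bound (stated here without proof; the numbers below are just an example):

```python
import math

def occam_sample_size(h_size, eps, delta):
    """Smallest integer m with m >= (ln|H| + ln(1/delta)) / eps."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

# e.g. conjunctions over n = 10 boolean variables: |H| = 3^10, since each
# variable appears positively, negatively, or not at all
m = occam_sample_size(3 ** 10, eps=0.1, delta=0.05)
```

Note that ln|H| grows only linearly in n here, so m is polynomial in 1/ε, 1/δ, and n, exactly as the definition of PAC learnability requires.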
PAC Learnability
We impose two limitations:

• Polynomial sample complexity (an information-theoretic constraint)
  – Is there enough information in the sample to distinguish a hypothesis h that approximates f?

• Polynomial time complexity (a computational constraint)
  – Is there an efficient algorithm that can process the sample and produce a good hypothesis h?

To be PAC learnable, there must be a hypothesis h ∈ H with arbitrarily small error for every f ∈ C. We assume H ⊇ C. (C is properly PAC learnable if H = C.)

Worst-case definition: the algorithm must meet its accuracy
– for every distribution (the distribution-free assumption)
– for every target function f in the class C
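The worst-case quantifier structure of the definition can be restated compactly (this restatement is mine, using the slides' notation, with L(S) denoting the hypothesis the learner outputs on sample S):

```latex
\forall f \in C,\ \forall D \text{ over } X,\ \forall\, 0 < \epsilon, \delta < 1:
\quad \Pr_{S \sim D^m}\!\bigl[\mathrm{err}_D(L(S)) \le \epsilon\bigr] \ge 1 - \delta,
\qquad m = \mathrm{poly}(1/\epsilon,\ 1/\delta,\ n,\ \mathrm{size}(H))
```

The universal quantifiers over f and D are what make the definition distribution-free: the same learner, with the same sample size, must work for every target in C and every distribution over X.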