arXiv:1910.09457v3 [cs.LG] 16 Sep 2020
Aleatoric and Epistemic Uncertainty in
Machine Learning: An Introduction
to Concepts and Methods
Eyke Hüllermeier a and Willem Waegeman b
a Paderborn University
Heinz Nixdorf Institute and Department of Computer Science
b Ghent University
Department of Mathematical Modelling, Statistics and Bioinformatics
Abstract
The notion of uncertainty is of major importance in machine learning and con-
stitutes a key element of machine learning methodology. In line with the statistical
tradition, uncertainty has long been perceived as almost synonymous with standard
probability and probabilistic predictions. Yet, due to the steadily increasing relevance
of machine learning for practical applications and related issues such as safety require-
ments, new problems and challenges have recently been identified by machine learning
scholars, and these problems may call for new methodological developments. In par-
ticular, this includes the importance of distinguishing between (at least) two different
types of uncertainty, often referred to as aleatoric and epistemic. In this paper, we
provide an introduction to the topic of uncertainty in machine learning as well as an
overview of attempts so far at handling uncertainty in general and formalizing this
distinction in particular.
1 Introduction
Machine learning is essentially concerned with extracting models from data, often (though
not exclusively) using them for the purpose of prediction. As such, it is inseparably
connected with uncertainty. Indeed, learning in the sense of generalizing beyond the data
seen so far is necessarily based on a process of induction, i.e., replacing specific observations
by general models of the data-generating process. Such models are never provably correct
but only hypothetical and therefore uncertain, and the same holds true for the predictions
produced by a model. In addition to the uncertainty inherent in inductive inference, other
sources of uncertainty exist, including incorrect model assumptions and noisy or imprecise
data.
Needless to say, a trustworthy representation of uncertainty is desirable and should be
considered as a key feature of any machine learning method, all the more in safety-critical
application domains such as medicine (Yang et al., 2009; Lambrou et al., 2011) or socio-
technical systems (Varshney, 2016; Varshney and Alemzadeh, 2016). Besides, uncertainty
is also a major concept within machine learning methodology itself; for example, the
principle of uncertainty reduction plays a key role in settings such as active learning
(Aggarwal et al., 2014; Nguyen et al., 2019), or in concrete learning algorithms such as
decision tree induction (Mitchell, 1980).
Traditionally, uncertainty is modeled in a probabilistic way, and indeed, in fields like
statistics and machine learning, probability theory has always been perceived as the ul-
timate tool for uncertainty handling. Without questioning the probabilistic approach in
general, one may argue that conventional approaches to probabilistic modeling, which
are essentially based on capturing knowledge in terms of a single probability distribution,
fail to distinguish two inherently different sources of uncertainty, which are often referred
to as aleatoric and epistemic uncertainty (Hora, 1996; Der Kiureghian and Ditlevsen,
2009). Roughly speaking, aleatoric (aka statistical) uncertainty refers to the notion of
randomness, that is, the variability in the outcome of an experiment which is due to
inherently random effects. The prototypical example of aleatoric uncertainty is coin flip-
ping: The data-generating process in this type of experiment has a stochastic component
that cannot be reduced by any additional source of information (except Laplace’s demon).
Consequently, even the best model of this process will only be able to provide probabil-
ities for the two possible outcomes, heads and tails, but no definite answer. As opposed
to this, epistemic (aka systematic) uncertainty refers to uncertainty caused by a lack of
knowledge (about the best model). In other words, it refers to the ignorance (cf. Sections
3.3 and A.2) of the agent or decision maker, and hence to the epistemic state of the agent
instead of any underlying random phenomenon. As opposed to uncertainty caused by
randomness, uncertainty caused by ignorance can in principle be reduced on the basis of
additional information. For example, what does the word “kichwa” mean in the Swahili
language, head or tail? The possible answers are the same as in coin flipping, and one
might be equally uncertain about which one is correct. Yet, the nature of uncertainty is
different, as one could easily get rid of it. In other words, epistemic uncertainty refers to
the reducible part of the (total) uncertainty, whereas aleatoric uncertainty refers to the
irreducible part.
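The coin-flipping and vocabulary examples can be made concrete with a small sketch (our illustration, not taken from the paper), assuming a standard Bayesian treatment of a coin with a Beta(1, 1) prior on its bias: the epistemic uncertainty about the bias (posterior variance) shrinks as observations accumulate, while the aleatoric uncertainty of a fair coin (one bit of predictive entropy) remains irreducible.

```python
import math

def posterior_summary(heads, tails):
    """Posterior mean and variance of the coin bias under Beta(1+heads, 1+tails)."""
    a, b = 1 + heads, 1 + tails
    mean = a / (a + b)
    # Epistemic uncertainty: vanishes as the number of observations grows.
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def predictive_entropy(p):
    """Entropy (in bits) of the predicted outcome: aleatoric uncertainty."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

for n in (2, 20, 2000):  # a fair coin: half heads, half tails
    mean, var = posterior_summary(n // 2, n // 2)
    print(n, round(var, 6), round(predictive_entropy(mean), 3))
# The posterior variance shrinks with n; the predictive entropy stays at 1 bit.
```

The same separation underlies the reducible/irreducible distinction above: more coin flips reduce the variance term but can never push the entropy of a fair coin below one bit.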
In machine learning, where the agent is a learning algorithm, the two sources of uncer-
tainty are usually not distinguished. In some cases, such a distinction may indeed appear
unnecessary. For example, if an agent is forced to make a decision or prediction, the source
of its uncertainty—aleatoric or epistemic—might actually be irrelevant. This argument is
often put forward by Bayesians in favor of a purely probabilistic approach (and classical
Bayesian decision theory). One should note, however, that this scenario does not always
apply. Instead, the ultimate decision can often be refused or delayed, like in classifica-
tion with a reject option (Chow, 1970; Hellman, 1970), or actions can be taken that are
specifically directed at reducing uncertainty, like in active learning (Aggarwal et al., 2014).
Figure 1: Predictions by EfficientNet (Tan and Le, 2019) on test images from ImageNet: For the left image, the neural network predicts “typewriter keyboard” with certainty 83.14 %, for the right image “stone wall” with certainty 87.63 %.
Motivated by such scenarios, and advocating a trustworthy representation of uncertainty
in machine learning, Senge et al. (2014) explicitly refer to the distinction between aleatoric
and epistemic uncertainty. They propose a quantification of these uncertainties and show
the usefulness of their approach in the context of medical decision making. A very similar
motivation is given by Kull and Flach (2014) in the context of their work on reliability
maps. They distinguish between a predicted probability score and the uncertainty in
that prediction, and illustrate this distinction with an example from weather forecasting1:
“... a weather forecaster can be very certain that the chance of rain is 50 %; or her best
estimate at 20 % might be very uncertain due to lack of data.” Roughly, the 50 % chance
corresponds to what one may understand as aleatoric uncertainty, whereas the uncertainty
in the 20 % estimate is akin to the notion of epistemic uncertainty. On the more practical
side, Varshney and Alemzadeh (2016) give an example of a recent accident of a self-driving
car, which led to the death of the driver (for the first time after 130 million miles of testing).
They explain the car’s failure by the extremely rare circumstances, and emphasize “the importance of epistemic uncertainty or ‘uncertainty on uncertainty’ in these AI-assisted systems”.
More recently, a distinction between aleatoric and epistemic uncertainty has also been
advocated in the literature on deep learning (Kendall and Gal, 2017), where the limited
awareness of neural networks of their own competence has been demonstrated quite nicely.
For example, experiments on image classification have shown that a trained model does
often fail on specific instances, despite being very confident in its prediction (cf. Fig. 1).
Moreover, such models are often lacking robustness and can easily be fooled by “adversarial
examples” (Papernot and McDaniel, 2018): Drastic changes of a prediction may already
be provoked by minor, actually unimportant changes of an object. This problem has not
only been observed for images but also for other types of data, such as natural language
text (cf. Fig. 2 for an example).
This paper provides an overview of machine learning methods for handling uncertainty,
with a specific focus on the distinction between aleatoric and epistemic uncertainty in
1 The semantic interpretation of probability is arguably not very clear in examples of that kind: What exactly does it mean to have a 50 % chance of rain for the next day? This is not important for the point we want to make here, however.
There is really but one thing to say about this sorry movie It should never have been made The first one one of my favourites An American Werewolf in London is a great movie with a good plot good actors and good FX But this one It stinks to heaven with a cry of helplessness

There is really but one thing to say about that sorry movie It should never have been made The first one one of my favourites An American Werewolf in London is a great movie with a good plot good actors and good FX But this one It stinks to heaven with a cry of helplessness

Figure 2: Adversarial example (right) misclassified by a machine learning model trained on textual data: Changing only a single — and apparently not very important — word (highlighted in bold font) is enough to turn the correct prediction “negative sentiment” into the incorrect prediction “positive sentiment” (Sato et al., 2018).
Table 1: Notation used throughout the paper.
Notation                    Meaning
P, p                        probability measure, density or mass function
X, x, x_i                   instance space, instance
Y, y, y_i                   output space, outcome
H, h                        hypothesis space, hypothesis
V = V(H, D)                 version space
π(y)                        possibility/plausibility of outcome y
h(x)                        outcome y ∈ Y for instance x predicted by hypothesis h
p_h(y | x) = p(y | x, h)    probability of outcome y given x, predicted by h
ℓ                           loss function
h*, ĥ                       risk-minimizing hypothesis, empirical risk minimizer
f*                          pointwise Bayes predictor
E(X)                        expected value of random variable X
V(X)                        variance of random variable X
⟦·⟧                         indicator function
the common setting of supervised learning. In the next section, we provide a short intro-
duction to this setting and propose a distinction between different sources of uncertainty.
Section 3 elaborates on modeling epistemic uncertainty. More specifically, it considers
set-based modeling and modeling with (probability) distributions as two principled ap-
proaches to capturing the learner’s epistemic uncertainty about its main target, that is,
the (Bayes) optimal model within the hypothesis space. Concrete approaches for model-
ing and handling uncertainty in machine learning are then discussed in Section 4, prior
to concluding the paper in Section 5. In addition, Appendix A provides some general
background on uncertainty modeling (independent of applications in machine learning),
specifically focusing on the distinction between set-based and distributional (probabilistic)
representations and emphasizing the potential usefulness of combining them.
Table 1 summarizes some important notation. Let us note that, for the sake of readabil-
ity, a somewhat simplified notation will be used for probability measures and associated
distribution functions. Most of the time, these will be denoted by P and p, respectively,
even if they refer to different measures and distributions on different spaces (which will be
clear from the arguments). For example, we will write p(h) and p(y |x) for the probability
(density) of hypotheses h (as elements of the hypothesis space H) and outcomes y (as
elements of the output space Y), instead of using different symbols, such as pH(h) and
pY(y |x).
2 Sources of uncertainty in supervised learning
Uncertainty occurs in various facets in machine learning, and different settings and learning
problems will usually require a different handling from an uncertainty modeling point of
view. In this paper, we focus on the standard setting of supervised learning (cf. Fig. 3),
which we briefly recall in this section. Moreover, we identify different sources of (predictive)
uncertainty in this setting.
2.1 Supervised learning and predictive uncertainty
In supervised learning, a learner is given access to a set of training data
D := {(x_1, y_1), . . . , (x_N, y_N)} ⊂ X × Y ,   (1)
where X is an instance space and Y the set of outcomes that can be associated with
an instance. Typically, the training examples (xi, yi) are assumed to be independent and
identically distributed (i.i.d.) according to some unknown probability measure P on X×Y.
Given a hypothesis space H (consisting of hypotheses h : X −→ Y mapping instances x
to outcomes y) and a loss function ` : Y × Y −→ R, the goal of the learner is to induce a
hypothesis h∗ ∈ H with low risk (expected loss)
R(h) := ∫_{X×Y} ℓ(h(x), y) dP(x, y) .   (2)
Thus, given the training data D, the learner needs to “guess” a good hypothesis h. This
choice is commonly guided by the empirical risk
Remp(h) := (1/N) ∑_{i=1}^{N} ℓ(h(x_i), y_i) ,   (3)
i.e., the performance of a hypothesis on the training data. However, since Remp(h) is only an estimate of the true risk R(h), the hypothesis (empirical risk minimizer)
ĥ := argmin_{h ∈ H} Remp(h)   (4)
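As a concrete illustration of Eqs. (3) and (4), the following sketch (a toy example of our own, not code from the paper) carries out empirical risk minimization over a small finite hypothesis space of one-dimensional threshold classifiers under the 0/1 loss:

```python
# Hypothesis space H: threshold classifiers h_t(x) = 1 if x >= t else 0.
# Loss: 0/1 loss. The empirical risk minimizer is found by exhaustive search.

def zero_one_loss(y_pred, y_true):
    return 0 if y_pred == y_true else 1

def empirical_risk(h, data):
    """Remp(h) = (1/N) * sum of losses on the training data, Eq. (3)."""
    return sum(zero_one_loss(h(x), y) for x, y in data) / len(data)

def erm(thresholds, data):
    """Return the threshold classifier minimizing empirical risk, Eq. (4)."""
    hypotheses = [lambda x, t=t: int(x >= t) for t in thresholds]
    return min(hypotheses, key=lambda h: empirical_risk(h, data))

data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
h_hat = erm([0.0, 0.25, 0.5, 0.75, 1.0], data)
print(empirical_risk(h_hat, data))  # 0.0 (threshold 0.5 separates the data)
```

Exhaustive search is of course only feasible for such tiny hypothesis spaces; practical learners replace it with optimization over parameterized models.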
[Figure 3 (schematic): training data D and background knowledge are fed into a learning algorithm embodying an induction principle, which induces a model h = Ind(D) with h : X → Y; the model then produces predictions y = h(x) for test data x.]
Figure 3: The basic setting of supervised learning: A hypothesis h ∈ H is induced from the training data D and used to produce predictions for new query instances x ∈ X.
favored by the learner will normally not coincide with the true risk minimizer
h∗ := argmin_{h∈H} R(h) .    (5)
Correspondingly, there remains uncertainty regarding h∗ as well as the approximation
quality of h (in the sense of its proximity to h∗) and its true risk R(h).
Eventually, one is often interested in predictive uncertainty, i.e., the uncertainty related
to the prediction yq for a concrete query instance xq ∈ X . In other words, given a par-
tial observation (xq, ·), we are wondering what can be said about the missing outcome,
especially about the uncertainty related to a prediction of that outcome. Indeed, esti-
mating and quantifying uncertainty in a transductive way, in the sense of tailoring it for
individual instances, is arguably important and practically more relevant than a kind of
average accuracy or confidence, which is often reported in machine learning. In medical
diagnosis, for example, a patient will be interested in the reliability of a test result in her
particular case, not in the reliability of the test on average. This view is also expressed, for
example, by Kull and Flach (2014): “Being able to assess the reliability of a probability
score for each instance is much more powerful than assigning an aggregate reliability score
[...] independent of the instance to be classified.”
Emphasizing the transductive nature of the learning process, the learning task could also
be formalized as a problem of predictive inference as follows: Given a set (or sequence)
of data points (X1, Y1), . . . , (XN , YN ) and a query point XN+1, what is the associated
outcome YN+1? Here, the data points are considered as (realizations of) random variables²,
which are commonly assumed to be independent and identically distributed (i.i.d.). A
prediction could be given in the form of a point prediction YN+1 ∈ Y, but also (and
perhaps preferably) in the form of a predictive set C(XN+1) ⊆ Y that is likely to cover the
true outcome, for example an interval in the case of regression (cf. Sections 4.8 and 4.9).
Regarding the aspect of uncertainty, different properties of C(XN+1) might then be of
interest. For example, coming back to the discussion from the previous paragraph, a basic
distinction can be made between a statistical guarantee for marginal coverage, namely
P (YN+1 ∈ C(XN+1)) ≥ 1 − δ for a (small) threshold δ > 0, and conditional coverage,
² whence they are capitalized here
[Figure 4 graphic; its annotation table reads:]

                       point prediction    probability
    ground truth       f∗(x)               p(· | x)
    best possible      h∗(x)               p(· | x, h∗)
    induced predictor  h(x)                p(· | x, h)
Figure 4: Different types of uncertainties related to different types of discrepancies and approximation errors: f∗ is the pointwise Bayes predictor, h∗ is the best predictor within the hypothesis space, and h the predictor produced by the learning algorithm.
namely
P(YN+1 ∈ C(XN+1) | XN+1 = x) ≥ 1 − δ for (almost) all x ∈ X .
Roughly speaking, in the case of marginal coverage, one averages over both XN+1 and
YN+1 (i.e., the probability P refers to a joint measure on X × Y), while in the case of
conditional coverage, XN+1 is fixed and the average is taken over YN+1 only (Barber
et al., 2020). Note that predictive inference as defined here does not necessarily require
the induction of a hypothesis h in the form of a (global) map X −→ Y, i.e., the solution
of an induction problem. While it is true that transductive inference can be realized via
inductive inference (i.e., by inducing a hypothesis h first and then producing a prediction
YN+1 = h(XN+1) by applying this hypothesis to the query XN+1), one should keep in
mind that induction is a more difficult problem than transduction (Vapnik, 1998).
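The marginal coverage guarantee discussed above is exactly what split (inductive) conformal prediction provides under the i.i.d. (more precisely, exchangeability) assumption. The following is a minimal sketch of our own, not taken from the paper; the toy data and the pre-fitted predictor h(x) = 2x are illustrative assumptions standing in for a model trained on a separate split.

```python
import numpy as np

def split_conformal_interval(X_cal, y_cal, predict, x_query, delta=0.1):
    """Predictive interval C(x_query) with marginal coverage >= 1 - delta.

    predict: a point predictor h fitted on a separate training split.
    The interval is h(x) +/- q, where q is the ceil((n+1)(1-delta))-th
    smallest calibration residual |y_i - h(x_i)|.
    """
    residuals = np.abs(y_cal - predict(X_cal))
    n = len(residuals)
    k = int(np.ceil((n + 1) * (1 - delta)))  # rank of the conformal quantile
    q = np.sort(residuals)[min(k, n) - 1]
    center = predict(x_query)
    return center - q, center + q

# Toy calibration data for y = 2x + noise (illustrative numbers).
rng = np.random.default_rng(0)
X_cal = rng.uniform(0, 1, size=200)
y_cal = 2 * X_cal + rng.normal(0, 0.1, size=200)
lo, hi = split_conformal_interval(X_cal, y_cal, lambda x: 2 * x,
                                  x_query=0.5, delta=0.1)
print(lo, hi)  # a narrow interval around the true value 1.0
```

Note that the guarantee is marginal: averaged over calibration data and query, the interval covers YN+1 with probability at least 1 − δ; it says nothing about coverage at a fixed x.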
2.2 Sources of uncertainty
As the prediction yq constitutes the end of a process that consists of different learning and
approximation steps, all errors and uncertainties related to these steps may also contribute
to the uncertainty about yq (cf. Fig. 4):
• Since the dependency between X and Y is typically non-deterministic, the descrip-
tion of a new prediction problem in the form of an instance xq gives rise to a condi-
tional probability distribution
p(y | xq) = p(xq, y) / p(xq)    (6)
on Y, but it does normally not identify a single outcome y in a unique way. Thus,
even given full information in the form of the measure P (and its density p), un-
certainty about the actual outcome y remains. This uncertainty is of an aleatoric
nature. In some cases, the distribution (6) itself (called the predictive posterior dis-
tribution in Bayesian inference) might be delivered as a prediction. Yet, when being
forced to commit to point estimates, the best predictions (in the sense of minimiz-
ing the expected loss) are prescribed by the pointwise Bayes predictor f∗, which is
defined by
f∗(x) := argmin_{ŷ∈Y} ∫_Y ℓ(y, ŷ) dP(y | x)    (7)
for each x ∈ X .
• The Bayes predictor (5) does not necessarily coincide with the pointwise Bayes pre-
dictor (7). This discrepancy between h∗ and f∗ is connected to the uncertainty
regarding the right type of model to be fit, and hence the choice of the hypothesis
space H (which is part of what is called “background knowledge” in Fig. 3). We
shall refer to this uncertainty as model uncertainty³. Thus, due to this uncertainty,
one cannot guarantee that h∗(x) = f∗(x), or, in case the hypothesis h∗ (e.g., a
probabilistic classifier) delivers probabilistic predictions p(y |x, h∗) instead of point
predictions, that p(· |x, h∗) = p(· |x).
• The hypothesis h produced by the learning algorithm, for example the empirical risk
minimizer (4), is only an estimate of h∗, and the quality of this estimate strongly
depends on the quality and the amount of training data. We shall refer to the discrep-
ancy between h and h∗, i.e., the uncertainty about how well the former approximates
the latter, as approximation uncertainty.
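For a finite output space Y, the integral in (7) becomes a sum, and the pointwise Bayes predictor depends on the loss: under 0/1 loss it is the mode of p(· | x), while under squared error (for numeric outcomes) it is the value closest to the conditional mean. A small illustration of our own (the conditional distribution is made up):

```python
import numpy as np

# A hypothetical conditional distribution p(y | x) over Y = {0, 1, 2}.
ys = np.array([0, 1, 2])
p = np.array([0.2, 0.5, 0.3])

def bayes_predictor(loss):
    """argmin over candidate predictions of the expected loss under p(.|x)."""
    expected = [sum(pi * loss(y, yhat) for y, pi in zip(ys, p)) for yhat in ys]
    return int(ys[int(np.argmin(expected))])

zero_one = lambda y, yhat: float(y != yhat)
squared = lambda y, yhat: (y - yhat) ** 2

print(bayes_predictor(zero_one))  # 1: the mode of p(.|x)
print(bayes_predictor(squared))   # 1: closest candidate to the mean 1.1
```

Here both losses happen to yield the same prediction, but for a skewed distribution (e.g., p = (0.4, 0.1, 0.5)) the mode and the mean-based prediction would differ.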
2.3 Reducible versus irreducible uncertainty
As already said, one way to characterize uncertainty as aleatoric or epistemic is to ask
whether or not the uncertainty can be reduced through additional information: Aleatoric
uncertainty refers to the irreducible part of the uncertainty, which is due to the non-
deterministic nature of the sought input/output dependency, that is, to the stochastic
dependency between instances x and outcomes y, as expressed by the conditional prob-
ability (6). Model uncertainty and approximation uncertainty, on the other hand, are
subsumed under the notion of epistemic uncertainty, that is, uncertainty due to a lack of
knowledge about the perfect predictor (7), for example caused by uncertainty about the
parameters of a model. In principle, this uncertainty can be reduced.
This characterization, while evident at first sight, may appear somewhat blurry upon
closer inspection. What does “reducible” actually mean? An obvious source of additional
information is the training data D: The learner’s uncertainty can be reduced by observing
³ The notion of a model is not used in a unique way in the literature. Here, as already said, we essentially refer to the choice of the hypothesis space H, so a model could be a linear model or a Gaussian process. Sometimes, a single hypothesis h ∈ H is also called a model. Then, “model uncertainty” might be incorrectly understood as uncertainty about this hypothesis, for which we shall introduce the term “approximation uncertainty”.
Figure 5: Left: The two classes are overlapping, which causes (aleatoric) uncertainty in a certain region of the instance space. Right: By adding a second feature, and hence embedding the data in a higher-dimensional space, the two classes become separable, and the uncertainty can be resolved.
more data, while the setting of the learning problem — the instance space X , output space
Y, hypothesis space H, joint probability P on X × Y— remains fixed. In practice, this is
of course not always the case. Imagine, for example, that a learner can decide to extend
the description of instances by additional features, which essentially means replacing the
current instance space X by another space X ′. This change of the setting may have an
influence on uncertainty. An example is shown in Fig. 5: In a low-dimensional space
(here defined by a single feature x1), two class distributions are overlapping, which causes
(aleatoric) uncertainty in a certain region of the instance space. By embedding the data
in a higher-dimensional space (here accomplished by adding a second feature x2), the two
classes become separable, and the uncertainty can be resolved. More generally, embedding
data in a higher-dimensional space will reduce aleatoric and increase epistemic uncertainty,
because fitting a model will become more difficult and require more data.
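The effect illustrated in Fig. 5 is easy to reproduce on synthetic data: two classes that overlap heavily in a single feature x1 become (nearly) separable once a second, informative feature x2 is added. All distributional parameters below are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
# The two classes overlap heavily in feature x1 ...
x1_class0 = rng.normal(0.0, 1.0, n)
x1_class1 = rng.normal(1.0, 1.0, n)
# ... but a second feature x2 separates them cleanly.
x2_class0 = rng.normal(-3.0, 0.5, n)
x2_class1 = rng.normal(+3.0, 0.5, n)

# Threshold on x1 alone at the midpoint of the class means (0.5):
err_1d = np.mean(np.concatenate([x1_class0 > 0.5, x1_class1 <= 0.5]))
# Threshold on the new feature x2 at its midpoint (0.0):
err_2d = np.mean(np.concatenate([x2_class0 > 0.0, x2_class1 <= 0.0]))
print(err_1d, err_2d)  # substantial overlap error vs. (near) zero
```

The residual error in the one-dimensional setting is irreducible given that representation; after adding x2 it (almost) vanishes, i.e., what looked aleatoric in the old setting was epistemic relative to the richer one.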
What this example shows is that aleatoric and epistemic uncertainty should not be seen
as absolute notions. Instead, they are context-dependent in the sense of depending on the
setting (X ,Y,H, P ). Changing the context will also change the sources of uncertainty:
aleatoric may turn into epistemic uncertainty and vice versa. Consequently, by allowing
the learner to change the setting, the distinction between these two types of uncertainty
will be somewhat blurred (and their quantification will become even more difficult). This
view of the distinction between aleatoric and epistemic uncertainty is also shared by Der
Kiureghian and Ditlevsen (2009), who note that “these concepts only make unambiguous
sense if they are defined within the confines of a model of analysis”, and that “In one
model an addressed uncertainty may be aleatory, in another model it may be epistemic.”
2.4 Approximation and model uncertainty
Assuming the setting (X ,Y,H, P ) to be fixed, the learner’s lack of knowledge will essen-
tially depend on the amount of data it has seen so far: The larger the number N = |D| of
observations, the less ignorant the learner will be when having to make a new prediction.
In the limit, when N → ∞, a consistent learner will be able to identify h∗ (see Fig. 6 for
an illustration), i.e., it will get rid of its approximation uncertainty.
Figure 6: Left: Even with precise knowledge about the optimal hypothesis, the prediction at the query point (indicated by a question mark) is aleatorically uncertain, because the two classes are overlapping in that region. Right: A case of epistemic uncertainty due to a lack of knowledge about the right hypothesis, which is in turn caused by a lack of data.
What is (implicitly) assumed here is a correctly specified hypothesis space H, such that
f∗ ∈ H. In other words, model uncertainty is simply ignored. For obvious reasons, this
uncertainty is very difficult to capture, let alone quantify. In a sense, a kind of meta-
analysis would be required: Instead of expressing uncertainty about the ground-truth
hypothesis h within a hypothesis space H, one has to express uncertainty about which H among a set of candidate hypothesis spaces might be the right one. Practically, this kind
of analysis does not appear to be feasible. On the other hand, simply assuming a correctly
specified hypothesis space H actually means neglecting the risk of model misspecification.
To some extent, this appears to be unavoidable, however. In fact, the learning itself as
well as all sorts of inference from the data are normally done under the assumption that
the model is valid. Otherwise, since some assumptions are indeed always needed, it will
be difficult to derive any useful conclusions.
As an aside, let us note that these assumptions, in addition to the nature of the ground
truth f∗, also include other (perhaps more implicit) assumptions about the setting and the
data-generating process. For example, imagine that a sudden change of the distribution, or a strong discrepancy between training and test data (Malinin and Gales, 2018), cannot be excluded. Not only prediction but also the assessment of uncertainty would then become difficult, if not impossible. Indeed, if one cannot exclude that something completely unpredictable happens, there is hardly any way to reduce predictive uncertainty. To
take a simple example, (epistemic) uncertainty about the bias of a coin can be estimated
and quantified, for example in the form of a confidence interval, from an i.i.d. sequence
of coin tosses. But what if the bias may change from one moment to the other (e.g.,
because the coin is replaced by another one), or another outcome becomes possible (e.g.,
“invalid” if the coin has not been tossed in agreement with a new execution rule)? While
this example may appear a bit artificial, there are indeed practical classification problems
in which certain classes y ∈ Y may “disappear” while new classes emerge, i.e., in which
Y may change in the course of time. Likewise, non-stationarity of the data-generating
process (the measure P ), including the possibility of drift or shift of the distribution, is a
common assumption in learning on data streams (Gama, 2012).
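The coin example above can be made concrete. The sketch below (an illustration of our own, using the standard Wald normal-approximation interval) estimates the bias of a coin from i.i.d. tosses and shows how the confidence interval, i.e., the quantified epistemic uncertainty, shrinks as the number of tosses grows:

```python
import math
import random

random.seed(1)

def coin_ci(n, p_true=0.7, z=1.96):
    """95% Wald confidence interval for the bias of a coin from n i.i.d. tosses."""
    heads = sum(random.random() < p_true for _ in range(n))
    p_hat = heads / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# Epistemic uncertainty about the bias is reducible: the interval narrows with n ...
widths = {}
for n in (10, 100, 10000):
    lo, hi = coin_ci(n)
    widths[n] = hi - lo
    print(f"n={n:6d}: [{lo:.3f}, {hi:.3f}]  width={widths[n]:.3f}")
# ... but only under the assumption that the process is stationary and i.i.d.;
# if the coin is replaced mid-sequence, the interval is no longer meaningful.
```

This also makes the limits of such quantification visible: the whole construction rests on assumptions (stationarity, a fixed outcome space) that cannot themselves be validated from within the model.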
Coming back to the assumptions about the hypothesis space H, the latter is actually very
large for some learning methods, such as nearest neighbor classification or (deep) neural
networks. Thus, the learner has a high capacity (or “universal approximation” capability)
and can express hypotheses in a very flexible way. In such cases, h∗ = f∗ or at least
h∗ ≈ f∗ can safely be assumed. In other words, since the model assumptions are so weak,
model uncertainty essentially disappears (at least when disregarding or taking for granted
other assumptions as discussed above). Yet, the approximation uncertainty still remains
a source of epistemic uncertainty. In fact, this uncertainty tends to be high for methods
like nearest neighbors or neural networks, especially if data is sparse.
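For a method like nearest neighbors, one crude and purely illustrative indicator of this approximation uncertainty is the distance from the query to the nearest training example: a prediction far from all observed data rests on little evidence. The sketch below (our own toy example; the distance proxy is an assumption, not a method from the paper) shows a 1-NN prediction together with this proxy:

```python
import math

# Toy training set with two well-separated classes.
train = [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((1.0, 1.0), 1), ((0.9, 1.1), 1)]

def nn_predict(q):
    # 1-NN prediction; the distance to the nearest example serves as a crude
    # (assumed) proxy for epistemic uncertainty in sparse regions.
    dist, label = min((math.dist(q, x), y) for x, y in train)
    return label, dist

label_near, d_near = nn_predict((0.05, 0.1))   # close to the class-0 examples
label_far, d_far = nn_predict((5.0, 5.0))      # far from all training data
print(label_near, round(d_near, 3), label_far, round(d_far, 3))
```

The classifier produces a label in both cases, but in the second case the prediction is essentially an extrapolation into a region the data says nothing about.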
In the next section, we recall two specific though arguably natural and important ap-
proaches for capturing this uncertainty, namely version space learning and Bayesian in-
ference. In version space learning, uncertainty about h∗ is represented in terms of a set
of possible candidates, whereas in Bayesian learning, this uncertainty is modeled in terms
of a probability distribution on H. In both cases, an explicit distinction is made between
the uncertainty about h∗, and how this uncertainty translates into uncertainty about the
outcome for a query xq.
3 Modeling approximation uncertainty: Set-based versus
distributional representations
Bayesian inference can be seen as the main representative of probabilistic methods and
provides a coherent framework for statistical reasoning that is well-established in machine
learning (and beyond). Version space learning can be seen as a “logical” (and in a sense
simplified) counterpart of Bayesian inference, in which hypotheses and predictions are
not assessed numerically in terms of probabilities, but only qualified (deterministically)
as being possible or impossible. In spite of its limited practical usefulness, version space
learning is interesting for various reasons. In particular, in light of our discussion about
uncertainty, it constitutes an interesting case: By construction, version space learning is
free of aleatoric uncertainty, i.e., all uncertainty is epistemic.
3.1 Version space learning
In the idealized setting of version space learning, we assume a deterministic dependency
f∗ : X −→ Y, i.e., the distribution (6) degenerates to
p(y | xq) = 1 if y = f∗(xq), and 0 if y ≠ f∗(xq)    (8)
Moreover, the training data (1) is free of noise. Correspondingly, we also assume that
classifiers produce deterministic predictions h(x) ∈ {0, 1} in the form of probabilities 0 or
1. Finally, we assume that f∗ ∈ H, and therefore h∗ = f∗ (which means there is no model
uncertainty).
Under these assumptions, a hypothesis h ∈ H can be eliminated as a candidate as soon as
it makes at least one mistake on the training data: in that case, the risk of h is necessarily
higher than the risk of h∗ (which is 0). The idea of the candidate elimination algorithm
(Mitchell, 1977) is to maintain the version space V ⊆ H that consists of the set of all
hypotheses consistent with the data seen so far:
V = V(H, D) := { h ∈ H | h(xi) = yi for i = 1, . . . , N }    (9)
Obviously, the version space is shrinking with an increasing amount of training data, i.e.,
V(H, D′) ⊆ V(H, D) for D ⊆ D′. If a prediction yq for a query instance xq is sought, this query is submitted to all members
h ∈ V of the version space. Obviously, a unique prediction can only be made if all
members agree on the outcome of xq. Otherwise, several outcomes y ∈ Y may still appear
possible. Formally, mimicking the logical conjunction with the minimum operator and
the existential quantification with a maximum, we can express the degree of possibility or
plausibility of an outcome y ∈ Y as follows (⟦·⟧ denotes the indicator function):
π(y) := max_{h ∈ H} min( ⟦h ∈ V⟧, ⟦h(xq) = y⟧ )    (10)
Thus, π(y) = 1 if there exists a candidate hypothesis h ∈ V such that h(xq) = y, and
π(y) = 0 otherwise. In other words, the prediction produced in version space learning is
a subset
Y = Y(xq) := { h(xq) | h ∈ V } = { y | π(y) = 1 } ⊆ Y    (11)
See Fig. 7 for an illustration.
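A minimal executable sketch of this scheme (with a toy hypothesis space of our own choosing, namely one-dimensional threshold classifiers h_t(x) = [x ≥ t], rather than any setting from the paper) maintains the version space of Eq. (9) by candidate elimination and returns the set-valued prediction of Eq. (11):

```python
# H: finite hypothesis space of threshold classifiers h_t(x) = [x >= t].
thresholds = [i / 10 for i in range(0, 21)]           # t in {0.0, 0.1, ..., 2.0}

def h(t, x):
    return int(x >= t)

# Noise-free training data, consistent with any threshold 0.4 < t <= 1.5.
D = [(0.2, 0), (0.4, 0), (1.5, 1), (1.8, 1)]

# Candidate elimination: keep only hypotheses consistent with all examples.
V = [t for t in thresholds if all(h(t, x) == y for x, y in D)]

def possible_outcomes(xq):
    """Set-valued prediction Y(xq) = {h(xq) | h in V}."""
    return {h(t, xq) for t in V}

print(possible_outcomes(0.1))   # all members of V agree -> unique prediction {0}
print(possible_outcomes(1.0))   # members of V disagree  -> {0, 1}
```

At a query where all surviving hypotheses agree, the prediction is unique; where they disagree, the set {0, 1} expresses pure epistemic uncertainty, exactly the situation depicted in Fig. 7.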
Note that the inference (10) can be seen as a kind of constraint propagation, in which the
constraint h ∈ V on the hypothesis space H is propagated to a constraint on Y, expressed
in the form of the subset (11) of possible outcomes; or symbolically:
H, D, xq |= Y    (12)
This view highlights the interaction between prior knowledge and data: It shows that what
can be said about the possible outcomes yq not only depends on the data D but also on the
hypothesis space H, i.e., the model assumptions the learner starts with. The specification
of H always comes with an inductive bias, which is indeed essential for learning from data
(Mitchell, 1980). In general, both aleatoric and epistemic uncertainty (ignorance) depend
on the way in which prior knowledge and data interact with each other. Roughly speaking,
the stronger the knowledge the learning process starts with, the less data is needed to
resolve uncertainty. In the extreme case, the true model is already known, and data is
[Figure 7: illustration of version space learning, showing the hypothesis space H, the training data D, the version space V, and the predictions h(xq), h′(xq), h″(xq) of different candidate hypotheses, with possible outcomes y1, y2, y3, y4.]
sha1_base64="wwzy1NVnanA6HIdJ9o7g41bD4u0=">AAACf3icbVHLattAFB2rr1R95bHsRtSUdmWkEGi7C3TTpUvqJGCJcDW64wyZh5i5SjGDPqHb9tv6NxnZDrhOLwwczjn3NbdulfSU539HyaPHT54+23uevnj56vWb/YPDc287x3HGrbLusgaPShqckSSFl61D0LXCi/rm66Bf3KLz0poftGyx0rAwUkgOFKmzPE2v9sf5JF9F9hAUGzBmm5heHYxE2VjeaTTEFXg/L/KWqgCOJFfYp2XnsQV+AwucR2hAo6/CatY+ex+ZJhPWxWcoW7HbGQG090tdR6cGuva72kD+T5t3JD5XQZq2IzR83Uh0KiObDYtnjXTISS0jAO5knDXj1+CAU/yetHRo8Ce3WoNpQilAS7VsUECnqA+lF/c4Lbd9HqmfF1Uoh3k4qDAu+n632C3ytam2qhl2s/e+bdfU2bpfF6pFmEY5XqbYvcNDcH48KfJJ8f14fHqyudEee8vesY+sYJ/YKfvGpmzGOFuwX+w3+5OMkg/JJMnX1mS0yTli/0Ty5Q4ERcVR</latexit>
1<latexit sha1_base64="PzJWBNa18jQkdid1u3Qskr88ES0=">AAACf3icbVHLahsxFJWnTZtOk+a17GaoKc3KjEIg7S6QTZcuqZOAZwh3NFeOiB6DpEkxYj4h2/bb+jfR2A64Ti8IDuec+9KtGimcz/O/g+TV6603b7ffpe93dj/s7R8cXjnTWoYTZqSxNxU4lELjxAsv8aaxCKqSeF3dX/T69QNaJ4z+6ecNlgpmWnDBwEfqkqbp7f4wH+WLyF4CugJDsorx7cGAF7VhrULtmQTnpjRvfBnAesEkdmnROmyA3cMMpxFqUOjKsJi1yz5Hps64sfFpny3Y9YwAyrm5qqJTgb9zm1pP/k+btp5/LYPQTetRs2Uj3srMm6xfPKuFReblPAJgVsRZM3YHFpiP35MWFjX+YkYp0HUoOCgh5zVyaKXvQuH4M06LdZ9D301pGYp+HgYyDGnXbRZ7QLY0VUbW/W7m2bfuGltTdctCFQ/jKMfL0M07vARXJyOaj+iPk+H56epG2+Qj+USOCSVn5Jx8J2MyIYzMyCP5Tf4kg+RLMkrypTUZrHKOyD+RfHsCBmDFUg==</latexit><latexit sha1_base64="PzJWBNa18jQkdid1u3Qskr88ES0=">AAACf3icbVHLahsxFJWnTZtOk+a17GaoKc3KjEIg7S6QTZcuqZOAZwh3NFeOiB6DpEkxYj4h2/bb+jfR2A64Ti8IDuec+9KtGimcz/O/g+TV6603b7ffpe93dj/s7R8cXjnTWoYTZqSxNxU4lELjxAsv8aaxCKqSeF3dX/T69QNaJ4z+6ecNlgpmWnDBwEfqkqbp7f4wH+WLyF4CugJDsorx7cGAF7VhrULtmQTnpjRvfBnAesEkdmnROmyA3cMMpxFqUOjKsJi1yz5Hps64sfFpny3Y9YwAyrm5qqJTgb9zm1pP/k+btp5/LYPQTetRs2Uj3srMm6xfPKuFReblPAJgVsRZM3YHFpiP35MWFjX+YkYp0HUoOCgh5zVyaKXvQuH4M06LdZ9D301pGYp+HgYyDGnXbRZ7QLY0VUbW/W7m2bfuGltTdctCFQ/jKMfL0M07vARXJyOaj+iPk+H56epG2+Qj+USOCSVn5Jx8J2MyIYzMyCP5Tf4kg+RLMkrypTUZrHKOyD+RfHsCBmDFUg==</latexit><latexit sha1_base64="PzJWBNa18jQkdid1u3Qskr88ES0=">AAACf3icbVHLahsxFJWnTZtOk+a17GaoKc3KjEIg7S6QTZcuqZOAZwh3NFeOiB6DpEkxYj4h2/bb+jfR2A64Ti8IDuec+9KtGimcz/O/g+TV6603b7ffpe93dj/s7R8cXjnTWoYTZqSxNxU4lELjxAsv8aaxCKqSeF3dX/T69QNaJ4z+6ecNlgpmWnDBwEfqkqbp7f4wH+WLyF4CugJDsorx7cGAF7VhrULtmQTnpjRvfBnAesEkdmnROmyA3cMMpxFqUOjKsJi1yz5Hps64sfFpny3Y9YwAyrm5qqJTgb9zm1pP/k+btp5/LYPQTetRs2Uj3srMm6xfPKuFReblPAJgVsRZM3YHFpiP35MWFjX+YkYp0HUoOCgh5zVyaKXvQuH4M06LdZ9D301pGYp+HgYyDGnXbRZ7QLY0VUbW/W7m2bfuGltTdctCFQ/jKMfL0M07vARXJyOaj+iPk+H56epG2+Qj+USOCSVn5Jx8J2MyIYzMyCP5Tf4kg+RLMkrypTUZrHKOyD+RfHsCBmDFUg==</latexit><latexit 
sha1_base64="PzJWBNa18jQkdid1u3Qskr88ES0=">AAACf3icbVHLahsxFJWnTZtOk+a17GaoKc3KjEIg7S6QTZcuqZOAZwh3NFeOiB6DpEkxYj4h2/bb+jfR2A64Ti8IDuec+9KtGimcz/O/g+TV6603b7ffpe93dj/s7R8cXjnTWoYTZqSxNxU4lELjxAsv8aaxCKqSeF3dX/T69QNaJ4z+6ecNlgpmWnDBwEfqkqbp7f4wH+WLyF4CugJDsorx7cGAF7VhrULtmQTnpjRvfBnAesEkdmnROmyA3cMMpxFqUOjKsJi1yz5Hps64sfFpny3Y9YwAyrm5qqJTgb9zm1pP/k+btp5/LYPQTetRs2Uj3srMm6xfPKuFReblPAJgVsRZM3YHFpiP35MWFjX+YkYp0HUoOCgh5zVyaKXvQuH4M06LdZ9D301pGYp+HgYyDGnXbRZ7QLY0VUbW/W7m2bfuGltTdctCFQ/jKMfL0M07vARXJyOaj+iPk+H56epG2+Qj+USOCSVn5Jx8J2MyIYzMyCP5Tf4kg+RLMkrypTUZrHKOyD+RfHsCBmDFUg==</latexit>
possibility of outcomes<latexit sha1_base64="gNnK7SGUSgKtI8SCmtpsJO7uJr8=">AAACl3icbVHLahsxFJWnj6TTl9OuQjeiptCVmQmBdpdAoWTpQJ0EPIORNFeJiB6DdCdlGIZ+Tbft9/RvqrEdcJ1eEBzOObpPXmsVMMv+jJJHj5883dt/lj5/8fLV6/HBm4vgGi9gLpx2/oqzAFpZmKNCDVe1B2a4hkt++2XQL+/AB+XsN2xrKA27tkoqwTBSy/Fh7UJQXGmFLXWSugaFMxDSdDmeZNNsFfQhyDdgQjYxWx6MZFE50RiwKDQLYZFnNZYd86iEhj4tmgA1E7fsGhYRWhbrlN1qhp5+iExFpfPxWaQrdvtHx0wIreHRaRjehF1tIP+nLRqUn8tO2bpBsGJdSDaaoqPDQmilPAjUbQRMeBV7peKGeSYwri0tPFj4HldimK26QjKjdFuBZI3GviuCvMdpse0LgP0iL7ti6Ecw3U3yvt9NdgdibeJOV8Ns7t637Zp5x/t1Ii67WZTjZfLdOzwEF0fTPJvm50eT0+PNjfbJO/KefCQ5+UROyRmZkTkR5Af5SX6R38lhcpJ8Tc7W1mS0+fOW/BPJ+V9Ees9t</latexit><latexit sha1_base64="gNnK7SGUSgKtI8SCmtpsJO7uJr8=">AAACl3icbVHLahsxFJWnj6TTl9OuQjeiptCVmQmBdpdAoWTpQJ0EPIORNFeJiB6DdCdlGIZ+Tbft9/RvqrEdcJ1eEBzOObpPXmsVMMv+jJJHj5883dt/lj5/8fLV6/HBm4vgGi9gLpx2/oqzAFpZmKNCDVe1B2a4hkt++2XQL+/AB+XsN2xrKA27tkoqwTBSy/Fh7UJQXGmFLXWSugaFMxDSdDmeZNNsFfQhyDdgQjYxWx6MZFE50RiwKDQLYZFnNZYd86iEhj4tmgA1E7fsGhYRWhbrlN1qhp5+iExFpfPxWaQrdvtHx0wIreHRaRjehF1tIP+nLRqUn8tO2bpBsGJdSDaaoqPDQmilPAjUbQRMeBV7peKGeSYwri0tPFj4HldimK26QjKjdFuBZI3GviuCvMdpse0LgP0iL7ti6Ecw3U3yvt9NdgdibeJOV8Ns7t637Zp5x/t1Ii67WZTjZfLdOzwEF0fTPJvm50eT0+PNjfbJO/KefCQ5+UROyRmZkTkR5Af5SX6R38lhcpJ8Tc7W1mS0+fOW/BPJ+V9Ees9t</latexit><latexit sha1_base64="gNnK7SGUSgKtI8SCmtpsJO7uJr8=">AAACl3icbVHLahsxFJWnj6TTl9OuQjeiptCVmQmBdpdAoWTpQJ0EPIORNFeJiB6DdCdlGIZ+Tbft9/RvqrEdcJ1eEBzOObpPXmsVMMv+jJJHj5883dt/lj5/8fLV6/HBm4vgGi9gLpx2/oqzAFpZmKNCDVe1B2a4hkt++2XQL+/AB+XsN2xrKA27tkoqwTBSy/Fh7UJQXGmFLXWSugaFMxDSdDmeZNNsFfQhyDdgQjYxWx6MZFE50RiwKDQLYZFnNZYd86iEhj4tmgA1E7fsGhYRWhbrlN1qhp5+iExFpfPxWaQrdvtHx0wIreHRaRjehF1tIP+nLRqUn8tO2bpBsGJdSDaaoqPDQmilPAjUbQRMeBV7peKGeSYwri0tPFj4HldimK26QjKjdFuBZI3GviuCvMdpse0LgP0iL7ti6Ecw3U3yvt9NdgdibeJOV8Ns7t637Zp5x/t1Ii67WZTjZfLdOzwEF0fTPJvm50eT0+PNjfbJO/KefCQ5+UROyRmZkTkR5Af5SX6R38lhcpJ8Tc7W1mS0+fOW/BPJ+V9Ees9t</latexit><latexit 
sha1_base64="gNnK7SGUSgKtI8SCmtpsJO7uJr8=">AAACl3icbVHLahsxFJWnj6TTl9OuQjeiptCVmQmBdpdAoWTpQJ0EPIORNFeJiB6DdCdlGIZ+Tbft9/RvqrEdcJ1eEBzOObpPXmsVMMv+jJJHj5883dt/lj5/8fLV6/HBm4vgGi9gLpx2/oqzAFpZmKNCDVe1B2a4hkt++2XQL+/AB+XsN2xrKA27tkoqwTBSy/Fh7UJQXGmFLXWSugaFMxDSdDmeZNNsFfQhyDdgQjYxWx6MZFE50RiwKDQLYZFnNZYd86iEhj4tmgA1E7fsGhYRWhbrlN1qhp5+iExFpfPxWaQrdvtHx0wIreHRaRjehF1tIP+nLRqUn8tO2bpBsGJdSDaaoqPDQmilPAjUbQRMeBV7peKGeSYwri0tPFj4HldimK26QjKjdFuBZI3GviuCvMdpse0LgP0iL7ti6Ecw3U3yvt9NdgdibeJOV8Ns7t637Zp5x/t1Ii67WZTjZfLdOzwEF0fTPJvm50eT0+PNjfbJO/KefCQ5+UROyRmZkTkR5Af5SX6R38lhcpJ8Tc7W1mS0+fOW/BPJ+V9Ees9t</latexit>
Figure 7: Illustration of version space learning inference. The version space V, which is the subset of hypotheses consistent with the data seen so far, represents the state of knowledge of the learner. For a query xq ∈ X, the set of possible predictions is given by the set Y = {h(xq) | h ∈ V}. The distribution on the right is the characteristic function π : Y −→ {0, 1} of this set, which can be interpreted as a {0, 1}-valued possibility distribution (indicating whether y is a possible outcome or not, i.e., π(y) = ⟦y ∈ Y⟧).
completely superfluous. Normally, however, prior knowledge is specified by assuming a
certain type of model, for example a linear relationship between inputs x and outputs y.
Then, all else (namely the data) being equal, the degree of predictive uncertainty depends
on how flexible the corresponding model class is. Informally speaking, the more restrictive
the model assumptions are, the smaller the uncertainty will be. This is illustrated in Fig. 8
for the case of binary classification.
Coming back to our discussion about uncertainty, it is clear that version space learning
as outlined above does not involve any kind of aleatoric uncertainty. Instead, the only
source of uncertainty is a lack of knowledge about h∗, and hence of epistemic nature. On
the model level, the amount of uncertainty is in direct correspondence with the size of
the version space V and reduces with an increasing sample size. Likewise, the predictive
uncertainty could be measured in terms of the size of the set (11) of candidate outcomes.
Obviously, this uncertainty may differ from instance to instance, or, stated differently,
approximation uncertainty may translate into prediction uncertainty in different ways.
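The set-based inference just described can be made concrete with a minimal sketch. The example below uses a toy one-dimensional hypothesis space of threshold classifiers (all names and data are illustrative, not from the paper): the version space V collects the hypotheses consistent with the training data, and the prediction set Y(xq) = {h(xq) | h ∈ V} then reflects purely epistemic uncertainty, which indeed differs from query to query.

```python
# Sketch of version space learning on a toy 1-D binary classification problem.
# Hypotheses are threshold classifiers h_t(x) = 1 if x >= t else 0.

def predict(t, x):
    """Threshold classifier h_t."""
    return 1 if x >= t else 0

# Hypothesis space H: thresholds on a coarse grid 0.0, 0.1, ..., 10.0.
H = [t / 10 for t in range(0, 101)]

# Training data D; any threshold in (3, 7] is consistent with it.
D = [(2.0, 0), (3.0, 0), (7.0, 1), (9.0, 1)]

# Version space V: hypotheses consistent with all observations seen so far.
V = [t for t in H if all(predict(t, x) == y for x, y in D)]

def prediction_set(x_q):
    """Set of possible outcomes Y(x_q) = {h(x_q) | h in V}."""
    return {predict(t, x_q) for t in V}

print(prediction_set(1.0))  # {0}: all h in V agree, no predictive uncertainty
print(prediction_set(5.0))  # {0, 1}: members of V disagree on this query
print(prediction_set(9.5))  # {1}
```

As the training data grows, V shrinks, and more queries receive a singleton prediction set; queries falling "between" the observed classes keep the maximally uncertain set {0, 1}.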
In version space learning, uncertainty is represented in a purely set-based manner: the
version space V and prediction set Y (xq) are subsets of H and Y, respectively. In other
words, hypotheses h ∈ H and outcomes y ∈ Y are only qualified in terms of being possible
or not. In the following, we discuss the Bayesian approach, in which hypotheses and
predictions are qualified more gradually in terms of probabilities.
3.2 Bayesian inference
Consider a hypothesis space H consisting of probabilistic predictors, that is, hypotheses h
that deliver probabilistic predictions ph(y |x) = p(y |x, h) of outcomes y given an instance
x. In the Bayesian approach, H is supposed to be equipped with a prior distribution p(·),
Figure 8: Two examples illustrating predictive uncertainty in version space learning for the case of binary classification. In the first example (above), the hypothesis space is given in terms of rectangles (assigning interior instances to the positive class and instances outside to the negative class). Both pictures show some of the (infinitely many) elements of the version space. Top left: Restricting H to axis-parallel rectangles, the first query point is necessarily positive, the second one necessarily negative, because these predictions are produced by all h ∈ V. As opposed to this, both positive and negative predictions can be produced for the third query. Top right: Increasing flexibility (or weakening prior knowledge) by enriching the hypothesis space and also allowing for non-axis-parallel rectangles, none of the queries can be classified with certainty anymore. In the second example (below), hypotheses are linear and quadratic discriminant functions, respectively. Bottom left: If the hypothesis space is given by linear discriminant functions, all hypotheses consistent with the data will predict the query instance as positive. Bottom right: Enriching the hypothesis space with quadratic discriminants, the members of the version space will no longer vote unanimously: some of them predict the positive and others the negative class.
[Figure: Illustration of Bayesian inference. Labels recovered from the figure: hypothesis space H; posterior p(h | D) induced by the training data D; predictions h(xq) and h′(xq) of individual hypotheses; predictive posterior over the outcomes y1, y2, y3, y4.]
sha1_base64="L9eBaoJfL28BitJJTdfr4VMQKmc=">AAAClnicbVHbbtQwEPWGWwm3LbxU4iViiygvq6RCgidUCSF4DBLbVtpEq4kz7lr1JdhOYWWFr+EV/oe/wdks0rJlJMtH55wZj2eqRnDr0vT3KLpx89btO3t343v3Hzx8NN5/fGp1ayjOqBbanFdgUXCFM8edwPPGIMhK4Fl1+a7Xz67QWK7VZ7dqsJRwoTjjFFygFuODw+WLo6LSorYrGS7/rVt8eXkYx4vxJJ2m60iug2wDJmQT+WJ/xIpa01aiclSAtfMsbVzpwThOBXZx0VpsgF7CBc4DVCDRln79hS55Hpg6YdqEo1yyZrczPEjbdxicEtzS7mo9+T9t3jr2pvRcNa1DRYeHWCsSp5N+HknNDVInVgEANTz0mtAlGKAuTC0uDCr8SrWUoGpfMJBcrGpk0ArX+cKyvzgutn0WXTfPSl/0/VAQfpJ13W6xK6SDaWv6g2/blRtddUOhivk8yGEz2e4eroPT42mWTrNPx5OTV5sd7ZGn5Bk5Ihl5TU7IR5KTGaHkO/lBfpJf0UH0NnoffRis0WiT84T8E1H+B14lzh8=</latexit>
h00(xq)<latexit sha1_base64="4oRPPou/ehHA6NCRlKVTihG66yA=">AAACl3icbVHbatwwENW6t9S9bdqn0BfTTUn6stihkL41UCh53EA3CazNMpZHWRFdXElOugjTr+lr+z39m8rrLWw3HRA6nHNmNJopa8GtS9Pfg+je/QcPH+08jp88ffb8xXD35bnVjaE4pVpoc1mCRcEVTh13Ai9rgyBLgRfl9adOv7hBY7lWX9yyxkLCleKMU3CBmg/39hcHB4d5qUVllzJc/ls7//puP47nw1E6TleR3AXZGozIOibz3QHLK00bicpRAdbOsrR2hQfjOBXYxnljsQZ6DVc4C1CBRFv41R/a5G1gqoRpE45yyYrdzPAgbddhcEpwC7utdeT/tFnj2IfCc1U3DhXtH2KNSJxOuoEkFTdInVgGANTw0GtCF2CAujC2ODeo8JZqKUFVPmcguVhWyKARrvW5ZX9xnG/6LLp2lhU+7/qhIPwoa9vtYjdIe9PG9HvfpmtidNn2hUrmJ0EOm8m293AXnB+Ns3ScnR2NTt6vd7RDXpM35JBk5JickFMyIVNCyXfyg/wkv6K96GP0OTrtrdFgnfOK/BPR2R/ZoM5Q</latexit><latexit sha1_base64="4oRPPou/ehHA6NCRlKVTihG66yA=">AAACl3icbVHbatwwENW6t9S9bdqn0BfTTUn6stihkL41UCh53EA3CazNMpZHWRFdXElOugjTr+lr+z39m8rrLWw3HRA6nHNmNJopa8GtS9Pfg+je/QcPH+08jp88ffb8xXD35bnVjaE4pVpoc1mCRcEVTh13Ai9rgyBLgRfl9adOv7hBY7lWX9yyxkLCleKMU3CBmg/39hcHB4d5qUVllzJc/ls7//puP47nw1E6TleR3AXZGozIOibz3QHLK00bicpRAdbOsrR2hQfjOBXYxnljsQZ6DVc4C1CBRFv41R/a5G1gqoRpE45yyYrdzPAgbddhcEpwC7utdeT/tFnj2IfCc1U3DhXtH2KNSJxOuoEkFTdInVgGANTw0GtCF2CAujC2ODeo8JZqKUFVPmcguVhWyKARrvW5ZX9xnG/6LLp2lhU+7/qhIPwoa9vtYjdIe9PG9HvfpmtidNn2hUrmJ0EOm8m293AXnB+Ns3ScnR2NTt6vd7RDXpM35JBk5JickFMyIVNCyXfyg/wkv6K96GP0OTrtrdFgnfOK/BPR2R/ZoM5Q</latexit><latexit sha1_base64="4oRPPou/ehHA6NCRlKVTihG66yA=">AAACl3icbVHbatwwENW6t9S9bdqn0BfTTUn6stihkL41UCh53EA3CazNMpZHWRFdXElOugjTr+lr+z39m8rrLWw3HRA6nHNmNJopa8GtS9Pfg+je/QcPH+08jp88ffb8xXD35bnVjaE4pVpoc1mCRcEVTh13Ai9rgyBLgRfl9adOv7hBY7lWX9yyxkLCleKMU3CBmg/39hcHB4d5qUVllzJc/ls7//puP47nw1E6TleR3AXZGozIOibz3QHLK00bicpRAdbOsrR2hQfjOBXYxnljsQZ6DVc4C1CBRFv41R/a5G1gqoRpE45yyYrdzPAgbddhcEpwC7utdeT/tFnj2IfCc1U3DhXtH2KNSJxOuoEkFTdInVgGANTw0GtCF2CAujC2ODeo8JZqKUFVPmcguVhWyKARrvW5ZX9xnG/6LLp2lhU+7/qhIPwoa9vtYjdIe9PG9HvfpmtidNn2hUrmJ0EOm8m293AXnB+Ns3ScnR2NTt6vd7RDXpM35JBk5JickFMyIVNCyXfyg/wkv6K96GP0OTrtrdFgnfOK/BPR2R/ZoM5Q</latexit><latexit 
sha1_base64="4oRPPou/ehHA6NCRlKVTihG66yA=">AAACl3icbVHbatwwENW6t9S9bdqn0BfTTUn6stihkL41UCh53EA3CazNMpZHWRFdXElOugjTr+lr+z39m8rrLWw3HRA6nHNmNJopa8GtS9Pfg+je/QcPH+08jp88ffb8xXD35bnVjaE4pVpoc1mCRcEVTh13Ai9rgyBLgRfl9adOv7hBY7lWX9yyxkLCleKMU3CBmg/39hcHB4d5qUVllzJc/ls7//puP47nw1E6TleR3AXZGozIOibz3QHLK00bicpRAdbOsrR2hQfjOBXYxnljsQZ6DVc4C1CBRFv41R/a5G1gqoRpE45yyYrdzPAgbddhcEpwC7utdeT/tFnj2IfCc1U3DhXtH2KNSJxOuoEkFTdInVgGANTw0GtCF2CAujC2ODeo8JZqKUFVPmcguVhWyKARrvW5ZX9xnG/6LLp2lhU+7/qhIPwoa9vtYjdIe9PG9HvfpmtidNn2hUrmJ0EOm8m293AXnB+Ns3ScnR2NTt6vd7RDXpM35JBk5JickFMyIVNCyXfyg/wkv6K96GP0OTrtrdFgnfOK/BPR2R/ZoM5Q</latexit>
p(y | xq)<latexit sha1_base64="xe6iYY+dJ7zsfnPHvnNK4bQ5lUE=">AAACo3icbVFdaxQxFM2OX3X86FYffQluFyrIMlMEfSz4IviySrctbIYlk7lpQ/MxJpnqEOYX+Gt81V/ivzGzs8K69ULI4Zxzb27uLWspnM+y36Pkzt179x/sPUwfPX7ydH988OzMmcYyWDAjjb0oqQMpNCy88BIuagtUlRLOy+v3vX5+A9YJo099W0Oh6KUWXDDqI7UaTw/roxaT15hEl1+D0sjKtSpe4Vu3+vLqME1X40k2y9aBb4N8AyZoE/PVwYiTyrBGgfZMUueWeVb7IlDrBZPQpaRxUFN2TS9hGaGmClwR1v/p8DQyFebGxqM9XrPbGYEq13cYnYr6K7er9eT/tGXj+bsiCF03HjQbHuKNxN7gfji4EhaYl20ElFkRe8XsilrKfBxhSixo+MqMUlRXgXCqhGwr4LSRvgvE8b84Jds+B75b5kUgfT+MyjDJu2632A2wwbQ1/cG37ZpbU3ZDoZKHeZTjZvLdPdwGZ8ezPJvln44nJ282O9pDL9BLdIRy9BadoA9ojhaIoe/oB/qJfiXT5GPyOTkdrMlok/Mc/RNJ8Qczg9Ko</latexit><latexit sha1_base64="xe6iYY+dJ7zsfnPHvnNK4bQ5lUE=">AAACo3icbVFdaxQxFM2OX3X86FYffQluFyrIMlMEfSz4IviySrctbIYlk7lpQ/MxJpnqEOYX+Gt81V/ivzGzs8K69ULI4Zxzb27uLWspnM+y36Pkzt179x/sPUwfPX7ydH988OzMmcYyWDAjjb0oqQMpNCy88BIuagtUlRLOy+v3vX5+A9YJo099W0Oh6KUWXDDqI7UaTw/roxaT15hEl1+D0sjKtSpe4Vu3+vLqME1X40k2y9aBb4N8AyZoE/PVwYiTyrBGgfZMUueWeVb7IlDrBZPQpaRxUFN2TS9hGaGmClwR1v/p8DQyFebGxqM9XrPbGYEq13cYnYr6K7er9eT/tGXj+bsiCF03HjQbHuKNxN7gfji4EhaYl20ElFkRe8XsilrKfBxhSixo+MqMUlRXgXCqhGwr4LSRvgvE8b84Jds+B75b5kUgfT+MyjDJu2632A2wwbQ1/cG37ZpbU3ZDoZKHeZTjZvLdPdwGZ8ezPJvln44nJ282O9pDL9BLdIRy9BadoA9ojhaIoe/oB/qJfiXT5GPyOTkdrMlok/Mc/RNJ8Qczg9Ko</latexit><latexit sha1_base64="xe6iYY+dJ7zsfnPHvnNK4bQ5lUE=">AAACo3icbVFdaxQxFM2OX3X86FYffQluFyrIMlMEfSz4IviySrctbIYlk7lpQ/MxJpnqEOYX+Gt81V/ivzGzs8K69ULI4Zxzb27uLWspnM+y36Pkzt179x/sPUwfPX7ydH988OzMmcYyWDAjjb0oqQMpNCy88BIuagtUlRLOy+v3vX5+A9YJo099W0Oh6KUWXDDqI7UaTw/roxaT15hEl1+D0sjKtSpe4Vu3+vLqME1X40k2y9aBb4N8AyZoE/PVwYiTyrBGgfZMUueWeVb7IlDrBZPQpaRxUFN2TS9hGaGmClwR1v/p8DQyFebGxqM9XrPbGYEq13cYnYr6K7er9eT/tGXj+bsiCF03HjQbHuKNxN7gfji4EhaYl20ElFkRe8XsilrKfBxhSixo+MqMUlRXgXCqhGwr4LSRvgvE8b84Jds+B75b5kUgfT+MyjDJu2632A2wwbQ1/cG37ZpbU3ZDoZKHeZTjZvLdPdwGZ8ezPJvln44nJ282O9pDL9BLdIRy9BadoA9ojhaIoe/oB/qJfiXT5GPyOTkdrMlok/Mc/RNJ8Qczg9Ko</latexit><latexit 
sha1_base64="xe6iYY+dJ7zsfnPHvnNK4bQ5lUE=">AAACo3icbVFdaxQxFM2OX3X86FYffQluFyrIMlMEfSz4IviySrctbIYlk7lpQ/MxJpnqEOYX+Gt81V/ivzGzs8K69ULI4Zxzb27uLWspnM+y36Pkzt179x/sPUwfPX7ydH988OzMmcYyWDAjjb0oqQMpNCy88BIuagtUlRLOy+v3vX5+A9YJo099W0Oh6KUWXDDqI7UaTw/roxaT15hEl1+D0sjKtSpe4Vu3+vLqME1X40k2y9aBb4N8AyZoE/PVwYiTyrBGgfZMUueWeVb7IlDrBZPQpaRxUFN2TS9hGaGmClwR1v/p8DQyFebGxqM9XrPbGYEq13cYnYr6K7er9eT/tGXj+bsiCF03HjQbHuKNxN7gfji4EhaYl20ElFkRe8XsilrKfBxhSixo+MqMUlRXgXCqhGwr4LSRvgvE8b84Jds+B75b5kUgfT+MyjDJu2632A2wwbQ1/cG37ZpbU3ZDoZKHeZTjZvLdPdwGZ8ezPJvln44nJ282O9pDL9BLdIRy9BadoA9ojhaIoe/oB/qJfiXT5GPyOTkdrMlok/Mc/RNJ8Qczg9Ko</latexit>
Figure 9: Illustration of Bayesian inference. Updating a prior distribution on H based on training data D, a posterior distribution p(h | D) is obtained, which represents the state of knowledge of the learner (middle). Given a query xq ∈ X, the predictive posterior distribution on Y is then obtained through Bayesian model averaging: Different hypotheses h ∈ H provide predictions, which are aggregated in terms of a weighted average.
and learning essentially consists of replacing the prior by the posterior distribution:
p(h | D) = p(h) · p(D | h) / p(D) ∝ p(h) · p(D | h) ,   (13)
where p(D |h) is the probability of the data given h (the likelihood of h). Intuitively,
p(· | D) captures the learner’s state of knowledge, and hence its epistemic uncertainty:
The more “peaked” this distribution, i.e., the more concentrated the probability mass in a
small region in H, the less uncertain the learner is. Just like the version space V in version
space learning, the posterior distribution on H provides global (averaged/aggregated over
the entire instance space) instead of local, per-instance information. For a given query
instance xq, this information may translate into quite different representations of the
uncertainty regarding the prediction yq (cf. Fig. 9).
More specifically, the representation of uncertainty about a prediction yq is given by the
image of the posterior p(h | D) under the mapping h ↦ p(y | xq, h) from hypotheses to
probabilities of outcomes. This yields the predictive posterior distribution
p(y | xq) = ∫_H p(y | xq, h) dP(h | D) .   (14)
In this type of (proper) Bayesian inference, a final prediction is thus produced by model
averaging: The predicted probability of an outcome y ∈ Y is the expected probability
p(y |xq, h), where the expectation over the hypotheses is taken with respect to the posterior
distribution on H; or, stated differently, the probability of an outcome is a weighted average
over its probabilities under all hypotheses in H, with the weight of each hypothesis h
given by its posterior probability p(h | D). Since model averaging is often difficult and
computationally costly, sometimes only the single hypothesis
hmap := argmax_{h∈H} p(h | D)   (15)
with the highest posterior probability is adopted, and predictions on outcomes are derived
from this hypothesis.
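As a sketch of Eqs. (13)–(15) on a discretized hypothesis space — here the coin-flipping setting discussed below, with grid size, prior, and data being illustrative assumptions:

```python
# Sketch of Eqs. (13)-(15): Bayesian inference over a discretized
# hypothesis space H = {h_alpha}, where alpha = P(heads) of a coin.
# Grid size, prior, and data below are illustrative assumptions.
alphas = [i / 100 for i in range(1, 100)]
prior = [1 / len(alphas)] * len(alphas)          # uniform prior p(h)

heads, tails = 7, 3                              # hypothetical data D
likelihood = [a**heads * (1 - a)**tails for a in alphas]   # p(D | h)

# Eq. (13): posterior proportional to prior times likelihood.
unnorm = [p * l for p, l in zip(prior, likelihood)]
evidence = sum(unnorm)                           # p(D)
posterior = [u / evidence for u in unnorm]

# Eq. (14): Bayesian model averaging -> predictive probability of heads.
p_heads = sum(a * w for a, w in zip(alphas, posterior))

# Eq. (15): the single MAP hypothesis h_map.
alpha_map = alphas[max(range(len(alphas)), key=lambda i: posterior[i])]
```

With a uniform prior, the grid posterior concentrates around the empirical frequency, so the MAP hypothesis sits at α = 0.7, while the model-averaged prediction is pulled slightly toward 1/2.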
In (14), aleatoric and epistemic uncertainty are not distinguished any more, because epis-
temic uncertainty is “averaged out.” Consider the example of coin flipping, and let the
hypothesis space be given by H := {hα | 0 ≤ α ≤ 1}, where hα models a biased coin
landing heads up with a probability of α and tails up with a probability of 1−α. According
to (14), we derive a probability of 1/2 for heads and tails, regardless of whether the (poste-
rior) distribution on H is given by the uniform distribution (all coins are equally probable,
i.e., the case of complete ignorance) or the one-point measure assigning probability 1 to
h1/2 (the coin is known to be fair with complete certainty):
p(y) = ∫_H α dP = 1/2 = ∫_H α dP′
for the uniform measure P (with probability density function p(α) ≡ 1) as well as the
measure P′ with probability mass function p′(α) = 1 for α = 1/2 and p′(α) = 0 for α ≠ 1/2.
Obviously, MAP inference (15) does not capture epistemic uncertainty either.
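Numerically, this "averaging out" can be checked by evaluating ∫ α dP on a grid for the two states of knowledge (a sketch; the discretization is an assumption):

```python
# Grid sketch of the coin example: the predictive probability of heads,
# i.e., the integral of alpha dP, is 1/2 under complete ignorance
# (uniform P) and under complete certainty (point mass P') alike.
alphas = [i / 1000 for i in range(1001)]

# (i) Uniform distribution over all coins (complete ignorance).
w_uniform = [1 / len(alphas)] * len(alphas)
p_heads_ignorant = sum(a * w for a, w in zip(alphas, w_uniform))

# (ii) One-point measure on h_{1/2} (the coin is known to be fair).
w_point = [1.0 if a == 0.5 else 0.0 for a in alphas]
p_heads_certain = sum(a * w for a, w in zip(alphas, w_point))
```

Both computations return 1/2, even though the underlying states of knowledge could hardly be more different.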
More generally, consider the case of binary classification with Y := {−1, +1}, and let ph(+1 | xq) denote the probability predicted by the hypothesis h for the positive class. Instead of deriving a
distribution on Y according to (14), one could also derive a predictive distribution for the
(unknown) probability q := p(+1 | xq) of the positive class:
p(q | xq) = ∫_H ⟦p(+1 | xq, h) = q⟧ dP(h | D) .   (16)
This is a second-order probability, which still contains both aleatoric and epistemic uncer-
tainty. The question of how to quantify the epistemic part of the uncertainty is not at all
obvious, however. Intuitively, epistemic uncertainty should be reflected by the variabil-
ity of the distribution (16): the more spread the probability mass over the unit interval,
the higher the uncertainty seems to be. But how to put this into quantitative terms?
Entropy is arguably not a good choice, for example, because this measure is invariant
against redistribution of probability mass. For instance, the distributions p and p′ with
p(q | xq) := 12(q − 1/2)² and p′(q | xq) := 3 − p(q | xq) both have the same entropy, although
they correspond to quite different states of information. From this point of view, the
variance of the distribution would be better suited, but this measure has other deficiencies
(for example, it is not maximized by the uniform distribution, which could be considered
as a case of minimal informedness).
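To make the role of spread concrete, the variance of a (discretized) second-order distribution can be computed directly; the grids and weights below are illustrative assumptions:

```python
# Variance of a discretized second-order distribution p(q | x_q) on [0, 1],
# used as a simple measure of its spread (illustrative grids and weights).
def variance(qs, weights):
    z = sum(weights)
    mean = sum(q * w for q, w in zip(qs, weights)) / z
    return sum(w * (q - mean) ** 2 for q, w in zip(qs, weights)) / z

qs = [i / 1000 for i in range(1001)]

# Complete ignorance: uniform mass over [0, 1] -> variance close to 1/12.
var_ignorant = variance(qs, [1.0] * len(qs))

# Near-certainty: mass concentrated around q = 1/2 -> variance near 0.
var_certain = variance(qs, [1.0 if abs(q - 0.5) < 0.01 else 0.0 for q in qs])
```

The uniform case attains (approximately) the maximal variance 1/12, while concentration around a single value drives the variance toward 0 — consistent with the intuition, but subject to the deficiencies noted above.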
The difficulty of specifying epistemic uncertainty in the Bayesian approach is rooted in
the more general difficulty of representing a lack of knowledge in probability theory. This
issue will be discussed next.
3.3 Representing a lack of knowledge
As already said, uncertainty is captured in a purely set-based way in version space learning:
V ⊆ H is a set of candidate hypotheses, which translates into a set of candidate outcomes
Y ⊆ Y for a query xq. In the case of sets, there is a rather obvious correspondence between
the degree of uncertainty in the sense of a lack of knowledge and the size of the set of
candidates: Proceeding from a reference set Ω of alternatives, assuming some ground-truth
ω∗ ∈ Ω, and expressing knowledge about the latter in terms of a subset C ⊆ Ω of possible
candidates, we can clearly say that the bigger C, the larger the lack of knowledge. More
specifically, for finite Ω, a common uncertainty measure in information theory is log(|C|). Consequently, knowledge gets weaker by adding additional elements to C and stronger by
removing candidates.
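For instance, with a hypothetical reference set of eight alternatives:

```python
import math

# Set-based uncertainty about a ground truth omega* in a reference set
# Omega, measured by log2 of the size of the candidate set C (in bits).
omega = set(range(8))                 # |Omega| = 8 (hypothetical)

c_ignorant = set(omega)               # complete ignorance: C = Omega
c_partial = {4, 5, 6, 7}              # evidence rules out half of Omega
c_full = {7}                          # full knowledge: a singleton

u_ignorant = math.log2(len(c_ignorant))   # 3.0 bits
u_partial = math.log2(len(c_partial))     # 2.0 bits
u_full = math.log2(len(c_full))           # 0.0 bits
```

Each time candidates are removed, the measure decreases monotonically, reaching 0 when only the ground truth remains.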
In standard probabilistic modeling and Bayesian inference, where knowledge is conveyed
by a distribution p on Ω, it is much less obvious how to “weaken” this knowledge. This is
mainly because the total amount of belief is fixed in terms of a unit mass that can be dis-
tributed among the elements ω ∈ Ω. Unlike for sets, where an additional candidate ω can
be added or removed without changing the plausibility of all other candidates, increasing
the weight of one alternative ω requires decreasing the weight of another alternative ω′ by
exactly the same amount.
Of course, there are also measures of uncertainty for probability distributions, most notably
the (Shannon) entropy, which, for finite Ω, is given as follows:
H(p) := − ∑_{ω∈Ω} p(ω) log p(ω)   (17)
However, these measures are primarily capturing the shape of the distribution, namely
its “peakedness” or non-uniformity (Dubois and Hullermeier, 2007), and hence inform
about the predictability of the outcome of a random experiment. Seen from this point of
view, they are more akin to aleatoric uncertainty, whereas the set-based approach (i.e.,
representing knowledge in terms of a subset C ⊆ Ω of candidates) is arguably better suited
for capturing epistemic uncertainty.
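A small sketch of (17), in bits, with hypothetical distributions, illustrates the "peakedness" aspect: entropy quantifies how predictable the outcome is, not how much the agent knows.

```python
import math

def shannon_entropy(p):
    # Eq. (17) in bits, with the usual convention 0 * log 0 = 0.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Uniform distribution: maximal unpredictability of the outcome ...
h_uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])   # 2.0 bits

# ... versus a peaked distribution: the outcome is almost predictable.
h_peaked = shannon_entropy([0.97, 0.01, 0.01, 0.01])
```

Note that the uniform distribution yields the same (maximal) entropy whether it encodes precise knowledge of a fair random mechanism or complete ignorance — exactly the ambiguity discussed below.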
For these reasons, it has been argued that probability distributions are less suitable for
representing ignorance in the sense of a lack of knowledge (Dubois et al., 1996). For
example, the case of complete ignorance is typically modeled in terms of the uniform dis-
tribution p ≡ 1/|Ω| in probability theory; this is justified by the “principle of indifference”
invoked by Laplace, or by referring to the principle of maximum entropy4. Then, how-
ever, it is not possible to distinguish between (i) precise (probabilistic) knowledge about
a random event, such as tossing a fair coin, and (ii) a complete lack of knowledge due to
4Obviously, there is a technical problem in defining the uniform distribution in the case where Ω is not finite.
an incomplete description of the experiment. This was already pointed out by the famous
Ronald Fisher, who noted that “not knowing the chance of mutually exclusive events and
knowing the chance to be equal are two quite different states of knowledge.”
Another problem in this regard is caused by the measure-theoretic grounding of prob-
ability and its additive nature. For example, the uniform distribution is not invariant
under reparametrization (a uniform distribution on a parameter ω does not translate into
a uniform distribution on 1/ω, although ignorance about ω implies ignorance about 1/ω).
Likewise, expressing ignorance about the length x of a cube in terms of a uniform distri-
bution on an interval [l, u] does not yield a uniform distribution of x3 on [l3, u3], thereby
suggesting some degree of informedness about its volume. Problems of this kind render the
use of a uniform prior distribution, often interpreted as representing epistemic uncertainty
in Bayesian inference, at least debatable5.
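The cube example can be checked with a quick Monte Carlo sketch (the interval and sample size are arbitrary choices):

```python
import random

random.seed(0)

# A uniform distribution for the edge length x on [1, 2] does not induce
# a uniform distribution for the volume x^3 on [1, 8].
volumes = [random.uniform(1.0, 2.0) ** 3 for _ in range(100_000)]

# Under uniformity on [1, 8], the lower half [1, 4.5] would carry mass 1/2;
# here it carries roughly 0.65 (exactly 4.5**(1/3) - 1 in the limit).
mass_lower_half = sum(v <= 4.5 for v in volumes) / len(volumes)
```

The simulated mass in the lower half of the volume range clearly exceeds 1/2, i.e., "ignorance" about the edge length implies apparent informedness about the volume.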
The argument that a single (probability) distribution is not enough for representing uncer-
tain knowledge is quite prominent in the literature. Correspondingly, various generaliza-
tions of standard probability theory have been proposed, including imprecise probability
(Walley, 1991), evidence theory (Shafer, 1976; Smets and Kennes, 1994), and possibility
theory (Dubois and Prade, 1988). These formalisms are essentially working with sets of
probability distributions, i.e., they seek to take advantage of the complementary nature of
sets and distributions, and to combine both representations in a meaningful way (cf. Fig.
10). We also refer to Appendix A for a brief overview.
4 Machine learning methods for representing uncertainty
This section presents several important machine learning methods that allow for repre-
senting the learner’s uncertainty in a prediction. They differ with regard to the type of
prediction produced and the way in which uncertainty is represented. Another interesting
question is whether they allow for distinguishing between aleatoric and epistemic uncer-
tainty, and perhaps even for quantifying the amount of uncertainty in terms of degrees of
aleatoric and epistemic (and total) uncertainty.
Structuring the methods or arranging them in the form of a taxonomy is not an easy
task, and there are various possibilities one may think of. As shown in the overview
in Fig. 11, one possible distinction is whether a method is rooted in
classical, frequentist statistics (building on methods for maximum likelihood inference
and parameter estimation like in Section 4.2, density estimation like in Section 4.3, or
hypothesis testing like in Section 4.8), or whether it is more inspired by Bayesian inference.
Another distinction that can be made is between uncertainty quantification and set-valued
prediction: Most of the methods produce a prediction and equip this prediction with
additional information about the certainty or uncertainty in that prediction. The other
way around, one may of course also pre-define a desired level of certainty, and then produce
5This problem is inherited by hierarchical Bayesian modeling. See work on “non-informative” priors, however (Jeffreys, 1946; Bernardo, 1979).
Figure 10: Set-based versus distributional knowledge representation on the level of pre-dictions, hypotheses, and models. In version space learning, the model class is fixed, andknowledge about hypotheses and outcomes is represented in terms of sets (blue color). InBayesian inference, sets are replaced by probability distributions (gray shading). Theorieslike possibility and evidence theory are in-between, essentially working with sets of distri-butions. Credal inference (cf. Section 4.6) generalizes Bayesian inference by replacing asingle prior with a set of prior distributions (here identified by a set of hyper-parameters).In hierarchical Bayesian modeling, this set is again replaced by a distribution, i.e., asecond-order probability.
a prediction that complies with this requirement — naturally, such a prediction appears in
the form of a set of candidates that is sufficiently likely to contain the ground truth.
The first three sections are related to classical statistics and probability estimation. Ma-
chine learning methods for probability estimation (Section 4.1), i.e., for training a prob-
abilistic predictor, often commit to a single hypothesis learned on the data, thereby ig-
noring epistemic uncertainty. Instead, predictions obtained by this hypothesis essentially
represent aleatoric uncertainty. As opposed to this, Section 4.2 discusses likelihood-based
inference and Fisher information, which allows for deriving confidence regions that reflect
epistemic uncertainty about the hypothesis. The focus here (like in frequentist statistics
in general) is more on parameter estimation and less on prediction. Section 4.3 presents
methods for learning generative models, which essentially comes down to the problem of
density estimation. Here, predictions and uncertainty quantification are produced on the
basis of class-conditional density functions.
The second block of methods is more in the spirit of Bayesian machine learning. Here,
knowledge and uncertainty about the sought (Bayes-optimal) hypothesis is maintained
explicitly in the form of a distribution on the hypothesis space (Sections 4.4 and 4.5),
a set of distributions (Section 4.6), or a graded set of distributions (Section 4.7), and
this knowledge is updated in the light of additional observations (see also Fig. 10). Cor-
respondingly, all these methods provide information about both aleatoric and epistemic
Figure 11: Overview of the methods presented in Section 4. [Figure labels: single hypothesis h, conditional probabilities p(y | x_q, h), only aleatoric uncertainty (Sec. 4.1); likelihood, Fisher information, confidence region for hypothesis h, epistemic uncertainty (Sec. 4.2); density estimation p(x | y), aleatoric and epistemic uncertainty (Sec. 4.3); Gaussian processes, posterior p(y | x_q), aleatoric and epistemic uncertainty (Sec. 4.4); Bayesian neural networks, weights as random variables, prediction + uncertainty quantification (Sec. 4.5); credal sets for robust extension of Bayesian inference (Sec. 4.6); reliable classification, explicit modeling of aleatoric and epistemic uncertainty (Sec. 4.7); conformal prediction, inclusion/exclusion of candidate outcomes through hypothesis testing (frequentist, Sec. 4.8); set-valued prediction through (expected) utility maximization (Bayesian, Sec. 4.9).]
uncertainty. Perhaps the most well-known approach in the realm of Bayesian machine
learning is Gaussian processes, which are discussed in Section 4.4. Another popular ap-
proach is deep learning and neural networks, and especially interesting from the point of
view of uncertainty quantification is recent work on “Bayesian” extensions of such net-
works. Section 4.6 addresses the concept of credal sets, which is closely connected to the
theory of imprecise probabilities, and which can be seen as a generalization of Bayesian in-
ference where a single prior distribution is replaced by a set of priors. Section 4.7 presents
an approach to reliable prediction that combines set-based and distributional (probabilis-
tic) inference, and which, to the best of our knowledge, was the first to explicitly motivate
the distinction between aleatoric and epistemic uncertainty in a machine learning context.
Finally, methods focusing on set-valued prediction are discussed in Sections 4.8 and 4.9.
Conformal prediction is rooted in frequentist statistics and hypothesis testing. Roughly
speaking, to decide whether to include or exclude a candidate prediction, the correctness
of that prediction is tested as a hypothesis. The second approach makes use of concepts
from statistical (Bayesian) decision theory and constructs set-valued predictions based on
the principle of expected utility maximization.
As already said, the categorization of methods is certainly not unique. For example, as the
credal approach is of a “set-valued” nature (starting with a set of priors, a set of posterior
distributions is derived, and from these in turn a set of predictions), it has a strong link to
set-valued prediction (as indicated by the dashed line in Fig. 11). Likewise, the approach
to reliable classification in Section 4.7 makes use of the concept of normalized likelihood,
and hence has a strong connection to likelihood-based inference.
4.1 Probability estimation via scoring, calibration, and ensembling
There are various methods in machine learning for inducing probabilistic predictors. These
are hypotheses h that do not merely output point predictions h(x) ∈ Y, i.e., elements of
the output space Y, but probability estimates ph(· |x) = p(· |x, h), i.e., complete prob-
ability distributions on Y. In the case of classification, this means predicting a single
(conditional) probability ph(y |x) = p(y |x, h) for each class y ∈ Y, whereas in regression,
p(· |x, h) is a density function on R. Such predictors can be learned in a discriminative
way, i.e., in the form of a mapping x 7→ p(· |x), or in a generative way, which essentially
means learning a joint distribution on X ×Y. Moreover, the approaches can be paramet-
ric (assuming specific parametric families of probability distributions) or non-parametric.
Well-known examples include classical statistical methods such as logistic and linear regres-
sion, Bayesian approaches such as Bayesian networks and Gaussian processes (cf. Section
4.4), as well as various techniques in the realm of (deep) neural networks (cf. Section 4.5).
Training probabilistic predictors is typically accomplished by minimizing suitable loss func-
tions, i.e., loss functions that enforce “correct” (conditional) probabilities as predictions.
In this regard, proper scoring rules (Gneiting and Raftery, 2005) play an important role,
including the log-loss as a well-known special case. Sometimes, however, estimates are
also obtained in a very simple way, following basic frequentist techniques for probability
estimation, like in Naïve Bayes or nearest neighbor classification.
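To illustrate why the log-loss qualifies as a proper scoring rule, one can check numerically that the expected loss under a true Bernoulli probability is minimized exactly by predicting that probability. The following minimal sketch uses an arbitrary true parameter (0.7) and an arbitrary grid of candidate predictions:

```python
import numpy as np

# True Bernoulli parameter of the data-generating process (arbitrary choice).
p_true = 0.7

# Candidate predicted probabilities q on a grid of step 0.01.
qs = np.linspace(0.01, 0.99, 99)

# Expected log-loss under the true distribution:
# E[-log q(Y)] = -(p * log q + (1 - p) * log(1 - q))
expected_loss = -(p_true * np.log(qs) + (1 - p_true) * np.log(1 - qs))

# The minimizer of the expected log-loss is the true probability, which is
# exactly what makes the log-loss a (strictly) proper scoring rule.
q_best = qs[np.argmin(expected_loss)]
print(q_best)  # close to 0.7
```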
The predictions delivered by corresponding methods are at best “pseudo-probabilities”
that are often not very accurate. Besides, there are many methods that deliver natural
scores, intuitively expressing a degree of confidence (like the distance from the separating
hyperplane in support vector machines), but which do not immediately qualify as proba-
bilities either. The idea of scaling or calibration methods is to turn such scores into proper,
well-calibrated probabilities, that is, to learn a mapping from scores to the unit interval
that can be applied to the output of a predictor as a kind of post-processing step (Flach,
2017). Examples of such methods include binning (Zadrozny and Elkan, 2001), isotonic
regression (Zadrozny and Elkan, 2002), logistic scaling (Platt, 1999) and improvements
thereof (Kull et al., 2017), as well as the use of Venn predictors (Johansson et al., 2018).
Calibration is still a topic of ongoing research.
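To make the idea of logistic scaling concrete, the following sketch fits a sigmoid mapping from raw scores to probabilities. This is not Platt's original procedure (which uses a Newton-type solver and smoothed targets); it is plain gradient descent on the logistic loss, with arbitrary learning-rate and iteration settings, on synthetic scores:

```python
import numpy as np

def platt_scale(scores, labels, lr=0.5, n_iter=5000):
    """Fit p = 1 / (1 + exp(-(a*s + b))), mapping raw scores s
    (e.g., SVM decision values) to calibrated probabilities."""
    a, b = 1.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        # Gradients of the average logistic loss with respect to a and b.
        a -= lr * np.mean((p - labels) * scores)
        b -= lr * np.mean(p - labels)
    return a, b

# Toy data: the positive class tends to receive higher raw scores.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(-1.0, 1.0, 200), rng.normal(1.0, 1.0, 200)])
labels = np.concatenate([np.zeros(200), np.ones(200)])

a, b = platt_scale(scores, labels)
calibrated = lambda s: 1.0 / (1.0 + np.exp(-(a * s + b)))
```

Applied as a post-processing step, `calibrated` turns any raw score into an estimate of the conditional class probability; the learned slope a is positive, so the mapping is monotone in the score.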
Another important class of methods is ensemble learning, such as bagging or boosting, which
are especially popular in machine learning due to their ability to improve accuracy of
(point) predictions. Since such methods produce a (large) set of predictors h1, . . . , hM instead of a single hypothesis, it is tempting to produce probability estimates following
basic frequentist principles. In the simplest case (of classification), each prediction hi(x)
can be interpreted as a “vote” in favor of a class y ∈ Y, and probabilities can be estimated
by relative frequencies — needless to say, probabilities constructed in this way tend to
be biased and are not necessarily well calibrated. Especially important in this field are
tree-based methods such as random forests (Breiman, 2001; Kruppa et al., 2014).
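The vote-counting scheme just described can be written down in a few lines (a toy sketch; any real ensemble would supply the per-member predictions):

```python
from collections import Counter

def vote_probabilities(member_predictions, classes):
    """Turn the class votes of ensemble members h_1, ..., h_M into
    relative-frequency estimates of class probabilities."""
    counts = Counter(member_predictions)
    m = len(member_predictions)
    return {c: counts.get(c, 0) / m for c in classes}

# Ten (hypothetical) ensemble members voting on one query instance.
votes = ["cat", "cat", "dog", "cat", "cat", "dog", "cat", "cat", "cat", "dog"]
probs = vote_probabilities(votes, ["cat", "dog", "bird"])
print(probs)  # {'cat': 0.7, 'dog': 0.3, 'bird': 0.0}
```

As noted above, such relative frequencies tend to be biased and are not necessarily well calibrated, so they are often combined with the calibration techniques of the previous paragraph.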
Obviously, while standard probability estimation is a viable approach to representing
uncertainty in a prediction, there is no explicit distinction between different types of
uncertainty. Methods falling into this category are mostly concerned with the aleatoric
part of the overall uncertainty.⁶
4.2 Maximum likelihood estimation and Fisher information
The notion of likelihood is one of the key elements of statistical inference in general, and
maximum likelihood estimation is an omnipresent principle in machine learning as well.
Indeed, many learning algorithms, including those used to train deep neural networks,
realize model induction as likelihood maximization. Besides, there is a close connection
between likelihood-based and Bayesian inference, as in many cases, the former is effec-
tively equivalent to the latter with a uniform prior — disregarding, of course, important
conceptual differences and philosophical foundations.
Adopting the common perspective of classical (frequentist) statistics, consider a data-
generating process specified by a parametrized family of probability measures Pθ, where
θ ∈ Θ is a parameter vector. Let fθ(·) denote the density (or mass) function of Pθ, i.e.,
fθ(X) is the probability to observe the data X when sampling from Pθ. An important
problem in statistical inference is to estimate the parameter θ, i.e., to identify the under-
lying data-generating process Pθ, on the basis of a set of observations D = (X1, . . . , XN), and maximum likelihood estimation (MLE) is a general principle for tackling this problem.
More specifically, the MLE principle prescribes to estimate θ by the maximizer of the like-
lihood function, or, equivalently, the log-likelihood function. Assuming that X1, . . . , XN
are independent, and hence that D is distributed according to (Pθ)N , the log-likelihood
function is given by
ℓ_N(θ) := ∑_{n=1}^N log f_θ(X_n) .
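As a minimal concrete instance, take f_θ to be a Gaussian density with unknown mean θ and known unit variance; maximizing ℓ_N over a grid then recovers the sample mean, which is the closed-form MLE in this case (the grid range and resolution are arbitrary choices):

```python
import numpy as np

def log_likelihood(theta, data):
    """Gaussian log-likelihood l_N(theta) = sum_n log f_theta(X_n), unit variance."""
    return np.sum(-0.5 * (data - theta) ** 2 - 0.5 * np.log(2 * np.pi))

rng = np.random.default_rng(42)
data = rng.normal(2.0, 1.0, 500)  # samples from P_theta with theta = 2

thetas = np.linspace(0.0, 4.0, 4001)  # grid of candidate parameters
ll = np.array([log_likelihood(t, data) for t in thetas])
theta_mle = thetas[np.argmax(ll)]

# The grid maximizer agrees with the closed-form MLE (the sample mean)
# up to the grid resolution of 0.001.
print(abs(theta_mle - data.mean()) < 1e-3)  # True
```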
An important result of mathematical statistics states that the distribution of the maximum likelihood estimate θ̂ is asymptotically normal, i.e., converges to a normal distribution as N → ∞. More specifically, √N (θ̂ − θ) converges to a normal distribution with mean 0 and covariance matrix I_N^{-1}(θ), where

I_N(θ) = − [ E_θ ( ∂²ℓ_N / (∂θ_i ∂θ_j) ) ]_{1≤i,j≤N}
is the Fisher information matrix (the negative Hessian of the log-likelihood function at
θ). This result has many implications, both theoretically and practically. For example,
the Fisher information plays an important role in the Cramer-Rao bound and provides a
lower bound to the variance of any unbiased estimator of θ (Frieden, 2004).
Moreover, the Fisher information matrix allows for constructing (approximate) confidence regions for the sought parameter θ around the estimate θ̂ (approximating⁷ I_N(θ) by
⁶ Yet, as will be seen later on, one way to go beyond mere aleatoric uncertainty is to combine the above methods, for example learning ensembles of probabilistic predictors (cf. Section 4.5).
⁷ Of course, the validity of this “plug-in” estimate requires some formal assumptions.
Figure 12: If a hypothesis space is very rich, so that it can essentially produce every decision boundary, the epistemic uncertainty is high in sparsely populated regions without training examples. The lower query point in the left picture, for example, could easily be classified as red (middle picture) but also as black (right picture). In contrast to this, the second query is in a region where the two class distributions are overlapping, and where the aleatoric uncertainty is high.
I_N(θ̂)). Obviously, the larger this region, the higher the (epistemic) uncertainty about
the true model θ. Roughly speaking, the size of the confidence region is in direct corre-
spondence with the “peakedness” of the likelihood function around its maximum. If the
likelihood function is peaked, small changes of the data do not change the estimate θ̂ too
much, and the learner is relatively sure about the true model (data-generating process).
As opposed to this, a flat likelihood function reflects a high level of uncertainty about θ,
because there are many parameters that are close to θ and have a similar likelihood.
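A minimal numeric illustration: for a Bernoulli parameter, the Fisher information has the closed form I_N(θ) = N/(θ(1−θ)), and the resulting Wald-type confidence interval shrinks as N grows, reflecting a more peaked likelihood and lower epistemic uncertainty (the factor 1.96 is the usual 95% normal quantile):

```python
import numpy as np

def wald_interval(data, z=1.96):
    """Approximate 95% confidence interval for a Bernoulli parameter, based on
    the plug-in Fisher information I_N(theta_hat) = N / (theta_hat (1 - theta_hat))."""
    n = len(data)
    theta_hat = np.mean(data)
    fisher = n / (theta_hat * (1.0 - theta_hat))
    half_width = z / np.sqrt(fisher)
    return theta_hat - half_width, theta_hat + half_width

small = [1, 0] * 10      # N = 20,   estimate 0.5
large = [1, 0] * 1000    # N = 2000, estimate 0.5

lo_s, hi_s = wald_interval(small)
lo_l, hi_l = wald_interval(large)

# More data -> more peaked likelihood -> narrower confidence region.
print(hi_s - lo_s > hi_l - lo_l)  # True
```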
In a machine learning context, where parameters θ may identify hypotheses h = hθ, a
confidence region for the former can be seen as a representation of epistemic uncertainty
about h∗, or, more specifically, approximation uncertainty. To obtain a quantitative mea-
sure of uncertainty, the Fisher information matrix can be summarized in a scalar statistic,
for example the trace (of the inverse) or the smallest eigenvalue. Based on corresponding
measures, Fisher information has been used, among others, for optimal statistical design
(Pukelsheim, 2006) and active machine learning (Sourati et al., 2018).
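As a simple illustration of such a scalar summary, consider a Gaussian with unknown mean and variance, whose Fisher information matrix is diagonal with entries N/σ² and N/(2σ⁴); the trace of its inverse (one of the scalar statistics mentioned above, evaluated here at the MLE as an illustrative choice) decreases as more data arrive:

```python
import numpy as np

def gaussian_fisher_summary(data):
    """Scalar epistemic-uncertainty measure: trace of the inverse Fisher
    information matrix for a Gaussian with unknown mean and variance."""
    n = len(data)
    var = np.var(data)
    # Fisher information of N(mu, sigma^2), evaluated at the MLE:
    # diagonal entries n / var (for mu) and n / (2 var^2) (for sigma^2).
    fisher = np.diag([n / var, n / (2.0 * var ** 2)])
    return np.trace(np.linalg.inv(fisher))

rng = np.random.default_rng(3)
few = rng.normal(0.0, 1.0, 20)
many = rng.normal(0.0, 1.0, 2000)

# More observations -> more information -> lower summarized uncertainty.
print(gaussian_fisher_summary(few) > gaussian_fisher_summary(many))  # True
```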
4.3 Generative models
As already explained, for learning methods with a very large hypothesis space H, model
uncertainty essentially disappears, while approximation uncertainty might be high (cf.
Section 2.4). In particular, this includes learning methods with high flexibility, such
as (deep) neural networks and nearest neighbors. In methods of that kind, there is no
explicit assumption about a global structure of the dependency between inputs x and
outcomes y. Therefore, inductive inference will essentially be of a local nature: A class y
is approximated by the region in the instance space in which examples from that class have
been seen, aleatoric uncertainty occurs where such regions are overlapping, and epistemic
uncertainty where no examples have been encountered so far (cf. Fig. 12).
An intuitive idea, then, is to consider generative models to quantify epistemic uncertainty.
Such approaches typically look at the densities p(x) to decide whether input points are
located in regions with high or low density, in which the latter acts as a proxy for a high
epistemic uncertainty. The density p(x) can be estimated with traditional methods, such
as kernel density estimation or Gaussian mixtures. Yet, novel density estimation methods
still appear in the machine learning literature. Some more recent approaches in this area
include isolation forests (Liu et al., 2009), auto-encoders (Goodfellow et al., 2016), and
radial basis function networks (Bazargami and Mac-Namee, 2019).
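The basic mechanism can be sketched with a plain one-dimensional Gaussian kernel density estimate (the bandwidth is a hand-picked choice; any of the estimators above could be substituted): low density at a query point serves as a proxy for high epistemic uncertainty.

```python
import numpy as np

def kde(x, data, bandwidth=0.5):
    """One-dimensional Gaussian kernel density estimate of p(x)."""
    u = (x - data) / bandwidth
    return np.mean(np.exp(-0.5 * u ** 2)) / (bandwidth * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
train_x = rng.normal(0.0, 1.0, 1000)  # training inputs cluster around 0

# Low density far from the data acts as a proxy for high epistemic
# uncertainty: the model has seen no examples in that region.
print(kde(0.0, train_x) > kde(6.0, train_x))  # True
```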
As shown by these methods, density estimation is also a central building block in anomaly
and outlier detection. Often, a threshold is applied on top of the density p(x) to decide
whether a data point is an outlier or not. For example, in auto-encoders, a low-dimensional
representation of the original input is constructed, and the reconstruction error can be
used as a measure of support of a data point within the underlying distribution. Methods
of that kind can be classified as semi-supervised outlier detection methods, in contrast to
supervised methods, which use annotated outliers during training. Many semi-supervised
outlier detection methods are inspired by one-class support vector machines (SVMs), which
fit a hyperplane that separates outliers from “normal” data points in a high-dimensional
space (Khan and Madden, 2014). Some variations exist, where for example a hypersphere
instead of a hyperplane is fitted to the data (Tax and Duin, 2004).
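The hypersphere variant can be caricatured without any SVM machinery: fit a center and a radius that cover most of the data, and flag points outside the sphere. Using the sample mean as center and a 95% distance quantile as radius are simplifying assumptions for illustration, not the actual optimization of Tax and Duin:

```python
import numpy as np

def fit_sphere(X, quantile=0.95):
    """Fit a data-enclosing sphere: center = sample mean,
    radius = the given quantile of distances to the center."""
    center = X.mean(axis=0)
    dists = np.linalg.norm(X - center, axis=1)
    return center, np.quantile(dists, quantile)

def is_outlier(x, center, radius):
    """A point outside the sphere is flagged as an outlier."""
    return np.linalg.norm(x - center) > radius

rng = np.random.default_rng(2)
X = rng.normal(0.0, 1.0, size=(500, 2))  # "normal" data around the origin
center, radius = fit_sphere(X)

print(is_outlier(np.array([8.0, 8.0]), center, radius))   # True
print(is_outlier(np.array([0.1, -0.2]), center, radius))  # False
```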
Outlier detection is also somewhat related to the setting of classification with a reject
option, e.g., when the classifier refuses to predict a label in low-density regions (Perello-
Nieto et al., 2016). For example, Ziyin et al. (2019) adopt this viewpoint in an optimization
framework where an outlier detector and a standard classifier are jointly optimized. Since
a data point is rejected if it is an outlier, the focus is here on epistemic uncertainty. In
contrast, most papers on classification with reject option employ a reasoning on conditional
class probabilities p(y |x), using specific utility scores. These approaches rather capture
aleatoric uncertainty (see also Section 4.9), showing that seemingly related papers adopting
the same terminology can have different notions of uncertainty in mind.
In recent papers, outlier detection is often combined with the idea of set-valued prediction
(to be further discussed in Sections 4.6 and 4.8). Here, three scenarios can occur for
a multi-class classifier: (i) A single class is predicted. In this case, the epistemic and
aleatoric uncertainty are both assumed to be sufficiently low. (ii) The “null set” (empty
set) is predicted, i.e., the classifier abstains when there is not enough support for making
a prediction. Here, the epistemic uncertainty is too high to make a prediction. (iii) A set
of cardinality bigger than one is predicted. Here, the aleatoric uncertainty is too high,
while the epistemic uncertainty is assumed to be sufficiently low.
Hechtlinger et al. (2019) implement this idea by fitting a generative model p(x | y) per
class, and predicting null sets for data points that have a too low density for any of
those generative models. A set of cardinality one is predicted when the data point has
a sufficiently high density for exactly one of the models p(x | y), and a set of higher
cardinality is returned when this happens for more than one of the models. Sondhi et al.
(2019) propose a different framework with the same reasoning in mind. Here, a pair
of models is fitted in a joint optimization problem. The first model acts as an outlier
detector that intends to reject as few instances as possible, whereas the second model
optimizes a set-based utility score. The joint optimization problem is therefore a linear
combination of two objectives that capture, respectively, components of epistemic and
aleatoric uncertainty.
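The three scenarios above can be sketched in the spirit of Hechtlinger et al. (2019), with isotropic Gaussians standing in for the fitted per-class generative models p(x | y) and a hypothetical density threshold; all numbers are illustrative.

```python
import numpy as np

def gauss_pdf(x, mean, var=0.5):
    """Density of an isotropic 2-D Gaussian N(mean, var * I)."""
    x, mean = np.asarray(x, dtype=float), np.asarray(mean, dtype=float)
    d2 = ((x - mean) ** 2).sum()
    return np.exp(-d2 / (2 * var)) / (2 * np.pi * var)

# Hypothetical per-class generative models p(x | y) and density threshold.
class_means = {"A": np.array([-2.0, 0.0]), "B": np.array([+2.0, 0.0])}
THRESHOLD = 1e-3

def predict_set(x):
    """Set of classes y whose density p(x | y) exceeds the threshold."""
    return {y for y, m in class_means.items() if gauss_pdf(x, m) > THRESHOLD}

single = predict_set([-2.0, 0.0])  # (i) one class: both uncertainties low
empty = predict_set([0.0, 50.0])   # (ii) null set: high epistemic uncertainty
both = predict_set([0.0, 0.0])     # (iii) two classes: high aleatoric uncertainty
```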
Generative models often deliver intuitive solutions to the quantification of epistemic and
aleatoric uncertainty. In particular, the distinction between the two types of uncertainty
explained above corresponds to what Hullermeier and Brinker (2008) called “conflict”
(overlapping distributions, evidence in favor of more than one class) and “ignorance”
(sparsely populated region, lack of evidence in favor of any class). Yet, like essentially
all other methods discussed in this section, they also have disadvantages. An inherent
problem with semi-supervised outlier detection methods is how to define the threshold
that decides whether a data point is an outlier or not. Another issue is how to choose the
model class for the generative model. Density estimation is a difficult problem, which, to
be successful, typically requires a lot of data. When the sample size is small, specifying
the right model class is not an easy task, so the model uncertainty will typically be high.
4.4 Gaussian processes
Just like in likelihood-based inference, the hypothesis space in Bayesian inference (cf.
Section 3.2) is often parametrized in the sense that each hypothesis h = hθ ∈ H is
(uniquely) identified by a parameter (vector) θ ∈ Θ ⊆ Rd of fixed dimensionality d. Thus,
computation of the posterior (13) essentially comes down to updating beliefs about the
true (or best) parameter, which is treated as a multivariate random variable:
p(θ | D) ∝ p(θ) · p(D |θ). (18)
Gaussian processes (Seeger, 2004) generalize the Bayesian approach from inference about
multivariate (but finite-dimensional) random variables to inference about (infinite-dimen-
sional) functions. Thus, they can be thought of as distributions not just over random
vectors but over random functions.
More specifically, a stochastic process in the form of a collection of random variables
{f(x) |x ∈ X} with index set X is said to be drawn from a Gaussian process with mean
function m and covariance function k, denoted f ∼ GP(m, k), if for any finite set of
elements x1, . . . ,xm ∈ X , the associated finite set of random variables f(x1), . . . , f(xm)
has the following multivariate normal distribution:
(f(x1), . . . , f(xm))> ∼ N((m(x1), . . . ,m(xm))>, K) ,

where K is the m×m covariance matrix with entries (K)i,j = k(xi,xj).
Note that the above properties imply
m(x) = E(f(x)) ,
k(x,x′) = E((f(x)−m(x))(f(x′)−m(x′))) ,
and that k needs to obey the properties of a kernel function (so as to guarantee proper
covariance matrices). Intuitively, a function f drawn from a Gaussian process prior can
be thought of as a (very) high-dimensional vector drawn from a (very) high-dimensional
multivariate Gaussian. Here, each dimension of the Gaussian corresponds to an element x
from the index set X , and the corresponding component of the random vector represents
the value of f(x).
Gaussian processes allow for doing proper Bayesian inference (13) in a non-parametric
way: Starting with a prior distribution on functions h ∈ H, specified by a mean function
m (often the zero function) and kernel k, this distribution can be replaced by a posterior
in light of observed data D = {(xi, yi)}_{i=1}^{N}, where an observation yi = f(xi) + εi could be
corrupted by an additional (additive) noise component εi. Likewise, a posterior predictive
distribution can be obtained on outcomes y ∈ Y for a new query xq ∈ X . In the general
case, the computations involved are intractable, though turn out to be rather simple in the
case of regression (Y = R) with Gaussian noise. In this case, and assuming a zero-mean
prior, the posterior predictive distribution is again a Gaussian with mean µ and variance
σ2 as follows:
µ = K(xq, X)(K(X,X) + σ2ε I)−1 y ,
σ2 = K(xq,xq) + σ2ε − K(xq, X)(K(X,X) + σ2ε I)−1 K(X,xq) ,
where X is an N × d matrix summarizing the training inputs (the ith row of X is given
by (xi)>), K(X,X) is the kernel matrix with entries (K(X,X))i,j = k(xi,xj), y =
(y1, . . . , yN ) ∈ RN is the vector of observed training outcomes, and σ2ε is the variance
of the (additive) error term for these observations (i.e., yi is normally distributed with
expected value f(xi) and variance σ2ε ).
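To make the formulas concrete, here is a minimal NumPy sketch of the posterior predictive mean µ and variance σ2 above, assuming a squared-exponential kernel with unit length-scale (so that k(xq,xq) = 1); the toy data and noise level are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential covariance k(x, x') = exp(-||x - x'||^2 / (2 l^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale**2))

def gp_posterior(X, y, x_q, sigma_eps=0.1):
    """Posterior predictive mean and variance at x_q for a zero-mean GP prior,
    following the formulas above (note k(x_q, x_q) = 1 for the RBF kernel)."""
    K = rbf_kernel(X, X) + sigma_eps**2 * np.eye(len(X))  # K(X,X) + sigma^2 I
    k_q = rbf_kernel(x_q.reshape(1, -1), X)               # K(x_q, X)
    mu = (k_q @ np.linalg.solve(K, y))[0]
    var = 1.0 + sigma_eps**2 - (k_q @ np.linalg.solve(K, k_q.T))[0, 0]
    return mu, var

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.sin(X[:, 0])
mu_near, var_near = gp_posterior(X, y, np.array([1.5]))  # inside the data
mu_far, var_far = gp_posterior(X, y, np.array([10.0]))   # far from the data
```

Far from the data, the mean reverts to the (zero) prior mean and the variance grows toward the prior variance, which is exactly the behavior illustrated in Fig. 13.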
Problems with discrete outcomes y, such as binary classification with Y = {−1,+1}, are
made amenable to Gaussian processes by suitable link functions, linking these outcomes
with the real values h = h(x) as underlying (latent) variables. For example, using the
logistic link function
p(y |h) = σ(h) = 1/(1 + exp(−y h)) ,
the following posterior predictive distribution is obtained:
p(y = +1 |X,y,xq) = ∫ σ(h′) p(h′ |X,y,xq) dh′ ,

where

p(h′ |X,y,xq) = ∫ p(h′ |X,xq,h) p(h |X,y) dh .
[Figure: two panels over X = [0, 6], plotting f(x), m(x), noisy samples f(x) + N(0, 0.3), and the band m(x) ± 2SE, for 3 observations (left) and 20 observations (right).]

Figure 13: Simple one-dimensional illustration of Gaussian processes (X = [0, 6]), with very few examples on the left and more examples on the right. The predictive uncertainty, as reflected by the width of the confidence band around the mean function (orange line), reduces with an increasing number of observations.
However, since the likelihood will no longer be Gaussian, approximate inference techniques
(e.g., Laplace, expectation propagation, MCMC) will be needed.
As for uncertainty representation, our general remarks on Bayesian inference (cf. Section
3.2) obviously apply to Gaussian processes as well. In the case of regression, the variance
σ2 of the posterior predictive distribution for a query xq is a meaningful indicator of (total)
uncertainty. The variance of the error term, σ2ε , corresponds to the aleatoric uncertainty,
so that the difference could be considered as epistemic uncertainty. The latter is largely
determined by the (parameters of the) kernel function, e.g., the characteristic length-scale
of a squared exponential covariance function (Blum and Riedmiller, 2013). Both have an
influence on the width of confidence intervals that are typically obtained for per-instance
predictions, and which reflect the total amount of uncertainty (cf. Fig. 13). By determining
all (hyper-) parameters of a GP, these sources of uncertainty can in principle be separated,
although the estimation itself is of course not an easy task.
4.5 Deep neural networks
Work on uncertainty in deep learning fits the general framework we introduced so far, es-
pecially the Bayesian approach, quite well, although the methods proposed are specifically
tailored for neural networks as a model class. A standard neural network can be seen as a
probabilistic classifier h: in the case of classification, given a query x ∈ X , the final layer
of the network typically outputs a probability distribution (using transformations such as
softmax) on the set of classes Y, and in the case of regression, a distribution (e.g., a Gaus-
sian8) is placed over a point prediction h(x) ∈ R (typically conceived as the expectation
of the distribution). Training a neural network can essentially be seen as doing maximum
likelihood inference. As such, it yields probabilistic predictions, but no information about
the confidence in these probabilities. In other words, it captures aleatoric but no epistemic
uncertainty.
8Lee et al. (2018a) offer an interesting discussion of the connection between deep networks and Gaussian processes.
In the context of neural networks, epistemic uncertainty is commonly understood as un-
certainty about the model parameters, that is, the weights w of the neural network (which
correspond to the parameter θ in (18)). Obviously, this is a special case of what we called
approximation uncertainty. To capture this kind of epistemic uncertainty, Bayesian neu-
ral networks (BNNs) have been proposed as a Bayesian extension of deep neural networks
(Denker and LeCun, 1991; Kay, 1992; Neal, 2012). In BNNs, each weight is represented
by a probability distribution (again, typically a Gaussian) instead of a real number, and
learning comes down to Bayesian inference, i.e., computing the posterior p(w | D). The
predictive distribution of an outcome given a query instance xq is then given by
p(y |xq,D) = ∫ p(y |xq,w) p(w | D) dw .
Since the posteriors on the weights cannot be obtained analytically, approximate varia-
tional techniques are used (Jordan et al., 1999; Graves, 2011), seeking a variational distri-
bution qθ on the weights that minimizes the Kullback-Leibler divergence KL(qθ‖p(w | D))
between qθ and the true posterior. An important example is Dropout variational infer-
ence (Gal and Ghahramani, 2016), which establishes an interesting connection between
(variational) Bayesian inference and the idea of using Dropout as a learning technique for
neural networks. Likewise, given a query xq, the predictive distribution for the outcome y
is approximated using techniques like Monte Carlo sampling, i.e., drawing model weights
from the approximate posterior distributions (again, a connection to Dropout can be es-
tablished). The total uncertainty can then be quantified in terms of common measures
such as entropy (in the case of classification) or variance (in the case of regression) of this
distribution.
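The sampling step can be illustrated with a deliberately simplified stand-in: instead of a trained BNN, we draw weight vectors of a tiny logistic model from a hand-picked Gaussian that plays the role of the approximate posterior qθ; all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stand-in for the approximate posterior q(w): Gaussian samples around a
# hand-picked weight vector of a tiny logistic "network" (all hypothetical).
w_mean, w_std = np.array([1.0, -1.0]), 0.3
W = w_mean + w_std * rng.normal(size=(1000, 2))  # 1000 sampled weight vectors

def predictive(x):
    """Monte Carlo estimate of p(y = +1 | x, D) = E_{w ~ q} p(y = +1 | x, w)."""
    return sigmoid(W @ x).mean()

p = predictive(np.array([2.0, 0.5]))
# Entropy of the averaged prediction quantifies the total uncertainty.
total_entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
```

In Dropout variational inference, the role of the weight samples is played by stochastic forward passes with dropout kept active at prediction time.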
The total uncertainty, say the variance V(y) in regression, still contains both the residual
error, i.e., the variance of the observation error (aleatoric uncertainty), σ2ε , and the variance
due to parameter uncertainty (epistemic uncertainty). The former can be heteroscedastic,
which means that σ2ε is not constant but a function of x ∈ X . Kendall and Gal (2017)
propose a way to learn heteroscedastic aleatoric uncertainty as loss attenuation. Roughly
speaking, the idea is to let the neural net not only predict the conditional mean of y
given xq, but also the residual error. The corresponding loss function to be minimized is
constructed in such a way (or actually derived from the probabilistic interpretation of the
model) that prediction errors for points with a high residual variance are penalized less,
but at the same time, a penalty is also incurred for predicting a high variance.
An explicit attempt at measuring and separating aleatoric and epistemic uncertainty (in
the context of regression) is made by Depeweg et al. (2018). Their idea is as follows: They
quantify the total uncertainty as well as the aleatoric uncertainty, and then obtain the
epistemic uncertainty as the difference. More specifically, they propose to measure the
total uncertainty in terms of the entropy of the predictive posterior distribution, which,
in the case of discrete Y, is given by9
H[p(y |x)] = − ∑_{y∈Y} p(y |x) log2 p(y |x) . (19)
This uncertainty also includes the (epistemic) uncertainty about the network weights w.
Fixing a set of weights, i.e., considering a distribution p(y |w,x), thus removes the epis-
temic uncertainty. Therefore, the expectation over the entropies of these distributions,
E_{p(w | D)} H[p(y |w,x)] = − ∫ p(w | D) ∑_{y∈Y} p(y |w,x) log2 p(y |w,x) dw , (20)
is a measure of the aleatoric uncertainty. Finally, the epistemic uncertainty is obtained as
the difference
ue(x) ..= H[p(y |x)] − E_{p(w | D)} H[p(y |w,x)] , (21)
which equals the mutual information I(y,w) between y and w. Intuitively, epistemic
uncertainty thus captures the amount of information about the model parameters w that
would be gained through knowledge of the true outcome y. A similar approach was
recently adopted by Mobiny et al. (2017), who also propose a technique to compute (21)
approximately.
Bayesian model averaging establishes a natural connection between Bayesian inference and
ensemble learning, and indeed, as already mentioned in Section 4.5, the variance of the
predictions produced by an ensemble is a good indicator of the (epistemic) uncertainty in
a prediction. In fact, the variance is inversely related to the “peakedness” of a posterior
distribution p(h | D) and can be seen as a measure of the discrepancy between the (most
probable) candidate hypotheses in the hypothesis space. Based on this idea, Lakshmi-
narayanan et al. (2017) propose a simple ensemble approach as an alternative to Bayesian
NNs, which is easy to implement, readily parallelizable, and requires little hyperparameter
tuning.
Inspired by Mobiny et al. (2017), an ensemble approach is also proposed by Shaker and
Hullermeier (2020). Since this approach does not necessarily need to be implemented
with neural networks as base learners10, we switch notation and replace weight vectors w
by hypotheses h. To approximate the measures (21) and (20), the posterior distribution
p(h | D) is represented by a finite ensemble of hypotheses h1, . . . , hM . An approximation
of (20) can then be obtained by
ua(x) ..= −(1/M) ∑_{i=1}^{M} ∑_{y∈Y} p(y |hi,x) log2 p(y |hi,x) , (22)
9Depeweg et al. (2018) actually consider the continuous case, in which case the entropy needs to be replaced by the differential entropy.
10Indeed, the authors use decision trees, i.e., a random forest as an ensemble.
an approximation of (19) by
ut(x) ..= − ∑_{y∈Y} ((1/M) ∑_{i=1}^{M} p(y |hi,x)) log2 ((1/M) ∑_{i=1}^{M} p(y |hi,x)) , (23)
and finally an approximation of (21) by ue(x) ..= ut(x) − ua(x). The latter corresponds
to the Jensen-Shannon divergence (Endres and Schindelin, 2003) of the distributions
p(y |hi,x), i ∈ [M ] (Malinin and Gales, 2018).
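For a concrete instance of the approximations (22) and (23), the following sketch computes ut, ua and ue = ut − ua from an (M × K) array of ensemble member predictions; the two toy ensembles are chosen to exhibit the two extreme cases.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (base 2) of a discrete distribution; 0 log 0 := 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -(p[nz] * np.log2(p[nz])).sum()

def uncertainty_decomposition(P):
    """P is an (M, K) array whose i-th row is p(. | h_i, x).
    Returns (u_t, u_a, u_e) following Eqs. (23), (22) and their difference."""
    u_t = entropy(P.mean(axis=0))            # entropy of the mean, Eq. (23)
    u_a = np.mean([entropy(p) for p in P])   # mean of the entropies, Eq. (22)
    return u_t, u_a, u_t - u_a

# Confident but disagreeing members: epistemic uncertainty dominates.
ut1, ua1, ue1 = uncertainty_decomposition(np.array([[0.99, 0.01],
                                                    [0.01, 0.99]]))
# Agreeing but maximally unconfident members: purely aleatoric uncertainty.
ut2, ua2, ue2 = uncertainty_decomposition(np.array([[0.5, 0.5],
                                                    [0.5, 0.5]]))
```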
4.6 Credal sets and classifiers
The theory of imprecise probabilities (Walley, 1991) is largely built on the idea to specify
knowledge in terms of a set Q of probability distributions instead of a single distribution.
In the literature, such sets are also referred to as credal sets, and often assumed to be
convex (i.e., q, q′ ∈ Q implies αq + (1− α)q′ ∈ Q for all α ∈ [0, 1]). The concept of credal
sets and related notions provide the basis of a generalization of Bayesian inference, which
is especially motivated by the criticism of non-informative priors as models of ignorance
in standard Bayesian inference (cf. Section 3.3). The basic idea is to replace a single
prior distribution on the model space, as used in Bayesian inference, by a (credal) set of
candidate priors. Given a set of observed data, Bayesian inference can then be applied
to each candidate prior, thereby producing a (credal) set of posteriors. Correspondingly,
any value or quantity derived from a posterior (e.g., a prediction, an expected value,
etc.) is replaced by a set of such values or quantities. An important example of such
an approach is robust inference for categorical data with the imprecise Dirichlet model,
which is an extension of inference with the Dirichlet distribution as a conjugate prior for
the multinomial distribution (Bernardo, 2005).
Methods of that kind have also been used in the context of machine learning, for example in
the framework of the Naïve Credal Classifier (Zaffalon, 2002a; Corani and Zaffalon, 2008a)
or more recently as an extension of sum-product networks (Maau et al., 2017). As for the
former, we can think of a Naïve Bayes (NB) classifier as a hypothesis h = hθ specified by
a parameter vector θ that comprises a prior probability θk for each class yk ∈ Y and a
conditional probability θi,j,k for observing the ith value ai,j of the jth attribute given class
yk. For a given query instance specified by attributes (ai1,1, . . . , aiJ ,J), the posterior class
probabilities are then given by
p(yk | a_{i1,1}, . . . , a_{iJ,J}) ∝ θk ∏_{j=1}^{J} θ_{ij,j,k} . (24)
In the Naïve Credal Classifier, the θk and θ_{ij,j,k} are specified in terms of (local) credal
sets11, i.e., there is a class of probability distributions Q on Y playing the role of candidate
11These could be derived from empirical data, for example, by doing generalized Bayesian inference with the imprecise Dirichlet model.
priors, and a class of probability distributions Qj,k specifying possible distributions on the
attribute values of the jth attribute given class yk. A single posterior class probability
(24) is then replaced by the set (interval) of such probabilities that can be produced by
taking θk from a distribution in Q and θij ,j,k from a distribution in Qj,k. As an aside, let
us note that the computations involved may of course become costly.
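To illustrate the interval computation (and why it can become costly), the following sketch enumerates finite candidate sets, a crude stand-in for the convex credal sets of the actual classifier; all probabilities are hypothetical, for two classes and two attributes.

```python
import numpy as np

# Finite candidate sets, a crude stand-in for convex local credal sets:
# candidate class priors theta_k, and per class the candidate likelihoods
# theta_{i_j, j, k} of the observed values of two attributes.
priors = [np.array([0.4, 0.6]), np.array([0.5, 0.5])]
likelihoods = {0: [(0.7, 0.2), (0.8, 0.3)],   # candidates for class 0
               1: [(0.4, 0.5), (0.5, 0.6)]}   # candidates for class 1

def posterior_class0(prior, lik0, lik1):
    """Normalized version of Eq. (24) for class 0."""
    s0 = prior[0] * lik0[0] * lik0[1]
    s1 = prior[1] * lik1[0] * lik1[1]
    return s0 / (s0 + s1)

# Enumerate all combinations of candidates to obtain the posterior interval;
# the combinatorial blow-up hints at the computational cost noted above.
p0 = [posterior_class0(pr, l0, l1)
      for pr in priors for l0 in likelihoods[0] for l1 in likelihoods[1]]
interval_class0 = (min(p0), max(p0))
```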
4.6.1 Uncertainty measures for credal sets
Interestingly, there has been quite some work on defining uncertainty measures for credal
sets and related representations, such as Dempster-Shafer evidence theory (Klir, 1994).
Even more interestingly, a basic distinction between two types of uncertainty contained in
a credal set, referred to as conflict (randomness, discord) and non-specificity, respectively,
has been made by Yager (1983). The importance of this distinction was already emphasized
by Kolmogorov (1965). These notions are in direct correspondence with what we call
aleatoric and epistemic uncertainty.
The standard uncertainty measure in classical possibility theory (where uncertain infor-
mation is simply represented in the form of subsets A ⊆ Y of possible alternatives) is the
Hartley measure12 (Hartley, 1928)
H(A) = log(|A|) . (25)
Just like the Shannon entropy (17), this measure can be justified axiomatically13.
Given the insight that conflict and non-specificity are two different, complementary sources
of uncertainty, and standard (Shannon) entropy and (25) as well-established measures of
these two types of uncertainty, a natural question in the context of credal sets is to ask
for a generalized representation
U(Q) = AU(Q) + EU(Q) , (26)
where U is a measure of total (aggregate) uncertainty, AU a measure of aleatoric uncer-
tainty (conflict, a generalization of the Shannon entropy), and EU a measure of epistemic
uncertainty (non-specificity, a generalization of the Hartley measure).
As for the non-specificity part in (26), the following generalization of the Hartley measure
to the case of graded possibilities has been proposed by various authors (Abellan and
Moral, 2000):
GH(Q) ..= ∑_{A⊆Y} mQ(A) log(|A|) , (27)
where mQ : 2^Y −→ [0, 1] is the Möbius inverse of the capacity function ν : 2^Y −→ [0, 1]
12Be aware that we denote both the Hartley measure and the Shannon entropy by H, which is common in the literature. The meaning should be clear from the context.
13For example, see Chapter IX, pages 540–616, in the book by Renyi (1970).
defined by

νQ(A) ..= inf_{q∈Q} q(A) (28)

for all A ⊆ Y, that is,

mQ(A) = ∑_{B⊆A} (−1)^{|A\B|} νQ(B) .
The measure (27) enjoys several desirable axiomatic properties, and its uniqueness was
shown by Klir and Mariano (1987).
The generalization of the Shannon entropy H as a measure of conflict turned out to be
more difficult. The upper and lower Shannon entropy play an important role in this regard:
H^*(Q) ..= max_{q∈Q} H(q) , H_*(Q) ..= min_{q∈Q} H(q) (29)
Based on these measures, the following disaggregations of total uncertainty (26) have been
proposed (Abellan et al., 2006):
H^*(Q) = (H^*(Q)−GH(Q)) + GH(Q) (30)
H^*(Q) = H_*(Q) + (H^*(Q)−H_*(Q)) (31)
In both cases, upper entropy serves as a measure of total uncertainty U(Q), which is again
justified on an axiomatic basis. In the first case, the generalized Hartley measure is used for
quantifying epistemic uncertainty, and aleatoric uncertainty is obtained as the difference
between total and epistemic uncertainty. In the second case, epistemic uncertainty is
specified in terms of the difference between upper and lower entropy.
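The measures above can be computed directly when the credal set is represented by finitely many distributions, so that the infimum and extrema over Q become min/max over a list (which only approximates a convex credal set); the example set over two classes is hypothetical.

```python
import itertools
import numpy as np

Y = [0, 1]  # two classes keep the subset lattice small

# Finite approximation of a credal set Q over Y (all numbers hypothetical).
Q = [np.array([0.2, 0.8]), np.array([0.5, 0.5]), np.array([0.7, 0.3])]

def entropy(p):
    nz = p > 0
    return -(p[nz] * np.log2(p[nz])).sum()

def nu(A):
    """Lower probability nu_Q(A) = inf over Q of q(A), Eq. (28)."""
    return min(q[list(A)].sum() for q in Q) if A else 0.0

subsets = [frozenset(s) for r in range(len(Y) + 1)
           for s in itertools.combinations(Y, r)]

def mobius(A):
    """Moebius inverse m_Q(A) of the capacity nu_Q."""
    return sum((-1) ** len(A - B) * nu(B) for B in subsets if B <= A)

GH = sum(mobius(A) * np.log2(len(A)) for A in subsets if A)   # Eq. (27)
H_upper = max(entropy(q) for q in Q)                          # Eq. (29)
H_lower = min(entropy(q) for q in Q)

AU_30, EU_30 = H_upper - GH, GH            # decomposition (30)
AU_31, EU_31 = H_lower, H_upper - H_lower  # decomposition (31)
```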
4.6.2 Set-valued prediction
Credal classifiers are used for “indeterminate” prediction, that is, to produce reliable set-
valued predictions that are likely to cover the true outcome. In this regard, the notion of
dominance plays an important role: An outcome y dominates another outcome y′ if y is
more probable than y′ for each distribution in the credal set, that is,
γ(y, y′,xq) ..= inf_{h∈C} p(y |xq, h) / p(y′ |xq, h) > 1 , (32)
where C is a credal set of hypotheses, and p(y |x, h) the probability of the outcome y (for
the query xq) under hypothesis h. The set-valued prediction for a query xq is then given by
the set of all non-dominated outcomes y ∈ Y, i.e., those outcomes that are not dominated
by any other outcome. Practically, the computations are based on the interval-valued
probabilities [p_*(y |xq), p^*(y |xq)] assigned to each class y ∈ Y, where

p_*(y |xq) ..= inf_{h∈C} p(y |xq, h) , p^*(y |xq) ..= sup_{h∈C} p(y |xq, h) . (33)
Note that the upper probabilities p^*(y |xq) are very much in the spirit of the plausibility
degrees (36).
Interestingly, uncertainty degrees can be derived on the basis of dominance degrees (32).
For example, the following uncertainty score has been proposed by Antonucci et al. (2012)
in the context of uncertainty sampling for active learning:
s(xq) ..= −max(γ(+1,−1,xq), γ(−1,+1,xq)) . (34)
This degree is low if one of the outcomes strongly dominates the other one, but high if
this is not the case. It is closely related to the degree of epistemic uncertainty in (39)
to be discussed in Section 4.7, but also seems to capture some aleatoric uncertainty. In
fact, s(xq) increases with both the width of the interval [p_*(y |xq), p^*(y |xq)] and the
closeness of its midpoint to 1/2.
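For a finite credal set of hypotheses, the dominance relation (32), the resulting set of non-dominated outcomes, and the intervals (33) can be computed directly; the three hypothetical distributions below are illustrative.

```python
import numpy as np

# Finite credal set of hypotheses: row i holds p(. | x_q, h_i) over 3 classes.
C = np.array([[0.6, 0.3, 0.1],
              [0.5, 0.4, 0.1],
              [0.7, 0.2, 0.1]])

def dominates(y, y_prime):
    """Eq. (32): y dominates y' iff inf_h p(y | x_q, h) / p(y' | x_q, h) > 1."""
    return (C[:, y] / C[:, y_prime]).min() > 1.0

# The set-valued prediction consists of all non-dominated outcomes.
K = C.shape[1]
prediction = {y for y in range(K)
              if not any(dominates(y2, y) for y2 in range(K) if y2 != y)}

# Interval-valued (lower, upper) probabilities per class, Eq. (33).
intervals = [(C[:, y].min(), C[:, y].max()) for y in range(K)]
```

Here class 0 dominates both alternatives under every distribution in the set, so the prediction is the singleton {0}.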
Lassiter (2020) discusses the modeling of “credal imprecision” by means of sets of measures
in a critical way and contrasts this approach with hierarchical probabilistic (Bayesian)
models, i.e., distributions of measures (see Fig. 10, right side). Among several arguments
put forward against sets-of-measures models, one aspect is specifically interesting from a
machine learning perspective, namely the problem that modeling a lack of knowledge in a
set-based manner may hamper the possibility of inductive inference, up to a point where
learning from empirical data is not possible any more. This is closely connected to the
insight that learning from data necessarily requires an inductive bias, and learning without
assumptions is not possible (Wolpert, 1996). To illustrate the point, consider the case of
complete ignorance as an example, which could be modeled by the credal set Pall that
contains all probability measures (on a hypothesis space) as candidate priors. Updating
this set by pointwise conditioning on observed data will essentially reproduce the same
set as a credal set of posteriors. Roughly speaking, this is because Pall will contain very
extreme priors, which will remain extreme even after updating with a sample of finite size.
In other words, no finite sample is enough to annihilate a sufficiently extreme prior belief.
Replacing Pall by a proper subset of measures may avoid this problem, but any such choice
(implying a sharp transition between possible and excluded priors) will appear somewhat
arbitrary. According to Lassiter (2020), an appealing alternative is using a probability
distribution instead of a set, like in hierarchical Bayesian modeling.
4.7 Reliable classification
To the best of our knowledge, Senge et al. (2014) were the first to explicitly motivate
the distinction between aleatoric and epistemic uncertainty in a machine learning context.
Their approach leverages the concept of normalized likelihood (cf. Section A.4). Moreover,
it combines set-based and distributional (probabilistic) inference, and thus can be posi-
tioned in-between version space learning and Bayesian inference as discussed in Section
3. Since the approach contains concepts that might be less known to a machine learning
audience, our description is a bit more detailed than for the other methods discussed in
this section.
Consider the simplest case of binary classification with classes Y ..= {−1,+1}, which
suffices to explain the basic idea (and which has been generalized to the multi-class case
by Nguyen et al. (2018)). Senge et al. (2014) focus on predictive uncertainty and derive
degrees of uncertainty in a prediction in two steps, which we are going to discuss in turn:
• First, given a query instance xq, a degree of “plausibility” is derived for each can-
didate outcome y ∈ Y. These are degrees in the unit interval, but no probabilities.
As they are not constrained by a total mass of 1, they are more apt at capturing a
lack of knowledge, and hence epistemic uncertainty (cf. Section 3.3).
• Second, degrees of aleatoric and epistemic uncertainty are derived from the degrees
of plausibility.
4.7.1 Modeling the plausibility of predictions
Recall that, in version space learning, the plausibility of both hypotheses and outcomes
are expressed in a purely bivalent way: according to (10), a hypothesis is either considered
possible/plausible or not (Jh ∈ VK), and an outcome y is either supported or not (Jh(x) =
yK). Senge et al. (2014) generalize both parts of the prediction from bivalent to graded
plausibility and support: A hypothesis h ∈ H has a degree of plausibility πH(h) ∈ [0, 1],
and the support of outcomes y is expressed in terms of probabilities h(x) = p(y |x) ∈ [0, 1].
More specifically, referring to the notion of normalized likelihood , a plausibility distribution
on H (i.e., a plausibility for each hypothesis h ∈ H) is defined as follows:
πH(h) ..= L(h) / sup_{h′∈H} L(h′) = L(h) / L(hml) , (35)
where L(h) is the likelihood of the hypothesis h on the data D (i.e., the probability of D under this hypothesis), and hml ∈ H is the maximum likelihood (ML) estimate. Thus,
the plausibility of a hypothesis is in proportion to its likelihood14, with the ML estimate
having the highest plausibility of 1.
The second step, both in version space and Bayesian learning, consists of translating
uncertainty on H into uncertainty about the prediction for a query xq. To this end,
all predictions h(xq) need to be aggregated, taking into account the plausibility of the
hypotheses h ∈ H. Due to the problems of the averaging approach (14) in Bayesian infer-
ence, a generalization of the “existential” aggregation (10) used in version space learning
is adopted:
π(+1 |xq) ..= sup_{h∈H} min(πH(h), π(+1 |h,xq)) , (36)
where π(+1 |h,xq) is the degree of support of the positive class provided by h15. This
14In principle, the same idea can of course also be applied in Bayesian inference, namely by defining plausibility in terms of a normalized posterior distribution.
15Indeed, π(· |h,xq) should not be interpreted as a measure of uncertainty.
measure of support, which generalizes the all-or-nothing support Jh(x) = yK in (10), is
defined as follows:
π(+1 |h,xq) ..= max(2h(xq)− 1, 0) (37)
Thus, the support is 0 as long as the probability predicted by h is ≤ 1/2, and linearly
increases afterward, reaching 1 for h(xq) = 1. Recalling that h is a probabilistic classifier,
this clearly makes sense, since values h(xq) ≤ 1/2 are actually more in favor of the negative
class, and therefore no evidence for the positive class. Also, as will be seen further below,
this definition assures a maximal degree of aleatoric uncertainty in the case of full certainty
about the uniform distribution h1/2, which is a desirable property. Since the supremum
operator in (36) can be seen as a generalized existential quantifier, the expression (36) can
be read as follows: The class +1 is plausible insofar there exists a hypothesis h that is
plausible and that strongly supports +1. Analogously, the plausibility for −1 is defined
as follows:
π(−1 |xq) ..= sup_{h∈H} min(πH(h), π(−1 |h,xq)) , (38)
with π(−1 |h,xq) = max(1 − 2h(xq), 0). An illustration of this kind of “max-min infer-
ence” and a discussion of how it relates to Bayesian inference (based on “sum-product
aggregation”) can be found in Appendix B in the supplementary material.
4.7.2 From plausibility to aleatoric and epistemic uncertainty
Given the plausibilities π(+1) = π(+1 |xq) and π(−1) = π(−1 |xq) of the positive and
negative class, respectively, and having to make a prediction, one would naturally decide
in favor of the more plausible class. Perhaps more interestingly, meaningful definitions of
epistemic uncertainty ue and aleatoric uncertainty ua can be defined on the basis of the
two degrees of plausibility:
ue ..= min(π(+1), π(−1)) ,
ua ..= 1−max(π(+1), π(−1)) (39)
Thus, ue is the degree to which both +1 and −1 are plausible16, and ua the degree to
which neither +1 nor −1 are plausible. Since these two degrees of uncertainty satisfy
ua + ue ≤ 1, the total uncertainty (aleatoric + epistemic) is upper-bounded by 1. Strictly
speaking, since the degrees π(y) should be interpreted as upper bounds on plausibility,
which decrease in the course of time (with increasing sample size), the uncertainty degrees
(39) should be interpreted as bounds as well. For example, in the very beginning, when no
or very little data has been observed, both outcomes are fully plausible (π(+1) = π(−1) =
1), hence ue = 1 and ua = 0. This does not mean, of course, that there is no aleatoric
uncertainty involved. Instead, it is simply not yet known or, say, confirmed. Thus, ua should be seen as a lower bound on the “true” aleatoric uncertainty. For instance, when
y is the outcome of a fair coin toss, the aleatoric uncertainty will increase over time and
16The minimum plays the role of a generalized logical conjunction (Klement et al., 2002).
reach 1 in the limit, while ue will decrease and vanish at some point: Eventually, the
learner will be fully aware of facing full aleatoric uncertainty.
More generally, the following special cases might be of interest:
• Full epistemic uncertainty : ue = 1 requires the existence of at least two fully plausi-
ble hypotheses (i.e., both with the highest likelihood), the one fully supporting the
positive and the other fully supporting the negative class. This situation is likely
to occur (at least approximately) in the case of a small sample size, for which the
likelihood is not very peaked.
• No epistemic uncertainty : ue = 0 requires either π(+1) = 0 or π(−1) = 0, which
in turn means that h(xq) < 1/2 for all hypotheses with non-zero plausibility, or
h(xq) > 1/2 for all these hypotheses. In other words, there is no disagreement about
which of the two classes should be favored. Specifically, suppose that all plausible
hypotheses agree on the same conditional probability distribution p(+1 |x) = α and
p(−1 |x) = 1 − α, and let β = max(α, 1 − α). In this case, ue = 0, and the degree
of aleatoric uncertainty ua = 2(1− β) depends on how close β is to 1.
• Full aleatoric uncertainty: This is a special case of the previous one, in which β =
1/2. Indeed, ua = 1 means that all plausible hypotheses assign a probability of 1/2
to both classes. In other words, there is an agreement that the query instance is a
boundary case.
• No uncertainty: Again, this is a special case of the second one, with β = 1. A clear
preference (close to 1) in favor of one of the two classes means that all plausible
hypotheses, i.e., all hypotheses with a high likelihood, provide full support to that
class.
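To make the computation of (39) concrete, the following Python sketch (function and variable names are our own) derives both degrees from given plausibilities π(+1) and π(−1):

```python
# Sketch of Eq. (39): epistemic and aleatoric uncertainty from the two
# plausibility degrees pi(+1) and pi(-1) (assumed to be given, e.g. computed
# from normalized likelihoods as in Senge et al., 2014).

def uncertainty_degrees(pi_pos: float, pi_neg: float) -> tuple:
    """Return (u_e, u_a): epistemic and aleatoric uncertainty."""
    u_e = min(pi_pos, pi_neg)        # degree to which BOTH classes are plausible
    u_a = 1.0 - max(pi_pos, pi_neg)  # degree to which NEITHER class is plausible
    return u_e, u_a

# No data seen yet: both outcomes fully plausible -> full epistemic uncertainty.
print(uncertainty_degrees(1.0, 1.0))  # (1.0, 0.0)

# A fair coin in the limit: no plausible hypothesis favors either class.
print(uncertainty_degrees(0.0, 0.0))  # (0.0, 1.0)
```

Note that ue + ua ≤ 1 holds for any plausibility degrees in [0, 1], in line with the upper bound on total uncertainty stated above.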
Although algorithmic aspects are not in the focus of this paper, it is worth mentioning
that the computation of (36), and likewise of (38), may become rather complex. In fact,
the computation of the supremum comes down to solving an optimization problem, the
complexity of which strongly depends on the hypothesis space H.
4.8 Conformal prediction
Conformal prediction (Vovk et al., 2003; Shafer and Vovk, 2008; Balasubramanian et al.,
2014) is a framework for reliable prediction that is rooted in classical frequentist statistics,
more specifically in hypothesis testing. Given a sequence of training observations and a
new query xN+1 (which we denoted by xq before) with unknown outcome yN+1,
(x1, y1), (x2, y2), . . . , (xN , yN ), (xN+1, •) , (40)
the basic idea is to hypothetically replace • by each candidate, i.e., to test the hypothesis
yN+1 = y for all y ∈ Y. Only those outcomes y for which this hypothesis can be rejected at
a predefined level of confidence are excluded, while those for which the hypothesis cannot
be rejected are collected to form the prediction set or prediction region Y ε ⊆ Y. The
construction of a set-valued prediction Y ε = Y ε(xN+1) that is guaranteed to cover the
true outcome yN+1 with a given probability 1− ε (for example 95 %), instead of producing
a point prediction yN+1 ∈ Y, is the basic idea of conformal prediction. Here, ε ∈ (0, 1)
is a pre-specified level of significance. In the case of classification, Y ε is a subset of the
set of classes Y = {y1, . . . , yK}, whereas in regression, a prediction region is commonly
represented in terms of an interval17.
Hypothesis testing is done in a nonparametric way: Consider any “nonconformity” func-
tion f : X × Y −→ R that assigns scores α = f(x, y) to input/output tuples; the latter
can be interpreted as a measure of “strangeness” of the pattern (x, y), i.e., the higher
the score, the less the data point (x, y) conforms to what one would expect to observe.
Applying this function to the sequence (40), with a specific (though hypothetical) choice
of y = yN+1, yields a sequence of scores
α1, α2, . . . , αN , αN+1 .
Denote by σ the permutation of {1, . . . , N + 1} that sorts the scores in increasing order,
i.e., such that ασ(1) ≤ . . . ≤ ασ(N+1). Under the assumption that the hypothetical choice
of yN+1 is in agreement with the true data-generating process, and that this process has
the property of exchangeability (which is weaker than the assumption of independence
and essentially means that the order of observations is irrelevant), every permutation σ
has the same probability of occurrence. Consequently, the probability that αN+1 is among
the ε% highest nonconformity scores should be low. This notion can be captured by the
p-values associated with the candidate y, defined as
p(y) := #{i | αi ≥ αN+1} / (N + 1) .
According to what we said, the probability that p(y) < ε (i.e., αN+1 is among the ε%
highest α-values) is upper-bounded by ε. Thus, the hypothesis yN+1 = y can be rejected
for those candidates y for which p(y) < ε.
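The testing procedure can be illustrated in a few lines of Python; the nearest-neighbour nonconformity function and the toy one-dimensional data below are our own assumptions, chosen only for concreteness:

```python
# Sketch of conformal prediction for classification: each candidate label y is
# tried for the query, nonconformity scores are computed for all N+1 examples,
# and y is kept unless its p-value falls below the significance level eps.

def p_value(scores):
    """p(y) = #{i : alpha_i >= alpha_{N+1}} / (N + 1)."""
    alpha_new = scores[-1]
    return sum(a >= alpha_new for a in scores) / len(scores)

def nn_nonconformity(seq, i):
    """Distance to the nearest other example with the same label (1-D inputs)."""
    x_i, y_i = seq[i]
    same = [abs(x_i - x) for j, (x, y) in enumerate(seq) if j != i and y == y_i]
    return min(same) if same else float("inf")

def prediction_region(train, x_new, labels, eps):
    region = []
    for y in labels:                      # hypothetically set y_{N+1} = y
        seq = train + [(x_new, y)]
        scores = [nn_nonconformity(seq, i) for i in range(len(seq))]
        if p_value(scores) >= eps:        # hypothesis cannot be rejected
            region.append(y)
    return region

train = [(0.0, "a"), (0.1, "a"), (1.0, "b"), (1.1, "b")]
print(prediction_region(train, 0.05, ["a", "b"], eps=0.25))  # ['a']
```

A query close to the “a” examples yields the singleton region at this level, while a smaller ε (a more cautious test) retains both labels.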
Conformal prediction as outlined above realizes transductive inference, although inductive
variants also exist (Papadopoulos, 2008). The error bounds are valid and well calibrated
by construction, regardless of the nonconformity function f . However, the choice of this
function has an important influence on the efficiency of conformal prediction, that is, the
size of prediction regions: The more suitable the nonconformity function is chosen, the
smaller these sets will be.
Although conformal prediction is mainly concerned with constructing prediction regions,
the scores produced in the course of this construction can also be used for quantifying un-
certainty. In this regard, the notions of confidence and credibility have been introduced
17Obviously, since Y = R is infinite in regression, a hypothesis test cannot be conducted explicitly for each candidate outcome y.
(Gammerman and Vovk, 2002): Let p1, . . . , pK denote the p-values that correspond, re-
spectively, to the candidate outcomes y1, . . . , yK in a classification problem. If a definite
choice (point prediction) y has to be made, it is natural to pick the yi with the highest
p-value. The value pi itself is then a natural measure of credibility, since the larger (closer
to 1) this value, the more likely the prediction is correct. Note that the value also cor-
responds to the largest significance level ε for which the prediction region Y ε would be
empty (since all candidates would be rejected). In other words, it is a degree to which yi is
indeed a plausible candidate that cannot be excluded. Another question one may ask is to
what extent yi is the unique candidate of that kind, i.e., to what extent other candidates
can indeed be excluded. This can be quantified in terms of the greatest 1 − ε for which
Y ε is the singleton set {yi}, that is, 1 minus the second-highest p-value. Besides, other
methods for quantifying the uncertainty of a point prediction in the context of conformal
prediction have been proposed (Linusson et al., 2016).
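Both degrees can be computed directly from the per-class p-values; a minimal sketch (function name our own):

```python
# Credibility and confidence (Gammerman and Vovk, 2002) from the p-values
# p_1, ..., p_K produced by a conformal predictor for the K candidate classes.

def credibility_confidence(p_values):
    ranked = sorted(p_values.values(), reverse=True)
    credibility = ranked[0]       # highest p-value: plausibility of the prediction
    confidence = 1.0 - ranked[1]  # 1 minus second-highest: uniqueness of the prediction
    return credibility, confidence

# The predicted class "a" is both plausible and clearly separated from the rest:
print(credibility_confidence({"a": 0.9, "b": 0.04, "c": 0.01}))  # (0.9, 0.96)
```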
With its main concern of constructing valid prediction regions, conformal prediction dif-
fers from most other machine learning methods, which produce point predictions y ∈ Y,
whether equipped with a degree of uncertainty or not. In a sense, conformal prediction
can even be seen as being orthogonal: It predefines the degree of uncertainty (level of
confidence) and adjusts its prediction correspondingly, rather than the other way around.
4.9 Set-valued prediction based on utility maximization
In line with classical statistics, but unlike decision theory and machine learning, the setting
of conformal prediction does not involve any notion of loss function. In this regard,
it differs from methods for set-valued prediction based on utility maximization (or loss
minimization), which are primarily used for multi-class classification problems. Similar
to conformal prediction, such methods also return a set of classes when the classifier is
too uncertain with respect to the class label to predict, but the interpretation of this
set is different. Instead of returning a set that contains the true class label with high
probability, sets that maximize a set-based utility score are sought. Besides, being based
on the conditional distribution p(y |xq) of the outcome y given a query xq, most of these
methods capture conditional uncertainty.
Let u(y, Y ) be a set-based utility score, where y denotes the ground truth outcome and
Y the predicted set. Then, adopting a decision-theoretic perspective, the Bayes-optimal
solution Y ∗u is found by maximizing the following objective:
Y∗u(xq) = argmax_{Y ∈ 2^Y \ {∅}} E_{p(y |xq)}[u(y, Y )] = argmax_{Y ∈ 2^Y \ {∅}} ∑_{y∈Y} u(y, Y ) p(y |xq) . (41)
Solving (41) as a brute-force search requires checking all subsets of Y, resulting in an
exponential time complexity. However, for many utility scores, the Bayes-optimal predic-
tion can be found more efficiently. Various methods in this direction have been proposed
under different names and qualifications of predictions, such as “indeterminate” (Zaffalon,
2002a), “credal” (Corani and Zaffalon, 2008a), “non-deterministic” (Del Coz et al., 2009),
and “cautious” (Yang et al., 2017b). Although the methods typically differ in the exact
shape of the utility function u : Y × 2^Y \ {∅} −→ [0, 1], most functions are specific members
of the following family:
u(y, Y ) = { 0 if y ∉ Y ; g(|Y |) if y ∈ Y } , (42)
where |Y | denotes the cardinality of the predicted set Y . This family is characterized by
a sequence (g(1), . . . , g(K)) ∈ [0, 1]^K with K the number of classes. Ideally, g should obey
the following properties:
(i) g(1) = 1, i.e., the utility u(y, Y ) should be maximal when the classifier returns the
true class label as a singleton set.
(ii) g(s) should be non-increasing, i.e., the utility u(y, Y ) should be higher if the true
class is contained in a smaller set of predicted classes.
(iii) g(s) ≥ 1/s, i.e., the utility u(y, Y ) of predicting a set containing the true and s− 1
additional classes should not be lower than the expected utility of randomly guessing
one of these s classes. This requirement formalizes the idea of risk-aversion: in the
face of uncertainty, abstaining should be preferred to random guessing (Zaffalon
et al., 2012).
Many existing set-based utility scores are recovered as special cases of (42), including the
three classical measures from information retrieval discussed by Del Coz et al. (2009):
precision with gP (s) = 1/s, recall with gR(s) = 1, and the F1-measure with gF1(s) =
2/(1+s). Other utility functions with specific choices for g are studied in the literature on
credal classification (Corani and Zaffalon, 2008b, 2009; Zaffalon et al., 2012; Yang et al.,
2017b; Nguyen et al., 2018), such as
gδ,γ(s) := δ/s − γ/s² , gexp(s) := 1 − exp(−δ/s) , glog(s) := log(1 + 1/s) .
Especially gδ,γ(s) is commonly used in this community, where δ and γ can only take certain
values to guarantee that the utility is in the interval [0, 1]. Precision (here called discounted
accuracy) corresponds to the case (δ, γ) = (1, 0). However, typical choices for (δ, γ) are
(1.6, 0.6) and (2.2, 1.2) (Nguyen et al., 2018), implementing the idea of risk aversion. The
measure gexp(s) is an exponentiated version of precision, where the parameter δ also defines
the degree of risk aversion.
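For the family (42), the efficiency gain mentioned above is easy to see: for a fixed set size s, the expected utility g(s) · Σ p(y | xq) over the members of Y is maximized by the s most probable classes, so only K candidate sets need to be compared instead of 2^K − 1. A Python sketch (assuming the conditional class probabilities are given):

```python
# Bayes-optimal set-valued prediction for utilities of the family (42):
# scan over set sizes s = 1, ..., K, keeping the top-s classes for each s.

def bayes_optimal_set(probs, g):
    ranked = sorted(probs, key=probs.get, reverse=True)
    best, best_util, mass = None, -1.0, 0.0
    for s, label in enumerate(ranked, start=1):
        mass += probs[label]   # probability mass of the top-s classes
        util = g(s) * mass     # expected utility of that set, Eq. (41)
        if util > best_util:
            best, best_util = set(ranked[:s]), util
    return best

# Risk-averse utility g_{delta,gamma} with (delta, gamma) = (1.6, 0.6):
g = lambda s: 1.6 / s - 0.6 / s**2
print(sorted(bayes_optimal_set({"a": 0.5, "b": 0.45, "c": 0.05}, g)))  # ['a', 'b']
```

With precision (g(s) = 1/s), the same distribution yields the singleton {a}: risk-averse choices of g make the predictor abstain between the two almost equally likely classes.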
Another example appears in the literature on binary or multi-class classification with reject
option (Herbei and Wegkamp, 2006; Linusson et al., 2018; Ramaswamy et al., 2015). Here,
the prediction can only be a singleton or the full set Y containing K classes. The first
case typically gets a reward of one (if the predicted class is correct), while the second case
should receive a lower reward, e.g. 1 − α. The latter corresponds to abstaining, i.e., not
predicting any class label, and the (user-defined) parameter α specifies a penalty for doing
so, with the requirement 0 < α < 1− 1/K to be risk-averse.
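The resulting decision rule is simple: the expected reward of predicting the best singleton is max_y p(y | xq), while abstaining earns 1 − α with certainty. A minimal sketch under these assumptions:

```python
# Reject option: predict the most probable class as a singleton, unless
# abstention (certain reward 1 - alpha) has higher expected reward.

def predict_with_reject(probs, alpha):
    y_best = max(probs, key=probs.get)
    if probs[y_best] >= 1.0 - alpha:  # confident enough for a point prediction
        return {y_best}
    return set(probs)                 # abstain: return the full set of classes

print(predict_with_reject({"a": 0.8, "b": 0.2}, alpha=0.3))           # {'a'}
print(sorted(predict_with_reject({"a": 0.5, "b": 0.5}, alpha=0.3)))   # ['a', 'b']
```

For α approaching 1 − 1/K, abstention loses its appeal relative to guessing among all K classes, which is exactly what the risk-aversion requirement above rules out.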
Set-valued predictions are also considered in hierarchical multi-class classification, mostly
in the form of internal nodes of the hierarchy (Freitas, 2007; Rangwala and Naik, 2017;
Yang et al., 2017a). Compared to the “flat” multi-class case, the prediction space is thus
restricted, because only sets of classes that correspond to nodes of the hierarchy can be
returned as a prediction. Some of the above utility scores also appear here. For example,
Yang et al. (2017a) evaluate various members of the uδ,γ family in a framework where
hierarchies are considered for computational reasons, while Oh (2017) optimizes recall by
fixing |Y | as a user-defined parameter. Popular in hierarchical classification is the tree-
distance loss, which could also be interpreted as a way of evaluating set-valued predictions
(Bi and Kwok, 2015). This loss is not a member of the family (42), however. Besides, it
appears to be a less interesting loss function from the perspective of abstention in cases
of uncertainty, since by minimizing the tree distance loss, the classifier will almost never
predict leaf nodes of the hierarchy. Instead, it will often predict nodes close to the root of
the hierarchy, which correspond to sets with many elements — a behavior that is undesirable
if one wants to abstain only in cases of sufficient uncertainty.
Quite obviously, methods that maximize set-based utility scores are closely connected to
the quantification of uncertainty, since the decision about a suitable set of predictions
is necessarily derived from information of that kind. The overwhelming majority of the
above-mentioned methods start from conditional class probabilities p(y |xq) that are
estimated in a classical frequentist way, so that uncertainties in decisions are of aleatoric
nature. Exceptions include (Yang et al., 2017a) and (Nguyen et al., 2018), who further
explore ideas from imprecise probability theory and reliable classification to generate label
sets that capture both aleatoric and epistemic uncertainty.
5 Discussion and conclusion
The importance of distinguishing between different types of uncertainty has recently been
recognized in machine learning. The goal of this paper was to sketch existing approaches
in this direction, with an emphasis on the quantification of aleatoric and epistemic un-
certainty about predictions in supervised learning. Looking at the problem from the per-
spective of the standard setting of supervised learning, it is natural to associate epistemic
uncertainty with the lack of knowledge about the true (or Bayes-optimal) hypothesis h∗
within the hypothesis space H, i.e., the uncertainty that is in principle reducible by the
learner and that could be removed by additional data. Aleatoric uncertainty, on the other hand,
is the irreducible part of the uncertainty in a prediction, which is due to the inherently
stochastic nature of the dependency between instances x and outcomes y.
In a Bayesian setting, epistemic uncertainty is hence reflected by the (posterior) probability
p(h | D) on H: The less peaked this distribution, the less informed the learner is, and the
higher its (epistemic) uncertainty. As we argued, however, important information about
this uncertainty might get lost through Bayesian model averaging. In this regard, we
also argued that a (graded) set-based representation of epistemic uncertainty could be
a viable alternative to a probabilistic representation, especially due to the difficulty of
representing ignorance with probability distributions. This idea is perhaps most explicit
in version space learning, where epistemic uncertainty is in direct correspondence with the
size of the version space. The approach by Senge et al. (2014) combines concepts from
(generalized) version space learning and Bayesian inference.
There are also other methods for modeling uncertainty, such as conformal prediction, for
which the role of aleatoric and epistemic uncertainty is not immediately clear. Currently,
the field develops very dynamically and is far from being settled. New proposals for
modeling and quantifying uncertainty appear on a regular basis, some of them rather ad
hoc and others better justified. Eventually, it would be desirable to “derive” a measure
of total uncertainty as well as its decomposition into aleatoric and epistemic parts from
basic principles in a mathematically rigorous way, i.e., to develop a kind of axiomatic basis
for such a decomposition, comparable to axiomatic justifications of the entropy measure
(Csiszar, 2008).
In addition to theoretical problems of this kind, there are also many open practical ques-
tions. This includes, for example, the question of how to perform an empirical evaluation
of methods for quantifying uncertainty, whether aleatoric, epistemic, or total. In fact, un-
like for the prediction of a target variable, the data normally does not contain information
about any sort of “ground truth” uncertainty. What is often done, therefore, is to eval-
uate predicted uncertainties indirectly, that is, by assessing their usefulness for improved
prediction and decision making. As an example, recall the idea of utility maximization
discussed in Section 4.9, where the decision is a set-valued prediction. Another example
is accuracy-rejection curves, which depict the accuracy of a predictor as a function of the
percentage of rejections (Huhn and Hullermeier, 2009): A classifier that is allowed to
abstain on a fraction p of predictions will predict on the fraction (1 − p) of cases on which
it feels most certain. Being able to quantify its own uncertainty well, it should improve
its accuracy with increasing p, hence the accuracy-rejection curve should be monotone
increasing (unlike a flat curve obtained for random abstention).
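Such a curve can be computed directly from per-example uncertainty scores and correctness indicators; a minimal sketch with our own toy data:

```python
# Accuracy-rejection curve: for each rejection rate p, evaluate accuracy on
# the (1 - p) fraction of test examples with the lowest predicted uncertainty.

def accuracy_rejection_curve(uncertainties, correct, rejection_rates):
    order = sorted(range(len(correct)), key=lambda i: uncertainties[i])
    curve = []
    for p in rejection_rates:
        keep = order[: max(1, round(len(order) * (1 - p)))]
        curve.append(sum(correct[i] for i in keep) / len(keep))
    return curve

u = [0.1, 0.2, 0.3, 0.9]       # predicted uncertainties per test example
c = [True, True, True, False]  # the most uncertain prediction is the wrong one
print(accuracy_rejection_curve(u, c, [0.0, 0.25, 0.5]))  # [0.75, 1.0, 1.0]
```

The curve is monotone increasing here, as expected for well-quantified uncertainty; a flat curve would indicate that the scores carry no information about correctness.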
Most approaches so far neglect model uncertainty, in the sense of proceeding from the
(perhaps implicit) assumption of a correctly specified hypothesis space H, that is, the
assumption that H contains a (probabilistic) predictor which is coherent with the data.
In (Senge et al., 2014), for example, this assumption is reflected by the definition of
plausibility in terms of normalized likelihood, which always ensures the existence of at
least one fully plausible hypothesis—indeed, (35) is a measure of relative, not absolute
plausibility. On the one hand, it is true that model induction, like statistical inference in
general, is not possible without underlying assumptions, and that conclusions drawn from
data are always conditional on these assumptions. On the other hand, misspecification of
the model class is a common problem in practice, and should therefore not be ignored.
In principle, conflict and inconsistency can be seen as another source of uncertainty, in
addition to randomness and a lack of knowledge. Therefore, it would be useful to reflect
this source of uncertainty as well (for example in the form of non-normalized plausibility
functions πH, in which the gap 1−suph πH(h) serves as a measure of inconsistency between
H and the data D, and hence as a measure of model uncertainty).
Related to this is the “closed world” assumption (Deng, 2014), which is often violated
in contemporary machine learning applications. Modeling uncertainty and allowing the
learner to express ignorance is obviously important in scenarios where new classes may
emerge in the course of time, which implies a change of the underlying data-generating
process, or the learner might be confronted with out-of-distribution queries. Some first
proposals for dealing with this case can already be found in the literature (DeVries and
Taylor, 2018; Lee et al., 2018b; Malinin and Gales, 2018; Sensoy et al., 2018).
Finally, there are many ways in which other machine learning methodology may benefit
from a proper quantification of uncertainty, and in which corresponding measures could
be used for “uncertainty-informed” decisions. For example, Nguyen et al. (2019) take
advantage of the distinction between epistemic and aleatoric uncertainty in active learning,
arguing that the former is more relevant as a selection criterion for uncertainty sampling
than the latter. Likewise, as already said, a suitable quantification of uncertainty can be
very useful in set-valued prediction, where the learner is allowed to predict a subset of
outcomes (and hence to partially abstain) in cases of uncertainty.
Acknowledgment
The authors would like to thank Sebastian Destercke, Karlson Pfannschmidt, and Ammar Shaker
for helpful remarks and comments on the content of this paper. We are also grateful for
the additional suggestions made by the reviewers. The second author received funding
from the Flemish Government under the “Onderzoeksprogramma Artificiele Intelligentie
(AI) Vlaanderen” Programme.
A Background on uncertainty modeling
The notion of uncertainty has been studied in various branches of science and scientific
disciplines. It has long played a major role in fields like economics, psychology,
and the social sciences, typically in the guise of applied statistics. Likewise, its
importance for artificial intelligence has been recognized very early on18, at the latest
with the emergence of expert systems, which came along with the need for handling
inconsistency, incompleteness, imprecision, and vagueness in knowledge representation
(Kruse et al., 1991). More recently, the phenomenon of uncertainty has also attracted
a lot of attention in engineering, where it is studied under the notion of “uncertainty
quantification” (Owhadi et al., 2012); interestingly, a distinction between aleatoric and
epistemic uncertainty, very much in line with our machine learning perspective, is also
18The “Annual Conference on Uncertainty in Artificial Intelligence” (UAI) was launched in the mid-1980s.
made there.
Figure 14: Various uncertainty calculi and common frameworks for uncertainty representation (Destercke et al., 2008), covering credal sets, coherent lower/upper probabilities, 2-monotone capacities, random sets, p-boxes, probability intervals, possibilities, clouds, and comonotonic clouds. Most of these formalisms are generalizations of standard probability theory (an arrow in the figure denotes an “is more general than” relationship).
The contemporary literature on uncertainty is rather broad (cf. Fig. 14). In the following,
we give a brief overview, specifically focusing on the distinction between set-based and
distributional (probabilistic) representations. Against the background of our discussion
about aleatoric and epistemic uncertainty, this distinction is arguably important. Roughly
speaking, while aleatoric uncertainty is appropriately modeled in terms of probability
distributions, one may argue that a set-based approach is more suitable for modeling
ignorance and a lack of knowledge, and hence more apt at capturing epistemic uncertainty.
A.1 Sets versus distributions
A generic way for describing situations of uncertainty is to proceed from an underlying
reference set Ω, sometimes called the frame of discernment (Shafer, 1976). This set consists
of all hypotheses, or pieces of precise information, that ought to be distinguished in the
current context. Thus, the elements ω ∈ Ω are exhaustive and mutually exclusive, and
one of them, ω∗, corresponds to the truth. For example, Ω = {H, T} in the case of
coin tossing, Ω = {win, loss, tie} in predicting the outcome of a football match, or Ω =
R × R+ in the estimation of the parameters (expected value and standard deviation) of
a normal distribution from data. For ease of exposition and to avoid measure-theoretic
complications, we will subsequently assume that Ω is a discrete (finite or countable) set.
As an aside, we note that the assumption of exhaustiveness of Ω could be relaxed. In
a classification problem in machine learning, for example, not all possible classes might
be known beforehand, or new classes may emerge in the course of time (Hendrycks and
Gimpel, 2017; Liang et al., 2018; DeVries and Taylor, 2018). In the literature, this is
often called the “open world assumption”, whereas an exhaustive Ω is considered as a
“closed world” (Deng, 2014). Although this distinction may look technical at first sight, it
has important consequences with regard to the representation and processing of uncertain
information, which specifically concern the role of the empty set. While the empty set is
logically excluded as a valid piece of information under the closed world assumption, it
may suggest that the true state ω∗ is outside Ω under the open world assumption.
There are two basic ways for expressing uncertain information about ω∗, namely, in terms
of subsets and in terms of distributions. A subset C ⊆ Ω corresponds to a constraint
suggesting that ω∗ ∈ C. Thus, information or knowledge19 expressed in this way distin-
guishes between values that are (at least provisionally) considered possible and those that
are definitely excluded. As suggested by common examples such as specifying incomplete
information about a numerical quantity in terms of an interval C = [l, u], a set-based
representation is appropriate for capturing uncertainty in the sense of imprecision.
Going beyond this rough dichotomy, a distribution assigns a weight p(ω) to each element
ω, which can generally be understood as a degree of belief. At first sight, this appears
to be a proper generalization of the set-based approach. Indeed, without any constraints
on the weights, each subset C can be characterized in terms of its indicator function IC
on Ω (which is a specific distribution assigning a weight of 1 to each ω ∈ C and 0 to all
ω ∉ C). However, for the specifically important case of probability distributions, this view
is actually not valid.
First, probability distributions need to obey a normalization constraint. In particular,
a probability distribution requires the weights to be nonnegative and integrate to 1. A
corresponding probability measure on Ω is a set-function P : 2Ω −→ [0, 1] such that
P (∅) = 0, P (Ω) = 1, and
P (A ∪B) = P (A) + P (B) (43)
for all disjoint sets (events) A, B ⊆ Ω. With p(ω) = P({ω}) for all ω ∈ Ω it follows that
P(A) = ∑_{ω∈A} p(ω), and hence ∑_{ω∈Ω} p(ω) = 1. Since the set-based approach does not
(need to) satisfy this constraint, it is no longer a special case.
Second, in addition to the question of how information is represented, it is of course im-
portant to ask how the information is processed. In this regard, the probabilistic calculus
differs fundamentally from constraint-based (set-based) information processing. The char-
acteristic property of probability is its additivity (43), suggesting that the belief in the
disjunction (union) of two (disjoint) events A and B is the sum of the belief in either
of them. In contrast to this, the set-based approach is more in line with a logical in-
terpretation and calculus. Interpreting a constraint C as a logical proposition (ω ∈ C),
an event A ⊆ Ω is possible as soon as A ∩ C 6= ∅ and impossible otherwise. Thus, the
information (ω ∈ C) can be associated with a set-function Π : 2Ω −→ 0, 1 such that
Π(A) = JA ∩ C 6= ∅K. Obviously, this set-function satisfies Π(∅) = 0, Π(Ω) = 1, and
Π(A ∪B) = max(Π(A),Π(B)
)(44)
19We do not distinguish between the notions of information and knowledge in this paper.
Thus, Π is “maxitive” instead of additive (Shilkret, 1971; Dubois, 2006). Roughly speak-
ing, an event A is evaluated according to its (logical) consistency with a constraint C,
whereas in probability theory, an event A is evaluated in terms of its probability of oc-
currence. The latter is reflected by the probability mass assigned to A, and requires
a comparison of this mass with the mass of other events (since only one outcome ω is
possible, the elementary events compete with each other). Consequently, the calculus of
probability, including rules for combination of information, conditioning, etc., is quite dif-
ferent from the corresponding calculus of constraint-based information processing (Dubois,
2006).
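The contrast between (43) and (44) can be made explicit on a small frame; the distribution and constraint below are our own toy choices:

```python
# Additive vs. maxitive set-functions on Omega = {a, b, c}: a probability
# distribution p and the possibility measure induced by a constraint C.

p = {"a": 0.2, "b": 0.3, "c": 0.5}
C = {"a", "b"}                        # the constraint "omega* is in C"

P = lambda A: sum(p[w] for w in A)    # probability: additive, Eq. (43)
Pi = lambda A: 1.0 if A & C else 0.0  # consistency with C: maxitive, Eq. (44)

A, B = {"a"}, {"b"}                   # disjoint events
assert P(A | B) == P(A) + P(B)        # beliefs add up
assert Pi(A | B) == max(Pi(A), Pi(B)) # beliefs do not add up
print(P({"c"}), Pi({"c"}))            # 0.5 0.0
```

The event {c} carries half of the probability mass, yet is deemed impossible by the constraint: the two calculi evaluate the same event in fundamentally different ways.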
A.2 Representation of ignorance
From the discussion so far, it is clear that a probability distribution is essentially model-
ing the phenomenon of chance rather than imprecision. One may wonder, therefore, to
what extent it is suitable for representing epistemic uncertainty in the sense of a lack of
knowledge.
In the set-based approach, there is an obvious correspondence between the degree of
uncertainty or imprecision and the cardinality of the set C: the larger |C|, the larger the
lack of knowledge20. Consequently, knowledge gets weaker by adding additional elements
to C. In probability, the total amount of belief is fixed in terms of a unit mass that
is distributed among the elements ω ∈ Ω, and increasing the weight of one element ω
requires decreasing the weight of another element ω′ by the same amount. Therefore, the
knowledge expressed by a probability measure cannot be “weakened” in a straightforward
way.
Of course, there are also measures of uncertainty for probability distributions, most notably
the (Shannon) entropy
H(p) := − ∑_{ω∈Ω} p(ω) log p(ω) .
However, these are primarily capturing the shape of the distribution, namely its “peaked-
ness” or non-uniformity (Dubois and Hullermeier, 2007), and hence inform about the
predictability of the outcome of a random experiment. Seen from this point of view, they
are more akin to aleatoric uncertainty, whereas the set-based approach is arguably better
suited for capturing epistemic uncertainty.
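A two-line computation illustrates the point:

```python
# Shannon entropy measures the peakedness of a distribution, i.e., the
# predictability of the outcome; it cannot distinguish a fair coin from
# complete ignorance, since both are represented by the uniform distribution.

import math

def entropy(p):
    return sum(-q * math.log2(q) for q in p if q > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: fair coin and "unknown coin" alike
print(entropy([1.0, 0.0]))  # 0.0: a perfectly predictable outcome
```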
For these reasons, it has been argued that probability distributions are less suitable for rep-
resenting ignorance in the sense of a lack of knowledge (Dubois et al., 1996). For example,
the case of complete ignorance is typically modeled in terms of the uniform distribution
p ≡ 1/|Ω| in probability theory; this is justified by the “principle of indifference” invoked
by Laplace, or by referring to the principle of maximum entropy21. Then, however, it is
not possible to distinguish between (i) precise (probabilistic) knowledge about a random
20In information theory, a common uncertainty measure is log(|C|).
21Obviously, there is a technical problem in defining the uniform distribution in the case where Ω is not
finite.
event, such as tossing a fair coin, and (ii) a complete lack of knowledge, for example due to
an incomplete description of the experiment. This was already pointed out by the famous
Ronald Fisher, who noted that “not knowing the chance of mutually exclusive events and
knowing the chance to be equal are two quite different states of knowledge”.
Another problem in this regard is caused by the measure-theoretic grounding of probability
and its additive nature. For example, the uniform distribution is not invariant under
reparametrization (a uniform distribution on a parameter ω does not translate into a
uniform distribution on 1/ω, although ignorance about ω implies ignorance about 1/ω).
For example, expressing ignorance about the side length x of a cube in terms of a uniform
distribution on an interval [l, u] does not yield a uniform distribution of x3 on [l3, u3],
thereby suggesting some degree of informedness about its volume. Problems of this kind
render the use of a uniform prior distribution, often interpreted as representing epistemic
uncertainty in Bayesian inference, at least debatable22.
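The cube example admits an exact check: if the side length x is uniform on [1, 2], the induced distribution of the volume x³ puts well over half of its mass below the midpoint of [1, 8]:

```python
# Non-invariance of the uniform distribution under reparametrization:
# x ~ U[1, 2] does not make x**3 uniform on [1, 8].

l, u = 1.0, 2.0
mid_volume = (l**3 + u**3) / 2.0                 # midpoint 4.5 of [1, 8]
# P(x**3 <= 4.5) = P(x <= 4.5**(1/3)) under x ~ U[1, 2]:
prob = (mid_volume ** (1.0 / 3.0) - l) / (u - l)
print(round(prob, 3))                            # 0.651, not 0.5
```

A uniform distribution on the volume would assign probability 0.5 to this event, so “ignorance” about x translates into apparent information about x³.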
A.3 Sets of distributions
Given the complementary nature of sets and distributions, and the observation that both
have advantages and disadvantages, one may wonder whether the two could not be com-
bined in a meaningful way. Indeed, the argument that a single (probability) distribution
is not enough for representing uncertain knowledge is quite prominent in the literature,
and many generalized theories of uncertainty can be considered as a combination of that
kind (Dubois and Prade, 1988; Walley, 1991; Shafer, 1976; Smets and Kennes, 1994).
Since we are specifically interested in aleatoric and epistemic uncertainty, and since these
two types of uncertainty are reasonably captured in terms of sets and probability dis-
tributions, respectively, a natural idea is to consider sets of probability distributions. In
the literature on imprecise probability, these are also called credal sets (Cozman, 2000;
Zaffalon, 2002b). An illustration is given in Fig. 15, where probability distributions on
Ω = {a, b, c} are represented as points in a barycentric coordinate system. A credal set
then corresponds to a subset of such points, suggesting a lack of knowledge about the true
distribution but restricting it in terms of a set of possible candidates.
Credal sets are typically assumed to be convex subsets of the class P of all probability
distributions on Ω. Such sets can be specified in different ways, for example in terms of
upper and lower bounds on the probabilities P (A) of events A ⊆ Ω. A specifically simple
approach (albeit of limited expressivity) is the use of so-called possibility distributions
and related possibility measures (Dubois and Prade, 1988). A possibility distribution is a
mapping π : Ω −→ [0, 1], and the associated measure is given by
Π : 2^Ω −→ [0, 1], A ↦ sup_{ω∈A} π(ω) . (45)
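On a finite frame, the set of probability distributions bounded from above by Π can be checked by brute force: a distribution p is admitted iff P(A) ≤ Π(A) for every event A. A sketch with our own toy possibility distribution:

```python
# Possibility measure (45) on Omega = {a, b, c} and the credal set it induces
# as an upper bound: p is admitted iff P(A) <= Pi(A) for all events A.

from itertools import chain, combinations

omega = ("a", "b", "c")
pi = {"a": 1.0, "b": 0.7, "c": 0.3}  # a possibility distribution on Omega

def Pi(A):
    """Eq. (45): the possibility of an event A is the supremum of pi over A."""
    return max((pi[w] for w in A), default=0.0)

def dominated(p):
    """Is the probability distribution p inside the set bounded by Pi?"""
    events = chain.from_iterable(combinations(omega, r) for r in range(1, 4))
    return all(sum(p[w] for w in A) <= Pi(A) for A in events)

print(dominated({"a": 0.6, "b": 0.3, "c": 0.1}))  # True
print(dominated({"a": 0.2, "b": 0.3, "c": 0.5}))  # False: P({c}) > Pi({c})
```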
A measure of that kind can be interpreted as an upper bound, and thus defines a set P of
22 This problem is inherited by hierarchical Bayesian modeling. See, however, work on "non-informative" priors (Jeffreys, 1946; Bernardo, 1979).
Figure 15: Probability distributions on Ω = {a, b, c} as points in a Barycentric coordinate system: precise knowledge (left) versus incomplete knowledge (middle) and complete ignorance (right) about the true distribution.
dominated probability distributions:
P := {P ∈ P | P(A) ≤ Π(A) for all A ⊆ Ω} . (46)
Formally, a possibility measure on Ω satisfies Π(∅) = 0, Π(Ω) = 1, and Π(A ∪ B) =
max(Π(A),Π(B)) for all A,B ⊆ Ω. Thus, it generalizes the maxitivity (44) of sets in
the sense that Π is not (necessarily) an indicator function, i.e., Π(A) is in [0, 1] and not
restricted to {0, 1}. A related necessity measure is defined as N(A) = 1 − Π(A^c), where
A^c = Ω \ A. Thus, an event A is plausible insofar as the complement of A is not necessary.
Or, stated differently, an event A necessarily occurs if the complement of A is not possible.
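As a small illustration (a sketch with made-up numbers, not from the paper), the possibility measure (45) and its dual necessity can be computed directly from a possibility distribution π:

```python
# Illustrative sketch: a possibility distribution pi on a finite Omega,
# the induced possibility measure Pi(A) = max_{w in A} pi(w), and the
# dual necessity N(A) = 1 - Pi(complement of A).

OMEGA = {"a", "b", "c"}
pi = {"a": 1.0, "b": 0.6, "c": 0.2}  # normalized: max_w pi(w) = 1

def possibility(A):
    return max((pi[w] for w in A), default=0.0)

def necessity(A):
    return 1.0 - possibility(OMEGA - A)

# Maxitivity: Pi(A u B) = max(Pi(A), Pi(B)).
assert possibility({"b"} | {"c"}) == max(possibility({"b"}), possibility({"c"}))

# "a" is fully plausible, yet only weakly necessary:
print(possibility({"a"}), necessity({"a"}))  # 1.0 0.4
```

Setting pi to the constant 1 would yield Π(A) = 1 and N(A) = 0 for every nonempty A ≠ Ω, i.e., the state of complete ignorance discussed below.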
In a sense, possibility theory combines aspects of both set-based and distributional ap-
proaches. In fact, a possibility distribution can be seen as both a generalized set (in which
elements can have graded degrees of membership) and a non-additive measure. Just like a
probability, it allows for expressing graded degrees of belief (instead of merely distinguish-
ing possible from impossible events), but its calculus is maxitive instead of additive23.
The dual pair of measures (N,Π) allows for expressing ignorance in a proper way, mainly
because A can be declared plausible without declaring its complement A^c implausible. In particular,
Π(A) ≡ 1 on 2^Ω \ {∅} models complete ignorance: Everything is fully plausible, and hence
nothing is necessary (N(A) = 1 − Π(A^c) = 0 for all A ≠ Ω). A probability measure, on the
other hand, is self-dual in the sense that P(A) = 1 − P(A^c). Thus, a probability measure
is playing both roles simultaneously, namely the role of the possibility and the role of the
necessity measure. Therefore, it is more constrained than a representation (N,Π). In a
sense, probability and possibility distributions can be seen as two extremes on the scale
of uncertainty representations24.
23 For this reason, possibility measures can also be defined on non-numerical, purely ordinal structures.
24 Strictly speaking, possibilities are not more expressive than probabilities, since possibility distributions cannot model probability distributions (except degenerate ones): Π ≠ N unless Π(ω∗) = 1 for some ω∗ ∈ Ω and Π(ω) = 0 otherwise.
Figure 16: Possibility distribution as a contour function of a basic belief assignment m (values assigned to focal sets on the right). In this example, π(ω5) = 1, because ω5 is contained in all focal sets, whereas π(ω3) = 0.6.
A.4 Distributions of sets
Sets and distributions can also be combined the other way around, namely in terms of
distributions of sets. Formalisms based on this idea include the calculus of random sets
(Matheron, 1975; Nguyen, 1978) as well as the Dempster-Shafer theory of evidence (Shafer,
1976).
In evidence theory, uncertain information is again expressed in terms of a dual pair of
measures on Ω, a measure of plausibility and a measure of belief. Both can be derived
from an underlying mass function or basic belief assignment m : 2^Ω −→ [0, 1], for which
m(∅) = 0 and Σ_{B⊆Ω} m(B) = 1. Obviously, m defines a probability distribution on the
subsets of Ω. Thus, instead of a single set or constraint C, like in the set-based approach,
we are now dealing with a set of such constraints, each one being assigned a weight m(C).
Each C ⊆ Ω such that m(C) > 0 is called a focal element and represents a single piece of
evidence (in favor of ω∗ ∈ C). Assigning masses to subsets C instead of single elements ω
allows for combining randomness and imprecision.
A plausibility and belief function are derived from a mass function m as follows:
Pl(A) := Σ_{B : B∩A≠∅} m(B) ,   Bel(A) := Σ_{B : B⊆A} m(B) .
Plausibility (or belief) functions indeed generalize both probability and possibility distri-
butions. A probability distribution is obtained in the case where all focal elements are
singleton sets. A possibility measure is obtained as a special case of a plausibility measure
(and, correspondingly, a necessity measure as a special case of a belief measure) for a
mass function the focal sets of which are nested, i.e., such that C1 ⊂ C2 ⊂ · · · ⊂ Cr. The
corresponding possibility distribution is the contour function of the plausibility measure:
π(ω) = Pl({ω}) := Σ_{C : ω∈C} m(C) for all ω ∈ Ω. Thus, π(ω) can be interpreted as the
probability that ω is contained in a subset C that is randomly chosen according to the
distribution m (see Fig. 16 for an illustration).
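The following sketch makes these definitions concrete for a small mass function with nested focal sets; the masses are chosen so that the contour values match those reported for Fig. 16, but the focal sets themselves are illustrative guesses, not taken from the figure.

```python
# Illustrative sketch: a mass function m with nested focal sets, the
# derived plausibility Pl and belief Bel, and the contour function
# pi(w) = Pl({w}) = sum of masses of all focal sets containing w.

FOCAL = [  # (focal set C, mass m(C)); sets are nested: C1 c C2 c ... c Cr
    (frozenset({"w5"}), 0.3),
    (frozenset({"w4", "w5"}), 0.1),
    (frozenset({"w3", "w4", "w5"}), 0.2),
    (frozenset({"w2", "w3", "w4", "w5"}), 0.1),
    (frozenset({"w1", "w2", "w3", "w4", "w5", "w6"}), 0.3),
]

def pl(A):
    return sum(m for C, m in FOCAL if C & A)   # C intersects A

def bel(A):
    return sum(m for C, m in FOCAL if C <= A)  # C contained in A

def contour(w):
    return sum(m for C, m in FOCAL if w in C)

print(round(contour("w5"), 10))  # 1.0 (w5 lies in every focal set)
print(round(contour("w3"), 10))  # 0.6
# With nested focal sets, Pl is maxitive, i.e., a possibility measure:
assert abs(pl({"w2", "w3"}) - max(pl({"w2"}), pl({"w3"}))) < 1e-12
```

With singleton focal sets instead of nested ones, Pl = Bel would collapse to an ordinary probability measure, as stated above.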
Note that we have obtained a possibility distribution in two ways and for two different
interpretations, representing a set of distributions as well as a distribution of sets. One
concrete way to define a possibility distribution π in a data-driven way, which is specifically
interesting in the context of statistical inference, is in terms of the normalized or relative
likelihood. Consider the case where ω∗ is the parameter of a probability distribution, and
we are interested in estimating this parameter based on observed data D; in other words,
we are interested in identifying the distribution within the family {Pω | ω ∈ Ω} from which
the data was generated. The likelihood function is then given by L(ω; D) := Pω(D), and
the normalized likelihood as
Ln(ω; D) := L(ω; D) / sup_{ω′∈Ω} L(ω′; D) .
This function can be taken as the contour function of a (consonant) plausibility function
π, i.e., π(ω) = Ln(ω;D) for all ω ∈ Ω; the focal sets then simply correspond to the confi-
dence intervals that can be extracted from the likelihood function, which are of the form
Cα = {ω | Ln(ω; D) ≥ α}. This is an interesting illustration of the idea of a distribution of
sets: A confidence interval can be seen as a constraint, suggesting that the true parameter
is located inside that interval. However, a single (deterministic) constraint is not mean-
ingful, since there is a tradeoff between the correctness and the precision of the constraint.
Working with a set of constraints—or, say, a flexible constraint—is a viable alternative.
The normalized likelihood was originally introduced by Shafer (1976), and has been jus-
tified axiomatically in the context of statistical inference by Wasserman (1990). Further
arguments in favor of using the relative likelihood as the contour function of a (consonant)
plausibility function are provided by Denoeux (2014), who shows that it can be derived
from three basic principles: the likelihood principle, compatibility with Bayes’ rule, and
the so-called minimal commitment principle. See also (Dubois et al., 1997) and (Cattaneo,
2005) for a discussion of the normalized likelihood in the context of possibility theory.
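As a concrete instance (an illustrative sketch, not from the paper), the normalized likelihood of a Bernoulli parameter θ after observing k successes in n trials can be computed on a grid, and the nested sets Cα extracted from it; the grid resolution and the numbers k, n are assumptions for illustration.

```python
# Illustrative sketch: normalized (relative) likelihood of a Bernoulli
# parameter theta after observing k successes in n trials, used as a
# contour function pi(theta); C_alpha = {theta | pi(theta) >= alpha}.

def normalized_likelihood(k, n, grid):
    lik = {t: t ** k * (1 - t) ** (n - k) for t in grid}
    sup = max(lik.values())
    return {t: v / sup for t, v in lik.items()}

grid = [i / 100 for i in range(1, 100)]  # theta in {0.01, ..., 0.99}
pi = normalized_likelihood(k=7, n=10, grid=grid)

# pi peaks at the maximum-likelihood estimate theta = 0.7 ...
assert pi[0.7] == 1.0
# ... and the focal sets C_alpha are nested "likelihood intervals":
C = [t for t in grid if pi[t] >= 0.15]
print(min(C), max(C))  # an interval of plausible parameters around 0.7
```

Raising α shrinks the interval (more precise but less likely to be correct), lowering it widens the interval; this is exactly the correctness-precision tradeoff mentioned above.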
B Max-min versus sum-product aggregation
Recall the definition of plausibility degrees π(y |xq) as introduced in Section 4.7. The
computation of π(+1 |xq) according to (36) is illustrated in Fig. 17, where the hypothesis
space H is shown schematically as one of the axes. In comparison to Bayesian inference
(14), two important differences are notable:
• First, evidence of hypotheses is represented in terms of normalized likelihood πH(h)
instead of posterior probabilities p(h | D), and support for a class y in terms of
π(y |h,xq) instead of probabilities h(xq) = p(y |xq).
• Second, the “sum-product aggregation” in Bayesian inference is replaced by a “max-
min aggregation”.
Figure 17: The plausibility π(+1 | xq) of the positive class is given by the maximum (dashed line) over the pointwise minima of the plausibility of hypotheses h (blue line) and the corresponding plausibility of the positive class given h (green line).
More formally, the meaning of sum-product aggregation is that (14) corresponds to the
computation of the standard (Lebesgue) integral of the class probability p(y |xq) with
respect to the (posterior) probability distribution p(h | D). Here, instead, the definition
of π(y |xq) corresponds to the Sugeno integral (Sugeno, 1974) of the support π(y |h, xq)
with respect to the possibility measure ΠH induced by the distribution (35) on H:
π(y |xq) = S∫_H π(y |h, xq) ΠH (47)
In general, given a measurable space (X, A) and an A-measurable function f : X −→ [0, 1],
the Sugeno integral of f with respect to a monotone measure g (i.e., a measure on A such
that g(∅) = 0, g(X) = 1, and g(A) ≤ g(B) for A ⊆ B) is defined as
S∫_X f(x) g := sup_{A∈A} [ min( min_{x∈A} f(x), g(A) ) ] = sup_{α∈[0,1]} [ min( α, g(Fα) ) ] , (48)
where Fα := {x | f(x) ≥ α}.
In comparison to sum-product aggregation, max-min aggregation avoids the loss of information due to averaging and is more in line with the "existential" aggregation in version
space learning. In fact, it can be seen as a graded generalization of (12). Note that max-
min inference requires the two measures πH(h) and π(+1 |h,xq) to be commensurable.
This is why the normalization of the likelihood according to (35) is important.
Compared to MAP inference (15), max-min inference takes more information into account.
Indeed, MAP inference only looks at the probability of hypotheses but ignores the proba-
bilities assigned to the classes. In contrast, a class can be considered plausible according to
(36) even if not being strongly supported by the most likely hypothesis hml—this merely
requires sufficient support by another hypothesis h, which is not much less likely than hml.
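The difference between the two inference modes can be seen in a toy computation (an illustrative sketch with made-up numbers: πH plays the role of the normalized likelihood of each hypothesis, and the class supports stand in for π(+1 | h, xq)):

```python
# Illustrative sketch: max-min aggregation over a small finite hypothesis
# space, contrasted with MAP inference, which consults only the most
# likely hypothesis. All numbers are made up.

pi_H    = {"h1": 1.0, "h2": 0.8, "h3": 0.3}  # plausibility of hypotheses
support = {"h1": 0.2, "h2": 0.9, "h3": 0.6}  # support of class +1 given h

# Max-min (Sugeno-style) aggregation:
plaus_pos = max(min(pi_H[h], support[h]) for h in pi_H)
print(plaus_pos)  # 0.8: the class is plausible thanks to h2, which is
                  # only slightly less likely than h1

# MAP inference ignores everything except the most likely hypothesis:
h_map = max(pi_H, key=pi_H.get)
print(support[h_map])  # 0.2
```

Here max-min inference declares the positive class plausible (0.8) even though the most likely hypothesis h1 barely supports it, because the only slightly less likely h2 does.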
References
Abellan J, Moral S (2000) A non-specificity measure for convex sets of probability distri-
butions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
8:357–367
Abellan J, Klir J, Moral S (2006) Disaggregated total uncertainty measure for credal sets.
International Journal of General Systems 35(1)
Aggarwal C, Kong X, Gu Q, Han J, Yu P (2014) Active learning: A survey. In: Data
Classification: Algorithms and Applications, pp. 571–606
Antonucci A, Corani G, Gabaglio S (2012) Active learning by the naive credal classifier.
In: Proceedings of the Sixth European Workshop on Probabilistic Graphical Models
(PGM), pp. 3–10
Balasubramanian V, Ho S, Vovk V (eds) (2014) Conformal Prediction for Reliable Machine
Learning: Theory, Adaptations and Applications. Morgan Kaufmann
Barber R, Candes E, Ramdas A, Tibshirani R (2020) The limits of distribution-free con-
ditional predictive inference. CoRR abs/1903.04684v2, URL http://arxiv.org/abs/
1903.04684v2
Bazargami M, Mac-Namee B (2019) The elliptical basis function data descriptor network:
A one-class classification approach for anomaly detection. In: European Conference on
Machine Learning and Knowledge Discovery in Databases
Bernardo J (1979) Reference posterior distributions for Bayesian inference. Journal of the
Royal Statistical Society, Series B (Methodological) 41(2):113–147
Bernardo J (2005) An introduction to the imprecise Dirichlet model for multinomial data.
International Journal of Approximate Reasoning 39(2–3):123–150
Bi W, Kwok J (2015) Bayes-optimal hierarchical multilabel classification. IEEE Transac-
tions on Knowledge and Data Engineering 27:1–1, DOI 10.1109/TKDE.2015.2441707
Blum M, Riedmiller M (2013) Optimization of Gaussian process hyperparameters using
Rprop. In: Proc. ESANN, 21st European Symposium on Artificial Neural Networks,
Bruges, Belgium
Breiman L (2001) Random forests. Machine Learning 45(1):5–32
Cattaneo M (2005) Likelihood-based statistical decisions. In: Proc. 4th Int. Symposium
on Imprecise Probabilities and their Applications, pp. 107–116
Chow C (1970) On optimum recognition error and reject tradeoff. IEEE Transactions on
Information Theory IT-16:41–46
Corani G, Zaffalon M (2008a) Learning reliable classifiers from small or incomplete data
sets: The naive credal classifier 2. Journal of Machine Learning Research 9:581–621
Corani G, Zaffalon M (2008b) Learning reliable classifiers from small or incomplete data
sets: The naive credal classifier 2. Journal of Machine Learning Research 9:581–621
Corani G, Zaffalon M (2009) Lazy naive credal classifier. In: Proceedings of the 1st ACM
SIGKDD Workshop on Knowledge Discovery from Uncertain Data, ACM, New York,
NY, USA, U ’09, pp. 30–37, DOI 10.1145/1610555.1610560
Cozman F (2000) Credal networks. Artificial Intelligence 120(2):199–233
Csiszar I (2008) Axiomatic characterizations of information measures. Entropy 10:261–273
Del Coz J, Díez J, Bahamonde A (2009) Learning nondeterministic classifiers. The Journal
of Machine Learning Research 10:2273–2293
Deng Y (2014) Generalized evidence theory. CoRR abs/1404.4801v1, URL http://arxiv.
org/abs/1404.4801
Denker J, LeCun Y (1991) Transforming neural-net output levels to probability distribu-
tions. In: Proc. NIPS, Advances in Neural Information Processing Systems
Denoeux T (2014) Likelihood-based belief function: Justification and some extensions to
low-quality data. International Journal of Approximate Reasoning 55(7):1535–1547
Depeweg S, Hernandez-Lobato J, Doshi-Velez F, Udluft S (2018) Decomposition of un-
certainty in Bayesian deep learning for efficient and risk-sensitive learning. In: Proc.
ICML, 35th International Conference on Machine Learning, Stockholm, Sweden
Der Kiureghian A, Ditlevsen O (2009) Aleatory or epistemic? Does it matter? Structural
Safety 31:105–112
Destercke S, Dubois D, Chojnacki E (2008) Unifying practical uncertainty representations:
I. Generalized p-boxes. International Journal of Approximate Reasoning 49:649–663
DeVries T, Taylor G (2018) Learning confidence for out-of-distribution detection in neural
networks. CoRR abs/1802.04865, URL http://arxiv.org/abs/1802.04865
Dubois D (2006) Possibility theory and statistical reasoning. Computational Statistics and
Data Analysis 51(1):47–69
Dubois D, Hullermeier E (2007) Comparing probability measures using possibility the-
ory: A notion of relative peakedness. International Journal of Approximate Reasoning
45(2):364–385
Dubois D, Prade H (1988) Possibility Theory. Plenum Press
Dubois D, Prade H, Smets P (1996) Representing partial ignorance. IEEE Transactions
on Systems, Man and Cybernetics, Series A 26(3):361–377
Dubois D, Moral S, Prade H (1997) A semantics for possibility theory based on likelihoods.
Journal of Mathematical Analysis and Applications 205(2):359–380
Endres D, Schindelin J (2003) A new metric for probability distributions. IEEE Transac-
tions on Information Theory 49(7):1858–1860
Flach P (2017) Classifier calibration. In: Encyclopedia of Machine Learning and Data
Mining, Springer, pp. 210–217
Freitas AA (2007) A tutorial on hierarchical classification with applications in bioinformatics. In: Research and Trends in Data Mining Technologies and Applications, pp. 175–208
Frieden B (2004) Science from Fisher Information: A Unification. Cambridge University
Press
Gal Y, Ghahramani Z (2016) Bayesian convolutional neural networks with Bernoulli ap-
proximate variational inference. In: Proc. of the ICLR Workshop Track
Gama J (2012) A survey on learning from data streams: current and future trends.
Progress in Artificial Intelligence 1(1):45–55
Gammerman A, Vovk V (2002) Prediction algorithms and confidence measures based on
algorithmic randomness theory. Theoretical Computer Science 287:209–217
Gneiting T, Raftery A (2005) Strictly proper scoring rules, prediction, and estimation.
Tech. Rep. 463R, Department of Statistics, University of Washington
Goodfellow I, Bengio Y, Courville A (2016) Deep Learning. The MIT Press, Cambridge,
Massachusetts, London, England
Graves A (2011) Practical variational inference for neural networks. In: Proc. NIPS, Ad-
vances in Neural Information Processing Systems, pp. 2348–2356
Hartley R (1928) Transmission of information. Bell Syst Tech Journal 7(3):535–563
Hechtlinger Y, Poczos B, Wasserman L (2019) Cautious deep learning. CoRR
abs/1805.09460.v2, URL http://arxiv.org/abs/1805.09460
Hellman M (1970) The nearest neighbor classification rule with a reject option. IEEE
Transactions on Systems, Man and Cybernetics SMC-6:179–185
Hendrycks D, Gimpel K (2017) A baseline for detecting misclassified and out-of-
distribution examples in neural networks. In: Proc. ICLR, Int. Conference on Learning
Representations
Herbei R, Wegkamp M (2006) Classification with reject option. Canadian Journal of Statis-
tics 34(4):709–721
Hora S (1996) Aleatory and epistemic uncertainty in probability elicitation with an ex-
ample from hazardous waste management. Reliability Engineering and System Safety
54(2–3):217–223
Huhn J, Hullermeier E (2009) FR3: A fuzzy rule learner for inducing reliable classifiers.
IEEE Transactions on Fuzzy Systems 17(1):138–149
Hullermeier E, Brinker K (2008) Learning valued preference structures for solving classi-
fication problems. Fuzzy Sets and Systems 159(18):2337–2352
Jeffreys H (1946) An invariant form for the prior probability in estimation problems.
Proceedings of the Royal Society A 186:453–461
Johansson U, Lofstrom T, Sundell H, Linusson H, Gidenstam A, Bostrom H (2018) Venn
predictors for well-calibrated probability estimation trees. In: Proc. COPA, 7th Sym-
posium on Conformal and Probabilistic Prediction and Applications, Maastricht, The
Netherlands, pp. 3–14
Jordan M, Ghahramani Z, Jaakkola T, Saul L (1999) An introduction to variational meth-
ods for graphical models. Machine Learning 37(2):183–233
MacKay DJC (1992) A practical Bayesian framework for backpropagation networks. Neural
Computation 4(3):448–472
Kendall A, Gal Y (2017) What uncertainties do we need in Bayesian deep learning for
computer vision? In: Proc. NIPS, Advances in Neural Information Processing Systems,
pp. 5574–5584
Khan SS, Madden MG (2014) One-class classification: taxonomy of study and review of
techniques. The Knowledge Engineering Review 29(3):345–374, URL http://dx.doi.
org/10.1017/S026988891300043X
Klement E, Mesiar R, Pap E (2002) Triangular Norms. Kluwer Academic Publishers
Klir G (1994) Measures of uncertainty in the Dempster-Shafer theory of evidence. In:
Yager R, Fedrizzi M, Kacprzyk J (eds) Advances in the Dempster-Shafer theory of
evidence, Wiley, New York, pp. 35–49
Klir G, Mariano M (1987) On the uniqueness of possibilistic measure of uncertainty and
information. Fuzzy Sets and Systems 24(2):197–219
Kolmogorov A (1965) Three approaches to the quantitative definition of information.
Problems Inform Trans 1(1):1–7
Kruppa J, Liu Y, Biau G, Kohler M, Konig I, Malley J, Ziegler A (2014) Probability
estimation with machine learning methods for dichotomous and multi-category outcome:
Theory. Biometrical Journal 56(4):534–563
Kruse R, Schwecke E, Heinsohn J (1991) Uncertainty and Vagueness in Knowledge Based
Systems. Springer-Verlag
Kull M, Flach P (2014) Reliability maps: A tool to enhance probability estimates and
improve classification accuracy. In: Proc. ECML/PKDD, European Conference on Ma-
chine Learning and Principles and Practice of Knowledge Discovery in Databases, Nancy,
France, pp. 18–33
Kull M, de Menezes T, Filho S, Flach P (2017) Beta calibration: a well-founded and
easily implemented improvement on logistic calibration for binary classifiers. In: Proc.
AISTATS, 20th International Conference on Artificial Intelligence and Statistics, Fort
Lauderdale, FL, USA, pp. 623–631
Lakshminarayanan B, Pritzel A, Blundell C (2017) Simple and scalable predictive uncer-
tainty estimation using deep ensembles. In: Proc. NeurIPS, 31st Conference on Neural
Information Processing Systems, Long Beach, California, USA
Lambrou A, Papadopoulos H, Gammerman A (2011) Reliable confidence measures for
medical diagnosis with evolutionary algorithms. IEEE Trans on Information Technology
in Biomedicine 15(1):93–99
Lassiter D (2020) Representing credal imprecision: from sets of measures to hierarchical
Bayesian models. Philosophical Studies Forthcoming
Lee J, Bahri Y, Novak R, Schoenholz S, Pennington J, Sohl-Dickstein J (2018a) Deep
neural networks as Gaussian processes. In: Proc. ICLR, Int. Conference on Learning
Representations
Lee K, Lee K, Lee H, Shin J (2018b) A simple unified framework for detecting out-of-
distribution samples and adversarial attacks. CoRR abs/1807.03888.v2, URL http:
//arxiv.org/abs/1807.03888
Liang S, Li Y, Srikant R (2018) Enhancing the reliability of out-of-distribution image
detection in neural networks. In: Proc. ICLR, Int. Conference on Learning Representa-
tions
Linusson H, Johansson U, Bostrom H, Lofstrom T (2016) Reliable confidence predictions
using conformal prediction. In: Proc. PAKDD, 20th Pacific-Asia Conference on Knowl-
edge Discovery and Data Mining, Auckland, New Zealand
Linusson H, Johansson U, Bostrom H, Lofstrom T (2018) Classification with reject op-
tion using conformal prediction. In: Proc. PAKDD, 22nd Pacific-Asia Conference on
Knowledge Discovery and Data Mining, Melbourne, VIC, Australia
Liu FT, Ting KM, Zhou ZH (2009) Isolation forest. In: Proc. ICDM 2008, Eighth IEEE
International Conference on Data Mining, IEEE Computer Society, pp. 413–422
Mauá DD, Cozman F, Conaty D, de Campos CP (2017) Credal sum-product networks. In:
PMLR: Proceedings of Machine Learning Research (ISIPTA 2017), vol 62, pp. 205–216
Malinin A, Gales M (2018) Predictive uncertainty estimation via prior networks. In:
Proc. NeurIPS, 32nd Conference on Neural Information Processing Systems, Montreal,
Canada
Matheron G (1975) Random Sets and Integral Geometry. John Wiley and Sons
Mitchell T (1977) Version spaces: A candidate elimination approach to rule learning. In:
Proceedings IJCAI-77, pp. 305–310
Mitchell T (1980) The need for biases in learning generalizations. Tech. Rep. TR CBM–
TR–117, Rutgers University
Mobiny A, Nguyen H, Moulik S, Garg N, Wu C (2017) DropConnect is effective in modeling
uncertainty of Bayesian networks. CoRR abs/1906.04569, URL http://arxiv.org/
abs/1906.04569
Neal R (2012) Bayesian Learning for Neural Networks. Springer Science & Business Media,
vol 118
Nguyen H (1978) On random sets and belief functions. Journal of Mathematical Analysis
and Applications 65:531–542
Nguyen V, Destercke S, Masson M, Hullermeier E (2018) Reliable multi-class classification
based on pairwise epistemic and aleatoric uncertainty. In: Proceedings IJCAI 2018, 27th
International Joint Conference on Artificial Intelligence, Stockholm, Sweden, pp. 5089–
5095
Nguyen V, Destercke S, Hullermeier E (2019) Epistemic uncertainty sampling. In: Proc.
DS 2019, 22nd International Conference on Discovery Science, Split, Croatia
Oh S (2017) Top-k hierarchical classification. In: AAAI, AAAI Press, pp. 2450–2456
Owhadi H, Sullivan T, McKerns M, Ortiz M, Scovel C (2012) Optimal uncertainty quan-
tification. CoRR abs/1009.0679.v3, URL http://arxiv.org/abs/1009.0679
Papadopoulos H (2008) Inductive conformal prediction: Theory and application to neural
networks. Tools in Artificial Intelligence 18(2):315–330
Papernot N, McDaniel P (2018) Deep k-nearest neighbors: Towards confident, inter-
pretable and robust deep learning. CoRR abs/1803.04765v1, URL http://arxiv.org/
abs/1803.04765
Perello-Nieto M, Filho TS, Kull M, Flach P (2016) Background check: A general technique
to build more reliable and versatile classifiers. In: Proc. ICDM, International Conference
on Data Mining
Platt J (1999) Probabilistic outputs for support vector machines and comparison to regu-
larized likelihood methods. In: Smola A, Bartlett P, Schoelkopf B, Schuurmans D (eds)
Advances in Large Margin Classifiers, MIT Press, Cambridge, MA, pp. 61–74
Pukelsheim F (2006) Optimal Design of Experiments. SIAM
Ramaswamy HG, Tewari A, Agarwal S (2015) Consistent algorithms for multiclass classi-
fication with a reject option. CoRR abs/1505.04137
Rangwala H, Naik A (2017) Large scale hierarchical classification: foundations, algorithms
and applications. In: The European Conference on Machine Learning and Principles and
Practice of Knowledge Discovery in Databases
Renyi A (1970) Probability Theory. North-Holland, Amsterdam
Sato M, Suzuki J, Shindo H, Matsumoto Y (2018) Interpretable adversarial perturbation
in input embedding space for text. In: Proceedings IJCAI 2018, 27th International Joint
Conference on Artificial Intelligence, Stockholm, Sweden, pp. 4323–4330
Seeger M (2004) Gaussian processes for machine learning. International Journal of Neural
Systems 14(2):69–104
Senge R, Bosner S, Dembczynski K, Haasenritter J, Hirsch O, Donner-Banzhoff N,
Hullermeier E (2014) Reliable classification: Learning classifiers that distinguish
aleatoric and epistemic uncertainty. Information Sciences 255:16–29
Sensoy M, Kaplan L, Kandemir M (2018) Evidential deep learning to quantify classification
uncertainty. In: Proc. NeurIPS, 32nd Conference on Neural Information Processing
Systems, Montreal, Canada
Shafer G (1976) A Mathematical Theory of Evidence. Princeton University Press
Shafer G, Vovk V (2008) A tutorial on conformal prediction. Journal of Machine Learning
Research 9:371–421
Shaker M, Hullermeier E (2020) Aleatoric and epistemic uncertainty with random
forests. In: Proc. IDA 2020, 18th International Symposium on Intelligent Data Anal-
ysis, Springer, Konstanz, Germany, LNCS, vol 12080, pp. 444–456, DOI 10.1007/
978-3-030-44584-3_35
Shilkret N (1971) Maxitive measure and integration. Nederl Akad Wetensch Proc Ser A
74 = Indag Math 33:109–116
Smets P, Kennes R (1994) The transferable belief model. Artificial Intelligence 66:191–234
Sondhi A, Feng J, Perry J, Simon N (2019) Selective prediction-set models with coverage
guarantees. CoRR abs/1906.05473v1, URL http://arxiv.org/abs/1906.05473
Sourati J, Akcakaya M, Erdogmus D, Leen T, Dy J (2018) A probabilistic active learning
algorithm based on Fisher information ratio. IEEE Transactions on Pattern Analysis
and Machine Intelligence 40(8)
Sugeno M (1974) Theory of fuzzy integrals and its application. PhD thesis, Tokyo Institute
of Technology
Tan M, Le Q (2019) EfficientNet: Rethinking model scaling for convolutional neural net-
works. In: Proc. ICML, 36th Int. Conference on Machine Learning, Long Beach, Cali-
fornia
Tax DM, Duin RP (2004) Support vector data description. Machine Learning 54(1):45–
66, DOI 10.1023/B:MACH.0000008084.60811.49, URL https://doi.org/10.1023/B:
MACH.0000008084.60811.49
Vapnik V (1998) Statistical Learning Theory. John Wiley & Sons
Varshney K (2016) Engineering safety in machine learning. In: Proc. Inf. Theory Appl.
Workshop, La Jolla, CA
Varshney K, Alemzadeh H (2016) On the safety of machine learning: Cyber-physical
systems, decision sciences, and data products. CoRR abs/1610.01256, URL http://
arxiv.org/abs/1610.01256
Vovk V, Gammerman A, Shafer G (2003) Algorithmic Learning in a Random World.
Springer-Verlag
Walley P (1991) Statistical Reasoning with Imprecise Probabilities. Chapman and Hall
Wasserman L (1990) Belief functions and statistical evidence. The Canadian Journal of
Statistics 18(3):183–196
Wolpert D (1996) The lack of a priori distinctions between learning algorithms. Neural
Computation 8(7):1341–1390
Yager R (1983) Entropy and specificity in a mathematical theory of evidence. International
Journal of General Systems 9:249–260
Yang F, Wang HZ, Mi H, Lin CD, Cai WW (2009) Using random forest for reliable
classification and cost-sensitive learning for medical diagnosis. BMC Bioinformatics 10
Yang G, Destercke S, Masson M (2017a) Cautious classification with nested dichotomies
and imprecise probabilities. Soft Computing 21:7447–7462
Yang G, Destercke S, Masson M (2017b) The costs of indeterminacy: How to determine
them? IEEE Transactions on Cybernetics 47:4316–4327
Zadrozny B, Elkan C (2001) Obtaining calibrated probability estimates from decision trees
and Naive Bayesian classifiers. In: Proc. ICML, Int. Conference on Machine Learning,
pp. 609–616
Zadrozny B, Elkan C (2002) Transforming classifier scores into accurate multiclass prob-
ability estimates. In: Proc. KDD–02, 8th International Conference on Knowledge Dis-
covery and Data Mining, Edmonton, Alberta, Canada, pp. 694–699
Zaffalon M (2002) The naive credal classifier. Journal of Statistical Planning and Inference
105(1):5–21
Zaffalon M, Corani G, Maua D (2012) Evaluating credal classifiers by utility-discounted
predictive accuracy. International Journal of Approximate Reasoning 53:1282–1301
Ziyin L, Wang Z, Liang PP, Salakhutdinov R, Morency LP, Ueda M (2019) Deep gamblers:
Learning to abstain with portfolio theory. CoRR abs/1907.00208, URL http://arxiv.
org/abs/1907.00208