Nonmyopic ε-Bayes-Optimal Active Learning of Gaussian Processes

Trong Nghia Hoang† NGHIAHT@COMP.NUS.EDU.SG
Kian Hsiang Low† LOWKH@COMP.NUS.EDU.SG
Patrick Jaillet§ JAILLET@MIT.EDU
Mohan Kankanhalli† MOHAN@COMP.NUS.EDU.SG

†Department of Computer Science, National University of Singapore, Republic of Singapore
§Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, USA

Abstract

A fundamental issue in active learning of Gaussian processes is that of the exploration-exploitation trade-off. This paper presents a novel nonmyopic ε-Bayes-optimal active learning (ε-BAL) approach that jointly and naturally optimizes the trade-off. In contrast, existing works have primarily developed myopic/greedy algorithms or performed exploration and exploitation separately. To perform active learning in real time, we then propose an anytime algorithm based on ε-BAL with performance guarantee and empirically demonstrate using synthetic and real-world datasets that, with limited budget, it outperforms the state-of-the-art algorithms.

1. Introduction

Active learning has become an increasingly important focal theme in many environmental sensing and monitoring applications (e.g., precision agriculture, mineral prospecting (Low et al., 2007), monitoring of ocean and freshwater phenomena like harmful algal blooms (Dolan et al., 2009; Podnar et al., 2010), forest ecosystems, or pollution) where a high-resolution in situ sampling of the spatial phenomenon of interest is impractical due to prohibitively costly sampling budget requirements (e.g., number of deployed sensors, energy consumption, mission time): For such applications, it is thus desirable to select and gather the most informative observations/data for modeling and predicting the spatially varying phenomenon subject to some budget constraints, which is the goal of active learning and also known as the active sensing problem.

To elaborate, solving the active sensing problem amounts to deriving an optimal sequential policy that plans/decides the most informative locations to be observed for minimizing the predictive uncertainty of the unobserved areas of a phenomenon given a sampling budget. To achieve this, many existing active sensing algorithms (Cao et al., 2013; Chen et al., 2012; 2013b; Krause et al., 2008; Low et al., 2008; 2009; 2011; 2012; Singh et al., 2009) have modeled the phenomenon as a Gaussian process (GP), which allows its spatial correlation structure to be formally characterized and its predictive uncertainty to be formally quantified (e.g., based on mean-squared error, entropy, or mutual information criterion). However, they have assumed the spatial correlation structure (specifically, the parameters defining it) to be known, which is often violated in real-world applications, or estimated crudely using sparse prior data. So, though they aim to select sampling locations that are optimal with respect to the assumed or estimated parameters, these locations tend to be sub-optimal with respect to the true parameters, thus degrading the predictive performance of the learned GP model.

In practice, the spatial correlation structure of a phenomenon is usually not known. Then, the predictive performance of the GP modeling the phenomenon depends on how informative the gathered observations/data are for both parameter estimation as well as spatial prediction given the true parameters. Interestingly, as revealed in previous geostatistical studies (Martin, 2001; Müller, 2007), policies that are efficient for parameter estimation are not necessarily efficient for spatial prediction with respect to the true model. Thus, the active sensing problem involves a potential trade-off between sampling the most informative locations for spatial prediction given the current, possibly incomplete knowledge of the model parameters (i.e., exploitation) vs. observing locations that gain more information about the parameters (i.e., exploration):

How then does an active sensing algorithm trade off between these two possibly conflicting sampling objectives?

To tackle this question, one principled approach is to frame active sensing as a sequential decision problem that jointly and naturally optimizes the above exploration-exploitation trade-off while maintaining a Bayesian belief over the model parameters. This intuitively means a policy that biases towards observing informative locations for spatial prediction given the current model prior may be penalized if it entails a highly dispersed posterior over the model parameters. So, the resulting induced policy is guaranteed to be optimal in the expected active sensing performance. Unfortunately, such a nonmyopic Bayes-optimal policy cannot be derived exactly due to an uncountable set of candidate observations and unknown model parameters (Solomon & Zacks, 1970). As a result, most existing works (Diggle, 2006; Houlsby et al., 2012; Park & Pillow, 2012; Zimmerman, 2006; Ouyang et al., 2014) have circumvented the trade-off by resorting to the use of myopic/greedy (hence, sub-optimal) policies.

To the best of our knowledge, the only notable nonmyopic active sensing algorithm for GPs (Krause & Guestrin, 2007) advocates tackling exploration and exploitation separately, instead of jointly and naturally optimizing their trade-off, to sidestep the difficulty of solving the Bayesian sequential decision problem. Specifically, it performs a probably approximately correct (PAC)-style exploration until it can verify that the performance loss of greedy exploitation lies within a user-specified threshold. But, such an algorithm is sub-optimal in the presence of budget constraints due to the following limitations: (a) It is unclear how an optimal threshold for exploration can be determined given a sampling budget, and (b) even if such a threshold is available, the PAC-style exploration is typically designed to satisfy a worst-case sample complexity rather than to be optimal in the expected active sensing performance, thus resulting in an overly-aggressive exploration (Section 4.1).

This paper presents an efficient decision-theoretic planning approach to nonmyopic active sensing/learning that can still preserve and exploit the principled Bayesian sequential decision problem framework for jointly and naturally optimizing the exploration-exploitation trade-off (Section 3.1) and consequently does not incur the limitations of the algorithm of Krause & Guestrin (2007). In particular, although the exact Bayes-optimal policy to the active sensing problem cannot be derived (Solomon & Zacks, 1970), we show that it is in fact possible to solve for a nonmyopic ε-Bayes-optimal active learning (ε-BAL) policy (Sections 3.2 and 3.3) given a user-defined bound ε, which is the main contribution of our work here. In other words, our proposed ε-BAL policy can approximate the optimal expected active sensing performance arbitrarily closely (i.e., within an arbitrary loss bound ε). In contrast, the algorithm of Krause & Guestrin (2007) can only yield a sub-optimal performance bound¹. To meet the real-time requirement in time-critical applications, we then propose an asymptotically ε-optimal, branch-and-bound anytime algorithm based on ε-BAL with performance guarantee (Section 3.4). We empirically demonstrate using both synthetic and real-world datasets that, with limited budget, our proposed approach outperforms state-of-the-art algorithms (Section 4).

¹Its induced policy is guaranteed not to achieve worse than the optimal performance by more than a factor of 1/e.

2. Modeling Spatial Phenomena with Gaussian Processes (GPs)

The GP can be used to model a spatial phenomenon of interest as follows: The phenomenon is defined to vary as a realization of a GP. Let X denote a set of sampling locations representing the domain of the phenomenon such that each location x ∈ X is associated with a realized (random) measurement z_x (Z_x) if x is observed/sampled (unobserved). Let Z_X ≜ {Z_x}_{x∈X} denote a GP, that is, every finite subset of Z_X has a multivariate Gaussian distribution (Chen et al., 2013a; Rasmussen & Williams, 2006). The GP is fully specified by its prior mean μ_x ≜ E[Z_x] and covariance σ_{xx'|λ} ≜ cov[Z_x, Z_{x'} | λ] for all x, x' ∈ X, the latter of which characterizes the spatial correlation structure of the phenomenon and can be defined using a covariance function parameterized by λ. A common choice is the squared exponential covariance function:

\[
\sigma_{xx'|\lambda} \triangleq (\sigma_s)^2 \exp\!\left( -\frac{1}{2} \sum_{i=1}^{P} \left( \frac{[s_x]_i - [s_{x'}]_i}{\ell_i} \right)^{\!2} \right) + (\sigma_n)^2\, \delta_{xx'}
\]

where [s_x]_i ([s_{x'}]_i) is the i-th component of the P-dimensional feature vector s_x (s_{x'}), the set of realized parameters λ ≜ {σ_n, σ_s, ℓ_1, …, ℓ_P} ∈ Λ are, respectively, the square root of noise variance, square root of signal variance, and length-scales, and δ_{xx'} is a Kronecker delta that is 1 if x = x' and 0 otherwise.

Supposing λ is known and a set z_D of realized measurements is available for some set D ⊂ X of observed locations, the GP can exploit these observations to predict the measurement for any unobserved location x ∈ X \ D as well as provide its corresponding predictive uncertainty using the Gaussian predictive distribution p(z_x | z_D, λ) ~ N(μ_{x|D,λ}, σ_{xx|D,λ}) with the following posterior mean and variance, respectively:

\[
\mu_{x|D,\lambda} \triangleq \mu_x + \Sigma_{xD|\lambda}\, \Sigma_{DD|\lambda}^{-1} (z_D - \mu_D) \tag{1}
\]
\[
\sigma_{xx|D,\lambda} \triangleq \sigma_{xx|\lambda} - \Sigma_{xD|\lambda}\, \Sigma_{DD|\lambda}^{-1}\, \Sigma_{Dx|\lambda} \tag{2}
\]

where, with a slight abuse of notation, z_D is to be perceived as a column vector in (1), μ_D is a column vector with mean components μ_{x'} for all x' ∈ D, Σ_{xD|λ} is a row vector with covariance components σ_{xx'|λ} for all x' ∈ D, Σ_{Dx|λ} is the transpose of Σ_{xD|λ}, and Σ_{DD|λ} is a covariance matrix with components σ_{ux'|λ} for all u, x' ∈ D. When the spatial correlation structure (i.e., λ) is not known, a probabilistic belief b_D(λ) ≜ p(λ | z_D) can be maintained/tracked over all possible λ and updated using Bayes' rule to the posterior belief b_{D∪{x}}(λ) given a newly available measurement z_x:

\[
b_{D\cup\{x\}}(\lambda) \propto p(z_x \mid z_D, \lambda)\, b_D(\lambda). \tag{3}
\]

Using belief b_D, the predictive distribution p(z_x | z_D) can be obtained by marginalizing out λ:

\[
p(z_x \mid z_D) = \sum_{\lambda \in \Lambda} p(z_x \mid z_D, \lambda)\, b_D(\lambda). \tag{4}
\]
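To make (1)-(4) concrete, the following minimal NumPy sketch (our illustration, not the authors' implementation) computes the GP posterior for a given λ, the belief update over a finite candidate set Λ, and the marginal predictive distribution; the 1-D domain, zero prior mean, and all function names are assumptions for illustration.

# Minimal sketch of (1)-(4); illustrative only (zero prior mean, 1-D locations).
import numpy as np

def sq_exp_cov(s1, s2, lam):
    """Squared exponential covariance; lam = (sigma_s, sigma_n, ell)."""
    sigma_s, sigma_n, ell = lam
    K = sigma_s**2 * np.exp(-0.5 * ((s1[:, None] - s2[None, :]) / ell) ** 2)
    if s1 is s2:                        # (sigma_n)^2 * Kronecker delta term
        K = K + sigma_n**2 * np.eye(len(s1))
    return K

def gp_posterior(x, D, z_D, lam):
    """Posterior mean (1) and variance (2) at locations x given observations z_D."""
    K_DD = sq_exp_cov(D, D, lam)
    K_xD = sq_exp_cov(x, D, lam)
    mu = K_xD @ np.linalg.solve(K_DD, z_D)
    prior_var = lam[0]**2 + lam[1]**2   # sigma_xx|lam = sigma_s^2 + sigma_n^2
    var = prior_var - np.einsum('ij,ji->i', K_xD, np.linalg.solve(K_DD, K_xD.T))
    return mu, var

def gauss_pdf(z, mu, var):
    return np.exp(-0.5 * (z - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def belief_update(b, Lambda, D, z_D, x, z_x):
    """Bayes' rule (3): b_{D u {x}}(lam) is proportional to p(z_x | z_D, lam) b_D(lam)."""
    lik = np.empty(len(Lambda))
    for j, lam in enumerate(Lambda):
        mu, var = gp_posterior(np.array([x]), D, z_D, lam)
        lik[j] = gauss_pdf(z_x, mu[0], var[0])
    post = lik * b
    return post / post.sum()

def marginal_predictive(z_x, x, D, z_D, b, Lambda):
    """Marginalization (4): p(z_x | z_D) = sum over lam of p(z_x | z_D, lam) b_D(lam)."""
    p = 0.0
    for lam, w in zip(Lambda, b):
        mu, var = gp_posterior(np.array([x]), D, z_D, lam)
        p += w * gauss_pdf(z_x, mu[0], var[0])
    return p

# Example mirroring Section 4.1's setup: sigma_s = 10.0, sigma_n = 0.25, and
# seven candidate length-scales under a uniform prior belief.
D, z_D = np.array([0.0, 5.0, 9.0]), np.array([1.2, -0.4, 0.7])
Lambda = [(10.0, 0.25, ell) for ell in (1, 6, 9, 12, 15, 18, 21)]
b = np.full(len(Lambda), 1.0 / len(Lambda))
b = belief_update(b, Lambda, D, z_D, x=2.0, z_x=0.9)

Note that the predictive distribution (4) is a finite Gaussian mixture, one component per candidate λ, which is what makes the belief update above a simple reweighting.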

3. Nonmyopic ε-Bayes-Optimal Active Learning (ε-BAL)

3.1. Problem Formulation

To cast active sensing as a Bayesian sequential decision problem, let us first define a sequential active sensing/learning policy π given a budget of N sampling locations: Specifically, the policy π ≜ {π_n}_{n=1}^{N} is structured to sequentially decide the next location π_n(z_D) ∈ X \ D to be observed at each stage n based on the current observations z_D over a finite planning horizon of N stages. Recall from Section 1 that the active sensing problem involves planning/deciding the most informative locations to be observed for minimizing the predictive uncertainty of the unobserved areas of a phenomenon. To achieve this, we use the entropy criterion (Cover & Thomas, 1991) to measure the informativeness and predictive uncertainty. Then, the value under a policy π is defined to be the joint entropy of its selected observations when starting with some prior observations z_{D_0} and following π thereafter:

\[
V_1^{\pi}(z_{D_0}) \triangleq H[Z_{\pi} \mid z_{D_0}] \triangleq -\int p(z_{\pi} \mid z_{D_0}) \log p(z_{\pi} \mid z_{D_0})\, dz_{\pi} \tag{5}
\]

where Z_π (z_π) is the set of random (realized) measurements taken by policy π and p(z_π | z_{D_0}) is defined in a similar manner to (4).

To solve the active sensing problem, the notion of Bayes-optimality² is exploited for selecting observations of largest possible joint entropy with respect to all possible induced sequences of future beliefs (starting from initial prior belief b_{D_0}) over candidate sets of model parameters λ, as detailed next. Formally, this entails choosing a sequential policy π to maximize V_1^π(z_{D_0}) (5), which we call the Bayes-optimal active learning (BAL) policy π*. That is, V_1^*(z_{D_0}) ≜ V_1^{π*}(z_{D_0}) = max_π V_1^π(z_{D_0}). When π* is plugged into (5), the following N-stage Bellman equations result from the chain rule for entropy:

²Bayes-optimality is previously studied in reinforcement learning whose developed theories (Poupart et al., 2006; Hoang & Low, 2013) cannot be applied here because their assumptions of discrete-valued observations and Markov property do not hold.

\[
\begin{aligned}
V_n^{*}(z_D) &= H\big[Z_{\pi_n^*(z_D)} \mid z_D\big] + \mathbb{E}\big[ V_{n+1}^{*}\big(z_D \cup \{Z_{\pi_n^*(z_D)}\}\big) \mid z_D \big] = \max_{x \in \mathcal{X} \setminus D} Q_n^{*}(z_D, x) \\
Q_n^{*}(z_D, x) &\triangleq H[Z_x \mid z_D] + \mathbb{E}\big[ V_{n+1}^{*}\big(z_D \cup \{Z_x\}\big) \mid z_D \big] \\
H[Z_x \mid z_D] &\triangleq -\int p(z_x \mid z_D) \log p(z_x \mid z_D)\, dz_x
\end{aligned} \tag{6}
\]

for stage n = 1, …, N where p(z_x | z_D) is defined in (4) and the expectation terms are omitted from the right-hand side (RHS) expressions of V_N^* and Q_N^* at stage N. At each stage, the belief b_D(λ) is needed to compute Q_n^*(z_D, x) in (6) and can be uniquely determined from initial prior belief b_{D_0} and observations z_{D \ D_0} using (3). To understand how the BAL policy π* jointly and naturally optimizes the exploration-exploitation trade-off, note that its selected location π_n^*(z_D) = argmax_{x∈X\D} Q_n^*(z_D, x) at each stage n affects both the immediate payoff H[Z_{π_n^*(z_D)} | z_D] given the current belief b_D (i.e., exploitation) as well as the posterior belief b_{D∪{π_n^*(z_D)}}, the latter of which influences the expected future payoff E[V_{n+1}^*(z_D ∪ {Z_{π_n^*(z_D)}}) | z_D] and builds in the information gathering option (i.e., exploration).
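The recursion (6) is just the chain rule for entropy unrolled one stage at a time; for instance, a two-stage expansion of the joint entropy in (5) (our illustration) reads:

\[
H[Z_{\pi_1}, Z_{\pi_2} \mid z_{D_0}] \;=\; H[Z_{\pi_1} \mid z_{D_0}] \;+\; \mathbb{E}\big[ H[Z_{\pi_2} \mid z_{D_0}, Z_{\pi_1}] \mid z_{D_0} \big],
\]

where the first term is the stage-1 payoff and the expected second term is the stage-2 value-to-go after conditioning on the stage-1 observation.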

Interestingly, the work of Low et al. (2009) has revealed that the above recursive formulation (6) can be perceived as the sequential variant of the well-known maximum entropy sampling problem (Shewry & Wynn, 1987) and established an equivalence result that the maximum-entropy observations selected by π* achieve a dual objective of minimizing the posterior joint entropy (i.e., predictive uncertainty) remaining in the unobserved locations of the phenomenon. Unfortunately, the BAL policy π* cannot be derived exactly because the stage-wise entropy and expectation terms in (6) cannot be evaluated in closed form due to an uncountable set of candidate observations and unknown model parameters λ (Section 1). To overcome this difficulty, we show in the next subsection how it is possible to solve for an ε-BAL policy π^ε, that is, the joint entropy of its selected observations closely approximates that of π* within an arbitrary loss bound ε > 0.

3.2. ε-BAL Policy

The key idea underlying the design and construction of our proposed nonmyopic ε-BAL policy π^ε is to approximate the entropy and expectation terms in (6) at every stage using a form of truncated sampling to be described next:

Definition 1 (τ-Truncated Observation) Define random measurement Ẑ_x by truncating Z_x at −τ̂ and τ̂ as follows:

\[
\widehat{Z}_x \triangleq
\begin{cases}
-\widehat{\tau} & \text{if } Z_x \le -\widehat{\tau}, \\
Z_x & \text{if } -\widehat{\tau} < Z_x < \widehat{\tau}, \\
\widehat{\tau} & \text{if } Z_x \ge \widehat{\tau}.
\end{cases}
\]

Then, Ẑ_x has a distribution of mixed type with its continuous component defined as f(Ẑ_x = z_x | z_D) ≜ p(Z_x = z_x | z_D) for −τ̂ < z_x < τ̂ and its discrete components defined as f(Ẑ_x = τ̂ | z_D) ≜ P(Z_x ≥ τ̂ | z_D) = ∫_{τ̂}^{∞} p(Z_x = z_x | z_D) dz_x and f(Ẑ_x = −τ̂ | z_D) ≜ P(Z_x ≤ −τ̂ | z_D) = ∫_{−∞}^{−τ̂} p(Z_x = z_x | z_D) dz_x.

Let μ̄(D, Λ) ≜ max_{x∈X\D, λ∈Λ} μ_{x|D,λ}, μ̲(D, Λ) ≜ min_{x∈X\D, λ∈Λ} μ_{x|D,λ}, and

\[
\widehat{\tau} \triangleq \max\big( \big|\underline{\mu}(D,\Lambda) - \tau\big|,\; \big|\overline{\mu}(D,\Lambda) + \tau\big| \big) \tag{7}
\]

for some 0 ≤ τ ≤ τ̂. Then, a realized measurement of Ẑ_x is said to be a τ-truncated observation for location x.

Specifically, given that a set z_D of realized measurements is available, a finite set of S τ-truncated observations {z_x^i}_{i=1}^{S} can be generated for every candidate location x ∈ X \ D at each stage n by identically and independently sampling from p(z_x | z_D) (4) and then truncating each of them according to z_x^i ← z_x^i · min(|z_x^i|, τ̂)/|z_x^i|. These generated τ-truncated observations can be exploited for approximating V_n^* (6) through the following Bellman equations:

\[
\begin{aligned}
V_n^{\epsilon}(z_D) &\triangleq \max_{x \in \mathcal{X} \setminus D} Q_n^{\epsilon}(z_D, x) \\
Q_n^{\epsilon}(z_D, x) &\triangleq \frac{1}{S} \sum_{i=1}^{S} \Big( -\log p\big(z_x^i \mid z_D\big) + V_{n+1}^{\epsilon}\big(z_D \cup \{z_x^i\}\big) \Big)
\end{aligned} \tag{8}
\]

for stage n = 1, …, N such that there is no V_{N+1}^ε term on the RHS expression of Q_N^ε at stage N. Like the BAL policy π* (Section 3.1), the location π_n^ε(z_D) = argmax_{x∈X\D} Q_n^ε(z_D, x) selected by our ε-BAL policy π^ε at each stage n also jointly and naturally optimizes the trade-off between exploitation (i.e., by maximizing the immediate payoff S^{-1} Σ_{i=1}^{S} −log p(z_{π_n^ε(z_D)}^i | z_D) given the current belief b_D) vs. exploration (i.e., by improving the posterior belief b_{D∪{π_n^ε(z_D)}} to maximize the average future payoff S^{-1} Σ_{i=1}^{S} V_{n+1}^ε(z_D ∪ {z_{π_n^ε(z_D)}^i})). Unlike the deterministic BAL policy π*, our ε-BAL policy π^ε is stochastic due to its use of the above truncated sampling procedure.
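As a concrete illustration of the truncated sampling procedure and the recursion (8), the sketch below (ours, exponential in the horizon N and meant only for tiny examples) reuses the hypothetical gp_posterior, belief_update, and marginal_predictive helpers from the Section 2 snippet; the truncation point tau_hat follows (7) conceptually but is passed in as a constant here.

# Illustrative epsilon-BAL value recursion (8) with tau-truncated sampling.
import numpy as np

rng = np.random.default_rng(0)

def truncate(z, tau_hat):
    """Definition 1's rule z * min(|z|, tau_hat)/|z|, i.e., clipping to [-tau_hat, tau_hat]."""
    return float(np.clip(z, -tau_hat, tau_hat))

def sample_truncated(x, D, z_D, b, Lambda, S, tau_hat):
    """Draw S i.i.d. samples from the mixture p(z_x | z_D) (4), then truncate each."""
    ks = rng.choice(len(Lambda), size=S, p=b)          # pick component lam ~ b_D
    zs = []
    for k in ks:
        mu, var = gp_posterior(np.array([x]), np.array(D), np.array(z_D), Lambda[k])
        zs.append(truncate(rng.normal(mu[0], np.sqrt(var[0])), tau_hat))
    return zs

def V_eps(n, N, D, z_D, b, X, Lambda, S, tau_hat):
    """V^eps_n(z_D) = max_x Q^eps_n(z_D, x); no V^eps_{N+1} term at stage N."""
    best = -np.inf
    for x in (u for u in X if u not in D):
        q = 0.0
        for z in sample_truncated(x, D, z_D, b, Lambda, S, tau_hat):
            q -= np.log(marginal_predictive(z, x, np.array(D), np.array(z_D), b, Lambda))
            if n < N:
                b2 = belief_update(b, Lambda, np.array(D), np.array(z_D), x, z)
                q += V_eps(n + 1, N, D + [x], z_D + [z], b2, X, Lambda, S, tau_hat)
        best = max(best, q / S)
    return best

# Example (tiny): V_eps(1, 2, D=[0.0], z_D=[1.2], b=b,
#                       X=[0.0, 2.0, 5.0, 9.0], Lambda=Lambda, S=3, tau_hat=30.0)

The greedy location at stage 1 is the argmax over x of the same per-location averages, and the stochasticity noted above is visible here in the dependence of the result on the sampled observations.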

3.3. Theoretical Analysis

The main difficulty in analyzing the active sensing performance of our stochastic ε-BAL policy π^ε (i.e., relative to that of BAL policy π*) lies in determining how its ε-Bayes optimality can be guaranteed by choosing appropriate values of the truncated sampling parameters S and τ (Section 3.2). To achieve this, we have to formally understand how S and τ can be specified and varied in terms of the user-defined loss bound ε, budget of N sampling locations, domain size |X| of the phenomenon, and properties/parameters characterizing the spatial correlation structure of the phenomenon (Section 2), as detailed below.

The first step is to show that Q_n^ε (8) is in fact a good approximation of Q_n^* (6) for some chosen values of S and τ. There are two sources of error arising in such an approximation: (a) In the truncated sampling procedure (Section 3.2), only a finite set of τ-truncated observations is generated for approximating the stage-wise entropy and expectation terms in (6), and (b) computing Q_n^ε does not involve utilizing the values of V_{n+1}^* but that of its approximation V_{n+1}^ε instead. To facilitate capturing the error due to finite truncated sampling described in (a), the following intermediate function is introduced:

\[
W_n^{\epsilon}(z_D, x) \triangleq \frac{1}{S} \sum_{i=1}^{S} \Big( -\log p\big(z_x^i \mid z_D\big) + V_{n+1}^{*}\big(z_D \cup \{z_x^i\}\big) \Big) \tag{9}
\]

for stage n = 1, …, N such that there is no V_{N+1}^* term on the RHS expression of W_N^ε at stage N. The first lemma below reveals that if the error |Q_n^*(z_D, x) − W_n^ε(z_D, x)| due to finite truncated sampling can be bounded for all tuples (n, z_D, x) generated at stage n = n', …, N by (8) to compute V_{n'}^ε for 1 ≤ n' ≤ N, then Q_{n'}^ε (8) can approximate Q_{n'}^* (6) arbitrarily closely:

Lemma 1 Suppose that a set z_{D'} of observations, a budget of N − n' + 1 sampling locations for 1 ≤ n' ≤ N, S ∈ ℤ⁺, and γ > 0 are given. If

\[
\big|Q_n^{*}(z_D, x) - W_n^{\epsilon}(z_D, x)\big| \le \gamma \tag{10}
\]

for all tuples (n, z_D, x) generated at stage n = n', …, N by (8) to compute V_{n'}^ε(z_{D'}), then, for all x' ∈ X \ D',

\[
\big|Q_{n'}^{*}(z_{D'}, x') - Q_{n'}^{\epsilon}(z_{D'}, x')\big| \le (N - n' + 1)\,\gamma. \tag{11}
\]

Its proof is given in Appendix A.1. The next two lemmas show that, with high probability, the error |Q_n^*(z_D, x) − W_n^ε(z_D, x)| due to finite truncated sampling can indeed be bounded from above by γ (10) for all tuples (n, z_D, x) generated at stage n = n', …, N by (8) to compute V_{n'}^ε for 1 ≤ n' ≤ N:

Lemma 2 Suppose that a set z_{D'} of observations, a budget of N − n' + 1 sampling locations for 1 ≤ n' ≤ N, S ∈ ℤ⁺, and γ > 0 are given. For all tuples (n, z_D, x) generated at stage n = n', …, N by (8) to compute V_{n'}^ε(z_{D'}),

\[
P\Big( \big|Q_n^{*}(z_D, x) - W_n^{\epsilon}(z_D, x)\big| \le \gamma \Big) \ge 1 - 2\exp\!\left( \frac{-2 S \gamma^{2}}{T^{2}} \right)
\]

where T ≜ O( N² κ^{2N} τ²/σ_n² + N log(σ_o/σ_n) + log |Λ| ) by setting

\[
\tau = O\left( \sigma_o \sqrt{ \log\!\left( \frac{\sigma_o^{2}}{\gamma}\left( N^{2}\kappa^{2N} + \frac{\sigma_o^{2}}{\sigma_n^{2}} + N\log\frac{\sigma_o}{\sigma_n} + \log|\Lambda| \right) \right) } \right)
\]

with κ, σ_n², and σ_o² defined as follows:

\[
\kappa \triangleq 1 + 2 \max_{x', u \in \mathcal{X}\setminus D:\, x' \ne u,\ \lambda \in \Lambda,\ D} \big|\sigma_{x'u|D,\lambda}\big| \big/ \sigma_{uu|D,\lambda}\,, \tag{12}
\]
\[
\sigma_n^{2} \triangleq \min_{\lambda \in \Lambda} (\sigma_n^{\lambda})^{2}, \qquad \sigma_o^{2} \triangleq \max_{\lambda \in \Lambda} (\sigma_s^{\lambda})^{2} + (\sigma_n^{\lambda})^{2}. \tag{13}
\]

Refer to Appendix A.2 for its proof.
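For intuition, the constants in (12)-(13) are straightforward to compute by brute force; the sketch below (our reading of the definitions, evaluated for one fixed λ and D, whereas the lemma takes the extrema over all λ ∈ Λ and all D) illustrates them, reusing the Λ convention of the earlier snippets.

# Brute-force illustration of kappa (12) and the variance bounds (13).
import numpy as np

def kappa_one(K_post):
    """1 + 2 max over x' != u of |sigma_{x'u|D,lam}| / sigma_{uu|D,lam} for one
    posterior covariance matrix K_post over the unobserved locations X \\ D."""
    m = K_post.shape[0]
    return 1.0 + 2.0 * max(abs(K_post[i, j]) / K_post[j, j]
                           for i in range(m) for j in range(m) if i != j)

def variance_bounds(Lambda):
    """(13): sigma_n^2 = min over lam of noise variance; sigma_o^2 = max over lam
    of signal-plus-noise variance. Lambda holds (sigma_s, sigma_n, ell) tuples."""
    sigma_n2 = min(lam[1] ** 2 for lam in Lambda)
    sigma_o2 = max(lam[0] ** 2 + lam[1] ** 2 for lam in Lambda)
    return sigma_n2, sigma_o2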

Remark 1. Deriving such a probabilistic bound in Lemma 2 typically involves the use of concentration inequalities for the sum of independent bounded random variables like the Hoeffding's, Bennett's, or Bernstein's inequalities. However, since the originally Gaussian distributed observations are unbounded, sampling from p(z_x | z_D) (4) without truncation will generate unbounded versions of {z_x^i}_{i=1}^{S} and consequently make each summation term −log p(z_x^i | z_D) + V_{n+1}^*(z_D ∪ {z_x^i}) on the RHS expression of W_n^ε (9) unbounded, hence invalidating the use of these concentration inequalities. To resolve this complication, our trick is to exploit the truncated sampling procedure (Section 3.2) to generate bounded τ-truncated observations (Definition 1) (i.e., |z_x^i| ≤ τ̂ for i = 1, …, S), thus resulting in each summation term −log p(z_x^i | z_D) + V_{n+1}^*(z_D ∪ {z_x^i}) being bounded (Appendix A.2). This enables our use of Hoeffding's inequality to derive the probabilistic bound.

Remark 2. It can be observed from Lemma 2 that the amount of truncation has to be reduced (i.e., higher chosen value of τ) when (a) a tighter bound γ on the error |Q_n^*(z_D, x) − W_n^ε(z_D, x)| due to finite truncated sampling is desired, (b) a greater budget of N sampling locations is available, (c) a larger space Λ of candidate model parameters is preferred, (d) the spatial phenomenon varies with more intensity and less noise (i.e., assuming all candidate signal and noise variance parameters, respectively, (σ_s^λ)² and (σ_n^λ)², are specified close to the true large signal and small noise variances), and (e) its spatial correlation structure yields a bigger κ. To elaborate on (e), note that Lemma 2 still holds for any value of κ larger than that set in (12): Since |σ_{x'u|D,λ}|² ≤ σ_{x'x'|D,λ} σ_{uu|D,λ} for all x' ≠ u ∈ X \ D due to the symmetric positive-definiteness of Σ_{(X\D)(X\D)|D,λ}, κ can be set to 1 + 2 max_{x',u∈X\D, λ∈Λ, D} √(σ_{x'x'|D,λ}/σ_{uu|D,λ}). Then, supposing all candidate length-scale parameters are specified close to the true length-scales, a phenomenon with extreme length-scales tending to 0 (i.e., with white-noise process measurements) or ∞ (i.e., with constant measurements) will produce highly similar σ_{x'x'|D,λ} for all x' ∈ X \ D, thus resulting in smaller κ and hence smaller τ.

Remark 3. Alternatively, it can be proven that Lemma 2 and the subsequent results hold by setting κ = 1 if a certain structural property of the spatial correlation structure (i.e., for all z_D (D ⊆ X) and λ ∈ Λ, Σ_{DD|λ} is diagonally dominant) is satisfied, as shown in Lemma 9 (Appendix B). Consequently, the κ term can be removed from T and τ.

Lemma 3 Suppose that a set z_{D'} of observations, a budget of N − n' + 1 sampling locations for 1 ≤ n' ≤ N, S ∈ ℤ⁺, and γ > 0 are given. The probability that |Q_n^*(z_D, x) − W_n^ε(z_D, x)| ≤ γ (10) holds for all tuples (n, z_D, x) generated at stage n = n', …, N by (8) to compute V_{n'}^ε(z_{D'}) is at least 1 − 2(S|X|)^N exp(−2Sγ²/T²) where T is previously defined in Lemma 2.

Its proof is found in Appendix A.3. The first step is concluded with our first main result, which follows from Lemmas 1 and 3. Specifically, it chooses the values of S and τ such that the probability of Q_n^ε (8) approximating Q_n^* (6) poorly (i.e., |Q_n^*(z_D, x) − Q_n^ε(z_D, x)| > Nγ) can be bounded from above by a given 0 < δ < 1:

Theorem 1 Suppose that a set z_D of observations, a budget of N − n + 1 sampling locations for 1 ≤ n ≤ N, γ > 0, and 0 < δ < 1 are given. The probability that |Q_n^*(z_D, x) − Q_n^ε(z_D, x)| ≤ Nγ holds for all x ∈ X \ D is at least 1 − δ by setting

\[
S = O\left( \frac{T^{2}}{\gamma^{2}} \left( N \log\frac{N |\mathcal{X}|\, T^{2}}{\gamma^{2}} + \log\frac{1}{\delta} \right) \right)
\]

where T is previously defined in Lemma 2. By assuming N, |Λ|, σ_o, σ_n, κ, and |X| as constants, τ = O(√(log(1/γ))) and hence

\[
S = O\left( \frac{(\log(1/\gamma))^{2}}{\gamma^{2}}\, \log\!\left( \frac{\log(1/\gamma)}{\gamma\,\delta} \right) \right).
\]

Its proof is provided in Appendix A.4.

Remark. It can be observed from Theorem 1 that the number of generated τ-truncated observations has to be increased (i.e., higher chosen value of S) when (a) a lower probability δ of Q_n^ε (8) approximating Q_n^* (6) poorly (i.e., |Q_n^*(z_D, x) − Q_n^ε(z_D, x)| > Nγ) is desired, and (b) a larger domain X of the phenomenon is given. The influence of γ, N, |Λ|, σ_o, σ_n, and κ on S is similar to that on τ, as previously reported in Remark 2 after Lemma 2.

Thus far, we have shown in the first step that, with high probability, Q_n^ε (8) approximates Q_n^* (6) arbitrarily closely for some chosen values of S and τ (Theorem 1). The next step uses this result to probabilistically bound the performance loss in terms of Q_n^* by observing location π_n^ε(z_D) selected by our ε-BAL policy π^ε at stage n and following the BAL policy π* thereafter:

Lemma 4 Suppose that a set z_D of observations, a budget of N − n + 1 sampling locations for 1 ≤ n ≤ N, γ > 0, and 0 < δ < 1 are given. Q_n^*(z_D, π_n^*(z_D)) − Q_n^*(z_D, π_n^ε(z_D)) ≤ 2Nγ holds with probability at least 1 − δ by setting S and τ according to that in Theorem 1.

See Appendix A.5 for its proof. The final step extends Lemma 4 to obtain our second main result. In particular, it bounds the expected active sensing performance loss of our stochastic ε-BAL policy π^ε relative to that of BAL policy π*, that is, policy π^ε is ε-Bayes-optimal:

Theorem 2 Given a set z_{D_0} of prior observations, a budget of N sampling locations, and ε > 0, V_1^*(z_{D_0}) − E_{π^ε}[V_1^{π^ε}(z_{D_0})] ≤ ε by setting and substituting γ = ε/(4N²) and δ = ε/(2N(N log(σ_o/σ_n) + log |Λ|)) into S and τ in Theorem 1 to give τ = O(√(log(1/ε))) and

\[
S = O\left( \frac{(\log(1/\epsilon))^{2}}{\epsilon^{2}}\, \log\!\left( \frac{\log(1/\epsilon)}{\epsilon^{2}} \right) \right).
\]

Its proof is given in Appendix A.6.

Remark 1. The number of generated τ-truncated observations and the amount of truncation have to be, respectively, increased and reduced (i.e., higher chosen values of S and τ) when a tighter user-defined loss bound ε is desired.

Remark 2. The deterministic BAL policy π* is Bayes-optimal among all candidate stochastic policies π since E_π[V_1^π(z_{D_0})] ≤ V_1^*(z_{D_0}), as proven in Appendix A.7.

3.4. Anytime ε-BAL (⟨α, ε⟩-BAL) Algorithm

Unlike the BAL policy π*, our ε-BAL policy π^ε can be derived exactly because its time complexity is independent of the size of the set of all possible originally Gaussian distributed observations, which is uncountable. But, the cost of deriving π^ε is exponential in the length N of the planning horizon since it has to compute the values V_n^ε(z_D) (8) for all (S|X|)^N possible states (n, z_D). To ease this computational burden, we propose an anytime algorithm based on ε-BAL that can produce a good policy fast and improve its approximation quality over time, as discussed next.

The key intuition behind our anytime ε-BAL algorithm (⟨α, ε⟩-BAL of Algo. 1) is to focus the simulation of greedy exploration paths through the most uncertain regions of the state space (i.e., in terms of the values V_n^ε(z_D)) instead of evaluating the entire state space like π^ε. To achieve this, our ⟨α, ε⟩-BAL algorithm maintains both lower and upper heuristic bounds (respectively, V̲_n^ε(z_D) and V̄_n^ε(z_D)) for each encountered state (n, z_D), which are exploited for representing the uncertainty of its corresponding value V_n^ε(z_D) to be used in turn for guiding the greedy exploration (or, put differently, pruning unnecessary, bad exploration of the state space while still guaranteeing policy optimality).

To elaborate, each simulated exploration path (EXPLORE of Algo. 1) repeatedly selects a sampling location x and its corresponding τ-truncated observation z_x^i at every stage until the last stage N is reached. Specifically, at each stage n of the simulated path, the next states (n+1, z_D ∪ {z_x^i}) with uncertainty |V̄_{n+1}^ε(z_D ∪ {z_x^i}) − V̲_{n+1}^ε(z_D ∪ {z_x^i})| exceeding α (line 6) are identified (lines 7-8), among which the one with largest lower bound V̲_{n+1}^ε(z_D ∪ {z_x^i}) (line 10) is prioritized/selected for exploration (if more than one exists, ties are broken by choosing the one with most uncertainty, that is, largest upper bound V̄_{n+1}^ε(z_D ∪ {z_x^i}) (line 11)) while the remaining unexplored ones are placed in the set U (line 12) to be considered for future exploration (lines 3-6 in ⟨α, ε⟩-BAL). So, the simulated path terminates if the uncertainty of every next state is at most α; the uncertainty of a state at the last stage N is guaranteed to be zero (14).

Then, the algorithm backtracks up the path to update/tighten the bounds of previously visited states (line 7 in ⟨α, ε⟩-BAL and line 14 in EXPLORE) as follows:

\[
\begin{aligned}
\overline{V}_n^{\epsilon}(z_D) &\leftarrow \min\Big( \overline{V}_n^{\epsilon}(z_D),\; \max_{x \in \mathcal{X} \setminus D} \overline{Q}_n^{\epsilon}(z_D, x) \Big) \\
\underline{V}_n^{\epsilon}(z_D) &\leftarrow \max\Big( \underline{V}_n^{\epsilon}(z_D),\; \max_{x \in \mathcal{X} \setminus D} \underline{Q}_n^{\epsilon}(z_D, x) \Big)
\end{aligned} \tag{14}
\]

Algorithm 1 ⟨α, ε⟩-BAL(z_{D_0})

⟨α, ε⟩-BAL(z_{D_0})
1: U ← {(1, z_{D_0})}
2: while |V̄_1^ε(z_{D_0}) − V̲_1^ε(z_{D_0})| > α do
3:   V ← argmax_{(n, z_D) ∈ U} V̲_n^ε(z_D)
4:   (n', z_{D'}) ← argmax_{(n, z_D) ∈ V} V̄_n^ε(z_D)
5:   U ← U \ {(n', z_{D'})}
6:   EXPLORE(n', z_{D'}, U)   /* U is passed by reference */
7:   UPDATE(n', z_{D'})
8: return π_1^{⟨α,ε⟩}(z_{D_0}) ← argmax_{x ∈ X \ D_0} Q̲_1^ε(z_{D_0}, x)

EXPLORE(n, z_D, U)
1: T ← ∅
2: for all x ∈ X \ D do
3:   {z_x^i}_{i=1}^{S} ← sample from p(z_x | z_D) (4)
4:   for i = 1, …, S do
5:     z_x^i ← z_x^i · min(|z_x^i|, τ̂) / |z_x^i|
6:     if |V̄_{n+1}^ε(z_D ∪ {z_x^i}) − V̲_{n+1}^ε(z_D ∪ {z_x^i})| > α then
7:       T ← T ∪ {(n+1, z_D ∪ {z_x^i})}
8:       parent(n+1, z_D ∪ {z_x^i}) ← (n, z_D)
9: if |T| > 0 then
10:   V ← argmax_{(n+1, z_D ∪ {z_x^i}) ∈ T} V̲_{n+1}^ε(z_D ∪ {z_x^i})
11:   (n+1, z_{D'}) ← argmax_{(n+1, z_D ∪ {z_x^i}) ∈ V} V̄_{n+1}^ε(z_D ∪ {z_x^i})
12:   U ← U ∪ (T \ {(n+1, z_{D'})})
13:   EXPLORE(n+1, z_{D'}, U)
14: Update V̲_n^ε(z_D) and V̄_n^ε(z_D) using (14)

UPDATE(n, z_D)
1: Update V̲_n^ε(z_D) and V̄_n^ε(z_D) using (14)
2: if n > 1 then
3:   (n−1, z_{D'}) ← parent(n, z_D)
4:   UPDATE(n−1, z_{D'})

\[
\begin{aligned}
\underline{Q}_n^{\epsilon}(z_D, x) &\triangleq \frac{1}{S} \sum_{i=1}^{S} \Big( -\log p\big(z_x^i \mid z_D\big) + \underline{V}_{n+1}^{\epsilon}\big(z_D \cup \{z_x^i\}\big) \Big) \\
\overline{Q}_n^{\epsilon}(z_D, x) &\triangleq \frac{1}{S} \sum_{i=1}^{S} \Big( -\log p\big(z_x^i \mid z_D\big) + \overline{V}_{n+1}^{\epsilon}\big(z_D \cup \{z_x^i\}\big) \Big)
\end{aligned}
\]

for stage n = 1, …, N such that there is no V̲_{N+1}^ε (V̄_{N+1}^ε) term on the RHS expression of Q̲_N^ε (Q̄_N^ε) at stage N. When the planning time runs out, we provide the greedy policy induced by the lower bound: π_1^{⟨α,ε⟩}(z_{D_0}) ≜ argmax_{x ∈ X \ D_0} Q̲_1^ε(z_{D_0}, x) (line 8 in ⟨α, ε⟩-BAL).
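A compact sketch (ours, not the authors' code) of how the backup rule (14) and the lower-bound-first selection from EXPLORE might look; each state carries mutable bounds 'lo'/'hi', and Q_lo/Q_hi are hypothetical dicts mapping candidate locations to the stage-wise averages displayed above.

# Backup (14) and tie-breaking selection used during exploration (illustrative).

def backup(state, Q_lo, Q_hi):
    """Tighten the upper bound downward and the lower bound upward, as in (14)."""
    state['hi'] = min(state['hi'], max(Q_hi.values()))
    state['lo'] = max(state['lo'], max(Q_lo.values()))

def pick_next(candidates):
    """Prefer the largest lower bound; break ties by the largest upper bound."""
    best_lo = max(s['lo'] for s in candidates)
    tied = [s for s in candidates if s['lo'] == best_lo]
    return max(tied, key=lambda s: s['hi'])

# e.g. s = {'lo': 12.0, 'hi': 19.0}; backup(s, {'x1': 13.5}, {'x1': 18.0})
# leaves s == {'lo': 13.5, 'hi': 18.0}.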

Central to the anytime performance of our ⟨α, ε⟩-BAL algorithm is the computational efficiency of deriving informed initial heuristic bounds V̲_n^ε(z_D) and V̄_n^ε(z_D) where V̲_n^ε(z_D) ≤ V_n^ε(z_D) ≤ V̄_n^ε(z_D). Due to lack of space, we have shown in Appendix A.8 how they can be derived efficiently. We have also derived a theoretical guarantee similar to that of Theorem 2 on the expected active sensing performance of our ⟨α, ε⟩-BAL policy π^{⟨α,ε⟩}, as shown in Appendix A.9. We have analyzed the time complexity of simulating k exploration paths in our ⟨α, ε⟩-BAL algorithm to be O(kNS|X|(|Λ|(N³ + |X|N² + S|X|) + Δ + log(kNS|X|))) (Appendix A.10) where O(Δ) denotes the cost of initializing the heuristic bounds at each state. In practice, ⟨α, ε⟩-BAL's planning horizon can be shortened to reduce its computational cost further by limiting the depth of each simulated path to strictly less than N. In that case, although the resulting π^{⟨α,ε⟩}'s performance has not been theoretically analyzed, Section 4.1 demonstrates empirically that it outperforms the state-of-the-art algorithms.

4. Experiments and Discussion

This section evaluates the active sensing performance and time efficiency of our ⟨α, ε⟩-BAL policy π^{⟨α,ε⟩} (Section 3) empirically under limited sampling budget using two datasets featuring a simple, simulated spatial phenomenon (Section 4.1) and a large-scale, real-world traffic phenomenon (i.e., speeds of road segments) over an urban road network (Section 4.2). All experiments are run on a Mac OS X machine with Intel Core i7 at 2.66 GHz.

4.1. Simulated Spatial Phenomenon

The domain of the phenomenon is discretized into a finite set of sampling locations X = {0, 1, …, 99}. The phenomenon is a realization of a GP (Section 2) parameterized by λ* = {σ_n* = 0.25, σ_s* = 10.0, ℓ* = 1.0}. For simplicity, we assume that σ_n* and σ_s* are known, but the true length-scale ℓ* = 1 is not. So, a uniform prior belief b_{D_0=∅} is maintained over a set L = {1, 6, 9, 12, 15, 18, 21} of 7 candidate length-scales ℓ^λ. Using root mean squared prediction error (RMSPE) as the performance metric, the performance of our ⟨α, ε⟩-BAL policies π^{⟨α,ε⟩} with planning horizon length N' = 2, 3 and α = 1.0 is compared to that of the state-of-the-art GP-based active learning algorithms: (a) The a priori greedy design (APGD) policy (Shewry & Wynn, 1987) iteratively selects and adds argmax_{x∈X\S_n} Σ_{λ∈Λ} b_{D_0}(λ) H[Z_{S_n∪{x}} | z_{D_0}, λ] to the current set S_n of sampling locations (where S_0 = ∅) until S_N is obtained, (b) the implicit exploration (IE) policy greedily selects and observes sampling location x_IE = argmax_{x∈X\D} Σ_{λ∈Λ} b_D(λ) H[Z_x | z_D, λ] and updates the belief from b_D to b_{D∪{x_IE}} over L; if the upper bound on the performance advantage of using π* over the APGD policy is less than a pre-defined threshold, it will use APGD with the remaining sampling budget, and (c) the explicit exploration via independent tests (ITE) policy performs a PAC-based binary search, which is guaranteed to find ℓ* with high probability, and then uses APGD to select the remaining locations to be observed.
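The IE criterion in (b) is easy to state in code; the sketch below (our illustration) scores each unobserved location by its belief-weighted Gaussian entropy, reusing the hypothetical gp_posterior helper from the Section 2 snippet.

# Illustrative IE selection: x_IE = argmax_x sum_lam b_D(lam) H[Z_x | z_D, lam],
# where a Gaussian's entropy is 0.5 * log(2*pi*e*variance).
import numpy as np

def ie_select(X, D, z_D, b, Lambda):
    def weighted_entropy(x):
        h = 0.0
        for lam, w in zip(Lambda, b):
            _, var = gp_posterior(np.array([x]), np.array(D), np.array(z_D), lam)
            h += w * 0.5 * np.log(2.0 * np.pi * np.e * var[0])
        return h
    return max((u for u in X if u not in D), key=weighted_entropy)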

Both nonmyopic IE and ITE policies are proposed by Krause & Guestrin (2007): IE is reported to incur the lowest prediction error empirically while ITE is guaranteed not to achieve worse than the optimal performance by more than a factor of 1/e. Fig. 1a shows results of the active sensing performance of the tested policies averaged over 20 realizations of the phenomenon drawn independently from the underlying GP model described earlier. It can be observed that the RMSPE of every tested policy decreases with a larger budget of N sampling locations. Notably, our ⟨α, ε⟩-BAL policies perform better than the APGD, IE, and ITE policies, especially when N is small. The performance gap between our ⟨α, ε⟩-BAL policies and the other policies decreases as N increases, which intuitively means that, with a tighter sampling budget (i.e., smaller N), it is more critical to strike a right balance between exploration and exploitation.

[Figure 1. Graphs of (a) RMSPE of APGD, IE, ITE, and ⟨α, ε⟩-BAL policies with planning horizon length N' = 2, 3 vs. budget of N sampling locations, (b) stage-wise online processing cost of ⟨α, ε⟩-BAL policy with N' = 3, and (c) gap between V̄_1^ε(z_{D_0}) and V̲_1^ε(z_{D_0}) vs. number of simulated paths.]

Fig. 2 shows the stage-wise sampling designs produced by the tested policies with a budget of N = 15 sampling locations. It can be observed that our ⟨α, ε⟩-BAL policy achieves a better balance between exploration and exploitation and can therefore discern ℓ* much faster than the IE and ITE policies while maintaining a fine spatial coverage of the phenomenon. This is expected due to the following issues faced by the IE and ITE policies: (a) The myopic exploration of IE tends not to observe closely-spaced locations (Fig. 2a), which are in fact informative towards estimating the true length-scale, and (b) despite ITE's theoretical guarantee in finding ℓ*, its PAC-style exploration is too aggressive, hence completely ignoring how informative the posterior belief b_D over L is during exploration. This leads to a sub-optimal exploration behavior that reserves too little budget for exploitation and consequently entails a poor spatial coverage, as shown in Fig. 2b.

Our ⟨α, ε⟩-BAL policy can resolve these issues by jointly and naturally optimizing the trade-off between observing the most informative locations for minimizing the predictive uncertainty of the phenomenon (i.e., exploitation) vs. the uncertainty surrounding its length-scale (i.e., exploration), hence enjoying the best of both worlds (Fig. 2c). In fact, we notice that, after observing 5 locations, our ⟨α, ε⟩-BAL policy can focus 88.10% of its posterior belief on ℓ* while IE only assigns, on average, about 18.65% of its posterior belief on ℓ*, which is hardly more informative than the prior belief b_{D_0}(ℓ*) = 1/7 ≈ 14.28%. Finally, Fig. 1b shows that the online processing cost of ⟨α, ε⟩-BAL per sampling stage grows linearly in the number of simulated paths while Fig. 1c reveals that its approximation quality improves (i.e., the gap between V̄_1^ε(z_{D_0}) and V̲_1^ε(z_{D_0}) decreases) with increasing number of simulated paths. Interestingly, it can be observed from Fig. 1c that although ⟨α, ε⟩-BAL needs about 800 simulated paths (i.e., 400 s) to close the gap between V̄_1^ε(z_{D_0}) and V̲_1^ε(z_{D_0}), V̲_1^ε(z_{D_0}) only takes about 100 simulated paths (i.e., 50 s). This implies the actual computation time needed for ⟨α, ε⟩-BAL to reach V_1^*(z_{D_0}) (via its lower bound V̲_1^ε(z_{D_0})) is much less than that required to verify the convergence of V̲_1^ε(z_{D_0}) to V_1^*(z_{D_0}) (i.e., by checking the gap). This is expected since ⟨α, ε⟩-BAL explores states with largest lower bound first (Section 3.4).

[Figure 2. Stage-wise sampling designs produced by (a) IE, (b) ITE, and (c) ⟨α, ε⟩-BAL policy with a planning horizon length N' = 3 using a budget of N = 15 sampling locations. The final sampling designs are depicted in the bottommost rows of the figures. Panel annotations: (b) ITE's exploration finds the true length-scale by binary search, but its exploitation has too little remaining budget and entails poor coverage; (c) ⟨α, ε⟩-BAL's exploration observes closely-spaced locations to identify the true length-scale, and its exploitation has sufficient budget remaining to achieve even coverage.]

4.2. Real-World Traffic Phenomenon

Fig. 3a shows the traffic phenomenon (i.e., speeds (km/h) of road segments) over an urban road network X comprising 775 road segments (e.g., highways, arterials, slip roads, etc.) in Tampines area, Singapore during lunch hours on June 20, 2011. The mean speed is 52.8 km/h and the standard deviation is 21.0 km/h. Each road segment x ∈ X is specified by a 4-dimensional vector of features: length, number of lanes, speed limit, and direction. The phenomenon is modeled as a relational GP (Chen et al., 2012) whose correlation structure can exploit both the road segment features and road network topology information. The true parameters λ* = {σ_n*, σ_s*, ℓ*} are set as the maximum likelihood estimates learned using the entire dataset. We assume that σ_n* and σ_s* are known, but ℓ* is not. So, a uniform prior belief b_{D_0=∅} is maintained over a set L = {ℓ^{λ_i}}_{i=0}^{6} of 7 candidate length-scales with ℓ^{λ_0} = ℓ* and ℓ^{λ_i} = 2(i+1)ℓ* for i = 1, …, 6.

The performance of our ⟨α, ε⟩-BAL policies with planning horizon length N' = 3, 4, 5 is compared to that of the APGD and IE policies (Section 4.1) by running each of them on a mobile probe to direct its active sensing along a path of adjacent road segments according to the road network topology; ITE cannot be used here as it requires observing road segments separated by a pre-computed distance during exploration (Krause & Guestrin, 2007), which violates the topological constraints of the road network since the mobile probe cannot "teleport". Fig. 3 shows results of the tested policies averaged over 5 independent runs: It can be observed from Fig. 3b that our ⟨α, ε⟩-BAL policies outperform the APGD and IE policies due to their nonmyopic exploration behavior. In terms of the total online processing cost, Fig. 3c shows that ⟨α, ε⟩-BAL incurs < 4.5 hours given a budget of N = 240 road segments, which can be afforded by modern computing power. To illustrate the behavior of each policy, Figs. 3d-f show, respectively, the road segments observed (shaded in black) by the mobile probe running the APGD, IE, and ⟨α, ε⟩-BAL policies with N' = 5 given a budget of N = 60. It can be observed from Figs. 3d-e that both APGD and IE cause the probe to move away from the slip roads and highways to low-speed segments whose measurements vary much more smoothly; this is expected due to their myopic exploration behavior. In contrast, ⟨α, ε⟩-BAL nonmyopically plans the probe's path and can thus direct it to observe the more informative slip roads and highways with highly varying measurements (Fig. 3f) to achieve better performance.

[Figure 3. (a) Traffic phenomenon (i.e., speeds (km/h) of road segments) over an urban road network in Tampines area, Singapore, graphs of (b) RMSPE of APGD, IE, and ⟨α, ε⟩-BAL policies with horizon length N' = 3, 4, 5 and (c) total online processing cost of ⟨α, ε⟩-BAL policies with N' = 3, 4, 5 vs. budget of N segments, and (d-f) road segments observed (shaded in black) by respective APGD, IE, and ⟨α, ε⟩-BAL policies (N' = 5) with N = 60.]

5. Conclusion

This paper describes and theoretically analyzes an ε-BAL approach to nonmyopic active learning of GPs that can jointly and naturally optimize the exploration-exploitation trade-off. We then provide an anytime ⟨α, ε⟩-BAL algorithm based on ε-BAL with real-time performance guarantee and empirically demonstrate using synthetic and real-world datasets that, with limited budget, it outperforms the state-of-the-art GP-based active learning algorithms.

Acknowledgments. This work was supported by the Singapore National Research Foundation in part under its International Research Center @ Singapore Funding Initiative and administered by the Interactive Digital Media Programme Office, and in part through the Singapore-MIT Alliance for Research and Technology Subaward Agreement No. 52.


References

Cao, N., Low, K. H., and Dolan, J. M. Multi-robot informative path planning for active sensing of environmental phenomena: A tale of two algorithms. In Proc. AAMAS, pp. 7–14, 2013.

Chen, J., Low, K. H., Tan, C. K.-Y., Oran, A., Jaillet, P., Dolan, J. M., and Sukhatme, G. S. Decentralized data fusion and active sensing with mobile sensors for modeling and predicting spatiotemporal traffic phenomena. In Proc. UAI, pp. 163–173, 2012.

Chen, J., Cao, N., Low, K. H., Ouyang, R., Tan, C. K.-Y., and Jaillet, P. Parallel Gaussian process regression with low-rank covariance matrix approximations. In Proc. UAI, pp. 152–161, 2013a.

Chen, J., Low, K. H., and Tan, C. K.-Y. Gaussian process-based decentralized data fusion and active sensing for mobility-on-demand system. In Proc. RSS, 2013b.

Cover, T. and Thomas, J. Elements of Information Theory. John Wiley & Sons, NY, 1991.

Diggle, P. J. Bayesian geostatistical design. Scand. J. Statistics, 33(1):53–64, 2006.

Dolan, J. M., Podnar, G., Stancliff, S., Low, K. H., Elfes, A., Higinbotham, J., Hosler, J. C., Moisan, T. A., and Moisan, J. Cooperative aquatic sensing using the telesupervised adaptive ocean sensor fleet. In Proc. SPIE Conference on Remote Sensing of the Ocean, Sea Ice, and Large Water Regions, volume 7473, 2009.

Hoang, T. N. and Low, K. H. A general framework for interacting Bayes-optimally with self-interested agents using arbitrary parametric model and model prior. In Proc. IJCAI, pp. 1394–1400, 2013.

Houlsby, N., Hernández-Lobato, J. M., Huszár, F., and Ghahramani, Z. Collaborative Gaussian processes for preference learning. In Proc. NIPS, pp. 2105–2113, 2012.

Krause, A. and Guestrin, C. Nonmyopic active learning of Gaussian processes: An exploration-exploitation approach. In Proc. ICML, pp. 449–456, 2007.

Krause, A., Singh, A., and Guestrin, C. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. JMLR, 9:235–284, 2008.

Low, K. H., Gordon, G. J., Dolan, J. M., and Khosla, P. Adaptive sampling for multi-robot wide-area exploration. In Proc. IEEE ICRA, pp. 755–760, 2007.

Low, K. H., Dolan, J. M., and Khosla, P. Adaptive multi-robot wide-area exploration and mapping. In Proc. AAMAS, pp. 23–30, 2008.

Low, K. H., Dolan, J. M., and Khosla, P. Information-theoretic approach to efficient adaptive path planning for mobile robotic environmental sensing. In Proc. ICAPS, pp. 233–240, 2009.

Low, K. H., Dolan, J. M., and Khosla, P. Active Markov information-theoretic path planning for robotic environmental sensing. In Proc. AAMAS, pp. 753–760, 2011.

Low, K. H., Chen, J., Dolan, J. M., Chien, S., and Thompson, D. R. Decentralized active robotic exploration and mapping for probabilistic field classification in environmental sensing. In Proc. AAMAS, pp. 105–112, 2012.

Martin, R. J. Comparing and contrasting some environmental and experimental design problems. Environmetrics, 12(3):303–317, 2001.

Müller, W. G. Collecting Spatial Data: Optimum Design of Experiments for Random Fields. Springer, 3rd edition, 2007.

Ouyang, R., Low, K. H., Chen, J., and Jaillet, P. Multi-robot active sensing of non-stationary Gaussian process-based environmental phenomena. In Proc. AAMAS, 2014.

Park, M. and Pillow, J. W. Bayesian active learning with localized priors for fast receptive field characterization. In Proc. NIPS, pp. 2357–2365, 2012.

Podnar, G., Dolan, J. M., Low, K. H., and Elfes, A. Telesupervised remote surface water quality sensing. In Proc. IEEE Aerospace Conference, 2010.

Poupart, P., Vlassis, N., Hoey, J., and Regan, K. An analytic solution to discrete Bayesian reinforcement learning. In Proc. ICML, pp. 697–704, 2006.

Rasmussen, C. E. and Williams, C. K. I. Gaussian Processes for Machine Learning. MIT Press, 2006.

Shewry, M. C. and Wynn, H. P. Maximum entropy sampling. J. Applied Statistics, 14(2):165–170, 1987.

Singh, A., Krause, A., Guestrin, C., and Kaiser, W. J. Efficient informative sensing using multiple robots. J. Artificial Intelligence Research, 34:707–755, 2009.

Solomon, H. and Zacks, S. Optimal design of sampling from finite populations: A critical review and indication of new research areas. J. American Statistical Association, 65(330):653–677, 1970.

Zimmerman, D. L. Optimal network design for spatial prediction, covariance parameter estimation, and empirical prediction. Environmetrics, 17(6):635–652, 2006.