
Q

Q-Learning

Peter Stone
Department of Computer Science, The University of Texas at Austin, Austin, TX, USA

Abstract

Definition of Q-learning.

Definition

Q-learning is a form of temporal difference learning. As such, it is a model-free reinforcement learning method combining elements of dynamic programming with Monte Carlo estimation. Due in part to Watkins' (1989) proof that it converges to the optimal value function, Q-learning is among the most commonly used and well-known reinforcement learning algorithms.
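For concreteness, the tabular Q-learning update rule is $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. A minimal illustrative sketch in Python follows; the environment interface (reset, step, actions) and the hyperparameter values are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning sketch; `env` is assumed to expose reset()/step() and a list env.actions."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # temporal-difference update toward the bootstrapped target
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```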

Cross-References

Reinforcement Learning
Temporal Difference Learning

Recommended Reading

Watkins CJCH (1989) Learning from delayed rewards. PhD thesis, King's College, Cambridge

Quadratic Loss

Mean Squared Error

Qualitative Attribute

Categorical Attribute

Quality Threshold

Quality Threshold Clustering

Quality Threshold Clustering

Xin Jin1 and Jiawei Han2

1 PayPal Inc., San Jose, CA, USA
2 University of Illinois at Urbana-Champaign, Urbana, IL, USA

Abstract

Quality Threshold is a clustering algorithm that does not require specifying the number of clusters. It uses the maximum cluster diameter as the parameter to control the quality of clusters.

Synonyms

Quality threshold



Definition

Quality Threshold (QT) clustering (Heyer et al. 1999) is a partitioning clustering algorithm originally proposed for gene clustering. The focus of the algorithm is to find clusters with guaranteed quality. Instead of specifying K, the number of clusters, QT uses the maximum cluster diameter as the parameter.

The basic idea of QT is as follows: form a candidate cluster by starting with a random point and iteratively adding other points, with each iteration adding the point that minimizes the increase in cluster diameter. The process continues until no point can be added without surpassing the diameter threshold. Then a second candidate cluster is formed by starting with another point and repeating the procedure. In order to achieve reasonable clustering quality, already assigned points remain available for forming subsequent candidate clusters.

For data partitioning, QT selects the largest candidate cluster, removes the points belonging to it from consideration, and repeats the procedure on the remaining set of data.
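A minimal sketch of this procedure in Python, assuming Euclidean distance and the complete-diameter (maximum pairwise distance) definition; the function and variable names are illustrative, and the naive implementation is not optimized.

```python
import numpy as np

def qt_clustering(points, max_diameter):
    """Quality Threshold clustering sketch: no K, only a diameter threshold."""
    points = [np.asarray(p, dtype=float) for p in points]
    remaining = list(range(len(points)))
    clusters = []
    while remaining:
        best_cluster = []
        # Build one candidate cluster per remaining seed point.
        for seed in remaining:
            candidate = [seed]
            while True:
                best_point, best_diam = None, None
                for j in remaining:
                    if j in candidate:
                        continue
                    diam = max(np.linalg.norm(points[a] - points[b])
                               for a in candidate + [j] for b in candidate + [j])
                    if diam <= max_diameter and (best_diam is None or diam < best_diam):
                        best_point, best_diam = j, diam
                if best_point is None:
                    break
                candidate.append(best_point)
            if len(candidate) > len(best_cluster):
                best_cluster = candidate
        # Keep the largest candidate and remove its points from consideration.
        clusters.append([points[i] for i in best_cluster])
        remaining = [i for i in remaining if i not in best_cluster]
    return clusters
```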

The advantage of QT clustering is that it can guarantee cluster quality and does not require prior knowledge of the number of clusters. The disadvantage is that the algorithm is computationally expensive, as much as $O(N^3)$.

Software

The following software packages have implementations of the Quality Threshold (QT) clustering algorithm:

• Flexclust: Flexible Cluster Algorithms. R package. http://cran.r-project.org/web/packages/flexclust/index.html

• FinMath. A numerical library that provides components for the development of mathematical, scientific, and financial applications on the .NET platform. https://www.rtmath.net

Recommended Reading

Heyer L, Kruglyak S, Yooseph S (1999) Exploring expression data: identification and analysis of coexpressed genes. Genome Res 9:1106–1115

Quantitative Attribute

Numeric Attribute

Quantum Machine Learning

Maria Schuld1 and Francesco Petruccione2

1 Quantum Research Group, School of Chemistry & Physics, University of KwaZulu-Natal, Durban, South Africa
2 National Institute of Theoretical Physics (NITheP), KwaZulu-Natal, South Africa

Abstract

Quantum machine learning is a young research area investigating the consequences that the emerging technology of quantum computing has for machine learning. This article introduces basic concepts of quantum information and summarises some major strategies for implementing machine learning algorithms on a quantum computer.

Definition

Quantum machine learning (QML) is a subdiscipline of quantum information processing research, with the goal of developing quantum algorithms that learn from data in order to improve existing methods in machine learning. A quantum algorithm is a routine that can be implemented on a quantum computer, a device that exploits the laws of quantum theory in order to process information.

A number of quantum algorithms have been proposed for various machine learning models such as neural networks, support vector machines, and graphical models, some of which claim runtimes that, under certain conditions, grow only logarithmically with the size of the input space and/or dataset compared to conventional methods. A crucial point for runtime considerations is to find a procedure that efficiently encodes classical data into the properties of a quantum system. QML algorithms are often based on well-known quantum subroutines (such as quantum phase estimation or Grover search) or exploit fast annealing techniques through quantum tunneling, and can make use of an exponentially compact representation of data through the probabilistic description of quantum systems.

Besides finding quantum algorithms for pattern recognition and data mining, QML also investigates more fundamental questions about the concept of learning from the perspective of quantum theory. Sometimes the definition of QML is extended by research that applies machine learning to quantum information, as is frequently done when the full evolution or state of a quantum system has to be reconstructed from limited experimental data.

Motivation and Background

The accurate solution of many learning problems is known to be NP-hard, such as the training of Boltzmann machines or inference in graphical models. But even methods for which tractable algorithms are known suffer from the increasing size of datasets available in today's applications. The idea behind QML is to approach these problems from the perspective of quantum information and harness the power of quantum computers for applications in artificial intelligence and data mining.

The motivation to find quantum analogues for "classical" machine learning algorithms derives from the success of the dynamic research field of quantum information. Some speedups compared to the best or best-known classical algorithms have already been shown, the most prominent being Shor's factorization algorithm (Shor 1997) (providing an exponential speedup compared to the best classical algorithm known) and Grover's search algorithm for unsorted databases (Grover 1996) (providing a quadratic speedup compared to the best possible classical algorithm). Although it is still an open question whether "true" exponential speedups are possible, the number of quantum algorithms is constantly growing. The technological implementation of large-scale universal quantum computers also makes steady progress, and many proof-of-principle experiments have confirmed the theoretical predictions. (The reader can get a first impression of the current progress in Wikipedia's "timeline of quantum computing," https://en.wikipedia.org/wiki/Timeline_of_quantum_computing.) The first realizations of quantum annealing devices, which solve a very specific type of optimization problem and are thus not universal, are already commercially available (e.g., http://www.dwavesys.com/).

Proposals that apply quantum computing to data mining in general and learning tasks in particular have been sporadically put forward since quantum computing became a well-established research area in the 1990s. A specifically large share of attention has been devoted to so-called quantum neural network models, which simulate the behavior of artificial neural networks based on quantum information. They were initially motivated by the question of whether quantum mechanics can help to explain the functioning of our brain (Kak 1995), and they vary in the degree of rigorous application of quantum theory (Schuld et al. 2015). Since around 2012, there has been a rapid increase in other contributions to QML, consisting of proposals for quantum versions of hidden Markov models (Barry et al. 2014), Boltzmann machines (Wiebe et al. 2014; Adachi and Henderson 2015), belief nets (Low et al. 2014), support vector machines (Rebentrost et al. 2014), linear regression (Schuld et al. 2016), Gaussian processes (Zhao et al. 2015), and many more. Several collaborations between IT companies and academic institutions have been created and promise to advance the field of QML in the future. For example, Google and NASA founded the Quantum Artificial Intelligence Lab in 2013, the University of Oxford and Nokia set up a Quantum Optimisation and Machine Learning program in 2015, and the University of Southern California collaborates with Lockheed Martin on machine learning applications through the Quantum Computation Center.

Quantum Computing

In order to present the major approaches to QML research below, it is necessary to introduce some basic concepts of quantum information. The interested reader is referred to the excellent introduction by Nielsen and Chuang (2010).

In conventional computers, the state of a physical system represents bits of information and is manipulated by the Newtonian laws of physics (e.g., the presence of a current in a circuit represents 0 or 1 and is manipulated by the laws of electrodynamics). A quantum computer follows a very similar concept, only that the underlying physical system is governed by the laws of quantum theory and is therefore called a quantum system.

Quantum theory is a mathematical apparatus describing physical objects on very small scales (i.e., electrons, atoms, photons). More precisely, it is a probabilistic description of the results of physical measurements on quantum systems, and although confirmed in many experiments, it shows a number of features distinct from classical or Newtonian mechanics. Quantum computers exploit these features through information processing based on the rules of quantum theory. Although a number of exciting results have been achieved, it is still unknown whether BQP, the class of decision problems solvable by a quantum computer in polynomial time, is larger than BPP, its classical analogue. In short, quantum computing is a very dynamic research area with many promising results and open questions.

The quantum information community uses a variety of computational models that have been shown to be equivalent, but which constitute different building blocks of universal quantum computation. The following gives a short introduction to the most influential model, the circuit model, to clarify important concepts on which QML algorithms are based.

The Concept of a Qubit

A central concept in the major quantum computational models is the qubit, an abstraction of a quantum system that has two possible configurations or states. As long as certain properties are fulfilled (DiVincenzo 2000), such a two-level system can have many possible physical realizations (just like bits may be encoded in currents of circuits or the pits and lands of CDs), for example, a hydrogen atom in the energetic ground or first excited state, the current in a superconducting circuit, or the path a light photon chooses through a semitransparent mirror.

Qubits are often introduced as "bits that can be in states 0 and 1 at the same time," which mystifies rather than explains the concept. In fact, qubits can be compared to a probabilistic description of a classical physical system with two different states, say a coin with the states "heads" and "tails." As illustrated in Table 1, the probabilities $p_{00}, p_{01}, p_{10}, p_{11}$ with $\sum_i p_i = 1$ describe our expectation to get the respective result "head and head," "head and tail," "tail and head," and "tail and tail" after tossing two coins. Note that the coin tosses do not necessarily need to be statistically independent events.

The probabilistic description of a qubit shows a significant difference (see Table 2). The four configurations "00," "01," "10," and "11" of a two-qubit system, such as two simplified atoms, are each associated with a complex number called an amplitude, and the probability of observing the two qubits in one of the four possible joint states is given by the absolute square of the amplitude. The sum of absolute squares of the amplitudes $a_i$, $i = 1, \dots, 2^n$, of an $n$-qubit system consequently has to add up to one, $\sum_i |a_i|^2 = 1$. In both the classical and the quantum case, once the coins or atoms are observed in one of the joint configurations, their state is fully determined, and repeated observations will only confirm the result. As will be explained below, this concept of complex amplitudes is central to quantum information and has, up to the present (100 years after the beginning of quantum theory), still not found a satisfying interpretation for our everyday intuition.

Quantum Machine Learning, Table 1 Probabilistic description of a classical system of two coins. Each of the four possible outcomes or configurations after tossing both coins is associated with a probability.

Quantum Machine Learning, Table 2 Important elements in the description of a two-qubit system. An example is two atoms that can each be in the ground and first excited state, so that the system has four possible abstract configurations. Quantum theory associates each configuration (or potential measurement outcome) with an amplitude, and the absolute square of the amplitude is the probability of measuring this state. In the mathematical notation, each configuration corresponds to a unit basis vector or, in Dirac notation, a Dirac basis state.

Algorithmic Manipulations of Qubits

Information processing is about the manipulation of bits by elementary logic gates such as AND or XOR, and quantum information processing likewise needs to define elementary operations on qubit systems (of course derived from the laws of quantum theory), from which algorithms with a well-defined output can be constructed.

In a probabilistic description, manipulating information corresponds to a transformation of the system's probability distribution. For example, in the case of the two coins, this could mean drawing a "heads" over the "tails" symbol, causing the coin to only toss "heads." Using the mathematical language of Markov chains, changes of a classical probability distribution can be expressed by a linear transformation applied to the vector of probabilities, written as a stochastic matrix $S = (s_{ij})$ multiplied from the left. The stochastic matrix has the properties that its entries are nonnegative and all columns sum up to one, in order to guarantee that the resulting vector on the right side is again a probability distribution. In our two-coin example, this reads

$$S \begin{pmatrix} p_{00} \\ p_{01} \\ p_{10} \\ p_{11} \end{pmatrix} = \begin{pmatrix} p'_{00} \\ p'_{01} \\ p'_{10} \\ p'_{11} \end{pmatrix}, \qquad s_{ij} \ge 0, \quad \sum_i s_{ij} = 1. \tag{1}$$

For quantum systems, any physically possible evolution can be mathematically represented by a unitary matrix $U = (u_{ij})$ applied to the vector of amplitudes, which in the two-qubit example reads

$$U \begin{pmatrix} a_{00} \\ a_{01} \\ a_{10} \\ a_{11} \end{pmatrix} = \begin{pmatrix} a'_{00} \\ a'_{01} \\ a'_{10} \\ a'_{11} \end{pmatrix}, \qquad u_{ij} \in \mathbb{C}, \quad U^{\dagger} U = \mathbb{1}. \tag{2}$$

A unitary matrix has orthogonal column vectors, guaranteeing that the resulting vector on the right side is again a quantum amplitude vector. Equation (2) describes in fact any possible closed evolution of a two-qubit system in quantum theory.
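To make the contrast concrete, here is a small numpy sketch: a stochastic matrix maps probability vectors to probability vectors, while a unitary matrix maps amplitude vectors to amplitude vectors. The specific matrices below are arbitrary illustrative choices.

```python
import numpy as np

# A stochastic matrix: nonnegative entries, columns sum to one (Eq. 1).
S = np.array([[0.9, 0.2],
              [0.1, 0.8]])
p = np.array([0.5, 0.5])                 # probability vector over two outcomes
p_new = S @ p
print(p_new, p_new.sum())                # still a probability distribution (sums to 1)

# A unitary matrix: U^dagger U equals the identity (Eq. 2).
U = np.array([[1, 1j],
              [1j, 1]]) / np.sqrt(2)
a = np.array([1.0, 0.0], dtype=complex)  # amplitude vector for the state "0"
a_new = U @ a
print(np.allclose(U.conj().T @ U, np.eye(2)))           # True: unitarity
print(np.abs(a_new) ** 2, np.sum(np.abs(a_new) ** 2))   # probabilities, sum to 1
```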

Quantum algorithms (as well as QML algorithms) are usually formulated using the Dirac notation, in which one decomposes the amplitude vector into a linear combination of unit vectors and rewrites the unit vectors as Dirac vectors:

$$\mathbf{a} = a_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \dots + a_{2^n} \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} \tag{3}$$

$$\Updownarrow \tag{4}$$

$$|\psi\rangle = a_1 \,|0 \dots 0\rangle + \dots + a_{2^n} \,|1 \dots 1\rangle. \tag{5}$$

Dirac notation is very handy as it visualizes the actual measurement result of the $n$ qubits corresponding to an amplitude.

Similarly to elementary gates, the circuit model defines elementary unitary transformations as building blocks to manipulate the quantum state of a qubit system. For example, consider a single qubit described by the complex amplitude vector $(a_1, a_2)^T$. If the quantum system is in state $(1, 0)^T$, we know with certainty that a measurement will produce the state 0 (since the probability of measuring the 0 state is given by $p_0 = |a_1|^2 = 1.0$, while $p_1 = |a_2|^2 = 0.0$). The unitary transformation

$$U_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

then transforms this state into $(0, 1)^T$, which will certainly result in a measurement of state 1. $U_x$ hence effectively performs a bit flip or NOT gate on the state of the qubit. In a similar fashion, other quantum gates can be defined that together form a set of universal gates for quantum computation.

Why Is Quantum Computing Different?

Returning to the question of why complex amplitudes change the rules of classical information processing, consider another elementary quantum gate that has no classical equivalent, since it cannot be expressed as a stochastic matrix with positive entries. The Hadamard gate

$$U_H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

will, applied to a state $(1, 0)^T$, produce $\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)^T$, which is called a superposition of states 0 and 1. A classical equivalent would be a state of maximum uncertainty, as the probability of measuring the qubit in state 0 or 1 is $\left|\tfrac{1}{\sqrt{2}}\right|^2 = \tfrac{1}{2}$ each. However, the difference of a superposition becomes apparent when applying $U_H$ once more, which transforms the state back into $(1, 0)^T$, as the minus in $U_H$ cancels the two amplitudes with each other when calculating the second entry of the resulting amplitude vector. In other words, amplitudes can annihilate each other, a phenomenon called interference, which is often mentioned as the crucial resource of quantum computing. Beyond this illustration, the elegant theory of quantum Turing machines allows a more sophisticated comparison between quantum and classical computing (Deutsch 1985), but this goes beyond our scope here.
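A short numpy illustration of this interference effect (the code is a sketch added for concreteness, not a formal part of the argument):

```python
import numpy as np

H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)   # Hadamard gate

state = np.array([1.0, 0.0])           # qubit prepared in state |0>
once = H @ state                        # equal superposition: amplitudes (1/sqrt2, 1/sqrt2)
twice = H @ once                        # back to |0>: the second amplitude cancels out

print(once, np.abs(once) ** 2)          # measurement probabilities 0.5 / 0.5
print(twice)                            # approximately [1, 0] -- interference
```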

Quantum Machine Learning Algorithms

Most existing QML algorithms solve problems of supervised or unsupervised pattern classification and regression, although first advances toward reinforcement learning have been made (e.g., Paparo et al. 2014). Given a (classical) dataset $D$ and a new instance $\tilde{x}$ for which we would like to make a prediction, a QML algorithm usually consists of three parts: First, the input has to be encoded into a quantum system through a state preparation routine. Second, the quantum algorithm is executed by unitary transformations. (Note that nonunitary evolutions are possible in so-called open quantum systems, but correspond to a unitary evolution of a larger system.) Third, the result is read out by measuring the quantum system (see Fig. 1). The encoding and readout steps are often the bottlenecks of a QML algorithm; for example, reading out an amplitude in a quantum state that is in a uniform superposition of all possibilities will on average take a number of measurements that is exponential in the number of qubits. In particular, claims of quantum algorithms that run in time logarithmic in the size of the dataset and input vectors often ignore the resources it takes for the crucial step of encoding the information carried by a dataset into a quantum system. Such algorithms can still be valuable for pure quantum information processing, i.e., if the "quantum data" is generated by previous routines or experiments.

Quantum Machine Learning, Fig. 1 Comparison of the basic scheme of classical (left) and quantum (center) machine learning algorithms for pattern classification, together with the operations on the quantum system (right). In order to solve machine learning tasks based on classical datasets, the quantum algorithm requires an information encoding and readout step that are in general highly nontrivial procedures, and it is important to consider them in the runtime.

The QML algorithm and readout step depend heavily on the way information is encoded into the quantum system; one can distinguish three ways of encoding information into an n-qubit system:

1. Interpreting the possible measurement outcomes of a qubit system as a bit sequence.

2. Interpreting the amplitude vector as (i) a $2^n$-dimensional classical real vector or (ii) a probability distribution over $n$ binary variables.

3. Encoding the result of an optimization problem into the ground state (state of the lowest energy) of a quantum system.

These strategies help to distinguish different approaches to developing QML algorithms.

Associating Qubits with Bits

The most straightforward method of information encoding into quantum systems is to associate bits with qubits. For example, the two-qubit state $(1, 0, 0, 0)^T$ in the example in Table 2 represents the bit string [00], since the system has unit probability of being measured in the '00' state.

To encode a full dataset in this fashion, it needs to be given in binary form, meaning that every feature vector (and, if applicable, its label) has been translated into an $n$-bit binary sequence. For example, the dataset

$$D = \left\{ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \right\}$$

can be encoded into the quantum state $a_D = \frac{1}{\sqrt{3}} (0, 1, 0, 1, 0, 0, 1, 0)^T$. In this case, the Dirac notation introduced above is helpful as it explicitly contains the encoded feature vectors:

$$|D\rangle = \frac{1}{\sqrt{3}} \left( |001\rangle + |011\rangle + |110\rangle \right).$$
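A tiny numpy sketch of this basis encoding; the helper name and interface are illustrative assumptions.

```python
import numpy as np

def basis_encode(binary_vectors):
    """Encode a set of n-bit strings into a uniform superposition amplitude vector."""
    n = len(binary_vectors[0])
    amplitudes = np.zeros(2 ** n)
    for bits in binary_vectors:
        index = int("".join(str(b) for b in bits), 2)   # e.g. (0, 1, 1) -> 3
        amplitudes[index] = 1.0
    return amplitudes / np.linalg.norm(amplitudes)       # squared amplitudes sum to 1

a_D = basis_encode([(0, 0, 1), (0, 1, 1), (1, 1, 0)])
print(a_D)   # 1/sqrt(3) * (0, 1, 0, 1, 0, 0, 1, 0)
```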

An early example of a QML algorithm based on such a "quantum dataset" has been developed for pattern completion (finding feature vectors containing a given bit sequence) by an associative memory mechanism as known from Hopfield models (Ventura and Martinez 2000). The authors suggest a routine to construct the state $a_D$ efficiently and use a modified Grover search algorithm, in which the amplitudes corresponding to the desired measurement outcomes are marked in one single step, after which the amplitudes of the marked states are amplified. The resulting quantum state has a high probability of being measured in one of the basis states containing the desired bit sequence.

An example of a QML algorithm for supervised pattern classification is a quantum version of k-nearest neighbors (Schuld et al. 2014b). Beginning with a superposition as in Eq. (6), where some selected qubits encode the class label, the idea is to weigh the amplitudes by the Hamming distance between each corresponding training vector and the new input. Only the "class-label qubits" get measured, so that close inputs contribute more to the probability of measuring their class label than distant ones. An alternative is presented by Wiebe et al. (2015), who also prepare a quantum state with distance-weighted amplitudes and then perform a subroutine based on the Grover search to find the basis state representing the closest neighbor.

Encoding Information into Amplitudes

Another way to encode information is to associate the quantum amplitude vector with a real classical vector:

$$\begin{pmatrix} a_1 \\ \vdots \\ a_{2^n} \end{pmatrix} \leftrightarrow \begin{pmatrix} x_1 \\ \vdots \\ x_{2^n} \end{pmatrix}, \qquad \sum_i |x_i|^2 = 1, \quad x_i \in \mathbb{R}.$$

Note that since amplitude vectors are normalized, the classical vector has to be preprocessed accordingly. A quantum system of $n$ qubits can therefore in principle encode $2^n$ real numbers, which is an exponentially compact representation. There are some vectors for which state preparation can be done in time that grows only linearly with the number of qubits, and if the QML algorithm and readout step have the same property, an algorithm that is logarithmic in the input dimension is obtained.
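As a simple illustration, amplitude encoding amounts to normalizing a classical vector and padding it to length $2^n$; a sketch with an assumed helper name:

```python
import numpy as np

def amplitude_encode(x):
    """Map a real vector to a normalized amplitude vector of length 2^n (zero-padded)."""
    x = np.asarray(x, dtype=float)
    n_qubits = int(np.ceil(np.log2(len(x))))
    padded = np.zeros(2 ** n_qubits)
    padded[:len(x)] = x
    return padded / np.linalg.norm(padded)

amps = amplitude_encode([3.0, 1.0, 2.0])   # 3 features -> 2 qubits, one zero-padded entry
print(amps, np.sum(amps ** 2))             # squared amplitudes sum to 1
```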

Two different strategies for using this encoding for QML can be distinguished: one that associates the amplitude vector with one or all feature vectors in order to use the power of eigenvalue decomposition inherent in the formalism of quantum theory, and another in which amplitudes are used to encode classical probability distributions.

Quantum Eigenvalue Decompositions

An important branch of QML research is based on the intrinsic feature of quantum theory to evaluate eigenvalues of operators, which has been exploited in an important quantum algorithm for the solution of systems of linear equations (Harrow et al. 2009). The routine takes a quantum state described by the amplitude vector $b$, which corresponds to the (normalized) right side of a classical linear system of equations $Ax = b$. Through a set of involved operations (including the Hamiltonian simulation of an operator corresponding to $A$, a quantum phase estimation algorithm, and a selective measurement that has to be repeated until a certain result is obtained), the quantum state is transformed into

$$\sum_j \lambda_j^{-1} \left( u_j^T b \right) u_j$$

with eigenvalues $\lambda_j$ and eigenvectors $u_j$ of $A$, which equals the correct solution $x$. Due to the exponentially compact representation of information, the complexity of the algorithm depends only logarithmically on the size of $b$ when we ignore the encoding and readout steps. However, its running time depends sensitively on other parameters such as the condition number and sparsity of $A$, as well as the desired accuracy in the result. This makes the linear systems algorithm only applicable to very special problems (Aaronson 2015). QML researchers have tried to find such applications in different areas of machine learning that rely on matrix inversion.
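The classical counterpart of this expression is an eigendecomposition-based solve; the following numpy sketch (illustrative only) checks that the sum above reproduces $x = A^{-1} b$ for a symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # symmetric, so eigenvectors are orthonormal
b = np.array([1.0, 2.0])

eigenvalues, eigenvectors = np.linalg.eigh(A)
# x = sum_j (1 / lambda_j) * (u_j . b) * u_j
x = sum((eigenvectors[:, j] @ b) / eigenvalues[j] * eigenvectors[:, j]
        for j in range(len(eigenvalues)))

print(np.allclose(x, np.linalg.solve(A, b)))   # True
```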

The first full QML example exploiting the ideas of the linear systems algorithm was the quantum support vector machine (Rebentrost et al. 2014). The main idea is to take the dual formulation of support vector machines written as a least squares problem, in which a linear system of equations with the kernel matrix $K_{ij} = x_i \cdot x_j$, $x_i, x_j \in D$, has to be solved, and to apply the above quantum routine. By making use of a trick, the linear systems algorithm can take a quantum state encoding $K_{ij}$ (instead of a quantum operator as in the original version). Creating a quantum version of $K_{ij}$ is surprisingly elegant if one can prepare a quantum state

$$\left( x^1_1, \dots, x^1_N, \dots, x^M_1, \dots, x^M_N \right) \tag{6}$$

whose amplitudes encode the $MN$ features of all training vectors $x^m = (x^m_1, \dots, x^m_N)^T$, $m = 1, \dots, M$. The statistics of a specific subset of the qubits in the state of Eq. (6) include a covariance matrix (in quantum theory known as a density matrix) that is entrywise equivalent to the kernel and which can be accessed by further processing.

Data fitting by linear regression has been approached by means of the quantum linear systems algorithm by Wiebe et al. (2012) to obtain the well-known least squares solution $w = X^{+} y$ for the linear regression parameters $w$, with the pseudoinverse $X^{+} = (X^{\dagger} X)^{-1} X^{\dagger}$, where the columns of $X$ are the training inputs. Schuld et al. (2016) propose another version of the quantum algorithm that is suited for prediction. The algorithm is based on a quantum computation of the singular value decomposition of $X^{+}$, which in the end encodes the result of the prediction for a new input into the measurement result of a single qubit.
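For reference, the classical least squares solution that these quantum routines target can be computed directly; a minimal numpy sketch, here using the row convention for the design matrix and toy data chosen only for illustration:

```python
import numpy as np

# Rows of X_design are training inputs, y their targets (a toy example).
X_design = np.array([[1.0, 0.0],
                     [1.0, 1.0],
                     [1.0, 2.0]])
y = np.array([1.0, 2.0, 2.9])

# w = X^+ y, with the pseudoinverse handled by numpy.
w = np.linalg.pinv(X_design) @ y
print(w)
print(np.allclose(w, np.linalg.lstsq(X_design, y, rcond=None)[0]))  # True
```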

Other QML algorithms based on the principle of matrix inversion and eigenvalue estimation on a quantum computer have been proposed for Gaussian processes (Zhao et al. 2015) as well as to find topological and geometric features of data (Lloyd et al. 2016). The routines discussed here specify the core algorithm as well as the readout step in the scheme of Fig. 1 and are logarithmic in the dimension of the feature vectors. However, they leave the crucial encoding step open, which might merely "hide" the complexity for all but some selected problems, as has been critically remarked by Aaronson (2015).

Quantum Probability Distributions

Since quantum theory defines probability distributions over measurement results, it is immediately apparent that probability distributions over binary variables can very naturally be represented by the amplitudes of a qubit system. More precisely, given $n$ random binary variables, an amplitude vector can be used to encode the square roots of the $2^n$ probabilities of the different realizations of these variables. For example, the probability distribution over the possible results of the two-coin toss in Table 1 could be encoded into an amplitude vector $\left( \sqrt{p_{00}}, \sqrt{p_{01}}, \sqrt{p_{10}}, \sqrt{p_{11}} \right)$ of the two-qubit system in Table 2. Beyond the efficient representation of probability distributions, the marginalization of variables, which is intractable in classical models, corresponds to the simple step of excluding the qubits corresponding to these variables from measurements and considering the resulting statistics.

While these advantages sound impressive, it turns out that the problem of statistical inference remains prohibitive: Conditioning the qubit probability distribution on the state of all but one qubit, $p(x_1, \dots, x_N) \rightarrow p(x_N \mid x_1, \dots, x_{N-1})$, requires measuring these qubits in exactly the desired state, which has in general an exponentially small probability. Measuring the state can be understood as sampling from the probability distribution, and one has to do an unfeasibly large number of measurements to obtain the conditional statistics, while after each measurement a new quantum state has to be prepared. It has in fact been shown that the related problem of Bayesian updating through quantum distributions is intractable (Wiebe and Granade 2015), as it corresponds to a Grover search, which can only be quadratically faster than what is classically possible.

Even without the ability for efficient inference, quantum systems can still be interesting for probabilistic machine learning models. Low et al. (2014) exploit the quadratic speedup for a problem of inference with "quantum Bayesian nets." Hidden Markov models have been shown to have an elegant formal generalization in the language of open quantum systems (Barry et al. 2014). Wiebe et al. (2014) show how quantum states that approximate Boltzmann distributions can be prepared to get samples for the training of Boltzmann machines through contrastive divergence. The same authors propose a semiclassical routine for Bayesian updating (Wiebe and Granade 2015). These contributions suggest that a lot of potential lies in approaches that exploit the genuinely stochastic structure of quantum theory for probabilistic machine learning methods.

Optimization and Quantum Annealing

Another branch of QML research is based on techniques of quantum annealing, which can be understood as an analogue version of quantum computing (Das and Chakrabarti 2008). Similar to the metaheuristic of simulated annealing, the idea is to drive a physical system into its energetic ground state, which encodes the desired result of an optimization problem. To associate each basis state of a qubit system with an energy, one has to introduce externally controllable physical interactions between the qubits.

The main difference between classical and quantum annealing is that "thermal fluctuations" are replaced by quantum fluctuations, which enable the system to tunnel through high and thin energy barriers (the probability of quantum tunneling decreases exponentially with the barrier width, but is independent of its height). This makes quantum annealing especially fit for problems with a "sharply ragged" objective function (see Fig. 2). Quantum annealing can be understood as a heuristic version of the famous computational model of quantum adiabatic computation, which is why some authors speak of adiabatic quantum machine learning.

The significance of quantum annealing lies in its relatively simple technological implementation, and quantum annealing devices are available commercially. Current machines are limited to solving quadratic unconstrained binary optimization (QUBO) problems:

$$\underset{(x_1, \dots, x_N)}{\operatorname{argmin}} \; \sum_{ij} w_{ij} x_i x_j \quad \text{with } x_i, x_j \in \{0, 1\}. \tag{7}$$

Quantum Machine Learning, Fig. 2 Illustration of quantum annealing in an energy landscape $E$ over (here continuous) states or configurations $x$. The ground state is the configuration of the lowest energy (black dot). Quantum tunneling allows the system state to transgress high and thin energy barriers (gray dot on the left), while in the classical annealing technique stochastic fluctuations have to be large enough to allow for jumps over peaks (gray dot on the right).

An important step is therefore to translate the problem into QUBO form, which has been done for simple binary classifiers or perceptrons (Pudenz and Lidar 2013; Denchev et al. 2012), image matching problems (Neven et al. 2008), and Bayesian network structure learning (O'Gorman et al. 2015). Other machine learning models naturally relate to the form of Eq. (7). For example, a number of contributions investigate quantum annealing for the sampling step required in the training of Boltzmann machines via contrastive divergence (Adachi and Henderson 2015; Amin et al. 2016). Another example is the Hopfield model for pattern recognition via associative memory, which has been investigated from the perspective of adiabatic quantum computation with nuclear magnetic resonance systems (Neigovzen et al. 2009).
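As a purely classical, illustrative picture of the QUBO form in Eq. (7), a brute-force solver over binary assignments might look like this; the weight matrix is a toy example:

```python
import itertools
import numpy as np

def solve_qubo_brute_force(w):
    """Exhaustively minimize sum_ij w_ij x_i x_j over binary vectors x (feasible only for small N)."""
    n = w.shape[0]
    best_x, best_energy = None, np.inf
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        energy = x @ w @ x            # sum_ij w_ij x_i x_j
        if energy < best_energy:
            best_x, best_energy = x, energy
    return best_x, best_energy

# Toy weight matrix: diagonal terms act as biases, off-diagonal terms as couplings.
w = np.array([[-1.0,  2.0, 0.0],
              [ 2.0, -1.0, 0.0],
              [ 0.0,  0.0, -0.5]])
print(solve_qubo_brute_force(w))   # x = (0, 1, 1) (or equivalently (1, 0, 1)), energy -1.5
```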

Measuring the performance of quantum annealing compared to classical annealing schemes is a non-trivial problem, and although advantages of the quantum schemes have been demonstrated in the literature mentioned above, general statements about speedups are still controversial.

Experimental Realizations

The reason why one rarely finds classical computer simulations of quantum machine learning algorithms in the literature is that the description of quantum systems is classically intractable due to the exponential size of the amplitude vectors. Until a large-scale universal quantum computer is built, only QML algorithms based on quantum annealing can be tested on real devices and benchmarked against classical machine learning algorithms. Some proof-of-principle experiments have nevertheless implemented few-qubit examples of proposed QML algorithms in the lab. Among those are experimental realizations of the quantum support vector machine (Cai et al. 2015) as well as quantum clustering algorithms (Li et al. 2015; Neigovzen et al. 2009).


Further Reading

The interested reader may be referred to existing reviews of quantum machine learning research (Schuld et al. 2014a, 2015; Adcock et al. 2015).

Recommended Reading

Aaronson S (2015) Read the fine print. Nat Phys 11(4):291–293
Adachi SH, Henderson MP (2015) Application of quantum annealing to training of deep neural networks. arXiv preprint arXiv:1510.06356
Adcock J, Allen E, Day M, Frick S, Hinchliff J, Johnson M, Morley-Short S, Pallister S, Price A, Stanisic S (2015) Advances in quantum machine learning. arXiv preprint arXiv:1512.02900
Amin MH, Andriyash E, Rolfe J, Kulchytskyy B, Melko R (2016) Quantum Boltzmann machine. arXiv preprint arXiv:1601.02036
Barry J, Barry DT, Aaronson S (2014) Quantum partially observable Markov decision processes. Phys Rev A 90:032311
Cai X-D, Wu D, Su Z-E, Chen M-C, Wang X-L, Li L, Liu N-L, Lu C-Y, Pan J-W (2015) Entanglement-based machine learning on a quantum computer. Phys Rev Lett 114(11):110504
Das A, Chakrabarti BK (2008) Colloquium: quantum annealing and analog quantum computation. Rev Mod Phys 80(3):1061
Denchev V, Ding N, Neven H, Vishwanathan S (2012) Robust classification with adiabatic quantum optimization. In: Proceedings of the 29th international conference on machine learning (ICML-12), Edinburgh, pp 863–870
Deutsch D (1985) Quantum theory, the Church-Turing principle and the universal quantum computer. Proc R Soc Lond A: Math Phys Eng Sci 400:97–117. The Royal Society
DiVincenzo DP (2000) The physical implementation of quantum computation. Fortschritte der Physik 48(9–11):771–783. ISSN 1521-3978
Grover LK (1996) A fast quantum mechanical algorithm for database search. In: Proceedings of the twenty-eighth annual ACM symposium on theory of computing. ACM, New York, pp 212–219
Harrow AW, Hassidim A, Lloyd S (2009) Quantum algorithm for linear systems of equations. Phys Rev Lett 103(15):150502
Kak SC (1995) Quantum neural computing. Adv Imaging Electron Phys 94:259–313
Li Z, Liu X, Xu N, Du J (2015) Experimental realization of a quantum support vector machine. Phys Rev Lett 114(14):140504
Lloyd S, Garnerone S, Zanardi P (2016) Quantum algorithms for topological and geometric analysis of data. Nat Commun 7:10138
Low GH, Yoder TJ, Chuang IL (2014) Quantum inference on Bayesian networks. Phys Rev A 89:062315
Neigovzen R, Neves JL, Sollacher R, Glaser SJ (2009) Quantum pattern recognition with liquid-state nuclear magnetic resonance. Phys Rev A 79(4):042321
Neven H, Rose G, Macready WG (2008) Image recognition with an adiabatic quantum computer I. Mapping to quadratic unconstrained binary optimization. arXiv preprint arXiv:0804.4457
Nielsen MA, Chuang IL (2010) Quantum computation and quantum information. Cambridge University Press, Cambridge
O'Gorman B, Babbush R, Perdomo-Ortiz A, Aspuru-Guzik A, Smelyanskiy V (2015) Bayesian network structure learning using quantum annealing. Eur Phys J Spec Top 224(1):163–188
Paparo GD, Dunjko V, Makmal A, Martin-Delgado MA, Briegel HJ (2014) Quantum speedup for active learning agents. Phys Rev X 4(3):031002
Rebentrost P, Mohseni M, Lloyd S (2014) Quantum support vector machine for big data classification. Phys Rev Lett 113:130503
Schuld M, Sinayskiy I, Petruccione F (2014a) The quest for a quantum neural network. Q Inf Process 13(11):2567–2586
Schuld M, Sinayskiy I, Petruccione F (2014b) Quantum computing for pattern classification. In: Pham D-N, Park S-B (eds) Lecture notes in computer science, vol 8862. Springer International Publishing, pp 208–220
Schuld M, Sinayskiy I, Petruccione F (2015) Introduction to quantum machine learning. Contemp Phys 56(2):172–185
Schuld M, Sinayskiy I, Petruccione F (2016) Prediction by linear regression on a quantum computer. Phys Rev A 94(2):022342
Shor PW (1997) Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J Comput 26(5):1484–1509
Ventura D, Martinez T (2000) Quantum associative memory. Inf Sci 124(1):273–296
Wiebe N, Granade C (2015) Can small quantum systems learn? arXiv preprint arXiv:1512.03145
Wiebe N, Braun D, Lloyd S (2012) Quantum algorithm for data fitting. Phys Rev Lett 109(5):050505
Wiebe N, Kapoor A, Svore K (2014) Quantum deep learning. arXiv preprint arXiv:1412.3489v1
Wiebe N, Kapoor A, Svore K (2015) Quantum nearest-neighbor algorithms for machine learning. Q Inf Comput 15:0318–0358
Zhao Z, Fitzsimons JK, Fitzsimons JF (2015) Quantum assisted Gaussian process regression. arXiv preprint arXiv:1512.03929

Quasi-Interpolation

Radial Basis Function Networks


Query-Based Learning

Sanjay Jain1 and Frank Stephan2

1 School of Computing, National University of Singapore, Singapore, Singapore
2 Department of Mathematics, National University of Singapore, Singapore, Singapore

Abstract

Query learning models the learning process as a dialogue between a pupil (learner) and a teacher; the learner has to figure out the target concept by asking questions of certain types, and whenever the teacher answers these questions correctly, the learner has to learn within the given complexity bounds. Complexity can be measured both by the number of queries and by the computational complexity of the learner. Query learning has close connections to statistical models like PAC learning.

Definition

Most learning scenarios consider learning as a relatively passive process where the learner observes more and more data and eventually formulates a hypothesis that explains the data observed. Query-based learning is an active learning process where the learner has a dialogue with a teacher, which provides on request useful information about the concept to be learned.

Detail

This article will mainly focus on query-based learning of finite classes and of parameterized families of finite classes. In some cases, an infinite class has to be learned, where the behavior of the learner is then measured in terms of a parameter belonging to the concept. For example, when learning the class of all singletons $\{x\}$ with $x \in \{0,1\}^*$, the parameter would be the length $n$ of $x$, and an algorithm based on membership queries would need up to $2^n - 1$ queries of the form "Is $y$ in $L$?" to learn an unknown set $L = \{x\}$ with $x \in \{0,1\}^n$. In query-based learning, the questions asked are similar to the following: Which classes can be learned using queries of this or that type? If queries of a given type are used to learn a parameterized class $\bigcup_n C_n$, is it possible to make a learner which (with or without knowledge of $n$) succeeds in learning every $L \in C_n$ with a number of queries that is polynomial in $n$? What is the exact bound on the number of queries needed to learn a finite class $C$ in dependence on the topology of $C$ and the cardinality of $C$? If a query-based learner using polynomially many queries exists for a parameterized class $\bigcup_n C_n$, can this learner also be implemented such that it is computable in polynomial time?

In the following, let $C$ be the class of concepts to be learned; the concepts $L \in C$ are subsets of some basic set $X$. Now the learning process is a dialogue between a learner and a teacher in order to identify a language $L \in C$, which is known to the teacher but not to the learner. The dialogue goes in turns and follows a specific protocol over a finite number of rounds. Each round consists of a query placed by the learner to the teacher and the answer of the teacher to this query. The query and the answer have to follow a specific format (see Table 1), and there are the following common types, where $a \in X$ and $H \in C$ are data items and concepts chosen by the learner and $b \in X$ is a counterexample chosen by the teacher.

While for subset queries and superset queries it is not required by all authors that the teacher provides a counterexample in the case that the answer is "no," this requirement is quite standard for equivalence queries. Without counterexamples, a learner would not have any real benefit from these queries, in settings where faster convergence is required, over just checking "Is $H_0 = L$?", "Is $H_1 = L$?", "Is $H_2 = L$?", ..., which would be a rather trivial kind of algorithm.

Here is an example: Given the class $C$ of all finite subsets of $\{0,1\}^*$, a learner using superset queries could work as given in Table 2 to learn each set of the form $L = \{x_1, x_2, \dots, x_n\}$ with $n + 1$ queries.


Query-Based Learning, Table 1 Types of queries

Query name | Precise query | Answer if true | Answer if false
Membership query | Is a ∈ L? | "Yes" | "No"
Equivalence query | Is H = L? | "Yes" | "No" plus b (where b ∈ (H − L) ∪ (L − H))
Subset query | Is H ⊆ L? | "Yes" | "No" plus b (where b ∈ H − L)
Superset query | Is H ⊇ L? | "Yes" | "No" plus b (where b ∈ L − H)
Disjointness query | Is H ∩ L = ∅? | "Yes" | "No" plus b (where b ∈ H ∩ L)

Query-Based Learning, Table 2 Learning finite sets using superset queries

Round | Query | Answer | Counterexample
1 | Is L ⊆ ∅? | "No" | x1
2 | Is L ⊆ {x1}? | "No" | x2
3 | Is L ⊆ {x1, x2}? | "No" | x3
... | ... | ... | ...
n | Is L ⊆ {x1, x2, ..., x(n−1)}? | "No" | xn
n + 1 | Is L ⊆ {x1, x2, ..., x(n−1), xn}? | "Yes" | —

Here, of course, the order in which the counterexamples come up does not matter; the given order was chosen just for the reader's convenience. Note that the same algorithm also works with equivalence queries in place of superset queries. In both cases, the algorithm stops with the output "$L = \{x_1, x_2, \dots, x_n\}$" after the last query. However, the given class is not learnable using membership and subset queries, which can be seen as follows: Assume that such a learner learns $\emptyset$ using the subset queries "Is $H_0 \subseteq L$?", "Is $H_1 \subseteq L$?", "Is $H_2 \subseteq L$?", ..., "Is $H_m \subseteq L$?" and the membership queries "Is $y_0 \in L$?", "Is $y_1 \in L$?", "Is $y_2 \in L$?", ..., "Is $y_k \in L$?" Furthermore, let $D$ be the set of all counterexamples provided by the teacher to subset queries. Now let $E = D \cup H_0 \cup H_1 \cup \dots \cup H_m \cup \{y_0, y_1, \dots, y_k\}$. Note that $E$ is a finite set, and let $x$ be an element of $\{0,1\}^* - E$. If $L = \{x\}$, then the answers to these queries are the same as in the case that $L = \emptyset$. Hence, the learner cannot distinguish between the sets $\emptyset$ and $\{x\}$; therefore, the learner is incorrect on at least one of these sets.
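A compact sketch of the superset-query strategy from Table 2 in Python, with a hypothetical teacher interface (the method name and return convention are assumptions):

```python
def learn_with_superset_queries(teacher):
    """Learn a finite set L by superset queries "Is H a superset of L?".

    `teacher.superset_query(H)` is assumed to return (True, None) if L is a subset of H,
    and (False, b) with a counterexample b in L - H otherwise.
    """
    hypothesis = set()
    while True:
        is_superset, counterexample = teacher.superset_query(hypothesis)
        if is_superset:
            return hypothesis          # after n + 1 queries, hypothesis equals L
        hypothesis.add(counterexample)
```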

In the case that $C$ is finite, one can ask what number of queries is needed to determine the target $L$ in the worst case. This depends on the types of queries permitted and also on the topology of the class $C$. For example, if $C$ is the power set of $\{x_1, x_2, \dots, x_n\}$, then $n$ membership queries are enough; but if $C$ is the set of all singleton sets $\{x\}$ with $x \in \{0,1\}^n$, then $2^n - 1$ membership queries are needed to learn the concept, although in both cases the cardinality of $C$ is $2^n$. One can do with $\log(|C|)$ many equivalence queries with counterexamples in the case that a class-comprising hypothesis space is permitted. For this, each conjecture $H$ is chosen such that for all $x$, $H(x)$ follows the majority of those $L \in C$ which are consistent with all previous counterexamples. Then each counterexample invalidates the majority of the still valid/consistent members of $C$ and thus gives the logarithmic bound.
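A small sketch of this majority-vote ("halving") strategy with equivalence queries; the teacher interface here is again a hypothetical assumption.

```python
def halving_learner(concepts, teacher, domain):
    """Learn a target from a finite class with at most log2(|concepts|) equivalence queries.

    Each concept is a frozenset over `domain`; `teacher.equivalence_query(H)` is assumed
    to return (True, None) if H equals the target, else (False, b) with a counterexample b.
    """
    consistent = list(concepts)
    while True:
        # Majority-vote hypothesis over all still-consistent concepts.
        hypothesis = frozenset(
            x for x in domain
            if sum(x in L for L in consistent) * 2 > len(consistent)
        )
        correct, b = teacher.equivalence_query(hypothesis)
        if correct:
            return hypothesis
        # The counterexample disagrees with the majority vote, so at least half
        # of the consistent concepts are eliminated.
        in_hypothesis = b in hypothesis
        consistent = [L for L in consistent if (b in L) != in_hypothesis]
```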

Angluin (2004) provides a survey of the prior results on questions like how many queries are needed to learn a given finite class. Maass and Turan (1992) showed that the usage of membership queries in addition to equivalence queries does not speed up learning too much compared to using equivalence queries alone. If EQ is the number of queries needed to learn $C$ from equivalence queries alone (with counterexamples) and EMQ is the number of queries needed to learn $C$ with equivalence queries and membership queries, then

$$\frac{EQ}{\log(EQ + 1)} \;\le\; EMQ \;\le\; EQ;$$

here the logarithm is base 2. This result is based on a result of Littlestone (1988), who characterized the number of queries needed to learn from equivalence queries alone and provided a "standard optimal algorithm" for this task. Note that these two results use class-comprising hypothesis spaces, where one can make an equivalence query with a hypothesis which is not in the class to be learned; this technique permits obtaining meaningful counterexamples.

Angluin (1987) showed that the class of all regular languages can be learned in polynomial time using queries and counterexamples. Here the learning time is measured in terms of two parameters: the number $n$ of states of the smallest deterministic finite automaton generating the language and the number $m$ of symbols in the longest counterexample provided by the teacher. Ibarra and Jiang (1988) showed that the algorithm can be improved to need at most $dn^3$ equivalence queries when the teacher always returns the shortest counterexample; Birkendorf et al. (2000) improved the bound to $dn^2$. In these bounds, $d$ is the size of the alphabet used for defining the regular languages to be learned.

Much attention has been paid to the following question: Which classes of Boolean formulas over $n$ variables can be learned with polynomially many queries, uniformly in $n$ (see, e.g., Aizenstein et al. 1992; Aizenstein and Pitt 1995; Angluin et al. 1993; Hellerstein et al. 1996)? Angluin et al. (1993) showed that read-once formulas, in which every variable occurs only once, are learnable in polynomial time using membership and equivalence queries. On the other hand, read-thrice DNF (disjunctive normal form) formulas cannot be learned in polynomial time using the same queries (Aizenstein et al. 1992) unless P = NP. In other words, such a learner would not succeed because of the limited computational power of a polynomial time learner; hence, equipping the learner with an additional oracle that can provide this power would permit building such a learner. Here an oracle, in contrast to a teacher, does not know the task to be learned but gives information which is difficult or impossible to compute. Such an oracle could, for example, be the set SAT of all satisfiable formulas, and thus the learner could gain additional power by asking the oracle whether certain formulas are satisfiable. A special class of Boolean formulas is that of Horn clauses (see, e.g., Angluin et al. 1992; Arias 2004; Arias and Balcazar 2009; Arias and Khardon 2002).

There are links to other fields. Angluin (1988, 1990) investigated the relation between query learning and PAC Learning. She found that every class which is learnable using membership queries and equivalence queries is also PAC learnable (Angluin 1988); the PAC learner also works in polynomial time and needs at most polynomially many examples. More recent research on learning Boolean formulas also combines queries with probabilistic aspects (Jackson 1997). Furthermore, query learning has also been applied to Inductive Inference (see, e.g., Gasarch and Lee 2008; Gasarch and Smith 1992; Jain et al. 2007; Lange and Zilles 2005). Here the power of the learner depends not only on the type of queries permitted but also on whether queries of the corresponding type can be asked finitely often or infinitely often; the latter applies of course only to learning models where the learner converges in the limit and may revise the hypothesis from time to time. Furthermore, queries to oracles have been studied widely; see the entry on Complexity of Inductive Inference.

Acknowledgements Sanjay Jain was supported in part by NUS grant numbers C252-000-087-001, R146-000-181-112, and R252-000-534-112. Frank Stephan was supported in part by NUS grant numbers R146-000-181-112 and R252-000-534-112.

Recommended Reading

Aizenstein H, Pitt L (1995) On the learnability of disjunctive normal form formulas. Mach Learn 19(3):183–208
Aizenstein H, Hellerstein L, Pitt L (1992) Read-thrice DNF is hard to learn with membership and equivalence queries. In: Thirty-third annual symposium on foundations of computer science, Pittsburgh, 24–27 Oct 1992. IEEE Computer Society, Washington, DC, pp 523–532
Angluin D (1987) Learning regular sets from queries and counterexamples. Inf Comput 75(2):87–106
Angluin D (1988) Queries and concept learning. Mach Learn 2(4):319–342
Angluin D (1990) Negative results for equivalence queries. Mach Learn 5:121–150
Angluin D (2004) Queries revisited. Theor Comput Sci 313:175–194
Angluin D, Frazier M, Pitt L (1992) Learning conjunctions of Horn clauses. Mach Learn 9:147–164
Angluin D, Hellerstein L, Karpinski M (1993) Learning read-once formulas with queries. J Assoc Comput Mach 40:185–210
Arias M (2004) Exact learning of first-order Horn expressions from queries. PhD thesis, Tufts University
Arias M, Balcazar JL (2009) Canonical Horn representations and query learning. In: Algorithmic learning theory: twentieth international conference ALT 2009. LNAI, vol 5809. Springer, Berlin, pp 156–170
Arias M, Khardon R (2002) Learning closed Horn expressions. Inf Comput 178(1):214–240
Birkendorf A, Boker A, Simon HU (2000) Learning deterministic finite automata from smallest counterexamples. SIAM J Discret Math 13(4):465–491
Hellerstein L, Pillaipakkamnatt K, Raghavan VV, Wilkins D (1996) How many queries are needed to learn? J Assoc Comput Mach 43:840–862
Gasarch W, Lee ACY (2008) Inferring answers to queries. J Comput Syst Sci 74(4):490–512
Gasarch W, Smith CH (1992) Learning via queries. J Assoc Comput Mach 39(3):649–674
Ibarra OH, Jiang T (1988) Learning regular languages from counterexamples. In: Proceedings of the first annual workshop on computational learning theory. MIT, Cambridge/Morgan Kaufmann, San Francisco, pp 371–385
Jackson J (1997) An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. J Comput Syst Sci 55(3):414–440
Jain S, Lange S, Zilles S (2007) A general comparison of language learning from examples and from queries. Theor Comput Sci 387(1):51–66
Lange S, Zilles S (2005) Relations between Gold-style learning and query learning. Inf Comput 203:211–237
Littlestone N (1988) Learning quickly when irrelevant attributes abound: a new linear threshold algorithm. Mach Learn 2:285–318
Maass W, Turan G (1992) Lower bound methods and separation results for on-line learning models. Mach Learn 9:107–145