Transduction and Active Learning in the Quantum Learning of Unitary Transformations


DESCRIPTION

Quantum learning of a unitary transformation estimates a quantum channel in a process similar to quantum process tomography. The classical counterpart of this goal, finding an unknown function, is regression, although the methodology hardly resembles the outline of classical algorithms. To gain a better understanding of what such a methodology means to learning theory, we anchor it to the familiar concepts of active learning and transduction. Learning the unitary transformation translates to optimally storing it in quantum memory, but the quantum learning procedure also requires an optimal, maximally entangled input state. We argue that this is akin to active learning. Two different retrieval strategies apply when we would like to use the learned unitary transformation: a coherent strategy, which stores the unitary in quantum memory, and an incoherent one, which measures the unitary and stores it in classical memory; the latter strategy is considered optimal. We further argue that the incoherent strategy is a blend of inductive and transductive learning, as the optimal input state depends on the number of target states on which the transformation should be applied, yet once it is learned, the transformation can be used an arbitrary number of times. On the other hand, the sub-optimal coherent strategy of storing and applying the unitary is a form of transduction with no inductive element.

TRANSCRIPT


Transduction and Active Learning in the Quantum Learning of Unitary Transformations

Peter Wittek

[email protected]

CLASSICAL REGRESSION

Machine learning is a field of artificial intelligence that seeks patterns in empirical data without forcing models on it. Consider a training set {(x_1, y_1), ..., (x_N, y_N)}, where x_i is a finite-dimensional data point and y_i is an associated outcome. In regression, the task is to approximate an unknown function f with an estimate f̂ such that f̂(x_i) = y_i for all i. Then, encountering a previously unseen x, we calculate f̂(x).
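As a purely classical illustration of this setup (a minimal sketch; the data and the choice of a linear model are assumptions, not part of the poster), the following fits f̂ to a small training set by least squares and evaluates it on a previously unseen point:

import numpy as np

# Hypothetical training set: x_i are 2-dimensional points, y_i scalar outcomes.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 5.0])

# Fit f_hat(x) = w . x + b by least squares (one simple choice of model class).
A = np.hstack([X, np.ones((X.shape[0], 1))])   # append a column for the bias b
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = coef[:-1], coef[-1]

# Evaluate the learned f_hat on a previously unseen point x.
x_new = np.array([3.0, 2.0])
print(w @ x_new + b)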

LEARNING A UNITARY

If we denote the unitary transformation by U, the goal of quantum process tomography is to derive an estimate Û while probing it with input states ρ_in and observing the corresponding output states ρ_out. To make this analogous to regression, we have the input and corresponding output states fixed, and we estimate the unitary such that Û(ρ_in) = ρ_out. Then, just as in the classical case, we would like to calculate Û(ρ) for a new state.

Unlike in the classical setting, learning a unitary requires a double maximization: we need a measurement strategy that optimally approximates the unitary, and we also need an optimal input state that best captures the information of the unitary.
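To make the regression analogy concrete, here is a minimal classical-simulation sketch (an illustration under assumptions, not the optimal quantum strategy discussed below): given a few input/output pure-state pairs produced by an unknown qubit unitary, it forms the matrix Σ_i |ψ_i^out⟩⟨ψ_i^in| and takes its closest unitary, the unitary Procrustes solution obtained from the singular value decomposition.

import numpy as np

rng = np.random.default_rng(0)

# Unknown single-qubit unitary that we only access through input/output pairs.
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]], dtype=complex)

# Probe states (columns) and the corresponding output states.
psis_in = rng.normal(size=(2, 4)) + 1j * rng.normal(size=(2, 4))
psis_in /= np.linalg.norm(psis_in, axis=0)
psis_out = U @ psis_in

# Unitary Procrustes fit: maximise sum_i Re <psi_out_i| U_hat |psi_in_i> over unitaries.
M = psis_out @ psis_in.conj().T
V, _, Wh = np.linalg.svd(M)
U_hat = V @ Wh

print(np.allclose(U_hat, U))   # True up to numerical precision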

Outline of learning a unitary

In indirect process tomography, we use the Choi-Jamiołkowski isomorphism to imprint the unitary on a state (a minimal sketch of this imprinting follows the list below). The key steps are [1]:

• Learning the unitary translates to the optimal storage and parallel application of the unitary on a suitable input state.

• It requires an optimal input state, a superposition of maximally entangled states. This resembles active learning.

• Applying the learned unitary happens either with a coherent strategy, that is, retrieving it from quantum memory, or with an incoherent strategy, that is, measuring it and retrieving it from classical memory. The optimal incoherent strategy is a mixture of inductive and transductive learning, whereas the suboptimal coherent strategy is purely transductive.
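The following numpy sketch (an illustration under assumptions, not the optimal storage protocol of [1]) shows what "imprinting the unitary on a state" means via the Choi-Jamiołkowski isomorphism: applying the unknown U to one half of a maximally entangled pair produces a state whose amplitudes are, up to a factor 1/√d, exactly the matrix elements of U.

import numpy as np

d = 2
# Unknown unitary (a fixed example for illustration): a Hadamard gate.
U = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

# Maximally entangled state |I> = (1/sqrt(d)) * sum_m |m>|m>.
phi_plus = np.eye(d).reshape(d * d) / np.sqrt(d)

# Imprint the unitary: |phi_U> = (U ⊗ I) |I>.
phi_U = np.kron(U, np.eye(d)) @ phi_plus

# The amplitudes of |phi_U>, reshaped to a d x d matrix, recover U / sqrt(d).
U_recovered = phi_U.reshape(d, d) * np.sqrt(d)
print(np.allclose(U_recovered, U))   # True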

INDUCTION VERSUS TRANSDUCTION

[Figures: inductive learning infers a decision surface separating Class 1 from Class 2; transductive learning assigns the given unlabeled instances to Class 1 or Class 2 directly.]

Inductive learning:
• N labelled or unlabelled points
• A function is inferred
• The learned function is applied on unseen data

Transductive learning:
• N labelled or unlabelled points
• No inference
• K additional data points are known in advance: {x_{N+1}, ..., x_{N+K}}
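To illustrate the distinction in conventional machine-learning terms (a hedged sketch with made-up data, using scikit-learn as one possible toolkit), an inductive learner fits a reusable function on the N labelled points, while a transductive learner such as label propagation assigns labels jointly to the labelled points and the K points supplied in advance.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import LabelPropagation

# N labelled points and K additional points that are known in advance.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
X_known = np.array([[0.1, 0.2], [1.1, 0.9]])          # the K points, unlabelled

# Inductive learning: infer a function, then apply it to any unseen data.
clf = LogisticRegression().fit(X_train, y_train)
print(clf.predict(X_known))                           # works for X_known or any later point

# Transductive learning: labels are produced jointly for exactly the points given.
X_all = np.vstack([X_train, X_known])
y_all = np.concatenate([y_train, [-1, -1]])           # -1 marks the unlabelled instances
lp = LabelPropagation().fit(X_all, y_all)
print(lp.transduction_[len(y_train):])                # labels assigned to the K known points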

Transduction avoids inference to the more general: it infers from particular instances to particular instances, asking for less, since an inductive function implies a transductive one. Transduction is similar to instance-based learning, a family of algorithms that compares new problem instances with training instances. The goal in transductive learning is actually to minimize the test error, instead of the more abstract goal of maximizing generalization performance [3].

The optimal retrieval of U from the memory state |φ_U⟩ is achieved by measuring the ancilla with the optimal covariant POVM of the form Ξ = |η⟩⟨η| [1], namely P_Û = |η_Û⟩⟨η_Û| with

|η_Û⟩ = ⊕_j √d_j |Û_j⟩,

and, conditionally on the outcome Û, by performing the unitary Û on the new input system. The optimal probability coefficients can be written as

p_j = d_j m_j / d^N,

where m_j is the multiplicity of the corresponding subspace [2]. This is true for the K = 1 case. The more generic case for arbitrary K takes the global fidelity between the estimated channel C_U and U^⊗K, that is, the objective function averages (|tr(U† C_U)|/d)^2 over the unitary group. Hence the exact values of p_j always depend on K, tying the optimal input state to the number of applications and making the procedure a case of transduction.
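As a hedged check of the K = 1 formula (a worked example, not taken from the poster): for N = 2 copies of a qubit unitary (d = 2), the decomposition of U^⊗2 contains the singlet and triplet irreducible representations, and the coefficients p_j indeed form a probability distribution.

\[
U^{\otimes 2} \cong U_{j=0} \oplus U_{j=1}, \qquad
(d_0, m_0) = (1, 1), \quad (d_1, m_1) = (3, 1),
\]
\[
p_0 = \frac{d_0 m_0}{d^N} = \frac{1}{4}, \qquad
p_1 = \frac{d_1 m_1}{d^N} = \frac{3}{4}, \qquad
p_0 + p_1 = \frac{\sum_j d_j m_j}{d^N} = \frac{d^N}{d^N} = 1 .
\]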

ACTIVE LEARNING

In active learning, the learning algorithm is able to solicit labels for problematic unlabeled instances from an appropriate information source, for instance, from a human annotator [5]. Some typical classical strategies (a minimal uncertainty-sampling sketch follows the list):

• Uncertainty sampling: the selected set corresponds to those data instances where the confidence is low.

• Expected error reduction: select those data instances where the model performs poorly, that is, where the generalization error is most likely to be reduced.

• Variance reduction: generalization performance is hard to measure, whereas minimizing output variance is far more feasible; select those data instances which minimize output variance.
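A minimal uncertainty-sampling loop, sketched here with scikit-learn on synthetic data (the pool, the classifier, and the query budget are all illustrative assumptions): at each step the learner asks an oracle to label the pool instance about which its current model is least confident.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic pool of unlabelled instances and an oracle standing in for the human annotator.
X_pool = rng.normal(size=(100, 2)) + np.repeat([[0, 0], [2, 2]], 50, axis=0)
oracle = lambda i: int(i >= 50)                     # true label of pool item i

# Start from a couple of labelled seeds.
labelled = [0, 50]
labels = [oracle(i) for i in labelled]

for _ in range(10):                                 # query budget
    clf = LogisticRegression().fit(X_pool[labelled], labels)
    proba = clf.predict_proba(X_pool)
    confidence = proba.max(axis=1)                  # confidence of the current model
    confidence[labelled] = np.inf                   # never re-query labelled instances
    query = int(confidence.argmin())                # least confident instance is queried
    labelled.append(query)
    labels.append(oracle(query))

print(labelled)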

Optimal quantum learning of unitaries is similar to active learning in one sense: it requires an optimal input state. Since the learner has access to U, by calling the transformation on an optimal input state, the learner ensures that the most important characteristics are imprinted on the approximation state of the unitary. The optimal input state for storage can be taken of the form

|φ⟩ = ⊕_{j∈Irr(U^⊗N)} √(p_j/d_j) |I_j⟩ ∈ H̃,

where the p_j are probabilities, the index j runs over the set Irr(U^⊗N) of all irreducible representations {U_j} contained in the decomposition of {U^⊗N}, d_j is the dimension of the corresponding subspace H_j, and H̃ = ⊕_{j∈Irr(U^⊗N)} (H_j ⊗ H_j) is a subspace of H_o ⊗ H_i carrying the representation Ũ = ⊕_{j∈Irr(U^⊗N)} (U_j ⊗ I_j), with I_j the identity on H_j.

A binary classification scheme based on quantum state tomography of the state classes through Helstrom measurements also requires an optimal input state [4].
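As a hedged special case (not spelled out on the poster, but implied by the general form): for N = 1 and a generic qudit unitary, the decomposition contains a single irreducible representation, so the optimal input state reduces to the single maximally entangled state used in the Choi-Jamiołkowski sketch above.

\[
\mathrm{Irr}(U) = \{U\}, \quad d_j = d, \quad p_j = 1
\qquad\Longrightarrow\qquad
|\phi\rangle = \frac{1}{\sqrt{d}}\,|I\rangle
= \frac{1}{\sqrt{d}} \sum_{m=1}^{d} |m\rangle|m\rangle .
\]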

REFERENCES

[1] A. Bisio, G. M. D'Ariano, P. Perinotti, and M. Sedlák. Quantum learning algorithms for quantum measurements. Physics Letters A, 375:3425–3434, 2011.
[2] G. Chiribella, G. M. D'Ariano, and M. F. Sacchi. Optimal estimation of group transformations using entanglement. Physical Review A, 72(4):042338, 2005.
[3] R. El-Yaniv and D. Pechyony. Transductive Rademacher complexity and its applications. In Proceedings of COLT-07, 20th Annual Conference on Learning Theory, 2007.
[4] M. Guţă and W. Kotłowski. Quantum learning: Asymptotically optimal classification of qubit states. New Journal of Physics, 12(12):123032, 2010.
[5] B. Settles. Active learning literature survey. Technical Report 1648, University of Wisconsin–Madison, 2009.