Transduction and Active Learning in the Quantum Learning of Unitary Transformations


DESCRIPTION

Quantum learning of a unitary transformation estimates a quantum channel in a process similar to quantum process tomography. The classical counterpart of this goal, finding an unknown function, is regression, although the methodology hardly resembles the outline of classical algorithms. To gain a better understanding of what such a methodology means to learning theory, we anchor it to the familiar concepts of active learning and transduction. Learning the unitary transformation translates to optimally storing it in quantum memory, but the quantum learning procedure also requires an optimal, maximally entangled input state. We argue that this is akin to active learning. Two different retrieval strategies apply when we would like to use the learned unitary transformation: a coherent strategy, which stores the unitary in quantum memory, and an incoherent one, which measures the unitary and stores it in classical memory; the latter strategy is considered optimal. We further argue that the incoherent strategy is a blend of inductive and transductive learning, as the optimal input state depends on the number of target states on which the transformation should be applied, yet once it is learned, the transformation can be used an arbitrary number of times. On the other hand, the sub-optimal coherent strategy of storing and applying the unitary is a form of transduction with no inductive element.

TRANSCRIPT


Transduction and Active Learning in the Quantum Learning of Unitary Transformations

Peter Wittek

[email protected]

CLASSICAL REGRESSION

Machine learning is a field of artificial intelligence that seeks patterns in empirical data without forcing models on it. Consider a training set {(x_1, y_1), ..., (x_N, y_N)}, where x_i is a finite-dimensional data point and y_i is an associated outcome. In regression, the task is to approximate an unknown function f with an estimate f̂ such that f̂(x_i) = y_i for all i. Then, encountering a previously unseen x, we calculate f̂(x).
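As a purely classical illustration of this setup (a minimal sketch; the data and the choice of a linear model are assumptions, not part of the poster), the following fits f̂ to a small training set by least squares and evaluates it on a previously unseen point:

import numpy as np

# Hypothetical training set: x_i are 2-dimensional points, y_i scalar outcomes.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 5.0])

# Fit f_hat(x) = w . x + b by least squares (one simple choice of model class).
A = np.hstack([X, np.ones((X.shape[0], 1))])   # append a column for the bias b
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = coef[:-1], coef[-1]

# Evaluate the learned f_hat on a previously unseen point x.
x_new = np.array([3.0, 2.0])
print(w @ x_new + b)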

LEARNING A UNITARY

If we denote the unitary transformation by U, the goal of quantum process tomography is to derive an estimate Û while probing it with input states ρ_in and observing the corresponding output states ρ_out. To make this analogous to regression, we have the input and corresponding output states fixed, and we estimate the unitary such that Û(ρ_in) = ρ_out. Then, just as in the classical case, we would like to calculate Û(ρ) for a new state.

Unlike in the classical setting, learning a unitary requires a double maximization: we need a measurement strategy that optimally approximates the unitary, and we also need an optimal input state that best captures the information of the unitary.
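To make the regression analogy concrete, here is a minimal classical-simulation sketch (an illustration under assumptions, not the optimal quantum strategy discussed below): given a few input/output pure-state pairs produced by an unknown qubit unitary, it forms the matrix Σ_i |ψ_i^out⟩⟨ψ_i^in| and takes its closest unitary, the unitary Procrustes solution obtained from the singular value decomposition.

import numpy as np

rng = np.random.default_rng(0)

# Unknown single-qubit unitary that we only access through input/output pairs.
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]], dtype=complex)

# Probe states (columns) and the corresponding output states.
psis_in = rng.normal(size=(2, 4)) + 1j * rng.normal(size=(2, 4))
psis_in /= np.linalg.norm(psis_in, axis=0)
psis_out = U @ psis_in

# Unitary Procrustes fit: maximise sum_i Re <psi_out_i| U_hat |psi_in_i> over unitaries.
M = psis_out @ psis_in.conj().T
V, _, Wh = np.linalg.svd(M)
U_hat = V @ Wh

print(np.allclose(U_hat, U))   # True up to numerical precision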

Outline of learning a unitary

In indirect process tomography, we use the Choi-Jamiołkowski isomorphism to imprint the unitary on a state (a minimal sketch of this imprinting follows the list below). The key steps are [1]:

• Learning the unitary translates to the optimal storage and parallel application of the unitary on a suitable input state.

• It requires an optimal input state, a superposition of maximally entangled states. This resembles active learning.

• Applying the learned unitary happens either with a coherent strategy, that is, retrieving it from quantum memory, or with an incoherent strategy, that is, measuring it and retrieving it from classical memory. The optimal incoherent strategy is a mixture of inductive and transductive learning, whereas the suboptimal coherent strategy is purely transductive.
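The following numpy sketch (an illustration under assumptions, not the optimal storage protocol of [1]) shows what "imprinting the unitary on a state" means via the Choi-Jamiołkowski isomorphism: applying the unknown U to one half of a maximally entangled pair produces a state whose amplitudes are, up to a factor 1/√d, exactly the matrix elements of U.

import numpy as np

d = 2
# Unknown unitary (a fixed example for illustration): a Hadamard gate.
U = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

# Maximally entangled state |I> = (1/sqrt(d)) * sum_m |m>|m>.
phi_plus = np.eye(d).reshape(d * d) / np.sqrt(d)

# Imprint the unitary: |phi_U> = (U ⊗ I) |I>.
phi_U = np.kron(U, np.eye(d)) @ phi_plus

# The amplitudes of |phi_U>, reshaped to a d x d matrix, recover U / sqrt(d).
U_recovered = phi_U.reshape(d, d) * np.sqrt(d)
print(np.allclose(U_recovered, U))   # True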

INDUCTION VERSUS TRANSDUCTION

[Figures: inductive learning infers a decision surface separating Class 1 from Class 2; transductive learning assigns the given unlabeled instances to Class 1 or Class 2 directly.]

Inductive learning:
• N labelled or unlabelled points
• A function is inferred
• The learned function is applied on unseen data

Transductive learning:
• N labelled or unlabelled points
• No inference
• K additional data points are known in advance: {x_{N+1}, ..., x_{N+K}}
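To illustrate the distinction in conventional machine-learning terms (a hedged sketch with made-up data, using scikit-learn as one possible toolkit), an inductive learner fits a reusable function on the N labelled points, while a transductive learner such as label propagation assigns labels jointly to the labelled points and the K points supplied in advance.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import LabelPropagation

# N labelled points and K additional points that are known in advance.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
X_known = np.array([[0.1, 0.2], [1.1, 0.9]])          # the K points, unlabelled

# Inductive learning: infer a function, then apply it to any unseen data.
clf = LogisticRegression().fit(X_train, y_train)
print(clf.predict(X_known))                           # works for X_known or any later point

# Transductive learning: labels are produced jointly for exactly the points given.
X_all = np.vstack([X_train, X_known])
y_all = np.concatenate([y_train, [-1, -1]])           # -1 marks the unlabelled instances
lp = LabelPropagation().fit(X_all, y_all)
print(lp.transduction_[len(y_train):])                # labels assigned to the K known points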

Transduction avoids inference to the more general: it infers from particular instances to particular instances, asking for less, since an inductive function implies a transductive one. Transduction is similar to instance-based learning, a family of algorithms that compares new problem instances with training instances. The goal in transductive learning is actually to minimize the test error, instead of the more abstract goal of maximizing generalization performance [3].

The optimal retrieval of U from the memory state |φ_U⟩ is achieved by measuring the ancilla with the optimal covariant POVM of the form Ξ = |η⟩⟨η| [1], namely P_Û = |η_Û⟩⟨η_Û| with

|η_Û⟩ = ⊕_j √d_j |Û_j⟩,

and, conditionally on the outcome Û, by performing the unitary Û on the new input system. The optimal probability coefficients can be written as

p_j = d_j m_j / d^N,

where m_j is the multiplicity of the corresponding subspace [2]. This is true for the K = 1 case. The more generic case for arbitrary K takes the global fidelity between the estimated channel C_U and U^⊗K, that is, the objective function averages (|tr(U† C_U)|/d)^2 over the unitary group. Hence the exact values of p_j always depend on K, tying the optimal input state to the number of applications and making the procedure a case of transduction.
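As a hedged check of the K = 1 formula (a worked example, not taken from the poster): for N = 2 copies of a qubit unitary (d = 2), the decomposition of U^⊗2 contains the singlet and triplet irreducible representations, and the coefficients p_j indeed form a probability distribution.

\[
U^{\otimes 2} \cong U_{j=0} \oplus U_{j=1}, \qquad
(d_0, m_0) = (1, 1), \quad (d_1, m_1) = (3, 1),
\]
\[
p_0 = \frac{d_0 m_0}{d^N} = \frac{1}{4}, \qquad
p_1 = \frac{d_1 m_1}{d^N} = \frac{3}{4}, \qquad
p_0 + p_1 = \frac{\sum_j d_j m_j}{d^N} = \frac{d^N}{d^N} = 1 .
\]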

ACTIVE LEARNING

In active learning, the learning algorithm is able to solicit labels for problematic unlabeled instances from an appropriate information source, for instance, from a human annotator [5]. Some typical classical strategies (a minimal uncertainty-sampling sketch follows the list):

• Uncertainty sampling: the selected set corresponds to those data instances where the confidence is low.

• Expected error reduction: select those data instances where the model performs poorly, that is, where the generalization error is most likely to be reduced.

• Variance reduction: generalization performance is hard to measure, whereas minimizing output variance is far more feasible; select those data instances which minimize output variance.
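A minimal uncertainty-sampling loop, sketched here with scikit-learn on synthetic data (the pool, the classifier, and the query budget are all illustrative assumptions): at each step the learner asks an oracle to label the pool instance about which its current model is least confident.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic pool of unlabelled instances and an oracle standing in for the human annotator.
X_pool = rng.normal(size=(100, 2)) + np.repeat([[0, 0], [2, 2]], 50, axis=0)
oracle = lambda i: int(i >= 50)                     # true label of pool item i

# Start from a couple of labelled seeds.
labelled = [0, 50]
labels = [oracle(i) for i in labelled]

for _ in range(10):                                 # query budget
    clf = LogisticRegression().fit(X_pool[labelled], labels)
    proba = clf.predict_proba(X_pool)
    confidence = proba.max(axis=1)                  # confidence of the current model
    confidence[labelled] = np.inf                   # never re-query labelled instances
    query = int(confidence.argmin())                # least confident instance is queried
    labelled.append(query)
    labels.append(oracle(query))

print(labelled)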

Optimal quantum learning of unitaries is similar to active learning in one sense: it requires an optimal input state. Since the learner has access to U, by calling the transformation on an optimal input state, the learner ensures that the most important characteristics are imprinted on the approximation state of the unitary. The optimal input state for storage can be taken of the form

|φ⟩ = ⊕_{j∈Irr(U^⊗N)} √(p_j/d_j) |I_j⟩ ∈ H̃,

where the p_j are probabilities, the index j runs over the set Irr(U^⊗N) of all irreducible representations {U_j} contained in the decomposition of {U^⊗N}, d_j is the dimension of the corresponding subspace H_j, and H̃ = ⊕_{j∈Irr(U^⊗N)} (H_j ⊗ H_j) is a subspace of H_o ⊗ H_i carrying the representation Ũ = ⊕_{j∈Irr(U^⊗N)} (U_j ⊗ I_j), with I_j the identity on H_j.

A binary classification scheme based on quantum state tomography of the state classes through Helstrom measurements also requires an optimal input state [4].
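As a hedged special case (not spelled out on the poster, but implied by the general form): for N = 1 and a generic qudit unitary, the decomposition contains a single irreducible representation, so the optimal input state reduces to the single maximally entangled state used in the Choi-Jamiołkowski sketch above.

\[
\mathrm{Irr}(U) = \{U\}, \quad d_j = d, \quad p_j = 1
\qquad\Longrightarrow\qquad
|\phi\rangle = \frac{1}{\sqrt{d}}\,|I\rangle
= \frac{1}{\sqrt{d}} \sum_{m=1}^{d} |m\rangle|m\rangle .
\]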

REFERENCES

[1] A. Bisio, G. M. D'Ariano, P. Perinotti, and M. Sedlák. Quantum learning algorithms for quantum measurements. Physics Letters A, 375:3425–3434, 2011.
[2] G. Chiribella, G. M. D'Ariano, and M. F. Sacchi. Optimal estimation of group transformations using entanglement. Physical Review A, 72(4):042338, 2005.
[3] R. El-Yaniv and D. Pechyony. Transductive Rademacher complexity and its applications. In Proceedings of COLT-07, 20th Annual Conference on Learning Theory, 2007.
[4] M. Guţă and W. Kotłowski. Quantum learning: Asymptotically optimal classification of qubit states. New Journal of Physics, 12(12):123032, 2010.
[5] B. Settles. Active learning literature survey. Technical Report 1648, University of Wisconsin–Madison, 2009.