quantum minimax theorem in statistical decision theory (rims2014)
TRANSCRIPT
This presentation was given at the RIMS conference, Kyoto, Japan.

Quantum Minimax Theorem in Statistical Decision Theory
November 10, 2014
Public Version

Fuyuhiko Tanaka (田中冬彦)
Graduate School of Engineering Science, Osaka University

If you are interested in the details, please see arXiv:1410.3639 [quant-ph]; citations are also welcome!
Prologue: What is Bayesian Statistics?
(Non-technical Short Version)
*Intended for non-statisticians, especially researchers in quantum information.
Mission of Statistics
To give a desirable and plausible method according to the statistical model, the purpose of inference (point estimation, prediction, hypothesis testing, model selection, etc.), and the loss function.
*Precise evaluation of the estimation error is secondary.

Difference from Pure Math
Statistics is required to meet various needs in the world, and the significant problems in statistics have changed with the times.
(This fact does not seem to be widely known outside the statistics community.)
What's Bayesian Statistics? (1/2)

Bayesian Statistics
= Statistics admitting the usage of prior distributions

prior distribution = probability distribution on the unknown parameter
(We do not care about the interpretation of this probability.)
*The interpretation heavily depends on the individual researcher.
What's Bayesian Statistics? (2/2)
Relation to frequentist statistics (traditional statistics)
1. Bayesian statistics has enlarged the framework and satisfies various needs. (It is not an alternative to frequentist statistics!)
2. Sticking to frequentist methods like MLEs amounts to a preference for a specific prior distribution (narrow-minded!). (Indeed, MLEs often fail; their independence of the loss is unsuitable.)
3. Frequentist results give good starting points for Bayesian analysis.
For details, see, e.g., C. Robert, The Bayesian Choice, Chap. 11.
[Diagram: Bayesian statistics encompasses frequentist (traditional) statistics.]
Misunderstanding
Since priors are not objective, Bayesian methods should not be used for scientific results.

Objective priors (noninformative priors) are proposed for exactly such usage. Bayesian statistics does not deny the frequentist way, and statistical inference based on an objective prior often agrees with the frequentist one. (ex: MLE = MAP estimate under the uniform prior)

It remains unsolved which noninformative prior should be used. The essence of this problem is how to choose a good statistical method with a small sample. It is NOT inherent to Bayesian statistics but has been a central issue in traditional statistics. Frequentists and mathematicians avoid it and only consider mathematically tractable models and/or asymptotic theory (a large-sample approximation!). In Bayesian statistics, this problem is transformed into the choice of a noninformative prior.
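The remark that the MLE equals the MAP estimate under the uniform prior can be checked numerically. A minimal sketch with an illustrative binomial likelihood (7 successes in 10 Bernoulli trials; the numbers are not from the talk):

```python
import numpy as np

# With a uniform prior, the posterior is proportional to the likelihood,
# so the MAP estimate coincides with the MLE.
theta = np.linspace(0.001, 0.999, 999)       # grid over the parameter
likelihood = theta**7 * (1 - theta)**3       # binomial likelihood (up to a constant)
prior = np.ones_like(theta)                  # uniform prior
posterior = likelihood * prior               # unnormalized posterior

mle = theta[np.argmax(likelihood)]
map_est = theta[np.argmax(posterior)]
print(mle, map_est)                          # both ≈ 0.7
```

With a non-uniform prior the two estimates would generally differ, which is the point of the slide: using an MLE is itself an implicit prior choice.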
Further Reference
Preface in RIMS Kokyuroku 1834 (Theory and Applications of Statistical Inference in Quantum Theory)

POINT
- Points out some misunderstandings of statistics for non-experts (esp. physicists)
- Defines the term "quantum statistical inference" as an academic term

Sorry, but it is written in Japanese.
CONTENTS
0. Notation
1. Introduction
2. Quantum Statistical Models and Measurements
3. Quantum Statistical Decision Problems
4. Risk Functions and Optimality Criteria
5. Main Result
6. Minimax POVMs and Least Favorable Priors
7. Summary
0. Notation
Description of Probability in QM
Quantum Mechanics (QM): described by operators on a Hilbert space.
Probability: described by a density operator and a measurement:
$$\mu(\mathrm{d}x) = \mathrm{Tr}\,\rho\,\mathrm{M}(\mathrm{d}x)$$
- $\rho$: density operator
- $\mathrm{M}(\mathrm{d}x)$: measurement (POVM)
- $\mu(\mathrm{d}x)$: probability distribution of the outcome $x$
- $a(x)$: statistical processing of the data
Density Operators
$$\mathcal{S}(\mathcal{H}) := \{\rho \in \mathcal{L}(\mathcal{H}) \mid \rho \ge 0,\ \mathrm{Tr}\,\rho = 1\}$$
: all density operators (density matrices, states) on a Hilbert space.

When $\dim \mathcal{H} = n$, every density operator is diagonalized by a unitary $U$:
$$\rho = U \begin{pmatrix} p_1 & & \\ & \ddots & \\ & & p_n \end{pmatrix} U^*, \qquad p_j \ge 0,\quad \sum_j p_j = \mathrm{Tr}\,\rho = 1.$$
Measurements in QM (1/2)
Mathematical model of measurements = Positive Operator Valued Measure (POVM)

POVM (def. for finite sets)
Sample space = possible values for a measurement: $\{x_1, x_2, \dots, x_k\}$
POVM = a family of linear operators $\{\mathrm{M}_j\}_{j=1,\dots,k}$ satisfying
$$\mathrm{M}_j \ge O \ \ (\text{positive operator, i.e., positive semidefinite matrix}), \qquad \sum_j \mathrm{M}_j = I.$$
Measurements in QM (2/2)
How to construct a probability space (Axiom of QM)
Given a sample space $\{x_1, x_2, \dots, x_k\}$, a d.o. $\rho \in \mathcal{S}(\mathcal{H})$, and a POVM (measurement) $\{\mathrm{M}_j\}_{j=1,\dots,k}$, the probability of obtaining the measurement outcome $x_j$ is
$$p_j := \mathrm{Tr}\,\rho\,\mathrm{M}_j.$$
*The above probabilities satisfy the required conditions (positivity and summing to one) due to both definitions: $\mathrm{M}_j \ge O$ and $\rho \ge 0$ give $p_j \ge 0$, while $\sum_j \mathrm{M}_j = I$ and $\mathrm{Tr}\,\rho = 1$ give $\sum_j p_j = 1$.
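The finite Born rule above, $p_j = \mathrm{Tr}\,\rho\,\mathrm{M}_j$, can be sketched in a few lines; the state and the projective POVM below are illustrative choices, not taken from the talk:

```python
import numpy as np

# A density operator (positive semidefinite, trace one)
rho = np.array([[0.75, 0.25],
                [0.25, 0.25]])
# A projective POVM in the standard basis
M = [np.array([[1.0, 0.0], [0.0, 0.0]]),
     np.array([[0.0, 0.0], [0.0, 1.0]])]

# Born rule: p_j = Tr(rho M_j)
p = np.array([np.trace(rho @ Mj).real for Mj in M])
print(p)                                    # [0.75 0.25]
```

As the slide notes, positivity and normalization of `p` need no extra check: they follow from the defining conditions on `rho` and `M`.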
1. Introduction
Quantum Bayesian Hypothesis Testing (1/2)
1. Alice chooses one alphabet (denoted by an integer $i \in \{1, 2, \dots, k\}$) and sends the corresponding quantum state (described by a density operator) from $\{\rho_1, \rho_2, \dots, \rho_k\} \subset \mathcal{S}(\mathcal{H})$.
2. Bob receives the quantum state, performs a measurement, and guesses the alphabet Alice really sent to him. The whole process is described by a POVM $\{\mathrm{M}_j\}_{j=1,2,\dots,k}$.
3. The proportion of the alphabets is given by a distribution $\pi_1, \dots, \pi_k \ge 0$, $\sum_i \pi_i = 1$ (called a prior distribution).
Quantum Bayesian Hypothesis Testing (2/2)
* The probability that Bob guesses $j$ when Alice sends $i$:
$$p(B = j \mid A = i) = \mathrm{Tr}\,\rho_i \mathrm{M}_j$$
* Bob's loss (estimation error) when he guesses $j$ while Alice sends $i$: $w(i, j)$
* Bob's risk for the $i$-th alphabet:
$$R_{\mathrm{M}}(i) = \sum_{j=1}^k w(i, j)\, p(B = j \mid A = i)$$
* Bob's average risk (his task is to minimize it by using a good POVM):
$$r_{\mathrm{M},\pi} = \sum_{i=1}^k \pi_i R_{\mathrm{M}}(i)$$
Summary of Q-BHT
- Alice chooses one alphabet $i \in \{1, 2, \dots, k\}$ and sends one quantum state from $\{\rho_1, \rho_2, \dots, \rho_k\} \subset \mathcal{S}(\mathcal{H})$.
- Proportion of each alphabet (prior distribution): $\pi_1, \dots, \pi_k \ge 0$, $\sum_i \pi_i = 1$.
- Bob guesses the alphabet by choosing a measurement (POVM) $\{\mathrm{M}_j\}_{j=1,2,\dots,k}$.

Risk: $R_{\mathrm{M}}(i) := \sum_{j=1}^k w(i, j)\,\mathrm{Tr}\,\rho_i \mathrm{M}_j$
Average risk: $r_{\mathrm{M},\pi} := \sum_{i=1}^k \pi_i R_{\mathrm{M}}(i)$

Bayes POVM w.r.t. $\pi$ = POVM minimizing the average risk.
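The risk and average-risk formulas above can be evaluated directly for a small instance. A sketch with $k = 2$, 0-1 loss $w(i,j) = 1 - \delta_{ij}$, two illustrative non-orthogonal pure states, and a projective guessing POVM (all choices mine, not from the talk):

```python
import numpy as np

rho = [np.array([[1.0, 0.0], [0.0, 0.0]]),    # |0><0|
       np.array([[0.5, 0.5], [0.5, 0.5]])]    # |+><+|
M = [np.array([[1.0, 0.0], [0.0, 0.0]]),      # outcome "guess alphabet 1"
     np.array([[0.0, 0.0], [0.0, 1.0]])]      # outcome "guess alphabet 2"
w = 1.0 - np.eye(2)                           # 0-1 loss
pi = np.array([0.5, 0.5])                     # prior distribution

# R_M(i) = sum_j w(i, j) Tr(rho_i M_j);  r_{M,pi} = sum_i pi_i R_M(i)
R = np.array([sum(w[i, j] * np.trace(rho[i] @ M[j]).real
                  for j in range(2)) for i in range(2)])
r = float(pi @ R)
print(R, r)                                   # R = [0.  0.5], average risk 0.25
```

A Bayes POVM would minimize `r` over all POVMs; this fixed projective guess is generally not optimal for non-orthogonal states.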
Minimax POVM
- Alice chooses one alphabet $i \in \{1, 2, \dots, k\}$ and sends one quantum state from $\{\rho_1, \dots, \rho_k\} \subset \mathcal{S}(\mathcal{H})$.
- Bob has no prior information. He guesses the alphabet by choosing a measurement (POVM) $\{\mathrm{M}_j\}_{j=1,2,\dots,k}$ and minimizes the worst-case risk
$$r^*_{\mathrm{M}} := \sup_i R_{\mathrm{M}}(i).$$

Minimax POVM = POVM minimizing the worst-case risk.
Quantum Minimax Theorem (Simple Ver.)
Setting: Alice chooses one alphabet $i \in \{1, 2, \dots, k\}$ and sends one quantum state from $\{\rho_1, \dots, \rho_k\} \subset \mathcal{S}(\mathcal{H})$; Bob guesses the alphabet by choosing a measurement (POVM) $\{\mathrm{M}_j\}_{j=1,2,\dots,k}$.

Theorem (Hirota and Ikehara, 1982)
$$\inf_{\mathrm{M}} \sup_i R_{\mathrm{M}}(i) = \sup_\pi \inf_{\mathrm{M}} \sum_{i=1}^k \pi_i R_{\mathrm{M}}(i)$$

The minimax POVM agrees with the Bayes POVM w.r.t. the worst-case prior.
*The worst-case prior for Bob is called a least favorable prior.
Example (1/2)
Quantum states:
$$\rho_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad
\rho_2 = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 1/2 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad
\rho_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
0-1 loss: $w(i, j) = 1 - \delta_{ij}$.
Minimax POVM:
$$\mathrm{M}_1 = \frac12\begin{pmatrix} 1 + \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} & 0 \\ -\tfrac{1}{\sqrt{2}} & 1 - \tfrac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad
\mathrm{M}_2 = \frac12\begin{pmatrix} 1 - \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} & 0 \\ \tfrac{1}{\sqrt{2}} & 1 + \tfrac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad
\mathrm{M}_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
*This is a counterexample to Theorem 2 in Hirota and Ikehara (1982), whose proof seems not mathematically rigorous.
Example (2/2)
Quantum states: $\rho_1, \rho_2, \rho_3$ as above, with the 0-1 loss $w(i, j) = 1 - \delta_{ij}$.
LFP: $\pi_{LF}(1) = \pi_{LF}(2) = 1/2$, $\pi_{LF}(3) = 0$.
The minimax POVM is constructed as a Bayes POVM w.r.t. the LFP.

Important Implication
When the state is completely unknown, it is not necessarily wise to find an optimal POVM with the uniform prior. (Although statisticians have known this fact for long decades…)
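The example can be checked numerically. The matrices below are my reconstruction: $\mathrm{M}_1, \mathrm{M}_2$ form the Helstrom (optimal Bayes) measurement for the equal-weight pair $\{\rho_1, \rho_2\}$ on the span of the first two basis vectors, and $\mathrm{M}_3$ projects onto the third, consistent with the LFP $(1/2, 1/2, 0)$ stated on the slide:

```python
import numpy as np

s = 1.0 / np.sqrt(2.0)
rho = [np.diag([1.0, 0.0, 0.0]),
       np.array([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 0.0]]),
       np.diag([0.0, 0.0, 1.0])]
M1 = 0.5 * np.array([[1 + s, -s, 0], [-s, 1 - s, 0], [0, 0, 0.0]])
M2 = 0.5 * np.array([[1 - s,  s, 0], [ s, 1 + s, 0], [0, 0, 0.0]])
M3 = np.diag([0.0, 0.0, 1.0])

# POVM conditions: positivity and completeness
for Mj in (M1, M2, M3):
    assert np.min(np.linalg.eigvalsh(Mj)) > -1e-12
assert np.allclose(M1 + M2 + M3, np.eye(3))

# Risk under 0-1 loss: R(i) = 1 - Tr(rho_i M_i)
R = [1 - np.trace(rho[i] @ Mj).real for i, Mj in enumerate((M1, M2, M3))]
print(R)    # ≈ [0.1464, 0.1464, 0.0]; worst case (2 - sqrt(2))/4
```

The risks for alphabets 1 and 2 are equal and the risk for alphabet 3 is zero, so the worst-case risk equals the Bayes risk under the prior $(1/2, 1/2, 0)$, as the minimax theorem predicts.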
Rewritten in Technical Terms
1. Nature
Quantum statistical model $\{\rho_\theta\} \subset \mathcal{S}(\mathcal{H})$; parameter space $\theta \in \Theta = \{1, 2, \dots, k\}$.
2. Experimenter and Statistician
Decision space $U = \{1, 2, \dots, k\}$; POVM on $U$: $\{\mathrm{M}_u\}_{u \in U}$; loss function $w(\theta, u)$; risk function
$$R_{\mathrm{M}}(\theta) = \sum_{u \in U} w(\theta, u)\,\mathrm{Tr}\,\rho_\theta \mathrm{M}_u.$$

Theorem
$$\inf_{\mathrm{M}} \sup_\theta R_{\mathrm{M}}(\theta) = \sup_\pi \inf_{\mathrm{M}} \sum_\theta \pi_\theta R_{\mathrm{M}}(\theta)$$
Main Result (Brief Summary)
Quantum Minimax Theorem
$$\inf_{\mathrm{M} \in Po(U)} \sup_{\theta \in \Theta} R_{\mathrm{M}}(\theta) = \sup_{\pi \in \mathcal{P}(\Theta)} \inf_{\mathrm{M} \in Po(U)} \int_\Theta R_{\mathrm{M}}(\theta)\,\pi(\mathrm{d}\theta),$$
where
$$R_{\mathrm{M}}(\theta) := \int_U w(\theta, u)\,\mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u).$$
cf) classical minimax theorem:
$$\inf_{\delta \in D} \sup_\theta R_\delta(\theta) = \sup_{\pi \in \mathcal{P}(\Theta)} \inf_{\delta \in D} \int_\Theta R_\delta(\theta)\,\pi(\mathrm{d}\theta).$$
Conditions, assumptions, and the essence of the proof are explained below.
Previous Works
- Statistical Decision Theory: Wald (1950)
- Quantum Statistical Decision Theory: Holevo (1973)
- Recent results and applications (many!): Hayashi (1998), Guta and Jencova (2007), Kumagai and Hayashi (2013)
- (Classical) Minimax Theorem in statistical decision theory: Le Cam (1964); the first version was given by Wald (1950)

We show: the Quantum Minimax Theorem in quantum statistical decision theory.
2. Quantum Statistical Models and Measurements

Formal Definition of Statistical Models
(naïve) Quantum statistical model = a parametric family of density operators.
Ex.
$$\rho(\theta) = \frac12\begin{pmatrix} 1 + \theta_3 & \theta_1 - i\theta_2 \\ \theta_1 + i\theta_2 & 1 - \theta_3 \end{pmatrix},\qquad 0 \le (\theta_1)^2 + (\theta_2)^2 + (\theta_3)^2 \le 1,\quad \theta_1, \theta_2, \theta_3 \in \mathbb{R};$$
$$\sigma(r, s) := \rho(r) \otimes \rho(s),\qquad r, s \in \mathbb{R}.$$

Basic Idea (Holevo, 1976)
A quantum statistical model is defined by a map $\rho: \Theta \to L^1(\mathcal{H})$, $\theta \mapsto \rho(\theta)$.
Next, we see the required conditions for this map.
Regular Operator-Valued Function (1/2)
Definition
Let $\Theta$ be a locally compact metric space, $K \subseteq \Theta$ a compact set, and, for $\delta > 0$,
$$K_\delta := \{(\theta, \theta') : \theta, \theta' \in K,\ d(\theta, \theta') \le \delta\}.$$
$L^1(\mathcal{H})$: trace-class operators on a Hilbert space.
For a map $T: \Theta \to L^1(\mathcal{H})$, we define
$$\omega_1(T; K_\delta) := \inf\{\, \|X\|_1 : -X \le T(\theta) - T(\theta') \le X,\ \forall (\theta, \theta') \in K_\delta \,\}.$$

Definition
An operator-valued function $T$ is regular if, for every compact set $K$,
$$\lim_{\delta \to 0} \omega_1(T; K_\delta) = 0.$$

Regular Operator-Valued Function (2/2)
Remark 1
Regularity implies uniform continuity w.r.t. the trace norm on every compact set:
$$\sup_{(\theta, \theta') \in K_\delta} \|T(\theta) - T(\theta')\|_1 \to 0 \quad \text{as } \delta \to 0.$$
The converse does not hold if $\dim \mathcal{H} = \infty$.

Remark 2
The regularity assures that the following operator-valued integral is well-defined for every $f \in C_0(\Theta)$ and $\pi \in \mathcal{P}(\Theta)$:
$$\int_\Theta f(\theta)\, T(\theta)\,\pi(\mathrm{d}\theta).$$
Quantum Statistical Models
Definition
$$S(\mathcal{H}) := \{X \in L^1(\mathcal{H}) : X \ge 0,\ \mathrm{Tr}\,X = 1\}.$$
A map $\rho: \Theta \to S(\mathcal{H})$ is called a quantum statistical model if it satisfies the conditions below.

Conditions
1. Identifiability (one-to-one map): $\theta_1 \ne \theta_2 \Rightarrow \rho(\theta_1) \ne \rho(\theta_2)$.
2. Regularity: $\lim_{\delta \to 0} \omega_1(\rho; K_\delta) = 0$.
↑ Necessary for our arguments.
Measurements (POVMs)
$U$: decision space (locally compact space)
$\mathcal{A}_U$: Borel sets (the smallest sigma-algebra containing all open sets)

Definition
A positive-operator-valued function $\mathrm{M}$ on $\mathcal{A}_U$ is called a POVM if it satisfies
- $\mathrm{M}(B) \ge 0$ for every $B \in \mathcal{A}_U$;
- $\mathrm{M}\bigl(\bigcup_{j=1}^\infty B_j\bigr) = \sum_{j=1}^\infty \mathrm{M}(B_j)$ for $B_1, B_2, \dots \in \mathcal{A}_U$ with $B_i \cap B_j = \emptyset$ ($i \ne j$);
- $\mathrm{M}(U) = I$.

$Po(U)$: all POVMs over $U$.
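For a finite decision space the defining conditions reduce to positivity of each element and summation to the identity, which a small helper can check; this is a sketch of the definition above, with illustrative matrices of my own choosing:

```python
import numpy as np

def is_povm(elements, tol=1e-10):
    """Check the POVM conditions on a finite decision space:
    each element is positive semidefinite and they sum to the identity
    (countable additivity reduces to a finite sum here)."""
    d = elements[0].shape[0]
    positive = all(np.min(np.linalg.eigvalsh(M)) > -tol for M in elements)
    complete = np.allclose(sum(elements), np.eye(d), atol=tol)
    return positive and complete

M = [np.array([[0.7, 0.0], [0.0, 0.2]]),
     np.array([[0.3, 0.0], [0.0, 0.8]])]
print(is_povm(M))                # True: a valid (non-projective) POVM
print(is_povm([M[0], M[0]]))     # False: does not sum to the identity
```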
Born Rule
Axiom of Quantum Mechanics
For the quantum system described by $\rho$ and a measurement described by a POVM $\mathrm{M}(\mathrm{d}x)$, the measurement outcome is distributed according to
$$x \sim \mu(\mathrm{d}x) = \mathrm{Tr}\,\rho\,\mathrm{M}(\mathrm{d}x).$$
3. Quantum Statistical Decision Problems

Basic Setting
Situation: For the quantum system specified by the unknown parameter $\theta$, experimenters extract information for some purposes.
Typical sequence of tasks:
1. Choose a measurement (POVM) $\mathrm{M}(\mathrm{d}x)$.
2. Perform the measurement over the quantum system.
3. Take an action $a(x)$ among the available choices, based on the measurement outcome $x$.
The map $a$ is formally called a decision function.
Decision Functions
Examples
- Estimate the unknown parameter: $a(x) = \frac{1}{n}\sum_{i=1}^n x_i$.
- Estimate the unknown d.o. rather than the parameter: e.g.,
$$a(x) = \begin{pmatrix} a_1(x) & a_2(x) & 0 \\ a_2^*(x) & a_3(x) & 0 \\ 0 & 0 & 1 - a_1(x) - a_3(x) \end{pmatrix}, \qquad a(x) \ge 0.$$
- Validate the entanglement/separability: $a(x) \in \{0, 1\}$.
- Construct a confidence region (credible region): $[a_L(x), a_R(x)]$.
Remarks for Non-statisticians
1. If the quantum state is completely known, then the distribution of the measurement outcome is also known, and the best action can be chosen.
2. The action has to be made from finite data.
3. Precise estimation of the parameter is only a typical example.
Loss Functions
The performance of decision functions is compared by adopting loss functions (smaller is better).

Ex: Parameter Estimation
Quantum statistical model $\{\rho(\theta) : \theta \in \Theta \subseteq \mathbb{R}^m\}$; action space $U = \Theta$; loss function (squared error) $w(\theta, a) = |\theta - a|^2$.
Later we see the formal definition of the loss function.
Measurements Over Decisions
Basic Idea: From the beginning, we only consider POVMs over the action space.
Putting the measurement $\mathrm{M}(\mathrm{d}x)$ and the decision function $a$ together yields a POVM $\mathrm{N}(\mathrm{d}u)$ over decisions: $\mathrm{N}(B) = \mathrm{M}(a^{-1}(B))$.
Quantum Statistical Decision Problems
Formal Definitions
- $\Theta$: parameter space (locally compact space); quantum statistical model $\rho(\cdot)$.
- $U$: decision space* (locally compact space).
- Loss function (lower semicontinuous**, bounded below): $w: \Theta \times U \to \mathbb{R} \cup \{+\infty\}$.
The triplet $(\rho(\cdot), U, w)$ is called a quantum statistical decision problem.

cf) A (classical) statistical decision problem is the triplet $(p, U, w)$: statistical model $p_\theta(\mathrm{d}x)$, decision space $U$, loss function $w: \Theta \times U \to \mathbb{R} \cup \{+\infty\}$.

*We follow Holevo's terminology instead of saying "action space".
**We impose a slightly stronger condition on the loss than Le Cam and Holevo.
4. Risk Functions and Optimality Criteria

Comparison of POVMs
The loss (e.g. squared error) depends on both the unknown parameter $\theta$ and our decision $u$, which is a random variable:
$$u \sim \mu_\theta(\mathrm{d}u) = \mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u).$$
In order to compare two POVMs, we focus on the average of the loss w.r.t. the distribution of $u$: for a POVM $\mathrm{M}(\mathrm{d}u)$ we take $\mathrm{E}_\theta[w(\theta, u)]$ with $u \sim \mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u)$, and for a POVM $\mathrm{N}(\mathrm{d}u')$ we take $\mathrm{E}'_\theta[w(\theta, u')]$ with $u' \sim \mathrm{Tr}\,\rho(\theta)\mathrm{N}(\mathrm{d}u')$. The two are compared at the same $\theta$.
Risk Functions
Definition
The risk function of $\mathrm{M} \in Po(U)$:
$$R_{\mathrm{M}}(\theta) := \int_U w(\theta, u)\,\mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u) = \mathrm{E}_\theta[w(\theta, u)],\qquad u \sim \mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u).$$

Remarks for Non-statisticians
1. Smaller risk is better.
2. Generally there exists no POVM achieving the uniformly smallest risk among POVMs. Since $R_{\mathrm{M}}(\theta)$ depends on the unknown parameter, we need additional optimality criteria.
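Remark 2 is easy to see in the simplest discrete case. A sketch with two illustrative POVMs for discriminating two states under 0-1 loss (the setup is mine, chosen to make the risk functions cross):

```python
import numpy as np

# States indexed by theta in {1, 2}
rho = {1: np.array([[1.0, 0.0], [0.0, 0.0]]),    # |0><0|
       2: np.array([[0.5, 0.5], [0.5, 0.5]])}    # |+><+|

# POVM A always answers "1"; POVM B always answers "2".
A = [np.eye(2), np.zeros((2, 2))]
B = [np.zeros((2, 2)), np.eye(2)]

def risk(M, theta):
    # 0-1 loss: R_M(theta) = sum over wrong answers u of Tr(rho_theta M_u)
    return sum(np.trace(rho[theta] @ M[u]).real
               for u in range(2) if u + 1 != theta)

print([risk(A, t) for t in (1, 2)])    # [0.0, 1.0]
print([risk(B, t) for t in (1, 2)])    # [1.0, 0.0]
```

Each POVM beats the other at one parameter value, so neither has uniformly smallest risk; this is exactly why the minimax and Bayes criteria below are introduced.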
Optimality Criteria (1/2)
Definition
A POVM $\mathrm{M}^*$ is said to be minimax if it minimizes the supremum of the risk function, i.e.,
$$\sup_\theta R_{\mathrm{M}^*}(\theta) = \inf_{\mathrm{M} \in Po(U)} \sup_\theta R_{\mathrm{M}}(\theta).$$

Historical Remark
Bogomolov showed the existence of a minimax POVM (Bogomolov, 1981) in a more general framework. (His description of quantum states and measurements differs from the recent one.)
Optimality Criteria (2/2)
$\mathcal{P}(\Theta)$: all probability distributions on $\Theta$ (defined on the Borel sets). In Bayesian statistics, $\pi \in \mathcal{P}(\Theta)$ is called a prior distribution.

Definition
The average risk of $\mathrm{M} \in Po(U)$ w.r.t. $\pi$:
$$r_{\mathrm{M},\pi} := \int_\Theta R_{\mathrm{M}}(\theta)\,\pi(\mathrm{d}\theta).$$
A POVM $\mathrm{M}^\pi$ is said to be Bayes if it minimizes the average risk, i.e.,
$$r_{\mathrm{M}^\pi,\pi} = \inf_{\mathrm{M} \in Po(U)} r_{\mathrm{M},\pi}.$$

Historical Remark
Holevo showed the existence of a Bayes POVM (Holevo, 1976; see also Ozawa, 1980) in a more general framework.
Ex. of Loss Functions and Risk Functions
Parameter Estimation
$\{\rho(\theta) : \theta \in \Theta \subseteq \mathbb{R}^m\}$, $U = \Theta$, $w(\theta, u) = |\theta - u|^2$:
$$R_{\mathrm{M}}(\theta) = \int_U |\theta - u|^2\,\mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u) = \mathrm{E}_\theta[|\theta - u|^2].$$

Construction of a Predictive Density Operator
$\{\rho(\theta)^{\otimes n} : \theta \in \Theta \subseteq \mathbb{R}^p\}$, $U = S(\mathcal{H}^{\otimes m})$, $w(\theta, u) = D(\rho(\theta)^{\otimes m} \| u)$ (quantum relative entropy):
$$R_{\mathrm{M}}(\theta) = \int_U D(\rho(\theta)^{\otimes m} \| u)\,\mathrm{Tr}\,\rho(\theta)^{\otimes n}\mathrm{M}(\mathrm{d}u).$$
5. Main Result

Main Result (Quantum Minimax Theorem)
Theorem (*)
Let $\Theta$ and $U$ be compact metric spaces. Then the following equality holds for every quantum statistical decision problem:
$$\inf_{\mathrm{M} \in Po(U)} \sup_{\theta \in \Theta} R_{\mathrm{M}}(\theta) = \sup_{\pi \in \mathcal{P}(\Theta)} \inf_{\mathrm{M} \in Po(U)} \int_\Theta R_{\mathrm{M}}(\theta)\,\pi(\mathrm{d}\theta),$$
where the risk function is given by
$$R_{\mathrm{M}}(\theta) := \int_U w(\theta, u)\,\mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u).$$

Corollary
For every closed convex subset $Q \subseteq Po(U)$, the above assertion holds with $Po(U)$ replaced by $Q$.

*See arXiv:1410.3639 [quant-ph] for the proof.
Key Lemmas for the Theorem
Compactness of the set of POVMs (Holevo, 1976)
If the decision space $U$ is compact, then $Po(U)$ is also compact.

Equicontinuity of risk functions (Lemma by FT)
If the loss function $w: \Theta \times U \to \mathbb{R}$ is bounded and continuous, then $F := \{R_{\mathrm{M}} : \mathrm{M} \in Po(U)\} \subseteq C(\Theta)$ is (uniformly) equicontinuous. The equicontinuity implies the compactness of $F \subseteq C(\Theta)$ under the uniform-convergence topology.

We show the main theorem by using Le Cam's result together with both lemmas; it is, however, not a consequence of these previous results.
6. Minimax POVMs and Least Favorable Priors

Previous Works (LFPs)
- Statistical Decision Theory: Wald (1950)
- Jeffreys (1961): Jeffreys prior
- Bernardo (1979, 2005): reference prior; reference analysis
- Komaki (2011): latent information prior
- Tanaka (2012): MDP prior (reference prior for the pure-states model)

Objective priors in Bayesian analysis are indeed least favorable priors.
Main Result (Existence Theorem of LFP)
Definition
If a prior $\pi_{LF}$ achieves the supremum, i.e.,
$$\inf_{\mathrm{M} \in Po(U)} \int_\Theta R_{\mathrm{M}}(\theta)\,\pi_{LF}(\mathrm{d}\theta) = \sup_{\pi \in \mathcal{P}(\Theta)} \inf_{\mathrm{M} \in Po(U)} \int_\Theta R_{\mathrm{M}}(\theta)\,\pi(\mathrm{d}\theta),$$
where $R_{\mathrm{M}}(\theta) := \int_U w(\theta, u)\,\mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u)$, then the prior $\pi_{LF}$ is called a least favorable prior (LFP).

Theorem
Let $\Theta$ and $U$ be compact metric spaces and $w$ a continuous loss function. Then, for every decision problem, there exists an LFP.
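An LFP can also be searched for numerically in small problems. A sketch for two-state discrimination with 0-1 loss: for a prior $(\pi, 1-\pi)$ the Bayes risk has the closed Helstrom form $\frac12\bigl(1 - \|\pi\rho_1 - (1-\pi)\rho_2\|_1\bigr)$, so an LFP maximizes this over $\pi$, here by grid search (illustrative setup of my own, not from the talk):

```python
import numpy as np

rho1 = np.array([[1.0, 0.0], [0.0, 0.0]])
rho2 = np.array([[0.5, 0.5], [0.5, 0.5]])

def bayes_risk(pi):
    # Helstrom bound: minimal average error for the prior (pi, 1 - pi)
    gamma = pi * rho1 - (1 - pi) * rho2
    trace_norm = np.sum(np.abs(np.linalg.eigvalsh(gamma)))
    return 0.5 * (1.0 - trace_norm)

grid = np.linspace(0.0, 1.0, 10001)
risks = np.array([bayes_risk(p) for p in grid])
pi_lf = grid[np.argmax(risks)]
print(pi_lf, risks.max())    # LFP at pi = 1/2 for these states
```

The least favorable prior is the one a worst-case opponent would choose; by the minimax theorem, the maximal value found here equals the minimax risk.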
Pathological Example
Remark
Even on a compact space, a bounded lower semicontinuous (and not continuous) loss function does not necessarily admit an LFP.

Example: $\Theta = [0, 1]$, with a risk function $R_{\mathrm{M}}(\theta) = L(\theta, u)$ taking values in $[0, 1)$ with $\sup_\theta R_{\mathrm{M}}(\theta) = 1$. [Figure: graph of the risk function on $[0, 1]$.] Then
$$1 = \sup_\theta R_{\mathrm{M}}(\theta) = \sup_{\pi \in \mathcal{P}([0,1])} \int R_{\mathrm{M}}(\theta)\,\pi(\mathrm{d}\theta),$$
but for every prior $\pi$,
$$\int_{[0,1]} R_{\mathrm{M}}(\theta)\,\pi(\mathrm{d}\theta) < 1,$$
so no prior attains the supremum.
Minimax POVM is Constructed Using an LFP
Corollary
Let $\Theta$ and $U$ be compact metric spaces. If an LFP exists for a quantum statistical decision problem, then every minimax POVM is a Bayes POVM with respect to the LFP. In particular, if the Bayes POVM w.r.t. the LFP is unique, then it is minimax.

Corollary 2
For every closed convex subset $Q \subseteq Po(U)$, the above assertion holds.

Many theoretical results are derived from the quantum minimax theorem.
7. Summary

Discussion
1. We show the quantum minimax theorem, which gives a theoretical basis for quantum statistical decision theory, in particular for objective Bayesian analysis.
2. For every closed convex subset of POVMs, all of the assertions still hold. Our result thus also has practical meaning for experimenters.

Future Works
- Show other decision-theoretic results and previously known results by using the quantum minimax theorem.
- Propose a practical algorithm for finding a least favorable prior.
- Define a geometrical structure in the quantum statistical decision problem.

If you are interested in the details, please see arXiv:1410.3639 [quant-ph]; citations are also welcome!