quantum minimax theorem in statistical decision theory (rims2014)
TRANSCRIPT
This presentation was given at the RIMS conference, Kyoto, Japan.

Quantum Minimax Theorem in Statistical Decision Theory
November 10, 2014
Public Version

Fuyuhiko Tanaka (田中冬彦)
Graduate School of Engineering Science, Osaka University

If you are interested in the details, please see arXiv:1410.3639 [quant-ph]; citations are also welcome!
Prologue: What is Bayesian Statistics?
(Non-technical Short Version)
*Intended for non-statisticians, especially researchers in quantum information.
Mission of Statistics
To give a desirable and plausible method according to the statistical model, the purpose of inference (point estimation, prediction, hypothesis testing, model selection, etc.), and the loss function.
*Precise evaluation of the estimation error is secondary.

Difference from Pure Math
Statistics is required to meet various needs in the world, and the significant problems in statistics have changed with the times.
(This fact does not seem to be widely known outside the statistics community.)
What's Bayesian Statistics? (1/2)

Bayesian Statistics
= Statistics admitting the usage of prior distributions

prior distribution = probability distribution on the unknown parameter
(We do not care about the interpretation of this probability.)
*The interpretation heavily depends on the individual researcher.
What's Bayesian Statistics? (2/2)
Relation to frequentist statistics (traditional statistics)
1. Bayesian statistics has enlarged the framework and satisfies various needs. (It is not an alternative to frequentist statistics!)
2. Sticking to frequentist methods like MLEs amounts to a preference for a specific prior distribution (narrow-minded!). (Indeed, MLEs often fail; their independence of the loss is unsuitable.)
3. Frequentist results give good starting points for Bayesian analysis.
For details, see, e.g., C. Robert, The Bayesian Choice, Chap. 11.
[Diagram: Bayesian statistics encompasses frequentist (traditional) statistics.]
Misunderstanding
Since priors are not objective, Bayesian methods should not be used for scientific results.

Objective priors (noninformative priors) are proposed for exactly such usage. Bayesian statistics does not deny the frequentist way, and statistical inference based on an objective prior often agrees with the frequentist one. (ex: MLE = MAP estimate under the uniform prior)

It remains unsolved which noninformative prior should be used. The essence of this problem is how to choose a good statistical method with a small sample. It is NOT inherent to Bayesian statistics but has been a central issue in traditional statistics. Frequentists and mathematicians avoid it and only consider mathematically tractable models and/or asymptotic theory (a large-sample approximation!). In Bayesian statistics, this problem is transformed into the choice of a noninformative prior.
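The remark that the MLE equals the MAP estimate under the uniform prior can be checked numerically. A minimal sketch with an illustrative binomial likelihood (7 successes in 10 Bernoulli trials; the numbers are not from the talk):

```python
import numpy as np

# With a uniform prior, the posterior is proportional to the likelihood,
# so the MAP estimate coincides with the MLE.
theta = np.linspace(0.001, 0.999, 999)       # grid over the parameter
likelihood = theta**7 * (1 - theta)**3       # binomial likelihood (up to a constant)
prior = np.ones_like(theta)                  # uniform prior
posterior = likelihood * prior               # unnormalized posterior

mle = theta[np.argmax(likelihood)]
map_est = theta[np.argmax(posterior)]
print(mle, map_est)                          # both ≈ 0.7
```

With a non-uniform prior the two estimates would generally differ, which is the point of the slide: using an MLE is itself an implicit prior choice.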
Further Reference
Preface in RIMS Kokyuroku 1834 (Theory and Applications of Statistical Inference in Quantum Theory)

POINT
- Points out some misunderstandings of statistics for non-experts (esp. physicists)
- Defines the term "quantum statistical inference" as an academic term

Sorry, but it is written in Japanese.
CONTENTS
0. Notation
1. Introduction
2. Quantum Statistical Models and Measurements
3. Quantum Statistical Decision Problems
4. Risk Functions and Optimality Criteria
5. Main Result
6. Minimax POVMs and Least Favorable Priors
7. Summary
0. Notation
Description of Probability in QM
Quantum Mechanics (QM): described by operators on a Hilbert space.
Probability: described by a density operator and a measurement:
$$\mu(\mathrm{d}x) = \mathrm{Tr}\,\rho\,\mathrm{M}(\mathrm{d}x)$$
- $\rho$: density operator
- $\mathrm{M}(\mathrm{d}x)$: measurement (POVM)
- $\mu(\mathrm{d}x)$: probability distribution of the outcome $x$
- $a(x)$: statistical processing of the data
Density Operators
$$\mathcal{S}(\mathcal{H}) := \{\rho \in \mathcal{L}(\mathcal{H}) \mid \rho \ge 0,\ \mathrm{Tr}\,\rho = 1\}$$
: all density operators (density matrices, states) on a Hilbert space.

When $\dim \mathcal{H} = n$, every density operator is diagonalized by a unitary $U$:
$$\rho = U \begin{pmatrix} p_1 & & \\ & \ddots & \\ & & p_n \end{pmatrix} U^*, \qquad p_j \ge 0,\quad \sum_j p_j = \mathrm{Tr}\,\rho = 1.$$
Measurements in QM (1/2)
Mathematical model of measurements = Positive Operator Valued Measure (POVM)

POVM (def. for finite sets)
Sample space = possible values for a measurement: $\{x_1, x_2, \dots, x_k\}$
POVM = a family of linear operators $\{\mathrm{M}_j\}_{j=1,\dots,k}$ satisfying
$$\mathrm{M}_j \ge O \ \ (\text{positive operator, i.e., positive semidefinite matrix}), \qquad \sum_j \mathrm{M}_j = I.$$
Measurements in QM (2/2)
How to construct a probability space (Axiom of QM)
Given a sample space $\{x_1, x_2, \dots, x_k\}$, a d.o. $\rho \in \mathcal{S}(\mathcal{H})$, and a POVM (measurement) $\{\mathrm{M}_j\}_{j=1,\dots,k}$, the probability of obtaining the measurement outcome $x_j$ is
$$p_j := \mathrm{Tr}\,\rho\,\mathrm{M}_j.$$
*The above probabilities satisfy the required conditions (positivity and summing to one) due to both definitions: $\mathrm{M}_j \ge O$ and $\rho \ge 0$ give $p_j \ge 0$, while $\sum_j \mathrm{M}_j = I$ and $\mathrm{Tr}\,\rho = 1$ give $\sum_j p_j = 1$.
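The finite Born rule above, $p_j = \mathrm{Tr}\,\rho\,\mathrm{M}_j$, can be sketched in a few lines; the state and the projective POVM below are illustrative choices, not taken from the talk:

```python
import numpy as np

# A density operator (positive semidefinite, trace one)
rho = np.array([[0.75, 0.25],
                [0.25, 0.25]])
# A projective POVM in the standard basis
M = [np.array([[1.0, 0.0], [0.0, 0.0]]),
     np.array([[0.0, 0.0], [0.0, 1.0]])]

# Born rule: p_j = Tr(rho M_j)
p = np.array([np.trace(rho @ Mj).real for Mj in M])
print(p)                                    # [0.75 0.25]
```

As the slide notes, positivity and normalization of `p` need no extra check: they follow from the defining conditions on `rho` and `M`.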
1. Introduction
Quantum Bayesian Hypothesis Testing (1/2)
1. Alice chooses one alphabet (denoted by an integer $i \in \{1, 2, \dots, k\}$) and sends the corresponding quantum state (described by a density operator) from $\{\rho_1, \rho_2, \dots, \rho_k\} \subset \mathcal{S}(\mathcal{H})$.
2. Bob receives the quantum state, performs a measurement, and guesses the alphabet Alice really sent to him. The whole process is described by a POVM $\{\mathrm{M}_j\}_{j=1,2,\dots,k}$.
3. The proportion of the alphabets is given by a distribution $\pi_1, \dots, \pi_k \ge 0$, $\sum_i \pi_i = 1$ (called a prior distribution).
Quantum Bayesian Hypothesis Testing (2/2)
* The probability that Bob guesses $j$ when Alice sends $i$:
$$p(B = j \mid A = i) = \mathrm{Tr}\,\rho_i \mathrm{M}_j$$
* Bob's loss (estimation error) when he guesses $j$ while Alice sends $i$: $w(i, j)$
* Bob's risk for the $i$-th alphabet:
$$R_{\mathrm{M}}(i) = \sum_{j=1}^k w(i, j)\, p(B = j \mid A = i)$$
* Bob's average risk (his task is to minimize it by using a good POVM):
$$r_{\mathrm{M},\pi} = \sum_{i=1}^k \pi_i R_{\mathrm{M}}(i)$$
Summary of Q-BHT
- Alice chooses one alphabet $i \in \{1, 2, \dots, k\}$ and sends one quantum state from $\{\rho_1, \rho_2, \dots, \rho_k\} \subset \mathcal{S}(\mathcal{H})$.
- Proportion of each alphabet (prior distribution): $\pi_1, \dots, \pi_k \ge 0$, $\sum_i \pi_i = 1$.
- Bob guesses the alphabet by choosing a measurement (POVM) $\{\mathrm{M}_j\}_{j=1,2,\dots,k}$.

Risk: $R_{\mathrm{M}}(i) := \sum_{j=1}^k w(i, j)\,\mathrm{Tr}\,\rho_i \mathrm{M}_j$
Average risk: $r_{\mathrm{M},\pi} := \sum_{i=1}^k \pi_i R_{\mathrm{M}}(i)$

Bayes POVM w.r.t. $\pi$ = POVM minimizing the average risk.
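The risk and average-risk formulas above can be evaluated directly for a small instance. A sketch with $k = 2$, 0-1 loss $w(i,j) = 1 - \delta_{ij}$, two illustrative non-orthogonal pure states, and a projective guessing POVM (all choices mine, not from the talk):

```python
import numpy as np

rho = [np.array([[1.0, 0.0], [0.0, 0.0]]),    # |0><0|
       np.array([[0.5, 0.5], [0.5, 0.5]])]    # |+><+|
M = [np.array([[1.0, 0.0], [0.0, 0.0]]),      # outcome "guess alphabet 1"
     np.array([[0.0, 0.0], [0.0, 1.0]])]      # outcome "guess alphabet 2"
w = 1.0 - np.eye(2)                           # 0-1 loss
pi = np.array([0.5, 0.5])                     # prior distribution

# R_M(i) = sum_j w(i, j) Tr(rho_i M_j);  r_{M,pi} = sum_i pi_i R_M(i)
R = np.array([sum(w[i, j] * np.trace(rho[i] @ M[j]).real
                  for j in range(2)) for i in range(2)])
r = float(pi @ R)
print(R, r)                                   # R = [0.  0.5], average risk 0.25
```

A Bayes POVM would minimize `r` over all POVMs; this fixed projective guess is generally not optimal for non-orthogonal states.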
Minimax POVM
- Alice chooses one alphabet $i \in \{1, 2, \dots, k\}$ and sends one quantum state from $\{\rho_1, \dots, \rho_k\} \subset \mathcal{S}(\mathcal{H})$.
- Bob has no prior information. He guesses the alphabet by choosing a measurement (POVM) $\{\mathrm{M}_j\}_{j=1,2,\dots,k}$ and minimizes the worst-case risk
$$r^*_{\mathrm{M}} := \sup_i R_{\mathrm{M}}(i).$$

Minimax POVM = POVM minimizing the worst-case risk.
Quantum Minimax Theorem (Simple Ver.)
Setting: Alice chooses one alphabet $i \in \{1, 2, \dots, k\}$ and sends one quantum state from $\{\rho_1, \dots, \rho_k\} \subset \mathcal{S}(\mathcal{H})$; Bob guesses the alphabet by choosing a measurement (POVM) $\{\mathrm{M}_j\}_{j=1,2,\dots,k}$.

Theorem (Hirota and Ikehara, 1982)
$$\inf_{\mathrm{M}} \sup_i R_{\mathrm{M}}(i) = \sup_\pi \inf_{\mathrm{M}} \sum_{i=1}^k \pi_i R_{\mathrm{M}}(i)$$

The minimax POVM agrees with the Bayes POVM w.r.t. the worst-case prior.
*The worst-case prior for Bob is called a least favorable prior.
Example (1/2)
Quantum states:
$$\rho_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad
\rho_2 = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 1/2 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad
\rho_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
0-1 loss: $w(i, j) = 1 - \delta_{ij}$.
Minimax POVM:
$$\mathrm{M}_1 = \frac12\begin{pmatrix} 1 + \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} & 0 \\ -\tfrac{1}{\sqrt{2}} & 1 - \tfrac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad
\mathrm{M}_2 = \frac12\begin{pmatrix} 1 - \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} & 0 \\ \tfrac{1}{\sqrt{2}} & 1 + \tfrac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad
\mathrm{M}_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
*This is a counterexample to Theorem 2 in Hirota and Ikehara (1982), whose proof seems not mathematically rigorous.
Example (2/2)
Quantum states: $\rho_1, \rho_2, \rho_3$ as above, with the 0-1 loss $w(i, j) = 1 - \delta_{ij}$.
LFP: $\pi_{LF}(1) = \pi_{LF}(2) = 1/2$, $\pi_{LF}(3) = 0$.
The minimax POVM is constructed as a Bayes POVM w.r.t. the LFP.

Important Implication
When the state is completely unknown, it is not necessarily wise to find an optimal POVM with the uniform prior. (Although statisticians have known this fact for long decades…)
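The example can be checked numerically. The matrices below are my reconstruction: $\mathrm{M}_1, \mathrm{M}_2$ form the Helstrom (optimal Bayes) measurement for the equal-weight pair $\{\rho_1, \rho_2\}$ on the span of the first two basis vectors, and $\mathrm{M}_3$ projects onto the third, consistent with the LFP $(1/2, 1/2, 0)$ stated on the slide:

```python
import numpy as np

s = 1.0 / np.sqrt(2.0)
rho = [np.diag([1.0, 0.0, 0.0]),
       np.array([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 0.0]]),
       np.diag([0.0, 0.0, 1.0])]
M1 = 0.5 * np.array([[1 + s, -s, 0], [-s, 1 - s, 0], [0, 0, 0.0]])
M2 = 0.5 * np.array([[1 - s,  s, 0], [ s, 1 + s, 0], [0, 0, 0.0]])
M3 = np.diag([0.0, 0.0, 1.0])

# POVM conditions: positivity and completeness
for Mj in (M1, M2, M3):
    assert np.min(np.linalg.eigvalsh(Mj)) > -1e-12
assert np.allclose(M1 + M2 + M3, np.eye(3))

# Risk under 0-1 loss: R(i) = 1 - Tr(rho_i M_i)
R = [1 - np.trace(rho[i] @ Mj).real for i, Mj in enumerate((M1, M2, M3))]
print(R)    # ≈ [0.1464, 0.1464, 0.0]; worst case (2 - sqrt(2))/4
```

The risks for alphabets 1 and 2 are equal and the risk for alphabet 3 is zero, so the worst-case risk equals the Bayes risk under the prior $(1/2, 1/2, 0)$, as the minimax theorem predicts.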
Rewritten in Technical Terms
1. Nature
Quantum statistical model $\{\rho_\theta\} \subset \mathcal{S}(\mathcal{H})$; parameter space $\theta \in \Theta = \{1, 2, \dots, k\}$.
2. Experimenter and Statistician
Decision space $U = \{1, 2, \dots, k\}$; POVM on $U$: $\{\mathrm{M}_u\}_{u \in U}$; loss function $w(\theta, u)$; risk function
$$R_{\mathrm{M}}(\theta) = \sum_{u \in U} w(\theta, u)\,\mathrm{Tr}\,\rho_\theta \mathrm{M}_u.$$

Theorem
$$\inf_{\mathrm{M}} \sup_\theta R_{\mathrm{M}}(\theta) = \sup_\pi \inf_{\mathrm{M}} \sum_\theta \pi_\theta R_{\mathrm{M}}(\theta)$$
Main Result (Brief Summary)
Quantum Minimax Theorem
$$\inf_{\mathrm{M} \in Po(U)} \sup_{\theta \in \Theta} R_{\mathrm{M}}(\theta) = \sup_{\pi \in \mathcal{P}(\Theta)} \inf_{\mathrm{M} \in Po(U)} \int_\Theta R_{\mathrm{M}}(\theta)\,\pi(\mathrm{d}\theta),$$
where
$$R_{\mathrm{M}}(\theta) := \int_U w(\theta, u)\,\mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u).$$
cf) classical minimax theorem:
$$\inf_{\delta \in D} \sup_\theta R_\delta(\theta) = \sup_{\pi \in \mathcal{P}(\Theta)} \inf_{\delta \in D} \int_\Theta R_\delta(\theta)\,\pi(\mathrm{d}\theta).$$
Conditions, assumptions, and the essence of the proof are explained below.
Previous Works
- Statistical Decision Theory: Wald (1950)
- Quantum Statistical Decision Theory: Holevo (1973)
- Recent results and applications (many!): Hayashi (1998), Guta and Jencova (2007), Kumagai and Hayashi (2013)
- (Classical) Minimax Theorem in statistical decision theory: Le Cam (1964); the first version was given by Wald (1950)

We show: the Quantum Minimax Theorem in quantum statistical decision theory.
2. Quantum Statistical Models and Measurements

Formal Definition of Statistical Models
(naïve) Quantum statistical model = a parametric family of density operators.
Ex.
$$\rho(\theta) = \frac12\begin{pmatrix} 1 + \theta_3 & \theta_1 - i\theta_2 \\ \theta_1 + i\theta_2 & 1 - \theta_3 \end{pmatrix},\qquad 0 \le (\theta_1)^2 + (\theta_2)^2 + (\theta_3)^2 \le 1,\quad \theta_1, \theta_2, \theta_3 \in \mathbb{R};$$
$$\sigma(r, s) := \rho(r) \otimes \rho(s),\qquad r, s \in \mathbb{R}.$$

Basic Idea (Holevo, 1976)
A quantum statistical model is defined by a map $\rho: \Theta \to L^1(\mathcal{H})$, $\theta \mapsto \rho(\theta)$.
Next, we see the required conditions for this map.
Regular Operator-Valued Function (1/2)
Definition
Let $\Theta$ be a locally compact metric space, $K \subseteq \Theta$ a compact set, and, for $\delta > 0$,
$$K_\delta := \{(\theta, \theta') : \theta, \theta' \in K,\ d(\theta, \theta') \le \delta\}.$$
$L^1(\mathcal{H})$: trace-class operators on a Hilbert space.
For a map $T: \Theta \to L^1(\mathcal{H})$, we define
$$\omega_1(T; K_\delta) := \inf\{\, \|X\|_1 : -X \le T(\theta) - T(\theta') \le X,\ \forall (\theta, \theta') \in K_\delta \,\}.$$

Definition
An operator-valued function $T$ is regular if, for every compact set $K$,
$$\lim_{\delta \to 0} \omega_1(T; K_\delta) = 0.$$

Regular Operator-Valued Function (2/2)
Remark 1
Regularity implies uniform continuity w.r.t. the trace norm on every compact set:
$$\sup_{(\theta, \theta') \in K_\delta} \|T(\theta) - T(\theta')\|_1 \to 0 \quad \text{as } \delta \to 0.$$
The converse does not hold if $\dim \mathcal{H} = \infty$.

Remark 2
The regularity assures that the following operator-valued integral is well-defined for every $f \in C_0(\Theta)$ and $\pi \in \mathcal{P}(\Theta)$:
$$\int_\Theta f(\theta)\, T(\theta)\,\pi(\mathrm{d}\theta).$$
Quantum Statistical Models
Definition
$$S(\mathcal{H}) := \{X \in L^1(\mathcal{H}) : X \ge 0,\ \mathrm{Tr}\,X = 1\}.$$
A map $\rho: \Theta \to S(\mathcal{H})$ is called a quantum statistical model if it satisfies the conditions below.

Conditions
1. Identifiability (one-to-one map): $\theta_1 \ne \theta_2 \Rightarrow \rho(\theta_1) \ne \rho(\theta_2)$.
2. Regularity: $\lim_{\delta \to 0} \omega_1(\rho; K_\delta) = 0$.
↑ Necessary for our arguments.
Measurements (POVMs)
$U$: decision space (locally compact space)
$\mathcal{A}_U$: Borel sets (the smallest sigma-algebra containing all open sets)

Definition
A positive-operator-valued function $\mathrm{M}$ on $\mathcal{A}_U$ is called a POVM if it satisfies
- $\mathrm{M}(B) \ge 0$ for every $B \in \mathcal{A}_U$;
- $\mathrm{M}\bigl(\bigcup_{j=1}^\infty B_j\bigr) = \sum_{j=1}^\infty \mathrm{M}(B_j)$ for $B_1, B_2, \dots \in \mathcal{A}_U$ with $B_i \cap B_j = \emptyset$ ($i \ne j$);
- $\mathrm{M}(U) = I$.

$Po(U)$: all POVMs over $U$.
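For a finite decision space the defining conditions reduce to positivity of each element and summation to the identity, which a small helper can check; this is a sketch of the definition above, with illustrative matrices of my own choosing:

```python
import numpy as np

def is_povm(elements, tol=1e-10):
    """Check the POVM conditions on a finite decision space:
    each element is positive semidefinite and they sum to the identity
    (countable additivity reduces to a finite sum here)."""
    d = elements[0].shape[0]
    positive = all(np.min(np.linalg.eigvalsh(M)) > -tol for M in elements)
    complete = np.allclose(sum(elements), np.eye(d), atol=tol)
    return positive and complete

M = [np.array([[0.7, 0.0], [0.0, 0.2]]),
     np.array([[0.3, 0.0], [0.0, 0.8]])]
print(is_povm(M))                # True: a valid (non-projective) POVM
print(is_povm([M[0], M[0]]))     # False: does not sum to the identity
```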
Born Rule
Axiom of Quantum Mechanics
For the quantum system described by $\rho$ and a measurement described by a POVM $\mathrm{M}(\mathrm{d}x)$, the measurement outcome is distributed according to
$$x \sim \mu(\mathrm{d}x) = \mathrm{Tr}\,\rho\,\mathrm{M}(\mathrm{d}x).$$
3. Quantum Statistical Decision Problems

Basic Setting
Situation: For the quantum system specified by the unknown parameter $\theta$, experimenters extract information for some purposes.
Typical sequence of tasks:
1. Choose a measurement (POVM) $\mathrm{M}(\mathrm{d}x)$.
2. Perform the measurement over the quantum system.
3. Take an action $a(x)$ among the available choices, based on the measurement outcome $x$.
The map $a$ is formally called a decision function.
Decision Functions
Examples
- Estimate the unknown parameter: $a(x) = \frac{1}{n}\sum_{i=1}^n x_i$.
- Estimate the unknown d.o. rather than the parameter: e.g.,
$$a(x) = \begin{pmatrix} a_1(x) & a_2(x) & 0 \\ a_2^*(x) & a_3(x) & 0 \\ 0 & 0 & 1 - a_1(x) - a_3(x) \end{pmatrix}, \qquad a(x) \ge 0.$$
- Validate the entanglement/separability: $a(x) \in \{0, 1\}$.
- Construct a confidence region (credible region): $[a_L(x), a_R(x)]$.
Remarks for Non-statisticians
1. If the quantum state is completely known, then the distribution of the measurement outcome is also known, and the best action can be chosen.
2. The action has to be made from finite data.
3. Precise estimation of the parameter is only a typical example.
Loss Functions
The performance of decision functions is compared by adopting loss functions (smaller is better).

Ex: Parameter Estimation
Quantum statistical model $\{\rho(\theta) : \theta \in \Theta \subseteq \mathbb{R}^m\}$; action space $U = \Theta$; loss function (squared error) $w(\theta, a) = |\theta - a|^2$.
Later we see the formal definition of the loss function.
Measurements Over Decisions
Basic Idea: From the beginning, we only consider POVMs over the action space.
Putting the measurement $\mathrm{M}(\mathrm{d}x)$ and the decision function $a$ together yields a POVM $\mathrm{N}(\mathrm{d}u)$ over decisions: $\mathrm{N}(B) = \mathrm{M}(a^{-1}(B))$.
Quantum Statistical Decision Problems
Formal Definitions
- $\Theta$: parameter space (locally compact space); quantum statistical model $\rho(\cdot)$.
- $U$: decision space* (locally compact space).
- Loss function (lower semicontinuous**, bounded below): $w: \Theta \times U \to \mathbb{R} \cup \{+\infty\}$.
The triplet $(\rho(\cdot), U, w)$ is called a quantum statistical decision problem.

cf) A (classical) statistical decision problem is the triplet $(p, U, w)$: statistical model $p_\theta(\mathrm{d}x)$, decision space $U$, loss function $w: \Theta \times U \to \mathbb{R} \cup \{+\infty\}$.

*We follow Holevo's terminology instead of saying "action space".
**We impose a slightly stronger condition on the loss than Le Cam and Holevo.
4. Risk Functions and Optimality Criteria

Comparison of POVMs
The loss (e.g. squared error) depends on both the unknown parameter $\theta$ and our decision $u$, which is a random variable:
$$u \sim \mu_\theta(\mathrm{d}u) = \mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u).$$
In order to compare two POVMs, we focus on the average of the loss w.r.t. the distribution of $u$: for a POVM $\mathrm{M}(\mathrm{d}u)$ we take $\mathrm{E}_\theta[w(\theta, u)]$ with $u \sim \mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u)$, and for a POVM $\mathrm{N}(\mathrm{d}u')$ we take $\mathrm{E}'_\theta[w(\theta, u')]$ with $u' \sim \mathrm{Tr}\,\rho(\theta)\mathrm{N}(\mathrm{d}u')$. The two are compared at the same $\theta$.
Risk Functions
Definition
The risk function of $\mathrm{M} \in Po(U)$:
$$R_{\mathrm{M}}(\theta) := \int_U w(\theta, u)\,\mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u) = \mathrm{E}_\theta[w(\theta, u)],\qquad u \sim \mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u).$$

Remarks for Non-statisticians
1. Smaller risk is better.
2. Generally there exists no POVM achieving the uniformly smallest risk among POVMs. Since $R_{\mathrm{M}}(\theta)$ depends on the unknown parameter, we need additional optimality criteria.
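Remark 2 is easy to see in the simplest discrete case. A sketch with two illustrative POVMs for discriminating two states under 0-1 loss (the setup is mine, chosen to make the risk functions cross):

```python
import numpy as np

# States indexed by theta in {1, 2}
rho = {1: np.array([[1.0, 0.0], [0.0, 0.0]]),    # |0><0|
       2: np.array([[0.5, 0.5], [0.5, 0.5]])}    # |+><+|

# POVM A always answers "1"; POVM B always answers "2".
A = [np.eye(2), np.zeros((2, 2))]
B = [np.zeros((2, 2)), np.eye(2)]

def risk(M, theta):
    # 0-1 loss: R_M(theta) = sum over wrong answers u of Tr(rho_theta M_u)
    return sum(np.trace(rho[theta] @ M[u]).real
               for u in range(2) if u + 1 != theta)

print([risk(A, t) for t in (1, 2)])    # [0.0, 1.0]
print([risk(B, t) for t in (1, 2)])    # [1.0, 0.0]
```

Each POVM beats the other at one parameter value, so neither has uniformly smallest risk; this is exactly why the minimax and Bayes criteria below are introduced.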
Optimality Criteria (1/2)
Definition
A POVM $\mathrm{M}^*$ is said to be minimax if it minimizes the supremum of the risk function, i.e.,
$$\sup_\theta R_{\mathrm{M}^*}(\theta) = \inf_{\mathrm{M} \in Po(U)} \sup_\theta R_{\mathrm{M}}(\theta).$$

Historical Remark
Bogomolov showed the existence of a minimax POVM (Bogomolov, 1981) in a more general framework. (His description of quantum states and measurements differs from the recent one.)
Optimality Criteria (2/2)
$\mathcal{P}(\Theta)$: all probability distributions on $\Theta$ (defined on the Borel sets). In Bayesian statistics, $\pi \in \mathcal{P}(\Theta)$ is called a prior distribution.

Definition
The average risk of $\mathrm{M} \in Po(U)$ w.r.t. $\pi$:
$$r_{\mathrm{M},\pi} := \int_\Theta R_{\mathrm{M}}(\theta)\,\pi(\mathrm{d}\theta).$$
A POVM $\mathrm{M}^\pi$ is said to be Bayes if it minimizes the average risk, i.e.,
$$r_{\mathrm{M}^\pi,\pi} = \inf_{\mathrm{M} \in Po(U)} r_{\mathrm{M},\pi}.$$

Historical Remark
Holevo showed the existence of a Bayes POVM (Holevo, 1976; see also Ozawa, 1980) in a more general framework.
Ex. of Loss Functions and Risk Functions
Parameter Estimation
$\{\rho(\theta) : \theta \in \Theta \subseteq \mathbb{R}^m\}$, $U = \Theta$, $w(\theta, u) = |\theta - u|^2$:
$$R_{\mathrm{M}}(\theta) = \int_U |\theta - u|^2\,\mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u) = \mathrm{E}_\theta[|\theta - u|^2].$$

Construction of a Predictive Density Operator
$\{\rho(\theta)^{\otimes n} : \theta \in \Theta \subseteq \mathbb{R}^p\}$, $U = S(\mathcal{H}^{\otimes m})$, $w(\theta, u) = D(\rho(\theta)^{\otimes m} \| u)$ (quantum relative entropy):
$$R_{\mathrm{M}}(\theta) = \int_U D(\rho(\theta)^{\otimes m} \| u)\,\mathrm{Tr}\,\rho(\theta)^{\otimes n}\mathrm{M}(\mathrm{d}u).$$
5. Main Result

Main Result (Quantum Minimax Theorem)
Theorem (*)
Let $\Theta$ and $U$ be compact metric spaces. Then the following equality holds for every quantum statistical decision problem:
$$\inf_{\mathrm{M} \in Po(U)} \sup_{\theta \in \Theta} R_{\mathrm{M}}(\theta) = \sup_{\pi \in \mathcal{P}(\Theta)} \inf_{\mathrm{M} \in Po(U)} \int_\Theta R_{\mathrm{M}}(\theta)\,\pi(\mathrm{d}\theta),$$
where the risk function is given by
$$R_{\mathrm{M}}(\theta) := \int_U w(\theta, u)\,\mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u).$$

Corollary
For every closed convex subset $Q \subseteq Po(U)$, the above assertion holds with $Po(U)$ replaced by $Q$.

*See arXiv:1410.3639 [quant-ph] for the proof.
Key Lemmas for the Theorem
Compactness of the set of POVMs (Holevo, 1976)
If the decision space $U$ is compact, then $Po(U)$ is also compact.

Equicontinuity of risk functions (Lemma by FT)
If the loss function $w: \Theta \times U \to \mathbb{R}$ is bounded and continuous, then $F := \{R_{\mathrm{M}} : \mathrm{M} \in Po(U)\} \subseteq C(\Theta)$ is (uniformly) equicontinuous. The equicontinuity implies the compactness of $F \subseteq C(\Theta)$ under the uniform-convergence topology.

We show the main theorem by using Le Cam's result together with both lemmas; it is, however, not a consequence of these previous results.
6. Minimax POVMs and Least Favorable Priors

Previous Works (LFPs)
- Statistical Decision Theory: Wald (1950)
- Jeffreys (1961): Jeffreys prior
- Bernardo (1979, 2005): reference prior; reference analysis
- Komaki (2011): latent information prior
- Tanaka (2012): MDP prior (reference prior for the pure-states model)

Objective priors in Bayesian analysis are indeed least favorable priors.
Main Result (Existence Theorem of LFP)
Definition
If a prior $\pi_{LF}$ achieves the supremum, i.e.,
$$\inf_{\mathrm{M} \in Po(U)} \int_\Theta R_{\mathrm{M}}(\theta)\,\pi_{LF}(\mathrm{d}\theta) = \sup_{\pi \in \mathcal{P}(\Theta)} \inf_{\mathrm{M} \in Po(U)} \int_\Theta R_{\mathrm{M}}(\theta)\,\pi(\mathrm{d}\theta),$$
where $R_{\mathrm{M}}(\theta) := \int_U w(\theta, u)\,\mathrm{Tr}\,\rho(\theta)\mathrm{M}(\mathrm{d}u)$, then the prior $\pi_{LF}$ is called a least favorable prior (LFP).

Theorem
Let $\Theta$ and $U$ be compact metric spaces and $w$ a continuous loss function. Then, for every decision problem, there exists an LFP.
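An LFP can also be searched for numerically in small problems. A sketch for two-state discrimination with 0-1 loss: for a prior $(\pi, 1-\pi)$ the Bayes risk has the closed Helstrom form $\frac12\bigl(1 - \|\pi\rho_1 - (1-\pi)\rho_2\|_1\bigr)$, so an LFP maximizes this over $\pi$, here by grid search (illustrative setup of my own, not from the talk):

```python
import numpy as np

rho1 = np.array([[1.0, 0.0], [0.0, 0.0]])
rho2 = np.array([[0.5, 0.5], [0.5, 0.5]])

def bayes_risk(pi):
    # Helstrom bound: minimal average error for the prior (pi, 1 - pi)
    gamma = pi * rho1 - (1 - pi) * rho2
    trace_norm = np.sum(np.abs(np.linalg.eigvalsh(gamma)))
    return 0.5 * (1.0 - trace_norm)

grid = np.linspace(0.0, 1.0, 10001)
risks = np.array([bayes_risk(p) for p in grid])
pi_lf = grid[np.argmax(risks)]
print(pi_lf, risks.max())    # LFP at pi = 1/2 for these states
```

The least favorable prior is the one a worst-case opponent would choose; by the minimax theorem, the maximal value found here equals the minimax risk.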
Pathological Example
Remark
Even on a compact space, a bounded lower semicontinuous (and not continuous) loss function does not necessarily admit an LFP.

Example: $\Theta = [0, 1]$, with a risk function $R_{\mathrm{M}}(\theta) = L(\theta, u)$ taking values in $[0, 1)$ with $\sup_\theta R_{\mathrm{M}}(\theta) = 1$. [Figure: graph of the risk function on $[0, 1]$.] Then
$$1 = \sup_\theta R_{\mathrm{M}}(\theta) = \sup_{\pi \in \mathcal{P}([0,1])} \int R_{\mathrm{M}}(\theta)\,\pi(\mathrm{d}\theta),$$
but for every prior $\pi$,
$$\int_{[0,1]} R_{\mathrm{M}}(\theta)\,\pi(\mathrm{d}\theta) < 1,$$
so no prior attains the supremum.
Minimax POVM is Constructed Using an LFP
Corollary
Let $\Theta$ and $U$ be compact metric spaces. If an LFP exists for a quantum statistical decision problem, then every minimax POVM is a Bayes POVM with respect to the LFP. In particular, if the Bayes POVM w.r.t. the LFP is unique, then it is minimax.

Corollary 2
For every closed convex subset $Q \subseteq Po(U)$, the above assertion holds.

Many theoretical results are derived from the quantum minimax theorem.
7. Summary

Discussion
1. We show the quantum minimax theorem, which gives a theoretical basis for quantum statistical decision theory, in particular for objective Bayesian analysis.
2. For every closed convex subset of POVMs, all of the assertions still hold. Our result thus also has practical meaning for experimenters.

Future Works
- Show other decision-theoretic results and previously known results by using the quantum minimax theorem.
- Propose a practical algorithm for finding a least favorable prior.
- Define a geometrical structure in the quantum statistical decision problem.

If you are interested in the details, please see arXiv:1410.3639 [quant-ph]; citations are also welcome!