DOCTOR OF PHILOSOPHY
Information theoretic parameters for graphs and operator systems
Boreland, Gareth
Award date: 2020
Awarding institution: Queen's University Belfast
Queen’s University Belfast
Doctoral Thesis
Information theoretic parameters for
graphs and operator systems.
Author:
Gareth Boreland
Supervisor:
Prof. Ivan Todorov
A thesis submitted for the degree of
Doctor of Philosophy
in the
Mathematical Sciences Research Centre,
School of Mathematics and Physics.
April 8, 2020
“For from him and through him and to him are all things.”
Romans 11:36.
Acknowledgements
It has been a great privilege for me, someone who is most certainly not an expert, to have had
the opportunity to learn from those who are. First and foremost I want to thank my super-
visor, Professor Ivan Todorov, for introducing me to this beautiful area of mathematics, and
for patiently guiding me through it. This thesis owes much to his expertise and enthusiasm.
I would also like to thank Professor Andreas Winter of Universitat Autònoma de Barcelona
for many helpful observations and numerous useful discussions on many different aspects of
this project. I must also thank Dr Peter Vrana of Budapest University of Technology and
Economics for many helpful comments.
To be part of the Mathematical Sciences Research Centre at Queen’s for the last four years
has been both a pleasure and a privilege, and my thanks go to all the staff, both academic
and non-academic, for making this experience so positive. I cannot mention every individual
here, but in particular I would like to thank Miss Sheila O’Brien for all her administration
work and Dr Ian Stewart for computer support.
Among the highlights of the course have been the fellowship and encouragement I have
enjoyed from my fellow PhD students, and I want to thank them all for their friendship over
these last years and to wish each of them well for the future.
My PhD studies were completed during a career-break from Sullivan Upper School, Holy-
wood. I want to thank all my friends and colleagues there, in particular Mr. Chris Peel, the
Headmaster, and the Board of Governors for facilitating this opportunity, and for allowing
me to return to school in 2020. I hope I can do so with renewed enthusiasm and energy.
I gratefully acknowledge receipt of the research studentship from the Department of Employ-
ment and Learning which has allowed me to complete this course.
Abstract
This thesis explores connections between information theory and graph theory. One such
connection, the notion of graph entropy, was first introduced by János Körner in [23]. Here we
prove a number of seemingly new results on graph entropy, including the determination of the
graph entropy of the odd cycles and their complements under certain probability distributions.
We recall how convex corners in Rd provide a useful tool for describing graph theoretic and
information theoretic ideas, and we develop a theory of convex corners in Md appropriate
for applications in quantum information. We recall from [13] how non-commutative graphs
can be regarded as generalisations of graphs. With a given non-commutative graph we
associate a number of convex corners and prove a ‘quantum sandwich theorem’. We define
several new parameters for non-commutative graphs, and show them to be generalisations of
corresponding graph parameters. This includes two quantum versions of the Lovász number,
one of which is seen to be an upper bound on Shannon capacity. Finally we return to examine
graph entropy in the case of a non-i.i.d. classical source, and generalise the Kolmogorov–Sinai
entropy of a dynamical system to this setting.
Contents
Acknowledgements ii
Abstract iii
Symbols viii
Introduction xii
1 Entropic quantities for the classical i.i.d. source 1
1.1 Shannon entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Entropy over a convex corner in Rd . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Bounds on the entropy over a convex corner. . . . . . . . . . . . . . . 6
1.2.3 Anti-blockers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Graph entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.1 Some graph theoretic preliminaries . . . . . . . . . . . . . . . . . . . . 13
1.3.2 Graph entropy and the problem of source coding without complete
distinguishability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.3 A lower bound on graph entropy . . . . . . . . . . . . . . . . . . . . . 20
1.4 Convex corners associated with a graph . . . . . . . . . . . . . . . . . . . . . 30
2 Convex corners in Md 33
2.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.2 Convex corners and anti-blockers in Md . . . . . . . . . . . . . . . . . . . . . 35
2.2.1 Definitions and basic properties . . . . . . . . . . . . . . . . . . . . . . 35
2.2.2 Examples of convex corners in Md . . . . . . . . . . . . . . . . . . . . 43
2.3 Reflexivity of Md-convex corners . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.3.1 The second anti-blocker theorem . . . . . . . . . . . . . . . . . . . . . 48
2.3.2 Consequences of reflexivity . . . . . . . . . . . . . . . . . . . . . . . . 55
2.4 Entropic quantities in quantum information theory . . . . . . . . . . . . . . . 58
2.4.1 Some background on quantum information . . . . . . . . . . . . . . . 58
2.4.2 Diagonal expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.4.3 Entropy over an Md-convex corner . . . . . . . . . . . . . . . . . . . . 69
3 Non-commutative graphs and associated convex corners 81
3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.1.1 Classical channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.1.2 Quantum measurement . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.1.3 Quantum channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.1.4 Non-commutative graphs . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.2 Convex corners from non-commutative graphs . . . . . . . . . . . . . . . . . . 96
3.2.1 The abelian, clique and full projection convex corners . . . . . . . . . 96
3.2.2 Embedding the classical in the quantum setting . . . . . . . . . . . . . 100
3.2.3 The Lovász corner . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.2.4 A quantum sandwich theorem . . . . . . . . . . . . . . . . . . . . . . . 120
4 Parameters for non-commutative graphs 124
4.1 Parameters for non-commutative graphs from convex corners . . . . . . . . . 124
4.1.1 Defining non-commutative graph parameters . . . . . . . . . . . . . . 125
4.1.2 Properties of non-commutative graph parameters . . . . . . . . . . . . 131
4.1.3 Non-commutative graph homomorphisms . . . . . . . . . . . . . . . . 134
4.1.4 Weighted parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.2 Operator anti-systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
4.3 Non-commutative graph entropy . . . . . . . . . . . . . . . . . . . . . . . . . 141
4.4 Another quantum generalisation of θ(G) . . . . . . . . . . . . . . . . . . . . . 146
4.5 Capacity bounds, the Witsenhausen rate and other limits . . . . . . . . . . . 151
4.6 Some examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
4.7 Further questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
5 The classical source with memory 174
5.1 Entropy and the source with memory . . . . . . . . . . . . . . . . . . . . . . 174
5.2 Graph theoretic preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
5.3 Graph entropy for the source with memory . . . . . . . . . . . . . . . . . . . 186
5.3.1 The graph G[B] and its graph entropy . . . . . . . . . . . . . . . . . . 187
5.3.2 The quantity h(G[B], T ) and its properties . . . . . . . . . . . . . . . . 193
5.3.3 Generalising the Kolmogorov–Sinai Theorem . . . . . . . . . . . . . . 195
5.4 Further questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
5.4.1 Finite subalgebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
5.4.2 Source coding with partial distinguishability . . . . . . . . . . . . . . . 204
5.4.3 Distinguishability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
5.4.4 Further generalisations . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
A Convexity and semi-continuity 207
B Linear algebra 213
C Source coding for the ergodic source 218
Symbols
For reference, listed here are some of the main symbols used, with the page number of either
their definition or first occurrence. In some cases to avoid a lengthy description, the reader
is referred to the main text for the appropriate definition.
R+ set of non-negative real numbers, p. 1
H(p) Shannon entropy of p, p. 2
[n] {1, . . . , n}, p. 1
X finite alphabet, p. 2
Pd set of probability distributions in Rd, p. 1
D(p‖q) relative entropy, p. 3
1 all ones vector, p. 1
Cd Rd-unit corner, p. 4
Bd Rd-unit cube, p. 4
HA(p) entropy of p ∈ Pd over convex corner A ⊆ Rd, p. 5
γ(A), A ⊆ Rd max{〈u,1〉 : u ∈ A}, p. 6
A[, A ⊆ Rd+ anti-blocker of A, p. 10
V (G) vertex set of G, p. 13
E(G) edge set of G, p. 13
G graph complement of G, p. 13
α(·) independence number, p. 13
ω(·) clique number, p. 13
χ(·) chromatic number, p. 14
ωf(·) fractional clique number, p. 18
χf(·) fractional chromatic number, p. 15
θ(·) Lovász number, p. 31
Kn complete graph on n vertices, p. 14
Kn empty graph on n vertices, p. 14
F ∗G co-normal product, p. 14
x ∼ y x is adjacent to y, p. 13
x ' y x is adjacent to or equal to y, p. 13
Gn nth co-normal power of G, p. 14
VP(G) vertex packing polytope of G, p. 16
FVP(G) fractional vertex packing polytope of G, p. 30
TH(G) Theta corner of G, p. 30
N(A), A ⊆ Rd max{β : β1 ∈ A}, p. 9
M(A), A ⊆ Rd see (1.15), p. 11
H(G, p) graph entropy of (G, p), p. 16
〈·, ·〉 inner product, p. 6
Cn cycle graph on n vertices, p. 23
C(G) see (1.34), p. 31
Mm,n(S) set of m× n matrices with entries in S, p. 213
Mm,n set of m× n matrices with entries in C, p. 213
Md set of d× d matrices with entries in C, p. 33
Dd set of diagonal d× d matrices with entries in C, p. 33
Mhd set of Hermitian d× d matrices, p. 33
M+d set of positive semi-definite d× d matrices, p. 33
M++d set of positive definite d× d matrices, p. 33
D+d set of diagonal d× d positive semi-definite matrices, p. 33
Id d× d identity matrix, p. 213
Jd d× d all ones matrix, p. 213
γ(A), A ⊆Md max{TrA : A ∈ A}, p. 57
TrA trace of A, p. 213
A∗, A ∈Mm,n Hermitian transpose of A, p. 213
dim(V ) dimension of V, p. 213
V ⊥ orthogonal complement of V, p. 214
‖A‖, A ∈Md operator norm of A, p. 214
‖A‖2, A ∈Md Hilbert–Schmidt norm of A, p. 214
ran(M) range of M, p. 214
ker(M) kernel of M, p. 214
⊗ tensor product, p. 216
δij Kronecker delta, p. 216
conv convex hull, p. 207
conv closure of convex hull, p. 42
her hereditary cover, p. 41
C(A) her(conv(A)), p. 43
φ canonical mapping Rd+ → D+d , p. 35
A], A ⊆M+d anti-blocker of A, p. 36
Rd set of states in Md, p. 59
AX , X ∈Mhd {M ∈M+d : Tr(MX) ≤ 1}, p. 44
BX , X ∈Mhd {M ∈M+d : M ≤ X}, p. 44
N(A), A ⊆Md max{β : βId ∈ A}, p. 57
M(A),A ⊆Md see (2.22), p. 57
DV set of matrices diagonal in basis V, p. 62
∆V diagonal expectation with respect to basis V, p. 62
∆ diagonal expectation with respect to canonical basis, p. 63
H(ρ) von Neumann entropy, p. 60
HA(ρ) entropy of ρ over convex corner A, p. 70
G ⊠ H strong product, p. 83
Gn nth strong power of G, p. 83
c(·) Shannon capacity, p. 83
R(·) Witsenhausen rate, p. 85
Φ∗ adjoint channel, p. 108
SG operator system associated to graph G, p. 93
SΦ operator system associated to channel Φ, p. 90
ap(S) abelian projection convex corner, p. 98
fp(S) full projection convex corner, p. 98
cp(S) clique projection convex corner, p. 98
th(S), thk(S) Lovász corner, kth Lovász corner, p. 109, p. 110
Pa,Pf ,Pc set of abelian, full, clique projections, p. 127, p. 121, p. 127
vp(G) diagonal convex corner corresponding to VP(G), p. 101
fvp(G) diagonal convex corner corresponding to FVP(G), p. 101
th(G) diagonal convex corner corresponding to TH(G), p. 108
P0(G) φ(C(G)), p. 108
P(G) see (3.45), p. 115
C(S),Ck(S) see (3.30), (3.33), p. 109, p. 110
P(S),Pk(S) see (3.32), (3.35), p. 109, p. 110
ω(S) full number, p. 126
Ωf(S) fractional clique covering number, p. 128
Ωf(S) fractional full covering number, p. 128
θ(S), θk(S) Lovász number, kth Lovász number, p. 126
Ω(S) clique covering number, p. 127
Ω(S) full covering number, p. 127
χs(T ) strong chromatic number, p. 138
Sd constant diagonal operator system, p. 165
ϑ(S) sup{‖I + T‖ : T ∈Md, I + T ≥ 0, T ∈ S⊥}, p. 146
ϑ(S) supd∈N ϑ(Md(S)), p. 146
λmin(A) smallest eigenvalue of A, p. 147
θ(S) second Lovász number, p. 147
F ·G lexicographic graph product, p. 182
at(B) set of atoms of finite sub-algebra B, p. 176
A0 time-0 sub-algebra, p. 176
A ∨ B σ-algebra generated by A ∪ B, p. 176
H(B) entropy of finite sub-algebra B, p. 177
H(B|C) conditional entropy of B given C, p. 177
h(B, T ) entropy of B relative to T, p. 177
h(T ) Kolmogorov–Sinai entropy of T, p. 177
G[B] distinguishability graph on at(B), p. 187
F the σ-algebra ∨∞n=−∞ TnA0, p. 175
F0 the algebra ⋃∞n=0 ∨ni=−n TiA0, p. 177
S set of finite sub-algebras of the form ∨nk=1 T−ikA0, p. 177
GZ0 infinite co-normal power of G0, p. 189
H(G[C|B], P ) conditional graph entropy of C given B, p. 189
h(G[B], T ) graph entropy of B relative to T, p. 193
h(G,T ) graph entropy of T, p. 195
Introduction
Connections between information theory and graph theory continue to grow, and this thesis
intends to explore just some of them. The relationship between the two fields has been
beneficial to both: from the beginnings of information theory in the work of Claude Shannon
([47], [48]) it was seen that some information theoretic ideas could naturally be expressed
in the language of graph theory, and many subsequent advances in information theory have
motivated new developments in graph theory. As an example of this mutual relationship
we need look no further than the problem of zero-error communication through the classical
channel. This problem was first discussed in [48] and is an important piece of background for
this thesis. Letting N denote a classical channel from finite alphabet X to finite alphabet Y,
we construct graph GN , known as the confusability graph for channel N , with vertex set X ,
and in which edges join elements of X which are confusable in the sense that their outputs
are not guaranteed to be distinct. For a single use of the channel, zero-error communication
is only possible if the sender is restricted to send only elements of X belonging to some
independent set in GN . The independence number of GN , denoted α(GN ), is then known as
the one-shot zero-error capacity of N . If the channel is used many times in succession, the
channel’s zero-error performance is described by the Shannon capacity of GN , given by
c(GN ) = limn→∞ α(GnN )1/n,
where Gn denotes the nth strong power of G. Since independence number is known to be
super-multiplicative over strong products, we have c(G) ≥ α(G); we note that equality does
hold here in some cases, for example for perfect graphs. However, the determination of the
Shannon capacity of a given graph is in general a notoriously difficult problem, and has led
to fruitful work in graph theory, often in finding bounds on c(G). The simplest graph for
which the determination of Shannon capacity proved to be problematic was C5, the cycle
graph on five vertices: in [48] Shannon left the calculation of c(C5) as an open problem. The
resulting story of mathematical discovery has given to C5 an important place in the history
of both graph theory and information theory. Clearly α(C5) = 2, and it is not hard to see
that α(C5²) = 5, so by super-multiplicativity c(C5) ≥ √5. A major breakthrough came in
László Lovász's paper [28], where he introduced the new graph parameter θ(G), now known
as the Lovász number, and proved that c(G) ≤ θ(G) for any graph G. Lovász showed that
θ(C5) ≤ √5, and thus established that c(C5) = θ(C5) = √5. However, it does not hold for all
graphs G that c(G) = θ(G), and though θ(C2n+1) is known for all n ∈ N, even the determination
of c(C7) remains an open problem, and attracts active research interest [37].
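The values α(C5) = 2 and α(C5²) = 5 quoted above are small enough to verify by exhaustive search. The following sketch (a standalone check, with strong-product adjacency defined coordinate-wise as in the text) confirms both:

```python
from itertools import combinations

def c5_adj(a, b):
    # adjacency in the 5-cycle C5 on vertices 0..4
    return (a - b) % 5 in (1, 4)

def strong_adj(u, v):
    # adjacency in the strong product C5 x C5: distinct vertices whose
    # coordinates are equal or adjacent in each factor
    return u != v and all(x == y or c5_adj(x, y) for x, y in zip(u, v))

def has_independent_set(vertices, adj, k):
    # exhaustive search: does the graph contain an independent set of size k?
    return any(all(not adj(u, v) for u, v in combinations(s, 2))
               for s in combinations(vertices, k))

v2 = [(a, b) for a in range(5) for b in range(5)]
assert has_independent_set(range(5), c5_adj, 2)        # alpha(C5) >= 2
assert not has_independent_set(range(5), c5_adj, 3)    # alpha(C5) = 2
assert has_independent_set(v2, strong_adj, 5)          # alpha(C5^2) >= 5
assert not has_independent_set(v2, strong_adj, 6)      # alpha(C5^2) = 5
```

The size-5 independent set found in C5² is (up to symmetry) the well-known pentagon construction {(i, 2i mod 5) : i ∈ [5]}, which yields the lower bound c(C5) ≥ √5.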
It has been shown for any graph G that the ‘sandwich theorem’
α(G) ≤ θ(G) ≤ χf(G) (1)
holds, where χf(G) is the fractional chromatic number of G, a parameter which itself will be
seen to have information theoretic significance.
Discussion of the sandwich theorem leads naturally to a consideration of convex corners,
the concept of which will be an important unifying theme through much of this thesis. A
convex corner A ⊆ Rd is a set which is non-empty, closed, convex, non-negative (meaning
that A ⊆ Rd+) and with the hereditary property that 0 ≤ b ≤ a ∈ A ⇒ b ∈ A, where the
ordering is coordinate-wise [22]. Given a convex corner A ⊆ Rd, we define the functional
γ(A) = max{∑di=1 vi : v ∈ A},
and also a set A[, known as the anti-blocker of A and itself a convex corner, given by
A[ = {v ∈ Rd+ : 〈v, u〉 ≤ 1 for all u ∈ A}.
Given a graph G, the convex hull of the characteristic vectors of the independent sets of
G forms a convex corner VP(G), known as the vertex packing polytope of G, and α(G) =
γ(VP(G)). The fractional vertex packing polytope of graph G is the convex corner FVP(G) =
VP(G)[ and satisfies γ(FVP(G)) = χf(G). Significantly, [17] showed how to associate with
graph G a convex corner TH(G) satisfying both γ(TH(G)) = θ(G) and
VP(G) ⊆ TH(G) ⊆ FVP(G). (2)
This result also justifies the name ‘sandwich theorem’, and it is clearly a stronger result
than (1), which is obtained from (2) by applying the functional γ. We note that a number
of important graph parameters with information theoretic significance can be expressed as
parameters of convex corners. One of the aims of this thesis is to explore further some of
these important links between graph theory, information theory and the theory of convex
corners.
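Parameters such as χf(G) = γ(FVP(G)) are optima of linear programs over the independent sets of G, so tiny instances can be computed directly. As an illustration (using scipy, with the maximal independent sets of C5 hard-coded), the following sketch recovers the known value χf(C5) = 5/2:

```python
import numpy as np
from scipy.optimize import linprog

# the maximal independent sets of the 5-cycle C5
ind_sets = [{0, 2}, {1, 3}, {2, 4}, {3, 0}, {4, 1}]

# covering matrix: A[v][j] = 1 iff vertex v belongs to independent set j
A = np.array([[1 if v in s else 0 for s in ind_sets] for v in range(5)])

# chi_f(G): minimise the total weight on independent sets subject to
# every vertex receiving total weight at least 1 (linprog uses A_ub x <= b_ub)
res = linprog(c=np.ones(5), A_ub=-A, b_ub=-np.ones(5), bounds=(0, None))
print(round(res.fun, 6))  # 2.5
```

The optimum puts weight 1/2 on each of the five maximal independent sets, so each vertex is covered exactly once and the total weight is 5/2.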
A central concept in information theory is entropy. For probability distribution p =
(p1, . . . , pn) describing the outcomes of an experiment, Shannon introduced the function
H(p) = −∑ni=1 pi log pi
as a measure of the amount of a priori ‘uncertainty’ in the outcome of the event, or equiv-
alently as a measure of the amount of ‘information’ gained by subsequently learning the
outcome. A now famous story [52] tells us that John von Neumann suggested to Shannon
that he call this quantity ‘entropy’, firstly because the same function had already been used
in statistical mechanics with that name, and secondly because no one fully understood it,
giving Shannon an advantage in any resulting debate! Shannon’s source coding theorem given
in [47] gives further information theoretic significance to H(p). He considered the case that
a source emits a sequence (Xi)∞i=1 where the Xi’s are independent, identically distributed
(i.i.d.) random variables taking values in some finite alphabet, each following probability
distribution p. Shannon shows that H(p) may be regarded as the ‘mean number of encoding
bits required per source letter sent’ when we encode with negligible probability of error.
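A minimal implementation of H(p) makes the ‘bits per source letter’ reading concrete (adopting the convention 0 log 0 = 0 stated in Chapter 1):

```python
from math import log2

def shannon_entropy(p):
    # H(p) = -sum_i p_i log2(p_i), with the convention 0 log 0 = 0
    return -sum(x * log2(x) for x in p if x > 0)

print(shannon_entropy([0.5, 0.5]))   # 1.0 : one bit per fair coin flip
print(shannon_entropy([1.0, 0.0]))   # 0.0 : no uncertainty
print(shannon_entropy([0.25] * 4))   # 2.0 : two bits for four equally likely outcomes
```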
It is instructive to review a number of different ways in which Shannon’s entropy function
has been generalised.
1. Given a convex corner A ⊆ Rd and probability distribution p = (p1, . . . , pd), [11] defined
the quantity HA(p), the entropy of p over convex corner A, and showed that H(p) can
be written in this form.
2. If a source is not i.i.d., then successively emitted symbols from alphabet X are not
independent random variables, but rather follow some arbitrary joint distribution. In
this case it is required to consider a probability measure P on the sample space Ω = X Z
equipped with a canonically defined σ-algebra F and shift function T : Ω → Ω. An
analysis of such a source can be found for example in [4] and [20], and defines for each
finite sub-algebra A ⊂ F the quantity h(A, T ), known as the entropy of A relative to
T . The Kolmogorov–Sinai entropy of the dynamical system (Ω,F , P, T ) is defined by
h(T ) = sup{h(A, T ) : A is a finite sub-algebra of F}.
The Kolmogorov–Sinai entropy is the generalisation of H(p) to this setting, and can be
shown to be invariant under isomorphisms, that is under certain structure preserving
mappings between dynamical systems. The well-known Kolmogorov–Sinai Theorem
states that if finite sub-algebra B generates F in the sense that F = ∨∞n=−∞ TnB, then
h(T ) = h(B, T ).
3. The discovery of quantum mechanics has led to the development of the field of quantum
information, where data is transmitted not by elements of an alphabet, but by quantum
states. In this context, instead of probability distributions in Rd, we consider positive
semi-definite operators ρ ∈Mn with unit trace, known as states. Here the appropriate
generalisation of H(p) is the von Neumann entropy, given by
H(ρ) = −Tr(ρ log ρ).
4. In [23] János Körner considers the source coding problem in the case of an i.i.d. source
with probability distribution p on alphabet X whose symbols are not all distinguishable.
A graph G is constructed on vertex set X in which vertices are adjacent precisely when
they are distinguishable. In this setting Körner shows that the source coding problem
is solved by the quantity H(G, p), which he calls graph entropy. In [11] it is shown that
H(G, p) = HVP(G)(p),
where HVP(G)(p) is the entropy over the vertex packing polytope, giving another ex-
ample of the links between graph theory, information theory and convex corners.
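Each generalisation above reduces to H(p) in the appropriate degenerate case. For the von Neumann entropy of item 3 this is easy to check numerically: the sketch below (plain numpy, diagonalising ρ) shows H(ρ) = H(p) for ρ = diag(p), H(ρ) = 0 for a pure state, and H(ρ) = log d for the maximally mixed state.

```python
import numpy as np

def von_neumann_entropy(rho):
    # H(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]      # convention 0 log 0 = 0
    return float(-np.sum(evals * np.log2(evals)))

p = np.array([0.5, 0.25, 0.25])
print(von_neumann_entropy(np.diag(p)))    # Shannon entropy H(p) = 1.5

psi = np.array([[1.0], [0.0], [0.0]])
print(von_neumann_entropy(psi @ psi.T))   # pure state: entropy 0
print(von_neumann_entropy(np.eye(3) / 3)) # maximally mixed: log2(3)
```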
Where our exploration leaves the classical case and enters the field of quantum infor-
mation, our starting point is [13], where the notion of the confusability graph of a classical
channel is generalised to the quantum channel. Central to this theory is the idea of an op-
erator system, namely a self-adjoint, unital subspace of Md. It is shown in [13] how, for
a given quantum channel Φ, we can construct an operator system SΦ, and also that every
operator system can be associated in this way with some quantum channel. Furthermore,
[13] shows how to construct from graph G an operator system SG from which G can be recov-
ered, and also how the classical channel can naturally be embedded in the operator system
approach. For these reasons operator systems are known as non-commutative graphs (we use
the terms interchangeably) and are regarded as ‘quantum’ generalisations of graphs; indeed
the same objects are sometimes, but not here, known as quantum graphs [55]. From there
[13] proceeds to define for an operator system S the independence number α(S) and the
Shannon capacity c(S), and in [35] a definition of the chromatic number χ(S) is given. These
parameters are noted to generalise the corresponding graph parameters in the sense that
α(SG) = α(G), c(SG) = c(G) and χ(SG) = χ(G) for all graphs G. For a non-commutative
graph S, [13] also defines the parameter ϑ(S) which is shown to be a generalisation of the
Lovasz number of a graph in that for any graph G
ϑ(SG) = θ(G).
The parameter ϑ(S) is also shown to be an upper bound on the Shannon capacity c(S).
Finding bounds on c(S) continues to be an important area of research [27]. We note that [21]
and [51] have both given quantum versions of (1), but these results have not been related to
convex corners as in (2).
From this brief summary of some existing links between information theory, graph theory
and the theory of convex corners, the motivation for this study is hopefully clear. Some of
the specific questions which we wish to pursue are the following:
• By analogy with the theory of convex corners in Rd, which we now call Rd-convex
corners, can we develop a theory of Md-convex corners, that is convex corners whose
elements lie in Md, and do they have quantum information theoretic significance?
• In particular, is there a quantum information theoretic version of (2)?
• What other parameters for non-commutative graphs can be defined that naturally gen-
eralise corresponding parameters of graphs?
• In the quantum case, can we develop a theory of the entropy of a state over anMd-convex
corner, and can a quantum version of graph entropy be defined for non-commutative
graphs?
• In the classical case, can the notion of graph entropy be extended for the non-i.i.d.
source?
The reader may find the following brief overview of the thesis to be useful.
Chapter 1 concerns the classical setting and begins with some background on entropy,
graph entropy and Rd-convex corners. We recall from [50] for a given graph G that
maxp∈P H(G, p) = logχf(G),
where P is the set of probability distributions on V (G). Noting that H(G, p) = HVP(G)(p),
we prove the corresponding result for the entropy over any convex corner A ⊆ Rd, namely
that
maxp∈Pd HA(p) = log γ(A[), (3)
where Pd denotes the set of probability distributions in Rd. On the quantity HA(p) we
establish the straightforward lower bound
HA(p) ≥ H(p)− log γ(A),
and we give a necessary and sufficient condition for equality. An immediate corollary of this
result is that for a graph G with probability distribution p on its vertex set,
H(G, p) ≥ H(p)− logα(G). (4)
From (4) we are able to calculate the graph entropy of the odd cycles and their complements
under certain classes of probability distributions, along with a number of other new results.
In Chapter 2 we introduce the concept of a convex corner in Md, a subset of Md that
is non-empty, closed, convex, non-negative and hereditary. Motivated by the Rd case, for
Md-convex corner B ⊆Md we define
γ(B) = max{TrB : B ∈ B},
and the anti-blocker B] by
B] = {M ∈M+d : 〈M,B〉 ≤ 1 for all B ∈ B}.
We introduce HB(ρ), the entropy of state ρ ∈Md over Md-convex corner B, and show that
max{HB(ρ) : ρ ∈Md, ρ is a state} = log γ(B]).
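A small numerical sanity check of the pairing defining B]: for the Md analogue of the unit corner, B = {B ∈ M+d : TrB ≤ 1} (this choice of corner is illustrative, not a definition from the thesis), any M with 0 ≤ M ≤ I satisfies 〈M,B〉 = Tr(MB) ≤ ‖M‖TrB ≤ 1, which the sketch verifies on random positive semi-definite matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

def random_psd(d):
    # random positive semi-definite d x d matrix
    X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return X @ X.conj().T

for _ in range(100):
    B = random_psd(d)
    B = B / np.trace(B).real        # normalise so Tr B = 1 (B lies in the corner)
    M = random_psd(d)
    M = M / np.linalg.norm(M, 2)    # normalise so ||M|| = 1, i.e. 0 <= M <= I
    assert np.trace(M @ B).real <= 1 + 1e-9   # anti-blocker pairing <M, B> <= 1
print("pairing inequality holds on 100 random samples")
```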
For a convex corner A ⊆ Rd it is known that
(A[)[ = A,
the so-called second anti-blocker theorem [22]. A significant section of Chapter 2 is devoted
to proving the corresponding result for an Md-convex corner B, namely that
(B])] = B.
This chapter also introduces the diagonal expectation operator ∆ which will be used later to
examine relationships between convex corners in Rd and Md.
After reviewing the theory of the classical channel and its associated confusability graph,
Chapter 3 examines the quantum channel and gives the necessary background from [13] on
non-commutative graphs. The main focus of this chapter is to develop a generalisation of
(2) in the quantum case. For a non-commutative graph S we define the Md-convex corners
ap(S), cp(S), fp(S) and th(S), known respectively as the abelian projection convex corner,
clique projection convex corner, full projection convex corner and Lovasz corner of S. It is
shown that ap(S) is a generalisation of VP(G) in the sense that VP(G) is related in a natural
way to ap(SG). Similarly, cp(S)] and fp(S)] are quantum generalisations of FVP(G), as is
th(S) of TH(G). The chapter concludes by proving that for any non-commutative graph S
the following ‘quantum sandwich theorem’ holds:
ap(S) ⊆ th(S) ⊆ fp(S)]. (5)
Just as many graph parameters can be defined in terms of Rd-convex corners such as
VP(G),TH(G) and FVP(G), Chapter 4 shows how to define parameters for a given non-
commutative graph S ⊆ Md in terms of the Md-convex corners we associated with S in
Chapter 3. For instance, the independence number α(S) as given in [13] can be defined by
α(S) = γ(ap(S)), and we also show how the chromatic number χ(S) as given in [35] can
be written as a functional defined on ap(S). We use functionals on convex corners to define
a number of new parameters for non-commutative graphs. These include the fractional
chromatic number χf(S), the fractional clique number ωf(S), the clique covering number
Ω(S) and the fractional clique covering number Ωf(S). Each is shown to generalise the
corresponding graph parameter in the sense that when evaluated for the operator system
SG, the corresponding graph parameter of graph G is obtained. We know that the result
ωf(G) = χf(G) holds for all graphs, and we show for a non-commutative graph S that
ωf(S) = χf(S). By analogy with graph entropy, we define the non-commutative graph entropy
of operator system S ⊆Md and state ρ ∈Md by
H(S, ρ) = Hap(S)(ρ),
and we obtain the result
max{H(S, ρ) : ρ ∈Md, ρ is a state} = logχf(S),
showing the significance of the fractional chromatic number. From (5) we immediately have
that γ(ap(S)) ≤ γ(th(S)) ≤ γ(fp(S)]). The quantity γ(fp(S)]) will be known as the fractional
full covering number of S and denoted by Ωf(S). We call γ(th(S)) the Lovász number of S,
and denote it by θ(S); this definition is justified by the fact that θ(SG) = θ(G) when G is a
graph. This leads to the result
α(S) ≤ θ(S) ≤ Ωf(S),
a quantum generalisation of (1). We note in general that θ(S) ≠ ϑ(S). Furthermore, it
is not known whether θ is sub-multiplicative over tensor products, and consequently we do
not know if θ(S) upper bounds c(S). A number of equivalent expressions for θ(G) are given
in [28], and one of these motivates the definition of θ(S), another non-commutative graph
parameter. It satisfies θ(SG) = θ(G) and crucially is shown to be sub-multiplicative and thus
to be an upper bound on c(S). We also show how the classical concept of the Witsenhausen
rate of a graph can be generalised for non-commutative graphs.
Chapter 5 is a return to the classical setting and attempts to extend the concept of graph
entropy to the case of the ‘non-i.i.d.’ source. We begin with the dynamical system (Ω,F , P, T )
and construct an infinite graph G with vertex set Ω to describe a distinguishability relation
on Ω. For a finite sub-algebra B ⊂ F we form a finite graph G[B], and then define a quantity
h(G[B], T ), the graph entropy of B relative to T . The graph entropy of the resulting system
is then defined by
h(G,T ) = sup h(G[B], T ),
where the supremum is taken over a certain class of finite sub-algebras, and is shown to be
invariant under a notion of isomorphism which we will specify. In this setting the analysis of
the Bernoulli shift reduces, as we would expect, to the problem addressed by Korner in [23],
and we also consider the Markov shift. The problem of generalising the Kolmogorov–Sinai
Theorem to this context is discussed.
Necessary results and concepts from information theory, both classical and quantum, will
be developed from scratch, but a basic knowledge of linear algebra, probability theory and
analysis will be assumed. Some relevant background material is included in appendices A
and B. Much of the work in Chapter 1 on graph entropy has previously been published in
[5]. Chapters 3 and 4 contain joint work which has been published in [6].
Chapter 1
Entropic quantities for the classical
i.i.d. source
In 1948 Claude Shannon laid the foundations of information theory in his landmark paper [47].
A central concept of [47], and also of this chapter, is entropy. Some relevant preliminaries will
be introduced as needed, but the reader can find a review of necessary background material
on convexity and semi-continuity in Appendix A.
Entropic quantities typically involve the logarithm function, and unless stated otherwise
it will be assumed throughout that all logarithms are binary. The following conventions will
be used: log 0 = −∞, 0 log 0 = 0, and
pi log(pi/qi) = 0 if pi = 0,  while  pi log(pi/qi) = ∞ if pi > 0 and qi = 0.
The logarithm function is strictly concave, a fact upon which many fundamental information
theoretic results rely.
We use the notation [d] := {1, . . . , d}. We will denote the vector v ∈ Rd with ith coordinate
vi by (vi)i∈[d], or just (vi) where context allows, and we write v > 0 when vi > 0 for all
i ∈ [d]. Similarly, by v ≥ 0 we mean vi ≥ 0 for all i ∈ [d]. If v − w ≥ 0 for v, w ∈ Rd, we
write v ≥ w. The set of all probability distributions in Rd will be denoted by Pd. We write
Rd+ = {v ∈ Rd : v ≥ 0} for the set of non-negative elements of Rd, and we let 1 ∈ Rd denote
the all-ones vector.
The next well-known result, known as the ‘log-sum’ lemma, follows from the convexity of
the function x → x log x, and is a useful property of the logarithm function.
Lemma 1.0.1. ([10, Theorem 2.7.1].) When pi ≥ 0 and vi ≥ 0, i = 1, . . . , n, then
∑_{i=1}^n pi log(pi/vi) ≥ (∑_{i=1}^n pi) log( ∑_{i=1}^n pi / ∑_{i=1}^n vi ).
Equality holds if and only if there exists λ ∈ R such that vi/pi = λ for all i = 1, . . . , n.
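As a quick numerical illustration (a small Python sketch, not part of the formal development; the function names are ours), the inequality and its equality condition can be checked directly:

```python
import math

def log_sum_lhs(p, v):
    """Left-hand side of Lemma 1.0.1: sum_i p_i log2(p_i / v_i), with 0 log 0 = 0."""
    return sum(pi * math.log2(pi / vi) for pi, vi in zip(p, v) if pi > 0)

def log_sum_rhs(p, v):
    """Right-hand side: (sum_i p_i) log2(sum_i p_i / sum_i v_i)."""
    return sum(p) * math.log2(sum(p) / sum(v))

p = [0.5, 0.3, 0.2]
v = [0.2, 0.2, 0.1]
assert log_sum_lhs(p, v) >= log_sum_rhs(p, v)

# Equality case: v_i / p_i constant (here lambda = 2) makes both sides equal (-1).
v_eq = [2 * pi for pi in p]
assert abs(log_sum_lhs(p, v_eq) - log_sum_rhs(p, v_eq)) < 1e-12
```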
This chapter begins by recalling the definition of Shannon entropy and its connection to
the source coding theorem for the noiseless channel. We also recall the concept of a convex
corner in Rd, and then discuss the notion of entropy over a convex corner, as introduced in
[11]. We prove new upper and lower bounds on the entropy of a probability distribution over
a convex corner. We then discuss the vertex packing polytope, a widely studied convex corner
associated to a given graph, before proving some new results on graph entropy, a quantity
introduced by Körner in [23] and given by the entropy over the vertex packing polytope
of a given graph. We discuss two more convex corners associated to a graph, namely the
fractional vertex packing polytope and the theta corner, and we recall how the latter leads
to a definition of the Lovász number of a graph as given in [28].
1.1 Shannon entropy
Throughout this thesis we will discuss a number of entropic quantities, but the most funda-
mental is the well-known Shannon entropy.
Definition 1.1.1. Given p = (p1, . . . , pd) ∈ Pd, the Shannon entropy (or just entropy), H(p),
is given by
H(p) = −∑_{i=1}^d pi log pi.
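In computational terms, and using the conventions fixed above (in particular 0 log 0 = 0), H(p) is a one-line function. The following Python sketch (ours, purely for illustration) evaluates it on a few distributions:

```python
import math

def shannon_entropy(p):
    """H(p) = -sum_i p_i log2 p_i, using the convention 0 log 0 = 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

assert shannon_entropy([1.0, 0.0]) == 0.0               # a deterministic source
assert abs(shannon_entropy([0.5, 0.5]) - 1.0) < 1e-12   # one bit per symbol
assert abs(shannon_entropy([0.25] * 4) - 2.0) < 1e-12   # log 4 = 2 bits
```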
Shannon’s source coding theorem for the noiseless channel is proved in [47] and attaches
an information theoretic meaning to H(p) which we now briefly discuss. Consider a finite
alphabet X = {1, . . . , d}. A source emits a sequence (Xi)_{i=1}^∞ where the Xi's are independent,
identically distributed (i.i.d.) random variables taking values in X and each following probability
distribution p ∈ Pd. Such a source is referred to as discrete, memoryless and stationary.
We equip X^k with the product probability distribution p^k, where p^k(x1, . . . , xk) = ∏_{i=1}^k p_{xi}.
A k-to-m binary code is an injection X^k → {0, 1}^m, encoding ‘words’ of length k as binary
strings of length m. Shannon’s source coding theorem states that for all λ ∈ (0, 1),
H(p) = lim_{k→∞} (1/k) log( min{|E| : E ⊆ X^k, p^k(E) ≥ 1 − λ} ). (1.1)
This means H(p) can be intuitively understood to be the ‘long-term mean number of encoding
bits required per source letter’ with negligible probability of error. We explain this idea more
fully in Section 1.3.2, where we discuss a generalisation of this problem.
For p, q ∈ Pd, the relative entropy D(p‖q) is given by
D(p‖q) = ∑_{i=1}^d pi log(pi/qi).
By Lemma 1.0.1, it holds that D(p‖q) ≥ 0, with equality if and only if p = q (see [10], [20]).
The relative entropy can be rewritten in terms of the Shannon entropy H(p) as
D(p‖q) = −H(p) − ∑_{i∈X} pi log qi ≥ 0. (1.2)
We denote by u = (ui) ∈ Pd the uniform distribution given by ui = 1/d, i = 1, . . . , d. Putting
q = u in (1.2) gives that
0 ≤ H(p) ≤ H(u) = log d (1.3)
for all probability distributions p ∈ Pd. Also note from (1.2) that H(p) ≤ −∑_{i∈X} pi log qi for
all q ∈ Pd. Setting q = p gives equality, and thus,
H(p) = min_{q∈Pd} ( −∑_{i∈X} pi log qi ). (1.4)
This expression for H(p), equivalent to Definition 1.1.1, motivates the introduction of more
general entropic quantities, of which H(p) is a special case. This approach was first used in
[11], and is outlined in the next section.
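Before moving on, the two facts just used — non-negativity of D(p‖q) and the variational formula (1.4) — can be illustrated numerically. The sketch below (our own Python, not part of the thesis) checks both on a small example:

```python
import math

def shannon_entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def relative_entropy(p, q):
    """D(p||q) = sum_i p_i log2(p_i/q_i); infinite when p_i > 0 = q_i."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total

p, q = [0.5, 0.25, 0.25], [1 / 3, 1 / 3, 1 / 3]
assert relative_entropy(p, q) >= 0 and relative_entropy(p, p) == 0

# The variational formula (1.4): -sum_i p_i log q_i is minimised at q = p,
# where it equals H(p).
cross = lambda p, q: relative_entropy(p, q) + shannon_entropy(p)
assert abs(cross(p, p) - shannon_entropy(p)) < 1e-12
assert cross(p, q) >= cross(p, p)
```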
1.2 Entropy over a convex corner in Rd
1.2.1 Definitions
Here we recall the definition and some basic properties of the new concept of entropy intro-
duced in [11]. We begin with some definitions.
We define Cd, the ‘Rd-unit corner’, by letting
Cd = {v ∈ Rd : v ≥ 0, ∑_{i=1}^d vi ≤ 1},
and Bd, the ‘Rd-unit cube’, by letting
Bd = {v ∈ Rd : 0 ≤ vi ≤ 1 for all i ∈ [d]}.
If a set A satisfies A ⊆ Rd+, then we say A is non-negative. We say set A ⊆ Rd+ is hereditary
if w ∈ A for all w ∈ Rd+ satisfying w ≤ v for some v ∈ A. Fundamental to the approach given
in [11] is the concept of a convex corner.
Definition 1.2.1. ([11], [22].) A set A ⊆ Rd will be called an Rd-convex corner (or just
a convex corner where context allows) if it is closed, convex, non-empty, non-negative and
hereditary. If in addition the convex corner A is bounded and has non-empty interior, then
A will be called a standard convex corner.
Note that bounded convex corners are compact sets.
Lemma 1.2.2. Let A be an Rd-convex corner. The following are equivalent:
(i) A has a non-empty interior;
(ii) there exists r > 0 such that r1 ∈ A;
(iii) A contains a strictly positive element.
Proof. (i)⇒(ii) Suppose that A has non-empty interior. Writing B(a, δ) for the open ball with
centre a and radius δ, let a ∈ A and δ > 0 be such that B(a, δ) ⊆ A. Then a + (δ/(2√d))1 ∈ A
and, since A is hereditary and a ≥ 0, we have (δ/(2√d))1 ∈ A.
(ii)⇒(iii) is trivial.
(iii)⇒(i) Let b ∈ A be strictly positive. Setting r = min_{i∈[d]} bi, it holds that c ≤ b
for all c ∈ B(0, r). By the hereditarity of A it follows that Rd+ ∩ B(0, r) ⊆ A. Now let
B0 = B( (r/(2√d))1, r/(2√d) ). It is easy to see that B0 ⊆ Rd+. For all x ∈ B0, by the triangle
inequality
‖x‖ ≤ ‖x − (r/(2√d))1‖ + ‖(r/(2√d))1‖ ≤ r/(2√d) + r/2 ≤ r.
Then B0 ⊆ Rd+ ∩ B(0, r) ⊆ A, and A has non-empty interior.
It is clear that Cd and Bd are standard Rd-convex corners. The following result is given
in [11]. For completeness, a full proof is supplied here.
Lemma 1.2.3. For any probability distribution p ∈ Pd and bounded convex corner A ⊆ Rd+,
the function f : A → R ∪ {∞} given by f(v) = −∑_{i=1}^d pi log vi attains a minimum value f(a)
for some a ∈ A. If f(v) < ∞ for some v ∈ A (in other words, the minimum is finite) and if
pi > 0, then the coordinate ai of the minimising vector a is uniquely determined.
Proof. For a convex corner A, let
A0(p) = {v ∈ A : ∀i ∈ [d], pi > 0 ⇒ vi > 0}.
Observe that f(v) is finite for all v ∈ A0(p), and that the restriction of f to A0(p) is
continuous. However, f(v) = ∞ for all v ∈ A\A0(p), and it is clear that lim_{a→v} f(a) = ∞ for
all v ∈ A\A0(p). This implies that f is lower semi-continuous on A. Since A is bounded and,
as a convex corner, closed, it is compact. The first assertion then follows from Theorem A.0.6.
Now for a ∈ A such that f(a) = min_{v∈A} f(v) < ∞, it is required to show that ai is
uniquely determined when pi > 0. Consider b ∈ A with f(b) = f(a). Then (a+b)/2 ∈ A by
convexity, and f((a+b)/2) = −∑_{i=1}^d pi log((ai+bi)/2). The log function is strictly concave, and so for
i = 1, . . . , d, we have
−pi log((ai + bi)/2) ≤ −(pi/2) log ai − (pi/2) log bi < ∞, (1.5)
with equality only if pi = 0 or ai = bi. However, since f(a) = f(b) = min_{v∈A} f(v), it follows
that f((a+b)/2) ≥ f(a). This implies for each i ∈ [d] that the first inequality in (1.5) is an
equality, and so for each i = 1, . . . , d we have pi = 0 or ai = bi.
Motivated by Lemma 1.2.3, we now recall the definition of the entropy over a convex
corner, as found in [11].
Definition 1.2.4. The entropy of probability distribution p ∈ Pd over the bounded Rd-convex
corner A is given by
HA(p) = min_{a∈A} ( −∑_{i=1}^d pi log ai ). (1.6)
Remark 1.2.5. Using the notation of Lemma 1.2.3, we see that if A0(p) = ∅, then f(v) = ∞
for all v ∈ A and so HA(p) = ∞. As a trivial example, it is clear that H{0}(p) = ∞ for all
p ∈ Pd. Indeed, if A has empty interior, then there exists i ∈ [d] such that vi = 0 for all
v ∈ A. Then for p ∈ Pd with pi > 0, it holds that A0(p) = ∅ and thus HA(p) = ∞. Whereas,
if A is bounded and has non-empty interior, then A0(p) ≠ ∅ and HA(p) is finite for all p ∈ Pd.
If a ≤ b with a, b ∈ Rd+, then −∑_{i=1}^d pi log ai ≥ −∑_{i=1}^d pi log bi. Thus the minimising
vector in (1.6) can be chosen as a vector that cannot be majorised coordinate-wise in A.
In the case where A = Cd, the minimising vector a in (1.6) can then be chosen to satisfy
∑_{i=1}^d ai = 1. Then by this fact and (1.4),
HCd(p) = min_{a∈Pd} ( −∑_{i=1}^d pi log ai ) = H(p). (1.7)
This shows, as [11] notes, that Shannon entropy is a special case of this more general entropy
concept.
Since 1 ∈ Bd, it is immediate that HBd(p) = 0 for all p ∈ Pd. By Definition 1.2.4, if A
and B are bounded convex corners with A ⊆ B, then HB(p) ≤ HA(p) for all p ∈ Pd. The
following lemma is an immediate consequence of these observations.
Lemma 1.2.6. If a convex corner A satisfies Cd ⊆ A ⊆ Bd, then
0 ≤ HA(p) ≤ H(p) for all p ∈ Pd.
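The two extreme cases in Lemma 1.2.6 can be checked numerically. In the sketch below (illustrative Python, with function names of our choosing), the value H(p) given by (1.7) for A = Cd is compared against randomly sampled points of Cd, and the all-ones vector witnesses HBd(p) = 0:

```python
import math
import random

def objective(p, a):
    """The function minimised in (1.6): -sum_i p_i log2 a_i (with 0 log 0 = 0)."""
    return -sum(pi * math.log2(ai) for pi, ai in zip(p, a) if pi > 0)

random.seed(0)
d, p = 4, [0.4, 0.3, 0.2, 0.1]
H_p = objective(p, p)        # = H(p), the minimum over C_d by (1.7)

# No sampled point of the simplex face of C_d beats a = p:
for _ in range(1000):
    raw = [random.random() + 1e-12 for _ in range(d)]
    a = [x / sum(raw) for x in raw]
    assert objective(p, a) >= H_p - 1e-12

# The all-ones vector lies in B_d, so H_Bd(p) = 0 <= H(p):
assert objective(p, [1.0] * d) == 0.0
```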
1.2.2 Bounds on the entropy over a convex corner.
For a general convex corner A and a probability distribution p, a direct calculation of HA(p)
is likely to be difficult. Finding upper and lower bounds, as in Lemma 1.2.6, will often be a
useful way to proceed. In this subsection we state and prove Theorem 1.2.9, a lower bound
on HA(p), and Theorem 1.2.13, an upper bound on HA(p). (We will note that [24, (14)] is in
fact a special case of Theorem 1.2.9, and [50, Lemma 4] is a special case of Theorem 1.2.13.)
For u, v ∈ Rd we use the inner product given by 〈u, v〉 = ∑_{i=1}^d uivi and the associated
norm ‖u‖ = √〈u, u〉. We note that ∑_{i=1}^d ui = 〈u, 1〉.
Definition 1.2.7. For a bounded Rd-convex corner A, we define
γ(A) = max{〈u, 1〉 : u ∈ A}. (1.8)
It is clear that γ(A) = 0 if and only if A = {0}. If an Rd-convex corner A is unbounded, we
define γ(A) = ∞.
Remark 1.2.8. The existence of the maximum in (1.8) follows from the extreme value theorem,
recalling that a bounded convex corner is compact and the function u→ 〈u,1〉 is continuous.
Lemma 1.0.1 yields the following lower bound on HA(p) in terms of the Shannon entropy
H(p) and the quantity γ(A).
Theorem 1.2.9. Let A be a bounded Rd-convex corner. For all p ∈ Pd,
HA(p) ≥ H(p) − log γ(A).
Equality holds if and only if γ(A)p ∈ A. In the case of equality, v = γ(A)p is the unique
vector in A satisfying −∑_{i=1}^d pi log vi = HA(p).
Proof. Using Lemma 1.2.3, choose u ∈ A satisfying HA(p) = −∑_{i=1}^d pi log ui. Lemma 1.0.1
and the definition of γ(A) give
∑_{i=1}^d pi log(pi/ui) ≥ −log( ∑_{i=1}^d ui ) ≥ −log γ(A). (1.9)
The left hand side is equal to HA(p)−H(p), whence it is immediate that
HA(p) ≥ H(p)− log γ(A). (1.10)
By Lemma 1.0.1, equality in the first inequality of (1.9) requires ui = λpi for some λ ∈ R
and all i = 1, . . . , d, and equality in the second inequality of (1.9) requires ∑_{i=1}^d ui = γ(A).
Thus, equality in (1.10) implies that there exists u ∈ A such that γ(A) = ∑_{i=1}^d ui = λ ∑_{i=1}^d pi = λ.
This gives u = λp = γ(A)p ∈ A.
Conversely, if γ(A)p ∈ A, then
HA(p) = min_{v∈A} ( −∑_{i=1}^d pi log vi ) ≤ −∑_{i=1}^d pi log(γ(A)pi) = H(p) − log γ(A).
With (1.10), this gives HA(p) = H(p) − log γ(A).
To show that in the case of equality, v = γ(A)p is the unique vector in A satisfying
−∑_{i=1}^d pi log vi = HA(p), suppose there exists w ∈ A where
−∑_{i=1}^d pi log wi = HA(p) = H(p) − log γ(A).
Then setting u = w gives equality throughout (1.9) and, as previously, w = γ(A)p.
Remark 1.2.10. It is easy to see that for any bounded convex corner A ⊆ Rd, the lower
bound in Theorem 1.2.9 is achieved for some p ∈ Pd. Let a ∈ A satisfy ∑_{i=1}^d ai = γ(A).
Then p = (1/γ(A))a ∈ Pd trivially satisfies the equality condition of Theorem 1.2.9.
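A concrete instance of Theorem 1.2.9 is A = Bd, for which HBd(p) = 0 and γ(Bd) = d; the bound then reads 0 ≥ H(p) − log d, recovering (1.3), with equality exactly at the uniform distribution. A short Python check (ours, for illustration):

```python
import math

def shannon_entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

d = 4
gamma_Bd = d        # gamma(B_d) = <1, 1> = d
H_Bd = 0.0          # H_Bd(p) = 0, since the all-ones vector lies in B_d

# The bound of Theorem 1.2.9 for a non-uniform p:
p = [0.4, 0.3, 0.2, 0.1]
assert H_Bd >= shannon_entropy(p) - math.log2(gamma_Bd)

# Equality iff gamma(B_d) p = d p lies in B_d, i.e. iff p is uniform:
u = [1 / d] * d
assert abs(H_Bd - (shannon_entropy(u) - math.log2(gamma_Bd))) < 1e-12
```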
We now work towards an upper bound on HA(p). Although the following lemma is intuitive,
its proof is complicated by the fact that if pi = 0 for some i ∈ [d], then the minimising
vector v ∈ A in Definition 1.2.4 can have vi = 0, and for such a vector v the function
h : Pd → R ∪ {∞} given by h(p) = −∑_{i=1}^d pi log vi can take the value ∞.
Lemma 1.2.11. If A is a standard Rd-convex corner, then the function f : Pd → R given
by f(p) = HA(p) is upper semi-continuous. Furthermore, the function f is continuous at all
p ∈ Pd satisfying p > 0.
Proof. By Remark 1.2.5, f(p) is finite for all p ∈ Pd. Note by Lemma 1.2.2 that (1/h)1 ∈ A
for some h > 0. Let (p(m))m∈N be a sequence in Pd converging to p ∈ Pd. For p, u ∈ Rd+,
let g(p, u) = −∑_{i=1}^d pi log ui. Denote the minimising vectors in the definitions of HA(p) and
HA(p(m)) by v ∈ A and v(m) ∈ A respectively; that is, HA(p) = g(p, v) and HA(p(m)) =
g(p(m), v(m)). If pi = 0 it can hold that vi = 0, so we form w = (1 − µ)v + (µ/h)1 > 0, where
(1/h)1 ∈ A and µ ∈ (0, 1). By convexity, w ∈ A. We have
g(p, v) = HA(p) ≤ g(p, w) = −∑_{i=1}^d pi log( (1 − µ)vi + µ/h ).
Noting that pi = 0 when vi = 0, we see that g(p, w) → g(p, v) as µ → 0. Then for any δ > 0, there
exists µ ∈ (0, 1) such that g(p, v) ≤ g(p, w) ≤ g(p, v) + δ. For all m we have HA(p(m)) =
g(p(m), v(m)) ≤ g(p(m), w). Then because p(m) → p as m → ∞ and w > 0,
lim sup_{m→∞} HA(p(m)) ≤ lim_{m→∞} g(p(m), w) = g(p, w).
Finally, since δ > 0 was arbitrary and g(p, w) ≤ g(p, v) + δ = HA(p) + δ, it follows that
lim sup_{m→∞} f(p(m)) = lim sup_{m→∞} HA(p(m)) ≤ HA(p) = f(p),
as required.
We now show further that the function f is continuous at all p ∈ Pd satisfying p > 0. To
see this, observe for all m ∈ N that
HA(p) = g(p, v) ≤ g(p, v(m)) = g(p − p(m), v(m)) + g(p(m), v(m)) (1.11)
= g(p − p(m), v(m)) + HA(p(m)).
Since A is bounded, we have for all m ∈ N that v(m) ≤ t1 for some t > 0. Now (1/h)1 ∈ A for
some h > 0, and so HA(p(m)) = −∑_{i=1}^d p(m)i log v(m)i ≤ log h. Then
−p(m)j log v(m)j ≤ log h + ∑_{i≠j} p(m)i log v(m)i ≤ log h + log t = log(th).
This gives log v(m)j ≥ −log(th)/p(m)j. Since p(m) → p > 0 as m → ∞, for sufficiently large m and for all
j, we have v(m)j ≥ s for some s > 0. In other words, there exist s, t ∈ (0, ∞) such that
s1 ≤ v(m) ≤ t1 for all sufficiently large m. Then since p(m) → p, we have lim_{m→∞} g(p −
p(m), v(m)) = 0. It is now clear from (1.11) that HA(p) ≤ lim inf_{m→∞} HA(p(m)).
Corollary 1.2.12. The function f defined in Lemma 1.2.11 attains a finite maximum value
on Pd.
Proof. Since Pd is compact and f is upper semi-continuous, this follows from Lemma 1.2.11
by Theorem A.0.6.
This shows the existence of maxp∈Pd HA(p) for a standard Rd-convex corner A. After the
following definition, we will use the ‘minimax’ theorem as given in Theorem A.0.8 to evaluate
this upper bound on HA(p).
If A ≠ Rd+ is a convex corner, then {β ∈ R+ : β1 ∈ A} is bounded and we define
N(A) = max{β ∈ R+ : β1 ∈ A}. (1.12)
The maximum exists by the fact that A is closed. Note that N(A) = 0 if and only if A has
empty interior. We write N(Rd+) = ∞.
Theorem 1.2.13. Let A be a standard Rd-convex corner. Then
max_{p∈Pd} HA(p) = −log N(A).
Proof. We have
max_{p∈Pd} HA(p) = max_{p∈Pd} min_{v∈A} ∑_{i=1}^d −pi log vi = sup_{p∈Pd} inf_{v∈A} ∑_{i=1}^d −pi log vi.
Note that Pd and A are compact and convex subsets of finite dimensional spaces.
Consider the function f : Pd × A → R ∪ {∞} given by f(p, v) = ∑_{i=1}^d −pi log vi. We
recall from the proof of Lemma 1.2.3 that the function v → f(p, v) is lower semi-continuous,
and it is also clear that it is convex. The function p → f(p, v) is linear and thus concave for
a fixed v ∈ A. Thus the conditions for applying Theorem A.0.8 are met, and interchange of
the supremum and infimum yields
max_{p∈Pd} HA(p) = inf_{v∈A} sup_{p∈Pd} ∑_{i=1}^d −pi log vi
= inf_{v∈A} log( 1 / min_{i∈[d]} vi ) = log( 1 / sup_{v∈A} min_{i∈[d]} vi ). (1.13)
Let m = sup_{v∈A} min_{i∈[d]} vi. Recalling that N(A)1 ∈ A, it is clear that m ≥ N(A).
Conversely, for all ε > 0, there exists v ∈ A such that m − ε < min_{i∈[d]} vi and hence
(m − ε)1 < v ∈ A. By hereditarity it follows that (m − ε)1 ∈ A. Thus N(A) ≥ m − ε for all
ε > 0 and so N(A) ≥ m. Thus m = N(A), and putting this in (1.13) completes the proof.
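For A = Cd the theorem can be checked by hand: β1 ∈ Cd forces dβ ≤ 1, so N(Cd) = 1/d and max_p HCd(p) = log d, recovering the upper bound in (1.3) via (1.7). A small numerical confirmation (illustrative Python, ours):

```python
import math

def shannon_entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

d = 8
N_Cd = 1 / d        # beta * 1 in C_d requires d * beta <= 1, so N(C_d) = 1/d

# max_p H_Cd(p) = -log2 N(C_d) = log2 d, attained at the uniform distribution,
# since H_Cd = H by (1.7):
u = [1 / d] * d
assert abs(shannon_entropy(u) - (-math.log2(N_Cd))) < 1e-12
assert shannon_entropy([0.5, 0.5] + [0.0] * (d - 2)) <= -math.log2(N_Cd)
```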
1.2.3 Anti-blockers
Here we recall the definition of the anti-blocker of a subset of Rd, and then survey some as-
sociated properties. We work towards Corollary 1.2.21, a result giving equivalent expressions
for the upper bound determined in Theorem 1.2.13.
Definition 1.2.14. ([22].) For A ⊆ Rd+, we define A♭, the anti-blocker of A, by
A♭ = {v ∈ Rd+ : 〈v, u〉 ≤ 1 for all u ∈ A}.
It is well known (see [11] and [22]) and easy to verify that A♭ is a convex corner and that
A ⊆ B ⇒ A♭ ⊇ B♭. (1.14)
We will refer to (A♭)♭ as the second anti-blocker of A, and we will write (A♭)♭ = A♭♭.
The following important result will then be known as the ‘second anti-blocker theorem’.
Theorem 1.2.15. ([22, Section 30].) If A is a convex corner, then A♭♭ = A.
Lemma 1.2.16. For k > 0, the unit corner Cd ⊆ Rd and unit cube Bd ⊆ Rd satisfy
(kBd)♭ = (1/k)Cd and (kCd)♭ = (1/k)Bd.
Proof. Since k1 ∈ kBd, it follows that if u ∈ (kBd)♭, then 〈u, k1〉 = k∑_{i=1}^d ui ≤ 1 and
u ∈ (1/k)Cd. Conversely, if u ∈ (1/k)Cd, then ∑_{i=1}^d ui ≤ 1/k. Then for all v ∈ kBd it follows that
〈u, v〉 ≤ 〈u, k1〉 = k∑_{i=1}^d ui ≤ 1 and u ∈ (kBd)♭. Thus, (kBd)♭ = (1/k)Cd. Anti-blocking both
sides and applying Theorem 1.2.15 yields kBd = ((1/k)Cd)♭, and the second assertion follows.
Lemma 1.2.17. It holds that A is a standard convex corner if and only if A♭ is a standard
convex corner.
Proof. Assume A is a standard convex corner. Then A has non-empty interior and, by Lemma
1.2.2, s1 ∈ A for some s > 0. Thus sBd ⊆ A by hereditarity. Since A is bounded, A ⊆ tCd
for some finite t > 0. Then sBd ⊆ A ⊆ tCd and hence, using the previous lemma and (1.14),
(1/t)Bd ⊆ A♭ ⊆ (1/s)Cd. It follows that A♭ is a standard convex corner. The converse holds by
Theorem 1.2.15.
Definition 1.2.18. For an Rd-convex corner A with non-empty interior, we define the parameter
M(A) by letting
M(A) = inf{ ∑_{i=1}^k λi : ∃k ∈ N, vi ∈ A and λi > 0 for i ∈ [k] such that ∑_{i=1}^k λivi ≥ 1 }. (1.15)
If A has empty interior, Lemma 1.2.2 shows that the condition on the right hand side of
(1.15) cannot be satisfied, and we set M(A) = ∞.
Note that M(Rd+) = 0. It is also easy to see that a standard convex corner A, being
bounded and containing a strictly positive element, satisfies 0 < M(A) < ∞.
Lemma 1.2.19. If A is a standard convex corner, the infimum in Definition 1.2.18 is attained.
Proof. First note that for a standard convex corner A we have 0 < M(A) < ∞. From
(1.15), for all n ∈ N we can form xn = ∑_i λ(n)i v(n)i ≥ 1 with v(n)i ∈ A and λ(n)i > 0 such
that M(A) ≤ ∑_i λ(n)i ≤ M(A) + 1/n, giving that ∑_i λ(n)i → M(A) as n → ∞. By convexity,
(∑_i λ(n)i)^{−1} xn ∈ A for all n ∈ N. Since A is compact, the sequence ( (∑_i λ(n)i)^{−1} xn )_{n∈N}
has a subsequence ( (∑_i λ(nj)i)^{−1} xnj )_{j∈N} convergent to some a ∈ A. Then
( M(A) / ∑_i λ(nj)i ) xnj → M(A)a as j → ∞.
Comparing this to the limit
( M(A) / ∑_i λ(nj)i ) 1 → 1 as j → ∞,
and noting that xnj ≥ 1 for all j ∈ N, it is clear that M(A)a ≥ 1. Since a ∈ A, this shows
that the infimum in (1.15) is achieved.
Recalling the definition of N(A) in (1.12), the following proposition links N(A), M(A)
and γ(A♭). We use the conventions 1/∞ = 0 and 1/0 = ∞.
Proposition 1.2.20. Let A be an Rd-convex corner. It holds that
M(A) = 1/N(A) = γ(A♭).
Proof. First we consider the case that A ≠ Rd+ has non-empty interior. Set y = N(A)1, and
observe that y ∈ A and N(A) > 0. Then (1/N(A))y = 1 and M(A) ≤ 1/N(A).
The reverse inequality follows from the proof of Lemma 1.2.19. Recall that for all n ∈ N
we had (∑_i λ(n)i)^{−1} xn ∈ A with (∑_i λ(n)i)^{−1} ≥ (M(A) + 1/n)^{−1} and xn ≥ 1. Hereditarity
gives that (M(A) + 1/n)^{−1} 1 ∈ A for all n ∈ N. It follows that N(A) ≥ 1/M(A).
By (1.14) and Lemma 1.2.16, it is easy to see that A = Rd+ if and only if A♭ = {0} if
and only if γ(A♭) = 0. It is also clear that if A has non-empty interior, then kCd ⊆ A for
some k > 0, and A♭ ⊆ (kCd)♭ = (1/k)Bd, giving that A♭ is bounded. Thus, when A ≠ Rd+
has non-empty interior, we obtain 0 < γ(A♭) < ∞. To prove the second equality in this
case, let w ∈ A♭ satisfy 〈w, 1〉 = γ(A♭) ∈ (0, ∞). By the definition of A♭, it holds that 1 ≥
〈w, N(A)1〉 = N(A)γ(A♭), and so γ(A♭) ≤ 1/N(A). For the reverse inequality, set v = (1/γ(A♭))1.
For all u ∈ A♭, we have 〈v, u〉 = (1/γ(A♭))〈1, u〉 ≤ 1. This shows that v ∈ A♭♭, and so v ∈ A by
Theorem 1.2.15. This is sufficient to show that N(A) ≥ 1/γ(A♭), as required.
In the case that A = Rd+, the result holds with M(A) = 0, N(A) = ∞ and γ(A♭) = 0.
Finally, if A has empty interior, by Lemma 1.2.2 for some i ∈ [d] we have vi = 0 for all
v ∈ A, and then ui can be arbitrarily large for u ∈ A♭, so A♭ is unbounded. So when A has
empty interior, the result holds with M(A) = ∞, N(A) = 0 and γ(A♭) = ∞.
Theorem 1.2.13 and Proposition 1.2.20 immediately give the following corollary.
Corollary 1.2.21. For a standard Rd-convex corner A,
max_{p∈Pd} HA(p) = −log N(A) = log M(A) = log γ(A♭).
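These identities can be made concrete for A = Cd. The sketch below (illustrative Python; the grid test of anti-blocker membership is our own device and checks the defining inequalities only at grid points) verifies Lemma 1.2.16 with k = 1 and the three quantities of Proposition 1.2.20:

```python
from itertools import product

d = 3
grid = [tuple(i / 4 for i in pt) for pt in product(range(6), repeat=d)]
corner_pts = [u for u in grid if sum(u) <= 1 + 1e-9]   # grid points of C_d

def in_flat_of_Cd(v):
    """v in the anti-blocker of C_d iff <v,u> <= 1 for all u in C_d (grid check)."""
    return all(sum(vi * ui for vi, ui in zip(v, u)) <= 1 + 1e-9 for u in corner_pts)

def in_Bd(v):
    return all(0 <= x <= 1 + 1e-9 for x in v)

# Lemma 1.2.16 with k = 1: the anti-blocker of C_d is B_d.
for v in grid:
    assert in_flat_of_Cd(v) == in_Bd(v)

# Proposition 1.2.20 for A = C_d: M(A) = 1/N(A) = gamma of the anti-blocker.
N = 1 / d            # largest beta with beta * 1 in C_d
M = d                # weight d on the single vector (1/d) * 1 in C_d covers 1
gamma_flat = d       # gamma(B_d) = <1, 1> = d
assert abs(M - 1 / N) < 1e-12 and M == gamma_flat
```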
1.3 Graph entropy
This section concerns graph entropy, a real functional defined on a graph with a given prob-
ability distribution on its vertices which was first introduced in [23]. After a summary of
some basic terminology from graph theory in Section 1.3.1, the motivation of graph entropy
will be discussed in Section 1.3.2, where we recall that graph entropy is simply the entropy
of a probability distribution over a certain convex corner associated with the given graph.
Indeed, we show that a number of important graph parameters can be defined in terms of
convex corners. Finally, in Section 1.3.3, Theorem 1.2.9 will be used to yield a number of
new results on graph entropy.
1.3.1 Some graph theoretic preliminaries
Let G be a graph on vertex set V (G) with edge set E(G), where each edge is an unordered
pair of distinct vertices in V (G). In terminology and notation we broadly follow [15], and
here we consider only finite graphs. If {i, j} is an edge, we say vertex i is adjacent to vertex
j and write i ∼ j. If i is adjacent to or equal to j, we will write i ≃ j. If graphs F and G
are such that V (F ) ⊆ V (G) and E(F ) ⊆ E(G), then F is said to be a subgraph of G. A
spanning subgraph of G is a subgraph of G with vertex set V (G). An induced subgraph of G
is a subgraph of G, any two vertices of which are adjacent if and only if they are adjacent
in G. Specifically, if S ⊆ V (G), then the subgraph GS of G, induced by S, has V (GS) = S,
and the edges of GS are precisely those edges of G between vertices of S. A stable set or
independent set is a set of vertices in G no two of which are adjacent. The empty set will be
considered stable. The independence number, α(G), of graph G is the size of a largest stable
set. A set S ⊆ V (G) is a clique of G if i ∼ j for all distinct i, j ∈ S. The clique number,
ω(G), of graph G is the size of a largest clique. The complement of G is the graph Ḡ on
the same vertex set, in which i ∼ j if and only if i is neither adjacent nor equal to j in G;
clearly the complement of Ḡ is G. The stable sets of G are precisely
the cliques of Ḡ, and hence α(G) = ω(Ḡ).
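The parameters just defined are all computable by brute force on small graphs. The following sketch (our own helper functions; exponential-time and purely for illustration) verifies α and ω on the 5-cycle C5, whose complement is again a 5-cycle:

```python
from itertools import combinations

V = range(5)
E = {frozenset((i, (i + 1) % 5)) for i in range(5)}     # the 5-cycle C5

def is_stable(S, edges):
    """True iff no two vertices of S are adjacent."""
    return all(frozenset(p) not in edges for p in combinations(S, 2))

def alpha(vertices, edges):
    """Independence number: size of a largest stable set (brute force)."""
    vs = list(vertices)
    return max(len(S) for r in range(len(vs) + 1)
               for S in combinations(vs, r) if is_stable(S, edges))

def complement_edges(vertices, edges):
    return {frozenset(p) for p in combinations(vertices, 2)
            if frozenset(p) not in edges}

assert alpha(V, E) == 2                           # alpha(C5) = 2
assert alpha(V, complement_edges(V, E)) == 2      # omega(C5) = alpha(complement)
```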
A colouring of graph G is an assignment of colours to its vertices such that adjacent
vertices are assigned distinct colours. The smallest number of colours needed to colour graph
G is called the chromatic number, χ(G). Equivalently, χ(G) is the smallest m ∈ N such
that V(G) can be partitioned into m independent sets. The clique covering number χ̄(G)
is the smallest m ∈ N such that V(G) can be partitioned into m cliques; it is clear that
χ̄(G) = χ(Ḡ). The complete graph Kn has vertex set {1, . . . , n} and edges joining every
pair of distinct vertices. The graph K̄n is known as the empty graph on n vertices, having
vertex set {1, . . . , n} and no edges. Graphs F and G are isomorphic if there exists a bijection
φ : V(F) → V(G) such that x ∼ y in F if and only if φ(x) ∼ φ(y) in G. In that case we
say φ is an isomorphism from F to G and write F ≅ G. An automorphism of graph G is
an isomorphism from G to itself. A graph G is vertex transitive if for every pair of distinct
vertices i, j ∈ V (G), there exists an automorphism f such that f(i) = j. If F and G are
graphs, a mapping f : V (F )→ V (G) is a homomorphism if f(x) ∼ f(y) in G whenever x ∼ y
in F . A probabilistic graph (G, p) is a graph G equipped with a probability distribution p
defined on its vertices.
Many different graph products have been defined in the literature. Others will be discussed
later, but for now the following definition will suffice.
Definition 1.3.1. ([23].) The co-normal product (alternatively, ‘or’ product or ‘disjunctive’
product) of graphs F and G is the graph F ∗ G for which V (F ∗ G) = V (F ) × V (G) and
(i1, j1) ∼ (i2, j2) if and only if i1 ∼ i2 in F or j1 ∼ j2 in G. We denote G ∗ G ∗ . . . ∗ G, the
nth co-normal power of G, by Gn. Note that F ∗ G ≅ G ∗ F.
Definition 1.3.2. ([23]) A kernel or maximal stable set of G is a stable set which is not a
proper subset of a stable set of G.
Lemma 1.3.3. If K ⊆ V (F ∗ G), then K is a kernel of F ∗ G if and only if K = S × T
where S and T are kernels of F and G, respectively.
Proof. Clearly, if S and T are kernels of F and G respectively, S × T is stable in F ∗G. To
show that S × T is maximally stable in F ∗G, consider (i, j) ∈ V (F ∗G) with (i, j) /∈ S × T .
Then i /∈ S or j /∈ T . Without loss of generality, suppose i /∈ S. Since S is maximally
stable in F , it follows that i ∼ k in F for some k ∈ S and thus (i, j) ∼ (k, l) in F ∗ G for
some (k, l) ∈ S × T . We can conclude that S × T is a kernel of F ∗ G. Conversely, let K
be any kernel in F ∗ G. The projection of K onto V (F ), which we denote by projV (F )(K),
cannot contain adjacent vertices in F , so projV (F )(K) ⊆ S for some kernel S of F . Similarly,
projV (G)(K) ⊆ T for some kernel T of G. Then K ⊆ S×T . But as a kernel, K is maximally
stable and hence K = S × T .
Corollary 1.3.4. If F and G are graphs then
α(F ∗G) = α(F )α(G).
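Corollary 1.3.4 is easy to confirm by brute force on small examples. The sketch below (illustrative Python, with our own naming) builds the co-normal product of a path P3 and a 4-cycle C4 and checks the multiplicativity of α:

```python
from itertools import combinations, product

def is_stable(S, edges):
    return all(frozenset(p) not in edges for p in combinations(S, 2))

def alpha(vertices, edges):
    vs = list(vertices)
    return max(len(S) for r in range(len(vs) + 1)
               for S in combinations(vs, r) if is_stable(S, edges))

def conormal(vF, eF, vG, eG):
    """Co-normal product: (i1, j1) ~ (i2, j2) iff i1 ~ i2 in F or j1 ~ j2 in G."""
    verts = list(product(vF, vG))
    edges = {frozenset((x, y)) for x, y in combinations(verts, 2)
             if frozenset((x[0], y[0])) in eF or frozenset((x[1], y[1])) in eG}
    return verts, edges

vP, eP = [0, 1, 2], {frozenset((0, 1)), frozenset((1, 2))}              # path P3
vC, eC = [0, 1, 2, 3], {frozenset((i, (i + 1) % 4)) for i in range(4)}  # cycle C4
verts, edges = conormal(vP, eP, vC, eC)
assert alpha(vP, eP) == 2 and alpha(vC, eC) == 2
assert alpha(verts, edges) == alpha(vP, eP) * alpha(vC, eC)             # = 4
```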
Let graph G have d vertices and label its independent sets S1, . . . , Sk. We identify RV(G),
the set of real-valued functions on V(G), with Rd. Now let v(Si) be the characteristic vector
of Si. It is easy to see that the definition of χ(G) can be stated equivalently as
χ(G) = min{ ∑_{i=1}^k µi : µi ∈ {0, 1}, ∑_{i=1}^k µi v(Si) ≥ 1 }.
Here the µi are simply weightings, each either 0 or 1, put on the independent sets of G. If
the restriction on the µi is weakened to allow arbitrary non-negative weightings, then we obtain
the fractional chromatic number, χf(G), given by
χf(G) = min{ ∑_{i=1}^k µi : µi ≥ 0, ∑_{i=1}^k µi v(Si) ≥ 1 }. (1.16)
It is clear that χf(G) ≤ χ(G).
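For the 5-cycle the inequality is strict. The sketch below (illustrative Python; we also use the standard fact that χf(G) = |V(G)|/α(G) for vertex-transitive graphs, so the bound obtained is in fact tight) exhibits a feasible weighting for (1.16): weight 1/2 on each of the five maximum stable sets of C5 witnesses χf(C5) ≤ 5/2 < 3 = χ(C5).

```python
from itertools import combinations

n = 5
E = {frozenset((i, (i + 1) % n)) for i in range(n)}     # the 5-cycle

def is_stable(S):
    return all(frozenset(p) not in E for p in combinations(S, 2))

# The five maximum stable sets of C5 (all of size alpha(C5) = 2):
max_stables = [S for S in combinations(range(n), 2) if is_stable(S)]
assert len(max_stables) == 5

# Weight 1/2 on each is feasible for (1.16): every vertex lies in exactly
# two maximum stable sets, so each coordinate of sum_i mu_i v(S_i) equals 1.
weights = {S: 0.5 for S in max_stables}
for v in range(n):
    assert sum(w for S, w in weights.items() if v in S) >= 1
assert sum(weights.values()) == 2.5      # hence chi_f(C5) <= 5/2 < 3 = chi(C5)
```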
1.3.2 Graph entropy and the problem of source coding without complete
distinguishability
The concept of graph entropy was introduced by János Körner in 1973 [23] to solve the
source coding problem in the case of a discrete, memoryless, stationary source with alphabet
X whose symbols are not all distinguishable. This could arise, for instance, if the symbols are
handwritten. We take distinguishability to be a symmetric (but not necessarily transitive)
binary relation on X that is known and fixed. We construct a graph G, known as a distin-
guishability graph, to describe this distinguishability relation where V (G) = X and where
vertices i and j are adjacent in G if and only if i and j are distinguishable. As in Section 1.1,
we consider a source consisting of a sequence (Xi)∞i=1 of independent, identically distributed
random variables, each taking values in X and following distribution p.
Now x, y ∈ X^k are distinguishable if and only if they are distinguishable in at least one
coordinate. We form the graph Gk, the kth co-normal power of G, where V(Gk) = X^k and
x ∼ y in Gk if and only if xi ∼ yi in G for at least one i ∈ {1, . . . , k}. The graph Gk is then
the distinguishability graph for elements of X^k. We set up a k-to-m binary code, but do not
insist on encoding all of X^k, but merely a ‘probable set’ E ⊆ X^k, where p^k(E) ≥ 1 − λ for
some fixed λ ∈ (0, 1). Let GkE be the subgraph of Gk induced by E. Our encoding must map
distinguishable elements of E, or equivalently, adjacent vertices of GkE, to distinct binary
strings in {0, 1}^m. (The reason for this is that if x, y ∈ E are not distinguishable, then they
have potentially already been confused.) Assigning distinct codewords in {0, 1}^m to adjacent
vertices of GkE is a graph colouring problem and requires at least χ(GkE) codewords. Let
NG(k, λ) be the smallest integer m such that this encoding E → {0, 1}^m is possible, that is,
NG(k, λ) = ⌈log( min{χ(GkE) : E ⊆ X^k, p^k(E) ≥ 1 − λ} )⌉,
where ⌈x⌉ denotes the smallest integer greater than or equal to x.
As k → ∞, Körner shows in [23] that NG(k, λ)/k tends to a well-defined limit which
is independent of λ. This limit is called the graph entropy H(G, p), and thus we have the
following definition.
Definition 1.3.5. ([23]) The graph entropy of the probabilistic graph (G, p) is defined as
H(G, p) = lim_{k→∞} (1/k) log( min{χ(GkE) : E ⊆ X^k, p^k(E) ≥ 1 − λ} ), (1.17)
where λ ∈ (0, 1).
We have the intuitive understanding of graph entropy as the ‘long-term mean number
of encoding bits required per source symbol sent’ for such encoding. For more details and
a survey of known results on graph entropy, see [49]. We can see how graph entropy gen-
eralises Shannon entropy by considering the graph entropy of the complete graph Kn under
an arbitrary probability distribution. This corresponds to an alphabet with no confusability,
the situation that Shannon’s source coding theorem addresses. For p ∈ Pn it should thus
hold that H(Kn, p) = H(p). To see this is indeed the case, note that if G = Kn, then GkE is
the complete graph with vertex set E and χ(GkE) = |E|. Then the expression for H(G, p) in
(1.17) is identical to the expression for H(p) given in (1.1), and the claimed equality holds.
Some definitions are needed before an alternative but equivalent expression for graph
entropy can be stated.
Definition 1.3.6. ([49], [17].) We define VP(G), the vertex packing polytope of G, to be the
convex hull of the characteristic vectors of the stable sets of G.
Lemma 1.3.7. For any graph G, VP(G) is hereditary, that is, if v ∈ VP(G), then for all w
satisfying 0 ≤ w ≤ v, we have w ∈ VP(G).
Proof. Let v(A) denote the characteristic vector of the set A. Take w ≤ v = ∑_k αk v(Sk) ∈ VP(G),
where αk ≥ 0, ∑_k αk = 1 and each Sk is stable in G. Choose i ∈ V(G) and let ε = vi − wi,
so that vi ≥ ε ≥ 0. For each stable set Sk containing i choose εk ≤ αk such that ∑_{k:i∈Sk} εk = ε.
Note that if Sk is stable, so is Sk\{i}, where we use the fact that the empty set is stable in
the case Sk = {i}. We form
v′ = ∑_{k:i∈Sk} ( (αk − εk)v(Sk) + εk v(Sk\{i}) ) + ∑_{k:i∉Sk} αk v(Sk) ∈ VP(G).
Then v′i = wi and v′j = vj for j ≠ i. Repeating for each i ∈ V(G) gives the required result.
Lemma 1.3.8. If G is a graph on n vertices then Cn ⊆ VP(G) ⊆ Bn.
Proof. Since {i} is a stable set for each i ∈ V(G), we have {u ∈ R^n_+ : ∑_{i=1}^n u_i = 1} ⊆ VP(G), and Cn ⊆ VP(G) by Lemma 1.3.7. Every characteristic vector v of a stable set in a graph G satisfies v ≤ 1, and so VP(G) ⊆ Bn.
Consider a graph G with V(G) = [n]. As the convex hull of a finite set of non-negative vectors in Rn, VP(G) is convex, non-negative, non-empty and bounded. The fact that VP(G) is closed follows from the standard result that the convex hull of a finite set in Rn is compact (see for example [54, p. 45]). Hereditarity was established by Lemma 1.3.7, and it is thus clear that VP(G) is a convex corner [11]. By Lemma 1.3.8, (1/n)1 ∈ VP(G), and so VP(G) is a standard convex corner.
We now show that a number of important graph parameters can be defined in terms of
convex corners; this is an important theme.
Lemma 1.3.9. For any graph G, the independence number α(G) is given by
α(G) = γ(VP(G)).
Proof. Let u be the characteristic vector of a stable set of cardinality α(G). Since u ∈ VP(G), it follows that max{ ∑_i v_i : v ∈ VP(G) } ≥ α(G). If v ∈ VP(G), then v = ∑_k λ_k v(S_k), where v(S_k) is the characteristic vector of stable set S_k and ∑_k λ_k = 1 with λ_k ≥ 0. For each k we have ∑_i v(S_k)_i ≤ α(G), and so

∑_i v_i ≤ ∑_k λ_k α(G) = α(G).
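Lemma 1.3.9 can be checked computationally for small graphs: the linear functional v ↦ ∑_i v_i attains its maximum over the polytope VP(G) at an extreme point, that is, at the characteristic vector of a stable set, so γ(VP(G)) is just the largest stable-set size. A minimal brute-force sketch (the function names are our own, and the enumeration is exponential in the number of vertices, so this is only for small examples):

```python
from itertools import combinations

def stable_sets(n, edges):
    """Yield every stable (independent) set of a graph on vertices 0..n-1."""
    E = {frozenset(e) for e in edges}
    for r in range(n + 1):
        for S in combinations(range(n), r):
            if all(frozenset(pair) not in E for pair in combinations(S, 2)):
                yield S

def alpha(n, edges):
    """gamma(VP(G)): the functional v -> sum_i v_i is maximised over VP(G)
    at an extreme point, i.e. at the characteristic vector of a stable
    set, so the value is the largest stable-set size alpha(G)."""
    return max(len(S) for S in stable_sets(n, edges))

c5 = [(i, (i + 1) % 5) for i in range(5)]   # the 5-cycle
print(alpha(5, c5))                         # prints 2
```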
Since a set of vertices of G is independent in G if and only if it is a clique in the complement Ḡ, it follows immediately that

ω(G) = γ(VP(Ḡ)).    (1.18)
In an analogous way to (1.16) on page 15, we can modify the definition of the clique number ω(G) to obtain the fractional clique number ωf(G), as for example in [15]. First note that a set S ⊆ V(G) is a clique in G if and only if S contains no distinct elements in the same independent set. This is equivalent to the condition that v(S), the characteristic vector of S, satisfies ⟨v(S), w⟩ ≤ 1 whenever w is the characteristic vector of an independent set, which is in turn equivalent to the condition v(S) ∈ VP(G)♭. Letting V(G) = [n], we can then see that

ω(G) = max{ ∑_{i=1}^n λ_i : λ_i ∈ {0, 1}, (λ_i)_{i∈[n]} ∈ VP(G)♭ },

where the λ_i are weightings given to the vertices of G. Weakening the restriction to λ_i ≥ 0 gives the fractional clique number,

ωf(G) = max{ ∑_{i=1}^n λ_i : λ_i ≥ 0, (λ_i)_{i∈[n]} ∈ VP(G)♭ },

and we thus have

ωf(G) = γ(VP(G)♭).    (1.19)
It is immediate that ωf(G) ≥ ω(G). The vertex packing polytope VP(G) gains further significance from the following result in [11].
Theorem 1.3.10. The graph entropy H(G, p) is given by

H(G, p) = min{ −∑_{i=1}^n p_i log v_i : v ∈ VP(G) }.    (1.20)
As VP(G) is a convex corner, we can write
H(G, p) = HVP(G)(p),
and we see that graph entropy is an example of entropy over a convex corner.
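Formula (1.20) also makes H(G, p) computable for small graphs: by Lemma 1.3.12 the minimiser may be sought among convex combinations of the characteristic vectors of the kernels (maximal stable sets) of G. A numerical sketch, assuming scipy is available (logs base 2, function names our own):

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize

def kernels(n, edges):
    """Maximal stable sets (kernels) of a graph on vertices 0..n-1."""
    E = {frozenset(e) for e in edges}
    stab = [set(S) for r in range(n + 1) for S in combinations(range(n), r)
            if all(frozenset(q) not in E for q in combinations(S, 2))]
    return [S for S in stab if not any(S < T for T in stab)]

def graph_entropy(n, edges, p):
    """Minimise -sum_i p_i log2 v_i over v in VP(G), as in (1.20), with v
    ranging over convex combinations of the kernels' characteristic
    vectors (cf. Lemma 1.3.12)."""
    K = kernels(n, edges)
    V = np.array([[1.0 if i in S else 0.0 for i in range(n)] for S in K])
    p = np.asarray(p, float)
    obj = lambda lam: -np.sum(p * np.log2(np.clip(lam @ V, 1e-12, None)))
    m = len(K)
    res = minimize(obj, np.full(m, 1.0 / m), method="SLSQP",
                   bounds=[(0.0, 1.0)] * m,
                   constraints=[{"type": "eq", "fun": lambda lam: lam.sum() - 1.0}])
    return res.fun

p = [0.5, 0.25, 0.25]
k3 = [(0, 1), (1, 2), (0, 2)]
print(graph_entropy(3, k3, p))   # ≈ 1.5 bits = H(p), as in (1.21)
```

For the path on three vertices with the same p, this recovers the value 2 − (3/4) log 3 computed later in this section.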
The stable sets of the complete graph Kn are the singletons {i}, for i ∈ [n], and the empty set ∅, and so

VP(Kn) = { v ∈ R^n_+ : ∑_{i=1}^n v_i ≤ 1 } = Cn.
Recalling (1.7) we observe that
H(Kn, p) = HCn(p) = H(p) for all p ∈ Pn. (1.21)
Since {1, . . . , n} is stable in the empty graph K̄n, it follows that 1 ∈ VP(K̄n), and by Lemma 1.3.8, VP(K̄n) = Bn. Then

H(K̄n, p) = HBn(p) = 0 for all p ∈ Pn.    (1.22)
By Lemmas 1.3.8 and 1.2.6, for every graph G on n vertices and for all p ∈ Pn we have
0 ≤ H(G, p) ≤ H(p). (1.23)
We return to the definition of χf(G) in (1.16). Since every v ∈ VP(G) is a convex combination of the v(S_i),

χf(G) = min{ ∑_{i=1}^k λ_i : λ_i > 0, ∑_{i=1}^k λ_i u^(i) ≥ 1, u^(i) ∈ VP(G), k ∈ N }.    (1.24)
In the notation of Definition 1.2.18 this can be written as
χf(G) = M(VP(G)). (1.25)
Equations (1.25) and (1.19) now show that the well-known result
χf(G) = ωf(G), (1.26)
given for instance in [15, Section 7.5], is just a special case of Proposition 1.2.20.
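For small graphs, χf(G) can be computed directly as a linear programme over the stable sets, in the spirit of (1.16) and (1.24). The sketch below assumes scipy is available; the function name is our own.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import linprog

def fractional_chromatic(n, edges):
    """chi_f(G): minimise sum_S lambda_S over weightings of the stable
    sets S of G subject to sum_{S : i in S} lambda_S >= 1 for every
    vertex i and lambda_S >= 0."""
    E = {frozenset(e) for e in edges}
    stab = [S for r in range(1, n + 1) for S in combinations(range(n), r)
            if all(frozenset(q) not in E for q in combinations(S, 2))]
    # linprog minimises c @ lam subject to A_ub @ lam <= b_ub and lam >= 0.
    A_ub = np.array([[-1.0 if i in S else 0.0 for S in stab] for i in range(n)])
    res = linprog(c=np.ones(len(stab)), A_ub=A_ub, b_ub=-np.ones(n))
    return res.fun

c5 = [(i, (i + 1) % 5) for i in range(5)]
print(fractional_chromatic(5, c5))   # 2.5: chi_f(C5) = 5/2
```

By (1.26) the same LP value is also the fractional clique number ωf(C5).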
By writing H(G, p) = HVP(G)(p), the following corollary is immediate from Corollary 1.2.21. (Note that the first equality is proved in [50, Lemma 4] by considering the properties of VP(G). The result given in [50] is thus a special case of Theorem 1.2.13.)
Corollary 1.3.11. The maximum graph entropy of a given graph G over all probability
distributions is given by
max_{p∈Pn} H(G, p) = log χf(G) = log ωf(G) = log γ(VP(G)♭) = − log N(VP(G)).
1.3.3 A lower bound on graph entropy
The lower bound H(G, p) ≥ 0 is given in (1.23). Applying Theorem 1.2.9 to the vertex
packing polytope yields a less trivial lower bound. We will show that the resulting lower
bound gives the exact graph entropy of a class of probabilistic graphs which includes the odd
cycles and their complements under certain probability distributions. A number of other new
results will also be given. We begin with the following lemma.
Lemma 1.3.12. Let the probability distribution p ∈ Pn satisfy p > 0. For graph G on n
vertices, let a ∈ VP(G) be a vector such that
−∑_{i=1}^n p_i log a_i = H(G, p).    (1.27)
Then a is in the convex hull of characteristic vectors of the kernels of G.
Proof. Let v(S) be the characteristic vector of S ⊆ V(G), and suppose a = ∑_{j=1}^m α_j v(S_j), where S_1, . . . , S_m are stable sets but not all kernels, and where α_j > 0 satisfy ∑_{j=1}^m α_j = 1. Every stable set that is not maximal is a proper subset of one that is. For each j ∈ [m], let T_j = S_j if S_j is a kernel, and if S_j is not a kernel, choose a kernel T_j such that S_j ⊂ T_j. Now let b = ∑_{j=1}^m α_j v(T_j). Clearly b ∈ VP(G). We have v(T_j) − v(S_j) ≥ 0 for all j ∈ [m], and so b_i ≥ a_i and − log b_i ≤ − log a_i for all i ∈ [n]. Furthermore, for some i ∈ [n] we have b_i > a_i and − log b_i < − log a_i. Since p_i > 0, it then holds that a is not the minimizing vector in (1.20).
Remark 1.3.13. Lemma 1.2.3 shows that if p > 0 there is a unique a satisfying (1.27). However, if p_i = 0 for some i, there may be more than one vector a satisfying (1.27). Furthermore, such a vector a may not lie in the convex hull of the characteristic vectors of kernels of G. However, if this is the case, using the method above we can form a vector b which lies in the convex hull of the characteristic vectors of kernels of G and satisfies ∑_{i=1}^n p_i log(1/b_i) = ∑_{i=1}^n p_i log(1/a_i). Thus, for any (G, p), there will always be a vector satisfying (1.27) which lies in the convex hull of the characteristic vectors of kernels of G.
Theorem 1.2.9 has the following corollary. This important result will be widely used in
the rest of the chapter to obtain some new results on graph entropy.
Corollary 1.3.14. For any probabilistic graph (G, p),
H(G, p) ≥ H(p)− logα(G). (1.28)
Equality holds if and only if α(G)p ∈ VP(G); in this case v = α(G)p is the unique vector in VP(G) satisfying ∑_{i=1}^n p_i log(1/v_i) = H(G, p), where n = |V(G)|.
Proof. Set A = VP(G) in Theorem 1.2.9.
We note that (1.28), though not the equality condition given in Corollary 1.3.14, appears
as equation (14) in [24]. The proof given in [24] follows a different method.
Remark 1.3.15. Lemma 1.3.12 and the remark following it show that there is a vector v satisfying ∑_{i=1}^n p_i log(1/v_i) = H(G, p) in the convex hull of the characteristic vectors of kernels of G. For equality in Corollary 1.3.14 we note that such a vector v must satisfy ∑_{i=1}^n v_i = α(G), that is, v must lie in the convex hull of the characteristic vectors of kernels of G of size α(G).
It is interesting to compare Corollary 1.3.14 to another known bound. Letting

α(G, p) = max{ ∑_{i∈S} p_i : S is a stable set of G },

Cardinal, Fiorini and Joret in [7] established that

− log α(G, p) ≤ H(G, p).    (1.29)
Set B1(G, p) = H(p)− logα(G), the lower bound on H(G, p) from Corollary 1.3.14, and
B2(G, p) = − logα(G, p), the lower bound on H(G, p) from (1.29). In the case of the uniform
distribution u ∈ Pn and V(G) = [n], we have

B1(G, u) = H(u) − log α(G) = log(n/α(G)) = − log α(G, u) = B2(G, u).
However, it is easy to find probabilistic graphs with B1(G, p) > B2(G, p) and those with
B1(G, p) < B2(G, p). Take, for instance, the complete graph Kn. Here we have
B1(Kn, p) = H(p) = H(Kn, p),
but
B2(Kn, p) = − log max{p_j : j ∈ [n]} = −∑_{i=1}^n p_i log max{p_j : j ∈ [n]}
          ≤ −∑_{i=1}^n p_i log p_i = H(Kn, p),
with equality if and only if the non-zero coordinates of p are equal.
Consider the path graph G shown below.

•1 — •2 — •3

The kernels of G are {1, 3} and {2}, so H(G, p) can be found directly from (1.20) on page 18 by a straightforward minimisation; just note that the minimising vector a lies in the convex hull of the kernels of G and so is of the form a = (α, 1 − α, α)^t for α ∈ [0, 1]. With (p1, p2, p3) = (1/2, 1/4, 1/4) we have H(G, p) = 2 − (3/4) log 3. We also have B1(G, p) = 1/2 and B2(G, p) = log(4/3) < 1/2. In this case neither bound is optimal, but B1(G, p) is closer.
Now take the same graph with (p1, p2, p3) = (1/4, 1/2, 1/4) to yield H(G, p) = 1, B1(G, p) = 1/2 and B2(G, p) = 1, and only bound B2 is optimal.
As a final example, let G be the graph

•1 — •2 — •3 — •4

and let (p1, p2, p3, p4) = (1/8, 3/8, 1/8, 3/8). This gives B1(G, p) = 2 − (3/4) log 3 and B2(G, p) = 2 − log 3 < B1(G, p). It is easy to see that

α(G)p = 2p = (1/4)(1, 0, 1, 0)^t + (3/4)(0, 1, 0, 1)^t ∈ VP(G),

and thus the equality condition in Corollary 1.3.14 is satisfied, giving H(G, p) = B1(G, p).
We look now at some applications of Corollary 1.3.14, and in particular at those situations where equality holds. Observe that every graph G has at least one kernel K with |K| = α(G). Let G have kernels K_1, . . . , K_m satisfying |K_i| = α(G), and let the characteristic vector of K_i be v^(i). For α_i ≥ 0 with ∑_{i=1}^m α_i = 1 it then holds that v = ∑_{i=1}^m α_i v^(i) ∈ VP(G), and p = (1/α(G))v is a probability distribution satisfying α(G)p ∈ VP(G). Thus we have equality in Corollary 1.3.14, and H(G, p) = H(p) − log α(G). Let E(G) be the set of probability distributions giving equality in Corollary 1.3.14 for graph G. Then E(G) is non-empty for every graph G. In fact, if m > 1 for graph G, then E(G) is an infinite set. Indeed, for the complete graph Kn, the kernels are just the singletons, and it is trivial that E(Kn) = Pn.
In this section we prove some results concerning E(G) for certain graphs G. If G is vertex
transitive, we show that the uniform distribution u ∈ E(G). We characterize E(G) completely
when G is an odd cycle or its complement.
Proposition 1.3.16. Let G be a vertex transitive graph with n vertices. Then, when p is the uniform distribution u with u_i = 1/n, i = 1, . . . , n, equality holds in Corollary 1.3.14 and

H(G, u) = H(u) − log α(G) = log(n/α(G)).    (1.30)
Proof. Let G have N kernels of size α(G). We label them A_j for j = 1, . . . , N. By vertex transitivity, each vertex is included in the same number k of the A_j’s. Let v^(j) be the characteristic vector of A_j. We have ∑_{j=1}^N v^(j) = k1. The sum of the n components of this vector is nk = Nα(G). Consider the vector v = (1/N)∑_{j=1}^N v^(j). By construction, v ∈ VP(G). We note v = (k/N)1 = (α(G)/n)1 = α(G)u, giving equality in Corollary 1.3.14.
Proposition 1.3.16 also follows immediately from results in [50], [39] and [40]. We briefly
outline this approach.
A graph G is called symmetric if

max_{p∈Pn} H(G, p) = H(G, u),

that is, its entropy is maximised by the uniform distribution u. It holds that every vertex transitive graph is symmetric; for example, this is noted in [39]. Furthermore, G is symmetric if and only if χf(G) = n/α(G) [40, Corollary 3.4]; see also [46, Proposition 3.1.1]. Then if G is vertex transitive, Corollary 1.3.11 gives that

H(G, p) ≤ log(n/α(G))    (1.31)
for any distribution p, with equality in the case of the uniform distribution.
We now analyse the odd cycle C2n+1, with V(C2n+1) = [2n + 1] and i ∼ j if and only if j − i ≡ ±1 mod 2n + 1. (While we work with the odd cycles and their complements, our arithmetic on the set of vertices {1, . . . , 2n + 1} will be done modulo 2n + 1.)
A graph G is called bipartite if one may partition its vertex set into two subsets V1 and
V2 such that no edge joins vertices in the same subset. Bipartite graphs have been widely
studied and their graph entropies can be found as outlined by Simonyi in [49]. A graph G is
called perfect (see [15]) if the chromatic number of any induced subgraph H of G is equal to
ω(H). Perfect graphs have attracted a considerable amount of attention in the literature, see
[50]. Note that odd cycles are not bipartite and C2n+1 is not perfect for n > 1; this motivates us to consider them. (To see C2n+1 is not perfect for n > 1, consider C2n+1 as an induced subgraph of itself. We have χ(C2n+1) = 3, but ω(C2n+1) = 2. The 3-cycle C3 = K3 is perfect.)
We will see how Corollary 1.3.14 allows us to determine the graph entropy of odd cycles with
respect to certain probability distributions.
Since α(C2n+1) = n, Corollary 1.3.14 immediately gives H(C2n+1, p) ≥ H(p) − log n. Cycles are vertex transitive, so by (1.30) equality certainly occurs with the uniform distribution, and so H(C2n+1, u) = log((2n + 1)/n). We now characterise the probability distributions p for which equality in Corollary 1.3.14 holds for C2n+1.
Proposition 1.3.17. Let p be a probability distribution on V(C2n+1). The following are equivalent:

(i) H(C2n+1, p) = H(p) − log n;

(ii) p_i + p_j ≤ 1/n whenever i ∼ j.
Proof. The cycle C2n+1 has 2n + 1 kernels of size α(C2n+1) = n, which we label J_i with i = 1, . . . , 2n + 1. We write J_i = {i, i + 2, i + 4, . . . , i + 2(n − 1)}, modulo 2n + 1. Let v^(i) be the characteristic vector of J_i.

We will show that if p ∈ R^{2n+1}, then

α(C2n+1)p = np = ∑_{i=1}^{2n+1} v^(i) ( ∑_{k=1}^{2n+1} p_k − np_{i−1} − np_{i−2} ).    (1.32)

Note that in (1.32) p need not be a probability distribution. Each vertex is in n kernels. Specifically, vertex j lies in J_i for

i = j, j − 2, j − 4, . . . , j − 2(n − 1) mod (2n + 1).

Thus the jth component of the right hand side of our identity is

n∑_{k=1}^{2n+1} p_k − n(p_{j−1} + p_{j−3} + . . . + p_{j−2n+1}) − n(p_{j−2} + p_{j−4} + . . . + p_{j−2n}) = np_j,

as required. Because (1.32) holds for any p ∈ R^{2n+1}, the vectors v^(1), . . . , v^(2n+1) span R^{2n+1}, and so are linearly independent. Then this representation of α(C2n+1)p in terms of the v^(i)’s is unique.
To see that the second statement in our proposition implies the first, note that when p is a probability distribution,

∑_{i=1}^{2n+1} ( ∑_{k=1}^{2n+1} p_k − np_{i−1} − np_{i−2} ) = 2n + 1 − n − n = 1.

Also, when p_i + p_j ≤ 1/n for all adjacent pairs of vertices i and j, we have that

0 ≤ 1 − np_{i−1} − np_{i−2} = ∑_{k=1}^{2n+1} p_k − np_{i−1} − np_{i−2}

for i ∈ [2n + 1]. By (1.32), we now have that α(C2n+1)p ∈ VP(C2n+1), and Corollary 1.3.14 then completes the argument.

To show the converse, note by Corollary 1.3.14 and Remark 1.3.15 that if H(C2n+1, p) = H(p) − log n for p ∈ P2n+1, then α(C2n+1)p lies in the convex hull of the characteristic vectors of kernels of C2n+1 of size α(C2n+1) = n. Then by the uniqueness of the representation of α(C2n+1)p in (1.32), we have 1 − np_{l−1} − np_{l−2} ≥ 0 for all l ∈ [2n + 1], giving p_i + p_j ≤ 1/n when i ∼ j.
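Identity (1.32) can be confirmed numerically for a concrete odd cycle. The sketch below (vertices 0, . . . , 2n indexed modulo 2n + 1, function name our own) checks it for C7 with a random vector p:

```python
import numpy as np

def kernel_identity(n, p):
    """For C_{2n+1} with vertices 0..2n (arithmetic mod 2n+1) and kernels
    J_i = {i, i+2, ..., i+2(n-1)}, return the coefficients and the matrix
    of characteristic vectors appearing in identity (1.32)."""
    N = 2 * n + 1
    V = np.zeros((N, N))
    for i in range(N):
        for t in range(n):
            V[i, (i + 2 * t) % N] = 1.0
    coeffs = np.array([p.sum() - n * p[(i - 1) % N] - n * p[(i - 2) % N]
                       for i in range(N)])
    return coeffs, V

rng = np.random.default_rng(0)
p = rng.random(7)                       # n = 3, i.e. C7; p need not be a distribution
coeffs, V = kernel_identity(3, p)
print(np.allclose(coeffs @ V, 3 * p))   # prints True: (1.32) holds
```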
Remark 1.3.18. (i) Note that condition (ii) of Proposition 1.3.17 is certainly satisfied by the uniform distribution, and by “small” perturbations of it.
(ii) When n = 1, Proposition 1.3.17 gives that H(K3, p) = H(p) for all p ∈ P3, as in (1.21).
Before discussing C̄2n+1, the complement of the odd cycle C2n+1, we give two well-known lemmas concerning graphs having the same vertex set. We supply the proof of the first because it uses a method we will employ later.
Lemma 1.3.19. (Monotonicity, [49, Lemma 3.1].) If F and G are graphs and F is a spanning
subgraph of G, then H(F, p) ≤ H(G, p) for any probability distribution p.
Proof. Stable sets in G are also stable in F , and hence VP(G) ⊆ VP(F ), whence the expres-
sion for H(G, p) in (1.20) yields the result.
Lemma 1.3.20. (Sub-additivity, [49, Lemma 3.2].) Suppose graphs F and G satisfy V (F ) =
V (G). The graph F ∪G is constructed such that V (F ∪G) = V (F ) and E(F ∪G) = E(F )∪
E(G). For any probability distribution p it follows that H(F ∪G, p) ≤ H(F, p) +H(G, p).
Remark 1.3.21. For any graph G, the graph G ∪ Ḡ is complete, and Lemma 1.3.20 yields

H(G, p) + H(Ḡ, p) ≥ H(G ∪ Ḡ, p) = H(p).
In the case of perfect graphs, [50, Theorem 1] gives that H(G, p) + H(Ḡ, p) = H(p). When n > 1 we have α(C̄2n+1) = 2, and so by Corollary 1.3.14, H(C̄2n+1, p) ≥ H(p) − log 2. Since C̄2n+1 is vertex transitive, Proposition 1.3.16 gives for n > 1 that

H(C̄2n+1, u) = H(u) − log 2 = log(2n + 1) − log 2.

Recalling that C2n+1 is not perfect for n > 1, we note for n > 1 that

H(C2n+1, u) + H(C̄2n+1, u) = 2 log(2n + 1) − log(2n) ≥ log(2n + 1) = H(u),

in line with the observation above. (Note that C̄3 is the empty graph on three vertices and H(C̄3, p) = 0 for any distribution p. Of course, C3 = K3 and H(C3, p) = H(p). Then H(C3, p) + H(C̄3, p) = H(p), consistent with C3 being perfect.)
Proposition 1.3.22. Let p be a probability distribution on V(C̄2n+1), where n > 1. The following are equivalent:

(i) H(C̄2n+1, p) = H(p) − log 2;

(ii) p_{i+2} + p_{i+4} + . . . + p_{i+2n} ≤ 1/2 for all i = 1, . . . , 2n + 1.
Proof. For n > 1, C̄2n+1 has 2n + 1 kernels of size α(C̄2n+1) = 2, given by L_i = {i, i + 1}, modulo 2n + 1, with i = 1, . . . , 2n + 1. Let v^(i) be the characteristic vector of L_i.

We now claim that if p ∈ R^{2n+1}, then

α(C̄2n+1)p = 2p = ∑_{i=1}^{2n+1} v^(i) ( ∑_{k=1}^{2n+1} p_k − 2p_{i+2} − 2p_{i+4} − . . . − 2p_{i+2n} ).

Vertex j lies in L_i for i ≡ j mod (2n + 1) and i ≡ j − 1 mod (2n + 1). Then the jth component of the right hand side of our identity is

2∑_{k=1}^{2n+1} p_k − 2(p_{j+2} + p_{j+4} + . . . + p_{j+2n}) − 2(p_{j+1} + p_{j+3} + . . . + p_{j+2n−1}) = 2p_j,

as required. Then the vectors v^(1), . . . , v^(2n+1) span R^{2n+1} and so are linearly independent, and this representation of α(C̄2n+1)p in terms of the v^(i)’s is unique. The proof now proceeds exactly as that of Proposition 1.3.17. We merely note, if p ∈ P2n+1, that

∑_{i=1}^{2n+1} ( ∑_{k=1}^{2n+1} p_k − 2p_{i+2} − 2p_{i+4} − . . . − 2p_{i+2n} ) = 2n + 1 − 2n = 1,

and that the condition

∑_{k=1}^{2n+1} p_k − 2p_{i+2} − 2p_{i+4} − . . . − 2p_{i+2n} ≥ 0

is equivalent to

p_{i+2} + p_{i+4} + . . . + p_{i+2n} ≤ 1/2

when p is a probability distribution.
Consider G′, a spanning subgraph of G, formed by removing an edge or edges from G, but
retaining the same vertex set. By the monotonicity of graph entropy as described by Lemma
1.3.19, H(G′, p) ≤ H(G, p). Here we use Corollary 1.3.14 to give a sufficient condition for
the equality H(G′, p) = H(G, p).
Proposition 1.3.23. Let the graph G and probability distribution p satisfy α(G)p ∈ VP(G).
If G′ is a spanning subgraph of G such that α(G′) = α(G), then H(G′, p) = H(G, p).
Proof. The stable sets of G are also stable sets of G′, and so VP(G) ⊆ VP(G′). Then
α(G′)p = α(G)p ∈ VP(G′) and so by Corollary 1.3.14,
H(G′, p) = H(p)− logα(G′) = H(G, p),
as required.
The complete bipartite graph Km,n is a graph whose vertices can be partitioned into two
subsets V1 and V2 with |V1| = m and |V2| = n and such that no edge joins vertices in the
same subset, but every vertex of V1 is adjacent to every vertex of V2. A graph G with 2n
vertices consisting of a disjoint union of n copies of the complete graph K2 is denoted M2n
and called a perfect matching. In [49] Simonyi showed that H(Km,m, u) = H(M2m, u) = 1.
Here we prove a more general result using Proposition 1.3.23. (Note that M2m is a spanning
subgraph of Km,m and that α(Km,m) = α(M2m) = m.)
Corollary 1.3.24. Let G be a spanning subgraph of Km,m satisfying α(G) = m. Then
H(G, u) = 1.
Proof. This fact follows immediately from Propositions 1.3.16 and 1.3.23 because Km,m is
vertex transitive and because α(G) = m = α(Km,m).
Example 1.3.25. As another example of Proposition 1.3.23, consider the graph C̄7 on vertex set {0, . . . , 6}. Let G\e denote the graph G with one edge e removed. It is easy to verify that, for example, α(C̄7\{0, 3}) = 2, and thus by Propositions 1.3.23 and 1.3.16 we have H(C̄7\{0, 3}, u) = H(C̄7, u) = log(7/2).

It is interesting to note that the same argument does not apply to every edge of C̄7. For instance, let F = C̄7\{0, 2}. We see that {0, 1, 2} is stable in F and α(F) = 3, meaning that Proposition 1.3.23 does not yield the value of H(F, u).
In fact, in this case H(F, u) = log 7 − (3 log 3 + 4)/7 < log(7/2), as the following argument shows. The kernels of F are K_1 = {0, 1, 2} and K_i = {i, i + 1} for i = 2, . . . , 6. We let v^(i) denote the characteristic vector of K_i. By Lemma 1.3.12 there exist α_i ≥ 0 for i = 1, . . . , 6 with ∑_{i=1}^6 α_i = 1 such that H(F, u) = −(1/7)∑_{i=0}^6 log v_i, with

v = ∑_{i=1}^6 α_i v^(i) = (α_1 + α_6, α_1, α_1 + α_2, α_2 + α_3, α_3 + α_4, α_4 + α_5, α_5 + α_6)^t ∈ VP(F).

Now let β_1 = α_1 + (α_2 + α_6)/2 and β_3 = β_5 = (α_2 + 2α_3 + 2α_4 + 2α_5 + α_6)/4, and set β_2 = β_4 = β_6 = 0. We note that β_i ≥ 0 and ∑_{i=1}^6 β_i = 1, and so v′ = ∑_{i=1}^6 β_i v^(i) = (β_1, β_1, β_1, β_3, β_3, β_3, β_3)^t ∈ VP(F). Now let r = (v_0 + v_1 + v_2)/3 = α_1 + (α_2 + α_6)/3 and s = (v_3 + v_4 + v_5 + v_6)/4. Observe that β_1 ≥ r and β_3 = s. Letting w = (r, r, r, s, s, s, s)^t, we have v′_i ≥ w_i for all i = 0, . . . , 6. By the concavity of the log function,

log r ≥ (log v_0 + log v_1 + log v_2)/3 and log s ≥ (log v_3 + log v_4 + log v_5 + log v_6)/4.

We conclude that

−(1/7)∑_{i=0}^6 log v′_i ≤ −(1/7)∑_{i=0}^6 log w_i ≤ −(1/7)∑_{i=0}^6 log v_i = H(F, u).

Since v′ ∈ VP(F), Lemma 1.2.3 shows that v = v′, and v is of the form v = (l, l, l, m, m, m, m)^t. This gives α_2 = α_4 = α_6 = 0, and so α_1 = l and α_3 = α_5 = m. Then 0 < l < 1 and m = (1 − l)/2, whence a short calculation shows that −∑_{i=0}^6 log v_i is minimised when l = 3/7. Thus v = (1/7)(3, 3, 3, 2, 2, 2, 2)^t, and the value of H(F, u) is as given.
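The final one-variable minimisation above is easily confirmed numerically. The sketch below (logs base 2) recovers l = 3/7 and the stated value of H(F, u) on a grid:

```python
import numpy as np

# By the argument above, the minimiser has the form v = (l, l, l, m, m, m, m)
# with m = (1 - l)/2, so H(F, u) reduces to a one-variable minimisation.
def f(l):
    return -(3 * np.log2(l) + 4 * np.log2((1 - l) / 2)) / 7

grid = np.linspace(1e-4, 1 - 1e-4, 200001)
l_star = grid[np.argmin(f(grid))]
target = np.log2(7) - (3 * np.log2(3) + 4) / 7
print(l_star, f(l_star), target)   # l_star ≈ 3/7; both values ≈ 1.5567
```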
Thus deletion of an edge may leave the graph entropy unchanged, but in general it may decrease: indeed, removing all the edges of a graph makes its graph entropy vanish. The following proposition gives an upper bound on the reduction in graph entropy due to deletion of an edge.
Proposition 1.3.26. Let G be a graph with vertex set V(G) and non-empty edge set E(G). For an edge {x, y} ∈ E(G), let the graph G′_xy have V(G′_xy) = V(G) and E(G′_xy) = E(G)\{{x, y}}. For any probability distribution p on V(G) it then holds that

H(G′_xy, p) ≥ H(G, p) − (p_x + p_y) log(p_x + p_y) + p_x log p_x + p_y log p_y.
Proof. Let F be the graph defined by V(F) = V(G) with the single edge {x, y}. Then F ∪ G′_xy = G, and Lemma 1.3.20 gives

H(G′_xy, p) ≥ H(G, p) − H(F, p).    (1.33)

The kernels of F are K_1 = V(G)\{y} and K_2 = V(G)\{x}. By Remark 1.3.13, a vector v satisfying −∑_i p_i log v_i = H(F, p) can be chosen of the form v = αv^(1) + (1 − α)v^(2), where v^(i) is the characteristic vector of K_i and α ∈ [0, 1]. Then −∑_{i∈V(G)} p_i log v_i = −p_x log α − p_y log(1 − α), where we note v_i = 1 when i ∉ {x, y}. Elementary calculus shows this is minimised by α = p_x/(p_x + p_y). Thus we have

H(F, p) = (p_x + p_y) log(p_x + p_y) − p_x log p_x − p_y log p_y,

which, with (1.33), yields the required result.
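The closed form for H(F, p) obtained in the proof can be checked by minimising over α on a grid; a small sketch with illustrative values p_x = 0.3, p_y = 0.2 (logs base 2):

```python
import numpy as np

def single_edge_entropy(px, py):
    """H(F, p) for a graph F whose only edge is {x, y}: by the proof above
    this is the minimum over a in (0, 1) of -px*log2(a) - py*log2(1 - a)."""
    a = np.linspace(1e-6, 1 - 1e-6, 100001)
    return np.min(-px * np.log2(a) - py * np.log2(1 - a))

px, py = 0.3, 0.2   # illustrative values
closed_form = (px + py) * np.log2(px + py) - px * np.log2(px) - py * np.log2(py)
print(single_edge_entropy(px, py), closed_form)   # both ≈ 0.4855
```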
Having considered the deletion of edges, we now supply a proof of a rather intuitive result
concerning the deletion of zero-probability vertices.
Lemma 1.3.27. From probabilistic graph (G, p) we form probabilistic graph (G′, p′) by delet-
ing any number of vertices i ∈ V (G) with pi = 0 and deleting all edges incident on each deleted
vertex. We define the probability distribution p′ on V (G′) by p′j = pj for all j ∈ V (G′). Then
H(G, p) = H(G′, p′).
Proof. A stable set in G′ is stable in G. Thus if v′ ∈ VP(G′), we can define v ∈ VP(G) by v_i = v′_i for i ∈ V(G′) and v_i = 0 for i ∈ V(G)\V(G′). Let v′ ∈ VP(G′) satisfy −∑_{i∈V(G′)} p′_i log v′_i = H(G′, p′). Then, with the convention 0 log 0 = 0,

H(G, p) ≤ −∑_{i∈V(G)} p_i log v_i = −∑_{i∈V(G′)} p′_i log v′_i = H(G′, p′).

We now work towards the reverse inequality. Let S_k be stable in G with characteristic vector v^(S_k), and let a = ∑_k α_k v^(S_k) ∈ VP(G) satisfy −∑_{i∈V(G)} p_i log a_i = H(G, p). Then T_k = S_k ∩ V(G′) is stable in G′ with characteristic vector u^(T_k), given by u^(T_k)_i = v^(S_k)_i for i ∈ V(G′). Then b = ∑_k α_k u^(T_k) ∈ VP(G′) satisfies b_i = a_i for i ∈ V(G′), and

H(G′, p′) ≤ −∑_{i∈V(G′)} p′_i log b_i = −∑_{i∈V(G)} p_i log a_i = H(G, p),

again setting 0 log 0 = 0.
1.4 Convex corners associated with a graph
The concept of graph entropy highlights the significance of the convex corner VP(G). In the
literature, the information theoretic significance of other convex corners associated with a
graph G has been discussed. Two of these convex corners, FVP(G) and TH(G), motivate
some of the work in later chapters, and it is now useful to recall their definitions and basic
properties. It will be an important theme of our work to observe how many important graph
parameters can be defined from associated convex corners. This section will be concluded by
applying Corollary 1.2.21 and Theorem 1.2.9 to FVP(G) and TH(G).
Definition 1.4.1. ([17], [22].) Let G be a graph. The fractional vertex packing polytope is the convex corner given by

FVP(G) = VP(G)♭.
Lemma 1.4.2. ([22].) If G is a graph on d vertices, then

Cd ⊆ FVP(G) ⊆ Bd and γ(FVP(G)) = χf(G).

Proof. Lemma 1.3.8 gives Cd ⊆ VP(G) ⊆ Bd, and thus by (1.14) and Lemma 1.2.16,

Cd = (Bd)♭ ⊆ FVP(G) ⊆ (Cd)♭ = Bd.

Then note that γ(FVP(G)) = γ(VP(G)♭) = χf(G) by Corollary 1.3.11.
The definition of TH(G), the theta corner of G, requires some background. Here we follow [17]. Let graph G have d vertices. An orthonormal labelling (o.n.l.) of G is a family (a^(i))_{i∈V(G)} of unit vectors in R^k, for some k ∈ N, which satisfy ⟨a^(i), a^(j)⟩ = 0 when i ≁ j in G. We say ((a^(i))_{i∈V(G)}, c) is a handled orthonormal labelling (h.o.n.l.) of G in R^k when (a^(i))_{i∈V(G)} ⊆ R^k is an orthonormal labelling of G and c ∈ R^k is a unit vector. Let the set C(G) ⊆ R^d_+ be given by

C(G) = { ( |⟨c, a^(i)⟩|² )_{i∈V(G)} : ((a^(i))_{i∈V(G)}, c) a h.o.n.l. of G in R^k, k ∈ N }.    (1.34)

Definition 1.4.3. ([17, Theorem 3.2], [18, Equation (9.3.3)].) The theta corner of G is given by

TH(G) = C(G)♭.
As the anti-blocker of a non-empty subset of R^d_+, TH(G) is a convex corner. In [28], Lovász introduced θ(G), a new graph parameter now known as the Lovász number of G. There are a number of equivalent ways to define θ(G); the following is given in [17].

Definition 1.4.4. For graph G, we define θ(G), the Lovász number of G, by

θ(G) = γ(TH(G)).    (1.35)
In [17, Corollary 3.4] it is shown that

TH(Ḡ) = TH(G)♭.    (1.36)
The following ‘classical sandwich theorem’ is given in [17, Theorem 3.6] and is also proved in
[22].
Theorem 1.4.5. If G is a graph, then
VP(G) ⊆ TH(G) ⊆ FVP(G).
By Lemmas 1.4.2 and 1.3.8 it then follows for a graph G on d vertices that

γ(Cd) ≤ γ(VP(G)) ≤ γ(TH(G)) ≤ γ(FVP(G)) ≤ γ(Bd).

Lemmas 1.4.2 and 1.3.9 and (1.35) then yield the following result, as given in [22]:

1 ≤ α(G) ≤ θ(G) ≤ χf(G) ≤ d.    (1.37)
It is also immediate from Theorem 1.4.5 that for all p ∈ Pd
HVP(G)(p) ≥ HTH(G)(p) ≥ HFVP(G)(p). (1.38)
Corollary 1.3.11 shows that maxp∈Pd HVP(G)(p) = logχf(G). We now give the analogous
results for FVP(G) and TH(G). Note that Corollary 1.4.7 is given in [29].
Corollary 1.4.6. For a graph G on d vertices it holds that max_{p∈Pd} HFVP(G)(p) = log α(G).

Proof. Corollary 1.2.21 gives

max_{p∈Pd} HFVP(G)(p) = log γ(FVP(G)♭) = log γ(VP(G)) = log α(G),

where we have also used Definition 1.4.1, Theorem 1.2.15 and Lemma 1.3.9.
Corollary 1.4.7. For a graph G on d vertices, max_{p∈Pd} HTH(G)(p) = log θ(Ḡ).

Proof. We again use Corollary 1.2.21 to obtain

max_{p∈Pd} HTH(G)(p) = log γ(TH(G)♭) = log γ(TH(Ḡ)) = log θ(Ḡ),

where the final two equalities follow from (1.35) and (1.36).
Indeed, from Corollary 1.2.21 it is clear that any graph parameter expressible in the form γ(A) for some Rd-convex corner A is given by the maximum entropy of a probability distribution p ∈ Pd over A♭. Exploring links between graph parameters and entropy over convex corners is a profitable line of enquiry: see for instance [53, Theorem 5.10].
Note that Corollary 1.3.14 may be written as HVP(G)(p) ≥ H(p)− logα(G), and we now
give the equivalent results for FVP(G) and TH(G).
Corollary 1.4.8. For any graph G it holds that
HFVP(G)(p) ≥ H(p)− logχf(G) and HTH(G)(p) ≥ H(p)− log θ(G).
Proof. This is immediate from Theorem 1.2.9, Lemma 1.4.2 and (1.35).
Chapter 2
Convex corners in Md
This chapter introduces the concept of an Md-convex corner, that is, a convex corner whose elements are d × d matrices. New results, analogous to those in the previous chapter for Rd-convex corners, will be given, including the ‘second anti-blocker’ theorem for Md-convex
corners. We will conclude by showing how the concept of Md-convex corners leads to the
definition of new entropic quantities in the field of quantum information, which generalise
the well-known von Neumann entropy.
2.1 Preliminaries
We let e1, . . . , ed denote the canonical basis of C^d and Md be the set of d × d matrices with entries in C. We set Dd = span{e_i e_i* : i ∈ [d]}, so that Dd is the algebra of d × d diagonal matrices. We let M^h_d, M^+_d and M^{++}_d denote the sets of d × d Hermitian, positive semi-definite and positive definite matrices respectively, and we set D^+_d = Dd ∩ M^+_d. For M = (m_ij), N = (n_ij) with M, N ∈ Md we will use the Hilbert–Schmidt inner product ⟨M, N⟩ = ∑_{i,j=1}^d m_ij n̄_ij = Tr(MN*). The associated Hilbert–Schmidt norm will be denoted ‖M‖2 and is given by ‖M‖2 = √⟨M, M⟩. We write ‖M‖ for the operator norm, given by ‖M‖ = sup{‖Mv‖ : v ∈ C^d, ‖v‖ = 1}. In this section we give some important preliminaries; some other helpful but standard linear algebraic results are given in Appendix B.
some other helpful but standard linear algebraic results are given in Appendix B.
The following lemma gives a useful property of positive semi-definite matrices that will be
used later in the chapter. The result is well-known, but we include a proof for completeness.
Recall that for M ∈ M^+_d there exists a unique matrix M^{1/2} ∈ M^+_d satisfying M = (M^{1/2})².
Lemma 2.1.1. If M ∈ M^+_d is given by M = ∑_{ij} m_ij v_i v_j*, where {v_i : i ∈ [d]} is an orthonormal basis of C^d, then m_ii = ‖M^{1/2} v_i‖² ≥ 0 and |m_ij| ≤ √(m_ii m_jj) ≤ max{m_ii, m_jj}.

Proof. We have m_ij = ⟨M v_j, v_i⟩ = ⟨M^{1/2} v_j, M^{1/2} v_i⟩. It is immediate that m_ii = ‖M^{1/2} v_i‖² ≥ 0. By the Cauchy–Schwarz inequality,

|m_ij| ≤ ‖M^{1/2} v_i‖ ‖M^{1/2} v_j‖ = √(m_ii m_jj),

as stated.
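Lemma 2.1.1 is easy to test numerically: generate a random positive semi-definite matrix, express it in a random orthonormal basis, and check the Cauchy–Schwarz bound on the coefficients. A sketch using real matrices for simplicity:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
A = rng.standard_normal((d, d))
M = A @ A.T                                       # random positive semi-definite M
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # columns form an orthonormal basis
m = Q.T @ M @ Q                                   # m[i, j] = <M v_j, v_i> in that basis

ok = all(abs(m[i, j]) <= np.sqrt(m[i, i] * m[j, j]) + 1e-10
         for i in range(d) for j in range(d))
print(ok)   # prints True
```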
The following is an immediate but useful corollary concerning the zero elements of a
positive semi-definite matrix.
Corollary 2.1.2. ([34, Lemma 15.17]) If M ∈ M^+_d is as above and m_ii = 0, then m_ij = m_ji = 0 for all j ∈ [d].
Lemma 2.1.3. The following are equivalent for a set A ⊆ M^+_d:

(i) A is bounded, that is, there exists k > 0 such that ‖M‖ ≤ k for all M ∈ A.

(ii) The set {Tr M : M ∈ A} is bounded.

(iii) The set {⟨u, Mu⟩ : M ∈ A, u ∈ C^d with ‖u‖ = 1} is bounded.
Proof. (i) ⇒ (ii). Let M = (m_ij) ∈ M^+_d have eigenvalues λ1, . . . , λd. If ‖M‖ ≤ k, then 0 ≤ λ_i ≤ k for all i ∈ [d], and so 0 ≤ Tr M = ∑_{i=1}^d λ_i ≤ dk.

(ii) ⇒ (iii). Any unit vector u ∈ C^d is an element of some orthonormal basis V. If M ∈ A, then M ≥ 0 and ⟨v, Mv⟩ ≥ 0 for all v ∈ C^d. Now Tr M = ∑_{v∈V} ⟨v, Mv⟩, and so if Tr M ≤ k for all M ∈ A, then ⟨u, Mu⟩ ≤ k for all M ∈ A, as required.

(iii) ⇒ (i). If ⟨u, Mu⟩ ≤ k for every unit vector u ∈ C^d, then ‖M‖ ≤ k, since ‖M‖ is the largest eigenvalue of M.
Lemma 2.1.4. Let A1, . . . , An ∈ M^+_d and M = ∑_{i=1}^n A_i. If ⟨v, Mv⟩ = 0 for some v ∈ C^d, then A_i v = 0 for all i = 1, . . . , n.

Proof. We have ⟨v, Mv⟩ = ∑_{i=1}^n ⟨v, A_i v⟩ = ∑_{i=1}^n ‖A_i^{1/2} v‖². If ⟨v, Mv⟩ = 0, then for all i ∈ [n] we have ‖A_i^{1/2} v‖ = 0, and hence A_i^{1/2} v = 0, giving A_i v = 0.
Remark 2.1.5. Lemma 2.1.4 shows that we can conclude that the sum ∑_{i=1}^n A_i of finitely many positive semi-definite matrices is strictly positive when there exists no non-zero v ∈ C^d satisfying A_i v = 0 for all i ∈ [n].
2.2 Convex corners and anti-blockers in Md
This section begins by defining the new concept of a convex corner in Md. We work by analogy with convex corners in Rd, from which we are led to a natural definition of the anti-blocker of a subset of M^+_d. Some basic properties of these convex corners and their anti-blockers will
be given, and a number of simple examples of convex corners in Md will be used to illustrate
the theory.
2.2.1 Definitions and basic properties
Recall that an Rd-convex corner is closed, convex, non-empty, non-negative and hereditary.
It is now natural to define an Md-convex corner as a subset A ⊆ Md which has these same
five properties. In this context, the terms closed, convex and non-empty are taken with their
standard meanings. A matrix M ∈ Md will be called non-negative if it is positive semi-
definite, and we then write M ≥ 0. By M ≥ N we mean M − N ≥ 0. If M ≥ 0 for all
M ∈ A, we say A is non-negative. If N ∈ A for all N satisfying 0 ≤ N ≤ M for some
M ∈ A, we say A is hereditary.
Definition 2.2.1. A set A ⊆ Md is an Md-convex corner (or just a convex corner where context allows) if A is non-empty, closed, convex, non-negative and hereditary.
Our analysis of Md-convex corners will proceed analogously to that of Rd-convex corners
in the previous chapter. It is as important as it is trivial to note that an Rd-convex corner is
not an example of an Md-convex corner. Thus the results in Chapter 1, though analogous, are
not merely special cases of the results we now give in the Md case. The following definition and
lemma concern sets we will call diagonal convex corners, and will help clarify the connection
between Rd-convex corners and Md-convex corners. For v1, . . . , vd ∈ R+, we let φ denote the
canonical bijection Rd+ → D+d given by
φ(∑_{i=1}^d viei) = ∑_{i=1}^d vieie∗i . (2.1)
Definition 2.2.2. The set B ⊆ Dd is a diagonal Md-convex corner, or just a diagonal convex corner (resp. standard diagonal Md-convex corner), precisely when B = φ(A) = {φ(v) : v ∈ A} for an Rd-convex corner (resp. standard Rd-convex corner) A.
Lemma 2.2.3. If C is an Md-convex corner, then φ−1(Dd ∩ C) is an Rd-convex corner, and
Dd ∩ C is a diagonal convex corner.
Proof. Note that φ−1(Dd ∩ C) is non-empty because the zero matrix is in Dd ∩ C by the
hereditarity of C. When A ≥ 0 it holds that φ−1(A) ≥ 0, thus φ−1(Dd ∩ C) inherits non-
negativity from Dd ∩ C. Similarly φ−1(Dd ∩ C) is both convex and closed because Dd ∩ C
is. Finally, if a ∈ φ−1(Dd ∩ C) and b ∈ Rd+ satisfies b ≤ a, then we have φ(a) ∈ Dd ∩ C
and also φ(a) ≥ φ(b) ∈ Dd. So φ(b) ∈ Dd ∩ C by the hereditarity of C, and it follows that
b ∈ φ−1(Dd ∩ C), establishing the hereditarity of φ−1(Dd ∩ C).
Remark 2.2.4. It is trivial that a diagonal convex corner B is non-empty, closed, convex and
non-negative because B = φ(A) for some A ⊆ Rd with these properties. Furthermore B is
hereditary over Dd in the sense that if 0 ≤ A ≤ B where B ∈ B and A ∈ Dd, then A ∈ B.
In the previous chapter the construction of an anti-blocker for a subset of Rn+ was dis-
cussed. We now define the anti-blocker of a subset of M+d .
Definition 2.2.5. For A ⊆ M+d , we define its anti-blocker, A], by

A] = {N ∈ M+d : 〈N,M〉 ≤ 1 for all M ∈ A}.
Definition 2.2.6. If B ⊆ D+d , we define its diagonal anti-blocker by B[ = Dd ∩ B].
Lemma 2.2.7. Let B = φ(A) where A ⊆ Rd+. It holds that B[ = φ(A[). If A is an Rd-convex
corner, then B[[ = B.
Proof. For the first assertion, use the obvious identity 〈φ(u), φ(v)〉 = 〈u, v〉 for u, v ∈ Rd+. We then have B[[ = φ(A[[), and Theorem 1.2.15 completes the proof.
Definition 2.2.8. If A is an Rd-convex corner and B = φ(A), recalling Definition 1.2.7,
(1.12) on page 9, and Definition 1.2.18, we define the following.
(i) γ(B) = γ(A). (Note then that γ(B) = max{TrT : T ∈ B}.)
(ii) N(B) = N(A). (Note then that N(B) = max{β ∈ R+ : βI ∈ B}.)
(iii) M(B) = M(A).
Lemma 2.2.7 and Proposition 1.2.20 immediately give the following proposition.
Proposition 2.2.9. If B is a diagonal convex corner, then

M(B) = 1/N(B) = γ(B[).
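The identity γ(B[) = 1/N(B) can be sanity-checked on a concrete diagonal corner. In the sketch below (an illustration only; the box corner and all variable names are our own choices, not from the text), B = φ(A) for the box A = {x ∈ R³₊ : xᵢ ≤ cᵢ}, whose anti-blocker is the slice {y ≥ 0 : 〈c, y〉 ≤ 1}:

```python
import numpy as np

# A diagonal convex corner: B = phi(A) for the box A = {x >= 0 : x_i <= c_i}.
c = np.array([0.5, 2.0, 1.25])

# N(B) = max{beta : beta*I in B} = min_i c_i.
N = c.min()

# The anti-blocker of the box is {y >= 0 : <c, y> <= 1}, so
# gamma(B_flat) = max{sum(y) : <c, y> <= 1, y >= 0} = 1/min_i c_i,
# attained by putting all weight on a coordinate with smallest c_i.
gamma_flat = 1.0 / c.min()

# The identity gamma(B_flat) = 1/N(B) from Proposition 2.2.9:
assert np.isclose(gamma_flat, 1.0 / N)

# Spot-check optimality against random feasible points of the anti-blocker:
rng = np.random.default_rng(1)
for _ in range(1000):
    y = rng.random(3)
    y /= max(c @ y, 1e-12)      # scale onto the boundary <c, y> = 1
    assert y.sum() <= gamma_flat + 1e-9
```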
In view of the preceding results and definitions, it is often convenient simply to regard
the Rd-convex corner A and the diagonal Md-convex corner φ(A) as the same object. Note
that elements of a diagonal convex corner will commute, but that in general elements of an
Md-convex corner will not. For this reason we refer to the theory of Md-convex corners as
the non-commutative case, and that of Rd-convex corners as the commutative case.
In the remainder of this section, some basic properties of convex corners and their anti-
blockers will be discussed. Note that, as in the Rd case, we write (A])] = A]] to denote the
second anti-blocker of A.
Lemma 2.2.10. The following results hold:
(i) If A ⊆M+d is non-empty, then A] is a convex corner;
(ii) If A,B ⊆M+d and A ⊆ B then B] ⊆ A];
(iii) If A ⊆M+d then A ⊆ A]].
Proof. These are straightforward from the definitions. In (i), we can establish the hereditarity
of A] as follows. If B ≥ 0 satisfies B ∈ A], then Tr(BA) ≤ 1 for all A ∈ A. Consider C
such that 0 ≤ C ≤ B. By Lemma B.0.2 (v) it follows that Tr(CA) ≤ Tr(BA) ≤ 1 and so
C ∈ A]. For (iii) just note that if M ∈ A ⊆ M+d , then 〈M,N〉 ≤ 1 for all N ∈ A], and so
M ∈ A]].
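The hereditarity step in part (i) rests on trace monotonicity, Tr(CA) ≤ Tr(BA) whenever 0 ≤ C ≤ B and A ≥ 0 (Lemma B.0.2 (v)). The following NumPy sketch (illustrative only; the helper names are ours) generates matrices 0 ≤ C ≤ B of the form B^{1/2}KB^{1/2} with 0 ≤ K ≤ I and checks the inequality:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

def rand_psd(d):
    X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return X @ X.conj().T

def psd_sqrt(B):
    # Positive square root via the spectral decomposition.
    w, V = np.linalg.eigh(B)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

for _ in range(200):
    A, B = rand_psd(d), rand_psd(d)
    # Any C = B^{1/2} K B^{1/2} with 0 <= K <= I satisfies 0 <= C <= B.
    K = rand_psd(d)
    K /= np.linalg.eigvalsh(K).max()
    C = psd_sqrt(B) @ K @ psd_sqrt(B)
    # Trace monotonicity: Tr(CA) <= Tr(BA).
    assert np.trace(C @ A).real <= np.trace(B @ A).real + 1e-9
```

This is exactly the mechanism that keeps A] hereditary: shrinking B below any element of the anti-blocker can only decrease the pairing with each A ∈ A.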
Remark 2.2.11. By Lemma 2.2.10 (i) and Lemma 2.2.3, we see that for any B ⊆ D+d , the set
B[ is a diagonal Md-convex corner.
A set A ⊆ M+d satisfying A]] = A will be called reflexive. In the Rd case, Theorem
1.2.15 shows that A[[ = A for every Rd-convex corner. A main objective in this chapter is
to prove an analogous ‘second anti-blocker’ theorem for Md-convex corners. A number of
related issues and examples will be discussed in the remainder of this section, before this
problem is tackled in the next.
For δ > 0, consider the δ-ball with centre M ∈ Md given by

B(M, δ) = {N ∈ Md : ‖N −M‖2 < δ}.

Given a set A ⊆ M+d , we define its interior relative to M+d to be the set

{M ∈ A : B(M, δ) ∩M+d ⊆ A for some δ > 0}.
(The reason for examining the interior relative to M+d rather than the interior is that every δ-ball will include elements of Md not in M+d , for instance elements M = (mij) ∈ Md that fail to be Hermitian because mji ≠ m̄ij for some i, j.)
Lemma 2.2.12. Let A ⊆M+d be a convex corner. The following are equivalent:
(i) A has a non-empty interior relative to M+d ;
(ii) there exists r > 0 such that rI ∈ A;
(iii) for every non-zero vector v ∈ Cd there exists s > 0 such that svv∗ ∈ A;
(iv) A contains a positive definite element.
Proof. (i)⇒(ii). Suppose that A has a non-empty interior relative to M+d . Let A ∈ A and δ > 0 be such that B(A, δ) ∩M+d ⊆ A. Noting that ‖Id‖2 = √d, we have A + (δ/(2√d))I ∈ B(A, δ) and, since A + (δ/(2√d))I ≥ 0, it lies in A. Finally, since (δ/(2√d))I ≤ A + (δ/(2√d))I, we have (δ/(2√d))I ∈ A by the hereditarity of A.
(ii)⇒(iii). Suppose that r > 0 is such that rI ∈ A and let v ∈ Cd be a non-zero vector. Since (r/‖v‖²)vv∗ ≤ rI, the hereditarity of A implies that (r/‖v‖²)vv∗ ∈ A.
(iii)⇒(iv). Let {vi}_{i=1}^d be an orthonormal basis of Cd and, for each i, let si > 0 be such that siviv∗i ∈ A. Since A is convex, A = ∑_{i=1}^d (si/d)viv∗i ∈ A. Letting s = min{si/d : i ∈ [d]} > 0, we have sI ≤ A and so, by the hereditarity of A, it follows that sI ∈ A.
(iv)⇒(i). Let A ∈ A be strictly positive. Letting r be the minimum eigenvalue of A, we
have that 0 < rI ≤ A. For all M ∈ B(0, r) ∩ M+d , we have ‖M‖ ≤ ‖M‖2 ≤ r and so
0 ≤M ≤ rI ≤ A, giving that B(0, r) ∩M+d ⊆ A by hereditarity.
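Two of the matrix inequalities driving this proof are easy to verify numerically. The sketch below (illustrative, with parameters chosen ad hoc) checks that (r/‖v‖²)vv∗ ≤ rI, used in (ii)⇒(iii), and that the average ∑(si/d)viv∗i dominates (min si/d)I, used in (iii)⇒(iv):

```python
import numpy as np

rng = np.random.default_rng(8)
d, r = 4, 0.3

v = rng.standard_normal(d) + 1j * rng.standard_normal(d)

# (ii) => (iii): (r/||v||^2) vv* <= rI, so hereditarity pulls this rank-one
# matrix into any convex corner containing rI.
A = (r / np.linalg.norm(v) ** 2) * np.outer(v, v.conj())
gap = r * np.eye(d) - A
assert np.linalg.eigvalsh(gap).min() >= -1e-10

# (iii) => (iv): averaging s_i v_i v_i* over an orthonormal basis dominates
# the multiple (min_i s_i / d) of the identity.
s = rng.random(d) + 0.1
V = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))[0]
B = sum(s[i] / d * np.outer(V[:, i], V[:, i].conj()) for i in range(d))
assert np.linalg.eigvalsh(B - (s.min() / d) * np.eye(d)).min() >= -1e-10
```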
Definition 2.2.13. If a convex corner A is bounded and has non-empty interior relative to M+d , then we say A is a standard convex corner.
As an example, note that if P is a projection with rankP < d, then {M ∈ M+d : M ≤ P} is a convex corner, but not a standard convex corner, because it has empty interior relative to M+d by Lemma 2.2.12.
Lemma 2.2.14. If C is a standard Md-convex corner, then φ−1(Dd ∩ C) is a standard Rd-
convex corner, and Dd ∩ C is a standard diagonal convex corner.
Proof. We apply Lemma 2.2.3. To complete the proof, note that if C is a standard Md-convex
corner, then by Lemma 2.2.12, rI ∈ C for some r > 0, and so r1 ∈ φ−1(Dd ∩ C). Lemma
1.2.2 then gives that φ−1(Dd ∩ C) has non-empty interior. Clearly φ−1(Dd ∩ C) is bounded if
C is bounded.
Before the next proposition we need the following well-known lemma; the proof is short
and is included for completeness.
Lemma 2.2.15. If X is a separable metric space and Y ⊂ X , then Y has a countable, dense
subset.
Proof. Let B = {x1, x2, . . .} be a countable, dense subset of X . For all n, r ∈ N such that Y ∩ B(xn, 1/r) is non-empty, choose y(n,r) ∈ Y ∩ B(xn, 1/r). We claim that

F = {y(n,r) : n, r ∈ N, Y ∩ B(xn, 1/r) ≠ ∅}

is a countable, dense subset of Y. To see this, take y ∈ Y. Then for all s ∈ N there exists xi ∈ B such that xi ∈ B(y, 1/s). Note then that y ∈ B(xi, 1/s), and so Y ∩ B(xi, 1/s) is non-empty. We then have y(i,s) ∈ B(xi, 1/s), and so y(i,s) ∈ B(y, 2/s).
Proposition 2.2.16. A convex corner A ⊆ M+d has non-empty interior relative to M+d if and only if A] is bounded.
Proof. If A has non-empty interior relative to M+d , then by Lemma 2.2.12, rI ∈ A for some
r > 0. Then 〈M, rI〉 = rTrM ≤ 1 for all M ∈ A]. Then TrM ≤ 1/r for all M ∈ A], and by
Lemma 2.1.3, A] is bounded.
To complete the proof we show that A] is unbounded when A has empty interior relative to M+d . By Lemma 2.2.12, it is equivalent to show that A] is unbounded when it is assumed that A contains no strictly positive element. Since Md is separable and A ⊆ Md, by Lemma 2.2.15 there is a countable set E = {A1, A2, . . .} ⊆ A which is dense in A. Let Em = {A1, . . . , Am} and write

Km = {v ∈ Cd : ‖v‖ = 1, Av = 0 for all A ∈ Em}.

It is clear that each Km is compact and that Km+1 ⊆ Km, m ∈ N. Now consider Bm = (1/m)∑_{i=1}^m Ai. By convexity, Bm ∈ A and so, by assumption, is not strictly positive. Thus there exists a unit vector v ∈ Cd such that Bmv = 0, and by Lemma 2.1.4, we have Aiv = 0 for all i = 1, . . . ,m. Therefore Km is non-empty for all m ∈ N. Then by Cantor's intersection theorem [43, Corollary 2.36], ⋂_{m=1}^∞ Km is non-empty, so there exists v ∈ Cd of unit norm such that Av = 0 for all A ∈ E . For any M ∈ A, there exists a sequence (Mi)i∈N with Mi ∈ E and Mi → M as i → ∞. This gives that Mv = 0 for all M ∈ A. Then Tr(Mvv∗) = 0 for all M ∈ A, and so λvv∗ ∈ A] for all λ ≥ 0, meaning A] is unbounded.
Proposition 2.2.17. A convex corner A ⊆M+d is bounded if and only if A] has non-empty
interior relative to M+d .
Proof. If A] has non-empty interior relative to M+d , then, by the previous proposition, A]] is
bounded. But by Lemma 2.2.10, A ⊆ A]], and so A is bounded. Conversely, by Lemma 2.1.3
note that if A is bounded there exists k > 0 such that TrM = 〈I,M〉 ≤ k for all M ∈ A.
Thus 〈(1/k)I,M〉 ≤ 1 for all M ∈ A. Then (1/k)I ∈ A], and A] has non-empty interior
relative to M+d by Lemma 2.2.12.
The following is immediate.
Corollary 2.2.18. Suppose A is an Md-convex corner. Then A is a standard convex corner
if and only if A] is a standard convex corner.
Reflexivity is an important property; however, the fact that all Md-convex corners are reflexive will not be established until Section 2.3. In the meantime, we can give the following results.
Lemma 2.2.19. If C ⊆ M+d satisfies C = D] for some D ⊆ M+d , then C is a reflexive convex corner.
Proof. Lemma 2.2.10 gives that C is a convex corner and that C ⊆ C]]. Since C = D], it
holds that C]] = D]]]. However, D ⊆ D]] and so, again by Lemma 2.2.10, D]]] ⊆ D]. This is
equivalent to C]] ⊆ C and completes the proof that C]] = C.
Lemma 2.2.20. An arbitrary intersection of Md-convex corners is an Md-convex corner.
Proof. Let {Aα : α ∈ A} be a collection of Md-convex corners, for some index set A of arbitrary cardinality. The non-negativity of ⋂_{α∈A} Aα is obvious, and non-emptiness follows because 0 ∈ Aα for all α ∈ A. Finally, note that an arbitrary intersection of closed, convex and hereditary sets retains these properties.
Remark 2.2.21. In contrast, note that even the union of two Md-convex corners is not necessarily an Md-convex corner.
Lemma 2.2.22. If Bα ⊆ M+d for each α ∈ A, where A has arbitrary cardinality, then

(⋃_{α∈A} Bα)] = ⋂_{α∈A} B]α.

Proof. For each β ∈ A we have Bβ ⊆ ⋃_{α∈A} Bα, and Lemma 2.2.10 gives that (⋃_{α∈A} Bα)] ⊆ B]β. Hence (⋃_{α∈A} Bα)] ⊆ ⋂_{α∈A} B]α. Conversely, let M ∈ ⋂_{α∈A} B]α. Then Tr(MN) ≤ 1 for all N ∈ ⋃_{α∈A} Bα, giving that M ∈ (⋃_{α∈A} Bα)] and completing the proof.
Remark 2.2.23. In general it can happen that

(⋂_{α∈A} Bα)] ≠ ⋃_{α∈A} B]α, (2.2)

even in the case where each Bα is a convex corner. We will return to this in Proposition 2.3.16, but for now note that in (2.2) the right hand side may fail to be a convex corner, while the left hand side always is.
Lemma 2.2.24. An arbitrary intersection of reflexive Md-convex corners is a reflexive Md-
convex corner.
Proof. For each α ∈ A let Aα ⊆ M+d be a reflexive convex corner, and consider A = ⋂_{α∈A} Aα. Lemma 2.2.20 shows that A is a convex corner. By the reflexivity of each Aα,

A = ⋂_{α∈A} A]]α = ⋂_{α∈A} T]α, where Tα = A]α.

By Lemma 2.2.22, A = (⋃_{α∈A} Tα)], whence Lemma 2.2.19 gives A]] = A, completing the proof.
Given a subset G ⊆M+d , which may or may not be a convex corner, the next results show
that there is, in a sense we will explain, a ‘smallest’ convex corner C(G) which contains G.
Definition 2.2.25. We define her(B), the hereditary cover of B ⊆ M+d , by letting

her(B) = {M ∈ M+d : ∃N ∈ B such that M ≤ N}.
Let G ⊆ M+d be non-empty and bounded. (In many of the cases we will examine in the
next chapter, G will be a family of projections.) Let conv(G) be the closure of the convex
hull of G, which is clearly both bounded and closed, and so compact. The next lemma shows
that the set her(conv(G)), which trivially contains G, is an Md-convex corner.
Lemma 2.2.26. If G ⊆M+d is non-empty and bounded, then
A = her(conv(G))
is a bounded Md-convex corner.
Proof. The hereditarity, non-emptiness and non-negativity of A are obvious. To show the convexity of A, take arbitrary A,B ∈ A and choose C,D ∈ conv(G) with A ≤ C and B ≤ D. Choose sequences (Cn)n∈N and (Dn)n∈N in the convex hull of G with Cn → C and Dn → D. For all n ∈ N and λ ∈ [0, 1], the convexity of the hull gives λCn + (1− λ)Dn in the convex hull of G, and so

λCn + (1− λ)Dn → λC + (1− λ)D ∈ conv(G).

Noting that λA+(1−λ)B ≤ λC+(1−λ)D gives λA+(1−λ)B ∈ A by the hereditarity of A, thus demonstrating the convexity of A. To show that A is closed, suppose that (Tn)n∈N ⊆ A
thus demonstrating the convexity of A. To show that A is closed, suppose that (Tn)n∈N ⊆ A
and Tn →n→∞ T . Let Cn ∈ conv(G) be such that Tn ≤ Cn for all n ∈ N. Since conv(G) is
compact, (Cn)n∈N has a convergent subsequence, converging to a limit, say C, in conv(G).
Then T ≤ C and so T ∈ A, giving that A is closed. The boundedness of A follows from
that of G: indeed, there exists k > 0 such that G ≤ kI for all G ∈ G, and so A ≤ kI for all
A ∈ A.
Corollary 2.2.27. Suppose G ⊆ M+d is bounded and conv(G) contains a strictly positive
element. Then A = her(conv(G)) is a standard convex corner.
Proof. By Lemma 2.2.26, A is a bounded convex corner, and then since it contains a strictly
positive element, it has non-empty interior relative to M+d by Lemma 2.2.12.
Corollary 2.2.28. If C is a bounded (resp. standard) diagonal Md-convex corner, then her(C)
is a bounded (resp. standard) Md-convex corner.
Proof. Since conv(C) = C, this follows immediately from Lemma 2.2.26 and Corollary 2.2.27.
Lemma 2.2.29. If B is a convex corner and G ⊆ B, then
her(conv(G)) ⊆ B.
Proof. If G ⊆ B, then the convex hull of G is contained in B, since B is convex; hence its closure conv(G) ⊆ B, since B is closed. Finally, her(conv(G)) ⊆ her(B) = B, by the hereditarity of B.
For any G ⊆ M+d , there is at least one convex corner containing G, since M+d is itself such a corner. We make the following definition.
Definition 2.2.30. For G ⊆M+d , we define C(G) to be the intersection of all convex corners
containing G.
Theorem 2.2.31. For G ⊆M+d we have that C(G) is the smallest convex corner containing
G and C(G) = her(conv(G)).
Proof. It is clear from Lemma 2.2.20 that C(G) is the smallest convex corner to contain G.
The second assertion follows from Lemmas 2.2.26 and 2.2.29.
We call C(G) the convex corner generated by G.
Lemma 2.2.32. For A ⊆M+d it holds that A] = (her(A))] = (C(A))].
Proof. Since A ⊆ her(A) ⊆ C(A), Lemma 2.2.10 gives
A] ⊇ (her(A))] ⊇ (C(A))].
The proof will be completed by showing that A] ⊆ (C(A))]. First let Q be in the convex hull of A, so that Q = ∑_{i=1}^n λiAi, with Ai ∈ A and λ1, . . . , λn ∈ R+ satisfying ∑_{i=1}^n λi = 1. For M ∈ A] we have Tr(MQ) ≤ ∑_{i=1}^n λi = 1. For any P ∈ conv(A) we can choose a sequence (Qj)j∈N in the convex hull of A such that Qj → P as j → ∞. By the continuity of the map X ↦ Tr(MX) on M+d , it then holds that Tr(MP ) = lim_{j→∞} Tr(MQj) ≤ 1. Finally, since Tr(MN) ≤ Tr(MP ) when 0 ≤ N ≤ P , it follows that Tr(MN) ≤ 1 for all N ∈ her(conv(A)) = C(A), and we conclude that M ∈ (C(A))], as required.
2.2.2 Examples of convex corners in Md
In this section we use some simple but instructive examples of Md-convex corners to illustrate
some of the results in the last section.
For C ∈ Mhd and λ ∈ R, let

AC,λ = {M ∈ M+d : Tr(MC) ≤ λ} and AC = AC,1. (2.3)

Further, let

NC = {M ∈ M+d : Tr(MC) = 0}, (2.4)

and

BC = {M ∈ M+d : M ≤ C}. (2.5)
For λ > 0 it is clear that AC,λ = A(1/λ)C . For C ≥ 0 we have NC = AC,0.
Lemma 2.2.33. If C ∈M+d , then BC is a reflexive convex corner and B]C = AC .
Proof. The zero matrix is in BC , showing it is non-empty. Non-negativity is by definition.
That it is closed, convex and hereditary follows from basic properties in Definition B.0.1 and
Lemma B.0.2, and thus BC is a convex corner.
We then claim that

B]C = {N ∈ M+d : Tr(NC) ≤ 1} = AC . (2.6)
To see this, note that if 0 ≤ M ≤ C and N ≥ 0, then 0 ≤ Tr(MN) ≤ Tr(CN) by Lemma
B.0.2 (v). So if N ∈ AC we have Tr(CN) ≤ 1 with N ≥ 0, and then Tr(MN) ≤ 1 for all
M ∈ BC , and we conclude N ∈ B]C . Conversely, if N ∈ B]C , then N ≥ 0 by definition, and
Tr(CN) ≤ 1, because C ∈ BC , giving that N ∈ AC . This proves (2.6).
By Lemma 2.2.10, in order to prove the reflexivity of BC , it suffices to show that B]]C ⊆ BC .
We will prove this by showing that if Q ≥ 0 and Q /∈ BC , then Q /∈ B]]C . To see this, note that if 0 ≤ Q /∈ BC , then C − Q /∈ M+d . However, C − Q is self-adjoint because C,Q ∈ M+d . Thus we can write

C − Q = ∑_{i=1}^d λiviv∗i ,

where {v1, . . . , vd} is an orthonormal basis for Cd, λ1, . . . , λd ∈ R, and λj < 0 for some j ∈ [d].
Now let D = αvjv∗j with α > 0 to be fixed shortly. We have D ≥ 0 and Tr(CD) = αTr(Cvjv∗j ) = α〈Cvj , vj〉 ≥ 0 because C ≥ 0. We also have

Tr((C − Q)D) = Tr(∑_{i=1}^d λiviv∗i · αvjv∗j ) = λjα. (2.7)
Our aim is to show that Q /∈ B]]C , and to this end we identify two cases.
(i) In the case that 〈Cvj , vj〉 = 0, let α = −2/λj > 0. We have Tr(CD) = 0, and so
D ∈ AC = B]C . But using (2.7), Tr(QD) = Tr(CD)− λjα = 2 > 1 and we conclude Q /∈ B]]C .
(ii) Since C ≥ 0, the remaining case is when 〈Cvj , vj〉 > 0. Here we let α = (〈Cvj , vj〉)−1 > 0,
giving Tr(CD) = 1, D ≥ 0 and D ∈ AC = B]C . Then Tr(QD) = Tr(CD)− λjα > 1, and we
again conclude Q /∈ B]]C , and the proof is complete.
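The separating matrix D built in this proof is entirely constructive, and can be tested numerically. The sketch below (an informal illustration; the setup Q = C + W is our own choice, forcing Q /∈ BC with a full-rank violation so that case (ii) applies) reproduces the construction with NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4

def rand_psd(d):
    X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return X @ X.conj().T

C = rand_psd(d)
W = rand_psd(d)
Q = C + W            # then C - Q = -W has a negative eigenvalue, so Q not in B_C

lam, V = np.linalg.eigh(C - Q)
j = np.argmin(lam)
assert lam[j] < 0    # certificate that Q lies outside B_C
vj = V[:, j]

# Case (ii) of the proof: generically <C v_j, v_j> > 0, so take
# D = v_j v_j^* / <C v_j, v_j>, which lies in A_C = B_C^sharp.
c = np.vdot(vj, C @ vj).real
assert c > 0
D = np.outer(vj, vj.conj()) / c

assert np.isclose(np.trace(C @ D).real, 1.0)   # Tr(CD) = 1, so D in A_C
assert np.trace(Q @ D).real > 1.0              # Tr(QD) = 1 - lam_j/c > 1
```

The final assertion is exactly the witness Tr(QD) > 1 showing Q /∈ B]]C .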
Lemmas 2.2.19 and 2.2.33 have the following corollary.
Corollary 2.2.34. Let C ∈M+d and λ > 0. Then AC,λ is a reflexive Md-convex corner, and
A]C,λ = B(1/λ)C .
Proof. By Lemma 2.2.33,
AC,λ = A(1/λ)C = B](1/λ)C . (2.8)
That AC,λ is a reflexive convex corner now follows from Lemma 2.2.19. Since Lemma 2.2.33 shows that B(1/λ)C is reflexive, taking anti-blockers in (2.8) gives that A]C,λ = B]](1/λ)C = B(1/λ)C .
It is instructive to consider the set AC for different C ∈Mhd . Putting λ = 1 in Corollary
2.2.34 gives that AC is a reflexive convex corner for all C ∈ M+d . Furthermore, we have the
following proposition.
Proposition 2.2.35. (i) If C ∈ M++d , then AC and A]C are standard convex corners;
(ii) If C ∈ M+d \M++d , then AC and A]C are convex corners, but neither is a standard convex corner;
(iii) If C ∈ Mhd but ±C /∈ M+d , then AC is not a convex corner, AC is not reflexive and A]C = {0};
(iv) If −C ∈ M+d , then AC = M+d , which is reflexive, and A]C = {0}.
Proof. Write C = ∑_{i=1}^d µiviv∗i for some orthonormal basis {v1, . . . , vd} ⊆ Cd of eigenvectors of C, with µ1, . . . , µd ∈ R.
(i) For M ∈ M+d we write M = ∑_{i,j=1}^d mijviv∗j , giving Tr(MC) = ∑_{i=1}^d µimii, where each mii ≥ 0 by Lemma 2.1.1. In this case each µi > 0, and so if M ∈ AC we have 0 ≤ mii ≤ 1/µi
for each i ∈ [d], and AC is a bounded convex corner by Lemma 2.1.3. By Corollary 2.2.34,
A]C = BC , and so for all M ∈ A]C we have that 0 ≤ M ≤ C giving that A]C is bounded.
Then, by Propositions 2.2.16 and 2.2.17, AC and A]C have non-empty interiors relative to
M+d and we see that both AC and A]C are standard convex corners.
(ii) If C ∈ M+d \M
++d then µi ≥ 0 for all i and µj = 0 for some j. Then for all α ≥ 0,
we have Tr(αvjv∗jC) = αµj = 0 and αvjv
∗j ∈ AC for all α ≥ 0. Then AC is an unbounded
convex corner. By Corollary 2.2.34, A]C = BC , and so A]C is bounded. Propositions 2.2.16
and 2.2.17 then give that AC has non-empty interior relative to M+d and that A]C has empty
interior relative to M+d . The claim follows.
(iii) Here we can write C = C∗ = ∑_{i=1}^d µiviv∗i with µj < 0 for some j and µk > 0 for some k. In this case AC lacks hereditarity, and so is not a convex corner. To see this, take M = −(2/µj)vjv∗j + (2/µk)vkv∗k ≥ 0. We have Tr(MC) = 0 and so M ∈ AC . However, consider 0 ≤ N = (2/µk)vkv∗k ≤ M . We have Tr(NC) = 2 and N /∈ AC . Note that αvjv∗j ∈ AC for all α ≥ 0, and so AC is unbounded. The following argument shows that her(AC) = M+d . Take M ∈ M+d and let t = 〈M,C〉. If t ≤ 1, then M ∈ AC ⊆ her(AC). If t > 1, then M ≤ M ′, where M ′ = M − (t/µj)vjv∗j satisfies 〈M ′, C〉 = 0. Thus M ′ ∈ AC and M ∈ her(AC). Lemma 2.2.32 then yields A]C = (M+d )] = {0}, and AC is not reflexive; indeed, A]]C = {0}] = M+d ≠ AC .
(iv) In this case each µi ≤ 0, and Tr(MC) ≤ 0 for all M ∈ M+d by Lemma B.0.2 (iv), giving that AC = M+d and A]C = {0}. Here, rather trivially, A]]C = M+d = AC .
We now carry out a similar analysis for BC , where C ∈ Mhd .
Proposition 2.2.36. (i) If C ∈ M++d , then BC is reflexive and is a standard convex corner;
(ii) If C ∈ M+d \M++d , then BC is reflexive and is a convex corner with empty interior relative to M+d ;
(iii) If C ∈ Mhd \M+d , then BC = ∅.
Proof. For C ≥ 0, Lemma 2.2.33 gives that BC is a reflexive convex corner satisfying B]C =
AC . For all M ∈ BC , we have 0 ≤ TrM ≤ TrC and by Lemma 2.1.3, the set BC is bounded.
It is now straightforward to complete the proof of the three statements above.
(i) If C ∈ M++d we have 0 < C ∈ BC and by Lemma 2.2.12, BC has non-empty interior
relative to M+d and is a standard convex corner.
(ii) Here there exists u ∈ Cd satisfying 〈u,Cu〉 = 0. Since 〈u, (C −M)u〉 ≥ 0 when
0 ≤ M ≤ C, we have that 〈u,Mu〉 = 0 and thus M /∈ M++d for all M ∈ BC . It is then clear
by Lemma 2.2.12 that BC has empty interior relative to M+d .
(iii) Suppose that M ∈ BC for some M ∈ M+d . Then 0 ≤ M ≤ C, requiring that 0 ≤ C.
From this contradiction we conclude BC = ∅.
Recall for C ∈ Mhd that NC = {A ∈ M+d : Tr(AC) = 0}.
Lemma 2.2.37. If C ∈M+d , then NC is a reflexive convex corner.
Proof. It is clear that NC satisfies the required conditions to be a convex corner. Write C = ∑_{i=1}^d λiviv∗i , where {vi}_{i=1}^d is an orthonormal basis of Cd and 0 ≤ λ1 ≤ . . . ≤ λd. Let P be the projection onto ran(C) and P⊥ = I − P . Let 0s ∈ Ms denote the zero s × s matrix and set rank(P ) = r, so that λ1 = . . . = λd−r = 0 and λi > 0 for i = d − r + 1, . . . , d. For M ∈ M+d , write M = ∑_{i,j=1}^d αijviv∗j with αij ∈ C. Then Tr(MC) = ∑_{i=1}^d λiαii. By Lemma 2.1.1, αii ≥ 0 for all i. Now suppose that M ∈ NC . For i = 1, . . . , d − r the entries αii are unconstrained, and can in particular be arbitrarily large. However, we must have αii = 0 for i = d − r + 1, . . . , d, and then by Corollary 2.1.2, αij = 0 whenever i = d − r + 1, . . . , d or j = d − r + 1, . . . , d. Working in the basis {viv∗j : i, j ∈ [d]}, we then have

NC = {M ⊕ 0r : M ∈ M+d−r}. (2.9)

Now suppose N = ∑_{i,j=1}^d βijviv∗j ∈ N]C . We have 〈N,M〉 ≤ 1 for all M ∈ NC , and thus βii = 0 whenever i = 1, . . . , d − r. Corollary 2.1.2 then gives that βij = 0 whenever i = 1, . . . , d − r or j = 1, . . . , d − r. For i = d − r + 1, . . . , d, the entries βii can be arbitrarily large. It follows that

N]C = {0d−r ⊕ N : N ∈ M+r }, (2.10)

whence a repetition of the same argument gives that N]]C = NC .
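The block decompositions (2.9) and (2.10) can be observed concretely. In the sketch below (an illustration only; the eigenvalues and dimensions are ad hoc choices), C has rank r = 2 inside M4, elements of NC are built on ker(C), elements of N]C on ran(C), and the two are checked to pair to zero:

```python
import numpy as np

rng = np.random.default_rng(4)

# C >= 0 with rank r = 2 inside d = 4: eigenvalues (0, 0, 1, 3).
d, r = 4, 2
lam = np.array([0.0, 0.0, 1.0, 3.0])
U = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))[0]
C = (U * lam) @ U.conj().T

# (2.9): elements of N_C are supported on ker(C), spanned by the first two
# columns of U. Build such an M and check Tr(MC) = 0.
X = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
M_small = X @ X.conj().T
K = U[:, :2]                     # isometry onto ker(C)
M = K @ M_small @ K.conj().T     # "M ⊕ 0_r" in the eigenbasis of C
assert abs(np.trace(M @ C)) < 1e-10

# (2.10): elements of N_C^sharp are supported on ran(C); any such N pairs
# to zero with every element of N_C, so lambda*N stays in N_C^sharp.
R = U[:, 2:]                     # isometry onto ran(C)
N = R @ M_small @ R.conj().T
assert abs(np.trace(N @ M)) < 1e-12
```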
We give some further properties of the set NC .
Proposition 2.2.38. (i) If C ∈ M+d , then neither of the convex corners NC and N]C is a standard convex corner;
(ii) If C ∈ M++d , then NC = {0};
(iii) If C ∈ Mhd and ±C /∈ M+d , then NC is not a convex corner and N]C = {0}.
Proof. Let C = ∑_{i=1}^d λiviv∗i , where {vi}_{i=1}^d is an orthonormal basis of Cd and each λi ∈ R. Again denote the projection onto ran(C) by P , and let r = rank(P ).
(i) Note from (2.9) and (2.10) that if r ≥ 1, then NC has empty interior relative to M+d and N]C is unbounded. Similarly, if d − r ≥ 1, then N]C has empty interior relative to M+d and NC is unbounded.
(ii) If C > 0, then r = d and (2.9) gives NC = {0}.
(iii) In this case NC is not a convex corner, and N]C = {0}. To see this, note that λj > 0 for some j ∈ [d] and λk < 0 for some k ∈ [d]. Let M = λjvkv∗k − λkvjv∗j . Then M ≥ 0 and Tr(MC) = λjλk − λkλj = 0, giving that αM ∈ NC for all α ≥ 0. It is immediate that NC is unbounded. Since M ≥ λjvkv∗k /∈ NC , it is clear that NC lacks hereditarity and is not a convex corner. Let N = ∑_{r,s} αrsvrv∗s ∈ N]C with αrs ∈ C. Since N ≥ 0, it holds that αii ∈ R+ for all i ∈ [d]. We have αM ∈ NC , so it is required that Tr(αMN) = α(λjαkk − λkαjj) ≤ 1 for all α ≥ 0, giving αjj = αkk = 0; indeed, αii = 0 for all i such that λi ≠ 0. If λm = 0 for some m ∈ [d], then Tr(βvmv∗mC) = 0 and so βvmv∗m ∈ NC for all β ≥ 0. Then Tr(βvmv∗mN) = βαmm, and we require αmm = 0 for N ∈ N]C . By Corollary 2.1.2 it follows that N = 0 and N]C = {0}, as claimed.
Remark 2.2.39. Note that N−C = NC , so the cases −C ∈ M+d and −C ∈ M++d do not require separate consideration.
2.3 Reflexivity of Md-convex corners
In the previous section, the reflexivity of a number of different Md-convex corners has been
demonstrated: indeed, any subset of Md we have examined that has failed to be reflexive
has also failed to be a convex corner. In this section we will prove the ‘second anti-blocker
theorem’ in this non-commutative setting, showing indeed that every Md-convex corner is
reflexive.
2.3.1 The second anti-blocker theorem
In [22] it is proved that if A is an Rd-convex corner, then A[[ = A, a result we have called the second anti-blocker theorem. However, the method of proof given there does not generalise to the case of Md-convex corners, and we will need to modify the approach.
First we recall the Hahn-Banach separation theorem.
Theorem 2.3.1. [42, Theorem 3.4] Suppose P and Q are disjoint non-empty convex sets in
a normed vector space V. If P is closed and Q is compact, then there exists a continuous
linear functional Λ : V → C and γ ∈ R such that
Re(Λ(p)) < γ < Re(Λ(q))
for all p ∈ P and for all q ∈ Q.
Recall from (2.3) the notation AC = {M ∈ M+d : Tr(MC) ≤ 1}. The following proposition shows the fundamental importance of sets of this form.
Proposition 2.3.2. Every Md-convex corner A ≠ M+d is of the form

A = ⋂_{α∈A} AQα ,

for some index set A and some Qα ∈ Mhd , α ∈ A.
Proof. We work in the Hilbert space Md. If Λ : Md → C is a linear functional, standard
theory for Hilbert spaces ([42, Theorem 12.5]) implies that there exists N ∈ Md such that
Λ(M) = 〈M,N〉. Then, applying Theorem 2.3.1 to the Md-convex corner A and the single
point W ∈M+d \A, there exists γ ∈ R and N ∈Md such that
Re(〈M,N〉) < γ < Re(〈W,N〉) for all M ∈ A.
Now let T = (1/2)(N + N∗), so that T = T ∗. It is easy to see that for M ∈ M+d ,

Re(〈M,N〉) = (1/2)(〈M,N〉 + 〈N,M〉) = (1/2)Tr(MN∗ + NM∗) = (1/2)Tr(MN∗ + MN) = 〈M,T 〉,
where we have used that M = M∗ and the cyclicality of trace. Thus
〈M,T 〉 < γ < 〈W,T 〉 for all M ∈ A,
and the set {M ∈ Md : 〈M,T 〉 = γ} is a hyperplane separating A from W ∈ M+d \A. We note that 0 ∈ A and so γ > 0. In this way, for each V ∈ M+d \A, we can find TV ∈ Mhd and γV ∈ (0,∞) such that 〈M,TV 〉 ≤ γV < 〈V, TV 〉 for all M ∈ A. Then letting QV = (1/γV )TV , it holds that V /∈ AQV but that A ⊆ AQV for each V ∈ M+d \A. Hence

A = ⋂_{V ∈M+d \A} AQV ,

and the result is proved.
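The passage from the complex separating functional N to the self-adjoint T = (N + N∗)/2 can be checked directly: for every positive (hence self-adjoint) M, the pairing with T recovers exactly the real part of the pairing with N. A small NumPy sketch (illustrative only; the inner-product convention below is one natural choice consistent with the trace pairing used for self-adjoint matrices):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 4

# Inner product on M_d; here we use <A, B> = Tr(B* A), which agrees with
# Tr(AB) when B is self-adjoint (only that case is used in the proof).
def ip(A, B):
    return np.trace(B.conj().T @ A)

N = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
T = (N + N.conj().T) / 2                       # self-adjoint part of N

for _ in range(50):
    X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    M = X @ X.conj().T                         # M >= 0, in particular M = M*
    assert np.isclose(ip(M, N).real, ip(M, T).real)
    assert abs(ip(M, T).imag) < 1e-9           # <M, T> is already real
```

So replacing N by T loses nothing: the same γ separates A from W via a real-valued linear functional.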
Corollary 2.3.3. If Qα ∈ M+d for all α ∈ A, then A = ⋂_{α∈A} AQα is a reflexive Md-convex corner.
Proof. This is immediate from Lemma 2.2.24 and Corollary 2.2.34.
Remark 2.3.4. Note that in Proposition 2.3.2, the index set A can be chosen to be countable. To see this, note from Lemma 2.2.15 that, since Md is separable, E = {Qα : α ∈ A} ⊆ Md has a countable dense subset F = {Qα : α ∈ A′}, where A′ ⊆ A is countable. Let A0 = ⋂_{α∈A′} AQα . Clearly A ⊆ A0. It remains to show that A0 ⊆ A. Because F is dense in E, for each α ∈ A there is a sequence (αn)n∈N ⊆ A′ such that Qαn → Qα as n → ∞. Then for M ∈ A0 we have Tr(MQαn) ≤ 1 for all n ∈ N, and it follows by continuity that Tr(MQα) = lim_{n→∞} Tr(MQαn) ≤ 1. We conclude M ∈ A. Thus A0 ⊆ A, as required.
Before establishing a second anti-blocker theorem for all Md-convex corners, we note that
a general convex corner may have empty interior relative to M+d . This situation is addressed
now, beginning with the following linear algebraic lemmas.
Lemma 2.3.5. Let u1, . . . , un, with n ≤ d, be linearly independent, but not necessarily orthonormal, vectors in Cd. Then

span(u1, . . . , un) = ran(∑_{i=1}^n uiu∗i ).
Proof. It is immediate that ran(∑_{i=1}^n uiu∗i ) ⊆ span(u1, . . . , un).
Since ∑_{i=1}^n uiu∗i is a linear operator, to prove the reverse inclusion it suffices to show that uj ∈ ran(∑_{i=1}^n uiu∗i ) for all j ∈ [n]. The result is trivial when n = 1, and we proceed by induction. For k ∈ [n], write Sk = span(u1, . . . , uk) and Mk = ∑_{i=1}^k uiu∗i , and assume the lemma holds for n = k, that is, Sk ⊆ ran(Mk). Because we are free to reorder the ui's as we please, it suffices to prove that uk+1 ∈ ran(Mk+1). As u1, . . . , un are linearly independent, uk+1 /∈ Sk and so Sk ⊊ Sk+1 with dim(Sk) = dim(Sk+1) − 1. Then dim(S⊥k ) = dim(S⊥k+1) + 1, giving the strict inclusion S⊥k+1 ⊊ S⊥k . Now choose v ∈ S⊥k \S⊥k+1; it must hold that 〈v, ui〉 = 0 for all i ∈ [k], but 〈v, uk+1〉 ≠ 0. Then Mkv = 0 and

Mk+1((1/〈v, uk+1〉) v) = uk+1,

as required.
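Lemma 2.3.5 is easy to verify numerically: the matrix ∑ uiu∗i of a linearly independent family has the same rank as the family, and each ui is solvable from it. A short NumPy sketch (illustrative only, with randomly drawn vectors):

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 5, 3

# n linearly independent (not necessarily orthonormal) vectors in C^d,
# stored as the columns of U.
U = rng.standard_normal((d, n)) + 1j * rng.standard_normal((d, n))
assert np.linalg.matrix_rank(U) == n

S = U @ U.conj().T        # S = sum_i u_i u_i^*

# ran(S) = span(u_1, ..., u_n): the ranks agree, and each u_j lies in
# the range, i.e. S x = u_j is solvable.
assert np.linalg.matrix_rank(S) == n
for j in range(n):
    x, *_ = np.linalg.lstsq(S, U[:, j], rcond=None)
    assert np.allclose(S @ x, U[:, j])
```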
For the following analysis we fix an Md-convex corner A with {0} ⊊ A ⊊ M+d , possibly with empty interior relative to M+d . (We note that the convex corners {0} and M+d are trivially reflexive, which is why we may exclude them here.) Let

U = {v ∈ Cd : there exists r > 0 such that rvv∗ ∈ A}. (2.11)
Lemma 2.3.6. Let A be an Md-convex corner. If v ∈ Cd is an eigenvector of an element
M ∈ A with corresponding eigenvalue λ > 0, then v ∈ U .
Proof. We have M − (λ/‖v‖²)vv∗ ≥ 0, and so 0 ≤ (λ/‖v‖²)vv∗ ≤ M ∈ A, giving that (λ/‖v‖²)vv∗ ∈ A by hereditarity. Thus v ∈ U .
Now let P be the projection onto span(U) so that Pu = u for all u ∈ span(U).
Lemma 2.3.7. There exists r > 0 such that rP ∈ A.
Proof. Let {ui}_{i=1}^k ⊆ U be a basis (not necessarily orthonormal) of span(U). By the definition of U , for each ui there exists ri > 0 such that riuiu∗i ∈ A. By convexity, R = (1/k)∑_{i=1}^k riuiu∗i ∈ A. Letting r0 = (1/k) min_{i∈[k]} ri > 0 and Q = r0 ∑_{i=1}^k uiu∗i , we have 0 ≤ Q ≤ R. Then by hereditarity, Q ∈ A. By Lemma 2.3.5,

ran(Q) = ran(∑_{i=1}^k uiu∗i ) = span(u1, . . . , uk) = span(U) = ran(P ).

Now let r be the smallest positive eigenvalue of Q. Then r > 0 and rP ≤ Q, giving rP ∈ A by hereditarity.
Remark 2.3.8. Since P is the projection onto span(U), for any non-zero u ∈ span(U) we have uu∗/‖u‖² ≤ P . By Lemma 2.3.7, there exists r > 0 such that rP ∈ A, and thus (r/‖u‖²)uu∗ ∈ A. We conclude that u ∈ U , and noting that 0 ∈ U , we have U = span(U). Thus U is a subspace of Cd and ran(P ) = U .
Lemma 2.3.9. For M ∈ Mhd it holds that PMP = M if and only if every eigenvector v of
M with non-zero eigenvalue satisfies v ∈ U .
Proof. Let M = ∑_{i=1}^k λiviv∗i , where λi ≠ 0 and vi ∈ U for all i ∈ [k]. Then PMP = ∑_{i=1}^k λi(Pvi)(Pvi)∗ = ∑_{i=1}^k λiviv∗i = M . Conversely, if M = PMP has an eigenvector v with non-zero eigenvalue, then

v ∈ ran(M) = ran(PMP ) ⊆ ran(P ) = U ,

where we have used Remark 2.3.8.
By Lemmas 2.3.6 and 2.3.9, it is now immediate that PMP = M for all M ∈ A. We can
thus write PAP = A. Now let P⊥ = I − P . Clearly P⊥v = 0 for all v ∈ U .
Lemma 2.3.10. The following hold:
(i) 〈M,P⊥〉 = 0 for all M ∈ A;
(ii) 〈M,P⊥〉 > 0 for all M ≥ 0 satisfying PMP ≠ M .
Proof. The first assertion is immediate because 〈M,P⊥〉 = Tr(MP⊥) = Tr(PMPP⊥) = 0 for M ∈ A. For the second, note that for v /∈ U we have Pv ≠ v and P⊥v ≠ 0. If PMP ≠ M , then by Lemma 2.3.9, M has at least one eigenvector v /∈ U with positive eigenvalue. Then for each such eigenvector we have
〈vv^∗, P⊥〉 = 〈v, P⊥v〉 = 〈v, (P⊥)2v〉 = ‖P⊥v‖2 > 0,
and the claim follows.
Having fixed the Md-convex corner A, and with P the projection onto span(U) = ran(P ) as before, we set k = rank(P ) and let MPd ⊆ M+d be given by
MPd = {M ∈ M+d : rank(M) = k, PMP = M}.
Note then from Lemma 2.3.9 that
MPd = {M ∈ M+d : there exist s ≥ r > 0 such that rP ≤ M ≤ sP}, (2.12)
where we can set r to be the minimum positive eigenvalue of M and s to be the maximum eigenvalue of M .
Recall that for C ∈ Mhd we have set
AC = {M ∈ M+d : Tr(MC) ≤ 1} and NC = {M ∈ M+d : Tr(MC) = 0}.
Proposition 2.3.11. Let A be an Md-convex corner and let P be the projection onto U as defined in (2.11). Set
A0 = {A ∈ MPd ∩ A : (1 + ε)A /∈ A for all ε > 0}.
Then for each A ∈ A0 there exists a matrix RA ∈ M+d such that
A = ⋂_{A∈A0} ARA ∩ NP⊥ .
Proof. Consider A ∈ A0 and let An = (1 + 1/n)A, n ∈ N. By construction, An /∈ A for all
n ∈ N. As in the proof of Proposition 2.3.2, for all n ∈ N, there exists QAn ∈Mhd such that
〈M,QAn〉 ≤ 1 < 〈An, QAn〉 for all M ∈ A. (2.13)
Note that A,An ∈ MPd and so PAP = A and PAnP = An, n ∈ N. Recall that PMP = M
for all M ∈ A. Thus
〈M,QAn〉 = 〈PMP,QAn〉 = 〈M,PQAnP 〉
for all M ∈ A, and similarly 〈An, QAn〉 = 〈An, PQAnP 〉. Thus each QAn can be chosen
to satisfy QAn = PQAnP , and, by Lemma 2.3.9, its eigenvectors corresponding to non-zero
eigenvalues are contained in U = ran(P ).
For fixed A ∈ A0, we first show that the set {QAn : n ∈ N} is bounded. Write
QAn = ∑_{i=1}^k λ_i^{(n)} v_i^{(n)} (v_i^{(n)})^∗, where {v_i^{(n)} : i ∈ [k]} ⊆ ran(P ) is an orthonormal set, λ_i^{(n)} ∈ R for i ∈ [k], and k = rank(P ). By Lemma 2.3.7, there exists r > 0 such that rP ∈ A and hence by hereditarity r v_i^{(n)} (v_i^{(n)})^∗ ∈ A for all i ∈ [k] and n ∈ N. It follows from (2.13) that 〈r v_i^{(n)} (v_i^{(n)})^∗, QAn〉 ≤ 1, and so
λ_i^{(n)} ≤ 1/r for all i ∈ [k] and n ∈ N. (2.14)
Then note from (2.13) that for all n ∈ N,
(1 + 1/n) 〈A,QAn〉 > 1,
and hence that
∑_{i=1}^k λ_i^{(n)} 〈A, v_i^{(n)} (v_i^{(n)})^∗〉 = 〈A,QAn〉 > 1/2 for all n ∈ N. (2.15)
Now A ∈ MPd , so there exists t > 0 such that A ≥ tP and so
t ≤ 〈A, v_i^{(n)} (v_i^{(n)})^∗〉 ≤ ‖A‖ for all i ∈ [k], n ∈ N.
Now (2.15) and (2.14) give
λ_j^{(n)} 〈A, v_j^{(n)} (v_j^{(n)})^∗〉 > 1/2 − ∑_{i≠j} λ_i^{(n)} 〈A, v_i^{(n)} (v_i^{(n)})^∗〉 ≥ 1/2 − ((d − 1)/r)‖A‖ > −((d − 1)/r)‖A‖.
So for λ_j^{(n)} < 0, it certainly follows that
λ_j^{(n)} > −((d − 1)/(rt))‖A‖ for all j ∈ [k], n ∈ N. (2.16)
By (2.14) and (2.16), {λ_i^{(n)} : i ∈ [k], n ∈ N} is bounded, and so the set {QAn : n ∈ N} is bounded as claimed. Thus, the sequence (QAn)n∈N has a subsequence (QAnj )j∈N convergent to some RA ∈ Mhd . Then PQAnjP = QAnj → RA, but by continuity PQAnjP → PRAP ; this gives RA = PRAP .
By (2.13) it holds for all M ∈ A and n ∈ N that 〈M,QAn〉 ≤ 1, and hence
〈M,RA〉 ≤ 1 for all M ∈ A. (2.17)
From (2.13) we also have that
〈(1 + 1/n)A, QAn〉 > 1, n ∈ N,
and since A ∈ A, by letting n→∞ and using (2.17) we conclude that
〈A,RA〉 = 1. (2.18)
We now claim that RA ≥ 0 for all A ∈ A0. Recalling RA = PRAP and using Lemma 2.3.9,
suppose towards a contradiction that there exists A ∈ A0 and unit vector v ∈ U = ran(P )
such that RAv = λv with λ < 0. Since A ∈MPd , there exists t > 0 with A ≥ tP ≥ tvv∗, and
so
0 ≤ A− tvv∗ ≤ A ∈ A,
giving that A− tvv∗ ∈ A by hereditarity. However,
〈A− tvv∗, RA〉 = 1− λt > 1,
contradicting (2.17), and showing that RA ≥ 0.
Now write C = ⋂_{A∈A0} ARA ∩ NP⊥ . The proposition is proved by showing that C = A.
Note by (2.17) for all A ∈ A0 that A ⊆ ARA . By Lemma 2.3.10, A ⊆ NP⊥ . Thus it is clear
that A ⊆ C. Fix M /∈ A. The assertion C = A will follow by showing that M /∈ C. We
identify four cases.
Case 1. M /∈M+d .
Since C ⊆M+d , we have M /∈ C.
Case 2. M ∈MPd \A.
Recall that rP ∈ A for some r > 0. Let m = max{k ∈ R+ : kM ∈ A}. By (2.12) on page 52 and Lemma 2.3.7, it is clear that 0 < m < 1, and setting M′ = mM we have M′ ∈ A0. Then C ⊆ ARM′ and by (2.18), 〈M′, RM′〉 = 1. Then 〈M,RM′〉 = 1/m > 1, and so M /∈ C.
Case 3. M = PMP ∈ M+d \ (MPd ∪ A).
Since the sets ARA and NP⊥ are convex, C is convex. From Case 2 and using A ⊆ C, it is
clear that
MPd ∩ A = MPd ∩ C. (2.19)
Suppose towards a contradiction that M ∈ C and take N ∈ C ∩ MPd . Letting Mn = (1 − 1/n)M + (1/n)N , the convexity of C gives that Mn ∈ C for all n ∈ N. Since M = PMP ≥ 0 and N ∈ MPd , it is also clear that Mn ∈ MPd for all n ∈ N, and so by (2.19), Mn ∈ A for all n ∈ N. However, Mn → M as n → ∞, and since A is closed, M ∈ A, the required contradiction.
Case 4. M ∈ M+d and PMP ≠ M .
By Lemma 2.3.10 we have M /∈ NP⊥ , and hence M /∈ C.
We can now give the ‘second anti-blocker theorem’ for Md.
Theorem 2.3.12. A non-empty set A ⊆M+d is reflexive if and only if A is a convex corner.
Proof. If A is a convex corner, its reflexivity is an immediate consequence of Proposition
2.3.11, Lemmas 2.2.24 and 2.2.37 and Corollary 2.3.3. The converse implication follows from
Lemma 2.2.10 (i).
2.3.2 Consequences of reflexivity
Here we give some results which follow from the reflexivity of Md-convex corners. Proposition
2.3.16 answers a question raised in Remark 2.2.23. Definition 2.3.17 and Proposition 2.3.18
are the natural non-commutative analogues of our work in the Rd case.
Lemma 2.3.13. If A and B are Md-convex corners, then
A ⊆ B ⇐⇒ A] ⊇ B],
A = B ⇐⇒ A] = B],
A ⊊ B ⇐⇒ A] ⊋ B].
Proof. These equivalences are trivial consequences of Lemma 2.2.10 (ii) and Theorem 2.3.12.
We recall that if S ⊆M+d , then C(S) denotes the convex corner generated by S.
Proposition 2.3.14. If A ⊆M+d , then A]] = C(A).
Proof. By Lemma 2.2.32, A] = (C(A))], and anti-blocking both sides yields A]] = (C(A))]] =
C(A), using Theorem 2.3.12.
Corollary 2.3.15. If A is a diagonal Md-convex corner, then her(A) is the Md-convex corner
given by her(A) = C(A) = A]].
Proof. If A is a diagonal Md-convex corner, then conv(A) = A. Then C(A) = her(A), and
the result is immediate from Proposition 2.3.14.
Proposition 2.3.16. Let Bα be a convex corner for each α ∈ A. Then
(⋂_{α∈A} Bα)] = C(⋃_{α∈A} Bα]).
Proof. Using Theorem 2.3.12, Lemma 2.2.22 and Proposition 2.3.14, we have that
(⋂_{α∈A} Bα)] = (⋂_{α∈A} Bα]])] = (⋃_{α∈A} Bα])]] = C(⋃_{α∈A} Bα]).
By analogy with the corresponding definitions (1.8), (1.12) and (1.15) on pages 6, 9 and
11 respectively for Rd-convex corners, we make the following definitions.
Definition 2.3.17. Let A be an Md-convex corner.
(i) If A is bounded we define
γ(A) = max{TrA : A ∈ A}. (2.20)
The maximum exists because a bounded convex corner is compact and the trace functional is continuous. If A is unbounded, noting Lemma 2.1.3, we define γ(A) = ∞. Note that γ(A) = 0 if and only if A = {0}.
(ii) If A ≠ M+d , we define
N(A) = max{β ∈ R+ : βI ∈ A}. (2.21)
The maximum exists because A is closed. We define N(M+d ) = ∞. Note by Lemma 2.2.12 that N(A) = 0 if and only if A has empty interior relative to M+d .
(iii) If A has non-empty interior relative to M+d we define
M(A) = inf{∑_{i=1}^k λ_i : ∃ k ∈ N, A_i ∈ A, λ_i > 0 s.t. ∑_{i=1}^k λ_i A_i ≥ I}. (2.22)
(Note that if A has a non-empty interior relative to M+d then the set on the right is non-empty and the infimum exists.) When A is a standard convex corner, the same argument as given in Lemma 1.2.19 shows that the infimum is attained. Note that M(M+d ) = 0.
If A has empty interior relative to M+d , the condition on the right of (2.22) cannot be satisfied and we define M(A) = ∞.
The following result is analogous to Proposition 1.2.20. We follow the same convention that 1/0 = ∞ and 1/∞ = 0.
Proposition 2.3.18. If A is an Md-convex corner, then
M(A) = 1/N(A) = γ(A]).
Proof. The proof is an obvious adaptation of that of Proposition 1.2.20. Just note that in
the case A has empty interior relative to M+d , we have M(A) = ∞ by Definition 2.3.17,
N(A) = 0 by Lemma 2.2.12 and γ(A]) =∞ by Proposition 2.2.16.
2.4 Entropic quantities in quantum information theory
We refer to the theory outlined in Chapter 1 of a source emitting a sequence of letters
from some fixed, finite alphabet as the classical case, to be contrasted to the quantum case
which we now begin to outline, and which is considered throughout Chapters 3 and 4. The
discovery of quantum mechanics has revolutionised our understanding of the physical world and simultaneously inspired many new developments in mathematics. Information theory is no exception to this quantum revolution. Section 2.4.1 summarises some basic
concepts in the resulting field, known as quantum information theory. In particular, the
concept of entropy in this quantum setting will be discussed, which will help explain the
motivation for what follows. Section 2.4.2 includes a discussion of the diagonal expectation
operator, and proves a number of related technical lemmas which will be needed later. Finally,
in Section 2.4.3 we introduce the concept of entropy over an Md-convex corner, and show
that the well-known von Neumann entropy is a special case.
2.4.1 Some background on quantum information
We use terminology and conventions as in [35]. Every isolated quantum system has an associated Hilbert space H, known as the state space. Only finite dimensional Hilbert spaces will be considered here, and so without loss of generality we can set H = Cd for some d ∈ N.
If ψ ∈ H is a unit vector, then ψ is known as a pure state and represents a possible configuration of the system. When analysing the discrete, memoryless and stationary source in classical information theory, we considered a sequence of independent, identically distributed random variables following a certain probability distribution over a fixed and finite alphabet {1, . . . , d}. A message is a sequence of letters, or equivalently a sequence of canonical basis vectors of Rd. In the quantum case a message is formed from a sequence of state vectors in Cd. Rather than a probability distribution p ∈ Pd, it is now appropriate to consider an ensemble or mixed state {ψi, pi : i ∈ [n]}, where each ψi ∈ Cd is a pure state and (pi)i∈[n] ∈ Pn. We represent such an ensemble by the density matrix or state
ρ = ∑_{i=1}^n p_i ψ_i ψ_i^∗ ∈ M+d . (2.23)
If n > 1 in (2.23), we refer to ρ as a mixed state. The density matrix corresponding to a
pure state ψ is the rank one projection ψψ∗, which is also conventionally referred to as a pure
state. Context will usually resolve any potential ambiguity in terminology. It is not hard to
see that the following characterisation holds.
Proposition 2.4.1. [35, Proposition 3.16]. An element ρ ∈Md is a density matrix of some
ensemble if and only if ρ ≥ 0 and Tr ρ = 1.
We let Rd denote the set of all states in Md, that is,
Rd = {M ∈ M+d : TrM = 1}.
It is immediate that Rd is convex. It is well known that the pure states in Rd are the
extreme points of Rd, but we include a proof for the benefit of the reader.
Proposition 2.4.2. If ρ = vv^∗ ∈ Rd for some unit vector v ∈ Cd, and ρ = ∑_{i=1}^n λ_i u_i u_i^∗ with λ_i > 0 and unit vectors u_i ∈ Cd, then u_i u_i^∗ = vv^∗ for all i = 1, . . . , n.
Proof. First note that we have ρ2 = ρ and so Tr(ρ2) = 1. The Cauchy–Schwarz inequality gives that |〈u_i, u_j〉| ≤ 1 for all i, j ∈ [n], with equality if and only if u_i u_i^∗ = u_j u_j^∗. Since ∑_{i=1}^n λ_i = Tr ρ = 1, we then have
Tr(ρ2) = Tr((∑_{i=1}^n λ_i u_i u_i^∗)2) = ∑_{i,j∈[n]} λ_i λ_j |〈u_i, u_j〉|2 ≤ ∑_{i,j∈[n]} λ_i λ_j = 1,
with equality if and only if u_i u_i^∗ = u_j u_j^∗ for all i, j ∈ [n]. Since Tr(ρ2) = 1, equality does hold, so all the u_i u_i^∗ coincide, and thus ρ = ∑_{i=1}^n λ_i u_i u_i^∗ = u_i u_i^∗ = vv^∗ for all i ∈ [n].
The following well-known corollary follows and gives a number of characterisations of pure
states.
Corollary 2.4.3. [54, p.64] The following are equivalent:
(i) ρ ∈ Rd is pure;
(ii) ρ is an extreme point of Rd;
(iii) ρ ∈ Rd satisfies Tr(ρ2) = 1.
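The purity criterion (iii) is easy to check numerically. The following numpy sketch (the helper name `purity` and the sample states are ours, not the thesis's) compares Tr(ρ²) for a rank-one projection and for a proper mixture:

```python
import numpy as np

# Criterion (iii) of Corollary 2.4.3: Tr(rho^2) = 1 exactly for pure states.
v = np.array([3.0, 4.0j]) / 5.0                  # a unit vector in C^2
pure = np.outer(v, v.conj())                     # rank-one projection vv*
mixed = 0.5 * pure + 0.5 * np.eye(2) / 2         # a proper mixture, Tr = 1

purity = lambda rho: np.trace(rho @ rho).real
assert abs(purity(pure) - 1.0) < 1e-12
assert purity(mixed) < 1.0
```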
It is easy to see that any mixed state ρ ∈ Rd is in the convex hull of the pure states in Rd by writing ρ = ∑_{i=1}^d λ_i v_i v_i^∗, where {v1, . . . , vd} is an orthonormal basis of eigenvectors of ρ with corresponding eigenvalues λ1, . . . , λd.
As in the classical case, entropic quantities involve the logarithm function, but in the
quantum case the logarithm function will act not on positive real numbers, but on positive
definite matrices. For A = ∑_{i=1}^d a_i u_i u_i^∗ ∈ M++d with {u1, . . . , ud} an orthonormal basis of Cd and each a_i > 0, we recall that
logA = ∑_{i=1}^d (log a_i) u_i u_i^∗. (2.24)
When A,B ∈ M++d commute, they are simultaneously unitarily diagonalisable and (2.24) gives
log(AB) = logA + logB. (2.25)
If A = ∑ a_i u_i u_i^∗ ∈ M++d , we have A−1 = ∑ a_i^{−1} u_i u_i^∗ ∈ M++d and (2.24) gives
log(A−1) = − logA. (2.26)
It also holds that if logA = logB, then A = B.
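The spectral formula (2.24) is directly computable. The following numpy sketch (the function name `mat_log` and the test matrix are our own choices for illustration) evaluates logA via an eigendecomposition and checks (2.25) and (2.26) numerically:

```python
import numpy as np

def mat_log(A):
    """Matrix logarithm of a positive definite Hermitian A via (2.24):
    log A = sum_i (log a_i) u_i u_i*, in an orthonormal eigenbasis."""
    a, U = np.linalg.eigh(A)          # eigenvalues a_i > 0, columns u_i
    return (U * np.log(a)) @ U.conj().T

# A random positive definite matrix A = B*B + I.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B.conj().T @ B + np.eye(4)

# (2.26): log(A^{-1}) = -log A, since A and A^{-1} share an eigenbasis.
assert np.allclose(mat_log(np.linalg.inv(A)), -mat_log(A))

# (2.25) for commuting matrices, e.g. B = A: log(A^2) = 2 log A.
assert np.allclose(mat_log(A @ A), 2 * mat_log(A))
```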
We have already seen the importance of the trace functional, and for reference we give the
following straightforward observations. Consider A = ∑_{i=1}^d a_i u_i u_i^∗ ∈ M++d as above. From (2.24) and Lemma B.0.2 (viii) it is clear that
Tr(logA) = ∑_{i=1}^d log a_i and Tr(ρ logA) = ∑_{i=1}^d 〈ρu_i, u_i〉 log a_i. (2.27)
Working in the extended reals with the conventions 0 log 0 = 0 and log 0 = −∞, we define Tr(logA) and Tr(ρ logA) for all A ∈ M+d by extending (2.27). We note then that Tr(ρ logA) is finite if and only if ker(A) ⊆ ker(ρ).
The simplest classical entropic quantity which we have considered is the Shannon entropy.
Its analogue in the quantum setting is the von Neumann entropy (see, for example [54,
Definition 5.17]).
Definition 2.4.4. The von Neumann entropy of a state ρ ∈ Rd is given by
H(ρ) = −Tr(ρ log ρ).
For ρ ∈ Rd, (2.27) shows easily that H(ρ) is the Shannon entropy of the probability
distribution whose elements are the eigenvalues of ρ. Comparing to (1.3) on page 3, it is easy
to see that 0 ≤ H(ρ) ≤ log d for ρ ∈ Rd, and that H(ρ) = 0 if and only if ρ is a pure state.
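As a quick numerical illustration (a sketch, with the function name `von_neumann_entropy` our own), H(ρ) can be computed from the eigenvalues of ρ, confirming that pure states have zero entropy and that the maximally mixed state attains log d:

```python
import numpy as np

def von_neumann_entropy(rho):
    """H(rho) = -Tr(rho log rho): the Shannon entropy of the eigenvalues
    of rho, with the convention 0 log 0 = 0 (cf. (2.27))."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]            # drop zero eigenvalues: 0 log 0 = 0
    return float(-np.sum(lam * np.log(lam)))

# A pure state has zero entropy.
v = np.array([1.0, 1.0j]) / np.sqrt(2)
pure = np.outer(v, v.conj())
assert abs(von_neumann_entropy(pure)) < 1e-10

# The maximally mixed state (1/d)I attains the maximum log d.
d = 4
assert abs(von_neumann_entropy(np.eye(d) / d) - np.log(d)) < 1e-10
```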
Another important and well-known concept is quantum relative entropy. Note that in the
definition below, ρ and σ are arbitrary elements of M+d , and not restricted to be states.
Definition 2.4.5. [45, Section 1.2]. Given ρ, σ ∈ M+d satisfying ker(σ) ⊆ ker(ρ), we define
D(ρ‖σ), the quantum relative entropy of ρ with respect to σ, by
D(ρ‖σ) = Tr(ρ log ρ− ρ log σ).
When ker(σ) 6⊆ ker(ρ), we define D(ρ‖σ) =∞.
We now recall some basic properties of D(ρ‖σ).
Lemma 2.4.6. [57, Theorem 11.8.1], [56, p.250]. For ρ, σ ∈ Rd, the quantum relative
entropy satisfies D(ρ‖σ) ≥ 0 and vanishes if and only if ρ = σ.
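Lemma 2.4.6 can be probed numerically. The sketch below (helper names `rand_state` and `rel_entropy` are ours; both sampled states are full rank almost surely, so the kernel condition in Definition 2.4.5 holds automatically) evaluates D(ρ‖σ) spectrally:

```python
import numpy as np

def rand_state(d, rng):
    """A random density matrix: rho >= 0 with Tr(rho) = 1 (full rank a.s.)."""
    B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    rho = B @ B.conj().T
    return rho / np.trace(rho).real

def rel_entropy(rho, sigma):
    """D(rho||sigma) = Tr(rho log rho - rho log sigma) for full-rank states."""
    def mlog(A):
        a, U = np.linalg.eigh(A)
        return (U * np.log(a)) @ U.conj().T
    return float(np.trace(rho @ (mlog(rho) - mlog(sigma))).real)

rng = np.random.default_rng(1)
rho, sigma = rand_state(3, rng), rand_state(3, rng)
assert rel_entropy(rho, sigma) >= 0          # Lemma 2.4.6: non-negativity
assert abs(rel_entropy(rho, rho)) < 1e-10    # vanishes at rho = sigma
```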
The next lemma states that the quantum relative entropy D(ρ‖σ) is jointly convex in ρ ∈ M+d and σ ∈ M+d .
Lemma 2.4.7. [45, Theorem 7]. If ρ = ∑_k λ_k ρ^{(k)} ∈ M+d and σ = ∑_k λ_k σ^{(k)} ∈ M+d satisfy ker(σ^{(k)}) ⊆ ker(ρ^{(k)}) with λ_k > 0 and ∑_k λ_k = 1, then
D(ρ‖σ) ≤ ∑_k λ_k D(ρ^{(k)}‖σ^{(k)}).
When ρ^{(k)}, σ^{(k)} ∈ M++d , equality holds if and only if log ρ − log σ = log ρ^{(k)} − log σ^{(k)} for all k.
The quantum relative entropy D(ρ‖σ) is known to satisfy the following continuity condition.
Lemma 2.4.8. ([56, p.251], see also [3].) For n ∈ N and states ρn → ρ ∈ M+d and σn → σ ∈ M+d , we have
lim_{n→∞} D(ρn‖σn) = D(ρ‖σ).
It is trivial to prove the following corollary that will be required later.
Corollary 2.4.9. At fixed ρ ∈ Rd, the quantum relative entropy D(ρ‖σ) is convex and lower
semi-continuous in σ ∈M+d .
Proof. Convexity in σ follows from Lemma 2.4.7 by putting each ρ^{(k)} = ρ. We merely need to examine the case ker(σ^{(k)}) ⊄ ker(ρ^{(k)}) for some k. Here D(ρ^{(k)}‖σ^{(k)}) = ∞ and convexity trivially holds.
For ρ, σ ∈ Rd, Lemma 2.4.8 gives that D(ρ‖σ) is continuous. Continuity, and then
obviously lower semi-continuity, in the case of arbitrary σ ∈ M+d follows easily. Indeed, let
σ = kτ with k ∈ R+ and τ ∈ Rd. By (2.27), Tr(ρ log(kτ)) = Tr(ρ log τ) + log k and so
D(ρ‖σ) = D(ρ‖τ) − log k. Now let σn = knτn → σ as n → ∞ with τn ∈ Rd and kn ∈ R+.
Then kn = Trσn → Trσ = k and by Lemma 2.4.8,
lim_{n→∞} D(ρ‖σn) = lim_{n→∞} D(ρ‖τn) − lim_{n→∞} log kn = D(ρ‖τ) − log k = D(ρ‖σ),
as required.
2.4.2 Diagonal expectation
In this section the role of the diagonal expectation operator will be discussed and used to
explore further the relationship between Md-convex corners and diagonal Md-convex corners.
The motivation for some of the results given in this section may not yet be clear, but will
hopefully become so in the next, where, for instance, we use the theory described here to
generalise results in [11] concerning entropy over Rd-convex corners to the non-commutative
setting. We will also use diagonal expectations to prove Lemma 2.4.35, which shows how the
concept of entropy over an Rd-convex corner is in fact embedded in a more general concept
of entropy over an Md-convex corner, a notion which we introduce in Section 2.4.3. The
properties of the diagonal expectation operator will also be used to prove Proposition 4.1.11,
an important result linking parameters of graphs with parameters of certain sets in M+d which
we will call non-commutative graphs, and which we will regard as generalisations of graphs.
For any orthonormal basis V = {v1, . . . , vd} of Cd we let
DV = span{v_i v_i^∗ : i ∈ [d]}
denote the algebra of matrices diagonal with respect to V . We write D+V = DV ∩ M+d . For A ∈ Md we define ∆V (A), the diagonal expectation of A with respect to basis V , by
∆V (A) = ∑_{i=1}^d 〈Av_i, v_i〉 v_i v_i^∗.
Expressing A ∈ Md in basis V as A = ∑_{i,j∈[d]} a_{ij} v_i v_j^∗ for a_{ij} ∈ C, we observe that ∆V (A) = ∑_{i=1}^d a_{ii} v_i v_i^∗ ∈ DV , showing that the action of ∆V is to retain only the diagonal elements of A with respect to basis V . As previously, we denote by Dd the algebra of matrices in Md
diagonal in the canonical basis {e1, . . . , ed}, and we write ∆(A) for the diagonal expectation
of A with respect to this canonical basis. We begin with some elementary properties.
Lemma 2.4.10. For any orthonormal basis V it holds that
(i) M ∈ DV ⇐⇒ ∆V (M) = M ;
(ii) M ∈Md ⇒ ∆V (M) ∈ DV ;
(iii) A ≥ 0⇒ ∆V (A) ≥ 0;
(iv) A,B ∈ DV ⇒ AB = BA;
(v) For A ⊆Md it holds that DV ∩ A = ∆V (A) if and only if ∆V (A) ⊆ A;
(vi) Tr(∆V (A)) = TrA;
(vii) M ≤ I ⇒ ∆V (M) ≤ I.
Proof. The first four statements are obvious.
For (v) note that DV ∩A ⊆ A, so if DV ∩A = ∆V (A), then ∆V (A) ⊆ A. Conversely, we have
∆V (A) ⊆ DV , so if ∆V (A) ⊆ A, then ∆V (A) ⊆ DV ∩A. But it is clear that DV ∩A ⊆ ∆V (A),
completing the proof.
For (vi), set A = ∑_{i,j∈[d]} a_{ij} v_i v_j^∗ for a_{ij} ∈ C, giving ∆V (A) = ∑_{i=1}^d a_{ii} v_i v_i^∗ ∈ DV . Then Tr(∆V (A)) = ∑_{i=1}^d a_{ii} = TrA, where we have used Lemma B.0.2 (viii).
To prove (vii), suppose that M ≤ I, that is, I − M ≥ 0. Applying (iii) yields ∆V (I − M) ≥ 0, and since ∆V (I) = I we obtain ∆V (M) ≤ I as required.
Remark 2.4.11. For an Md-convex corner A, Lemma 2.2.3 showed that Dd ∩A is a diagonal
Md-convex corner. We note, however, that ∆(A) is not necessarily a diagonal Md-convex
corner for an Md-convex corner A. For example, consider the Md-convex corner BP as given
in (2.5), where we set P = (1/d)11^∗. Since P is a rank one projection, we have
BP = {M ∈ M+d : M ≤ P} = {αP : 0 ≤ α ≤ 1},
and
∆(BP ) = {(α/d)Id : 0 ≤ α ≤ 1}.
Now (1/d)e1e1^∗ ≤ (1/d)Id ∈ ∆(BP ), but (1/d)e1e1^∗ /∈ ∆(BP ). It is thus clear that ∆(BP ) lacks hereditarity over D+d as discussed in Remark 2.2.4, and is not an Md-convex corner. (Note in this case that Dd ∩ BP = {0}, which is trivially a diagonal Md-convex corner.)
The following two lemmas will be widely used.
Lemma 2.4.12. Let M,N ∈ Md and V = {v1, . . . , vd} be an orthonormal basis of Cd. Then
Tr((∆V (M))N) = Tr(M∆V (N)).
Proof. Using the cyclicality of the trace we write
Tr((∆V (M))N) = Tr(∑_{i=1}^d 〈Mv_i, v_i〉 v_i v_i^∗ N) = ∑_{i=1}^d 〈Mv_i, v_i〉〈Nv_i, v_i〉 = Tr(∑_{i=1}^d Mv_i v_i^∗ 〈Nv_i, v_i〉) = Tr(M∆V (N)),
as required.
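The identity of Lemma 2.4.12 says that ∆V is self-adjoint with respect to the trace inner product, and it is easy to verify numerically. In the following sketch (the function name `delta` is ours) the orthonormal basis V is encoded as the columns of a unitary matrix:

```python
import numpy as np

def delta(A, V):
    """Diagonal expectation of A w.r.t. the orthonormal basis given by
    the columns of the unitary V: sum_i <A v_i, v_i> v_i v_i^*."""
    diag = np.array([v.conj() @ A @ v for v in V.T])
    return (V * diag) @ V.conj().T

rng = np.random.default_rng(2)
d = 4
M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
N = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
V, _ = np.linalg.qr(rng.standard_normal((d, d))
                    + 1j * rng.standard_normal((d, d)))  # random basis

# Lemma 2.4.12: Tr(Delta_V(M) N) = Tr(M Delta_V(N)).
assert np.allclose(np.trace(delta(M, V) @ N), np.trace(M @ delta(N, V)))

# Delta_V is idempotent and trace preserving (Lemma 2.4.10 (i), (vi)).
assert np.allclose(delta(delta(M, V), V), delta(M, V))
assert np.allclose(np.trace(delta(M, V)), np.trace(M))
```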
Lemma 2.4.13. Let V be an orthonormal basis. If B ⊆M+d satisfies DV ∩B = ∆V (B), then
DV ∩ (∆V (B))] = DV ∩ B] = ∆V (B]). (2.28)
If B is a convex corner, then DV ∩ B = ∆V (B) if and only if DV ∩ B] = ∆V (B]).
Proof. Write A = ∆V (B), and suppose that DV ∩B = ∆V (B) giving A ⊆ B by Lemma 2.4.10
(v). Then A] ⊇ B], and so DV ∩A] ⊇ DV ∩B]. For the reverse inclusion choose T ∈ DV ∩A],
and note that T = ∆V (T ) and Tr(TM) ≤ 1 for all M ∈ A. Let N ∈ B, and observe that
∆V (N) ∈ A. Applying Lemma 2.4.12 yields
Tr(TN) = Tr((∆V (T ))N) = Tr(T∆V (N)) ≤ 1,
and so we have T ∈ DV ∩ B]. Thus DV ∩ A] ⊆ DV ∩ B], and the first equality of (2.28) is
now proved.
To establish the second equality of (2.28), let M ∈ ∆V (B]), that is M = ∆V (R) for some
R ∈ B]. Let Q ∈ B and set P = ∆V (Q). By assumption, P ∈ B. We then have
Tr(MQ) = Tr((∆V (R))Q) = Tr(R∆V (Q)) = Tr(RP ) ≤ 1.
Then M ∈ B] and ∆V (B]) ⊆ B], and the second equality in (2.28) holds by Lemma 2.4.10
(v).
Finally, note that if B is a convex corner satisfying DV ∩B] = ∆V (B]), then the preceding
paragraph shows that DV ∩B]] = ∆V (B]]), and from Theorem 2.3.12, DV ∩B = ∆V (B).
Lemma 2.4.13 has the following immediate corollary, which helps clarify the connection
between diagonal Md-convex corners and Md-convex corners. (Recall from Lemma 2.2.3 and
Definition 2.2.6 that if B ⊆ M+d is a convex corner, then A = Dd ∩ B is a diagonal convex
corner, and we write A[ = Dd ∩ A].)
Corollary 2.4.14. Let B be an Md-convex corner, and let A be the diagonal Md-convex
corner given by A = Dd ∩ B. Then
A = Dd ∩ B = ∆(B) (2.29)
if and only if
A[ = Dd ∩ B] = ∆(B]). (2.30)
We now give a number of other straightforward but important results concerning Md-
convex corners and diagonal Md-convex corners.
Lemma 2.4.15. If B is an Md-convex corner such that ∆(B) is a diagonal Md-convex corner,
then
γ(∆(B)) = γ(B).
Proof. We have
γ(B) = maxTrT : T ∈ B = maxTr(∆(T )) : T ∈ B
= maxTrM : M ∈ ∆(B) = γ(∆(B)),
as required.
Lemmas 2.4.16 and 2.4.17 concern (A[)], which by Lemma 2.2.10 (i) is trivially an Md-
convex corner.
Lemma 2.4.16. If A is a diagonal Md-convex corner, then
(A[)] = {M ∈ M+d : ∆(M) ∈ A}.
Proof. Let A ∈ A[, and let M ≥ 0 with ∆(M) ∈ A. Then 〈A,M〉 = 〈∆(A),M〉 = 〈A,∆(M)〉 ≤ 1 and so M ∈ (A[)], where we have used Lemma 2.4.12. Conversely, suppose that M ≥ 0, but ∆(M) /∈ A. By Lemma 2.2.7, A = A[[, and so ∆(M) /∈ A[[ and there
exists B ∈ A[ such that
1 < 〈∆(M), B〉 = 〈M,∆(B)〉 = 〈M,B〉
and so M /∈ (A[)].
The concept of the hereditary cover of a set A ⊆ M+d has already been discussed. Corollary 2.2.28 shows that if A is a bounded diagonal convex corner, then her(A) is a convex
corner. We now consider some related results.
Lemma 2.4.17. If A is a standard diagonal convex corner, then
her(A) ⊊ (A[)] = (her(A[))].
Proof. That (A[)] = (her(A[))] is clear from Lemma 2.2.32. If A is a diagonal Md-convex
corner, then A[ ⊆ A], and Corollary 2.3.15 and Lemma 2.2.10 (ii) give that
A]] = her(A) ⊆ (A[)]. (2.31)
We further claim that
her(A) ⊊ (her(A[))]. (2.32)
Since A is a standard diagonal convex corner, rI ∈ A for some r > 0, and the method used
in Proposition 2.2.16 gives that A[ is bounded. Then Remark 2.2.11 and Corollary 2.2.28
give that her(A) and her(A[) are Md-convex corners. Lemma 2.3.13 then shows that (2.32) is equivalent to (her(A))] ⊋ her(A[). By Lemma 2.2.32, (her(A))] = A], so to prove this claim we seek some M ∈ A] such that M /∈ her(A[). We set
m = N(A[) = max{µ : µI ∈ A[}. (2.33)
Since A is bounded, the method of Proposition 2.2.17 gives that sI ∈ A[ for some s > 0.
Using that A[ is also bounded gives 0 < m <∞.
Observe that if M ∈Md and A ∈ A ⊆ Dd, then Tr(MA) = Tr(M∆(A)) = Tr((∆(M))A)
by Lemma 2.4.12. Thus if ∆(M) ∈ A] and M ≥ 0, then M ∈ A]. Since mI ∈ A], by this
observation we have mJ ∈ A], where J is the all ones matrix. The proof is completed by
showing that mJ /∈ her(A[), that is, there does not exist N ∈ A[ satisfying mJ ≤ N. To
show this, suppose to the contrary that
mJ ≤ N = diag(µ1, µ2, . . . , µd) ∈ A[.
Let Q = (q_{ij}) = N − mJ , that is, the matrix with diagonal entries q_{ii} = µ_i − m and off-diagonal entries q_{ij} = −m for i ≠ j, so that Q ≥ 0.
This immediately requires µi ≥ m for all i = 1, . . . , d. But if µi = m, then qii = 0 and, since
Q ≥ 0, Corollary 2.1.2 gives q_{ij} = q_{ji} = −m = 0 for all j ≠ i, contradicting that m > 0. So
we must have µi > m for all i = 1, . . . , d, and there exists ε > 0 such that µi ≥ m+ ε for all
i ∈ [d]. Then (m+ ε)I ≤ N and by hereditarity (m+ ε)I ∈ A[, contradicting (2.33). We can
conclude that mJ /∈ her(A[) as required.
Lemma 2.4.18. Let A be a diagonal Md-convex corner and B = her(A). Then A = ∆(B) =
Dd ∩ B.
Proof. It is trivial that A ⊆ B, and so A ⊆ Dd ∩ B ⊆ ∆(B). For the reverse inclusions, it
suffices to show that if T ∈ B then ∆(T ) ∈ A. Let N ∈ A be such that 0 ≤ T ≤ N . Then
N − T ≥ 0 and ∆(N − T ) ≥ 0. Thus 0 ≤ ∆(T ) ≤ ∆(N) = N ∈ A. It follows that ∆(T ) ∈ A
by the hereditarity of A over Dd.
Theorem 2.4.19. Let A be a diagonal Md-convex corner and B be an Md-convex corner.
Then A = ∆(B) = Dd ∩ B if and only if her(A) ⊆ B ⊆ (A[)].
Proof. Letting B1 = her(A), we have A = ∆(B1) = Dd ∩B1 by Lemma 2.4.18. Using Lemma
2.2.32, we write
B2 = (A[)] = (her(A[))]. (2.34)
Since A is a diagonal Md-convex corner, by Definition 2.2.2 we have A = φ(C) where C is an
Rd-convex corner and φ is as given in (2.1). Lemma 2.2.7 gives A[ = φ(C[), and since C[ is an
Rd-convex corner, Definition 2.2.2 gives that A[ is a diagonal Md-convex corner. By Corollary
2.3.15 it then holds that her(A[) is an Md-convex corner. Theorem 2.3.12 then gives that her(A[) is reflexive, and we anti-block (2.34) to yield B2] = her(A[). By Lemma 2.4.18 we obtain A[ = ∆(B2]) = Dd ∩ B2], and by Corollary 2.4.14 it follows that A = ∆(B2) = Dd ∩ B2.
We conclude that when convex corner B satisfies B1 ⊆ B ⊆ B2, we have A = ∆(B) = Dd ∩B.
For the reverse implication, observe that if A = Dd ∩ B then A ⊆ B, and so her(A) ⊆ B
by the hereditarity of B. Now if A = Dd ∩ B = ∆(B), it follows by Corollary 2.4.14 that
A[ = Dd ∩ B]. Then A[ ⊆ B] and hence her(A[) ⊆ B] by the hereditarity of B]. Theorem 2.3.12
and Lemmas 2.2.10 (ii) and 2.2.32 then give B = B]] ⊆ (her(A[))] = (A[)] as required.
The condition ∆V (A) ⊆ A, or equivalently, ∆V (A) = DV ∩A, has appeared in a number
of contexts. It is instructive to give examples of some convex corners satisfying this important
condition.
Lemma 2.4.20. The convex corner AN,k = {T ∈ M+d : 〈T,N〉 ≤ k} with N ∈ D+V satisfies ∆V (AN,k) ⊆ AN,k.
∆V (AN,k) ⊆ AN,k.
Proof. For A ∈ AN,k we have A ≥ 0 and so ∆V (A) ≥ 0. Then for A ∈ AN,k and N ∈ D+V ,
using Lemma 2.4.12 we have that 〈∆V (A), N〉 = 〈A,∆V (N)〉 = 〈A,N〉 ≤ k. We conclude
∆V (AN,k) ⊆ AN,k.
Remark 2.4.21. If the condition N ∈ D+V is dropped, it is easy to find a counterexample. For example, in the case that
N = [ 1 −1 ; −1 1 ] and J = [ 1 1 ; 1 1 ],
with k = 1, we have J ∈ AN,1, since 〈J,N〉 = 0, but ∆(J) = I /∈ AN,1, since 〈I,N〉 = 2 > 1.
Lemma 2.4.22. The convex corner BM = {T ∈ M+d : T ≤ M} with M ∈ D+V satisfies ∆V (BM ) ⊆ BM .
Proof. For B ∈ BM we have B ≥ 0 and so ∆V (B) ≥ 0. Furthermore, if B ∈ BM then
M−B ≥ 0 giving ∆V (M−B) ≥ 0. Then M−∆V (B) ≥ 0, and we conclude ∆V (B) ∈ BM .
Remark 2.4.23. Lemma 2.4.22 does not extend to all M ≥ 0 and orthonormal bases V . For an obvious counterexample, note that J ∈ BJ , but ∆(J) = I /∈ BJ .
Proposition 2.4.24. If A is a convex corner, then ∆V (A) ⊆ A if and only if ∆V (A]) ⊆ A].
Proof. First we assume ∆V (A) ⊆ A. Let B ∈ A], that is B ≥ 0 and 〈B,A〉 ≤ 1 for all A ∈ A.
Thus ∆V (B) ≥ 0 and 〈∆V (B), A〉 = 〈B,∆V (A)〉 ≤ 1 for all A ∈ A because ∆V (A) ∈ A. This
shows that ∆V (B) ∈ A], and hence ∆V (A]) ⊆ A]. The reverse direction follows by Theorem
2.3.12.
2.4.3 Entropy over an Md-convex corner
Entropic quantities are fundamental in quantum information theory as well as in classical
information. In this section we define HA(ρ), the entropy of a state ρ ∈ Rd over an Md-convex
corner A. This is a new concept in the quantum setting, and we proceed by analogy with the
Rd case introduced first in [11] and discussed in the previous chapter. Two of the simplest,
but also most fundamental, Md-convex corners are the ‘Md-unit corner’ and the ‘Md-unit
cube’. We use the notation of (2.3) on page 44 and (2.5) on page 44 to write the Md-unit
corner as AId = T ∈ M+d : TrT ≤ 1, and the Md-unit cube as BId = T ∈ M+
d : T ≤ I.
Recall from Corollaries 2.2.33 and 2.2.34 that A]Id = BId and B]Id = AId .
The following lemma is the non-commutative version of Lemma 1.2.3.
Lemma 2.4.25. For a state ρ ∈ Rd and a bounded Md-convex corner A, the function f : A → R ∪ {∞} given by f(A) = −Tr(ρ logA) attains a minimum value f(A0) for some A0 ∈ A. If ρ > 0 and A has non-empty interior relative to M+d , then A0 is uniquely determined.
Proof. For A ∈ A we write A = ∑_{i=1}^d a_i v_i v_i^∗ for an orthonormal set {v1, . . . , vd} and a_i ∈ R+. Let
A0(ρ) = {A = ∑_{i=1}^d a_i v_i v_i^∗ ∈ A : 〈v_i, ρv_i〉 > 0 ⇒ a_i > 0}.
For A = ∑_{i=1}^d a_i v_i v_i^∗ ∈ A0(ρ), observe using (2.27) that
f(A) = −∑_{i=1}^d 〈v_i, ρv_i〉 log a_i < ∞,
and the restriction of f to A0(ρ) is continuous. However, f(A) =∞ for all A ∈ A\A0(ρ). It
is also clear that limB→A f(B) = ∞ for all A ∈ A\A0(ρ), and so f is lower semi-continuous
on A. Since A is bounded and closed, it is compact, and the first assertion follows from
Theorem A.0.6.
For fixed ρ ∈ Rd, using Definition 2.4.5, we have f(A) = D(ρ‖A) − Tr(ρ log ρ), and Lemma 2.4.7 gives
f((A0 + B0)/2) ≤ (1/2)f(A0) + (1/2)f(B0) for A0, B0 ∈ A, (2.35)
and states that when ρ,A0, B0 ∈ M++d , equality holds in (2.35) if and only if logA0 = logB0, a condition equivalent to A0 = B0. Now suppose ρ > 0 and that A has non-empty interior relative to M+d . This gives that A0(ρ) = A ∩ M++d ≠ ∅ by Lemma 2.2.12. Now, assume towards a contradiction that there exist distinct A0, B0 ∈ A satisfying f(A0) = f(B0) = min_{A∈A} f(A). Then A0, B0 ∈ A0(ρ) ∩ M++d and in (2.35) we have f((A0 + B0)/2) < f(A0), where (A0 + B0)/2 ∈ A by the convexity of A, contradicting that f(A0) = min_{A∈A} f(A). We conclude that when A has non-empty interior relative to M+d and ρ > 0, the minimum is achieved for a unique A0 ∈ A.
The fact that this minimum exists motivates the following definition, analogous to Definition 1.2.4.
Definition 2.4.26. Let A ⊆ M+d be a bounded convex corner and ρ ∈ Rd be a state. The
entropy of ρ over A is given by
HA(ρ) = min{−Tr(ρ logA) : A ∈ A}.
Remark 2.4.27. If an Md-convex corner A has empty interior relative to M+d , then by Lemma 2.2.12, A has no strictly positive element, and there exists ρ ∈ Rd, for example the maximally mixed state (1/d)I, such that A0(ρ) = ∅. Then −Tr(ρ logA) = ∞ for all A ∈ A, and HA(ρ) = ∞.
Conversely, if A has non-empty interior relative to M+d , then for all states ρ ∈ Rd it holds
that A0(ρ) 6= ∅ and HA(ρ) is finite. Theorem 2.4.31 will examine the latter case and give an
upper bound on HA(ρ).
Lemma 2.4.28. If convex corners A,B satisfy A ⊆ B, then HA(ρ) ≥ HB(ρ) for all ρ ∈ Rd.
Proof. This is immediate from Definition 2.4.26.
Elements of AId have eigenvalues in the interval [0, 1], so for all ρ ∈ Rd, it is not hard to
see that a minimising element of AId in the definition of HAId (ρ) can be chosen to have unit
trace, and
0 ≤ HAId (ρ) = min{−Tr(ρ log σ) : σ ∈ Rd}.
Now from Lemma 2.4.6 we recall for all ρ, σ ∈ Rd that D(ρ‖σ) = Tr(ρ log ρ − ρ log σ) ≥ 0. Thus
for given ρ ∈ Rd it holds that −Tr(ρ log σ) ≥ H(ρ) for all σ ∈ Rd, but since ρ ∈ Rd ⊆ AId ,
we have HAId (ρ) ≤ −Tr(ρ log ρ). We conclude that
HAId (ρ) = H(ρ) for all ρ ∈ Rd, (2.36)
in other words, the von Neumann entropy is an example of entropy over a convex corner.
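The characterisation (2.36) can be probed numerically: for a fixed state ρ, no sampled state σ makes −Tr(ρ log σ) smaller than H(ρ), which is attained at σ = ρ. A sketch (helper names are ours; all sampled states are full rank almost surely, so the logarithms are defined):

```python
import numpy as np

def mlog(A):
    """Spectral matrix logarithm of a positive definite Hermitian A."""
    a, U = np.linalg.eigh(A)
    return (U * np.log(a)) @ U.conj().T

def rand_state(d, rng):
    """A random full-rank density matrix."""
    B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    rho = B @ B.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(3)
d = 3
rho = rand_state(d, rng)
H_rho = float(-np.trace(rho @ mlog(rho)).real)   # H(rho), attained at sigma = rho

# -Tr(rho log sigma) = H(rho) + D(rho||sigma) >= H(rho) for every state sigma,
# so random sampling cannot beat the known minimiser.
samples = [float(-np.trace(rho @ mlog(rand_state(d, rng))).real)
           for _ in range(200)]
assert min(samples) >= H_rho - 1e-10
```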
Elements of BId also have eigenvalues in the interval [0, 1], and thus −Tr(ρ logA) ≥ 0 for
all ρ ∈ Rd and A ∈ BId . However, since I ∈ BId and Tr(ρ log I) = 0,
HBId (ρ) = 0 for all ρ ∈ Rd. (2.37)
For an Md-convex corner A satisfying AId ⊆ A ⊆ BId , Lemma 2.4.28 now gives
0 ≤ HA(ρ) ≤ H(ρ) for all ρ ∈ Rd. (2.38)
In the commutative case, [11, Section 2] notes that there exist Rd-convex corners B, C such
that HB(p) < 0 and HC(p) > H(p) for all p ∈ Pd. Similarly, there exist Md-convex corners
B, C satisfying HB(ρ) < 0 and HC(ρ) > H(ρ) for all ρ ∈ Rd. For example, if B = kBId =
{kT : T ∈ BId} and k > 1, then
HB(ρ) = HBId(ρ)− log k = − log k < 0.
Similarly, if C = (1/k)AId and k > 1, then
HC(ρ) = HAId(ρ) + log k = H(ρ) + log k > H(ρ).
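The shift by log k in the last two examples is an instance of a general scaling rule; a sketch of the one-line derivation (using log(kA) = (log k)I + logA, as in (2.25), together with Tr ρ = 1):

```latex
% Scaling a convex corner shifts entropy by log k: for k > 0,
-\operatorname{Tr}\bigl(\rho\log(kA)\bigr)
  = -\operatorname{Tr}\bigl(\rho(\log k)I\bigr)
    - \operatorname{Tr}(\rho\log A)
  = -\log k - \operatorname{Tr}(\rho\log A),
\qquad\text{whence}\qquad
H_{k\mathcal{A}}(\rho) = H_{\mathcal{A}}(\rho) - \log k .
```

Taking A = BId with minimiser I, and A = AId with minimiser ρ, recovers the two displayed computations.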
Concavity is an important characteristic of entropic quantities, and the following lemma
gives a straightforward proof that HA(ρ) has this property.
Lemma 2.4.29. For a bounded Md-convex corner A, the entropy function ρ → HA(ρ) is
concave.
Proof. Let ρ = λ1ρ1 + λ2ρ2 with ρi ∈ Rd, λi ∈ R+, i = 1, 2 and λ1 + λ2 = 1. For some
A0 ∈ A we have
HA(ρ) = Tr(−ρ logA0) = λ1 Tr(−ρ1 logA0) + λ2 Tr(−ρ2 logA0)
≥ λ1 min_{A∈A} Tr(−ρ1 logA) + λ2 min_{A∈A} Tr(−ρ2 logA)
= λ1HA(ρ1) + λ2HA(ρ2),
thus establishing concavity.
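Since HAId = H by (2.36), the concavity just proved specialises to the familiar concavity of von Neumann entropy. A numerical spot-check of that special case (numpy; illustration only, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(7)
d = 3

def entropy(rho):
    # von Neumann entropy, base-2 logs assumed as convention
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

def rand_state(rng, d):
    # random density matrix via a Wishart-type construction
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = X @ X.conj().T
    return rho / np.trace(rho).real

# H(lam*rho1 + (1-lam)*rho2) >= lam*H(rho1) + (1-lam)*H(rho2)
gaps = []
for _ in range(50):
    rho1, rho2 = rand_state(rng, d), rand_state(rng, d)
    lam = rng.random()
    mix = lam * rho1 + (1 - lam) * rho2
    gaps.append(entropy(mix) - (lam * entropy(rho1) + (1 - lam) * entropy(rho2)))

min_gap = min(gaps)  # concavity predicts this is >= 0 (up to rounding)
```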
The following continuity property of HA(ρ) is a non-commutative version of Lemma 1.2.11
and allows us to work towards a non-commutative analogue of Theorem 1.2.13.
Lemma 2.4.30. Let A be a standard Md-convex corner. Then the function f : Rd → R
given by f(ρ) = HA(ρ) is upper semi-continuous on Rd and attains a finite maximum value
on Rd.
Proof. Since A is bounded and has non-empty interior relative to M+d , Lemma 2.4.25 and
Remark 2.4.27 give that f(ρ) = HA(ρ) is finite for all ρ ∈ Rd. Let (ρ(n))n∈N be a sequence
in Rd converging to ρ ∈ Rd. For ρ ∈ Rd and B ∈M+d satisfying kerB ⊆ ker ρ, let g(ρ,B) =
−Tr(ρ logB). We denote by A ∈ A and A(n) ∈ A the minimising matrices in the definitions
of HA(ρ) and HA(ρ(n)) respectively, that is, HA(ρ) = g(ρ,A) and HA(ρ(n)) = g(ρ(n), A(n)).
Since A has non-empty interior relative to M+d , there exists r > 0 such that rI ∈ A, and we
form B = (1− µ)A+ µrI ∈M++d where µ ∈ (0, 1). By convexity, B ∈ A.
Let A = ∑_{i=1}^d a_i v_i v_i^*, where {v1, . . . , vd} is an orthonormal basis of Cd and ai ≥ 0. Then
g(ρ,A) = HA(ρ) = −Tr(ρ logA) = −∑_{i=1}^d ρii log ai, where ρii = ⟨ρvi, vi⟩. Since HA(ρ) < ∞,
it is clear that ρii = 0 for all i such that ai = 0. Note that each vi is also an eigenvector of
B; indeed, B = ∑_{i=1}^d ((1− µ)ai + µr) v_i v_i^*. We have
g(ρ,A) ≤ g(ρ,B) = −Tr(ρ log((1− µ)A+ µrI)) = −∑_{i=1}^d ρii log((1− µ)ai + µr).
Noting that ρii = 0 when ai = 0, we see that g(ρ,B)→ g(ρ,A) as µ→ 0. Then for any δ > 0,
there exists µ ∈ (0, 1) such that g(ρ,A) ≤ g(ρ,B) ≤ g(ρ,A) + δ. For all n ∈ N it holds that
HA(ρ(n)) = g(ρ(n), A(n)) ≤ g(ρ(n), B). Then because ρ(n) → ρ and B > 0, we have
lim sup_{n→∞} HA(ρ(n)) ≤ lim sup_{n→∞} g(ρ(n), B) = g(ρ,B).
Finally, since δ > 0 was arbitrary and g(ρ,B) ≤ g(ρ,A) + δ = HA(ρ) + δ, we have
lim sup_{n→∞} f(ρ(n)) = lim sup_{n→∞} HA(ρ(n)) ≤ HA(ρ) = f(ρ),
and f is upper semi-continuous as stated. Theorem A.0.6 and the compactness of Rd show
that a maximum value is attained.
Having established that a maximum is attained, the next proposition, analogous to Theorem 1.2.13, determines its value. (Recall that the parameter N(A) was introduced in Definition 2.3.17.)
Theorem 2.4.31. For a standard Md-convex corner A it holds that
max_{ρ∈Rd} HA(ρ) = − logN(A).
Proof. Note that Rd and A are compact and convex subsets of finite dimensional spaces.
Consider the function f : Rd × A → R ∪ {∞} given by f(ρ,A) = −Tr(ρ logA). The
function ρ → f(ρ,A) is linear and hence concave for fixed A ∈ A. Note that f(ρ,A) =
D(ρ‖A) − Tr(ρ log ρ), and so from Corollary 2.4.9 we have that the function A → f(ρ,A)
is convex and lower semi-continuous for fixed ρ ∈ Rd. Thus all the conditions for applying
Theorem A.0.8 are met and by interchange of the supremum and infimum, we obtain
max_{ρ∈Rd} HA(ρ) = sup_{ρ∈Rd} inf_{A∈A} f(ρ,A) = inf_{A∈A} sup_{ρ∈Rd} f(ρ,A).
Fix A ∈ A and let λmin(A) denote the smallest eigenvalue of A and uA its corresponding unit
eigenvector. Working in R ∪ {∞} if necessary, (2.27) gives
sup{f(ρ,A) : ρ ∈ Rd} = sup{−Tr(ρ logA) : ρ ∈ Rd} = − log λmin(A),
where the supremum is attained at the pure state u_A u_A^*. Then
max{HA(ρ) : ρ ∈ Rd} = inf{− log λmin(A) : A ∈ A} = − log(sup{λmin(A) : A ∈ A}). (2.39)
Let m = supA∈A λmin(A). Recalling that N(A)I ∈ A, it is clear that m ≥ N(A).
Conversely, for every ε > 0, there exists A ∈ A such that m − ε < λmin(A) and hence
(m− ε)I ≤ A ∈ A. By hereditarity it follows that (m− ε)I ∈ A. Thus N(A) ≥ m− ε for all
ε > 0 and N(A) ≥ m. Thus m = N(A) and putting this in (2.39) completes the proof.
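A concrete instance of Theorem 2.4.31 (illustration only; numpy, base-2 logs assumed): for A = AId we have HA = H by (2.36) and N(AId) = 1/d, so the theorem predicts max_ρ H(ρ) = log d, attained at the maximally mixed state.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

# Sample random states; none should exceed the predicted maximum log2(d).
samples = []
for _ in range(200):
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = X @ X.conj().T
    rho /= np.trace(rho).real
    samples.append(entropy(rho))

bound = np.log2(d)            # -log2 N(A_{I_d}) = log2 d
max_sample = max(samples)
H_mixed = entropy(np.eye(d) / d)  # the maximiser (1/d)I
```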
Corollary 2.4.32. For a standard Md-convex corner A it holds that
max_{ρ∈Rd} HA(ρ) = − logN(A) = logM(A) = log γ(A♯).
Proof. This is immediate from Theorem 2.4.31 and Proposition 2.3.18.
Lemma 2.4.33. If an Md-convex corner B satisfies ∆(B) = Dd ∩ B, then
M(∆(B)) = 1/N(∆(B)) = γ(∆(B)♭) = M(B) = 1/N(B) = γ(B♯).
Proof. Since ∆(B) = Dd∩B, from Lemma 2.2.3 it is clear that ∆(B) is a diagonal Md-convex
corner. By Corollary 2.4.14, ∆(B)♭ = ∆(B♯) = Dd ∩ B♯. We note that ∆(B♯) is a diagonal
Md-convex corner, and so by Lemma 2.4.15 it follows that γ(∆(B)♭) = γ(∆(B♯)) = γ(B♯).
Propositions 2.2.9 and 2.3.18 complete the proof.
Corollary 2.4.34. Suppose the standard Md-convex corner B satisfies ∆(B) = Dd ∩ B.
Taking φ to be the canonical bijection R+^d → Dd^+ as given in (2.1) on page 35, it holds that
max_{p∈Pd} Hφ−1(∆(B))(p) = max_{ρ∈Rd} HB(ρ).
Proof. As in the proof of Lemma 2.4.33, ∆(B) is a diagonal Md-convex corner. Lemma 2.4.33
gives N(∆(B)) = N(B). Then simply note from Theorem 1.2.13 and Definition 2.2.8 that
maxp∈Pd Hφ−1(∆(B))(p) = − logN(∆(B)), and from Theorem 2.4.31 that maxρ∈Rd HB(ρ) =
− logN(B).
Corollary 2.4.34 relates the entropy over certain Md-convex corners and that over related
diagonal Md-convex corners; this is also the theme of the next lemma. Some ideas in its proof,
along with Definition 4.3.1, arose in earlier discussions between Joshua Lockhart, Giannicola
Scarpa, Ivan Todorov and Andreas Winter.
Lemma 2.4.35. Let A be a diagonal convex corner and let B be a bounded Md-convex corner
with A = ∆(B) = Dd ∩ B. If p ∈ Pd and ρ = ∑_{i=1}^d p_i e_i e_i^*, then
Hφ−1(A)(p) = HB(ρ).
Proof. For A ∈ A we have A = ∑_{i=1}^d a_i e_i e_i^* with ai ≥ 0, and Tr(ρ logA) = ∑_{i=1}^d pi log ai.
Now A ⊆ B, so
HB(ρ) = min_{B∈B} Tr(−ρ logB) ≤ min_{A∈A} Tr(−ρ logA) = Hφ−1(A)(p).
For the reverse inequality, let B = ∑_{j=1}^d b_j v_j v_j^* ∈ B with bj ≥ 0 and {v1, . . . , vd} an
orthonormal basis of Cd. We have
∆(B) = ∑_{i=1}^d ⟨Bei, ei⟩ e_i e_i^* = ∑_{i=1}^d ∑_{j=1}^d b_j |⟨vj , ei⟩|^2 e_i e_i^*,
−Tr(ρ logB) = −∑_{i=1}^d ∑_{j=1}^d p_i |⟨vj , ei⟩|^2 log b_j,
and
−Tr(ρ log(∆(B))) = −∑_{i=1}^d p_i log(∑_{j=1}^d b_j |⟨vj , ei⟩|^2)
≤ −∑_{i=1}^d p_i ∑_{j=1}^d |⟨vj , ei⟩|^2 log b_j = −Tr(ρ logB),
where the inequality follows from the concavity of the logarithm function and the fact that
∑_{j=1}^d |⟨vj , ei⟩|^2 = 1 for each i = 1, . . . , d. Noting that ∆(B) ∈ A gives Hφ−1(A)(p) ≤ HB(ρ),
and the proof is complete.
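The pinching step in this proof, −Tr(ρ log ∆(B)) ≤ −Tr(ρ logB) for diagonal ρ, can be spot-checked numerically (numpy; illustration only, not part of the proof; the inequality direction is independent of the log base):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Random positive definite B, playing the role of sum_j b_j v_j v_j^*.
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
B = X @ X.conj().T + 0.1 * np.eye(d)

# Diagonal state rho = sum_i p_i e_i e_i^*.
p = rng.random(d)
p /= p.sum()
rho = np.diag(p)

def logm_psd(A):
    # matrix logarithm of a positive definite matrix via its spectrum
    w, V = np.linalg.eigh(A)
    return (V * np.log2(w)) @ V.conj().T

Delta_B = np.diag(np.diag(B))  # the diagonal pinching Δ(B)
lhs = -np.trace(rho @ logm_psd(Delta_B)).real  # -Tr(rho log Δ(B))
rhs = -np.trace(rho @ logm_psd(B)).real        # -Tr(rho log B)
```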
The following lemmas give straightforward but useful characterisations of entropy over
the convex corners BId and AId respectively.
Lemma 2.4.36. Let A be an Md-convex corner with AId ⊆ A ⊆ BId. The following are
equivalent:
(i) HA(ρ) = 0 for all states ρ;
(ii) γ(A♯) = 1;
(iii) I ∈ A;
(iv) A = BId;
(v) γ(A) = d.
Proof. (i) ⇐⇒ (ii). This follows from Corollary 2.4.32 and (2.38) on page 71.
(ii) ⇐⇒ (iii). For A ⊆ BId , note that N(A) ≤ 1. The assertion then follows from Proposition
2.3.18.
(iii) ⇐⇒ (iv). From the hereditarity of A and the assumption A ⊆ BId , if I ∈ A, then
A = {M ∈ M+d : M ≤ I}. The converse is trivial.
(iv) ⇐⇒ (v). This is immediate given that A ⊆ BId .
Lemma 2.4.37. Let A be an Md-convex corner with AId ⊆ A ⊆ BId. The following are
equivalent:
(i) HA(ρ) = H(ρ) for all states ρ;
(ii) γ(A♯) = d;
(iii) A = AId;
(iv) γ(A) = 1.
Proof. (ii) ⇐⇒ (iii). Recall that AId♯ = BId and BId♯ = AId , yielding AId ⊆ A♯ ⊆ BId . It
thus holds that A = AId ⇐⇒ A♯ = BId ⇐⇒ γ(A♯) = d by Lemma 2.4.36.
(iii) ⇐⇒ (iv). This follows immediately from the assumption AId ⊆ A.
(iii) ⇒ (i). This was proved in (2.36) on page 70.
(i) ⇒ (iv). We prove the contrapositive statement. Suppose that (iv) does not hold; then
there exists B ∈ A with TrB = t > 1. Note that Tr(t−1B) = 1, and hence t−1B ∈ Rd. Since
(1/d)Id ∈ A, for ε ∈ (0, 1) the convex combination B′ = (1− ε)B + (ε/d)Id ∈ A ∩M++d satisfies
TrB′ = (1− ε)t+ ε > 1. Thus, without loss of generality we assume that B ∈ M++d . By
(2.25) we then have
log(t−1B) = log(t−1IdB) = log(t−1Id) + logB = Id log t−1 + logB,
and the von Neumann entropy H(t−1B) satisfies
H(t−1B) = Tr(−t−1B log(t−1B)) = log t+ Tr(−t−1B logB)
where log t > 0. However,
HA(t−1B) = min_{A∈A} Tr(−t−1B logA) ≤ Tr(−t−1B logB) < H(t−1B),
and (i) does not hold, as we required.
In the commutative case, a number of interesting results on the entropy of a probability
distribution over different Rd-convex corners are given in [11, Section 2]. For instance, for an
Rd-convex corner A it is shown that
H(p) = HA(p) +HA♭(p) for all p ∈ Pd. (2.40)
We seek to consider the non-commutative setting. Many of the results in [11] require commutativity, but the following definition and Proposition 2.4.38 will allow some progress in a restricted case of the Md setting.
Let V = {v1, . . . , vd} be an orthonormal basis of Cd and let A = ∑_{i=1}^d a_i u_i u_i^* ∈ M++d ,
where {u1, . . . , ud} is a set of orthonormal eigenvectors of A and ai ∈ R+. We have
∆V (logA) = ∑_{i,j=1}^d v_j v_j^* |⟨ui, vj⟩|^2 log ai,
and for ρ ∈ Rd,
Tr(ρ∆V (logA)) = ∑_{i,j=1}^d ⟨ρvj , vj⟩ |⟨ui, vj⟩|^2 log ai. (2.41)
Working in the extended reals R̄ = R ∪ {−∞,∞}, we define Tr(ρ∆V (logA)) for all A ∈ M+d
by extending (2.41) to hold for all positive semi-definite matrices.
Proposition 2.4.38. Let ρ ∈Md be a state and V be an orthonormal basis of Cd such that
ρ ∈ DV . Let A be a bounded Md-convex corner satisfying ∆V (A) ⊆ A.
Then there exists M ∈ A ∩ DV which commutes with ρ and satisfies −Tr(ρ logM) =
HA(ρ).
Proof. By Lemma 2.4.25, there exists A ∈ A satisfying −Tr(ρ logA) = HA(ρ). Let A =
∑_{i=1}^d a_i u_i u_i^*, where {u1, . . . , ud} is a set of orthonormal eigenvectors of A and ai ≥ 0. Working
in R̄ where necessary, set
A′ = ∑_{j=1}^d v_j v_j^* exp2(∑_{i=1}^d |⟨ui, vj⟩|^2 log ai) ∈ DV . (2.42)
Comparing to (2.41) we observe that
Tr(ρ logA′) = ∑_{i,j=1}^d ⟨ρvj , vj⟩ |⟨ui, vj⟩|^2 log ai = Tr(ρ∆V (logA)). (2.43)
Using that ∑_{i=1}^d |⟨ui, vj⟩|^2 = ‖vj‖^2 = 1 in (2.42) on page 77, the convexity of the exponential
function yields
A′ ≤ ∑_{j=1}^d v_j v_j^* ∑_{i=1}^d |⟨ui, vj⟩|^2 ai = ∑_{j=1}^d v_j v_j^* ⟨Avj , vj⟩ = ∆V (A) ∈ A,
where we used the assumption ∆V (A) ⊆ A. It follows by the hereditarity of A that A′ ∈ A.
Now ∆V (ρ) = ρ, and we use Lemma 2.4.12 and (2.43) to obtain
HA(ρ) = −Tr(ρ logA) = −Tr(∆V (ρ) logA) = −Tr(ρ∆V (logA)) = −Tr(ρ logA′).
Thus A′ ∈ A ∩ DV satisfies −Tr(ρ logA′) = HA(ρ) and clearly commutes with ρ ∈ DV .
The next two propositions are analogous to [11, Lemma 2.5 (a)].
Proposition 2.4.39. Let A and B be Md-convex corners, and ρ = AB ∈ Rd where A ∈ A
and B ∈ B. Then
H(ρ) ≥ HA(ρ) +HB(ρ).
Equality holds if and only if A and B are elements of A and B achieving the respective minima
in Definition 2.4.26.
Proof. Note that AB = ρ = ρ∗ = B∗A∗ = BA, establishing that A and B commute and are
thus simultaneously diagonalisable. Then
HA(ρ) +HB(ρ) ≤ −Tr(ρ logA)− Tr(ρ logB) = −Tr(ρ log(AB)) = −Tr(ρ log ρ) = H(ρ),
using (2.25) on page 60. The equality condition is immediate.
Proposition 2.4.40. Let ρ ∈Md be a state and V be an orthonormal basis of Cd such that
ρ ∈ DV . Suppose A and B are bounded Md-convex corners satisfying ∆V (A) ⊆ A, ∆V (B) ⊆ B,
and B ⊆ A♯. Then
H(ρ) ≤ HA(ρ) +HB(ρ).
Proof. By Proposition 2.4.38, there exist A ∈ A and B ∈ B such that HA(ρ) = −Tr(ρ logA)
and HB(ρ) = −Tr(ρ logB), and such that A,B ∈ DV . Let ρ = ∑_{i=1}^d p_i v_i v_i^*, A = ∑_{i=1}^d a_i v_i v_i^*
and B = ∑_{i=1}^d b_i v_i v_i^*. It then follows from (2.27) on page 60 that
H(ρ)−HA(ρ)−HB(ρ) = Tr(ρ logA) + Tr(ρ logB)− Tr(ρ log ρ)
= ∑_{i:pi>0} pi log(aibi/pi) ≤ log(∑_{i:pi>0} aibi) ≤ 0,
where the first inequality follows from the concavity of the log function and because ∑_{i=1}^d pi =
1, while the second inequality follows from the fact that B ∈ A♯ and consequently ⟨A,B⟩ =
∑_{i=1}^d aibi ≤ 1.
The following result is given in [11].
Theorem 2.4.41. [11, Theorem 1.1] If A,B are Rd-convex corners satisfying A♭ ⊆ B, then
for any p = (pi)i∈[d] ∈ Pd there exist a = (ai)i∈[d] ∈ A and b = (bi)i∈[d] ∈ B such that pi = aibi
for all i ∈ [d].
We have the following non-commutative version.
Proposition 2.4.42. Let V = {v1, . . . , vd} be an orthonormal basis and ρ ∈ DV ∩ Rd. If
A,B are Md-convex corners satisfying ∆V (A) ⊆ A and A♯ ⊆ B, then there exist A ∈ A and
B ∈ B such that ρ = AB.
Proof. Recall from Lemma 2.4.10 that ∆V (A) ⊆ A is equivalent to the condition ∆V (A) =
DV ∩ A. Let A0 = ∆V (A) = DV ∩ A and B0 = DV ∩ B. We define the bijection φ : R+^d →
DV ∩M+d by φ(∑_{i=1}^d a_i e_i) = ∑_{i=1}^d a_i v_i v_i^*, where ai ∈ R+ and {e1, . . . , ed} is the canonical
basis of Cd. By the argument of Lemma 2.2.3, φ−1(A0) and φ−1(B0) are Rd-convex corners.
We claim that
DV ∩ A0♯ = DV ∩ A♯. (2.44)
Since A0 ⊆ A, the inclusion DV ∩ A0♯ ⊇ DV ∩ A♯ is trivial. For the reverse inclusion, fix
M ∈ DV ∩ A0♯ and A ∈ A. We have
⟨M,A⟩ = ⟨∆V (M), A⟩ = ⟨M,∆V (A)⟩ ≤ 1,
where we have used Lemma 2.4.12 and the fact that ∆V (A) ∈ A0. Thus, M ∈ DV ∩ A♯ and
(2.44) holds. Now observe that
DV ∩ A0♯ = DV ∩ A♯ ⊆ DV ∩ B = B0. (2.45)
Since ⟨f, g⟩ = ⟨φ(f), φ(g)⟩ for all f, g ∈ R+^d, it is clear that (φ−1(A0))♭ = φ−1(DV ∩ A0♯).
By (2.45) it then holds that (φ−1(A0))♭ ⊆ φ−1(B0). For state ρ = ∑_{i=1}^d p_i v_i v_i^* ∈ DV we
set p = φ−1(ρ) ∈ Pd and apply Theorem 2.4.41. Thus there exist a = (ai) ∈ φ−1(A0) and
b = (bi) ∈ φ−1(B0) such that pi = aibi for all i ∈ [d]. Then φ(a) = ∑_{i=1}^d a_i v_i v_i^* ∈ A0 ⊆ A
and φ(b) = ∑_{i=1}^d b_i v_i v_i^* ∈ B0 ⊆ B satisfy φ(a)φ(b) = ρ, as required.
Corollary 2.4.43. Let A and B be Md-convex corners and V be an orthonormal basis of Cd
such that ∆V (A) ⊆ A and A♯ ⊆ B. Then for any state ρ ∈ DV
H(ρ) ≥ HA(ρ) +HB(ρ).
Proof. This follows from Propositions 2.4.39 and 2.4.42.
The following result is the non-commutative analogue of (2.40).
Theorem 2.4.44. Let ρ ∈ Md be a state and V be an orthonormal basis of Cd such that
ρ ∈ DV . Let A be a bounded Md-convex corner satisfying ∆V (A) ⊆ A. Then H(ρ) =
HA(ρ) +HA♯(ρ).
Proof. The result follows from putting B = A] in Proposition 2.4.40 and Corollary 2.4.43;
Lemma 2.4.24 shows that the necessary conditions are satisfied.
We finish this section with a non-commutative analogue of the lower bound on entropy
given in Proposition 1.2.9.
Proposition 2.4.45. Let ρ ∈Md be a state and V be an orthonormal basis of Cd such that
ρ ∈ DV . Let A be a bounded Md-convex corner satisfying ∆V (A) ⊆ A. It holds that
HA(ρ) ≥ H(ρ)− log γ(A).
Equality holds if and only if γ(A)ρ ∈ A.
Proof. By Proposition 2.4.38 we can choose B ∈ A satisfying HA(ρ) = −Tr(ρ logB) and
such that B ∈ DV . Write ρ = ∑_{i=1}^d ρ_i v_i v_i^* and B = ∑_{i=1}^d b_i v_i v_i^* with ρi, bi ≥ 0. Then
H(ρ) = −Tr(ρ log ρ) = −∑_{i=1}^d ρi log ρi,
and
HA(ρ) = −Tr(ρ logB) = −∑_{i=1}^d ρi log bi.
Now Lemma 1.0.1 gives
∑_{i=1}^d ρi log(ρi/bi) ≥ − log(∑_{i=1}^d bi) ≥ − log γ(A),
whence the result is immediate. The equality condition follows as in Proposition 1.2.9.
Chapter 3
Non-commutative graphs and
associated convex corners
The non-commutative graph of a quantum channel was introduced in [13] as a generalisation
to the quantum setting of the confusability graph of a classical channel. After summarising
the classical case and necessary background from [13], we build on the theory introduced
there to explore further the analogy between graphs and non-commutative graphs. Just as
a number of Rd-convex corners can be associated with a given graph, so we will introduce a
number of Md-convex corners which can be associated with a given non-commutative graph.
These will include a quantum generalisation of the theta corner of a graph, which will be
shown to lead to a ‘quantum sandwich theorem’, analogous to Theorem 1.4.5.
3.1 Background
This section summarises the foundations of zero-error information theory and reviews some
established results on non-commutative graphs. We begin by describing the classical channel
and its confusability graph, and recall the definitions of the one shot zero-error capacity and
the Shannon capacity of a channel. Finally we recall how [13] generalises the classical theory
to associate a non-commutative graph with a quantum channel, and we summarise how the
classical channel can be embedded in the quantum setting.
3.1.1 Classical channels
We have encountered situations, for example the definition of graph entropy in Definition
1.3.5, where we encode only a ‘probable set’, and in doing so ‘tolerate’ a small probability of
error. On the other hand, in zero-error information theory we impose the condition that the
probability of error be zero throughout. This was first discussed by Shannon in [48], where
the concept of zero-error capacity was introduced. Here we follow [13] and [35]. A channel is
the means by which a sender, usually named ‘Alice’, transmits a signal to a receiver, usually
named ‘Bob’. The simplest channel we can consider is the identity channel, whose output is
always equal to its input. In general the signal can be subject to interference, and in this
case we say the channel is noisy. Suppose Alice transmits a letter from the finite alphabet
X = {x1, . . . , xn} through the noisy channel N to Bob, who then receives a letter from the
finite alphabet Y = {y1, . . . , ym}. We assume the channel N is memoryless and stationary. It
may thus be described by fixed probabilities p(yi|xj) for i ∈ [m] and j ∈ [n], where p(yi|xj)
denotes the probability that Bob receives yi ∈ Y when Alice sends xj ∈ X . Although we
will write N : X → Y, we note that N is not a function. We can, however, think of the
channel as a mapping between sets of probability distributions in the following sense. Define
the matrix PN = (pij)i∈[m],j∈[n] ∈ Mm,n where pij = p(yi|xj). Let αi be the probability that
Alice sends the letter xi and βi be the probability that Bob receives the letter yi. Setting
α = (αi)i∈[n] ∈ Pn and β = (βi)i∈[m] ∈ Pm, we see that the action of channel N can be
described by the mapping Pn → Pm given by
β = PNα. (3.1)
Distinct letters xi, xj ∈ X are said to be confusable if there exists y ∈ Y such that
p(y|xi)p(y|xj) > 0, (3.2)
for in that case whenever Alice sends either xi or xj , Bob could receive y and would not know
for certain whether xi or xj was sent.
Definition 3.1.1. [13, Section I] Corresponding to the noisy channel N : X → Y as above is
the confusability graph GN with V (GN ) = X and in which xi ∼ xj if and only if xi and xj
are distinct and confusable.
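The confusability condition (3.2) can be evaluated mechanically from the channel matrix. As an illustration (not from the text), here is Shannon's pentagon channel, in which input j is received as output j or j+1 (mod 5); its confusability graph is the 5-cycle C5:

```python
import numpy as np

# Channel matrix P with P[i, j] = p(y_i | x_j) for the pentagon channel.
n = 5
P = np.zeros((n, n))
for j in range(n):
    P[j, j] = 0.5
    P[(j + 1) % n, j] = 0.5

# x_i ~ x_j iff distinct and some output y has p(y|x_i) p(y|x_j) > 0, cf. (3.2).
adj = np.zeros((n, n), dtype=bool)
for i in range(n):
    for j in range(n):
        if i != j and np.any(P[:, i] * P[:, j] > 0):
            adj[i, j] = True
```

Inputs 0 and 1 share the possible output 1, so they are confusable; inputs 0 and 2 share no output, so they are not. The resulting adjacency is exactly that of C5.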
Suppose Alice chooses a subset X0 ⊆ X known to Bob such that when she transmits a
letter from X0 to Bob, he can know with certainty which letter was sent.
Definition 3.1.2. [35, Definition 6.1] The maximum possible number of elements of such a
subset X0 is called the one-shot zero-error capacity of channel N and denoted α(N).
It is clear that X0 cannot contain two mutually confusable elements of X . Indeed, it is
trivial to verify the following proposition.
Proposition 3.1.3. [35, Proposition 6.5] Let noisy channel N have confusability graph GN .
Then
α(N) = α(GN ),
where α(GN ) is the independence number of GN .
To consider multiple uses of N rather than the ‘one-shot’ case, the following standard
definition from graph theory will be needed.
Definition 3.1.4. ([15, p.155]) The strong product of graphs G and H is the graph G ⊠ H
with V (G ⊠ H) = V (G) × V (H), and (i, j) ≃ (k, l) in G ⊠ H if and only if i ≃ k in G and
j ≃ l in H.
We regard k successive uses of the memoryless and stationary channel N : X → Y as a
single use of the channel N^k : X^k → Y^k, with
p((y1, . . . , yk)|(x1, . . . , xk)) = p(y1|x1) . . . p(yk|xk).
Letting G^n denote the strong product of n copies of G, it is clear that the confusability
graph of N^k is given by G_{N^k} = G_N^k [35, Proposition 6.9]. The one-shot zero-error capacity
of N^k : X^k → Y^k is then given by α(G_N^k), motivating the next definition.
Definition 3.1.5. [35, Section 6.2], [28, Section I]. The Shannon capacity, also known as the
zero-error capacity, of the graph G is defined by
c(G) = sup_{n∈N} n√α(G^n).
We now recall Fekete’s Lemma, which will be used a number of places in the sequel.
Importantly, it allows the supremum in the definition of Shannon capacity to be replaced by
a limit, as we show in Corollary 3.1.7. This corollary is well known, but we supply a proof
because we will employ the same method later for other parameters.
Lemma 3.1.6. (Fekete's Lemma.) A real sequence (an)n∈N is called sub-additive if am+n ≤
am + an for all m,n ∈ N. Similarly, a real sequence (an)n∈N is called super-additive if
am+n ≥ am + an for all m,n ∈ N.
If (an)n∈N is sub-additive, then lim_{n→∞} an/n exists and is equal to inf_{n∈N} an/n. (Note that we
may have to work in R ∪ {−∞} and that the limit and infimum can be −∞.)
If (an)n∈N is super-additive, then lim_{n→∞} an/n exists and is equal to sup_{n∈N} an/n. (Note that we
may have to work in R ∪ {∞} and that the limit and supremum can be ∞.)
Corollary 3.1.7. [28, Section I]. The limit lim_{n→∞} n√α(G^n) exists and is equal to c(G),
the Shannon capacity of the graph G.
Proof. If S and T are independent sets in graphs G and H respectively, then S × T is
independent in G ⊠ H, and hence α(G ⊠ H) ≥ α(G)α(H). Letting an = logα(G^n), we have
am+n = logα(G^m ⊠ G^n) ≥ log(α(G^m)α(G^n)) = am + an,
whence Lemma 3.1.6 yields that
sup_{n∈N} log n√α(G^n) = lim_{n→∞} log n√α(G^n),
and the result follows.
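The super-multiplicativity α(G ⊠ H) ≥ α(G)α(H) used in this proof can be strict. A small check (illustration only; standard library) with Shannon's classical example: α(C5) = 2, while the 5-element set {(i, 2i mod 5)} is independent in C5 ⊠ C5, so α(C5 ⊠ C5) ≥ 5 > α(C5)^2:

```python
import itertools

n = 5

def adjacent(i, j):
    # adjacency in the 5-cycle C5
    return (i - j) % n in (1, n - 1)

def strong_adjacent(u, v):
    # (i,j) and (k,l) are adjacent in C5 ⊠ C5 iff distinct and each
    # coordinate pair is equal or adjacent (Definition 3.1.4)
    eq_or_adj = lambda a, b: a == b or adjacent(a, b)
    return u != v and eq_or_adj(u[0], v[0]) and eq_or_adj(u[1], v[1])

def is_independent(S, adj):
    return all(not adj(u, v) for u, v in itertools.combinations(S, 2))

# alpha(C5) by brute force over all vertex subsets
alpha_c5 = max(len(S) for r in range(n + 1)
               for S in itertools.combinations(range(n), r)
               if is_independent(S, adjacent))

# The classical 5-element independent set in C5 ⊠ C5
S = [(i, (2 * i) % n) for i in range(n)]
ok = is_independent(S, strong_adjacent)
```

This is the gap Lovász's θ closes: θ(C5) = √5 pins down c(C5) = √5 exactly.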
We have the intuitive meaning that zero-error transmission of a long message through
channel N is asymptotically equivalent to using the identity channel over an alphabet of size
c(GN ).
By Definition 3.1.5 it is immediate that c(G) ≥ α(G). Although c(G) = α(G) for some
graphs, in general the calculation of the Shannon capacity of an arbitrary graph is a notoriously difficult problem. A major breakthrough was made in [28], where László Lovász
introduced the graph parameter θ(G), now known as the Lovász number of graph G, and
which we have already encountered in (1.35) on page 31. Lovász [28, Lemma 3] showed for
any graph G that α(G) ≤ θ(G). Crucially, [28, Corollary 7] shows that the Lovász number is
sub-multiplicative over strong products, that is, θ(G^n) ≤ θ(G)^n for all n ∈ N. It immediately
follows that
c(G) = lim_{n→∞} n√α(G^n) ≤ lim_{n→∞} n√(θ(G)^n) = θ(G),
and we see that c(G) is bounded as follows:
α(G) ≤ c(G) ≤ θ(G). (3.3)
In [58], Witsenhausen solves the problem of ‘zero-error side information’. Here we follow
the discussion of that result in [35]. Suppose in addition to noisy channel N : X → Y,
Alice can communicate with Bob using an identity channel I : [k] → [k] for any k ∈ N of
her choice. Alice wishes to run this identity channel in parallel with N such that Bob can
retrieve the letter x ∈ X sent by Alice with certainty. Alice seeks a function f : X → [k],
such that when Bob receives the output from x ∈ X passing through noisy channel N along
with the output f(x) from the identity channel I, he can deduce x with certainty. However,
the identity channel is regarded as ‘expensive’, with the cost increasing with k, and so it is
desired to minimise k. The minimum value of k such that these constraints can be satisfied is
denoted χ(N) and known as the packing number of N . If noisy channel N has confusability
graph GN , it is not hard to see that it is both necessary and sufficient that function f be a
colouring of GN , giving the following result.
Proposition 3.1.8. [35, Theorem 6.11] For classical channel N with confusability graph GN
it holds that χ(N) = χ(GN ).
Moving from the one-shot case to that of n successive uses of N , the packing number of
N^n is then given by χ(N^n) = χ(G_{N^n}) = χ(G_N^n). Recall for graphs G and H that χ(G ⊠ H) ≤
χ(G)χ(H). (To see this, observe that if g : V (G) → [m] and h : V (H) → [n] are colourings
of G and H respectively, then f : V (G) × V (H) → [m] × [n] is a colouring of G ⊠ H, where
f(i, j) = (g(i), h(j)).) Using a similar argument to that of Corollary 3.1.7, it then follows
that (logχ(G^n))n∈N is sub-additive and we can apply Lemma 3.1.6 to show the existence
of the limit in the following definition, which is analogous to Corollary 3.1.7.
Definition 3.1.9. [35, Definition 6.12]. For a graph G we define
r(G) = lim_{n→∞} n√χ(G^n).
If GN is the confusability graph of a noisy channel N , the Witsenhausen rate of channel N
is defined by
R(N) = log r(GN ).
For graph G it is also convenient to write R(G) = log r(G).
3.1.2 Quantum measurement
Fundamental to quantum mechanics is the non-classical phenomenon that measurement af-
fects the system. The next proposition briefly summarises some basic facts about quantum
measurements.
Proposition 3.1.10. ([35, Theorems 3.9, 3.13].) Corresponding to a quantum measurement
on states in Md with possible outcomes λ1, . . . , λk is a set of operators {Qi}i∈[k] ⊆ Mn,d for
some n ∈ N, known as a measurement system, satisfying ∑_{i=1}^k Q_i^* Q_i = Id and such that, if
the quantum system is in initial state ρ ∈ Rd, then the following hold.
(i) The probability of observing outcome λi is Tr(QiρQ∗i ).
(ii) If outcome λi is observed, then the state ‘collapses’ to the density matrix
QiρQ∗i /Tr(QiρQ∗i ) ∈ Rn.
Recalling (2.23), from (i) and (ii) it follows that the measurement yields an ensemble with
density matrix
ρ′ = ∑_{i=1}^k Q_i ρ Q_i^* ∈ Rn. (3.4)
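Proposition 3.1.10 and (3.4) can be exercised on a small example (illustration only; numpy): a projective measurement in the standard basis of C^2, where the ensemble (3.4) is the diagonal pinching of ρ.

```python
import numpy as np

# Projective measurement system Q_i = e_i e_i^* in the standard basis of C^2.
Q = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

# Completeness: sum_i Q_i^* Q_i = I.
completeness = np.allclose(sum(Qi.conj().T @ Qi for Qi in Q), np.eye(2))

rho = np.array([[0.6, 0.3], [0.3, 0.4]])
probs = [np.trace(Qi @ rho @ Qi.conj().T).real for Qi in Q]  # outcome probabilities, (i)
rho_post = sum(Qi @ rho @ Qi.conj().T for Qi in Q)           # the ensemble (3.4)
```

Here the post-measurement ensemble keeps the diagonal of ρ and kills the off-diagonal terms, which is why such a measurement is also a quantum channel (cf. Remark 3.1.19 (iii)).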
Some notion of distinguishability of states will be needed, and Proposition 3.1.10 (ii)
suggests the following definition.
Definition 3.1.11. [35, Definition 7.1] States ρ1, . . . , ρm ∈ Rd are perfectly distinguishable
if there exists a measurement system {Mi}i∈[k] with k ≥ m such that Tr(MiρjM∗i ) = δij for
all i ∈ [k] and j ∈ [m].
The following proposition gives a useful characterisation of the distinguishability of states.
A proof can be found in [35].
Proposition 3.1.12. [35, Proposition 7.4] States ρ1, . . . , ρm ∈ Rd are perfectly distinguishable if and only if ρiρj = 0 for all i ≠ j.
Remark 3.1.13. (i) For unit vectors vi ∈ Cd, the density matrices v1v1∗, . . . , vkvk∗ ∈ Rd are
perfectly distinguishable if and only if v1, . . . , vk ∈ Cd are orthonormal.
(ii) By Lemma B.0.2 (x), states ρ1, . . . , ρm ∈ Rd are perfectly distinguishable if and only if
Tr(ρiρj) = 0 for all i ≠ j.
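The trace criterion of Remark 3.1.13 (ii) is easy to apply in practice; a minimal check (illustration only; numpy) for pure states, where Tr(ρ1ρ2) = |⟨v1, v2⟩|^2:

```python
import numpy as np

def pure(v):
    # density matrix vv* of a (normalised) state vector v
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

# Orthonormal vectors: perfectly distinguishable, Tr(rho1 rho2) = 0.
rho1, rho2 = pure([1, 0]), pure([0, 1])
overlap_orth = np.trace(rho1 @ rho2).real

# Non-orthogonal vectors: Tr(rho1 rho3) = |<v1, v3>|^2 = 1/2 > 0.
rho3 = pure([1, 1])
overlap_nonorth = np.trace(rho1 @ rho3).real
```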
3.1.3 Quantum channels
Having discussed classical channels, we now consider the quantum setting. In this case Alice
and Bob communicate by sending and receiving quantum states. The communication between
Alice and Bob is described by the quantum channel Φ : Md →Mk such that when Alice sends
a state ρ ∈ Rd, Bob receives the state σ ∈ Rk given by σ = Φ(ρ). In fact, the concept of
a quantum channel models any physically realisable process leading to a change in state.
Following [32], [35] and [57], we first recall some definitions, and then give some important
characterisations of quantum channels.
Definition 3.1.14. Linear map Φ : Md → Mk is called trace preserving if Tr(Φ(A)) = Tr(A)
for all A ∈ Md.
Definition 3.1.15. Linear map Φ : Md → Mk is positive if Φ(A) ∈ M+k for all A ∈ M+d .
For a linear map Φ : Md → Mk and m ∈ N, we define the map Φm : Mm(Md) → Mm(Mk)
by Φm((Qij)i,j∈[m]) = (Φ(Qij))i,j∈[m], where Qij ∈ Md for i, j ∈ [m].
Definition 3.1.16. Linear map Φ : Md →Mk is completely positive if Φm is positive for all
m ∈ N.
A linear map that is both completely positive and trace preserving will be called a c.p.t.p.
map. It is a consequence of the postulates of quantum mechanics that the set of quantum
channels is precisely the set of c.p.t.p. maps [54, Definition 2.13]. The next two well-known
propositions give characterisations of completely positive maps and c.p.t.p. maps. We let
Eij denote the canonical matrix unit Eij = eie∗j in Md.
Proposition 3.1.17. ([8, Theorems 1 and 2].) For map Φ : Md → Mk the following are
equivalent:
(i) Φ is completely positive;
(ii) There exist n ∈ N and A1, . . . , An ∈ Mk,d such that
Φ(V ) = ∑_{i=1}^n A_i V A_i^* for all V ∈ Md; (3.5)
(iii) The matrix PΦ = (Φ(Eij))i,j∈[d] ∈Md(Mk) is positive.
Proposition 3.1.18. [54, Corollary 2.27]. For map Φ : Md →Mk the following are equiva-
lent:
(i) Φ is a c.p.t.p. map;
(ii) There exist n ∈ N and A1, . . . , An ∈ Mk,d satisfying ∑_{i=1}^n A_i^* A_i = Id such that
Φ(V ) = ∑_{i=1}^n A_i V A_i^* for all V ∈ Md; (3.6)
(iii) Matrix PΦ = (Φ(Eij))i,j∈[d] ∈Md(Mk) is positive, and Φ satisfies
Tr(Φ(Eij)) = δij for all i, j ∈ [d].
Remark 3.1.19. (i) The form (3.5) is known as a Kraus representation of Φ, and the matrices
A1, . . . , An are called the Kraus operators of Φ for this representation.
(ii) The matrix PΦ as defined in Proposition 3.1.17 is known as the Choi matrix of Φ.
(iii) Comparing (3.4) with Proposition 3.1.18 (ii) shows that quantum measurement is an
example of a quantum channel.
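The characterisations in Propositions 3.1.17 and 3.1.18 can be verified mechanically for a given channel. A sketch (illustration only; numpy) for the dephasing channel on M2, whose Kraus operators are the diagonal projections:

```python
import numpy as np

d = 2
# Dephasing channel: Kraus operators P_0, P_1 with sum_i P_i^* P_i = I.
K = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

def phi(V):
    return sum(A @ V @ A.conj().T for A in K)

def E(i, j, d):
    # canonical matrix unit E_ij = e_i e_j^*
    M = np.zeros((d, d))
    M[i, j] = 1.0
    return M

# Choi matrix P_phi = (phi(E_ij))_{i,j} as a block matrix in M_d(M_k).
P_phi = np.block([[phi(E(i, j, d)) for j in range(d)] for i in range(d)])

min_eig = np.linalg.eigvalsh(P_phi).min()  # positivity, Prop. 3.1.17 (iii)
trace_cond = all(np.isclose(np.trace(phi(E(i, j, d))), float(i == j))
                 for i in range(d) for j in range(d))  # Prop. 3.1.18 (iii)
```

The Choi matrix here is diag(1, 0, 0, 1), which is positive with trace d = 2, consistent with the trace computation in the text below.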
We briefly review some properties of Choi matrices and Kraus operators. It is clear that if
Φ,Ψ are completely positive linear maps from Md to Mk, then Φ = Ψ if and only if PΦ = PΨ.
(Interestingly, if we identify Md(Mk) as Mkd, then PΦ = PΨ does not imply that Φ = Ψ for
arbitrary completely positive linear maps Φ,Ψ. For example if PΦ ∈M6, then Φ : Md →Mk
could have (d, k) = (6, 1), (3, 2), (2, 3) or (1, 6).) If, however, Φ : Md →Mk is a c.p.t.p. map,
then
Tr(PΦ) = ∑_{i=1}^d Tr(Φ(Eii)) = ∑_{i=1}^d Tr(Eii) = d,
and it follows that if Φ,Ψ are c.p.t.p. maps then Φ = Ψ if and only if PΦ = PΨ.
It is important to note that the Kraus representation of a completely positive map Φ as
in (3.5) is not unique; indeed, there is ‘unitary freedom’ in the choice of Kraus operators, as
described in the following standard result.
Proposition 3.1.20. [32, Theorem 8.2]. Suppose that the completely positive maps Φ and
Ψ have Kraus representations
Φ(ρ) = ∑_{i=1}^k A_i ρ A_i^* and Ψ(σ) = ∑_{i=1}^l B_i σ B_i^*
respectively. By appending zero operators to the shorter list we can ensure that k = l. Then
Φ = Ψ if and only if there exists a unitary matrix U = (uij)i,j∈[k] such that Ai = ∑_{j=1}^k u_{ij} B_j.
The following standard result, showing it is possible to choose a set of mutually orthogonal
Kraus operators, is a simple corollary of Proposition 3.1.20. We supply a proof for the benefit
of the reader.
Corollary 3.1.21. (See [32, Exercise 8.10].) Given a completely positive map Φ : Md →Mk,
there exist Kraus operators F1, . . . , Fm ∈ Mk,d for Φ satisfying Tr(FiF∗j ) = 0 if i ≠ j.
Proof. Suppose Φ has a Kraus representation Φ(ρ) = ∑_{i=1}^m E_i ρ E_i^* for ρ ∈ Md and with Ei ∈
Mk,d. Define the matrix W = (wpq)p,q∈[m] ∈ Mm by wpq = Tr(EpE∗q ). Then wpq = ⟨Ep, Eq⟩ =
⟨Eq, Ep⟩* = wqp*, and so W is Hermitian. There thus exists unitary U = (uij)i,j∈[m] ∈ Mm
such that UWU∗ is diagonal. Let V = (vij) = UWU∗ and set Fj = ∑_{l=1}^m u_{jl} E_l. Proposition
3.1.20 shows that Φ(ρ) = ∑_{i=1}^m F_i ρ F_i^* is a Kraus representation of Φ. We then have
Tr(FpF∗q ) = Tr(∑_{l,n} u_{pl} E_l (u_{qn} E_n)^*) = ∑_{l,n} u_{pl} Tr(ElE∗n) u_{qn}* = ∑_{l,n} u_{pl} w_{ln} u_{qn}* = vpq;
noting that V is diagonal completes the proof.
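The construction in this proof can be run numerically (illustration only; numpy): start from a deliberately non-orthogonal Kraus family obtained by unitary remixing (Proposition 3.1.20), then diagonalise the Gram matrix W to produce orthogonal operators representing the same channel.

```python
import numpy as np

Z = np.diag([1.0, -1.0])
E = [np.eye(2) / np.sqrt(2), Z / np.sqrt(2), np.zeros((2, 2))]  # phase-flip channel

# Remix by a generic 3x3 unitary (unitary freedom): B generally non-orthogonal.
rng = np.random.default_rng(3)
G = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
Uf, _ = np.linalg.qr(G)
B = [sum(Uf[i, l] * E[l] for l in range(3)) for i in range(3)]

# Gram matrix W_pq = Tr(B_p B_q^*), Hermitian; diagonalise it.
W = np.array([[np.trace(B[p] @ B[q].conj().T) for q in range(3)] for p in range(3)])
lam, V = np.linalg.eigh(W)   # W = V diag(lam) V^*
U = V.conj().T               # then U W U^* is diagonal
F = [sum(U[j, l] * B[l] for l in range(3)) for j in range(3)]

# F is mutually orthogonal and represents the same channel as B.
orth = all(abs(np.trace(F[p] @ F[q].conj().T)) < 1e-9
           for p in range(3) for q in range(3) if p != q)
rho = np.array([[0.7, 0.2j], [-0.2j, 0.3]])
same = np.allclose(sum(X @ rho @ X.conj().T for X in B),
                   sum(X @ rho @ X.conj().T for X in F))
```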
Remark 3.1.22. Corollary 3.1.21 can also be seen as a consequence of the following canonical method, used in [35, Theorem 4.8], for obtaining a (mutually orthogonal) set of Kraus operators A_1, . . . , A_n with n ≤ kd for a completely positive map Φ : M_d → M_k from its Choi matrix P_Φ. Since P_Φ ∈ M_{dk}^+, there exists an orthonormal basis {u_1, . . . , u_{dk}} ⊆ C^{dk} of eigenvectors of P_Φ such that P_Φ = ∑_{i=1}^n λ_i u_i u_i^* with n ≤ kd and eigenvalues λ_1, . . . , λ_n > 0. Letting v_i = √λ_i u_i ∈ C^{kd}, we have P_Φ = ∑_{i=1}^n v_i v_i^*. For each i = 1, . . . , n we write

v_i = (v_i^{(1)}, . . . , v_i^{(d)})^T, with v_i^{(j)} ∈ C^k for j = 1, . . . , d,

and it is easy to see that P_Φ = ( ∑_{l=1}^n v_l^{(i)} v_l^{(j)*} )_{i,j∈[d]}, giving that

Φ(e_i e_j^*) = ∑_{l=1}^n v_l^{(i)} v_l^{(j)*}. (3.7)

Setting A_i = ( v_i^{(1)} · · · v_i^{(d)} ) ∈ M_{k,d} and using (3.7) yields

∑_{l=1}^n A_l e_i e_j^* A_l^* = ∑_{l=1}^n v_l^{(i)} v_l^{(j)*} = Φ(e_i e_j^*).

By linearity Φ then has a Kraus representation Φ(ρ) = ∑_{l=1}^n A_l ρ A_l^* for all ρ ∈ M_d. It is easy to see that Tr(A_i A_j^*) = ⟨v_i, v_j⟩, so the orthogonality of the set {A_1, . . . , A_n} follows from that of {v_1, . . . , v_n}.
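The canonical method of the remark can be sketched numerically as follows (a minimal illustration assuming numpy; the map Φ below is an arbitrary toy example): the eigendecomposition of the Choi matrix is folded back into mutually orthogonal Kraus operators.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 2, 3

# A CP map Phi : M_d -> M_k given by two arbitrary Kraus operators in M_{k,d}.
Es = [rng.standard_normal((k, d)) + 1j * rng.standard_normal((k, d)) for _ in range(2)]
def Phi(rho):
    return sum(E @ rho @ E.conj().T for E in Es)

# Choi matrix: a d x d grid of k x k blocks, block (i, j) = Phi(e_i e_j^*).
C = np.zeros((d * k, d * k), dtype=complex)
for i in range(d):
    for j in range(d):
        Eij = np.zeros((d, d)); Eij[i, j] = 1
        C[i*k:(i+1)*k, j*k:(j+1)*k] = Phi(Eij)

# Eigendecomposition C = sum_i lam_i u_i u_i^*; keep strictly positive eigenvalues.
lam, u = np.linalg.eigh(C)
vs = [np.sqrt(l) * u[:, i] for i, l in enumerate(lam) if l > 1e-10]

# Fold each v in C^{dk} into A = (v^{(1)} ... v^{(d)}) in M_{k,d}.
As = [v.reshape(d, k).T for v in vs]

# The A_i are mutually orthogonal Kraus operators for Phi.
rho = rng.standard_normal((d, d)); rho = rho @ rho.T
assert np.allclose(Phi(rho), sum(A @ rho @ A.conj().T for A in As))
gram = np.array([[np.trace(A @ B.conj().T) for B in As] for A in As])
assert np.allclose(gram - np.diag(np.diag(gram)), 0)
```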
By analogy with the classical case, we now make the following definition for a quantum
channel Φ.
Definition 3.1.23. ([13], [35].) We define α(Φ), the one-shot zero-error capacity of a quantum channel Φ : M_d → M_k, to be the maximum n ∈ N such that there exist n orthonormal state vectors v_1, . . . , v_n ∈ C^d whose corresponding density matrices yield the perfectly distinguishable outputs Φ(v_1 v_1^*), . . . , Φ(v_n v_n^*).
3.1.4 Non-commutative graphs
The concept of the confusability graph of a classical channel has been generalised to the
quantum setting in [13]. Here we give an overview of the theory, beginning with the definition
of an operator system, first introduced in [2]. More details can be found in [36]. Here H will
denote a complex Hilbert space, and we write L(H) for the set of all linear transformations
H → H.
Definition 3.1.24. A subspace S ⊆ L(H) is called an operator system if S is unital and
self-adjoint; that is, if I ∈ S, and X ∈ S ⇒ X∗ ∈ S.
Given an orthonormal basis of a Hilbert space H of dimension d, we work in that basis
and make the canonical identification L(H) ≡Md. We will often write Md in place of L(H)
even if we have not specified a particular basis.
Remark 3.1.25. Straightforward results in Appendix B show that if S ⊆ Md and T ⊆ Mk
are operator systems, then S ⊗ T ⊆ Mdk is an operator system. Furthermore, if d = k then
S + T ⊆Md is an operator system.
Definition 3.1.26. ([13, equation (2)]) Let Φ : M_d → M_k be a quantum channel with Kraus operators E_1, . . . , E_n ∈ M_{k,d}. Then the subspace

S_Φ = span{E_i^* E_j : i, j ∈ [n]} ⊆ M_d (3.8)

is called the non-commutative graph of Φ.
Remark 3.1.27. (i) The term ‘non-commutative graph’ was introduced in [13]. In [55], Weaver
calls the same objects ‘quantum graphs’.
(ii) As shown in Proposition 3.1.20, a channel Φ has many different possible sets of Kraus operators, but the same proposition immediately shows that the subspace S_Φ is independent of this choice.
It is easy to see that if Φ is a quantum channel, then the non-commutative graph S_Φ given in (3.8) is an operator system: just note that S_Φ is unital because ∑_{i=1}^n E_i^* E_i = I, and self-adjoint because (E_i^* E_j)^* = E_j^* E_i. In fact the following stronger proposition holds.
Proposition 3.1.28. [12, Lemma 2] Let S ⊆Md. The following are equivalent:
(i) S is an operator system;
(ii) there exists a quantum channel Φ : Md →Mk for some k ∈ N such that S = SΦ.
With this in mind, the terms operator system and non-commutative graph will be used
interchangeably in the sequel.
In [13, Remark, p.5] it is noted that distinct channels can have the same non-commutative
graph. (We later see how a classical channel embeds in the quantum setting, but for now
recall that different classical channels can have the same confusability graph, since the con-
fusability graph records only which pairs of inputs are confusable, but does not give any
actual probabilities.) To illustrate this, we now supply a simple example of a whole family of
quantum channels with the same non-commutative graph.
Example 3.1.29. For p ∈ (0, 1), let Φ_p : M_2 → M_2 be given by

Φ_p(ρ) = ∑_{i=1}^4 A_i^{(p)} ρ A_i^{(p)*}, ρ ∈ M_2, (3.9)

where

A_1^{(p)} = √(1−p) e_1 e_1^*, A_2^{(p)} = √p e_1 e_2^*, A_3^{(p)} = √p e_2 e_1^*, A_4^{(p)} = √(1−p) e_2 e_2^*.

We note for all p ∈ (0, 1) that ∑_{i=1}^4 A_i^{(p)*} A_i^{(p)} = I, so Φ_p is a quantum channel with Kraus representation (3.9). Furthermore, for all p ∈ (0, 1) we have S_{Φ_p} = span{A_i^{(p)*} A_j^{(p)} : i, j = 1, 2, 3, 4} = M_2. (Note that Φ_p can be thought of as the quantum generalisation of the binary symmetric channel with error probability p, with Kraus operators given by (3.10) on page 92.)
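The claims of the example are easy to verify numerically; a minimal sketch (assuming numpy):

```python
import numpy as np

e = np.eye(2)
def kraus(p):
    E = lambda i, j: np.outer(e[i], e[j])     # e_i e_j^*
    return [np.sqrt(1 - p) * E(0, 0), np.sqrt(p) * E(0, 1),
            np.sqrt(p) * E(1, 0), np.sqrt(1 - p) * E(1, 1)]

for p in (0.1, 0.5, 0.9):
    As = kraus(p)
    # sum_i A_i^* A_i = I, so each Phi_p is trace preserving, hence a channel.
    assert np.allclose(sum(A.conj().T @ A for A in As), np.eye(2))
    # span{A_i^* A_j} = M_2: the 16 products, vectorised, span dimension 4.
    prods = np.array([(A.conj().T @ B).flatten() for A in As for B in As])
    assert np.linalg.matrix_rank(prods) == 4
```

In particular the rank check confirms that every Φ_p with p ∈ (0, 1) has the same non-commutative graph M_2.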
The following characterisation of distinguishability is discussed in [13, Section III]. For
convenience we supply a proof.
Proposition 3.1.30. [13] For a quantum channel Φ : M_d → M_k and orthonormal state vectors u, v ∈ C^d, the following are equivalent:

(i) Φ(uu^*) and Φ(vv^*) are perfectly distinguishable;
(ii) uv^* ∈ S_Φ^⊥;
(iii) ⟨u, Av⟩ = 0 for all A ∈ S_Φ.

Proof. Recall from Remark 3.1.13 (ii) that Φ(uu^*) and Φ(vv^*) are perfectly distinguishable precisely when Tr(Φ(uu^*)Φ(vv^*)) = 0. Let Φ have Kraus operators E_1, . . . , E_n. Now observe that

Tr(Φ(uu^*)Φ(vv^*)) = ∑_{i,j∈[n]} Tr(E_i uu^* E_i^* E_j vv^* E_j^*) = ∑_{i,j∈[n]} Tr(v^* E_j^* E_i uu^* E_i^* E_j v)
= ∑_{i,j∈[n]} |⟨E_i u, E_j v⟩|^2 = ∑_{i,j∈[n]} |⟨uv^*, E_i^* E_j⟩|^2.

It is then clear that this vanishes if and only if ⟨uv^*, E_i^* E_j⟩ = 0 for all i, j ∈ [n], that is, uv^* ∈ S_Φ^⊥, which is trivially equivalent to the condition ⟨u, Av⟩ = 0 for all A ∈ S_Φ.
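The equivalence of (i) and (ii) can be checked numerically; in the sketch below (assuming numpy) Φ is taken to be the completely dephasing channel, an illustrative choice not taken from the text:

```python
import numpy as np

d = 3
e = np.eye(d)
# The completely dephasing channel on M_d, with Kraus operators e_i e_i^*.
Es = [np.outer(e[i], e[i]) for i in range(d)]
Phi = lambda rho: sum(E @ rho @ E.conj().T for E in Es)

u, v = e[0], e[1]
# (i): the outputs are perfectly distinguishable, Tr(Phi(uu^*) Phi(vv^*)) = 0.
lhs = np.trace(Phi(np.outer(u, u.conj())) @ Phi(np.outer(v, v.conj())))
assert np.isclose(lhs, 0)

# (ii): equivalently, u v^* is orthogonal to every E_i^* E_j, i.e. uv^* lies in S_Phi^perp.
ips = [np.vdot(A.conj().T @ B, np.outer(u, v.conj())) for A in Es for B in Es]
assert np.allclose(ips, 0)
```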
Recall for the classical channel N : X → Y that a, b ∈ X are distinct and non-confusable when a ≄ b in G_N, or equivalently when a ∼ b in the complement \overline{G_N}. A comparison with Proposition 3.1.30 shows the sense in which the non-commutative graph of a quantum channel generalises the notion of the confusability graph of a classical channel.
We consider more formally how the classical channel N : X → Y with X = {x_1, . . . , x_n} and Y = {y_1, . . . , y_m} can be embedded in the quantum framework. As in (3.1), we describe the channel by the matrix P = (p(y_i|x_j))_{i∈[m],j∈[n]} with β = Pα, where α ∈ P_n and β ∈ P_m. Let the canonical orthonormal bases of C^n and C^m be {e_1, . . . , e_n} and {f_1, . . . , f_m} respectively. Using notation as in (2.1) on page 35, for v ∈ P_k we form the diagonal matrix φ(v) ∈ M_k. Then [13, Section II] points out that the classical channel N can be seen as the restriction to φ(P_n) of the c.p.t.p. map Φ_N : M_n → M_m with Kraus operators

V_{ij} = √(p(y_i|x_j)) f_i e_j^* for i ∈ [m], j ∈ [n], (3.10)

in the sense that for β = Pα we have φ(β) = Φ_N(φ(α)).
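A small numerical sketch of this embedding (assuming numpy; the particular column-stochastic matrix P below, a binary symmetric channel, is an illustrative choice):

```python
import numpy as np

# Column-stochastic matrix P = (p(y_i | x_j)); the binary symmetric channel
# with flip probability 0.2 is an illustrative choice, not taken from the text.
P = np.array([[0.8, 0.2],
              [0.2, 0.8]])
m, n = P.shape
f, e = np.eye(m), np.eye(n)

# Kraus operators V_ij = sqrt(p(y_i|x_j)) f_i e_j^* of the embedding Phi_N.
Vs = [np.sqrt(P[i, j]) * np.outer(f[i], e[j]) for i in range(m) for j in range(n)]
assert np.allclose(sum(V.conj().T @ V for V in Vs), np.eye(n))   # trace preserving

PhiN = lambda rho: sum(V @ rho @ V.conj().T for V in Vs)

# On diagonal states Phi_N acts exactly as N: phi(P alpha) = Phi_N(phi(alpha)).
alpha = np.array([0.3, 0.7])
assert np.allclose(PhiN(np.diag(alpha)), np.diag(P @ alpha))
```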
Letting G_N be the confusability graph of the classical channel N, Definitions 3.1.1 and 3.1.26 and (3.2) on page 82 then give

S_{Φ_N} = span{V_{ij}^* V_{kl} : i, k ∈ [m], j, l ∈ [n]}
= span{e_i e_j^* : i, j ∈ [n], i ≃ j in G_N}. (3.11)

It is then clear that distinct i, j ∈ X are not confusable, that is, i ∼ j in \overline{G_N}, precisely when

e_i e_j^* ∈ S_{Φ_N}^⊥,

a condition we compare to Proposition 3.1.30.
These observations also suggest the following way to associate an operator system SG
with a graph G.
Definition 3.1.31. [35, Section 7.2] Consider a graph G with V(G) = [n]. We define S_G, the operator system associated to the graph G, by

S_G = span{e_i e_j^* : i, j ∈ [n], i ≃ j in G}.
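Concretely, S_G can be enumerated through its generating matrix units; a minimal sketch (assuming numpy; the path graph used here is an arbitrary illustrative choice):

```python
import numpy as np

# S_G as a span of matrix units: e_i e_j^* is a generator exactly when
# i = j or i ~ j in G.
def sg_basis(adj):
    n = len(adj)
    e = np.eye(n)
    return [np.outer(e[i], e[j]) for i in range(n) for j in range(n)
            if i == j or adj[i][j]]

path3 = np.array([[0, 1, 0],      # the path on 3 vertices, as an example
                  [1, 0, 1],
                  [0, 1, 0]])
basis = sg_basis(path3)

# dim S_G = n + 2|E(G)|: one diagonal unit per vertex, two units per edge.
assert len(basis) == 3 + 2 * 2

# S_G is an operator system: it contains I (the diagonal units sum to I) ...
assert np.allclose(sum(b for b in basis if np.trace(b) == 1), np.eye(3))
# ... and is self-adjoint: the adjoint of each generator is again a generator.
assert all(any(np.array_equal(b.conj().T, c) for c in basis) for b in basis)
```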
The next corollary is immediate from (3.11).
Corollary 3.1.32. If a classical channel N : X → Y with confusability graph G is the restriction to φ(P_{|X|}) of the c.p.t.p. map Φ_N as described above, then S_{Φ_N} = S_G, and distinct i, j ∈ X are confusable precisely when e_i e_j^* ∈ S_G.
Definition 3.1.23 defined α(Φ), the one-shot zero-error capacity of quantum channel Φ.
We now define the related notion of α(S) for an operator system S ⊆ Md, as introduced in
[13].
Definition 3.1.33. ([35, Definition 7.12], [13, equation 3].) Given an operator system S ⊆ M_d, an orthonormal set {v_1, . . . , v_k} ⊆ C^d is S-independent if v_i v_j^* ∈ S^⊥ for all i ≠ j. The independence number of S is given by the maximum cardinality of an S-independent set and is denoted by α(S).
If Φ is a quantum channel, Proposition 3.1.30 shows that the orthonormal set {v_1, . . . , v_k} is S_Φ-independent precisely when Φ(v_1 v_1^*), . . . , Φ(v_k v_k^*) are perfectly distinguishable. The following result is immediate from Definition 3.1.23, and should be compared to the classical result in Proposition 3.1.3.
Corollary 3.1.34. ([13, p.6], [35, Theorem 7.14]) If Φ is a quantum channel, then
α(SΦ) = α(Φ).
Moving from the one-shot case to multiple uses of a channel Φ requires us to consider tensor products of operator systems. For basic properties of the tensor product, the reader is referred to Appendix B. The following lemma shows that the tensor product of operator systems is the natural quantum generalisation of the classical notion of the strong product G ⊠ H of the graphs G and H. It is instructive to recall the proof.
Lemma 3.1.35. [35, Theorem 8.16] If G and H are graphs, then

S_{G⊠H} = S_G ⊗ S_H.

Proof. Let V(G) = [n] and V(H) = [m], and let the canonical orthonormal bases of C^n and C^m be {e_1, . . . , e_n} and {f_1, . . . , f_m} respectively. Then

S_G ⊗ S_H = span{e_i e_j^* : i ≃ j in G} ⊗ span{f_k f_l^* : k ≃ l in H}
= span{e_i e_j^* ⊗ f_k f_l^* : i ≃ j in G, k ≃ l in H}
= span{(e_i ⊗ f_k)(e_j ⊗ f_l)^* : (i, k) ≃ (j, l) in G ⊠ H} = S_{G⊠H},

as stated.
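The combinatorial identity driving this proof, namely that the 'adjacent or equal' relation of G ⊠ H is the Kronecker product of the relations of G and H, can be checked directly (a sketch assuming numpy; the two small graphs are illustrative choices):

```python
import numpy as np
from itertools import product

# "Adjacent or equal" relation of a graph as a 0/1 matrix with ones on the diagonal.
def rel(adj):
    return (np.array(adj) + np.eye(len(adj)) > 0).astype(int)

G = rel([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # path on 3 vertices
H = rel([[0, 1], [1, 0]])                    # a single edge

# In the strong product, (i, k) ≃ (j, l) iff i ≃ j and k ≃ l, i.e. the
# relation of G ⊠ H is the Kronecker product of the two relations ...
strong = np.kron(G, H)

# ... so the generators e_i e_j^* ⊗ f_k f_l^* of S_G ⊗ S_H are exactly the
# matrix units (e_i ⊗ f_k)(e_j ⊗ f_l)^* generating S_{G ⊠ H}.
n, m = len(G), len(H)
for (i, j), (k, l) in product(product(range(n), repeat=2), product(range(m), repeat=2)):
    assert G[i, j] * H[k, l] == strong[i * m + k, j * m + l]
```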
We now show more formally how tensor products are to be used to define the Shannon capacity of a quantum channel. If Φ : M_c → M_k and Ψ : M_d → M_l are quantum channels with sets of Kraus operators {B_1, . . . , B_p} and {A_1, . . . , A_q} respectively, then we define the tensor product map Φ ⊗ Ψ : (M_c ⊗ M_d) → (M_k ⊗ M_l) as the unique linear map such that (Φ ⊗ Ψ)(ρ ⊗ σ) = Φ(ρ) ⊗ Ψ(σ) for ρ ∈ M_c, σ ∈ M_d.

Lemma 3.1.36. If Φ, Ψ and Φ ⊗ Ψ are as given above, then Φ ⊗ Ψ is a quantum channel with Kraus operators B_i ⊗ A_j for i ∈ [p], j ∈ [q] ([35, Section 3.6]), and it follows immediately that S_{Φ⊗Ψ} = S_Φ ⊗ S_Ψ.
We write S^{⊗k} for the tensor product of k copies of S, and similarly we denote the channel given by the tensor product of k copies of Φ by Φ^{⊗k}. Suppose the quantum channel Φ has operator system S. Then k successive uses of Φ correspond to a single use of the quantum channel Φ^{⊗k}, which has operator system S^{⊗k} and one-shot zero-error capacity α(S^{⊗k}).
Lemma 3.1.37. The independence number is super-multiplicative over tensor products (but not in general multiplicative); that is, for operator systems S, T we have α(S ⊗ T) ≥ α(S)α(T).

Proof. Suppose that {u_i : i = 1, . . . , m} is S-independent with α(S) = m and that {v_i : i = 1, . . . , n} is T-independent with α(T) = n. That is, ⟨u_i u_j^*, S⟩ = 0 for all S ∈ S when i ≠ j, and ⟨v_i v_j^*, T⟩ = 0 for all T ∈ T when i ≠ j. It follows that

⟨(u_i ⊗ v_k)(u_j ⊗ v_l)^*, S ⊗ T⟩ = ⟨u_i u_j^*, S⟩⟨v_k v_l^*, T⟩ = 0

for all S ∈ S and T ∈ T when (i, k) ≠ (j, l). We note also that

⟨u_i ⊗ v_k, u_j ⊗ v_l⟩ = ⟨u_i, u_j⟩⟨v_k, v_l⟩ = δ_{ij} δ_{kl}. (3.12)

Thus {u_i ⊗ v_j : (i, j) ∈ [m] × [n]} is S ⊗ T-independent, giving α(S ⊗ T) ≥ mn.

To see that the independence number is not multiplicative in general, merely note from Proposition 4.1.11 that α(S_G) = α(G) for a graph G. Then recall from Lemma 3.1.35 that S_G ⊗ S_H = S_{G⊠H}, and use the fact that α(G ⊠ H) can exceed α(G)α(H); for example, α(C_5) = 2, but, letting V(C_5) = [5], it is easy to see that {(1, 1), (2, 3), (3, 5), (4, 2), (5, 4)} is an independent set in C_5 ⊠ C_5 and that α(C_5 ⊠ C_5) = 5.
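The C_5 computation at the end of the proof is easily verified by brute force (a sketch assuming numpy; vertices are written 0-indexed here):

```python
import numpy as np
from itertools import combinations

# C_5 with the "adjacent or equal" relation: i ≃ j iff i = j or |i - j| = 1 (mod 5).
n = 5
adj = np.array([[1 if (i == j or (i - j) % n in (1, n - 1)) else 0
                 for j in range(n)] for i in range(n)])

# In the strong product, (a, b) ≃ (c, d) iff a ≃ c and b ≃ d; the set from
# the text (0-indexed) is pairwise non-adjacent, hence independent.
S = [(0, 0), (1, 2), (2, 4), (3, 1), (4, 3)]
for (a, b), (c, d) in combinations(S, 2):
    assert not (adj[a, c] and adj[b, d])

# By contrast alpha(C_5) = 2: any three vertices of C_5 contain an adjacent pair.
for T in combinations(range(n), 3):
    assert any(adj[a, b] for a, b in combinations(T, 2))
```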
The following definition is the quantum analogue of Corollary 3.1.7. The limit can be shown to exist by the argument of Corollary 3.1.7.

Definition 3.1.38. [13, Section II] The Shannon capacity c(S) of an operator system S is given by

c(S) = lim_{n→∞} α(S^{⊗n})^{1/n}.
By Lemma 3.1.37, c(S) ≥ α(S). Upper bounds on c(S) are not trivial and will be discussed later. Recall that the packing number of a classical channel N is equal to χ(G_N), the chromatic number of its confusability graph. [35, Section 7.3] examines the corresponding 'zero-error side information' problem in the quantum setting for a c.p.t.p. map Φ : M_d → M_k. We seek an orthonormal basis {v_1, . . . , v_d} ⊆ C^d, an n ∈ N and a function f : [d] → [n] such that the outputs

(Φ ⊗ I)((v_i ⊗ e_{f(i)})(v_i ⊗ e_{f(i)})^*) = Φ(v_i v_i^*) ⊗ e_{f(i)} e_{f(i)}^*

are perfectly distinguishable for i = 1, . . . , d, where I : M_n → M_n is the identity channel and {e_1, . . . , e_n} is the canonical orthonormal basis of C^n. The least n ∈ N such that this is possible is denoted χ(Φ) and called the packing number of Φ.
The following definition of the chromatic number of an operator system is given in [35]
and generalises the notion of the chromatic number of a graph.
Definition 3.1.39. [35, Definition 7.26] If S ⊆ Md is an operator system, we define χ(S),
the chromatic number of S, to be the minimum n ∈ N such that there exists an orthonormal
basis of Cd which may be partitioned into n S-independent sets.
The next result is proved in [35] and is the quantum version of Proposition 3.1.8.
Proposition 3.1.40. [35, Theorem 7.27] If Φ is a quantum channel with associated non-
commutative graph SΦ, then χ(SΦ) = χ(Φ).
Remark 3.1.41. We noted that a classical channel N : X → Y can be regarded as the restriction to diagonal states of the c.p.t.p. map Φ_N with Kraus operators given in (3.10) on page 92. Letting G_N be the confusability graph of N, by Proposition 3.1.8, (4.11) on page 130, Corollary 3.1.32 and Proposition 3.1.40 we obtain

χ(N) = χ(G_N) = χ(S_{G_N}) = χ(S_{Φ_N}) = χ(Φ_N),

and we see that the classical side information problem is naturally embedded in the quantum setting.
Remark 3.1.42. For a quantum channel Φ, Proposition 4.5.7 will show the existence of the limit lim_{n→∞} (log χ(S_Φ^{⊗n}))/n. By analogy with Definition 3.1.9, this will then lead us to the definition of the Witsenhausen rate of an operator system in Definition 4.5.8.
3.2 Convex corners from non-commutative graphs
3.2.1 The abelian, clique and full projection convex corners
Section 1.4 associated a number of R^d-convex corners with a given graph on d vertices. With a given non-commutative graph S ⊆ M_d, we now associate the M_d-convex corners ap(S), cp(S) and fp(S), known respectively as the abelian, clique and full projection convex corners of S.
Recalling the definition of an S-independent set in Definition 3.1.33, we now define the
analogous concepts of S-clique and S-full sets. (For comparison, we restate the definition of
S-independence.)
Definition 3.2.1. Given a non-commutative graph S ⊆ M_d, an orthonormal set {v_1, . . . , v_k} ⊆ C^d is called

(i) S-independent if v_i v_j^* ∈ S^⊥ for all i ≠ j;
(ii) an S-clique if v_i v_j^* ∈ S for all i ≠ j; and
(iii) S-full if v_i v_j^* ∈ S for all i, j ∈ [k].
Remark 3.2.2. (i) It is trivial to see that if a set is S-full then it is an S-clique.
(ii) If G is a graph and {i_1, . . . , i_n} ⊆ V(G) is an independent set in G, then {e_{i_1}, . . . , e_{i_n}} is S_G-independent, and if {i_1, . . . , i_n} ⊆ V(G) is a clique in G, then {e_{i_1}, . . . , e_{i_n}} is both an S_G-clique and an S_G-full set.
(iii) Note that the singleton set {u} is both S-independent and an S-clique for any unit vector u ∈ C^d and non-commutative graph S ⊆ M_d, just as {x} is both an independent set and a clique of a graph G for any vertex x ∈ V(G). However, for d > 1 and a unit vector u ∈ C^d, the set {u} need not be S-full for every non-commutative graph S ⊆ M_d. As a simple example, consider the operator system CI_d = span{I_d}. Let d > 1 and u ∈ C^d be a unit vector. Since rank(uu^*) = 1 but rank(I_d) = d, it is clear that uu^* ∉ CI_d, and we can conclude that there are no singleton CI_d-full sets for d > 1; indeed, there are no non-empty CI_d-full sets for d > 1.
Definition 3.2.3. For a given non-commutative graph S ⊆ M_d, a projection P ∈ M_d will be called

(i) S-abelian if there exists an S-independent set {v_i}_{i=1}^k such that P = ∑_{i=1}^k v_i v_i^*;
(ii) S-clique if there exists an S-clique {v_i}_{i=1}^k such that P = ∑_{i=1}^k v_i v_i^*; and
(iii) S-full if there exists an S-full set {v_i}_{i=1}^k such that P = ∑_{i=1}^k v_i v_i^*.
Remark 3.2.4. (i) The choice of nomenclature in Definition 3.2.3 (i) is justified by the result that a projection P is S-abelian precisely when PSP is contained in an abelian C^*-subalgebra of M_d. This was proved by Vern I. Paulsen in an unpublished note, but details can be found in [6].
(ii) The condition that a projection P ∈ M_d is S-full for S ⊆ M_d is equivalent to the condition, as stated in [6], that L(ran(P)) ⊕ 0_{P^⊥} ⊆ S, where 0_{P^⊥} denotes the zero operator on ran(P)^⊥. The equivalence is easy to see. Take P = ∑_{i=1}^k v_i v_i^*. First assume that {v_1, . . . , v_k} is S-full. Then for any T ∈ L(ran(P)) ⊕ 0_{P^⊥}, we have T = TP = ∑_{i=1}^k (Tv_i)v_i^* with Tv_i ∈ span{v_j : j ∈ [k]}, and so T ∈ S, using that v_i v_j^* ∈ S for all i, j ∈ [k]. Conversely, if L(ran(P)) ⊕ 0_{P^⊥} ⊆ S, noting that v_i v_j^* ∈ L(ran(P)) ⊕ 0_{P^⊥} for all i, j ∈ [k], we have that v_i v_j^* ∈ S for all i, j ∈ [k], and {v_1, . . . , v_k} is S-full.
The following definition associates a number of convex corners with a non-commutative graph S. We recall from Lemma 2.2.26 that, for a bounded subset G ⊆ M_d^+, the set her(\overline{conv}(G)) is an M_d-convex corner, which we denote by C(G).

Definition 3.2.5. For a non-commutative graph S ⊆ M_d we define the following M_d-convex corners, known as the abelian, clique and full projection convex corners of S respectively:

ap(S) = C({P : P is an S-abelian projection}),
cp(S) = C({P : P is an S-clique projection}),
fp(S) = C({P : P is an S-full projection}).
We give some simple properties of these convex corners.
Lemma 3.2.6. If S ⊆Md is a non-commutative graph, then fp(S) ⊆ cp(S).
Proof. Simply note from the definitions that every S-full projection is an S-clique projection.
Lemma 3.2.7. For any operator system S ⊆ M_d, it holds that

A_{I_d} ⊆ ap(S) ⊆ B_{I_d}, A_{I_d} ⊆ cp(S) ⊆ B_{I_d} and fp(S) ⊆ B_{I_d},

where A_{I_d} and B_{I_d} are the M_d-unit corner and M_d-unit cube as defined in Section 2.4.3.
Proof. To show that A_{I_d} is a subset of ap(S) and cp(S), first note that every element T ∈ A_{I_d} is of the form ∑_{i=1}^d µ_i v_i v_i^*, with µ_i ≥ 0 for all i ∈ [d] and ∑_{i=1}^d µ_i ≤ 1, and where {v_1, . . . , v_d} is an orthonormal basis of C^d. We then use Remark 3.2.2 (iii), which shows that any rank one projection is both an S-abelian and an S-clique projection, and the result follows by Definition 3.2.5. Since P ≤ I_d for any projection P ∈ M_d^+, and these convex corners are generated by sets of projections, it is clear that they are contained in B_{I_d}.
Remark 3.2.8. We note by Remark 3.2.2 (iii) that it is possible that no non-zero projection is S-full; S = span{I_d} is an example of this. We set fp(S) = {0} in this case. However, if G is a graph then e_i e_i^* is an S_G-full projection for all i ∈ V(G), and (1/|V(G)|) I ∈ fp(S_G), giving that fp(S_G) is a standard convex corner.
Lemma 3.2.9. If operator systems S, T ⊆Md satisfy S ⊆ T then
ap(S) ⊇ ap(T ), cp(S) ⊆ cp(T ) and fp(S) ⊆ fp(T ).
Proof. For S ⊆ T , we have S⊥ ⊇ T ⊥, and the results follow immediately from Definitions
3.2.1 and 3.2.3.
We can now prove a first 'quantum sandwich' theorem concerning the convex corners we have introduced. This will be seen to have important consequences later.

Theorem 3.2.10. Let S be a non-commutative graph. Then

ap(S) ⊆ cp(S)^♯ ⊆ fp(S)^♯.
Proof. The second inclusion follows from Lemmas 3.2.6 and 2.2.10. The first inclusion clearly holds if every S-abelian projection lies in the M_d-convex corner cp(S)^♯; we now show this to be the case. Let {ξ_i}_{i=1}^k (resp. {η_p}_{p=1}^m) be an S-independent set (resp. an S-clique) and P (resp. Q) the projection onto its span. By Lemma 2.2.32, it suffices to show that Tr(PQ) ≤ 1. Since ξ_i ξ_j^* ∈ S^⊥ for i ≠ j, and η_p η_q^* ∈ S for p ≠ q, we have that

0 = Tr(ξ_i ξ_j^* η_q η_p^*) = ⟨η_q, ξ_j⟩⟨ξ_i, η_p⟩ whenever i ≠ j and p ≠ q.

Thus,

i ≠ j, p ≠ q =⇒ ⟨ξ_i, η_p⟩ = 0 or ⟨ξ_j, η_q⟩ = 0. (3.13)

For each i ∈ [k], let

β(i) = {p ∈ [m] : ⟨ξ_i, η_p⟩ = 0},

and let

α = {i ∈ [k] : β(i) ≠ [m]}.

We write β(i)^c = [m] \ β(i). We note that

Tr(PQ) = ∑_{i=1}^k ∑_{p=1}^m Tr(ξ_i ξ_i^* η_p η_p^*) = ∑_{i=1}^k ∑_{p=1}^m |⟨ξ_i, η_p⟩|^2 = ∑_{i=1}^k ∑_{p∈β(i)^c} |⟨ξ_i, η_p⟩|^2,

and distinguish three cases:

Case 1. α = ∅. Then β(i) = [m] for every i ∈ [k] and it follows that Tr(PQ) = 0 ≤ 1.

Case 2. |α| = 1. Say α = {i_0}. Here β(i) = [m] for all i ≠ i_0. Then

Tr(PQ) = ∑_{i=1}^k ∑_{p∈β(i)^c} |⟨ξ_i, η_p⟩|^2 = ∑_{p∈β(i_0)^c} |⟨ξ_{i_0}, η_p⟩|^2 ≤ 1, (3.14)

because the family {η_p}_{p=1}^m is orthonormal and ‖ξ_{i_0}‖ = 1.

Case 3. |α| > 1. Consider i, j ∈ [k] and p, q ∈ [m] such that i ≠ j and p ≠ q. Then by (3.13), it holds either that p ∈ β(i) or q ∈ β(j). Thus for i ≠ j, we can conclude

{(p, q) ∈ [m] × [m] : p ≠ q} ⊆ (β(i) × [m]) ∪ ([m] × β(j)). (3.15)

Taking complements of both sides of (3.15) yields

β(i)^c × β(j)^c ⊆ {(p, p) : p ∈ [m]} whenever i ≠ j. (3.16)

Suppose that |β(i)^c × β(j)^c| > 1 for some i, j with i ≠ j. Then by (3.16) there exist distinct p, p' ∈ [m] such that p, p' ∈ β(i)^c and p, p' ∈ β(j)^c; but then (p, p') ∈ β(i)^c × β(j)^c, contradicting (3.16). Thus, |β(i)^c × β(j)^c| ≤ 1 for all pairs (i, j) with i ≠ j. Since |α| > 1, it follows that |β(i)^c| = 1 for every i ∈ α. For each i ∈ α we write β(i)^c = {p_i}. Then (p_i, p_j) ∈ β(i)^c × β(j)^c for all i, j ∈ α with i ≠ j. In view of (3.16), it holds that p_i = p_j for all i, j ∈ α. Let p_0 be the common value of p_i for i ∈ α. Then

Tr(PQ) = ∑_{i=1}^k ∑_{p∈β(i)^c} |⟨ξ_i, η_p⟩|^2 = ∑_{i∈α} |⟨ξ_i, η_{p_0}⟩|^2 ≤ 1,

because the family {ξ_i}_{i=1}^k is orthonormal and ‖η_{p_0}‖ = 1. This completes the proof.
3.2.2 Embedding the classical in the quantum setting
Here the aim is to show that the M_d-convex corners just introduced can be seen as quantum generalisations of the R^d-convex corners used in the classical setting. Recall that associated to a graph G with d vertices are the R^d-convex corners VP(G) and FVP(G) = VP(\overline{G})^♭. We will find it convenient to consider the associated diagonal convex corners vp(G) = φ(VP(G)) and fvp(G) = φ(FVP(G)), where φ is the canonical mapping R_+^d → D_d^+ as in (2.1). Also associated to the graph G is the operator system S_G ⊆ M_d as in Definition 3.1.31. In the results below, we use notation introduced in Section 2.4.2. Recall that a matrix M ∈ M_d is doubly stochastic if each row and column is a probability distribution.
Theorem 3.2.11. Let G be a graph with vertex set X = [d]. Then

(i) ∆(ap(S_G)) = D_d ∩ ap(S_G) = vp(G);
(ii) ∆(cp(S_G)) = D_d ∩ cp(S_G) = vp(\overline{G});
(iii) ∆(fp(S_G)) = D_d ∩ fp(S_G) = vp(\overline{G}).
Proof. (i) Let S be an independent set in G. Then e_x e_y^* ∈ S_G^⊥ for distinct x, y ∈ S, and hence {e_x : x ∈ S} is an S_G-independent set and ∑_{x∈S} e_x e_x^* ∈ D_d is an S_G-abelian projection. Since D_d ∩ ap(S_G) is a convex set, this implies vp(G) ⊆ D_d ∩ ap(S_G) ⊆ ∆(ap(S_G)).
The proof of (i) is completed by showing that ∆(ap(S_G)) ⊆ vp(G). Let {v_1, . . . , v_m} be an S_G-independent set and let P = ∑_{i=1}^m v_i v_i^*. Write v_i = ∑_{x∈X} λ_x^{(i)} e_x and a_x^{(i)} = |λ_x^{(i)}|^2 for i ∈ [m] and x ∈ X. For i ≠ j we have v_i v_j^* = ∑_{x,y} λ_x^{(i)} \overline{λ_y^{(j)}} e_x e_y^* ∈ S_G^⊥. Thus for i ≠ j,

a_x^{(i)} a_y^{(j)} ≠ 0 =⇒ x ≄ y in G. (3.17)

Now

P = ∑_{i=1}^m ∑_{x,y∈X} λ_x^{(i)} \overline{λ_y^{(i)}} e_x e_y^*,

and so

∆(P) = ∑_{x∈X} ( ∑_{i=1}^m a_x^{(i)} ) e_x e_x^*. (3.18)

It is clear that m ≤ d since S_G-independent sets are orthonormal, and first we consider the case m = d. Here P = I = ∆(P), and so (3.18) implies that

∑_{i=1}^m a_x^{(i)} = 1 for all x ∈ X. (3.19)

Setting m_{i,x} = a_x^{(i)} ≥ 0 for i, x ∈ [d], we have that

∑_{x∈X} m_{i,x} = ∑_{x∈X} a_x^{(i)} = ‖v_i‖^2 = 1, (3.20)

and that

∑_{i=1}^d m_{i,x} = ∑_{i=1}^d a_x^{(i)} = 1

by (3.19). Thus the matrix M = (m_{i,x}) ∈ M_d is doubly stochastic.
Now we consider the case m < d. Here P ≤ I, and so ∆(P) ≤ I. Then for each x ∈ X, (3.18) gives that ∑_{i=1}^m a_x^{(i)} ≤ 1. For each x ∈ X, let r_x = 1 − ∑_{i=1}^m a_x^{(i)} and observe that r_x ≥ 0. For each x ∈ X set

m_{i,x} = a_x^{(i)} if 1 ≤ i ≤ m, and m_{i,x} = r_x/(d − m) if m + 1 ≤ i ≤ d,

and let M = (m_{i,x}) ∈ M_d. Note first that if 1 ≤ i ≤ m then, as in (3.20),

∑_{x∈X} m_{i,x} = ∑_{x∈X} a_x^{(i)} = 1.

On the other hand, d − ∑_{x∈X} r_x = ∑_{i=1}^m ∑_{x∈X} a_x^{(i)} = m and hence, if m + 1 ≤ i ≤ d, we have

∑_{x∈X} m_{i,x} = ∑_{x∈X} r_x/(d − m) = 1.

Finally, if x ∈ X then

∑_{i=1}^d m_{i,x} = ∑_{i=1}^m a_x^{(i)} + r_x = 1,

and the matrix M = (m_{i,x}) ∈ M_d is doubly stochastic.
The Birkhoff–von Neumann theorem states that a doubly stochastic matrix can be expressed as a convex combination of permutation matrices [41]. Thus, for a given matrix M as formed above in either the case m = d or the case m < d, there exist γ_1, . . . , γ_l > 0 and permutation matrices P^{(k)} = (p_{i,x}^{(k)})_{i,x∈[d]} ∈ M_d for k = 1, . . . , l, such that ∑_{k=1}^l γ_k = 1 and

M = ∑_{k=1}^l γ_k P^{(k)}. (3.21)

Set

Q_k = ∑_{x∈X} ∑_{i=1}^m p_{i,x}^{(k)} e_x e_x^*. (3.22)
For x ∈ X and 1 ≤ i ≤ m, by (3.21) we have

a_x^{(i)} = m_{i,x} = ∑_{k=1}^l γ_k p_{i,x}^{(k)}, (3.23)

and so (3.18) implies

∆(P) = ∑_{k=1}^l γ_k Q_k. (3.24)

Observe from (3.22) that Q_k ∈ D_d and has diagonal entries in {0, 1}. Suppose that ⟨e_x, Q_k e_x⟩ = ⟨e_y, Q_k e_y⟩ = 1 for distinct x, y ∈ X. Then there exist i, j ∈ [m] such that p_{i,x}^{(k)} = p_{j,y}^{(k)} = 1. Since P^{(k)} is a permutation matrix, i ≠ j. By (3.23), a_x^{(i)} ≠ 0 and a_y^{(j)} ≠ 0, and then by (3.17), x and y are non-adjacent vertices in G. Thus, each Q_k is a projection in vp(G), and (3.24) implies that ∆(P) ∈ vp(G). It follows by convexity that ∆(T) ∈ vp(G) whenever T ∈ conv{P : P is an S_G-abelian projection}. Then since vp(G) is closed, ∆(T) ∈ vp(G) whenever T lies in the closure of conv{P : P is an S_G-abelian projection}. Finally, note that if T ∈ ap(S_G), then T ≤ S for some S in the closed convex hull of {P : P is an S_G-abelian projection}. This gives ∆(T) ≤ ∆(S) ∈ vp(G), and so ∆(T) ∈ vp(G) by hereditarity, completing the proof of (i).
(ii),(iii) Let K be an independent set in \overline{G}, that is, a clique in G. Then e_x e_y^* ∈ S_G for all x, y ∈ K, and so ∑_{x∈K} e_x e_x^* ∈ D_d is an S_G-full projection. Together with Lemma 3.2.6, this implies

vp(\overline{G}) ⊆ D_d ∩ fp(S_G) ⊆ D_d ∩ cp(S_G) ⊆ ∆(cp(S_G))

and

vp(\overline{G}) ⊆ D_d ∩ fp(S_G) ⊆ ∆(fp(S_G)) ⊆ ∆(cp(S_G)).

To complete the proof it will suffice to show that ∆(cp(S_G)) ⊆ vp(\overline{G}). Let {v_1, . . . , v_m} be an S_G-clique and P = ∑_{i=1}^m v_i v_i^* the corresponding S_G-clique projection. We have that v_i v_j^* ∈ S_G for i ≠ j. Writing v_i = ∑_{x∈X} λ_x^{(i)} e_x, as in the proof of (i), it holds analogously to (3.17) that if i ≠ j and λ_x^{(i)} \overline{λ_y^{(j)}} ≠ 0, then x ≃ y in G. Following the method of proof of (i), applied to \overline{G}, we now obtain that ∆(P) ∈ vp(\overline{G}), and consequently that ∆(cp(S_G)) ⊆ vp(\overline{G}), and the proof is complete.
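The Birkhoff–von Neumann decomposition invoked in the proof can be illustrated by a simple greedy algorithm: repeatedly locate a permutation inside the support of the matrix and subtract the largest feasible multiple of it. The sketch below (assuming numpy; it uses a brute-force permutation search, so it is only suitable for small d, and the matrix M is an illustrative choice, not the thesis's construction):

```python
import numpy as np
from itertools import permutations

# Greedy Birkhoff-von Neumann decomposition of a small doubly stochastic
# matrix: find a permutation lying inside the support, subtract the largest
# multiple of it that keeps all entries nonnegative, and repeat.
def birkhoff(M, tol=1e-12):
    M = M.astype(float).copy()
    d = len(M)
    terms = []
    while M.max() > tol:
        # Brute-force a permutation s with M[i, s[i]] > 0 for all i; one
        # always exists for a doubly stochastic matrix, by Birkhoff's theorem.
        s = next(s for s in permutations(range(d))
                 if all(M[i, s[i]] > tol for i in range(d)))
        gamma = min(M[i, s[i]] for i in range(d))
        P = np.zeros((d, d))
        P[np.arange(d), list(s)] = 1
        terms.append((gamma, P))
        M -= gamma * P          # each step zeroes at least one entry
    return terms

M = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
terms = birkhoff(M)
assert np.isclose(sum(g for g, _ in terms), 1)       # weights sum to 1
assert np.allclose(sum(g * P for g, P in terms), M)  # convex combination
```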
Corollary 3.2.12. Let G be a graph with vertex set X = [d]. Then

(i) ∆(ap(S_G)^♯) = D_d ∩ ap(S_G)^♯ = vp(G)^♭;
(ii) ∆(cp(S_G)^♯) = D_d ∩ cp(S_G)^♯ = fvp(G);
(iii) ∆(fp(S_G)^♯) = D_d ∩ fp(S_G)^♯ = fvp(G).

Proof. This is immediate from Corollary 2.4.14 and Theorem 3.2.11, where we recall that fvp(G) = vp(\overline{G})^♭.
Corollary 3.2.13. Let the graph G have associated operator system S_G. Then

(i) her(vp(G)) ⊆ ap(S_G) ⊆ (vp(G)^♭)^♯;
(ii) her(vp(\overline{G})) ⊆ fp(S_G) ⊆ cp(S_G) ⊆ fvp(G)^♯;
(iii) her(vp(G)^♭) ⊆ ap(S_G)^♯ ⊆ vp(G)^♯;
(iv) her(fvp(G)) ⊆ cp(S_G)^♯ ⊆ fp(S_G)^♯ ⊆ (fvp(G)^♭)^♯.

Proof. We apply Theorem 2.4.19 to Theorem 3.2.11 and Corollary 3.2.12, together with Lemma 3.2.6 and Theorem 1.2.15.
Remark 3.2.14. We will see shortly that the first inclusion in Corollary 3.2.13 (ii) and the
third inclusion in Corollary 3.2.13 (iv) are in fact equalities.
The previous results, in particular Theorem 3.2.11 and Corollary 3.2.12, suggest that ap(S) should be regarded as a non-commutative generalisation of VP(G), and that cp(S)^♯ and fp(S)^♯ generalise FVP(G). We note for a graph G that it is possible to define the R^d-convex corner CP(G) = VP(\overline{G}), the convex hull of the characteristic vectors of cliques in G, and we can thus regard cp(S) and fp(S) as non-commutative generalisations of CP(G). Given the increased complexity of the quantum setting, it should not surprise us that a classical object may have more than one quantum generalisation. (We will see this phenomenon again in the case of the Lovász number, θ(G).) It is inviting to compare cp(S) and fp(S). Lemma 3.2.6 showed in the general case that fp(S) ⊆ cp(S); we now narrow the focus to operator systems associated with graphs. Theorem 3.2.11 shows that the convex corners ap(S_{\overline{G}}), cp(S_G) and fp(S_G) have the same diagonal expectations, and the same diagonal elements. However, these convex corners are not in general equal, as we now see.
Proposition 3.2.15. Let G be a graph on d vertices. Then ap(S_{\overline{G}}) = cp(S_G) = fp(S_G) if and only if G = K_d.

Proof. Let V(G) = [d]. If G = K_d, it is clear that S_G = M_d and S_{\overline{G}} = D_d, and {e_1, . . . , e_d} is an S_{\overline{G}}-independent set, an S_G-clique and an S_G-full set. Thus I_d = ∑_{i=1}^d e_i e_i^* is an S_{\overline{G}}-abelian projection, an S_G-clique projection and an S_G-full projection, and

ap(S_{\overline{K_d}}) = cp(S_{K_d}) = fp(S_{K_d}) = B_{I_d}. (3.25)

If G ≠ K_d, then there exist i, j ∈ [d] such that i ≄ j in G, and we set the unit vector v = (1/√2)(e_i + e_j), giving

vv^* = (1/2)(e_i e_i^* + e_j e_j^* + e_i e_j^* + e_j e_i^*). (3.26)

Now consider an S_G-full set {v_1, . . . , v_k} with associated S_G-full projection P = ∑_{l=1}^k v_l v_l^*. For l = 1, . . . , k we set v_l = ∑_{r=1}^d α_r^{(l)} e_r with coefficients α_r^{(l)} ∈ C. Then

v_l v_m^* = ∑_{r,s∈[d]} α_r^{(l)} \overline{α_s^{(m)}} e_r e_s^* ∈ S_G for all l, m ∈ [k],

but by assumption e_i e_j^* ∉ S_G. Thus for all l, m ∈ [k] we have α_i^{(l)} \overline{α_j^{(m)}} = 0, and either α_i^{(l)} = 0 for all l ∈ [k], or α_j^{(m)} = 0 for all m ∈ [k]. Using that P = ∑_{l=1}^k v_l v_l^* where v_l v_l^* = ∑_{r,s∈[d]} α_r^{(l)} \overline{α_s^{(l)}} e_r e_s^*, it then holds that ⟨P, e_i e_j^*⟩ = ∑_{l∈[k]} α_i^{(l)} \overline{α_j^{(l)}} = 0, and that either ⟨P, e_i e_i^*⟩ = ∑_{l∈[k]} |α_i^{(l)}|^2 = 0 or ⟨P, e_j e_j^*⟩ = ∑_{l∈[k]} |α_j^{(l)}|^2 = 0. Since P ≤ I we have ⟨e_i, Pe_i⟩ ≤ 1 and ⟨e_j, Pe_j⟩ ≤ 1. Letting P_f(S_G) denote the set of all S_G-full projections, it is then clear for all A ∈ conv(P_f(S_G)) that

⟨e_i, Ae_i⟩ + ⟨e_j, Ae_j⟩ ≤ 1 and ⟨e_i, Ae_j⟩ = ⟨e_j, Ae_i⟩ = 0.

Given the form of vv^* in (3.26) and the fact that

(1/2) [ 1 1 ; 1 1 ] ≰ [ a 0 ; 0 1−a ] when a ∈ [0, 1],

it is easy to see that vv^* ≰ A for all A in the closed convex hull of P_f(S_G), and we conclude vv^* ∉ fp(S_G). However, by Remark 3.2.2 (iii), vv^* ∈ ap(S_{\overline{G}}) and vv^* ∈ cp(S_G), and the proof is complete.
Remark 3.2.16. We note that if G is not complete, it could still hold that ap(S_{\overline{G}}) = cp(S_G); determining when this equality holds is an open problem.
It was shown in Lemma 2.4.17, for a standard diagonal convex corner A, that the strict inclusion her(A) ⊊ her(A^♭)^♯ = (A^♭)^♯ holds. The following two lemmas show that when the graph G is neither empty nor complete, Corollary 3.2.13 (i) becomes

her(vp(G)) ⊊ ap(S_G) ⊊ (vp(G)^♭)^♯.
Lemma 3.2.17. It holds that ap(S_G) = her(vp(G)) if and only if G is empty.

Proof. Consider the empty graph \overline{K_d}. By (3.25), ap(S_{\overline{K_d}}) = B_{I_d}, and as I_d ∈ vp(\overline{K_d}) we have vp(\overline{K_d}) = {M ∈ M_d^+ ∩ D_d : M ≤ I_d}, giving that

her(vp(\overline{K_d})) = B_{I_d}. (3.27)

Conversely, suppose that G is non-empty with i ∼ j in G. Let v be the unit vector (1/√2)(e_i + e_j). All rank one projections are S_G-abelian, so certainly vv^* ∈ ap(S_G). Suppose that vv^* ≤ Q ∈ vp(G). Let Q = (q_{ij}) = ∑_k µ_k P_k, where µ_k > 0 and ∑_k µ_k = 1, and where for each independent set S_k ⊆ V(G) we set P_k = ∑_{i∈S_k} e_i e_i^*. Since Q ≥ vv^*, we have q_{ii} ≥ 1/2 and q_{jj} ≥ 1/2. As i ∼ j in G, no independent set S_k contains both i and j, and we must have q_{ii} = q_{jj} = 1/2. Then e_i^*(Q − vv^*)e_i = e_j^*(Q − vv^*)e_j = 0 and, since Q − vv^* ≥ 0, Corollary 2.1.2 gives that e_i^*(Q − vv^*)e_j = 0. But Q is diagonal, and hence e_i^*(Q − vv^*)e_j = −e_i^* vv^* e_j = −1/2. From this contradiction we conclude that vv^* ∉ her(vp(G)), and hence her(vp(G)) ≠ ap(S_G).
Lemma 3.2.18. It holds that ap(S_G) = (vp(G)^♭)^♯ if and only if G is complete.

Proof. We have S_{K_d} = M_d, and so the S_{K_d}-abelian projections are precisely the rank one projections. Thus ap(S_{K_d}) = A_{I_d}. The independent sets of K_d are ∅ and the singletons, so vp(K_d) = {M ∈ M_d^+ ∩ D_d : Tr M ≤ 1}, giving

(vp(K_d)^♭)^♯ = {M ∈ M_d^+ : ∆(M) ∈ vp(K_d)} = A_{I_d}, (3.28)

where we have used Lemma 2.4.16.

Conversely, suppose that the graph G on d vertices is not complete and that k ≄ l in G. Let A = e_k e_k^* + e_k e_l^* + e_l e_k^* + e_l e_l^* ≥ 0. Note that e_k^*(I − A)e_k = 0 but e_l^*(I − A)e_k = −1, and so by Corollary 2.1.2, I − A ≱ 0. Since ap(S_G) ⊆ {M ∈ M_d^+ : M ≤ I}, it follows that A ∉ ap(S_G). However, k ≄ l in G and so ∆(A) = e_k e_k^* + e_l e_l^* ∈ vp(G). Then A ∈ (vp(G)^♭)^♯ by Lemma 2.4.16, and then ap(S_G) ≠ (vp(G)^♭)^♯.
We now examine cp(SG) and fp(SG) in the same way.
Lemma 3.2.19. For a graph G we have:

(i) cp(S_G) = (vp(\overline{G})^♭)^♯ if and only if G is empty;
(ii) cp(S_G) = her(vp(\overline{G})) if and only if G is complete;
(iii) fp(S_G) = her(vp(\overline{G})) for every graph G.

Proof. (i) We claim that cp(S_{\overline{K_d}}) = {M ∈ M_d^+ : Tr M ≤ 1}. Since every rank one projection lies in cp(S_{\overline{K_d}}), it suffices to show that if rank(P) > 1 for a projection P, then P ∉ cp(S_{\overline{K_d}}). To establish this, suppose there exists an S_{\overline{K_d}}-clique projection P with rank P ≥ 2. Then there exist orthonormal u = (u_i), v = (v_i) ∈ C^d such that uv^* ∈ S_{\overline{K_d}} = D_d. Suppose u_i ≠ 0. Then, since ⟨u, v⟩ = 0, it is clear that v_j ≠ 0 for some j ≠ i. This, however, gives ⟨e_i e_j^*, uv^*⟩ ≠ 0, contradicting that uv^* ∈ D_d. Then, recalling (3.28), we see that cp(S_{\overline{K_d}}) = (vp(K_d)^♭)^♯. Now, say G is non-empty and k ∼ l in G. As in Lemma 3.2.18, we let A = e_k e_k^* + e_k e_l^* + e_l e_k^* + e_l e_l^*, giving ∆(A) ∈ vp(\overline{G}) and A ∈ (vp(\overline{G})^♭)^♯ by Lemma 2.4.16. However, by Lemma 3.2.7 we have A ∉ cp(S_G) because A ≰ I. This proves (i).

(ii) From (3.25) and (3.27), cp(S_{K_d}) = her(vp(\overline{K_d})). To complete the proof of (ii), consider a non-complete G in which i ≄ j, so that i ∼ j in \overline{G}. Let v be the unit vector (1/√2)(e_i + e_j). As a rank one projection, vv^* ∈ cp(S_G), but, using the argument of Lemma 3.2.17 applied to \overline{G}, vv^* ∉ her(vp(\overline{G})).

(iii) By Corollary 3.2.13 (ii), her(vp(\overline{G})) ⊆ fp(S_G), and we now show the reverse inclusion. Let {v_1, . . . , v_r} be S_G-full and P = ∑_{i=1}^r v_i v_i^*. Set v_i = ∑_j λ_j^{(i)} e_j with λ_j^{(i)} ∈ C. Now v_i v_j^* ∈ S_G for all i, j ∈ [r], and so if λ_l^{(i)} \overline{λ_k^{(j)}} ≠ 0 for some i, j ∈ [r], then e_l e_k^* ∈ S_G and l ≃ k in G. We conclude that the set S_Q ⊆ V(G) defined by S_Q = {j : λ_j^{(k)} ≠ 0 for some k} is a clique in G, and hence an independent set in \overline{G}. Then, defining Q := ∑_{j∈S_Q} e_j e_j^*, we have Q ∈ vp(\overline{G}). By the definition of S_Q we have v_1, . . . , v_r ∈ span{e_j : j ∈ S_Q}. Thus

ran(P) = span{v_i : i ∈ [r]} ⊆ span{e_j : j ∈ S_Q} = ran(Q),

so P ≤ Q ∈ vp(\overline{G}) and P ∈ her(vp(\overline{G})). Now vp(\overline{G}) is closed and convex, and this fact and Lemma 2.2.26 then give that her(vp(\overline{G})) is an M_d-convex corner, and so fp(S_G) ⊆ her(vp(\overline{G})), as required.
Corollary 3.2.20. For a graph G we have:
(i) ap(SG)] = vp(G)] if and only if G is empty;
(ii) ap(SG)] = her(vp(G)[) if and only if G is complete;
(iii) cp(SG)] = her(vp(Ḡ)[) if and only if G is empty;
(iv) cp(SG)] = vp(Ḡ)] if and only if G is complete;
(v) fp(SG)] = vp(Ḡ)] for every graph G.
Proof. These follow immediately by anti-blocking the results in Lemmas 3.2.17, 3.2.18 and
3.2.19, and using Lemma 2.2.32 and Corollary 2.3.15.
Remark 3.2.21. In [11] it is noted that graph G is perfect if and only if vp(G) = fvp(G); for a proof see [9, Theorem 3.1]. Now, by Theorem 3.2.11 (i), vp(G) = Dd ∩ ap(SG), and Corollary 3.2.12 (iii) gives that fvp(G) = Dd ∩ fp(SG)]. Thus G is perfect if and only if Dd ∩ ap(SG) = Dd ∩ fp(SG)]. It is interesting to note, however, that this condition is not equivalent to ap(SG) = fp(SG)]. Indeed, ap(SG) = fp(SG)] if and only if G is complete. To see this, note first by the observation above that when G is not perfect, we have ap(SG) ≠ fp(SG)]. Then recall from Corollary 3.2.20 that fp(SG)] = vp(Ḡ)] = (fvp(G)[)], and so when G is perfect we have fp(SG)] = (vp(G)[)]. However, Lemma 3.2.18 gives that ap(SG) = (vp(G)[)] if and only if G is complete.
3.2.3 The Lovasz corner
We now wish to generalise TH(G), the theta corner of graph G as described in Definition
1.4.3, to the non-commutative setting. In this section we introduce the Md-convex corner
th(S) associated with non-commutative graph S ⊆Md, and show that it can be seen as such
a generalisation. We also give some basic properties.
We write th(G) for the diagonal convex corner φ(TH(G)). For graph G with vertex set
X = [d] we let P0(G) = φ(C(G)) where φ : Rd+ → D+d is given in (2.1) and C(G) is given in
(1.34). By Lemma 2.2.7 and Definition 1.4.3,
th(G) = P0(G)[. (3.29)
We recall the following standard definition.
Definition 3.2.22. [54, Equation 1.112] Given linear map Φ : Md →Mk we define the linear
map Φ∗ : Mk →Md, called the adjoint of Φ, by
〈T,Φ∗(S)〉 = 〈Φ(T ), S〉 for all T ∈Md and S ∈Mk.
It is easy to see that if Φ : Md → Mk is a quantum channel with Kraus operators A_1, . . . , A_r ∈ Mk,d, then Φ^*(σ) = ∑_{i=1}^r A_i^* σ A_i for σ ∈ Mk ([54, (2.69)]). Furthermore, if Φ is positive, then Φ^* is positive, and if Φ is completely positive, then Φ^* is completely positive ([54, Proposition 2.18]). Note that if Φ is trace-preserving, it does not in general follow that Φ^* is trace-preserving. When Φ : Md → Mk is a quantum channel with Kraus operators A_1, . . . , A_r ∈ Mk,d, we have Φ^*(Ik) = ∑_{i=1}^r A_i^* A_i = Id and we say that Φ^* is unital. (This shows that when Φ : Md → Mk is a quantum channel with d ≠ k, the adjoint channel Φ^* cannot be trace-preserving.)
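The defining identity of the adjoint and the unitality of Φ^* are easy to confirm numerically; a sketch (illustrative dimensions d = 3, k = 4 and a randomly generated channel, not taken from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 3, 4, 5  # illustrative dimensions

# Random Kraus operators A_1, ..., A_r in M_{k,d}, normalised so that
# sum_i A_i^* A_i = I_d, i.e. Phi is trace-preserving.
B = [rng.standard_normal((k, d)) + 1j * rng.standard_normal((k, d)) for _ in range(r)]
S = sum(Bi.conj().T @ Bi for Bi in B)
L = np.linalg.cholesky(S)          # S = L L^*
C = np.linalg.inv(L.conj().T)      # then C^* S C = I_d
A = [Bi @ C for Bi in B]
assert np.allclose(sum(Ai.conj().T @ Ai for Ai in A), np.eye(d))

Phi = lambda T: sum(Ai @ T @ Ai.conj().T for Ai in A)        # channel M_d -> M_k
Phi_star = lambda X: sum(Ai.conj().T @ X @ Ai for Ai in A)   # adjoint M_k -> M_d

# Defining identity: <Phi(T), X> = <T, Phi^*(X)> with <Y, Z> = Tr(Z^* Y)
T = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
X = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
assert np.isclose(np.trace(X.conj().T @ Phi(T)), np.trace(Phi_star(X).conj().T @ T))

# Phi is trace-preserving, and correspondingly Phi^* is unital: Phi^*(I_k) = I_d
assert np.isclose(np.trace(Phi(T)), np.trace(T))
assert np.allclose(Phi_star(np.eye(k)), np.eye(d))
```

Note that Φ^*(I_k) = I_d holds even though d ≠ k, which is exactly why Φ^* cannot be trace-preserving here.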
Let S ⊆ Md be an operator system. We make the following definitions where SΦ is as
defined in Definition 3.1.26:
C(S) = {Φ : Md → Mk : Φ is a quantum channel with SΦ ⊆ S, k ∈ N}; (3.30)

th(S) = {T ∈ M_d^+ : Φ(T) ≤ I for every Φ ∈ C(S)}; (3.31)

and

P(S) = {Φ^*(σ) : Φ ∈ C(S), σ ≥ 0, Tr(σ) ≤ 1}. (3.32)
Remark 3.2.23. (i) We call th(S) the Lovasz corner of operator system S.
(ii) Note that for S ⊆Md, the set C(S) can contain quantum channels Md →Mk for infinitely
many k ∈ N.
(iii) In (3.32) for each choice of Φ ∈ C(S) with Φ : Md → Mk, it is obviously assumed that
we choose σ ∈M+k . This also applies in the proof of the lemma below, our ‘quantum version’
of (3.29).
Lemma 3.2.24. Let S ⊆ Md be an operator system. Then th(S) is a convex corner and
th(S) = P(S)].
Proof. For T ∈M+d , we have
T ∈ th(S) ⇔ Φ(T ) ≤ I for all Φ ∈ C(S)
⇔ 〈Φ(T ), σ〉 ≤ 1 for all Φ ∈ C(S) and all σ ≥ 0, Tr(σ) ≤ 1
⇔ 〈T,Φ∗(σ)〉 ≤ 1 for all Φ ∈ C(S) and all σ ≥ 0, Tr(σ) ≤ 1
⇔ T ∈ P(S)].
To see the second equivalence, note by Lemma B.0.2 (v) that 〈Φ(T ), σ〉 = Tr(Φ(T )σ) ≤
Tr(σ) ≤ 1 when Φ(T ) ≤ I and Tr(σ) ≤ 1. Conversely, if Φ(T ) 6≤ I then Φ(T ) has a unit
eigenvector v such that 〈Φ(T ), vv∗〉 > 1. Having established that th(S) = P(S)], Lemma
2.2.10 gives that th(S) is a convex corner.
For a general operator system S ⊆ Md, one of the apparent difficulties in working with
th(S) is that the set C(S) used in its definition contains quantum channels Φ : Md → Mk
with no upper bound on k. However, the following results lead us to Corollary 3.2.30, which
shows that only quantum channels Φ : Md →Md2 need to be considered in order to determine
th(S).
For an operator system S ⊆Md and for fixed k ∈ N, let
Ck(S) = Φ : Md →Mk : Φ ∈ C(S) , (3.33)
thk(S) = T ∈M+d : Φ(T ) ≤ Ik for every Φ ∈ Ck(S), (3.34)
and
Pk(S) = Φ∗(σ) : Φ ∈ Ck(S), σ ∈M+k , Tr(σ) ≤ 1. (3.35)
As in Lemma 3.2.24, one can see that
Pk(S)] = thk(S), (3.36)
and by Lemma 2.2.10 it follows that thk(S) is a convex corner; we call thk(S) the kth Lovasz
corner of S. It is clear that th(S) = ∩k∈N thk(S). Furthermore, the following lemma shows
that the sets thk(S) are nested.
Lemma 3.2.25. For operator system S ⊆Md and k ∈ N, we have
thk+1(S) ⊆ thk(S).
Proof. Consider Φ ∈ Ck(S) having Kraus representation

Φ(ρ) = ∑_{p=1}^q A_p ρ A_p^* for ρ ∈ Md, with A_p ∈ Mk,d and ∑_{p=1}^q A_p^* A_p = Id.
For ρ ∈ M_d^+, write Φ(ρ) = (ϕ(ρ)_ij)_{i,j∈[k]} and let the mapping Φ′ : Md → Mk+1 be given by Φ′(ρ) = (ϕ′(ρ)_ij)_{i,j∈[k+1]}, where

ϕ′(ρ)_ij = ϕ(ρ)_ij when i, j ∈ [k], and ϕ′(ρ)_ij = 0 otherwise.
Letting the operator A′_p ∈ Mk+1,d be formed by giving A_p a (k + 1)th row consisting entirely of zeros, it is easy to see that

Φ′(ρ) = ∑_{p=1}^q A′_p ρ A′_p^*. (3.37)

It is also clear that A′_n^* A′_m = A_n^* A_m for all n, m ∈ [q], and thus ∑_{p=1}^q A′_p^* A′_p = Id, and Φ′ is a c.p.t.p. map satisfying SΦ′ = SΦ. Thus (3.37) is a Kraus representation of Φ′ ∈ Ck+1(S). Clearly Φ(T) ≤ Ik if and only if Φ′(T) ≤ Ik+1, and the inclusion thk+1(S) ⊆ thk(S) follows.
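The zero-row padding in the proof can be checked numerically; a minimal sketch (with hypothetical d = 2, k = 3 and a simple two-operator Kraus family):

```python
import numpy as np

d, k = 2, 3
e, f = np.eye(d), np.eye(k)

# A toy Kraus family for a channel M_d -> M_k: A_1 = f_1 e_1^*, A_2 = f_2 e_2^*
A1 = np.outer(f[0], e[0])
A2 = np.outer(f[1], e[1])
assert np.allclose(A1.T @ A1 + A2.T @ A2, np.eye(d))

# Pad each Kraus operator with a (k+1)-th row of zeros, as in the proof
pad = lambda Ap: np.vstack([Ap, np.zeros((1, d))])
A1p, A2p = pad(A1), pad(A2)

# The padded family is again a Kraus family, now for a channel M_d -> M_{k+1} ...
assert np.allclose(A1p.T @ A1p + A2p.T @ A2p, np.eye(d))
# ... and every product A'^*_n A'_m, and hence S_{Phi'}, is unchanged
assert np.allclose(A1p.T @ A2p, A1.T @ A2)
```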
Lemma 3.2.26. Let S ⊆ Md be a non-commutative graph, k ∈ N, and Φ ∈ Ck(S). Suppose that Φ = ∑_{i=1}^n t_i Φ_i, where Φ_i : Md → Mk is a quantum channel for each i = 1, . . . , n and t_i ∈ (0, 1) with ∑_{i=1}^n t_i = 1. It follows that Φ_i ∈ Ck(S) for all i = 1, . . . , n.
Proof. For the case that n = 2, let {A_i}_{i=1}^m and {B_p}_{p=1}^l be families of Kraus operators for Φ_1 and Φ_2 respectively. Then it is easy to verify that {√t_1 A_i, √t_2 B_p : i ∈ [m], p ∈ [l]} is a family of Kraus operators for Φ, and so A_i^* A_j ∈ SΦ ⊆ S and B_p^* B_q ∈ SΦ ⊆ S for all i, j ∈ [m] and all p, q ∈ [l]. Thus SΦi ⊆ S and Φ_i ∈ Ck(S) for i = 1, 2. A routine induction argument completes the proof for general n ∈ N.
Lemma 3.2.27. For an operator system S ⊆Md the set Ck(S) is compact for each k ∈ N.
Proof. Consider completely positive maps Ψ, Φ : Md → Mk. The definition of the Choi matrix as given in Proposition 3.1.17 shows that P_{λΦ} = λP_Φ for λ ∈ R_+ and that P_{Φ+Ψ} = P_Φ + P_Ψ, and so the map Φ ↦ P_Φ is linear. It is also clear that the map J : Ck(S) → {P_Φ : Φ ∈ Ck(S)} defined by J(Φ) = P_Φ is bijective, and so by standard analysis the compactness of Ck(S) is equivalent to that of B, where B = {P_Φ : Φ ∈ Ck(S)} ([43, Theorem 4.14]). Given that Md(Mk) is of finite dimension, by standard theory, see for example [31, Corollary 1.4.21], it suffices to show that B is both closed and bounded. For the boundedness of B, recall for all Φ ∈ Ck(S) that P_Φ ≥ 0 and Tr(P_Φ) = d, and hence ‖P_Φ‖ ≤ d.
Consider a sequence (Ψ_n)_{n∈N} ⊆ Ck(S) satisfying P_{Ψ_n} → Q. To show that B is closed it is required to show that Q = P_Ψ for some Ψ ∈ Ck(S). Let Ψ_n have the canonical Kraus representation Ψ_n(ρ) = ∑_{i=1}^{kd} A_i^{(n)} ρ A_i^{(n)*}, constructed as in Remark 3.1.22. Because of how the operators A_i^{(n)} are constructed from the eigenvalues and eigenvectors of P_{Ψ_n}, it is clear that {A_i^{(n)} : i ∈ [kd], n ∈ N} is bounded in Hilbert–Schmidt norm. Then the sequence ((A_1^{(n)}, . . . , A_{kd}^{(n)}))_{n∈N} has a convergent subsequence corresponding to a subsequence of channels (Ψ_{n_i})_{i∈N}. Thus for each j ∈ [kd] we have A_j ∈ Mk,d such that A_j^{(n_i)} → A_j as i → ∞. Since Ψ_{n_i} ∈ Ck(S) for all i ∈ N,

A_p^{(n_i)*} A_q^{(n_i)} ∈ S for all i ∈ N, p, q ∈ [kd],

and also

∑_{p=1}^{kd} A_p^{(n_i)*} A_p^{(n_i)} = I for all i ∈ N.

Then by continuity and because S is closed, we have

∑_{p=1}^{kd} A_p^* A_p = I, and A_p^* A_q ∈ S for all p, q ∈ [kd].

Finally, setting Ψ(ρ) = ∑_{j=1}^{kd} A_j ρ A_j^*, it is clear that P_{Ψ_{n_i}} → P_Ψ with Ψ ∈ Ck(S).
Since Md is an operator system, for k, d ∈ N we observe that Ck(Md) is the set of all quantum channels from Md to Mk. By Lemma 3.2.27, Ck(Md) is compact. For c.p.t.p. maps Φ_1, Φ_2 : Md → Mk and for any t ∈ (0, 1), it is trivial to see that Φ = tΦ_1 + (1 − t)Φ_2 is a c.p.t.p. map, and we conclude that Ck(Md) is convex.
Letting Ek be the set of all extreme points in Ck(Md), Theorem A.0.2 then gives that
Ck(Md) = conv(Ek). (3.38)
Remark 3.2.28. We note that Ck(S) need not be convex for every operator system S. For example, consider the ‘constant diagonal’ operator system S2 defined by
S2 = span{I, e_1e_2^*, e_2e_1^*} ⊆ M2.
(Not being equal to SG for any graph G, the operator system S2 is one of the simplest
to exhibit non-classical properties, and will be examined in more detail later.) Now let
Φ : M2 → M3 be the quantum channel with Kraus operators

A1 = (1/√2)[0 0; 1 0; 0 1] and A2 = (1/√2)[0 1; 0 0; 1 0],

and let Ψ : M2 → M3 be the quantum channel with Kraus operators

B1 = (1/√2)[0 0; 0 1; 1 0] and B2 = (1/√2)[0 1; 1 0; 0 0],

where semicolons separate matrix rows.
It is easy to verify that Φ, Ψ ∈ C3(S2). By the argument in the proof of Lemma 3.2.26, for a ∈ (0, 1) the convex combination aΦ + (1 − a)Ψ has Kraus operators

{√a A1, √a A2, √(1 − a) B1, √(1 − a) B2}.

We then note that

(√a A1^*)(√(1 − a) B2) = (√(a(1 − a))/2)[1 0; 0 0] ∉ S2,
and we conclude that aΦ + (1− a)Ψ /∈ C3(S2), and C3(S2) is not convex.
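The witness computed above is easy to confirm numerically; a sketch (taking a = 1/2 for concreteness):

```python
import numpy as np

s = 1 / np.sqrt(2)
A1 = s * np.array([[0., 0.], [1., 0.], [0., 1.]])
A2 = s * np.array([[0., 1.], [0., 0.], [1., 0.]])
B1 = s * np.array([[0., 0.], [0., 1.], [1., 0.]])
B2 = s * np.array([[0., 1.], [1., 0.], [0., 0.]])

# Each pair is a Kraus family for a quantum channel M_2 -> M_3
for K in ([A1, A2], [B1, B2]):
    assert np.allclose(sum(Ki.T @ Ki for Ki in K), np.eye(2))

# The cross term sqrt(a) A_1^* sqrt(1-a) B_2 for a = 1/2
a = 0.5
X = np.sqrt(a) * A1.T @ (np.sqrt(1 - a) * B2)
assert np.allclose(X, 0.25 * np.array([[1., 0.], [0., 0.]]))  # a multiple of e_1 e_1^*

# e_1 e_1^* is not in S_2 = span{I, e_1 e_2^*, e_2 e_1^*}: the orthogonal
# projection of X onto that span leaves a non-zero residual
basis = [np.eye(2), np.array([[0., 1.], [0., 0.]]), np.array([[0., 0.], [1., 0.]])]
proj = sum((np.trace(b.T @ X) / np.trace(b.T @ b)) * b for b in basis)
assert np.linalg.norm(X - proj) > 0.1
```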
Proposition 3.2.29. Let S ⊆ Md be an operator system and k ≥ d². Let Ek denote the set of all extreme points in Ck(Md). Then

th_{d²}(S) = thk(S) = {T ∈ M_d^+ : Ψ(T) ≤ I for all Ψ ∈ Ck(S) ∩ Ek}. (3.39)
Proof. First we show for k ∈ N that

thk(S) = {T ∈ M_d^+ : Ψ(T) ≤ I for all Ψ ∈ Ck(S) ∩ Ek}. (3.40)

To see this, denote the right hand side of (3.40) by R and note that it is trivial that thk(S) ⊆ R. To show the reverse inclusion, use (3.38) to write Φ ∈ Ck(S)\Ek as Φ = ∑_{p=1}^l t_p Φ_p with Φ_p ∈ Ek, t_p ∈ (0, 1), and ∑_{p=1}^l t_p = 1. Lemma 3.2.26 gives that Φ_p ∈ Ck(S) for all p ∈ [l], and so Φ_p(T) ≤ I for all p ∈ [l] and for all T ∈ R. Then if T ∈ R we have Φ(T) = ∑_{p=1}^l t_p Φ_p(T) ≤ I and thus T ∈ thk(S), as claimed.
Now consider the case k ≥ d². Lemma 3.2.25 gives thk(S) ⊆ th_{d²}(S). To complete the proof of (3.39) it must be shown that thk(S) ⊇ th_{d²}(S). By (3.40) it is sufficient to show that if T ∈ th_{d²}(S), then Ψ(T) ≤ I for all Ψ ∈ Ck(S) ∩ Ek.
Now by [8, Theorem 5], for quantum channel Ψ ∈ Ek there exists a Kraus representation

Ψ(σ) = ∑_{i=1}^m A_i σ A_i^*, A_i ∈ Mk,d, (3.41)

such that the set F = {A_i^* A_j : i, j ∈ [m]} ⊆ Md is linearly independent. Then |F| ≤ dim(Md) and so m ≤ d. Let P ∈ Mk be the projection onto the span of the ranges of the operators A_1, . . . , A_m. Then since the range of a matrix is the span of its columns and each A_i ∈ Mk,d, it holds that rank(P) ≤ md ≤ d². Write P = ∑_{i=1}^p v_iv_i^* ∈ Mk for some orthonormal set {v_1, . . . , v_p} ⊆ C^k with p ≤ d². It is clear that PA_i = A_i and A_i^*P = A_i^* for all i ∈ [m], and so PΨ(σ)P = Ψ(σ) for σ ∈ Md. We now show how to form a related channel Ψ′ : Md → M_{d²}. Let
e_1, . . . , e_{d²} denote the canonical orthonormal basis of C^{d²} and let Q = ∑_{i=1}^p e_iv_i^* ∈ M_{d²,k}. We have Q^*Q = ∑_{i=1}^p v_iv_i^* = P and QQ^* = ∑_{i=1}^p e_ie_i^* ≤ I_{d²}. Now let the map Ψ′ : Md → M_{d²} be given by

Ψ′(ρ) = QΨ(ρ)Q^* = ∑_{i=1}^m (QA_i)ρ(QA_i)^*, ρ ∈ Md.

Note that the operators QA_i ∈ M_{d²,d} satisfy

∑_{i=1}^m (QA_i)^*(QA_i) = ∑_{i=1}^m A_i^* P A_i = ∑_{i=1}^m A_i^* A_i = Id,

and so Ψ′ is a c.p.t.p. map with Kraus operators QA_1, . . . , QA_m. We now show that Ψ′ is related to Ψ in the following ways.
(i) We observe that

SΨ′ = span{(QA_i)^*(QA_j)} = span{A_i^* P A_j} = span{A_i^* A_j} = SΨ,

and so

Ψ ∈ Ck(S) ∩ Ek ⇒ Ψ′ ∈ C_{d²}(S). (3.42)

(ii) We claim for T ∈ M_d^+ that

Ψ′(T) ≤ I_{d²} ⇒ Ψ(T) ≤ Ik. (3.43)

To establish this, suppose for T ∈ M_d^+ that we have Ψ′(T) ≤ I_{d²}, in other words that 0 ≤ 〈v, Ψ′(T)v〉 ≤ 〈v, v〉 for all v ∈ C^{d²}. For any unit vector u ∈ C^k we then have

0 ≤ 〈u, Ψ(T)u〉 = 〈u, PΨ(T)Pu〉 = 〈u, Q^*QΨ(T)Q^*Qu〉 = 〈Qu, Ψ′(T)Qu〉 ≤ 〈Qu, Qu〉 = 〈u, Pu〉 ≤ 1,

and Ψ(T) ≤ Ik as required.

Then, if T ∈ th_{d²}(S) and Ψ ∈ Ck(S) ∩ Ek, (3.42) gives that Ψ′(T) ≤ I_{d²} and from (3.43) we obtain Ψ(T) ≤ Ik, as required to complete the proof.
Since th(S) = ⋂_{k∈N} thk(S), Proposition 3.2.29 and Lemma 3.2.25 immediately yield the following corollary, showing, as promised, that only quantum channels Φ : Md → M_{d²} need be considered to determine th(S) for operator system S ⊆ Md.

Corollary 3.2.30. For operator system S ⊆ Md it holds that th(S) = th_{d²}(S).
The next results show how th(S) is indeed a generalisation of TH(G) by explaining how
the classical situation of a graph G with vertex set X = [d] embeds into the quantum setting.
We will call a family (P_x)_{x∈X} of projections in Mk for some k ∈ N a projective orthogonal labelling (p.o.l.) of G acting on Mk if

x ≄ y in G ⇒ P_x P_y = 0. (3.44)

We say ((P_x)_{x∈X}, ρ) is a handled projective orthogonal labelling (h.p.o.l.) of G acting on Mk when (P_x)_{x∈X} is a p.o.l. of G acting on Mk and ρ ∈ Rk, the set of states in Mk. We define the set P(G) ⊆ Dd by

P(G) = {φ((Tr(P_x ρ))_{x∈X}) : ((P_x)_{x∈X}, ρ) is a h.p.o.l. of G}. (3.45)

Comparing to the theory in Section 1.4, we note that, if ((a_x)_{x∈X}, c) is a h.o.n.l. of G in R^k, then ((a_x a_x^*)_{x∈X}, cc^*) is a h.p.o.l. of G acting on Mk. Setting P0(G) = φ(C(G)) as on page 108, and recalling (1.34) on page 31, we obtain

P0(G) ⊆ P(G). (3.46)
Lemma 3.2.31. Let graph G have vertex set X = [d]. Then P(Ḡ) ⊆ P(G)[.

Proof. Let ((P_x)_{x∈X}, ρ) be a h.p.o.l. of G acting on Mk and ((Q_x)_{x∈X}, σ) be a h.p.o.l. of Ḡ acting on Ml.

Then P_x ⊗ Q_x ∈ Mk ⊗ Ml satisfies (P_x ⊗ Q_x)² = (P_x ⊗ Q_x)^* = P_x ⊗ Q_x and is thus a projection for all x ∈ X. By (3.44),

〈P_x ⊗ Q_x, P_y ⊗ Q_y〉 = 〈P_x, P_y〉〈Q_x, Q_y〉 = 0 for all distinct x, y ∈ X,

where we have used the fact that for distinct x, y ∈ X either x ≁ y in G or x ≁ y in Ḡ. Being a sum of mutually orthogonal projections in Mkl, the operator ∑_{x∈X} P_x ⊗ Q_x satisfies ∑_{x∈X} P_x ⊗ Q_x ≤ Ikl. Then

∑_{x∈X} Tr(P_x ρ) Tr(Q_x σ) = ∑_{x∈X} Tr((P_x ⊗ Q_x)(ρ ⊗ σ)) = Tr((∑_{x∈X} P_x ⊗ Q_x)(ρ ⊗ σ)) ≤ Tr(ρ ⊗ σ) = 1,

showing that φ((Tr(Q_x σ))_{x∈X}) ∈ P(G)[, to complete the proof.
Suppose a graph G has vertex set [d]. We associate with an o.n.l. A = (a_x)_{x∈[d]} ⊆ R^k of G the mapping Φ_A : Md → Mk given by

Φ_A(ρ) = ∑_{x∈[d]} (a_x e_x^*) ρ (e_x a_x^*) for ρ ∈ Md. (3.47)

Since ∑_{x∈[d]} (e_x a_x^*)(a_x e_x^*) = ∑_{x∈[d]} e_x e_x^* = I, Proposition 3.1.18 shows that Φ_A is a quantum channel. Now (e_x a_x^*)(a_y e_y^*) = 〈a_y, a_x〉 e_x e_y^*, which vanishes when x ≄ y, and hence SΦ_A ⊆ SG and Φ_A ∈ C(SG).
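A minimal numerical sketch of Φ_A (hypothetical example: G the empty graph on two vertices, whose single non-adjacent pair forces orthogonal labels a_1 ⊥ a_2; the choice of labels is ours, not from the thesis):

```python
import numpy as np

d = 2
e = np.eye(d)
a = [e[0], e[1]]  # an o.n.l. of the empty graph on two vertices: a_1 ⊥ a_2

# Kraus operators K_x = a_x e_x^* of Phi_A
K = [np.outer(a[x], e[x]) for x in range(d)]

# sum_x (e_x a_x^*)(a_x e_x^*) = sum_x e_x e_x^* = I, so Phi_A is trace-preserving
assert np.allclose(sum(Kx.T @ Kx for Kx in K), np.eye(d))

# K_x^* K_y = <a_y, a_x> e_x e_y^* vanishes for the non-adjacent pair, so
# S_{Phi_A} consists of diagonal matrices only, i.e. S_{Phi_A} is inside S_G = D_2
assert np.allclose(K[0].T @ K[1], 0)
assert np.allclose(K[1].T @ K[0], 0)
```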
Lemma 3.2.32. Let graph G have vertex set X = [d]. Then P0(G) ⊆ P(SG).
Proof. First note that an arbitrary element of P0(G) can be expressed as ∑_{x∈X} |〈a_x, c〉|² e_x e_x^* for some h.o.n.l. ((a_x)_{x∈X}, c) of G. Now let Φ_A ∈ C(SG) be the channel defined in (3.47) for the o.n.l. A = (a_x)_{x∈X} ⊆ R^k. Note that for σ ∈ Mk we have Φ_A^*(σ) = ∑_{x∈X} (e_x a_x^*) σ (a_x e_x^*), so for the unit vector c ∈ R^k we have that

Φ_A^*(cc^*) = ∑_{x∈X} |〈a_x, c〉|² e_x e_x^* ∈ P(SG),

as required.
Lemma 3.2.33. Let graph G have vertex set X = [d]. Then
P(G)[ ⊆ Dd ∩ th(SG).
Proof. For all T ∈ P(G)[ it is required to show that Φ(T) ≤ I for all Φ ∈ C(SG). For Φ ∈ Ck(SG) we can write

Φ(T) = ∑_{p=1}^m A_p T A_p^*, T ∈ Md,

where A_1, . . . , A_m ∈ Mk,d satisfy ∑_{p=1}^m A_p^* A_p = I. Set a_{p,x} = A_p e_x ∈ C^k for all p ∈ [m] and x ∈ X, and for each x ∈ X set Z_x = ∑_{p=1}^m a_{p,x} a_{p,x}^* ∈ M_k^+. Let P_x ∈ Mk be the projection onto span{a_{p,x} : p ∈ [m]} and observe that

Z_x = P_x Z_x P_x for all x ∈ X. (3.48)
For all p, q ∈ [m], we have A_p^* A_q ∈ SΦ ⊆ SG. Now suppose that x, y ∈ X satisfy x ≄ y in G, so that e_x e_y^* ∈ SG^⊥. Then for all p, q ∈ [m],

〈a_{q,y}, a_{p,x}〉 = 〈A_q e_y, A_p e_x〉 = 〈A_p^* A_q, e_x e_y^*〉 = 0.

It follows that P_x P_y = 0, and the family (P_x)_{x∈X} is a p.o.l. of G. On the other hand,

‖Z_x‖ ≤ ∑_{p=1}^m ‖a_{p,x} a_{p,x}^*‖ = ∑_{p=1}^m 〈a_{p,x}, a_{p,x}〉 = ∑_{p=1}^m 〈A_p^* A_p e_x, e_x〉 = 〈e_x, e_x〉 = 1, (3.49)

using that ‖vv^*‖ = 〈v, v〉 for v ∈ C^k. Relations (3.48) and (3.49) imply that

Z_x ≤ P_x, x ∈ X. (3.50)
Now consider a general element T = ∑_{x∈X} t_x e_x e_x^* ∈ P(G)[ with t_x ∈ R_+. It holds that

‖∑_{x∈X} t_x P_x‖ = max{〈v, ∑_{x∈X} t_x P_x v〉 : v ∈ C^k, ‖v‖ = 1}
= max{Tr((∑_{x∈X} t_x P_x) ρ) : ρ ∈ Rk}
= max{∑_{x∈X} t_x Tr(P_x ρ) : ρ ∈ Rk} ≤ 1,

where the inequality follows from the anti-blocking condition, using that ((P_x)_{x∈X}, ρ) is a h.p.o.l. of G. Since ∑_{x∈X} t_x P_x ≥ 0, we have that ∑_{x∈X} t_x P_x ≤ Ik. Inequalities (3.50) now imply that ∑_{x∈X} t_x Z_x ≤ I, and so

Φ(T) = ∑_{p=1}^m A_p (∑_{x∈X} t_x e_x e_x^*) A_p^* = ∑_{x∈X} t_x ∑_{p=1}^m (A_p e_x)(A_p e_x)^* = ∑_{x∈X} t_x ∑_{p=1}^m a_{p,x} a_{p,x}^* = ∑_{x∈X} t_x Z_x ≤ Ik.

Therefore T ∈ th(SG), and the proof is complete.
The next theorem makes clear the sense in which th(S) is a non-commutative version of
TH(G).
Theorem 3.2.34. Let graph G have vertex set X = [d]. Then
th(G) = Dd ∩ th(SG) = ∆(th(SG)).
Proof. By Definition 1.4.3, Lemma 2.2.7, (1.36), (3.46), (1.14), and Lemma 3.2.31,

P0(G)[ = φ(C(G)[) = th(G) = th(Ḡ)[ = P0(Ḡ)[[ ⊆ P(Ḡ)[[ ⊆ P(G)[[[.

Since P(G)[ is a diagonal convex corner, Lemma 2.2.7 gives that P(G)[[[ = P(G)[. Hence P0(G)[ = th(G) ⊆ P(G)[ ⊆ P0(G)[, where the second inclusion follows from (1.14) and (3.46). We can conclude that th(G) = P(G)[ and hence, by Lemma 3.2.33,
th(G) ⊆ Dd ∩ th(SG). (3.51)
Clearly Dd∩th(SG) ⊆ ∆(th(SG)), so the proof is completed by showing that ∆(T ) ∈ th(G)
when T ∈ th(SG). Suppose that Φ : Md →Mk is a quantum channel satisfying SΦ ⊆ SG and
having Kraus representation

Φ(T) = ∑_{p=1}^m A_p T A_p^*, T ∈ Md.
For x ∈ X and p ∈ [m] set A_{p,x} = A_p(e_x e_x^*). Now for p, q ∈ [m] we have A_p^* A_q ∈ SΦ ⊆ SG, so we can write A_p^* A_q = ∑_{i,j∈[d]} α_{ij} e_i e_j^* with α_{ij} = 0 when i ≄ j in G. For all x, y ∈ [d] this gives

A_{p,x}^* A_{q,y} = (e_x e_x^*) A_p^* A_q (e_y e_y^*) = α_{xy} e_x e_y^* ∈ SG. (3.52)
Using that ∑_{p=1}^m A_p^* A_p = I yields

∑_{p=1}^m ∑_{x∈X} A_{p,x}^* A_{p,x} = ∑_{p=1}^m ∑_{x∈X} (e_x e_x^*) A_p^* A_p (e_x e_x^*) = ∑_{x∈X} e_x e_x^* = I.

Thus, the map Ψ : Md → Mk, given for S ∈ Md by

Ψ(S) = ∑_{p=1}^m ∑_{x∈X} A_{p,x} S A_{p,x}^*,

is a quantum channel which by (3.52) satisfies

SΨ = span{A_{p,x}^* A_{q,y} : p, q ∈ [m], x, y ∈ [d]} ⊆ SG;
thus we have Ψ ∈ C(SG). For T ∈ th(SG) it then holds that

Φ(∆(T)) = ∑_{p=1}^m A_p (∑_{x∈X} (e_x e_x^*) T (e_x e_x^*)) A_p^* = Ψ(T) ≤ I,

and it follows that ∆(T) ∈ th(SG). Now by Lemmas 3.2.24 and 3.2.32, th(SG) = P(SG)] ⊆ P0(G)]. Then ∆(T) ∈ Dd ∩ P0(G)] = P0(G)[ = th(G), where we used Definition 2.2.6 and (3.29) on page 108 to complete the proof.
Lemma 3.2.35. If operator systems S, T ⊆Md satisfy S ⊆ T then
th(S) ⊇ th(T ).
Proof. For S ⊆ T , we have C(S) ⊆ C(T ), and the result follows from (3.31) on page 109.
It is easy to see that the identity channel Id : Md → Md, given by Id(ρ) = ρ for all ρ ∈ Md, has a Kraus representation Id(ρ) = I_d ρ I_d^*, and so has the associated operator system S_Id = span{I_d}. It is immediate for any non-commutative graph S ⊆ Md that

S_Id ⊆ S and Id ∈ C(S). (3.53)
Lemma 3.2.36. For any operator system S ⊆Md,
AId ⊆ th(S) ⊆ BId .
Proof. If T ∈ th(S), then by (3.31) and (3.53), Id(T ) = T ≤ I. This proves the second
inclusion. For the first inclusion, consider U ∈M+d satisfying TrU ≤ 1. Since every quantum
channel is c.p.t.p., we have Tr(Φ(U)) ≤ 1 and Φ(U) ≥ 0 for all Φ ∈ C(S), and it thus holds
that Φ(U) ≤ I for all Φ ∈ C(S), giving U ∈ th(S) by (3.31).
3.2.4 A quantum sandwich theorem
Given that we have now formed Md-convex corners which generalise VP(G), FVP(G) and
TH(G) to the quantum setting, it is natural to ask if there is a quantum version of the well-
known ‘classical’ sandwich theorem given in Theorem 1.4.5. The following theorem answers
that question affirmatively.
Theorem 3.2.37. If S ⊆Md is an operator system, then
ap(S) ⊆ th(S) ⊆ fp(S)].
Proof. We begin by proving the first inclusion. Let P be an S-abelian projection, and suppose that {ξ_1, . . . , ξ_k} ⊆ C^d is an S-independent set such that P = ∑_{i=1}^k ξ_i ξ_i^*. Consider Φ ∈ C(S) with Kraus operators A_1, . . . , A_m, and let i, j ∈ [k] with i ≠ j. Then

Tr(Φ(ξ_i ξ_i^*) Φ(ξ_j ξ_j^*)) = ∑_{p,q=1}^m Tr(A_p (ξ_i ξ_i^*) A_p^* A_q (ξ_j ξ_j^*) A_q^*)
= ∑_{p,q=1}^m Tr((A_p ξ_i)(A_p ξ_i)^* (A_q ξ_j)(A_q ξ_j)^*)
= ∑_{p,q=1}^m |〈A_q ξ_j, A_p ξ_i〉|²
= ∑_{p,q=1}^m |〈A_p^* A_q, ξ_i ξ_j^*〉|² = 0, (3.54)
where we have used that A_p^* A_q ∈ SΦ ⊆ S for all p, q ∈ [m], while ξ_i ξ_j^* ∈ S^⊥. Since Φ(ξ_i ξ_i^*) ≥ 0, for each i ∈ [k] we can write Φ(ξ_i ξ_i^*) = ∑_{l=1}^{n_i} λ_l^{(i)} v_l^{(i)} v_l^{(i)*}, where v_1^{(i)}, . . . , v_{n_i}^{(i)} are unit eigenvectors of Φ(ξ_i ξ_i^*) with the strictly positive eigenvalues λ_1^{(i)}, . . . , λ_{n_i}^{(i)} > 0 respectively.
For i ≠ j, we apply (3.54) to obtain

Tr(Φ(ξ_i ξ_i^*) Φ(ξ_j ξ_j^*)) = ∑_{l∈[n_i], r∈[n_j]} λ_l^{(i)} λ_r^{(j)} |〈v_l^{(i)}, v_r^{(j)}〉|² = 0,

and it then holds for i ≠ j that 〈v_l^{(i)}, v_m^{(j)}〉 = 0 for all l ∈ [n_i] and m ∈ [n_j]. It follows that
‖Φ(P)‖ = ‖∑_{i=1}^k Φ(ξ_i ξ_i^*)‖ = max_{i=1,...,k} ‖Φ(ξ_i ξ_i^*)‖. (3.55)

Since Tr(ξ_i ξ_i^*) = 1 and Φ is trace preserving, Tr(Φ(ξ_i ξ_i^*)) = 1, and so ‖Φ(ξ_i ξ_i^*)‖ ≤ 1, for all i ∈ [k]. Then by (3.55), ‖Φ(P)‖ ≤ 1, and since Φ ∈ C(S) was arbitrary, we conclude that
P ∈ th(S).
Since ap(S) is generated by the S-abelian projections and th(S) is an Md-convex corner,
by Lemma 2.2.29 it holds that ap(S) ⊆ th(S), and the first inclusion is proved.
To prove the second inclusion, suppose that Q is an S-full projection, and let {η_1, . . . , η_k} ⊆ C^d be an S-full set such that Q = ∑_{j=1}^k η_j η_j^* ∈ Md. Let η ∈ span{η_1, . . . , η_k} be a unit vector and write Q^⊥ = I − Q, giving Qη = η, Q^⊥η = 0 and η^*Q^⊥ = 0. Now

∑_{j=1}^k (ηη_j^*)^*(ηη_j^*) + (Q^⊥)^*Q^⊥ = ∑_{j=1}^k η_j η^* η η_j^* + Q^⊥ = ∑_{j=1}^k η_j η_j^* + Q^⊥ = I,

and so by Proposition 3.1.18, the mapping Φ : Md → Md given by

Φ(T) = Q^⊥ T Q^⊥ + ∑_{j=1}^k (ηη_j^*) T (η_j η^*)

is a quantum channel with Kraus operators ηη_1^*, . . . , ηη_k^*, Q^⊥ ∈ Md. It is easy to see that

(η_i η^*)(ηη_j^*) = η_i η_j^* ∈ S for all i, j ∈ [k],
Q^⊥ ηη_i^* = η_i η^* Q^⊥ = 0 ∈ S for all i ∈ [k],
(Q^⊥)^*Q^⊥ = Q^⊥ = I − ∑_{j=1}^k η_j η_j^* ∈ S.
It follows that SΦ ⊆ S and Φ ∈ C(S).
We then have

Φ^*(ηη^*) = Q^⊥ ηη^* Q^⊥ + ∑_{j=1}^k η_j η^* (ηη^*) ηη_j^* = ∑_{j=1}^k η_j η_j^* = Q,

and (3.32) on page 109 shows that Q ∈ P(S). Since Q was an arbitrary S-full projection, it is clear that Pf(S) ⊆ P(S), where Pf(S) is the set of all S-full projections. By Lemmas 3.2.24 and 2.2.10 it then holds that th(S) = P(S)] ⊆ Pf(S)]. By Definition 3.2.5 we have fp(S) = C(Pf(S)), and Lemma 2.2.32 gives fp(S)] = Pf(S)], which is sufficient to complete the proof.
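The channel built from an S-full set in the second half of the proof can be sanity-checked numerically; a sketch with a hypothetical full set {η_1, η_2} = {e_1, e_2} in C^3 and η = (e_1 + e_2)/√2 (the concrete vectors are our illustrative choices):

```python
import numpy as np

d = 3
e = np.eye(d)
etas = [e[:, [0]], e[:, [1]]]           # hypothetical S-full set (column vectors)
Q = sum(v @ v.T for v in etas)          # Q = e_1 e_1^* + e_2 e_2^*
Qp = np.eye(d) - Q                      # Q-perp
eta = (e[:, [0]] + e[:, [1]]) / np.sqrt(2)  # unit vector in span{eta_j}

# Kraus operators eta eta_j^*, j = 1, ..., k, together with Q-perp
K = [eta @ v.T for v in etas] + [Qp]

# They define a quantum channel: sum_i K_i^* K_i = I
assert np.allclose(sum(Ki.T @ Ki for Ki in K), np.eye(d))

# And the adjoint recovers the S-full projection: Phi^*(eta eta^*) = Q
Phi_star = lambda X: sum(Ki.T @ X @ Ki for Ki in K)
assert np.allclose(Phi_star(eta @ eta.T), Q)
```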
Remark 3.2.38. Given that cp(S)] and fp(S)] are both non-commutative versions of fvp(G),
it is natural to ask if th(S) ⊆ fp(S)] can be replaced by the stronger inclusion th(S) ⊆ cp(S)]
in Theorem 3.2.37. The answer is negative, as will be shown by Lemma 4.6.21.
To complete this chapter we obtain results for the Lovasz corner th(SG) where G is a
graph, analogous to those found in Section 3.2.2 for ap(SG), cp(SG) and fp(SG).
Proposition 3.2.39. Let G be a graph with vertex set X = [d]. Then

∆(th(SG)]) = Dd ∩ th(SG)] = th(G)[.

Proof. Simply apply Corollary 2.4.14 to Theorem 3.2.34.
Corollary 3.2.40. Let graph G have associated operator system SG. Then
(i) her(th(G)) ⊆ th(SG) ⊆ (th(G)[)];
(ii) her(th(G)[) ⊆ th(SG)] ⊆ (th(G))].
Proof. These follow from Theorem 1.2.15, Theorem 2.4.19, Proposition 3.2.39 and Theorem
3.2.34.
Lemma 3.2.41. We have th(SG) = (th(G)[)] if and only if G is complete.
Proof. Recall from the proof of Lemma 3.2.18 that ap(SKd) = AId and from (3.25) that fp(SKd) = BId, giving fp(SKd)] = AId. Lemma 3.2.36 and Theorem 3.2.37 then yield th(SKd) = AId. By Theorem 3.2.34,

th(Kd) = Dd ∩ th(SKd) = {M ∈ M_d^+ ∩ Dd : Tr M ≤ 1}.

It follows from Lemma 2.4.16 that

(th(Kd)[)] = {M ∈ M_d^+ : ∆(M) ∈ th(Kd)} = {M ∈ M_d^+ : Tr M ≤ 1} = th(SKd).
For the converse, assume that G is not complete with vertices k ≄ l in G. We follow the method of Lemma 3.2.18. Let A = e_k e_k^* + e_k e_l^* + e_l e_k^* + e_l e_l^* ≥ 0. Recall that I − A ≱ 0 and, since th(SG) ⊆ BId by Lemma 3.2.36, it follows that A ∉ th(SG). However, we recall from Lemma 3.2.18 that A ∈ (vp(G)[)]. Theorem 1.4.5 gives vp(G) ⊆ th(G); thus vp(G)[ ⊇ th(G)[, and (vp(G)[)] ⊆ (th(G)[)]. Then A ∈ (th(G)[)] and th(SG) ≠ (th(G)[)].
Lemma 3.2.42. It holds that th(SG) = her(th(G)) if and only if G is empty.
Proof. In (3.25) we have ap(SK̄d) = BId, and then Theorem 3.2.37 and Lemma 3.2.36 give that th(SK̄d) = BId. It follows from Theorem 3.2.34 that th(K̄d) = Dd ∩ th(SK̄d) = {M ∈ M_d^+ ∩ Dd : M ≤ I}, giving her(th(K̄d)) = BId = th(SK̄d).
Conversely, suppose G is non-empty with i ∼ j in G. Setting v = (1/√2)(e_i + e_j), we have Tr(vv^*) = 1 and vv^* ∈ th(SG) by Lemma 3.2.36. Choosing a h.o.n.l. ((a^{(l)})_{l∈V(G)}, c) for G with a^{(i)} = a^{(j)} = c and 〈c, a^{(l)}〉 = 0 when l ∉ {i, j} gives e_i e_i^* + e_j e_j^* ∈ P0(G). Suppose towards a contradiction that vv^* ∈ her(th(G)), that is,

vv^* = (1/2)(e_i e_i^* + e_i e_j^* + e_j e_i^* + e_j e_j^*) ≤ Q

for some Q ∈ th(G) ⊆ Dd. This requires 〈e_i, Qe_i〉 > 1/2 and 〈e_j, Qe_j〉 > 1/2. (To see that the inequalities are strict, observe that for Q ∈ Dd we have 〈e_i, (Q − vv^*)e_j〉 = −1/2. But if 〈e_i, Qe_i〉 = 1/2, we have e_i^*(Q − vv^*)e_i = 0, and since Q ≥ vv^*, Corollary 2.1.2 requires 〈e_i, (Q − vv^*)e_j〉 = 0. A similar argument applies for j.) We then have 〈Q, e_i e_i^* + e_j e_j^*〉 > 1 and so Q ∉ P0(G)[ = th(G), the required contradiction. We conclude vv^* ∉ her(th(G)).
Corollary 3.2.43. If G is a graph then
(i) th(SG)] = her(th(G)[) if and only if G is complete, and
(ii) th(SG)] = th(G)] if and only if G is empty.
Proof. These are immediate from anti-blocking Lemmas 3.2.41 and 3.2.42 and using Lemma
2.2.32 and Corollary 2.3.15.
Chapter 4
Parameters for non-commutative
graphs
Continuing to explore the analogy between graphs and non-commutative graphs, we now show
how the convex corners introduced in the previous chapter lead naturally to definitions of a
number of new parameters for non-commutative graphs, including the fractional chromatic
number and the Lovasz number. We introduce another quantum generalisation of the Lovasz
number which we show to be an upper bound on the Shannon capacity of a quantum channel.
In this chapter we will also discuss the concept of non-commutative graph entropy, a quantum
analogue of graph entropy, and define a generalisation of the Witsenhausen rate to the non-
commutative setting. We conclude by illustrating the theory with some concrete examples of
operator systems.
4.1 Parameters for non-commutative graphs from convex corners
In the classical setting we have seen how many important parameters of a graph G with d
vertices can be defined in terms of Rd-convex corners associated with G; for instance from
Lemmas 1.3.9 and 1.4.2 and Definition 1.4.1 and equations (1.18), (1.19), (1.26) and (1.35)
we have:

α(G) = γ(VP(G)),
ω(G) = γ(VP(Ḡ)) = γ(CP(G)),
ωf(G) = χf(G) = γ(VP(G)[),
ωf(G) = χf(G) = γ(FVP(Ḡ)),
θ(G) = γ(TH(G)),
where CP(G) is as defined on page 104.
We begin this section by using Md-convex corners to give quantum analogues of the
above results. We include a brief discussion of homomorphisms between operator systems as
introduced in [51], and also of weighted non-commutative graph parameters.
4.1.1 Defining non-commutative graph parameters
Recall that if G ⊆ M+d , then C(G) = her(conv(G)) is the convex corner generated by G. For
convex corner A we use the parameters γ(A) and M(A) as defined in Definition 2.3.17. Many
of the Md-convex corners we wish to consider are generated by families of projections. The
next two lemmas concern this situation.
Lemma 4.1.1. If P ⊆ Md is a set of projections, then

γ(C(P)) = max{rank P : P ∈ P}.

Proof. Let P_0 ∈ P satisfy rank P_0 = max{rank P : P ∈ P}. Clearly Tr P_0 = rank P_0, and so rank P_0 ≤ γ(C(P)). Now if A, B ∈ M_d^+ and A ≥ B, then Tr A ≥ Tr B, meaning that γ(C(P)) = max{Tr A : A ∈ conv(P)}. But Tr P = rank P ≤ rank P_0 for all P ∈ P, and so Tr A ≤ rank P_0 for all A ∈ conv(P). This gives γ(C(P)) ≤ rank P_0 by continuity, and the proof is complete.
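Lemma 4.1.1 can be illustrated numerically; a sketch with three hypothetical diagonal projections of ranks 1, 2 and 3 in M_4:

```python
import numpy as np

rng = np.random.default_rng(2)
P = [np.diag([1., 0., 0., 0.]), np.diag([1., 1., 0., 0.]), np.diag([1., 1., 1., 0.])]
max_rank = 3

# The trace of any convex combination is bounded by the maximal rank ...
for _ in range(100):
    w = rng.random(len(P))
    w /= w.sum()
    M = sum(wi * Pi for wi, Pi in zip(w, P))
    assert np.trace(M) <= max_rank + 1e-12

# ... and the maximal-rank projection itself attains the bound
assert np.isclose(np.trace(P[2]), max_rank)
```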
Lemma 4.1.2. If P ⊆ Md is a set of projections, then

M(C(P)) = inf{∑_{i=1}^k λ_i : k ∈ N, P_1, . . . , P_k ∈ P, λ_i > 0, ∑_{i=1}^k λ_i P_i ≥ I}.
Proof. Denote the right hand side of the claimed identity by R. Since P ⊆ C(P), it is clear that M(C(P)) ≤ R. For the reverse inequality, note that if ε > 0, there exists k ∈ N such that we can find λ_1, . . . , λ_k > 0 and A_1, . . . , A_k ∈ C(P) satisfying ∑_{i=1}^k λ_i A_i ≥ I and ∑_{i=1}^k λ_i ≤ M(C(P)) + ε. For each i ∈ [k] there exist B_i ∈ conv(P) satisfying A_i ≤ B_i and a sequence (B_i^{(j)})_{j∈N} with B_i^{(j)} ∈ conv(P) for all j ∈ N satisfying B_i^{(j)} → B_i as j → ∞. For all δ > 0 and for each i ∈ [k] there exists n_i ∈ N such that B_i^{(n_i)} + δI ≥ B_i ≥ A_i, and we have ∑_{i=1}^k λ_i B_i^{(n_i)} ≥ I(1 − rδ), where r = ∑_{i=1}^k λ_i. When δ is chosen so that 1 − rδ > 0, we have

∑_{i=1}^k (λ_i/(1 − rδ)) B_i^{(n_i)} ≥ I. (4.1)

Since B_i^{(n_i)} ∈ conv(P), for each i ∈ [k] we can write B_i^{(n_i)} = ∑_{l=1}^{m_i} µ_l^{(i)} P_{i,l} with each P_{i,l} ∈ P and µ_l^{(i)} ∈ R_+ satisfying ∑_{l=1}^{m_i} µ_l^{(i)} = 1. From (4.1) we can then see that

R ≤ ∑_{i=1}^k ∑_{l=1}^{m_i} (1/(1 − rδ)) λ_i µ_l^{(i)} = ∑_{i=1}^k (1/(1 − rδ)) λ_i ≤ (1/(1 − rδ))(M(C(P)) + ε).

By fixing ε > 0, and hence r, and choosing arbitrarily small δ, it follows that R ≤ M(C(P)) + ε. Then, since ε can be chosen to be arbitrarily small, R ≤ M(C(P)), as required.
The definitions of α(S) in Definition 3.1.33 and ap(S) in Definition 3.2.5 mean that
Lemma 4.1.1 has the following immediate corollary.
Corollary 4.1.3. If S ⊆Md is a non-commutative graph, then α(S) = γ(ap(S)).
The next definition continues this theme by introducing a number of new parameters
for non-commutative graphs based on their associated convex corners, in an analogous way
to the classical case. (We note that the notion of the clique number of an operator system
already exists in the literature [21].) We think of these definitions as generalisations of the
classical parameters listed at the beginning of this section; recall that we regard ap(S) (resp.
th(S)) as a quantum version of VP(G) (resp. TH(G)), and cp(S) (resp. cp(S)]) and fp(S)
(resp. fp(S)]) as quantum versions of CP(G) (resp. FVP(G)).
Definition 4.1.4. Let S ⊆Md be a non-commutative graph. We define
(i) ω(S) = γ(cp(S)) – the clique number of S;
(ii) ω̄(S) = γ(fp(S)) – the full number of S;
(iii) ωf(S) = γ(ap(S)]) – the fractional clique number of S;
(iv) θ(S) = γ(th(S)) – the Lovasz number of S; and
(v) θk(S) = γ(thk(S)) – the kth Lovasz number of S.
Remark 4.1.5. (i) Since ap(S), cp(S) and fp(S) are generated by families of projections, it is immediate from Lemma 4.1.1 that α(S), ω(S), ω̄(S) are non-negative integers. Since AId ⊆ ap(S), cp(S), it further holds that α(S), ω(S) ≥ 1. As noted in Remark 3.2.8, however, it is possible that fp(S) = {0}, and in this case ω̄(S) = 0.
(ii) By Lemma 4.1.1, it is clear that ω(S) (resp. ω̄(S)) is the maximum cardinality of an S-clique (resp. an S-full set).
Letting Pa(S) denote the set of all S-abelian projections, it is useful to note that Definition 3.1.39 of the chromatic number of an operator system S can be stated equivalently as

χ(S) = min{k ∈ N : P1, . . . , Pk ∈ Pa(S), ∑_{i=1}^k Pi = I}, (4.2)

showing that χ(S) can be thought of as a ‘covering number’ for S using S-abelian projections.
Note trivially that if S ⊆ Md is an operator system, then every rank one projection is an
S-abelian projection, and it thus holds that χ(S) ≤ d. Using S-clique or S-full projections
instead of S-abelian projections in (4.2), we now define two more parameters for a non-
commutative graph. (Remark 4.2.1 will show these correspond to parameters introduced in
[21].) We let Pc(S) and Pf(S) denote the sets of S-clique projections and S-full projections
respectively.
Definition 4.1.6. For an operator system S, the clique covering number Ω(S) and the full covering number Ω̄(S) are given by

Ω(S) = min{k ∈ N : P1, . . . , Pk ∈ Pc(S), ∑_{i=1}^k Pi = I}, (4.3)

Ω̄(S) = min{k ∈ N : P1, . . . , Pk ∈ Pf(S), ∑_{i=1}^k Pi = I}. (4.4)
Remark 4.1.7. Let S ⊆Md be an operator system.
(i) Since every rank one projection is an S-clique projection, we have 1 ≤ Ω(S) ≤ d. However, it may be that there is no k ∈ N such that the condition on the right of (4.4) is satisfied, in which case we set Ω̄(S) = ∞. Thus, in general it holds that 1 ≤ Ω̄(S) ≤ ∞.
(ii) It is clear from the respective definitions that Ω(S) = 1 ⇐⇒ ω(S) = d and that Ω̄(S) = 1 ⇐⇒ ω̄(S) = d.
Relaxing the conditions in equations (4.2), (4.3) and (4.4), just as in the definition of the fractional chromatic number of a graph in (1.16), yields ‘fractional’ versions of χ(S), Ω(S) and Ω̄(S).
Definition 4.1.8. For a non-commutative graph S ⊆ Md, we define
(i) χf(S), the fractional chromatic number of S, by

χf(S) = inf{∑_{i=1}^k λi : k ∈ N, λi > 0, P1, . . . , Pk ∈ Pa(S), ∑_{i=1}^k λiPi ≥ I}; (4.5)

(ii) Ωf(S), the fractional clique covering number of S, by

Ωf(S) = inf{∑_{i=1}^k λi : k ∈ N, λi > 0, P1, . . . , Pk ∈ Pc(S), ∑_{i=1}^k λiPi ≥ I}; (4.6)

and
(iii) Ω̄f(S), the fractional full covering number of S, by

Ω̄f(S) = inf{∑_{i=1}^k λi : k ∈ N, λi > 0, P1, . . . , Pk ∈ Pf(S), ∑_{i=1}^k λiPi ≥ I}. (4.7)

If fp(S) has empty interior relative to M+d, then the condition on the right of (4.7) cannot be satisfied, and we set Ω̄f(S) = ∞.
(Note that in [6], Ωf(S) was called the complementary fractional clique number and denoted by κ(S), and that Ω̄f(S) was called the complementary fractional full number and denoted by ϕ(S).) It is immediate from the definitions above that if S ⊆ Md is a non-commutative graph, then

1 ≤ χf(S) ≤ χ(S) ≤ d, 1 ≤ Ωf(S) ≤ Ω(S) ≤ d, and 1 ≤ Ω̄f(S) ≤ Ω̄(S). (4.8)

(To see that each parameter is lower bounded by 1, just note that each projection P satisfies P ≤ I, and so if ∑_{i=1}^k λi < 1 we have ∑_{i=1}^k λiPi < I.) Since Pf(S) ⊆ Pc(S), it is also clear that

Ω(S) ≤ Ω̄(S) and Ωf(S) ≤ Ω̄f(S).
It is easy to see that these ‘fractional’ parameters for an operator system S can be related
to convex corners associated to S.
Theorem 4.1.9. It holds that
χf(S) = γ(ap(S)]), Ωf(S) = γ(cp(S)]), Ω̄f(S) = γ(fp(S)]).
Proof. By Lemma 4.1.2 it is clear that
χf(S) = M(ap(S)), Ωf(S) = M(cp(S)), Ω̄f(S) = M(fp(S));
we then apply Proposition 2.3.18.
As each of the parameters in Definition 4.1.8 is of the form γ(A) for some Md-convex
corner A, Corollary 2.4.32 gives each an entropic significance. In particular,
log χf(S) = max_{ρ∈Rd} Hap(S)(ρ), (4.9)

log Ωf(S) = max_{ρ∈Rd} Hcp(S)(ρ),

log Ω̄f(S) = max_{ρ∈Rd} Hfp(S)(ρ).
Recall from (1.26) the equality of the fractional chromatic number and fractional clique
number of a graph. It is now clear that the corresponding result for operator systems holds.
Corollary 4.1.10. If S ⊆Md is a non-commutative graph, then
ωf(S) = χf(S). (4.10)
Proof. This is immediate from Definition 4.1.4 and Theorem 4.1.9.
We now show how the classical case is embedded in the quantum setting by verifying that
each non-commutative graph parameter of the operator system SG is equal to the correspond-
ing classical graph parameter of the graph G. (The result in the case of independence number
was proved by a combinatorial method in [35]; [21] gives a proof for both independence and
clique numbers.) Note that χf(Ḡ) is commonly written as χ̄f(G) and called the fractional clique covering number of G.
Proposition 4.1.11. If graph G has associated operator system SG, then
α(SG) = α(G),
ω(SG) = ω̄(SG) = ω(G),
ωf(SG) = χf(SG) = ωf(G) = χf(G),
θ(SG) = θ(G),
Ωf(SG) = Ω̄f(SG) = ωf(Ḡ) = χf(Ḡ).
Proof. Apply Lemma 2.4.15 to Theorem 3.2.11, Corollary 3.2.12 and Theorem 3.2.34 together
with Lemmas 1.3.9 and 1.4.2, Definition 2.2.8, Theorem 4.1.9 and equations (1.18), (1.19),
(1.26), (1.35) and (4.10).
In [35] it is proved that χ(SG) = χ(G) for a graph G. The method used there extends to
yield the analogous results in the case of the clique and full covering numbers.
Proposition 4.1.12. If graph G has associated operator system SG, then
χ(SG) = χ(G), and (4.11)

Ω(SG) = Ω̄(SG) = χ(Ḡ). (4.12)
Proof. (We follow [35, Theorem 7.27].) Let V(G) = [d]. If {i1, . . . , ik} is an independent set (resp. a clique) in the graph G, then {ei1, . . . , eik} is an SG-independent set (resp. an SG-clique and SG-full set) and P = ∑_{j=1}^k eije∗ij is an SG-abelian projection (resp. an SG-clique projection and an SG-full projection) by Remark 3.2.2 (i) and Definition 3.2.3. It easily follows that χ(SG) ≤ χ(G) and that

Ω(SG) ≤ Ω̄(SG) ≤ χ(Ḡ). (4.13)

We now work towards the reverse inequalities. Suppose that χ(SG) = k. Then by (4.2), there exist SG-abelian projections P1, . . . , Pk satisfying ∑_{i=1}^k Pi = I. By (B.1) we have ran(Pi) ⊥ ran(Pj) for distinct i, j ∈ [k]. There is then an orthonormal basis {v1, . . . , vd} of Cd and a partition S1 ∪ . . . ∪ Sk of [d] such that Pi = ∑_{j∈Si} vjv∗j for each i = 1, . . . , k, and where {vj : j ∈ Si} is an SG-independent set for each i ∈ [k].

Denoting the canonical basis of Cd by {e1, . . . , ed}, it is a standard combinatorial result (see [35, Lemma 7.28] and [21, Lemma 13]) that there exists a permutation σ on [d] such that ⟨eσ(i), vi⟩ ≠ 0 for all i ∈ [d], and so for j, k ∈ [d] it holds that

⟨vjv∗k, eσ(j)e∗σ(k)⟩ = ⟨eσ(k), vk⟩⟨vj, eσ(j)⟩ ≠ 0. (4.14)

Since {vp : p ∈ Si} is an SG-independent set, vpv∗q ∈ (SG)⊥ for distinct p, q ∈ Si, and so by (4.14), eσ(p)e∗σ(q) ∉ SG. It then holds that σ(p) ≁ σ(q) in G for distinct p, q ∈ Si, and we conclude that {σ(j) : j ∈ Si} is an independent set in G for each i ∈ [k]. Thus we have k independent sets which partition V(G), giving χ(G) ≤ χ(SG) as required.
We use an analogous argument to show that χ(Ḡ) ≤ Ω(SG), which with (4.13) is sufficient to prove (4.12). Suppose that Ω(SG) = l. Then by (4.3), there exist SG-clique projections Q1, . . . , Ql satisfying ∑_{i=1}^l Qi = I. As above, there is then an orthonormal basis {u1, . . . , ud} of Cd and a partition T1 ∪ . . . ∪ Tl of [d] such that Qi = ∑_{j∈Ti} uju∗j for each i = 1, . . . , l, and where {uj : j ∈ Ti} is an SG-clique for each i ∈ [l]. Now there is a permutation τ on [d] such that

⟨uju∗k, eτ(j)e∗τ(k)⟩ ≠ 0 for j, k ∈ [d].

Since {up : p ∈ Ti} is an SG-clique, upu∗q ∈ SG for distinct p, q ∈ Ti, giving in that case that eτ(p)e∗τ(q) ∉ (SG)⊥. It then holds that τ(p) ∼ τ(q) in G for distinct p, q ∈ Ti, and thus {τ(j) : j ∈ Ti} is a clique in G for each i ∈ [l]. We then have l cliques which partition V(G), and χ(Ḡ) ≤ Ω(SG) as required.
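The combinatorial step invoked above, extracting a permutation σ with ⟨eσ(i), vi⟩ ≠ 0 from an orthonormal basis, amounts to finding a perfect matching in the bipartite support graph of the unitary change-of-basis matrix. A brute-force augmenting-path sketch (the function names are ours, not the thesis's):

```python
import numpy as np

def support_permutation(V, tol=1e-9):
    """Given the columns v_1, ..., v_d of a unitary matrix V, return a
    permutation sigma with <e_sigma(i), v_i> != 0 for all i, found by
    augmenting-path bipartite matching on the support of V."""
    d = V.shape[0]
    support = np.abs(V) > tol       # support[r, i]: row r is usable for column i
    match = [-1] * d                # match[r] = column currently assigned row r

    def augment(i, seen):
        for r in range(d):
            if support[r, i] and not seen[r]:
                seen[r] = True
                if match[r] == -1 or augment(match[r], seen):
                    match[r] = i
                    return True
        return False

    for i in range(d):
        if not augment(i, [False] * d):
            raise ValueError("no perfect matching: V cannot be unitary")
    sigma = [0] * d
    for r in range(d):
        sigma[match[r]] = r         # sigma[i] is the row assigned to column i
    return sigma

# A random unitary from the QR decomposition of a Gaussian matrix.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
sigma = support_permutation(Q)
assert all(abs(Q[sigma[i], i]) > 1e-9 for i in range(4))
```

Hall's condition holds here because a unitary matrix has no r × s zero submatrix with r + s > d, so the matching always succeeds on genuine change-of-basis matrices.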
4.1.2 Properties of non-commutative graph parameters
Here we consider some properties of non-commutative graph parameters, beginning with the
following monotonicity results.
Lemma 4.1.13. Let S, T ⊆Md be operator systems satisfying S ⊆ T .
(i) If ζ ∈ {α, θ, Ωf, Ω̄f, Ω, Ω̄}, then ζ(S) ≥ ζ(T).
(ii) If ζ ∈ {ω, ω̄, ωf, χ}, then ζ(S) ≤ ζ(T).
Proof. The result for α is immediate from Definition 3.1.33. Noting that if S ⊆ T, then Pa(T) ⊆ Pa(S), Pc(S) ⊆ Pc(T) and Pf(S) ⊆ Pf(T), the results for χ, Ω, Ω̄, Ωf and Ω̄f follow from Definitions 4.1.8 and 4.1.6 and (4.2). The remaining cases are immediate from Lemmas 3.2.9, 3.2.35 and 2.2.10 (ii).
The various non-commutative graph parameters we have discussed satisfy a number of
further important inequalities, which we now state and prove. Note that Theorem 4.1.14 (i)
is the quantum version of the ‘classical sandwich’ result (1.37).
Theorem 4.1.14. If S ⊆ Md is a non-commutative graph, then the following inequalities
apply.
(i) 1 ≤ α(S) ≤ θ(S) ≤ Ωf(S),
(ii) θ(S) ≤ d,
(iii) α(S) ≤ Ωf(S) ≤ Ω̄f(S),
(iv) 0 ≤ ω̄(S) ≤ ω(S) ≤ ωf(S) = χf(S) ≤ d.
Proof. (i) Use Lemma 3.2.7 and Theorem 3.2.37 together with Theorem 4.1.9, Corollary 4.1.3
and (2.20).
(ii) This is immediate from Lemma 3.2.36.
(iii) Use Theorem 3.2.10.
(iv) By anti-blocking Theorem 3.2.10 and using Lemmas 2.2.33 and 3.2.7 we obtain fp(S) ⊆ cp(S) ⊆ ap(S)] ⊆ BId, and the assertion follows by Corollary 4.1.10.
Our goal is to extend the analogy between graphs and non-commutative graphs, and to
that end it is useful to note how some well-known results for graphs can be generalised. For
instance, if a graph G has d vertices, it is clear that
1 ≤ ω(G) ≤ ωf(G) = χf(G) ≤ χ(G) ≤ d.
Using inequalities from Theorem 4.1.14 as well as (4.8), the generalisation
1 ≤ ω(S) ≤ ωf(S) = χf(S) ≤ χ(S) ≤ d
holds for any non-commutative graph S ⊆Md.
Since for a graph G we can partition V(G) into χ(G) independent sets, each of cardinality at most α(G), it is clear that the well-known result α(G)χ(G) ≥ d holds, with ‘complementary’ version ω(G)χ(Ḡ) ≥ d. It is trivial to generalise these results for non-commutative graphs. (Note
that corresponding results for so-called ‘operator anti-systems’ are considered in [21, Section
3.1]; operator anti-systems are discussed in Section 4.2.)
Lemma 4.1.15. If S ⊆Md is a non-commutative graph, then
α(S)χ(S) ≥ d and ω(S)Ω(S) ≥ d.

Furthermore, if ω̄(S) ≥ 1, then ω̄(S)Ω̄(S) ≥ d.
Proof. Suppose that χ(S) = k and that S-abelian projections P1, . . . , Pk ∈ Md satisfy ∑_{i=1}^k Pi = Id. We then have that ∑_{i=1}^k rank Pi = d. For all P ∈ Pa(S), Lemma 4.1.1 gives that rank P ≤ α(S), and so kα(S) ≥ d, and the first assertion holds. The other results follow in the same way.
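The classical inequality α(G)χ(G) ≥ d that the lemma generalises can be confirmed by exhaustive search on small graphs; a brute-force sketch for the 5-cycle (exponential time, for illustration only; the function names are ours):

```python
from itertools import combinations, product

def alpha(n, edges):
    """Independence number by exhaustive search (small graphs only)."""
    E = {frozenset(e) for e in edges}
    independent = lambda S: all(frozenset(p) not in E for p in combinations(S, 2))
    return max(len(S) for k in range(n + 1)
               for S in combinations(range(n), k) if independent(S))

def chi(n, edges):
    """Chromatic number by exhaustive search over colourings."""
    for k in range(1, n + 1):
        for c in product(range(k), repeat=n):
            if all(c[i] != c[j] for i, j in edges):
                return k
    return n

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # the 5-cycle C5
assert alpha(5, edges) == 2 and chi(5, edges) == 3
assert alpha(5, edges) * chi(5, edges) >= 5        # alpha(G) chi(G) >= d
```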
The following inequality is immediate from Definition 4.1.4, Lemma 3.2.25 and Corollary
3.2.30.
Corollary 4.1.16. For an operator system S ⊆ Md and k = 1, 2, . . .,

θ_{d²}(S) = θ(S) ≤ θ_{k+1}(S) ≤ θ_k(S).
For an operator system S ⊆ Md, Lemmas 3.2.7 and 3.2.36 show that ap(S), cp(S) and th(S) are sandwiched between AId and BId; by Lemma 2.2.10 it is easily seen that so also are their anti-blockers. Lemmas 2.4.36 and 2.4.37 then yield some simple equivalences in the extreme cases of α(S), ω(S), ωf(S), Ωf(S), θ(S) ∈ {1, d}. The same cannot be said for fp(S) and fp(S)]. The next lemmas concern ω̄(S) and Ω̄f(S).
Lemma 4.1.17. The following equivalences hold for an operator system S ⊆ Md:

Ω̄f(S) = 1 ⇐⇒ ω̄(S) = d ⇐⇒ S = Md.

Proof. Lemma 3.2.7 gives that fp(S) ⊆ BId, and hence AId ⊆ fp(S)]. Thus, if Ω̄f(S) = γ(fp(S)]) = 1, then fp(S)] = AId, yielding fp(S) = BId and ω̄(S) = d. If ω̄(S) = d, then I ∈ fp(S) and I is an S-full projection. In that case there exists an orthonormal basis {v1, . . . , vd} of Cd such that viv∗j ∈ S for all i, j ∈ [d], giving that S = span{viv∗j : i, j ∈ [d]} = Md. The proof is completed by noting that if S = Md, then fp(S) = BId and Ω̄f(S) = γ(fp(S)]) = γ(AId) = 1.
Proposition 4.1.18. (i) If the operator system S ⊆ Md satisfies ω̄(S) = 0, then Ω̄f(S) = ∞.
(ii) The following equivalences hold for an operator system S:

Ω̄f(S) = ∞ ⇐⇒ fp(S)] is unbounded ⇐⇒ fp(S) has empty interior relative to M+d.

Proof. (i) The condition ω̄(S) = 0 holds if and only if fp(S) = {0}, or equivalently fp(S)] = M+d, which yields Ω̄f(S) = γ(fp(S)]) = ∞.
(ii) Since Ω̄f(S) = γ(fp(S)]), this is immediate from Lemma 2.1.3 and Proposition 2.2.16.
Remark 4.1.19. Operator systems S satisfying ω̄(S) = 0 are precisely those for which no unit vector v satisfies vv∗ ∈ S; a simple example which we have already noted is span{Id} for d > 1. Note that the converse of Proposition 4.1.18 (i) does not hold. As a counterexample, consider the non-commutative graph K ⊆ Md for d ≥ 3 given by K = span(Id, e1e∗1). It is not hard to see that the only non-zero K-full projection is e1e∗1, and it then follows that fp(K) = {M ∈ M+d : M ≤ e1e∗1} and ω̄(K) = 1.

By Lemma 2.2.33 we have fp(K)] = {M ∈ M+d : Tr(Me1e∗1) ≤ 1}, and so ke2e∗2 ∈ fp(K)] for all k ∈ R+, giving that Ω̄f(K) = ∞.
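The claims in this remark are easy to confirm numerically for d = 3: membership of vv∗ in K distinguishes e1 from other unit vectors, and the matrices k·e2e2∗ of arbitrarily large trace satisfy the anti-blocker constraint Tr(Me1e1∗) ≤ 1. A sketch (helper names are ours):

```python
import numpy as np

def in_span(M, basis, tol=1e-9):
    """Least-squares test of whether the matrix M lies in span(basis)."""
    A = np.column_stack([B.reshape(-1) for B in basis])
    coef, *_ = np.linalg.lstsq(A, M.reshape(-1), rcond=None)
    return bool(np.linalg.norm(A @ coef - M.reshape(-1)) < tol)

d = 3
I3 = np.eye(d)
E11 = np.outer(I3[:, 0], I3[:, 0])
E22 = np.outer(I3[:, 1], I3[:, 1])
K = [I3, E11]                           # K = span(Id, e1 e1*)

e1 = I3[:, 0]
assert in_span(np.outer(e1, e1), K)     # {e1} gives a K-full projection
v = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
assert not in_span(np.outer(v, v), K)   # a generic unit vector does not

# k e2 e2* obeys Tr(M e1 e1*) <= 1 for every k >= 0, so the anti-blocker
# fp(K)] contains elements of arbitrarily large trace.
for k in (1.0, 1e3, 1e9):
    assert np.trace(k * E22 @ E11) <= 1
```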
4.1.3 Non-commutative graph homomorphisms
It is well-known that many graph parameters can be defined in terms of graph homomor-
phisms. (See [51], for example.) In [51], Stahlke defines the concept of a homomorphism
between non-commutative graphs, and shows how this concept can be used, for instance, to
give an equivalent definition of independence number [51, Theorem 13, Definition 11]. Here
we use the theory of homomorphisms between non-commutative graphs to prove a stability
property of the Lovasz number. We begin with the following definition.
Definition 4.1.20. [51, Definition 7] Let S ⊆ Md and T ⊆ Mk be non-commutative graphs. A homomorphism from S into T is a quantum channel Γ : Md → Mk that has a Kraus representation Γ(S) = ∑_{i=1}^m AiSA∗i, such that

A∗iTAj ∈ S for all T ∈ T and i, j ∈ [m]. (4.15)

If there exists a homomorphism from S into T, we write S → T.

Since I ∈ T, if Γ is a homomorphism from S into T, then by (4.15) it holds that A∗iAj ∈ S for all i, j ∈ [m], and we have SΓ ⊆ S and Γ ∈ C(S).
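Condition (4.15) is mechanical to test for concretely given subspaces, by checking membership of each A∗iTAj in the source space against a spanning family. A minimal numerical sketch (the function names and the 2 × 2 example are ours, not from the thesis):

```python
import numpy as np

def in_span(M, basis, tol=1e-9):
    """Least-squares membership test of M in span(basis)."""
    A = np.column_stack([B.reshape(-1) for B in basis])
    coef, *_ = np.linalg.lstsq(A, M.reshape(-1), rcond=None)
    return bool(np.linalg.norm(A @ coef - M.reshape(-1)) < tol)

def is_homomorphism(kraus, source, target, tol=1e-9):
    """Definition 4.1.20: the channel with Kraus operators `kraus` is a
    homomorphism from span(source) into span(target) iff it is trace
    preserving and Ai* T Aj lies in span(source) for every T in target."""
    d = kraus[0].shape[1]
    if np.linalg.norm(sum(A.conj().T @ A for A in kraus) - np.eye(d)) > tol:
        return False                    # not a quantum channel
    return all(in_span(A.conj().T @ T @ B, source, tol)
               for T in target for A in kraus for B in kraus)

# Remark 4.1.22: for S ⊆ T the identity channel is a homomorphism from T
# into S; here S = span{I, Z} (diagonal) sits inside T = M2.
S = [np.eye(2), np.diag([1.0, -1.0])]
M2 = [np.outer(np.eye(2)[:, i], np.eye(2)[:, j]) for i in range(2) for j in range(2)]
assert is_homomorphism([np.eye(2)], M2, S)        # M2 → S along the identity
assert not is_homomorphism([np.eye(2)], S, M2)    # but not S → M2
```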
Proposition 4.1.21. If S ⊆Md and T ⊆Mk are non-commutative graphs satisfying S → T ,
then θ(S) ≤ θ(T ).
Proof. Let Γ : Md → Mk be a homomorphism from S into T with the Kraus representation Γ(S) = ∑_{i=1}^m AiSA∗i. Suppose Ψ ∈ C(T) has Kraus representation Ψ(T) = ∑_{i=1}^n BiTB∗i. It is clear that Ψ ◦ Γ is a quantum channel with set of Kraus operators {BiAj : i ∈ [n], j ∈ [m]}. Then

SΨ◦Γ = span{A∗jB∗iBkAl : i, k ∈ [n], j, l ∈ [m]} ⊆ S,

using that B∗iBk ∈ T for all i, k ∈ [n], and then applying (4.15). We can conclude that Ψ ◦ Γ ∈ C(S). Letting S ∈ th(S) be such that Tr(S) = θ(S), we have that Γ(S) ∈ M+k and, since Ψ ◦ Γ ∈ C(S), it holds that Ψ(Γ(S)) = (Ψ ◦ Γ)(S) ≤ I. Thus Γ(S) ∈ th(T), and θ(T) ≥ Tr(Γ(S)) = Tr S = θ(S).
Remark 4.1.22. For S, T ⊆ Md with S ⊆ T , the channel I : Md → Md with single Kraus
operator Id is clearly a homomorphism from T into S, and Proposition 4.1.21 gives that
θ(T ) ≤ θ(S). Thus in the case of the Lovasz number, Proposition 4.1.21 is a generalisation
of Lemma 4.1.13. (For obvious reasons, the channel I is known as the identity channel, and
will be discussed in Section 4.6.)
Lemma 4.1.23. Let S ⊆ Md be an operator system and m ∈ N. Then S → Mm(S) and
Mm(S)→ S.
Proof. For r ∈ [m], let Vr ∈ Mm,1(Md) ≅ Mmd,d be the operator given by Vr = (Id δrj)_{j=1}^m. (That is, Vr = (0 . . . 0 Id 0 . . . 0)^t, with the Id entry in the rth position.) It is easy to verify that

∑_{r=1}^m V∗rVr = mId.

Then Λ : Md → Mmd given by

Λ(S) = (1/m) ∑_{r=1}^m VrSV∗r for S ∈ Md

is a quantum channel with Kraus operators (1/√m)V1, . . . , (1/√m)Vm. (In fact, Λ is given by Λ(S) = (1/m) Im ⊗ S for S ∈ Md.) Letting M ∈ Mm(S) be given by M = (Sij)_{i,j∈[m]} with Sij ∈ S for all i, j ∈ [m], we have

(1/m) V∗rMVs = (1/m) Srs ∈ S,

and so Λ is a homomorphism from S into Mm(S), and S → Mm(S).

Let Γ : Mmd → Md be given by

Γ(M) = ∑_{r=1}^m V∗rMVr for all M ∈ Mm(Md).

Using that ∑_{r=1}^m VrV∗r = Imd, it is clear that Γ is a quantum channel with Kraus operators V∗1, . . . , V∗m. (In fact, for M = (Sij)_{i,j∈[m]} ∈ Mm(S) we have Γ(M) = ∑_{i∈[m]} Sii = mΛ∗(M).) Furthermore,

VrSV∗s = (Sδirδjs)_{i,j∈[m]} ∈ Mm(S) for all S ∈ S and r, s ∈ [m].

Then Γ is a homomorphism from Mm(S) into S, and we have Mm(S) → S, as required.
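The operators Vr and the channels Λ and Γ of the proof are concrete, so their stated properties can be checked directly; the sketch below does so for d = 2 and m = 3, over real matrices (the variable names are ours):

```python
import numpy as np

d, m = 2, 3
Id = np.eye(d)
# Vr: the md x d column-block operator with Id in the r-th slot
V = [np.kron(np.eye(m)[:, [r]], Id) for r in range(m)]
assert np.allclose(sum(Vr.T @ Vr for Vr in V), m * Id)          # sum Vr* Vr = m Id

def Lam(S):
    """Lambda(S) = (1/m) sum_r Vr S Vr*, a channel Md -> Mmd."""
    return sum(Vr @ S @ Vr.T for Vr in V) / m

def Gam(M):
    """Gamma(M) = sum_r Vr* M Vr, a channel Mmd -> Md."""
    return sum(Vr.T @ M @ Vr for Vr in V)

rng = np.random.default_rng(1)
S = rng.standard_normal((d, d))
assert np.allclose(Lam(S), np.kron(np.eye(m), S) / m)           # Lambda(S) = (1/m) Im (x) S
assert np.isclose(np.trace(Lam(S)), np.trace(S))                # Lambda is trace preserving
M = rng.standard_normal((m * d, m * d))
assert np.isclose(np.trace(Gam(M)), np.trace(M))                # Gamma is trace preserving
assert np.allclose(sum(Vr @ Vr.T for Vr in V), np.eye(m * d))   # sum Vr Vr* = I_md
```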
It is then clear that the Lovasz number has the following stability condition.
Corollary 4.1.24. If S ⊆ Md is an operator system, then θ(S) = θ(Mm(S)) for every
m ∈ N.
Proof. This is immediate from Proposition 4.1.21 and Lemma 4.1.23.
We now show that independence number is also a homomorphism monotone. As in
Remark 4.1.22, we note that this generalises Lemma 4.1.13 in the case of independence
number. The proof we give is adapted from [51, Theorem 13]; see [6, p.27] for an alternative
argument.
Proposition 4.1.25. If S ⊆Md and T ⊆Mk are non-commutative graphs satisfying S → T ,
then α(S) ≤ α(T ).
Proof. Let Γ be a homomorphism from S into T with the Kraus representation Γ(S) = ∑_{i=1}^m AiSA∗i. Let {u1, . . . , un} ⊆ Cd be an S-independent set. For each i ∈ [n] choose k(i) ∈ [m] such that Ak(i)ui ≠ 0. (Since ∑_{j=1}^m A∗jAj = Id, we note that this is possible for all i ∈ [n].) For each i ∈ [n] we let vi = ‖Ak(i)ui‖^{−1} Ak(i)ui ∈ Ck, so that ‖vi‖ = 1 for all i ∈ [n].

Now note for T ∈ T and i ≠ j that

‖Ak(i)ui‖‖Ak(j)uj‖⟨viv∗j, T⟩ = ⟨Ak(i)uiu∗jA∗k(j), T⟩ = ⟨uiu∗j, A∗k(i)TAk(j)⟩ = 0, (4.16)

where we use that uiu∗j ∈ S⊥ since {u1, . . . , un} is S-independent, and that A∗k(i)TAk(j) ∈ S by (4.15). Then (4.16) shows that viv∗j ∈ T⊥ for i ≠ j and, using that Ik ∈ T, we have ⟨vi, vj⟩ = 0 when i ≠ j. It thus holds that {v1, . . . , vn} is a T-independent set and α(T) ≥ α(S), as required.
Corollary 4.1.26. Let S ⊆Md be an operator system and m ∈ N. Then
α(Mm(S)) = α(S).
Proof. This is immediate from Lemma 4.1.23 and Proposition 4.1.25.
4.1.4 Weighted parameters
We conclude this section with some brief comments on ‘weighted’ versions of graph and non-commutative graph parameters. Given a graph G on d vertices with a ‘weighting’ w ∈ Rd+ on its vertices, [22, (4.7), (4.1)] define the quantities α(G, w) and θ(G, w), known respectively as the weighted independence number and weighted Lovasz number of (G, w), by the expressions

α(G, w) = max{⟨v, w⟩ : v ∈ VP(G)}, θ(G, w) = max{⟨v, w⟩ : v ∈ TH(G)}. (4.17)
(See also [17].) It is clear these parameters generalise the independence number and Lovasz
number in the sense that
α(G) = α(G,1), θ(G) = θ(G,1).
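Since VP(G) is the convex hull of the characteristic vectors of independent sets and w ≥ 0, the maximum in (4.17) defining α(G, w) is attained at such a vector, so for small graphs it can be computed by enumerating independent sets. A brute-force sketch (the function name is ours):

```python
from itertools import combinations

def weighted_alpha(n, edges, w):
    """alpha(G, w): maximise the total weight of an independent set; this
    equals max{<v, w> : v in VP(G)} for w >= 0, since VP(G) is the convex
    hull of characteristic vectors of independent sets. Exponential time,
    small graphs only."""
    E = {frozenset(e) for e in edges}
    best = 0.0
    for k in range(n + 1):
        for S in combinations(range(n), k):
            if all(frozenset(p) not in E for p in combinations(S, 2)):
                best = max(best, sum(w[i] for i in S))
    return best

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]        # the 5-cycle C5
assert weighted_alpha(5, edges, [1] * 5) == 2           # alpha(C5) = alpha(C5, 1)
assert weighted_alpha(5, edges, [3, 1, 1, 1, 1]) == 4   # the weight pulls the optimum to {0, 2}
```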
For an operator system S ⊆ Md and ρ ∈ M+d , it would seem natural to define weighted
versions of α(S) and θ(S) by
α(S, ρ) = max{⟨A, ρ⟩ : A ∈ ap(S)}, θ(S, ρ) = max{⟨A, ρ⟩ : A ∈ th(S)}. (4.18)
Immediately we have
α(S) = α(S, I), θ(S) = θ(S, I).
(This approach can be applied to any non-commutative graph parameter β satisfying β(S) =
γ(A(S)) where A(S) is an Md-convex corner associated with S; simply define the ‘weighted
version’ by β(S, ρ) = max{⟨A, ρ⟩ : A ∈ A(S)}, and it holds that β(S) = β(S, I).) We do not
develop this concept any further here, but leave an exploration of these ideas to future work.
We also note that in [22] and [18] the reader will find many expressions for the weighted Lovasz
number θ(G,w) equivalent to that given in (4.17). Further work could also examine their
non-commutative generalisations; we note that these may not all necessarily be equivalent to
(4.18), just as we will see in Section 4.4 that θ(G) has a number of distinct non-commutative
generalisations.
4.2 Operator anti-systems
Rather than working with operator systems, some authors, notably in [51] and [21], have
considered their orthogonal complements. A subspace T ⊆ Md will be called an operator
anti-system if there exists an operator system S ⊆ Md such that T = S⊥. (Such subspaces
are called trace-free non-commutative graphs in [51].) It was pointed out in [21, Proposition
8] that a subspace T ⊆ Md is an operator anti-system precisely when it is traceless and self-adjoint, that is, for all T ∈ T, it holds that Tr T = 0 and T∗ ∈ T. With a graph G on vertex
set V (G) = [d], [51, Equation (7)] and [21, Definition 6] associate the operator anti-system
TG = span{eie∗j : i ∼ j in G},
where e1, . . . , ed is the canonical basis of Cd. As noted by [51, p.2], trying to embed the
notion of graph complements in our quantum setting is complicated by the fact that for a (loopless) graph G, the statements i ∼ j in Ḡ and i ≁ j in G are not equivalent, unless it is further stated that i and j are distinct. One way to proceed is to use both operator
systems and operator anti-systems as explained in [21]. Indeed [21, Remark 7] argues that
the orthogonal complement is analogous to the graph complement because of the obvious
result ([21, Proposition 9])
TḠ = (SG)⊥, (4.19)
where SG is the operator system associated with G, as in Definition 3.1.31.
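The characterisation of operator anti-systems as traceless self-adjoint subspaces, and the construction of TG, are straightforward to realise numerically; a sketch (helper names are ours):

```python
import numpy as np

def in_span(M, basis, tol=1e-9):
    A = np.column_stack([B.reshape(-1) for B in basis])
    c, *_ = np.linalg.lstsq(A, M.reshape(-1), rcond=None)
    return bool(np.linalg.norm(A @ c - M.reshape(-1)) < tol)

def is_anti_system(basis, tol=1e-9):
    """Traceless + self-adjoint test from [21, Proposition 8]: every spanning
    operator has trace 0 and its adjoint stays in the span."""
    return all(abs(np.trace(B)) < tol and in_span(B.conj().T, basis, tol)
               for B in basis)

def T_G(n, edges):
    """T_G = span{e_i e_j* : i ~ j in G}, taking each edge in both directions."""
    E = np.eye(n)
    return [np.outer(E[:, i], E[:, j])
            for a, b in edges for i, j in [(a, b), (b, a)]]

# The path on vertices 0, 1, 2 gives a 4-dimensional anti-system in M3 ...
basis = T_G(3, [(0, 1), (1, 2)])
assert len(basis) == 4 and is_anti_system(basis)
# ... while adjoining I3 destroys tracelessness.
assert not is_anti_system(basis + [np.eye(3)])
```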
We briefly review some definitions made in [21, Section 3.1]. For operator anti-system
T ⊆ Md, an orthonormal set {v1, . . . , vk} ⊆ Cd is called T-independent when viv∗j ∈ T⊥ for all i ≠ j. By Definition 3.2.1, it is immediate that {v1, . . . , vk} is T-independent for an operator anti-system T if and only if {v1, . . . , vk} is a T⊥-clique, and thus the maximum cardinality of a T-independent set is equal to ω(T⊥). When {v1, . . . , vk} is a T-independent set, we will call ∑_{j=1}^k vjv∗j a T-independent projection. Then [21, Definition 12] defines χ(T), the chromatic number of the operator anti-system T, by

χ(T) = min{k ∈ N : P1, . . . , Pk ∈ Pi(T), ∑_{i=1}^k Pi = I},
where Pi(T ) denotes the set of T -independent projections. As every rank one projection is
T -independent for every operator anti-system T , it holds that χ(T ) ≤ d.
Similarly, an orthonormal set {v1, . . . , vk} is called T-strongly independent when viv∗j ∈ T⊥ for all i, j. By Definition 3.2.1, {v1, . . . , vk} is T-strongly independent for an operator anti-system T if and only if {v1, . . . , vk} is T⊥-full, and thus the maximum cardinality of a T-strongly independent set is equal to ω̄(T⊥). When {v1, . . . , vk} is a T-strongly independent set, we call the projection ∑_{j=1}^k vjv∗j a T-strongly independent projection. Then χs(T), the strong chromatic number of the operator anti-system T, is given in [21, Definition 20] by

χs(T) = min{k ∈ N : P1, . . . , Pk ∈ Ps(T), ∑_{i=1}^k Pi = I},
where Ps(T ) denotes the set of T -strongly independent projections. If there exists no set of
T -strongly independent projections whose sum is I, then we set χs(T ) =∞.
Remark 4.2.1. By Definition 4.1.6 and the observations above, it is immediate for an operator anti-system T that

χ(T) = Ω(T⊥) and χs(T) = Ω̄(T⊥), (4.20)

and hence Ω̄f(S) can be regarded as the fractional version of χs(S⊥), and Ωf(S) as the fractional version of χ(S⊥). It is also immediate from (4.19), (4.20) and Proposition 4.1.12 that

χ(TG) = χ((SḠ)⊥) = Ω(SḠ) = χ(G),
χs(TG) = χs((SḠ)⊥) = Ω̄(SḠ) = χ(G), (4.21)

and so χ(TG) and χs(TG) reduce to their expected values when T = TG for a graph G. (This was also shown in [21, Corollary 28 and Theorem 14].)
In Lemma 3.1.35 we noted that the tensor product of operator systems is the natural
quantum generalisation of the strong product of graphs. In [51] the concept of the co-normal
product of graphs as given in Definition 1.3.1 is generalised for operator anti-systems as
follows.
Definition 4.2.2. [51, Definition 22] The co-normal product of operator anti-systems T1 ⊆
Md1 and T2 ⊆Md2 is defined by
T1 ∗ T2 = T1 ⊗Md2 +Md1 ⊗ T2.
In the definition above, let Ti = S⊥i for operator systems Si, i = 1, 2. From (B.4) it can
then be seen that T1 ∗T2 = (S1⊗S2)⊥, and since S1⊗S2 is an operator system, it follows that
T1 ∗ T2 is an operator anti-system. That Definition 4.2.2 is a generalisation of the co-normal
graph product is justified by the following lemma from [51]. We include the proof because
the same method will be employed later.
Lemma 4.2.3. [51, Definition 22] If G and H are graphs, then TG ∗ TH = TG∗H .
Proof. Set V (G) = [n] and V (H) = [m] and let the canonical bases of Cn and Cm be
{e1, . . . , en} and {f1, . . . , fm} respectively. Then

TG ∗ TH = span{eie∗j : i ∼ j in G} ⊗ span{fkf∗l : k, l ∈ [m]}
+ span{eie∗j : i, j ∈ [n]} ⊗ span{fkf∗l : k ∼ l in H}
= span{eie∗j ⊗ fkf∗l : i ∼ j in G or k ∼ l in H}
= span{(ei ⊗ fk)(ej ⊗ fl)∗ : (i, k) ∼ (j, l) in G ∗ H} = TG∗H,
as required.
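The identity TG ∗ TH = TG∗H can also be confirmed numerically for small graphs, by comparing the two subspaces of Mnm through the ranks of vectorised spanning families. A sketch for a path and a single edge (helper names are ours):

```python
import numpy as np
from itertools import product

def units(n, pairs):
    """Matrix units e_i e_j* for the given index pairs."""
    E = np.eye(n)
    return [np.outer(E[:, i], E[:, j]) for i, j in pairs]

def span_rank(mats):
    return np.linalg.matrix_rank(np.vstack([M.reshape(1, -1) for M in mats]))

def same_span(A, B):
    return span_rank(A) == span_rank(B) == span_rank(A + B)

n, m = 3, 3
G = {(i, j) for a, b in [(0, 1), (1, 2)] for i, j in [(a, b), (b, a)]}  # path on 0,1,2
H = {(i, j) for a, b in [(0, 1)] for i, j in [(a, b), (b, a)]}          # one edge
TG, TH = units(n, G), units(m, H)
full_n = units(n, list(product(range(n), repeat=2)))
full_m = units(m, list(product(range(m), repeat=2)))

# T_G * T_H = T_G (x) M_m + M_n (x) T_H (Definition 4.2.2)
conormal = ([np.kron(A, B) for A in TG for B in full_m]
            + [np.kron(A, B) for A in full_n for B in TH])

# T_{G*H}: (i,k) ~ (j,l) in G * H iff i ~ j in G or k ~ l in H;
# e_i (x) f_k has index i*m + k in the Kronecker ordering.
T_prod = units(n * m, [(i * m + k, j * m + l)
                       for i in range(n) for j in range(n)
                       for k in range(m) for l in range(m)
                       if (i, j) in G or (k, l) in H])

assert same_span(conormal, T_prod)      # Lemma 4.2.3: T_G * T_H = T_{G*H}
```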
For completeness we list two other graph products which, though not used in the sequel,
also embed naturally into the quantum setting; the first is well known, but the second appears
to be new. The proofs of Lemmas 4.2.5 and 4.2.7 are trivial adaptations of those of Lemmas
4.2.3 and 3.1.35.
Definition 4.2.4. The tensor product G ⊗H of graphs G and H is the graph with vertex
set V (G)× V (H) in which (i, j) ∼ (k, l) if and only if i ∼ k in G and j ∼ l in H.
Note that the tensor product is sometimes, for example in [21], called the categorical
product. Comparing to Definition 3.1.4, it is clear that G ⊗ H is a spanning subgraph of G ⊠ H.
Lemma 4.2.5. [21, Proposition 45] If graphs G and H have associated operator anti-systems
TG and TH , then TG ⊗ TH = TG⊗H .
(Corollary 4.2.11 will show that if Ti, i = 1, 2 are operator anti-systems, then T1 ⊗ T2 is
an operator anti-system.)
Definition 4.2.6. We define the co-tensor product G ~ H of graphs G and H by V(G ~ H) = V(G) × V(H), where (i, j) ≃ (k, l) in G ~ H if and only if i ≃ k in G or j ≃ l in H. We also define the co-tensor product of operator systems S1 ⊆ Md1 and S2 ⊆ Md2 by

S1 ~ S2 = S1 ⊗ Md2 + Md1 ⊗ S2.
Note that Remark 3.1.25 shows that S1 ~ S2 is an operator system.
Lemma 4.2.7. If G and H are graphs, then SG ~ SH = SG~H .
Bearing in mind that the strong product is also known as the normal product, the dual-
ity under complements shown by the following lemma explains the choice of nomenclature.
(Lemma 4.2.8 (i) is, of course, well known.)
Lemma 4.2.8. If G and H are graphs, then (i) the complement of G ∗ H is Ḡ ⊠ H̄, and (ii) the complement of G ~ H is Ḡ ⊗ H̄.
Proof. These are immediate consequences of the definitions.
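Both parts of Lemma 4.2.8 can be checked mechanically with boolean adjacency matrices, encoding each product by a Kronecker-product formula; a sketch over small random graphs (helper names are ours):

```python
import numpy as np

def complement(A):
    return ~A & ~np.eye(len(A), dtype=bool)

def conormal(A, B):
    """(i,k) ~ (j,l) in G * H iff i ~ j in G or k ~ l in H."""
    J = lambda n: np.ones((n, n), dtype=bool)
    return (np.kron(A, J(len(B))) | np.kron(J(len(A)), B)).astype(bool)

def strong(A, B):
    """(i,k) ~ (j,l) in the strong product iff i, j and k, l are each
    adjacent or equal, with loops removed."""
    S = np.kron(A | np.eye(len(A), dtype=bool),
                B | np.eye(len(B), dtype=bool)).astype(bool)
    np.fill_diagonal(S, False)
    return S

def tensor(A, B):
    """(i,k) ~ (j,l) in G (x) H iff i ~ j in G and k ~ l in H."""
    return np.kron(A, B).astype(bool)

def cotensor(A, B):
    """Definition 4.2.6: adjacency-or-equality in either factor, loops removed."""
    C = conormal(A | np.eye(len(A), dtype=bool), B | np.eye(len(B), dtype=bool))
    np.fill_diagonal(C, False)
    return C

rng = np.random.default_rng(2)
def rand_graph(n):
    M = np.triu(rng.random((n, n)) < 0.5, 1)
    return M | M.T

G, H = rand_graph(4), rand_graph(5)
Gc, Hc = complement(G), complement(H)
assert np.array_equal(complement(conormal(G, H)), strong(Gc, Hc))   # Lemma 4.2.8 (i)
assert np.array_equal(complement(cotensor(G, H)), tensor(Gc, Hc))   # Lemma 4.2.8 (ii)
```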
The following result is the quantum analogue of Lemma 4.2.8.
Lemma 4.2.9. Let Si ⊆Mdi be operator systems and Ti ⊆Mdi be operator anti-systems for
i = 1, 2. Then it holds that:
(T1 ∗ T2)⊥ = T1⊥ ⊗ T2⊥, and (S1 ~ S2)⊥ = S1⊥ ⊗ S2⊥.

Proof. By (B.3) and (B.4),

(T1 ∗ T2)⊥ = (T1 ⊗ Md2 + Md1 ⊗ T2)⊥ = (T1 ⊗ Md2)⊥ ∩ (Md1 ⊗ T2)⊥ = (T1⊥ ⊗ Md2) ∩ (Md1 ⊗ T2⊥) = T1⊥ ⊗ T2⊥.

The second assertion follows similarly.
Corollary 4.2.10. Let G and H be graphs. Then
(TG∗H)⊥ = SḠ⊠H̄, (SG~H)⊥ = TḠ⊗H̄.

Proof. We have (TG∗H)⊥ = (TG ∗ TH)⊥ = TG⊥ ⊗ TH⊥ = SḠ ⊗ SH̄ = SḠ⊠H̄, using (4.19) and Lemmas 4.2.3, 4.2.9 and 3.1.35. The other result is proved in the same way.
Corollary 4.2.11. If Ti are operator anti-systems for i = 1, 2, then T1 ⊗ T2 is an operator
anti-system.
Proof. Let Ti = S⊥i for operator systems Si. By Lemma 4.2.9 we have T1 ⊗ T2 = (S1 ~ S2)⊥
where we note that S1 ~ S2 is an operator system.
4.3 Non-commutative graph entropy
Recall in the classical case how we defined the entropy HA(p) of probability distribution
p ∈ Pd over a general Rd-convex corner A. Given a graph G on d vertices, we considered
the special case A = VP(G) to obtain the important concept of graph entropy given by
H(G, p) = HVP(G)(p). In the quantum setting Definition 2.4.26 defined the entropy of a state
ρ ∈ Rd over a general Md-convex corner A. Since for a non-commutative graph S, we have
regarded ap(S) as a quantum version of VP(G), it is natural to make the following definition.
Definition 4.3.1. The non-commutative graph entropy H(S, ρ) of operator system S ⊆Md
with respect to a state ρ ∈ Rd is the quantity
H(S, ρ) = Hap(S)(ρ) = min{−Tr(ρ log A) : A ∈ ap(S)}.
The aim of this section is to summarise some basic properties of non-commutative graph
entropy; in many cases they follow immediately from the theory developed in Chapter 2 and
are obvious analogues of the properties of graph entropy. The next result verifies that the
classical concept of graph entropy embeds naturally into this more general non-commutative
setting.
Proposition 4.3.2. Let G be a graph with V (G) = [d] and p ∈ Pd be a probability distribution
on V(G). Then, setting ρ = ∑_{i=1}^d pieie∗i gives H(G, p) = H(SG, ρ).
Proof. We apply Lemma 2.4.35 to Theorem 3.2.11.
Next we note that non-commutative graph entropy is non-negative and bounded above
by the von Neumann entropy.
Proposition 4.3.3. For an operator system S ⊆ Md and ρ ∈ Rd we have 0 ≤ H(S, ρ) ≤
H(ρ).
Proof. This follows from (2.38) on page 71 and Lemma 3.2.7.
Non-commutative graph entropy is easily seen to satisfy the following monotonicity con-
dition.
Lemma 4.3.4. If S1 ⊆ S2, then H(S1, ρ) ≤ H(S2, ρ).
Proof. If S1 ⊆ S2, then ap(S1) ⊇ ap(S2) by Lemma 3.2.9, and the result follows from
Definition 4.3.1.
Let G be a graph. We note that H(G, p) = min_{v∈VP(G)} ∑_i −pi log vi = 0 if and only if there exists v ∈ VP(G) such that pi > 0 ⇒ vi = 1. This is equivalent to the condition that {i ∈ V(G) : pi > 0} is an independent set. Also note for a graph G on d vertices that H(G, p) = 0 for all probability distributions p ∈ Pd if and only if G = K̄d, the empty graph on d vertices. (The ‘if’ statement follows from (1.22) on page 19; for ‘only if’, note that when G is non-empty and p > 0 we have H(G, p) > 0, because in that case 1 ∉ VP(G).) We now address the equivalent problems in the non-commutative setting.
Proposition 4.3.5. Suppose S ⊆Md is an operator system. Then H(S, ρ) = 0 if and only if
there exists an orthonormal basis {v1, . . . , vd} ⊆ Cd such that P = ∑_{i∈T} viv∗i ∈ ap(S), where T = {i ∈ [d] : ⟨ρvi, vi⟩ > 0}.
Proof. We have H(S, ρ) = 0 if and only if there exists A ∈ ap(S) such that −Tr(ρ log A) = 0. For A ∈ ap(S), we write A = ∑_{i=1}^d aiviv∗i for some orthonormal basis {v1, . . . , vd} and ai ∈ R+, to give −Tr(ρ log A) = ∑_{i=1}^d log(1/ai)⟨ρvi, vi⟩, as in (2.27) on page 60. Thus −Tr(ρ log A) = 0 for A ∈ ap(S) if and only if

ai = 1 whenever ⟨ρvi, vi⟩ > 0. (4.22)

The ‘if’ part of the lemma is now immediate. For ‘only if’, observe that if A ∈ ap(S) satisfies (4.22), then the projection P = ∑_{i∈T} viv∗i satisfies P ≤ A, and so P ∈ ap(S).
If an operator system S ⊆ Md satisfies S ⊆ DV for some orthonormal basis V = {v1, . . . , vd} of Cd, then we say S is diagonal in the basis V.
Corollary 4.3.6. When S ⊆ Md is an operator system, we have that H(S, ρ) = 0 for all
ρ ∈ Rd if and only if S is diagonal in some orthonormal basis.
Proof. By choosing ρ > 0 we have ⟨ρv, v⟩ > 0 for all non-zero vectors v ∈ Cd. Then if H(S, ρ) = 0, Proposition 4.3.5 gives I ∈ ap(S), so that I is an S-abelian projection. Hence for some orthonormal basis V = {v1, . . . , vd} of Cd it holds that viv∗j ∈ S⊥ for all i ≠ j. That is, for all M ∈ S we have ⟨viv∗j, M⟩ = ⟨vi, Mvj⟩ = 0 when i ≠ j, and we can conclude that S is diagonal in the basis V. Conversely, if S is diagonal in an orthonormal basis V = {v1, . . . , vd} and M ∈ S, then ⟨viv∗j, M⟩ = ⟨vi, Mvj⟩ = 0 for all i ≠ j, and {v1, . . . , vd} is S-independent. Then ∑_{i=1}^d viv∗i = I ∈ ap(S), and Proposition 4.3.5 gives H(S, ρ) = 0 for all ρ ∈ Rd.
The following lemma can be compared to the classical result in (1.21) on page 19 that
H(Kd, p) = H(p) for all probability distributions p ∈ Pd, where Kd is the complete graph on
d vertices.
Lemma 4.3.7. For all ρ ∈ Rd we have H(Md, ρ) = H(ρ).
Proof. As M⊥d = {0}, the only Md-independent sets are the singletons {v} and ∅. (Note that ∅ is by convention independent, as for graphs; we take the associated abelian projection to be P∅ = 0.) The set of Md-abelian projections then consists precisely of the rank one projections, which each have unit trace, and 0. It follows that ap(Md) = AId, and the result follows by (2.36).
Remark 4.3.8. If G is a graph, then H(G, p) = H(p) for all probability distributions p if and only if G is complete: (1.21) gives the ‘if’ direction; for the reverse implication simply observe for non-complete G with distinct and non-adjacent vertices i and j that {i, j} is independent in G and, letting p0 be the probability distribution

p0(i) = p0(j) = 1/2, p0(k) = 0 for k ∉ {i, j},

we have

H(G, p0) = 0 < H(p0) = log 2.
In the non-commutative case, Proposition 4.3.12 will give a number of statements which are equivalent to the condition
H(S, ρ) = H(ρ) for all ρ ∈ Rd, (4.23)
and it is interesting to note that we can find operator systems S ⊊ Md which satisfy (4.23).
The following theorem, the quantum analogue of Corollary 1.3.11, is immediate from (4.9).
Theorem 4.3.9. For an operator system S ⊆ Md it holds that
max_{ρ∈Rd} H(S, ρ) = log χf(S).
The following propositions, summarising our results for two particular ‘extreme’ cases,
can be compared to Lemmas 2.4.36 and 2.4.37, which examined the same problem for entropy
over a general Md-convex corner.
Proposition 4.3.10. The following are equivalent for a non-commutative graph S ⊆Md:
(i) S is commutative;
(ii) S is diagonal in some orthonormal basis;
(iii) H(S, ρ) = 0 for all states ρ ∈ Rd;
(iv) χf(S) = 1;
(v) I ∈ ap(S) = BId;
(vi) α(S) = d;
(vii) χ(S) = 1.
Proof. (i) ⇒ (ii). First we show that if S is commutative, then every M ∈ S is normal.
To see this, let M ∈ S, and note then that M∗ ∈ S, since S is self-adjoint. It follows that
M commutes with M∗ and hence is normal. Then, by the properties of normal matrices as
summarised in Appendix B, the elements of S are simultaneously unitarily diagonalisable.
(ii) ⇒ (i). If S is diagonal in some orthonormal basis, it is obviously commutative.
(ii) ⇐⇒ (iii). This is simply Corollary 4.3.6.
(iii) ⇐⇒ (iv) ⇐⇒ (v) ⇐⇒ (vi). Apply Lemma 2.4.36, recalling that H(S, ρ) = H_{ap(S)}(ρ), and using that α(S) = γ(ap(S)) by Corollary 4.1.3 and χf(S) = γ(ap(S)♯) by Theorem 4.1.9.
(Note that (iii) ⇐⇒ (iv) also follows from Theorem 4.3.9 and Proposition 4.3.3.)
(v) ⇐⇒ (vii). This is clear from (4.2).
Remark 4.3.11. An example of an operator system which has the properties listed in Proposition 4.3.10 is S = span(Id, A_1, . . . , A_k), where A_1, . . . , A_k ∈ M_d^h commute pairwise.
Proposition 4.3.12. The following are equivalent for non-commutative graph S ⊆Md:
(i) H(S, ρ) = H(ρ) for all states ρ ∈ Rd;
(ii) χf(S) = d;
(iii) χ(S) = d;
(iv) ap(S) = AId;
(v) α(S) = 1.
Proof. (ii) ⇒ (iii). This is immediate from (4.8).
(iii) ⇒ (iv). All rank 1 projections are trivially S-abelian. We claim that if χ(S) = d, then
no S-abelian projection P satisfies rankP ≥ 2. Indeed, if there exists an S-abelian projection
P with rank(P ) ≥ 2, then I can certainly be expressed as the sum of P and at most (d− 2)
rank 1 projections, giving χ(S) ≤ d− 1. It is then clear that ap(S) = AId .
(i) ⇐⇒ (ii) ⇐⇒ (iv) ⇐⇒ (v). We apply Lemma 2.4.37.
Remark 4.3.13. Examples of operator systems satisfying Proposition 4.3.12 include Sd, which
will be defined in Definition 4.6.15, and Md.
4.4 Another quantum generalisation of θ(G)
Motivated by the expression θ(G) = γ(TH(G)) for the Lovasz number of a graph G, Definition 4.1.4 defined the Lovasz number of a non-commutative graph S to be given by θ(S) = γ(th(S)). Furthermore, Proposition 4.1.11 shows that the non-commutative definition generalises the classical one in the sense that θ(S_G) = θ(G). In [28], Lovasz gives several equivalent expressions for θ(G); interestingly, we find that their natural non-commutative versions are not in general equal. One expression for θ(G) in [28] leads to the result
θ(G) = max{‖I + T‖ : T ∈ S_G^⊥, I + T ≥ 0}, (4.24)
given in [35, Theorem 6.10]. Motivated by (4.24), for a non-commutative graph S ⊆ Md, [13, Section 4] defines the quantity ϑ(S) by setting
ϑ(S) = max{‖I + T‖ : T ∈ Md, I + T ≥ 0, T ∈ S^⊥}.
Also, in [13, Definition 5], ϑ̃(S), a 'norm-completion' of ϑ(S), is defined by letting
ϑ̃(S) = sup_{n∈N} ϑ(Mn(S)).
It was shown in [13] that if G is a graph, then ϑ(S_G) = ϑ̃(S_G) = θ(G), and hence ϑ(S) and ϑ̃(S) are both, like θ(S), valid generalisations of the classical parameter θ(G). However, θ(S), ϑ(S) and ϑ̃(S) are not in general equal; this is discussed further in Remark 4.5.5.
In this section, motivated by an expression for θ(G) given in [28], we define θ̂(S), yet another non-commutative version of θ(G) in the sense that θ̂(S_G) = θ(G) for a graph G. Later we will show that θ̂(S) is an upper bound on c(S), the Shannon capacity of S. We will also see that θ̂(S) is not in general equal to either ϑ(S) or ϑ̃(S).
We denote the smallest eigenvalue of A ∈ M_d^+ by λmin(A). For invertible A ∈ M_d^+ it is clear that ‖A^{-1}‖ = λmin(A)^{-1}.
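This identity is easy to confirm numerically; the following minimal NumPy sketch (illustrative only) builds a random invertible positive matrix and compares ‖A^{-1}‖ with λmin(A)^{-1}.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random invertible positive matrix A = B*B + I.
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = B.conj().T @ B + np.eye(4)

lam_min = float(np.min(np.linalg.eigvalsh(A)))
op_norm_inv = float(np.linalg.norm(np.linalg.inv(A), ord=2))

# ‖A^{-1}‖ = λmin(A)^{-1} for invertible A ≥ 0.
print(abs(op_norm_inv - 1.0 / lam_min))  # ≈ 0
```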
Definition 4.4.1. For a non-commutative graph S we define θ̂(S), the second Lovasz number of S, by
θ̂(S) = inf{‖Φ^*(σ)^{-1}‖ : σ ≥ 0, Tr σ ≤ 1, Φ ∈ C(S), Φ^*(σ) invertible}.
Remark 4.4.2. (i) For S ⊆ Md, we recall that C(S) contains channels Φ : Md → Mk for many different k ∈ N. In the expression above it is of course assumed that, when we consider Φ ∈ Ck(S), we choose σ ∈ M_k^+. The same applies in the proof of Theorem 4.4.3 (i).
(ii) It is clearly sufficient to consider only σ ≥ 0 satisfying Tr σ = 1.
(iii) We briefly justify the above definition. First, using (3.32), note that
θ̂(S) = inf{λmin(A)^{-1} : A ∈ P(S)}. (4.25)
Now for a graph G, [28, p. 2 and Theorem 4] gives
θ(G) = min{ (min{|⟨c, a^{(i)}⟩|² : i ∈ V(G)})^{-1} : ((a^{(i)})_{i∈V(G)}, c) is a h.o.n.l. of G }. (4.26)
In the notation of Section 3.2.3, and using the first observation in the proof of Lemma 3.2.32, this becomes
θ(G) = min{λmin(A)^{-1} : A ∈ P_0(G)}. (4.27)
Given (3.29) on page 108 and Lemma 3.2.24, we think of P(S) as the non-commutative version of P_0(G). The motivation for Definition 4.4.1 is then clear from a comparison of (4.25) and (4.27).
The next theorem gives useful, and strikingly similar, characterisations of θ(S) and θ̂(S). (Note that in the expressions below C(S) is not compact, and so it does not follow from Theorem A.0.8 that the supremum and the infimum can be interchanged. See also Remark 4.5.5.)
Theorem 4.4.3. If S ⊆ Md is an operator system, then
(i) θ̂(S)^{-1} = sup{ inf{‖Φ(ρ)‖ : ρ ∈ Rd} : Φ ∈ C(S) };
(ii) θ(S)^{-1} = inf{ sup{‖Φ(ρ)‖ : Φ ∈ C(S)} : ρ ∈ Rd }.
Proof. (i) We have
θ̂(S)^{-1} = ( inf{λmin(Φ^*(σ))^{-1} : σ a state with λmin(Φ^*(σ)) > 0, Φ ∈ C(S)} )^{-1}
= sup{λmin(Φ^*(σ)) : σ a state with λmin(Φ^*(σ)) > 0, Φ ∈ C(S)}
= sup{λmin(Φ^*(σ)) : σ a state, Φ ∈ C(S)}
= sup{ sup{ inf{⟨Φ^*(σ)ξ, ξ⟩ : ξ ∈ Cd, ‖ξ‖ = 1} : σ a state } : Φ ∈ C(S) }
= sup{ sup{ inf{⟨Φ^*(σ), ρ⟩ : ρ ∈ Rd} : σ a state } : Φ ∈ C(S) }
= sup{ sup{ inf{⟨σ, Φ(ρ)⟩ : ρ ∈ Rd} : σ a state } : Φ ∈ C(S) }
= sup{ inf{ sup{⟨σ, Φ(ρ)⟩ : σ a state} : ρ ∈ Rd } : Φ ∈ C(S) }
= sup{ inf{‖Φ(ρ)‖ : ρ ∈ Rd} : Φ ∈ C(S) }.
In the penultimate step the supremum and infimum were interchanged using Theorem A.0.8. It is easy to see that the conditions of this 'minimax' theorem are satisfied. Indeed, for a given Φ : Md → Mk we see that the infimum and supremum are over the compact convex sets Rd and Rk respectively, and that the function (ρ, σ) → ⟨σ, Φ(ρ)⟩ is continuous and linear in both ρ and σ.
(ii) Since each non-zero T ∈ M_d^+ can be written as T = (Tr T)ρ with ρ = (Tr T)^{-1}T ∈ Rd, we have
θ(S)^{-1} = ( max{Tr T : T ∈ th(S)} )^{-1}
= ( sup{ sup{λ ∈ R : λρ ∈ th(S)} : ρ ∈ Rd } )^{-1}
= ( sup{ sup{λ ∈ R : ‖Φ(λρ)‖ ≤ 1 for all Φ ∈ C(S)} : ρ ∈ Rd } )^{-1}
= ( sup{ sup{λ ∈ R : λ ≤ ‖Φ(ρ)‖^{-1} for all Φ ∈ C(S)} : ρ ∈ Rd } )^{-1}
= ( sup{ inf{‖Φ(ρ)‖^{-1} : Φ ∈ C(S)} : ρ ∈ Rd } )^{-1}
= inf{ sup{‖Φ(ρ)‖ : Φ ∈ C(S)} : ρ ∈ Rd },
as required.
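Two ingredients of the chain above — the duality ⟨Φ^*(σ), ρ⟩ = ⟨σ, Φ(ρ)⟩ and the identity sup{⟨σ, X⟩ : σ a state} = ‖X‖ for X ≥ 0 — can be sanity-checked numerically. The sketch below (illustrative only) builds a generic channel from random Kraus operators normalised so that ∑ K_i^*K_i = I.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 3, 3

# Random Kraus operators, then renormalise so that sum K^* K = I
# (a generic quantum channel; purely illustrative).
Ks = [rng.normal(size=(k, d)) + 1j * rng.normal(size=(k, d)) for _ in range(2)]
S = sum(K.conj().T @ K for K in Ks)
L = np.linalg.cholesky(S)            # S = L L^*
W = np.linalg.inv(L.conj().T)        # then W^* S W = I
Ks = [K @ W for K in Ks]

Phi = lambda rho: sum(K @ rho @ K.conj().T for K in Ks)      # the channel
Phi_adj = lambda sig: sum(K.conj().T @ sig @ K for K in Ks)  # its adjoint

rho = np.eye(d) / d
sig = rng.normal(size=(k, k)); sig = sig @ sig.T; sig /= np.trace(sig)

# ⟨Φ*(σ), ρ⟩ = ⟨σ, Φ(ρ)⟩
lhs = np.trace(Phi_adj(sig) @ rho)
rhs = np.trace(sig @ Phi(rho))
print(abs(lhs - rhs))  # ≈ 0

# sup over states σ of ⟨σ, X⟩ equals λmax(X) = ‖X‖ for X ≥ 0,
# attained at the top eigenprojection.
X = Phi(rho)
w, V = np.linalg.eigh(X)
top = np.outer(V[:, -1], V[:, -1].conj())
print(abs(np.trace(top @ X) - w[-1]))  # ≈ 0
```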
Theorem 4.4.4. Let S ⊆ Md be an operator system. Then
d · inf{‖Φ(Id)‖^{-1} : Φ ∈ C(S)} ≤ θ(S) ≤ θ̂(S) ≤ d.
Proof. For the first inequality, take ρ = (1/d)Id in Theorem 4.4.3 (ii) to give
θ(S)^{-1} ≤ sup{‖Φ((1/d)Id)‖ : Φ ∈ C(S)}.
The second inequality is immediate from Theorems 4.4.3 and A.0.7. The last inequality follows from Theorem 4.4.3 (i) by noting that the identity channel belongs to C(S); it then follows that θ̂(S)^{-1} ≥ inf{‖ρ‖ : ρ ∈ Rd} = 1/d.
The next proposition verifies that θ̂(S) can indeed be regarded as a generalisation of the classical graph parameter θ(G).
Proposition 4.4.5. Let the graph G have vertex set X = [d]. Then θ̂(S_G) = θ(G).
Proof. By Proposition 4.1.11 and Theorem 4.4.4,
θ(G) = θ(S_G) ≤ θ̂(S_G). (4.28)
Let ((a_x)_{x∈X}, c) be a h.o.n.l. of G with a_x, c ∈ R^k, and let Φ_A ∈ Ck(S_G) be the quantum channel defined in (3.47). Then, as in Lemma 3.2.32,
Φ_A^*(cc^*) = ∑_{x∈X} |⟨a_x, c⟩|² e_x e_x^* ∈ P(S_G),
giving
λmin(Φ_A^*(cc^*)) = min_{x∈X} |⟨a_x, c⟩|².
Thus by (4.25) and (4.26) we have
θ̂(S_G) ≤ min{ (min_{x∈X} |⟨a_x, c⟩|²)^{-1} : ((a_x)_{x∈X}, c) is a h.o.n.l. of G } = θ(G).
Together with (4.28), this completes the proof.
Calculation of θ(S) and θ̂(S) for a given non-commutative graph S will in general be difficult, but we do have the following two propositions.
Proposition 4.4.6. For an operator system S ⊆ Md, the following are equivalent:
(i) θ(S) = 1;
(ii) S = Md;
(iii) θ̂(S) = 1.
Proof. (i) ⇒ (ii). Corollary 3.2.30 and the method used to prove Theorem 4.4.3 (ii) give that
θ(S)^{-1} = θ_{d²}(S)^{-1} = inf{ sup{‖Φ(ρ)‖ : Φ ∈ C_{d²}(S)} : ρ ∈ Rd }.
Note that ‖Φ(ρ)‖ ≤ Tr(Φ(ρ)) = 1 for all states ρ ∈ Rd and quantum channels Φ ∈ C_{d²}(S), and so sup{‖Φ(ρ)‖ : Φ ∈ C_{d²}(S)} ≤ 1 for all ρ ∈ Rd. Thus, when θ(S) = 1, it must hold that sup{‖Φ(ρ)‖ : Φ ∈ C_{d²}(S)} = 1 for all ρ ∈ Rd. Setting ρ = (1/d)I yields
sup{‖Φ((1/d)I)‖ : Φ ∈ C_{d²}(S)} = 1.
There is then a sequence (Φ_k)_{k∈N} of channels in C_{d²}(S) such that ‖Φ_k((1/d)I)‖ → 1 as k → ∞. By Lemma 3.2.27, C_{d²}(S) is compact, and thus the sequence (Φ_k)_{k∈N} has a subsequence converging to some Φ_0 ∈ C_{d²}(S) satisfying ‖Φ_0((1/d)I)‖ = 1, by continuity of the norm. Since Tr(Φ_0((1/d)I)) = Tr((1/d)I) = 1, it follows that Φ_0((1/d)I) = vv^* for some unit vector v ∈ C^{d²}. Let {f_1, . . . , f_d} be an orthonormal basis of Cd. Then Φ_0((1/d)I) = (1/d) ∑_{i=1}^d Φ_0(f_i f_i^*) = vv^*, where Φ_0(f_i f_i^*) ∈ R_{d²} for all i = 1, . . . , d. By Proposition 2.4.2, the pure state vv^* ∈ R_{d²} cannot be written as a convex combination of distinct states, and thus it must hold that Φ_0(f_i f_i^*) = vv^* for each i = 1, . . . , d; indeed, since every unit vector u ∈ Cd is an element of some orthonormal basis, we have Φ_0(uu^*) = vv^* for all unit vectors u ∈ Cd. By linearity, Φ_0 is then the trivial channel given by
Φ_0(ρ) = vv^* for all ρ ∈ Rd. (4.29)
It is easy to verify that {ve_1^*, . . . , ve_d^*} is a set of Kraus operators for Φ_0, where {e_1, . . . , e_d} is the canonical basis of Cd. We then have S_{Φ_0} = span{e_i v^* v e_j^* : i, j ∈ [d]} = Md, and since Φ_0 ∈ C_{d²}(S), we have S = Md.
(ii) ⇒ (iii). For any operator system S ⊆ Md, Theorems 4.4.4 and 4.1.14 give that
1 ≤ θ(S) ≤ θ̂(S). (4.30)
Recalling Theorem 4.4.3, we then have
θ̂(S)^{-1} = sup{ inf{‖Φ(ρ)‖ : ρ ∈ Rd} : Φ ∈ C(S) } ≤ 1. (4.31)
If S = Md, then the channel Φ_0 defined in (4.29) satisfies Φ_0 ∈ C(S), and so
θ̂(Md)^{-1} ≥ inf{‖Φ_0(ρ)‖ : ρ ∈ Rd} = ‖vv^*‖ = 1,
and we conclude that θ̂(Md) = 1.
(iii) ⇒ (i). This is immediate from (4.30).
Having considered the case θ(S) = θ̂(S) = 1, we now examine the case that θ(S) = θ̂(S) = d.
Proposition 4.4.7. For an operator system S ⊆ Md, it holds that θ(S) = θ̂(S) = d if and only if Φ(Id) ≤ I_{d²} for all Φ ∈ C_{d²}(S).
Proof. Theorem 4.4.4 gives θ(S) ≤ θ̂(S) ≤ d, and so θ(S) = θ̂(S) = d if and only if θ(S) = d. By Corollary 4.1.16 the latter condition is equivalent to θ_{d²}(S) = d, which by Lemma 3.2.36 and Corollary 3.2.30 happens if and only if Id ∈ th_{d²}(S). The result is then immediate from (3.34).
Remark 4.4.8. We know of no operator system S satisfying θ̂(S) > θ(S), but nor do we have a general proof of equality: whether θ̂(S) = θ(S) for all non-commutative graphs S remains an important open question.
Corollary 3.2.30 showed for an operator system S ⊆ Md that θ(S) can be computed using the channels in C_{d²}(S). This raises another open question.
Question 4.4.9. Given d ∈ N, does there exist k ∈ N (depending on d) such that, for every non-commutative graph S ⊆ Md, the parameter θ̂(S) can be computed using the channels in Ck(S)?
4.5 Capacity bounds, the Witsenhausen rate and other limits
In the classical case, (3.3) shows that θ(G) is an upper bound on c(G), the Shannon capacity of G. This section begins by discussing upper bounds on c(S), the Shannon capacity of an operator system S, and establishes that θ̂(S) and Ω̄f(S) are two such upper bounds. By demonstrating that some of the parameters associated with operator systems and operator anti-systems satisfy either sub-multiplicativity or super-multiplicativity conditions, we show the existence of a number of limits, one of which will be seen to be a quantum generalisation of the Witsenhausen rate.
We first recall that any upper bound on the independence number which is sub-multiplicative over tensor products is an upper bound on c(S).
Lemma 4.5.1. Suppose that the real parameter β(S) satisfies β(S) ≥ α(S) for every operator
system S, and suppose further that the sub-multiplicativity condition β(S ⊗ T ) ≤ β(S)β(T )
holds for all operator systems S, T . Then c(S) ≤ β(S).
Proof. We have
c(S) = lim_{n→∞} α(S^{⊗n})^{1/n} ≤ lim inf_{n→∞} β(S^{⊗n})^{1/n} ≤ β(S),
as claimed.
In [13] it is shown that ϑ̃(S) satisfies the conditions of Lemma 4.5.1, and it is therefore an upper bound on the Shannon capacity. We now show that the same can be said for θ̂(S).
Proposition 4.5.2. Let S_1 ⊆ M_{d_1} and S_2 ⊆ M_{d_2} be operator systems. Then
θ̂(S_1 ⊗ S_2) ≤ θ̂(S_1)θ̂(S_2).
Proof. For arbitrarily small ε > 0 and for each i ∈ {1, 2}, by Definition 4.4.1 we can choose quantum channels Φ_i : M_{d_i} → M_{k_i} with S_{Φ_i} ⊆ S_i and σ_i ∈ M_{k_i}^+ satisfying Tr(σ_i) ≤ 1 such that each Φ_i^*(σ_i) is invertible and satisfies
‖Φ_i^*(σ_i)^{-1}‖ ≤ θ̂(S_i) + ε.
By Lemma 3.1.36, Φ_1 ⊗ Φ_2 : M_{d_1} ⊗ M_{d_2} → M_{k_1} ⊗ M_{k_2} is a quantum channel satisfying S_{Φ_1⊗Φ_2} ⊆ S_1 ⊗ S_2, and so Φ_1 ⊗ Φ_2 ∈ C(S_1 ⊗ S_2). Properties of the tensor product as in Definition B.0.3 give that (Φ_1 ⊗ Φ_2)^*(σ_1 ⊗ σ_2) = Φ_1^*(σ_1) ⊗ Φ_2^*(σ_2) is invertible with
((Φ_1 ⊗ Φ_2)^*(σ_1 ⊗ σ_2))^{-1} = Φ_1^*(σ_1)^{-1} ⊗ Φ_2^*(σ_2)^{-1},
and that σ_1 ⊗ σ_2 ∈ M_{k_1} ⊗ M_{k_2} satisfies Tr(σ_1 ⊗ σ_2) = (Tr σ_1)(Tr σ_2) ≤ 1. From Definition 4.4.1 we then have
θ̂(S_1 ⊗ S_2) ≤ ‖((Φ_1 ⊗ Φ_2)^*(σ_1 ⊗ σ_2))^{-1}‖ = ‖Φ_1^*(σ_1)^{-1} ⊗ Φ_2^*(σ_2)^{-1}‖ = ‖Φ_1^*(σ_1)^{-1}‖ ‖Φ_2^*(σ_2)^{-1}‖ ≤ (θ̂(S_1) + ε)(θ̂(S_2) + ε).
The conclusion follows by letting ε → 0.
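The tensor-product facts invoked in the proof — invertibility of Φ_1^*(σ_1) ⊗ Φ_2^*(σ_2) with (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}, multiplicativity of the operator norm, and multiplicativity of the trace — are easy to confirm numerically; a minimal NumPy check:

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_pos(n):
    """Random invertible positive matrix."""
    B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return B.conj().T @ B + np.eye(n)

A, B = rand_pos(2), rand_pos(3)

# (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
print(np.allclose(np.linalg.inv(np.kron(A, B)),
                  np.kron(np.linalg.inv(A), np.linalg.inv(B))))  # True

# ‖A ⊗ B‖ = ‖A‖ ‖B‖ (operator norm) and Tr(A ⊗ B) = (Tr A)(Tr B)
norm = lambda X: np.linalg.norm(X, ord=2)
print(np.isclose(norm(np.kron(A, B)), norm(A) * norm(B)))              # True
print(np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B)))  # True
```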
The next corollary is immediate, and establishes that θ̂(S) is an upper bound on c(S). Remark 4.5.4 shows further that it can be an arbitrarily more efficient upper bound than ϑ̃(S).
Corollary 4.5.3. Let S be a non-commutative graph. Then
α(S) ≤ c(S) ≤ θ̂(S).
Proof. Theorems 4.1.14 and 4.4.4 give that θ̂(S) ≥ θ(S) ≥ α(S), whence the upper bound on c(S) is immediate from Lemma 4.5.1 and Proposition 4.5.2. The lower bound on c(S) follows from the super-multiplicativity of the independence number given in Lemma 3.1.37.
Whether θ̂ is multiplicative over tensor products remains an open question. Recalling from Section 4.4 the definitions and properties of ϑ and ϑ̃ as introduced in [13], we make the following remarks. We note that [13, Corollary 10] gives that ϑ̃ is multiplicative, whereas [13, Lemma 4] gives that ϑ is super-multiplicative.
Remark 4.5.4. Both ϑ̃(S) and θ̂(S) are upper bounds on c(S). These two parameters, however, are not in general equal; indeed, Proposition 4.6.1 gives that θ̂(CId) = d, whereas [13, equations (6) and (7)] give ϑ̃(CId) = d², showing that the ratio θ̂(S)/ϑ̃(S) can be arbitrarily small. It is not known, however, if θ̂(S) ≤ ϑ̃(S) in general.
Remark 4.5.5. It is useful to compare further the various quantum generalisations of θ(G) which we have discussed, to see if any may in fact be identical. Given that θ(S) ≤ θ̂(S), it is clear from Remark 4.5.4 that θ(S) and ϑ̃(S) are not in general equal. Since θ̂ is sub-multiplicative in the sense of Proposition 4.5.2, and it is shown in [13, p. 9-10] that ϑ is not, it is clear that θ̂(S) ≠ ϑ(S) in general. We recall from Corollary 4.1.24 that θ(Mm(S)) = θ(S) for m ∈ N, but that ϑ lacks this stability (see [13, p. 10]), and thus θ(S) ≠ ϑ(S) in general. On the other hand, as noted in Remark 4.4.8, we cannot rule out the possibility that θ̂(S) = θ(S) for all operator systems S.
Proposition 4.5.6. The non-commutative graph parameters χ and χf are sub-multiplicative over tensor products; that is, for non-commutative graphs S ⊆ Mc and T ⊆ Md, we have
χ(S ⊗ T) ≤ χ(S)χ(T)
and
χf(S ⊗ T) ≤ χf(S)χf(T).
Proof. In the proof of Lemma 3.1.37 it was shown that if {u_i : i = 1, . . . , m} is S-independent and {v_i : i = 1, . . . , n} is T-independent, then {u_i ⊗ v_j : (i, j) ∈ [m] × [n]} is S ⊗ T-independent. In this case let P be the S-abelian projection P = ∑_{i=1}^m u_i u_i^* and let Q be the T-abelian projection Q = ∑_{i=1}^n v_i v_i^*. We note that
P ⊗ Q = ∑_{i∈[m], j∈[n]} (u_i u_i^*) ⊗ (v_j v_j^*) = ∑_{i∈[m], j∈[n]} (u_i ⊗ v_j)(u_i ⊗ v_j)^*, (4.32)
and so P ⊗ Q is an S ⊗ T-abelian projection.
We begin with the case of the fractional chromatic number. By the definition of χf(S) in (4.5), for arbitrary δ > 0 we can find positive weightings λ_1, . . . , λ_k and S-abelian projections P_1, . . . , P_k such that
∑_{i=1}^k λ_i ≤ χf(S) + δ with ∑_{i=1}^k λ_i P_i ≥ I_c.
Similarly, for arbitrary ε > 0 we can find positive weightings μ_1, . . . , μ_l and T-abelian projections Q_1, . . . , Q_l such that
∑_{i=1}^l μ_i ≤ χf(T) + ε with ∑_{i=1}^l μ_i Q_i ≥ I_d.
Now observe that
∑_{i∈[k], j∈[l]} λ_i μ_j P_i ⊗ Q_j = ( ∑_{i=1}^k λ_i P_i ) ⊗ ( ∑_{j=1}^l μ_j Q_j ) ≥ I_c ⊗ I_d = I_{cd},
where each P_i ⊗ Q_j is an S ⊗ T-abelian projection, giving that
χf(S ⊗ T) ≤ ∑_{i∈[k], j∈[l]} λ_i μ_j ≤ (χf(S) + δ)(χf(T) + ε),
and the claim follows on letting δ and ε tend to zero.
The corresponding result for the chromatic numbers is obtained in the same way by requiring λ_i, μ_j ∈ {0, 1} and setting ε = δ = 0.
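Identity (4.32), on which the proof rests, is a pointwise Kronecker-product computation; the following NumPy sketch (with arbitrary small dimensions) confirms it for random orthonormal families.

```python
import numpy as np

rng = np.random.default_rng(4)

def ortho_cols(n, m):
    """m orthonormal columns in C^n, obtained from a QR factorisation."""
    Q, _ = np.linalg.qr(rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m)))
    return Q[:, :m]

U = ortho_cols(4, 2)   # columns u_1, u_2
V = ortho_cols(3, 2)   # columns v_1, v_2
P = U @ U.conj().T     # P = sum_i u_i u_i^*
Q = V @ V.conj().T     # Q = sum_j v_j v_j^*

# (4.32): P ⊗ Q = sum_{i,j} (u_i ⊗ v_j)(u_i ⊗ v_j)^*
rhs = sum(np.outer(np.kron(U[:, i], V[:, j]), np.kron(U[:, i], V[:, j]).conj())
          for i in range(2) for j in range(2))
print(np.allclose(np.kron(P, Q), rhs))  # True
```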
We now return to the quantum setting of the 'side information' problem considered in Section 3.1.4 by establishing the existence of the limit
lim_{n→∞} (1/n) log χ(S_Φ^{⊗n})
for a quantum channel Φ.
Proposition 4.5.7. For an operator system S, the limit lim_{n→∞} (1/n) log χ(S^{⊗n}) exists and is equal to inf_{n∈N} (1/n) log χ(S^{⊗n}).
Proof. Let a_n = log χ(S^{⊗n}) and consider the sequence (a_n)_{n∈N}. By Proposition 4.5.6,
a_{n+m} = log χ(S^{⊗(n+m)}) ≤ log χ(S^{⊗n}) + log χ(S^{⊗m}) = a_n + a_m;
the sequence (a_n)_{n∈N} is therefore sub-additive, and the claim follows by Lemma 3.1.6.
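Lemma 3.1.6 is Fekete's lemma for sub-additive sequences. Since χ(S^{⊗n}) is not directly computable here, the illustrative sketch below checks the lemma's conclusion on the artificial sub-additive sequence a_n = √n + 1, for which a_n/n decreases towards inf_n a_n/n = 0.

```python
import numpy as np

# Fekete's lemma: for a sub-additive sequence (a_n), lim a_n / n
# exists and equals inf_n a_n / n.
a = lambda n: np.sqrt(n) + 1.0

# sub-additivity: a(n + m) <= a(n) + a(m)
print(all(a(n + m) <= a(n) + a(m)
          for n in range(1, 40) for m in range(1, 40)))  # True

ratios = [a(n) / n for n in range(1, 5001)]
# The ratios decrease monotonically towards the infimum (here 0).
print(min(ratios), ratios[-1])
```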
Definition 4.5.8. For an operator system S, we define the Witsenhausen rate of S by letting
R(S) = lim_{n→∞} (1/n) log χ(S^{⊗n}).
Recalling the definition of the Witsenhausen rate R(G) of a graph G from Definition 3.1.9,
we have the following result.
Proposition 4.5.9. For a graph G it holds that R(SG) = R(G).
Proof. This is immediate from (4.11) on page 130 and Lemma 3.1.35.
A similar method to that used in Proposition 4.5.6 establishes that both the full covering
number and its fractional version are also sub-multiplicative over tensor products. We note
that this answers [6, Question 7.5].
Proposition 4.5.10. Let S and T be non-commutative graphs. It holds that
Ω̄(S ⊗ T) ≤ Ω̄(S)Ω̄(T)
and
Ω̄f(S ⊗ T) ≤ Ω̄f(S)Ω̄f(T).
Proof. Suppose the sets {u_1, . . . , u_m} and {v_1, . . . , v_n} are S-full and T-full respectively. As in (3.12), it is clear that the set {u_i ⊗ v_j : i ∈ [m], j ∈ [n]} is orthonormal. Furthermore, by Definition 3.2.1, for all i, k ∈ [m] and j, l ∈ [n] we have
(u_i ⊗ v_j)(u_k ⊗ v_l)^* = u_i u_k^* ⊗ v_j v_l^* ∈ S ⊗ T, (4.33)
and so the set {u_i ⊗ v_j : i ∈ [m], j ∈ [n]} is S ⊗ T-full. Let P be the S-full projection given by P = ∑_{i=1}^m u_i u_i^*, and Q be the T-full projection given by Q = ∑_{i=1}^n v_i v_i^*. As in (4.32) we have P ⊗ Q = ∑_{i∈[m], j∈[n]} (u_i ⊗ v_j)(u_i ⊗ v_j)^*, and P ⊗ Q is an S ⊗ T-full projection. The sub-multiplicativity of the parameters Ω̄ and Ω̄f now follows from Definition 4.1.8 and the argument of Proposition 4.5.6.
Remark 4.5.11. We note that corresponding results for the clique covering number and its
fractional version do not follow by the same argument. It is clear from Definition 3.2.1
that when working with cliques, (4.33) holds if i 6= k and j 6= l, but not necessarily for all
(i, j) 6= (k, l).
Corollary 4.5.12. Let S be a non-commutative graph. Then c(S) ≤ Ω̄f(S).
Proof. Proposition 4.5.10 and Theorem 4.1.14 (i) show that the fractional full covering number satisfies the conditions of Lemma 4.5.1 to be an upper bound on the Shannon capacity.
Since (4.20) on page 139 gives that χs(T) = Ω̄(T^⊥) for an operator anti-system T, Proposition 4.5.10 allows us to establish the sub-multiplicativity of the strong chromatic number, this time over co-normal products. (For the reason noted in Remark 4.5.11, an equivalent result for the chromatic number of an operator anti-system, which equals the clique covering number of T^⊥, does not follow.)
Corollary 4.5.13. For operator anti-systems T_1 and T_2,
χs(T_1 ∗ T_2) ≤ χs(T_1)χs(T_2).
Proof. Using (4.20), Lemma 4.2.9 and Proposition 4.5.10, we have
χs(T_1 ∗ T_2) = Ω̄((T_1 ∗ T_2)^⊥) = Ω̄(T_1^⊥ ⊗ T_2^⊥) ≤ Ω̄(T_1^⊥)Ω̄(T_2^⊥) = χs(T_1)χs(T_2),
as required.
In the next proposition we let T^n denote the nth co-normal power of the operator anti-system T, just as we have used G^n to denote the nth co-normal power of the graph G.
Proposition 4.5.14. For an operator anti-system T, the limit lim_{n→∞} (1/n) log χs(T^n) exists and is equal to inf_{n∈N} (1/n) log χs(T^n). Similarly, for an operator system S, the limit lim_{n→∞} (1/n) log Ω̄(S^{⊗n}) exists and is given by inf_{n∈N} (1/n) log Ω̄(S^{⊗n}). If S = T^⊥, then these two limits are equal.
Proof. We use the method of Proposition 4.5.7 and the sub-multiplicativity properties of χs and Ω̄. The final assertion is immediate by (4.20) and Lemma 4.2.9.
To consider the significance of these limits, first recall from [46, Corollary 3.4.3] that for a graph G it holds that
lim_{n→∞} χ(G^n)^{1/n} = χf(G), (4.34)
with 'complementary' version
lim_{n→∞} χ̄(G^{⊠n})^{1/n} = χ̄f(G),
a result first due to Posner and McEliece in [30]. When T = T_G for a graph G, Lemma 4.2.3 and (4.21) give that χs(T_G^n) = χs(T_{G^n}) = χ(G^n). Then by (4.34), Proposition 4.1.11 and (4.19) on page 138,
lim_{n→∞} (1/n) log χs(T_G^n) = lim_{n→∞} (1/n) log χ(G^n) = log χf(G) = log Ω̄f(S_Ḡ) = log Ω̄f(T_G^⊥).
Remembering that Ω̄f(T^⊥) is the fractional version of χs(T), comparison with (4.34) leads us to ask the following open question.
Question 4.5.15. Does it hold that
lim_{n→∞} χs(T^n)^{1/n} = Ω̄f(T^⊥)
for every operator anti-system T? (It is, of course, equivalent to ask whether the 'complementary' result
lim_{n→∞} Ω̄(S^{⊗n})^{1/n} = Ω̄f(S)
holds for every operator system S.)
We now wish to consider the full and clique numbers of tensor products of operator systems. We recall that both of these parameters are quantum generalisations of the clique number of a graph, which has the following well-known multiplicativity property over the strong product. (The proof is immediate from Corollary 1.3.4 and Lemma 4.2.8.)
Lemma 4.5.16. ([15, Chapter 7, Exercise 13].) If F and G are graphs, then ω(F ⊠ G) = ω(F)ω(G).
We now consider the non-commutative case.
Proposition 4.5.17. For operator systems S, T it holds that
ω̄(S ⊗ T) ≥ ω̄(S)ω̄(T).
Proof. Let ω̄(S) = p and ω̄(T) = q. By Lemma 4.1.1 we can choose an S-full set {u_1, . . . , u_p} and a T-full set {v_1, . . . , v_q}. As shown in Proposition 4.5.10, the set {u_i ⊗ v_j : i ∈ [p], j ∈ [q]} is S ⊗ T-full, and so ω̄(S ⊗ T) ≥ pq.
It is not known if ω̄ is multiplicative.
Corollary 4.5.18. The limit lim_{n→∞} ω̄(S^{⊗n})^{1/n} exists for any operator system S. If G is a graph, then ω̄(S_G^{⊗n})^{1/n} = ω(G) for all n ∈ N.
Proof. To prove the existence of the limit, apply the method used to prove Corollary 3.1.7. In the case that G is a graph, Lemma 3.1.35 and Proposition 4.1.11 give that ω̄(S_G^{⊗n}) = ω̄(S_{G^{⊠n}}) = ω(G^{⊠n}). Lemma 4.5.16 completes the proof.
We leave the value of the limit in Corollary 4.5.18 for an operator system not of the form
SG for a graph G as an open question.
Remark 4.5.19. For the reason discussed in Remark 4.5.11, the method used to prove Proposition 4.5.17 cannot be used to prove super-multiplicativity of clique numbers.
The next proposition examines the clique number of tensor products, where the behaviour
is more subtle.
Proposition 4.5.20. Let S and T be operator systems.
(i) It holds that ω(S ⊗ T) ≥ min{ω(S), ω(T)}.
(ii) If ω̄(T) ≥ 1, then ω(S ⊗ T) ≥ ω(S).
Proof. (i) Without loss of generality, let ω(S) = p ≤ q = ω(T), and choose an S-clique {u_1, . . . , u_p} and a T-clique {v_1, . . . , v_q}. Now consider the set B = {u_i ⊗ v_i : i ∈ [p]}. The orthonormality of B is immediate. For i ≠ j note that
(u_i ⊗ v_i)(u_j ⊗ v_j)^* = (u_i u_j^*) ⊗ (v_i v_j^*) ∈ S ⊗ T,
where we used that u_i u_j^* ∈ S and v_i v_j^* ∈ T for i ≠ j. This suffices to show that B is an S ⊗ T-clique, and the result follows.
(ii) Since ω̄(T) ≥ 1, we can find a T-full projection vv^*. Again let {u_1, . . . , u_p} be an S-clique, where ω(S) = p. The set C = {u_i ⊗ v : i ∈ [p]} is trivially orthonormal, and furthermore (u_i ⊗ v)(u_j ⊗ v)^* = u_i u_j^* ⊗ vv^* ∈ S ⊗ T for i ≠ j. It follows that C is an S ⊗ T-clique, and ω(S ⊗ T) ≥ ω(S).
Bearing in mind the multiplicativity of the clique number over strong graph products given by Lemma 4.5.16, one might intuitively expect a stronger result than Proposition 4.5.20 (i) to hold. Questions about the clique numbers of tensor products of operator systems remain, but there seems to be no trivial way to strengthen Proposition 4.5.20 (i) in general. Indeed, Example 4.6.20 will establish that there exist operator systems S and T satisfying ω(S ⊗ T) < ω(T). Further work could usefully consider the behaviour of the sequence (ω(S^{⊗n}))_{n∈N} and examine both lim inf_{n→∞} ω(S^{⊗n})^{1/n} and lim sup_{n→∞} ω(S^{⊗n})^{1/n}.
4.6 Some examples
If G is a graph with associated operator system SG, then we have already seen that the various
non-commutative graph parameters of SG assume values equal to the corresponding graph
parameters of G. (Indeed, we regard this as a necessary property of a valid generalisation
to the non-commutative setting.) For instance, Proposition 4.1.11 gave that α(S_G) = α(G), ω(S_G) = ω(G), θ(S_G) = θ(G) and ωf(S_G) = ωf(G). We note the cases where G is the complete graph K_d, giving S_G = Md, and where G is the empty graph K̄_d, giving S_G = Dd. In this section we want to give examples of operator systems not associated to any graph G, and consider their corresponding quantum channels.
and consider their corresponding quantum channels.
The identity channel I : Md → Md given by I(ρ) = ρ for all ρ ∈ Md may be implemented with the single Kraus operator Id, and thus has associated operator system CId := {αId : α ∈ C}. Below we evaluate some non-commutative graph parameters corresponding to this case. Although this operator system is somewhat trivial, parameters associated with fp(CId) are seen to display an interesting change of behaviour as d increases from 1 to 2.
Proposition 4.6.1. For d ∈ N we have
α(CId) = θ(CId) = θ̂(CId) = Ωf(CId) = Ω(CId) = d,
ω(CId) = ωf(CId) = χf(CId) = χ(CId) = 1,
Ω̄f(CId) = Ω̄(CId) = { 1 if d = 1; ∞ if d ≥ 2 },
ω̄(CId) = { 1 if d = 1; 0 if d ≥ 2 },
H(CId, ρ) = 0 for all ρ ∈ Rd.
Proof. For orthonormal u, v ∈ Cd we have ⟨u, v⟩ = ⟨uv^*, Id⟩ = 0, and so uv^* ∈ CId^⊥. It follows that a projection in Md is a CId-clique projection if and only if it has rank one, giving ω(CId) = 1 and Ω(CId) = d. The operator system CId is clearly commutative, and so Proposition 4.3.10 gives that α(CId) = d and χ(CId) = χf(CId) = 1. From the inequalities in Theorems 4.1.14 and 4.4.4 and (4.8) we then obtain θ(CId) = θ̂(CId) = Ωf(CId) = Ω(CId) = d. Now note that if d = 1 we have e_1e_1^* ∈ fp(CI_1) and fp(CI_1) = [0, 1] = fp(CI_1)♯. This gives ω̄(CI_1) = Ω̄(CI_1) = Ω̄f(CI_1) = 1. However, if d ≥ 2, no unit vector v satisfies vv^* ∈ CId. Then fp(CId) = {0} and fp(CId)♯ = M_d^+, giving ω̄(CId) = 0, Ω̄(CId) = ∞ and Ω̄f(CId) = ∞ by (4.8) and Theorem 4.1.9. That H(CId, ρ) = 0 for all ρ ∈ Rd follows from Proposition 4.3.10.
Corollary 4.6.2. For the operator system CId with d ∈ N, it holds that c(CId) = d and R(CId) = 0.
Proof. To obtain the Shannon capacity, first note that α((CId)^{⊗n}) = α(CI_{d^n}) = d^n, giving c(CId) = d. Similarly, we have χ((CId)^{⊗n}) = χ(CI_{d^n}) = 1, and so R(CId) = 0.
In the proof of Proposition 4.4.6 we had a channel Φ_0 : Md → M_{d²} which was shown to be 'trivial' in the sense that it mapped all input states onto the same pure state vv^*. For a fixed state σ ∈ Rk we now define the trivial channel Φ_σ : Md → Mk by Φ_σ(M) = (Tr M)σ for all M ∈ Md. It is easy to see that Φ_σ is trace preserving and satisfies Φ_σ(ρ) = σ for all states ρ ∈ Rd.
Proposition 4.6.3. The trivial channel Φ_σ defined above is a c.p.t.p. map with associated operator system S_{Φ_σ} = Md.
Proof. Let σ = ∑_{i=1}^k μ_i η_i η_i^* with μ_i ∈ R+ and {η_1, . . . , η_k} ⊆ Ck an orthonormal basis. For M ∈ Md, observe that
∑_{i∈[k], j∈[d]} (√μ_i η_i e_j^*) M (√μ_i e_j η_i^*) = ∑_{i∈[k]} (√μ_i η_i) ( ∑_{j∈[d]} e_j^* M e_j ) (√μ_i η_i^*) = (Tr M)σ = Φ_σ(M),
where {e_1, . . . , e_d} is the canonical basis of Cd. We also have that
∑_{i∈[k], j∈[d]} (√μ_i η_i e_j^*)^* (√μ_i η_i e_j^*) = ∑_{i∈[k]} μ_i ∑_{j∈[d]} e_j e_j^* = Id,
using that ∑_{i∈[k]} μ_i = Tr σ = 1. It follows by Proposition 3.1.18 that Φ_σ is a c.p.t.p. map with Kraus operators {√μ_i η_i e_j^* : i ∈ [k], j ∈ [d]}, and the operator system S_{Φ_σ} associated with Φ_σ is given by
S_{Φ_σ} = span{√(μ_i μ_m) e_j η_i^* η_m e_l^* : i, m ∈ [k], j, l ∈ [d]} = span{e_j e_l^* : j, l ∈ [d]} = Md,
using that μ_i > 0 for some i ∈ [k].
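The two displayed identities can be verified numerically for a concrete σ; the sketch below (illustrative, with the arbitrary spectrum (0.7, 0.3) and a random eigenbasis) builds the Kraus operators √μ_i η_i e_j^* and checks completeness and the action M ↦ (Tr M)σ.

```python
import numpy as np

rng = np.random.default_rng(3)
d, k = 3, 2

# A fixed state σ on C^k with spectral decomposition σ = Σ μ_i η_i η_i^*.
mu = np.array([0.7, 0.3])
eta, _ = np.linalg.qr(rng.normal(size=(k, k)) + 1j * rng.normal(size=(k, k)))
sigma = sum(mu[i] * np.outer(eta[:, i], eta[:, i].conj()) for i in range(k))

# Kraus operators sqrt(μ_i) η_i e_j^*  (k*d of them, each k×d).
E = np.eye(d)
kraus = [np.sqrt(mu[i]) * np.outer(eta[:, i], E[:, j].conj())
         for i in range(k) for j in range(d)]

# Completeness: Σ K^* K = I_d, so the map is trace preserving.
print(np.allclose(sum(K.conj().T @ K for K in kraus), np.eye(d)))  # True

# The channel sends every input M to (Tr M) σ.
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
out = sum(K @ M @ K.conj().T for K in kraus)
print(np.allclose(out, np.trace(M) * sigma))  # True
```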
Remark 4.6.4. Having shown that the trivial channel Φ_σ defined above has associated operator system Md = S_{K_d}, it is immediate that all of its parameters are given by the corresponding graph parameters of the complete graph K_d.
We now introduce an operator system which, though less trivial, is still relatively simple to analyse.
Definition 4.6.5. We define the operator system Td ⊆ Md by Td = CId + CJd, where Id is the d × d identity matrix and Jd is the d × d all-ones matrix.
That is, M ∈ Td if and only if M ∈ Md is of the form
M = \begin{pmatrix} λ & μ & \cdots & μ \\ μ & λ & \cdots & μ \\ \vdots & \vdots & \ddots & \vdots \\ μ & μ & \cdots & λ \end{pmatrix} = (λ − μ)Id + μJd
with λ, μ ∈ C. (Of course, T_1 = CI_1.) Note that Td = span{Id, Jd} and that Id and Jd commute, so Td is an example of the operator systems considered in Remark 4.3.11.
Proposition 4.6.6. For all d ∈ N,
α(Td) = θ(Td) = θ̂(Td) = Ωf(Td) = Ω(Td) = d;
ω(Td) = ωf(Td) = χf(Td) = χ(Td) = 1;
H(Td, ρ) = 0 for all ρ ∈ Rd.
Proof. As Td is commutative, we can apply Proposition 4.3.10. Theorems 4.4.4 and 4.1.14, along with (4.8) on page 128, give the remaining results.
Corollary 4.6.7. For d ∈ N, we have c(Td) = d and R(Td) = 0.
Proof. Proposition 4.6.6 and Corollary 4.5.3 give c(Td) = d. Propositions 4.6.6 and 4.5.6 give χ(Td^{⊗n}) = 1 for all n ∈ N, whence we have R(Td) = 0.
Parameters for Td related to the full projection convex corner exhibit an interesting dependence on d, as given by the next two propositions. (Recall that T_1 = CI_1, so by Proposition 4.6.1, Ω̄(T_1) = Ω̄f(T_1) = ω̄(T_1) = 1.)
Proposition 4.6.8. We have Ω̄(T_2) = Ω̄f(T_2) = 2 and ω̄(T_2) = 1.
Proof. Suppose the unit vector v = (v_i)_{i=1}^2 ∈ C² satisfies vv^* ∈ T_2, so that {v} is T_2-full. The ij-entry of vv^* is v_i v̄_j. Since Tr(vv^*) = 1, for vv^* ∈ T_2 we must have |v_1|² = |v_2|² = 1/2. It is also required that v_1 v̄_2 = v_2 v̄_1. Multiplying both sides by v_1 v_2 gives v_1²|v_2|² = v_2²|v_1|², and so v_1² = v_2² and v_1 = ±v_2. Setting v_1 = e^{iθ}/√2 = ±v_2 with θ ∈ [0, 2π) gives
vv^* = \frac{1}{2}\begin{pmatrix} 1 & ±1 \\ ±1 & 1 \end{pmatrix} ∈ T_2, (4.35)
and we conclude that the T_2-full singleton sets are those of the form {(e^{iθ}/√2)(1, ±1)^T} with θ ∈ [0, 2π).
By Theorem 4.1.14 and Proposition 4.6.6, ω̄(T_2) ≤ ω(T_2) = 1, and we conclude that ω̄(T_2) = 1. It follows from Lemma 4.1.15 that Ω̄(T_2) ≥ 2. Now let u = (1/√2)(1, 1)^T and v = (1/√2)(1, −1)^T, and note that {u} and {v} are T_2-full sets and that uu^* + vv^* = I. This gives that Ω̄(T_2) = 2.
From (4.35) it can be seen that the only T_2-full projections are
P_1 = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} and P_2 = \frac{1}{2}\begin{pmatrix} 1 & −1 \\ −1 & 1 \end{pmatrix},
and fp(T_2) is the convex corner generated by {P_1, P_2}. Thus by Lemma 2.2.32, for a matrix
M = \begin{pmatrix} a & b \\ b̄ & d \end{pmatrix} ∈ M_2^+,
we have M ∈ fp(T_2)♯ if and only if the following two conditions are satisfied:
Tr(MP_1) = (1/2)(a + b + b̄ + d) ≤ 1
and
Tr(MP_2) = (1/2)(a − b − b̄ + d) ≤ 1.
It follows for any M ∈ fp(T_2)♯ that Tr M = a + d ≤ 2. We also see that I ∈ fp(T_2)♯, and can thus conclude by Theorem 4.1.9 that
Ω̄f(T_2) = max{Tr M : M ∈ fp(T_2)♯} = 2,
as required.
Example 4.6.9. To complete our discussion of T_2, we give an example of a quantum channel Φ satisfying S_Φ = T_2. It is easy to verify that the operators
\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} and \frac{1}{\sqrt{2}}\begin{pmatrix} 0 & 0 \\ 1 & −1 \end{pmatrix}
are Kraus operators for such a channel.
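The verification is a short computation: with K_1, K_2 as above, K_1^*K_1 = (1/2)J_2, K_2^*K_2 = I_2 − (1/2)J_2 and K_1^*K_2 = 0, so ∑ K_i^*K_i = I_2 and span{K_i^*K_j} = span{I_2, J_2} = T_2. A NumPy check:

```python
import numpy as np

K1 = np.array([[1, 1], [0, 0]]) / np.sqrt(2)
K2 = np.array([[0, 0], [1, -1]]) / np.sqrt(2)

I2, J2 = np.eye(2), np.ones((2, 2))

# Trace preservation: K1*K1 + K2*K2 = I2.
print(np.allclose(K1.T @ K1 + K2.T @ K2, I2))    # True

# The products K_i^* K_j land in T2 = span{I2, J2}:
print(np.allclose(K1.T @ K1, J2 / 2))            # True
print(np.allclose(K2.T @ K2, I2 - J2 / 2))       # True
print(np.allclose(K1.T @ K2, np.zeros((2, 2))))  # True
```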
We now analyse Td for d ≥ 3.
Proposition 4.6.10. For d ≥ 3 it holds that Ω(Td) = Ωf(Td) =∞ and ω(Td) = 1.
Proof. Let a unit vector v = (vi)_{i=1}^{d} ∈ C^d satisfy vv∗ ∈ Td. Since Tr(vv∗) = 1, this requires that |vi|² = 1/d for all i ∈ [d]. The following argument shows further that vi = vj for all i, j ∈ [d]: letting i, j, k ∈ [d] be pairwise distinct, we require vi v̄k = vj v̄k, and since vk ≠ 0 we obtain vi = vj as claimed. Then v = (e^{iθ}/√d)1 for some θ ∈ [0, 2π), giving vv∗ = (1/d)Jd ∈ Td.
Thus for d ≥ 3, the Td-full singleton sets are precisely those of the form { (e^{iθ}/√d)1 }, θ ∈ [0, 2π).
As for d = 2, for d ≥ 3 we have ω(Td) ≤ ω(Td) = 1 and we conclude that ω(Td) = 1. It also holds that the only Td-full projection is (1/d)Jd. Then for M ∈ Md^+ we have M ∈ fp(Td)♯ if and only if Tr(MJd) ≤ d. Let a unit vector w = (wi)_{i=1}^{d} ∈ C^d satisfy Σ_{i=1}^{d} wi = 0. Observe that ⟨w, 1⟩ = 0. Now for k ∈ R⁺ we form M = kww∗ ∈ Md^+, giving that
Tr(MJd) = k Tr(ww∗11∗) = k|⟨w, 1⟩|² = 0.
Hence we have M ∈ fp(Td)♯ for all k ∈ R⁺, and since Tr M = k it holds that Ωf(Td) = ∞. Finally note by (4.8) that Ω(Td) = ∞.
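The unboundedness argument above is elementary enough to check directly; the sketch below (our own illustration, taking d = 3 and w = (1, −1, 0)/√2) confirms that Tr(MJd) = 0 while Tr M = k grows without bound.

```python
d = 3
r = 2 ** -0.5
w = [r, -r, 0.0]          # unit vector with <w, 1> = 0

def trace_MJ(M):
    # Tr(M J_d) equals the sum of all entries of M, J_d being the all-ones matrix.
    return sum(M[i][j] for i in range(d) for j in range(d))

for k in [1.0, 10.0, 1e6]:
    M = [[k * w[i] * w[j] for j in range(d)] for i in range(d)]   # M = k w w*
    assert abs(trace_MJ(M)) < 1e-6 * max(k, 1.0)   # Tr(M Jd) = 0 <= d for every k
    assert abs(sum(M[i][i] for i in range(d)) - k) < 1e-6 * k     # Tr M = k, unbounded
```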
It is instructive to give an example of a channel Φ satisfying SΦ = T3.
Example 4.6.11. The operators
\[ A_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&1\\ 0&0&0\\ 0&0&0\\ 0&0&0 \end{pmatrix}, \qquad A_2 = \frac{1}{\sqrt{8}} \begin{pmatrix} 0&1&1\\ 1&0&1\\ 1&1&0\\ 1&-1&0\\ 0&1&-1\\ 1&0&-1 \end{pmatrix} \]
satisfy A1∗A1 = A2∗A2 = (1/2)I3 and A2∗A1 = A1∗A2 = (1/4)(J3 − I3). Thus the map Φ : M3 → M6 given by Φ(ρ) = A1ρA1∗ + A2ρA2∗ for ρ ∈ M3 is a quantum channel with
SΦ = span{Ai∗Aj : i, j ∈ {1, 2}} = T3.
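The stated identities for A1 and A2 can be confirmed by direct computation; the sketch below (our own verification code, using real matrices so that the adjoint is the transpose) checks them numerically.

```python
r2 = 2 ** -0.5
r8 = 8 ** -0.5

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def T(A):  # transpose; all entries are real, so this is the adjoint
    return [list(row) for row in zip(*A)]

def close(A, B):
    return all(abs(a - b) < 1e-12 for ra, rb in zip(A, B) for a, b in zip(ra, rb))

A1 = [[r2, 0, 0], [0, r2, 0], [0, 0, r2],
      [0, 0, 0], [0, 0, 0], [0, 0, 0]]
A2 = [[0, r8, r8], [r8, 0, r8], [r8, r8, 0],
      [r8, -r8, 0], [0, r8, -r8], [r8, 0, -r8]]

I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
half_I3 = [[0.5 * I3[i][j] for j in range(3)] for i in range(3)]
quarter_J_minus_I = [[0.0 if i == j else 0.25 for j in range(3)] for i in range(3)]

assert close(mul(T(A1), A1), half_I3)            # A1*A1 = (1/2) I3
assert close(mul(T(A2), A2), half_I3)            # A2*A2 = (1/2) I3
assert close(mul(T(A1), A2), quarter_J_minus_I)  # A1*A2 = (1/4)(J3 - I3)
assert close(mul(T(A2), A1), quarter_J_minus_I)  # A2*A1 = (1/4)(J3 - I3)

# Trace preservation: A1*A1 + A2*A2 = I3, so Phi is indeed a channel.
P, Q = mul(T(A1), A1), mul(T(A2), A2)
assert close([[P[i][j] + Q[i][j] for j in range(3)] for i in range(3)], I3)
```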
We briefly consider tensor products of operator systems of the form Td, after the following two straightforward propositions.
Proposition 4.6.12. Consider operator systems Ri ⊆ Mdi where α(Ri) = di for i = 1, . . . , m. Then
α(R1 ⊗ · · · ⊗ Rm) = Ω(R1 ⊗ · · · ⊗ Rm) = Ωf(R1 ⊗ · · · ⊗ Rm) = θ(R1 ⊗ · · · ⊗ Rm) = d1 · · · dm.
Proof. Note that R1 ⊗ · · · ⊗ Rm ⊆ M_{d1···dm}, and by the super-multiplicativity of the independence number given in Lemma 3.1.37 we have α(R1 ⊗ · · · ⊗ Rm) ≥ d1 · · · dm. The results follow from the inequalities in Theorem 4.1.14 and (4.8) on page 128.
Proposition 4.6.13. Consider operator systems Ri ⊆ Mdi where χ(Ri) = 1 for i = 1, . . . , m. Then
ω(R1 ⊗ · · · ⊗ Rm) = χf(R1 ⊗ · · · ⊗ Rm) = χ(R1 ⊗ · · · ⊗ Rm) = 1.
Proof. By Proposition 4.5.6, χ(R1 ⊗ · · · ⊗ Rm) ≤ 1, and the results follow from Theorem 4.1.14, (4.8) and (4.10) on page 129.
Using the above two propositions, we see that Proposition 4.6.6 has the following corollary.
Corollary 4.6.14. It holds that
α(Td1 ⊗ . . .⊗ Tdm) = Ωf(Td1 ⊗ . . .⊗ Tdm) = Ω(Td1 ⊗ . . .⊗ Tdm)
= θ(Td1 ⊗ . . .⊗ Tdm) = d1 . . . dm,
and
ω(Td1 ⊗ . . .⊗ Tdm) = χf(Td1 ⊗ . . .⊗ Tdm) = χ(Td1 ⊗ . . .⊗ Tdm) = 1.
Next we discuss an operator system that has been widely considered in the literature; see, for example, [21] and [27].
Definition 4.6.15. The 'constant diagonal' operator system Sd is defined by
Sd = span{eie∗j , Id : i ≠ j} ⊆ Md, d ∈ N,
where {e1, . . . , ed} is the canonical basis of Cd.
For d ≥ 2, Sd is not commutative, and so it does not reduce to the rather trivial case of Proposition 4.3.10; nor is it equal to SG for any graph G. We should thus expect it to
exhibit non-trivial, genuinely quantum behaviour. In [27] it was shown that α(S2) = 1 and in
[21, Examples 4, 22] that χ(Sn) = χs(S⊥n ) = n. Here we extend these results by considering
tensor products of operator systems of this type and calculating the values of some of the
parameters introduced earlier. For n1, n2, . . . , nm ∈ N, let
Sn1,...,nm = Sn1 ⊗ Sn2 ⊗ · · · ⊗ Snm .
Lemma 4.6.16. Let u, v ∈ Cn1...nm be orthogonal vectors.
(i) If uv∗ ∈ S⊥n1,...,nm, then u = 0 or v = 0;
(ii) If uu∗, uv∗ ∈ Sn1,...,nm, then u = 0 or v = 0.
Proof. (i) Suppose first that uv∗ ∈ S⊥n1,...,nm. Let m = 1 and note that
\[ \mathcal{S}_{n_1}^{\perp} = \left\{ \begin{pmatrix} a_1 & \ldots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \ldots & a_{n_1} \end{pmatrix} : \sum_{i=1}^{n_1} a_i = 0 \right\}. \]
Write u = (ui)_{i=1}^{n1} and v = (vi)_{i=1}^{n1}. Then for uv∗ ∈ S⊥n1 we have that ui v̄j = 0 whenever i ≠ j and Σ_{i=1}^{n1} ui v̄i = 0. Suppose that uk ≠ 0 for some k ∈ [n1]. Then vj = 0 for all j ≠ k, and it follows that Σ_{i=1}^{n1} ui v̄i = uk v̄k = 0. This gives vk = 0, and hence v = 0, thus establishing the result in the case m = 1.
Proceeding by induction, suppose that the statement holds for some m. Note that
\[ \mathcal{S}_{n_1,\ldots,n_{m+1}} = \left\{ \begin{pmatrix} S & S_{1,2} & \ldots & S_{1,n_1} \\ S_{2,1} & S & \ldots & S_{2,n_1} \\ \vdots & \vdots & \ddots & \vdots \\ S_{n_1,1} & S_{n_1,2} & \ldots & S \end{pmatrix} : S, S_{i,j} \in \mathcal{S}_{n_2,\ldots,n_{m+1}} \right\}. \tag{4.36} \]
Thus, S⊥n1,...,nm+1 consists of all block matrices of the form
\[ \begin{pmatrix} D_{1,1} & D_{1,2} & \ldots & D_{1,n_1} \\ D_{2,1} & D_{2,2} & \ldots & D_{2,n_1} \\ \vdots & \vdots & \ddots & \vdots \\ D_{n_1,1} & D_{n_1,2} & \ldots & D_{n_1,n_1} \end{pmatrix}, \]
where
D_{i,j} ∈ S⊥n2,...,nm+1 for i ≠ j, (4.37)
and
Σ_{i=1}^{n1} D_{i,i} ∈ S⊥n2,...,nm+1. (4.38)
Write u and v as block column vectors,
u = (u^{(1)}, . . . , u^{(n1)}), v = (v^{(1)}, . . . , v^{(n1)}), with u^{(i)}, v^{(i)} ∈ C^{n2···nm+1}, i ∈ [n1],
and suppose that uv∗ = (u^{(i)}v^{(j)∗})_{i,j=1}^{n1} ∈ S⊥n1,...,nm+1. Assume that u^{(i)} ≠ 0 for some i ∈ [n1]. By (4.37), u^{(i)}v^{(j)∗} ∈ S⊥n2,...,nm+1 for all j ≠ i, and so by the induction assumption v^{(j)} = 0 whenever j ≠ i. Then Σ_{k=1}^{n1} u^{(k)}v^{(k)∗} = u^{(i)}v^{(i)∗} ∈ S⊥n2,...,nm+1 by (4.38), and by the induction assumption, v^{(i)} = 0; thus, v = 0.
(ii) Suppose that uu∗, uv∗ ∈ Sn1,...,nm with orthogonal u, v ∈ C^{n1···nm} and ‖u‖ = √k > 0. Write u = (ui)_{i=1}^{n1···nm} and v = (vi)_{i=1}^{n1···nm}. We make the following claim, which is clearly sufficient to prove the required result:
v = 0 and |ui|² = k/(n1 · · · nm) for all i ∈ [n1 · · · nm]. (4.39)
First we establish (4.39) when m = 1. If uu∗ = (ui ūj)_{i,j∈[n1]} ∈ Sn1 and ‖u‖² = k, then |ui|² = k/n1 for all i ∈ [n1]. If, in addition, uv∗ = (ui v̄j)_{i,j∈[n1]} ∈ Sn1, then ui v̄i = uj v̄j for all i, j ∈ [n1]. Since ⟨u, v⟩ = 0, we have that ui v̄i = 0 for all i ∈ [n1]; as each ui ≠ 0, this gives vi = 0 for all i ∈ [n1], which yields v = 0.
Proceeding by induction, suppose (4.39) holds under the stated conditions for some m. For orthogonal u, v ∈ C^{n1···nm+1}, write in block form
u = (u^{(1)}, . . . , u^{(n1)}), v = (v^{(1)}, . . . , v^{(n1)}), u^{(i)}, v^{(i)} ∈ C^{n2···nm+1}, i ∈ [n1].
Suppose that uu∗, uv∗ ∈ Sn1,...,nm+1 with ‖u‖ = √k > 0. Now uu∗ = (u^{(i)}u^{(j)∗})_{i,j∈[n1]} ∈ Sn1,...,nm+1, and so by (4.36) we have u^{(i)}u^{(i)∗} = u^{(j)}u^{(j)∗} ∈ Sn2,...,nm+1 for all i, j ∈ [n1]. Since ‖u^{(i)}‖² = Tr(u^{(i)}u^{(i)∗}), we then have ‖u^{(i)}‖ = ‖u^{(j)}‖ for all i, j ∈ [n1]. Since ‖u‖² = ⟨u, u⟩ = Σ_{i=1}^{n1} ‖u^{(i)}‖² = k, we have ‖u^{(i)}‖² = k/n1 for all i ∈ [n1]. Similarly, uv∗ = (u^{(i)}v^{(j)∗})_{i,j∈[n1]} ∈ Sn1,...,nm+1, and so u^{(i)}v^{(i)∗} = u^{(j)}v^{(j)∗} ∈ Sn2,...,nm+1 for all i, j ∈ [n1]. Since ⟨u^{(i)}, v^{(i)}⟩ = Tr(u^{(i)}v^{(i)∗}), this gives ⟨u^{(i)}, v^{(i)}⟩ = ⟨u^{(j)}, v^{(j)}⟩ for all i, j ∈ [n1]. Then, using that ⟨u, v⟩ = Σ_{i=1}^{n1} ⟨u^{(i)}, v^{(i)}⟩ = 0, we have ⟨u^{(i)}, v^{(i)}⟩ = 0 for all i ∈ [n1]. Thus we can apply the induction hypothesis to u^{(i)}, v^{(i)} for each i ∈ [n1] (using that ‖u^{(i)}‖² = k/n1) to obtain v^{(i)} = 0 for all i ∈ [n1] and |u^{(i)}_j|² = k/(n1 · · · nm+1) for all i ∈ [n1] and j ∈ [n2 · · · nm+1]. Then (4.39) holds for m + 1.
Proposition 4.6.17. Let n1, . . . , nm ∈ N. Then
(i) ap(Sn1,...,nm) = A_{In1···nm};
(ii) α(Sn1,...,nm) = 1;
(iii) c(Sn1,...,nm) = 1;
(iv) χf(Sn1,...,nm) = χ(Sn1,...,nm) = n1 · · · nm;
(v) R(Sn1,...,nm) = log(n1 · · · nm);
(vi) ω(Sn1,...,nm) = 1;
(vii) Ω(S2) = Ωf(S2) = 2 and Ω(Sn1,...,nm) ≥ Ωf(Sn1,...,nm) ≥ n1 · · · nm;
(viii) ω(Sn) = n and ω(Sn1,...,nm) ≥ min{n1, . . . , nm};
(ix) Ωf(Sn) = Ω(Sn) = 1.
Proof. (i) By Lemma 4.6.16 (i), all non-zero Sn1,...,nm-abelian projections have rank 1. Recalling the definition of AC in (2.3) on page 44, the claim then follows from Lemma 3.2.7; (ii) and (iii) follow immediately.
(iv) By (i) and Corollary 2.2.34, ap(Sn1,...,nm)♯ = B_{In1···nm}, and hence χf(Sn1,...,nm) = n1 · · · nm by Theorem 4.1.9. It follows from (4.8) that χ(Sn1,...,nm) = n1 · · · nm.
(v) By (iv), χ((Sn1,...,nm)^{⊗n}) = n1^n · · · nm^n, and applying Definition 4.5.8 yields the result.
(vi) By Lemma 4.6.16 (ii), any non-zero Sn1,...,nm-full projection has rank 1, and so we have fp(Sn1,...,nm) ⊆ A_{In1···nm} and ω(Sn1,...,nm) ≤ 1. Observe that if u = (1/√(n1 · · · nm))1 ∈ C^{n1···nm}, then uu∗ = (1/(n1 · · · nm))J_{n1···nm} ∈ Sn1,...,nm, giving that uu∗ is an Sn1,...,nm-full projection, and the result is proved.
(vii) From the proof of (vi) we have fp(Sn1,...,nm) ⊆ A_{In1···nm}, and so B_{In1···nm} ⊆ fp(Sn1,...,nm)♯. We thus have In1···nm ∈ fp(Sn1,...,nm)♯, and using (4.8) we obtain Ω(Sn1,...,nm) ≥ Ωf(Sn1,...,nm) ≥ n1 · · · nm by Theorem 4.1.9. Since T2 ⊆ S2, Lemma 4.1.13 and Proposition 4.6.8 give Ωf(S2) ≤ Ωf(T2) = 2 and Ω(S2) ≤ Ω(T2) = 2, yielding Ω(S2) = Ωf(S2) = 2.
(viii) It is easy to see that {e1, . . . , en}, the canonical basis of Cn, is an Sn-clique. By Theorem 4.1.14 we then have ω(Sn) = n. The inequality in the general case follows from Proposition 4.5.20.
(ix) By (viii) we have I ∈ cp(Sn), and applying Definition 4.1.6 gives Ω(Sn) = 1. The value of Ωf(Sn) follows from (4.8).
We have not yet determined the values of Ω(Sd) or Ωf(Sd) for d ≥ 3. It is perhaps surprising that an operator system with such a seemingly straightforward structure presents such challenges, but we have already noted that the behaviour of the full projection convex corner can be subtle, even in the simplest of cases. The difficulties associated with the operator system Sd only increase when we consider the parameters θ and θ: the determination of θ(Sd) and θ(Sd) for d ≥ 2 remains an open problem. We now outline what can be established in this case, before making a conjecture as to the values of θ(Sd) and θ(Sd). We begin by considering the quantum channels contained in Cn(Sd).
Lemma 4.6.18. If Φ ∈ Cn(Sd), then Φ has a Kraus representation
Φ(ρ) = Σ_{i=1}^{m} AiρAi∗ for ρ ∈ Md,
where for each i ∈ [m] we have Ai = √λi (c^{(i)}_1 . . . c^{(i)}_d) ∈ Mn,d, with λi ∈ R⁺ and c^{(i)}_1, . . . , c^{(i)}_d ∈ Cⁿ satisfying the following:
(i) for each j ∈ [d], the set {c^{(1)}_j, . . . , c^{(m)}_j} ⊂ Cⁿ is orthonormal; and
(ii) for all k, l ∈ [d], it holds that Σ_{i=1}^{m} λi ⟨c^{(i)}_k, c^{(i)}_l⟩ = δkl.
(We note that (i) requires m ≤ n and that setting k = l in (ii) gives Σ_{i=1}^{m} λi = 1.)
Proof. By Corollary 3.1.21, the quantum channel Φ : Md → Mn has a set of Kraus operators {A1, . . . , Am} ⊆ Mn,d satisfying
Tr(Ai∗Aj) = 0 for i ≠ j. (4.40)
For i ∈ [m], write Ai = (v^{(i)}_1 . . . v^{(i)}_d) ∈ Mn,d with v^{(i)}_1, . . . , v^{(i)}_d ∈ Cⁿ. This gives
\[ A_i^* A_j = \begin{pmatrix} \langle v^{(j)}_1, v^{(i)}_1 \rangle & \ldots & \langle v^{(j)}_d, v^{(i)}_1 \rangle \\ \vdots & \ddots & \vdots \\ \langle v^{(j)}_1, v^{(i)}_d \rangle & \ldots & \langle v^{(j)}_d, v^{(i)}_d \rangle \end{pmatrix} \in M_d. \tag{4.41} \]
If Φ ∈ Cn(Sd), then SΦ = span{Ai∗Aj : i, j ∈ [m]} ⊆ Sd, and from (4.41) it is then clear that
⟨v^{(j)}_k, v^{(i)}_k⟩ = ⟨v^{(j)}_l, v^{(i)}_l⟩ for all i, j ∈ [m], k, l ∈ [d]. (4.42)
Setting i = j in (4.42) yields that ‖v^{(i)}_k‖ = ‖v^{(i)}_l‖ for all k, l ∈ [d], and hence for all i ∈ [m] there exists λi ∈ R⁺ such that ‖v^{(i)}_k‖² = λi for all k ∈ [d]. For all i ∈ [m] and k ∈ [d] we set v^{(i)}_k = √λi c^{(i)}_k, giving
Ai = √λi (c^{(i)}_1 . . . c^{(i)}_d), with ‖c^{(i)}_k‖ = 1 for all i ∈ [m], k ∈ [d].
Since Ai ≠ 0, we note that λi > 0.
By (4.40) and (4.41),
Σ_{k=1}^{d} ⟨v^{(j)}_k, v^{(i)}_k⟩ = 0 when i ≠ j, (4.43)
and from (4.42) it follows that
⟨v^{(j)}_k, v^{(i)}_k⟩ = 0 for all k ∈ [d] when i ≠ j.
It is immediate that
⟨c^{(j)}_k, c^{(i)}_k⟩ = 0 for all k ∈ [d] when i ≠ j,
and (i) holds.
Since {A1, . . . , Am} is a set of Kraus operators, Proposition 3.1.18 gives Σ_{i=1}^{m} Ai∗Ai = Id. In (4.41) we set j = i, sum over i ∈ [m] and consider the (l, k)-entry to obtain
Σ_{i=1}^{m} ⟨v^{(i)}_k, v^{(i)}_l⟩ = Σ_{i=1}^{m} λi ⟨c^{(i)}_k, c^{(i)}_l⟩ = δlk,
and (ii) is proved.
Consider Φ ∈ Cn(Sd) with Kraus operators as given above. As a consequence of Lemma 4.6.18 (i), there exist unitaries U1, . . . , Ud ∈ Mn such that c^{(i)}_j = Uj ei for i ∈ [m] and j ∈ [d], where {e1, . . . , en} is the canonical basis of Cⁿ. Now set Λ = Σ_{i=1}^{m} λi ei ei∗ ∈ Dn⁺. Using Lemma 4.6.18 (ii),
⟨Ul∗Uk, Λ⟩ = Σ_{i=1}^{m} λi ⟨Uk ei, Ul ei⟩ = Σ_{i=1}^{m} λi ⟨c^{(i)}_k, c^{(i)}_l⟩ = δkl.
Setting k = l gives that Tr Λ = 1 and hence Λ ∈ Rn. Now observe that
\[ \Phi(I_d) = \sum_{i=1}^{m} \lambda_i \begin{pmatrix} c^{(i)}_1 & \ldots & c^{(i)}_d \end{pmatrix} \begin{pmatrix} c^{(i)*}_1 \\ \vdots \\ c^{(i)*}_d \end{pmatrix} = \sum_{i=1}^{m} \lambda_i \sum_{j=1}^{d} c^{(i)}_j c^{(i)*}_j = \sum_{i=1}^{m} \lambda_i \sum_{j=1}^{d} U_j e_i e_i^* U_j^* = \sum_{j=1}^{d} U_j \Lambda U_j^*. \tag{4.44} \]
Conjecture 4.6.19. For d ∈ N, we conjecture that θ(Sd) = θ(Sd) = d.
Recalling Proposition 4.4.7, it is clear that Conjecture 4.6.19 holds if and only if Φ(Id) ≤ I_{d²} for all Φ ∈ C_{d²}(Sd). By (4.44) it is sufficient to prove that if unitaries U1, . . . , Ud ∈ M_{d²} and diagonal Λ ∈ R_{d²} satisfy ⟨Ul∗Uk, Λ⟩ = δkl, then
Σ_{j=1}^{d} UjΛUj∗ ≤ I_{d²}. (4.45)
We lack a general proof that (4.45) holds under these conditions, but it is easy to see that it holds in the following two special cases:
(i) If λi ≤ 1/d for all i = 1, . . . , m, then Λ ≤ d^{−1}I_{d²}, and the result is immediate.
(ii) If Λ = uu∗ for some unit vector u ∈ C^{d²}, then letting vi = Uiu, we have
⟨vi, vj⟩ = Tr(Uiuu∗Uj∗) = Tr(UiΛUj∗) = ⟨Uj∗Ui, Λ⟩ = δij,
and {v1, . . . , vd} is orthonormal. We then have
Σ_{i=1}^{d} UiΛUi∗ = Σ_{i=1}^{d} Uiuu∗Ui∗ = Σ_{i=1}^{d} vivi∗ ≤ I_{d²}.
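Special case (ii) rests on the observation that Σ vivi∗ is an orthogonal projection whenever {vi} is orthonormal, and hence is dominated by the identity. The sketch below illustrates this with a hand-picked orthonormal pair in R⁴ (our own toy instance; it is an illustration, not a general proof of (4.45)).

```python
import random

# An orthonormal pair v1, v2 in R^4 (playing the role of v_i = U_i u with d = 2):
v1 = [1.0, 0.0, 0.0, 0.0]
v2 = [0.0, 2 ** -0.5, 2 ** -0.5, 0.0]
assert abs(sum(a * b for a, b in zip(v1, v2))) < 1e-12   # orthogonal unit vectors

# P = sum_i v_i v_i* is then an orthogonal projection: P^2 = P = P^T,
# so its eigenvalues are 0 or 1 and P <= I follows.
P = [[v1[i] * v1[j] + v2[i] * v2[j] for j in range(4)] for i in range(4)]
P2 = [[sum(P[i][k] * P[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
assert all(abs(P2[i][j] - P[i][j]) < 1e-12 for i in range(4) for j in range(4))

# Direct spot-check of <x, Px> <= <x, x> at random points.
random.seed(0)
for _ in range(100):
    x = [random.uniform(-1.0, 1.0) for _ in range(4)]
    Px = [sum(P[i][j] * x[j] for j in range(4)) for i in range(4)]
    assert sum(x[i] * Px[i] for i in range(4)) <= sum(t * t for t in x) + 1e-12
```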
Having discussed the operator system Sd, we can now give an example of an interesting
phenomenon mentioned at the end of Section 4.5 about the behaviour of clique numbers of
tensor products.
Example 4.6.20. Consider the operator system CI2 ⊗ S2. Recall that ω(S2) = 2 by Proposition 4.6.17 and ω(CI2) = 1 by Proposition 4.6.1. We now claim that ω(CI2 ⊗ S2) = 1 < ω(S2). Since {u} is a CI2 ⊗ S2-clique for any unit vector u ∈ C⁴, it suffices to prove that no (CI2 ⊗ S2)-clique has cardinality greater than 1. To establish this we show that if uv∗ ∈ CI2 ⊗ S2, then u = 0 or v = 0. We first observe that
\[ \mathbb{C}I_2 \otimes \mathcal{S}_2 = \left\{ \begin{pmatrix} \lambda & a & 0 & 0 \\ b & \lambda & 0 & 0 \\ 0 & 0 & \lambda & a \\ 0 & 0 & b & \lambda \end{pmatrix} : \lambda, a, b \in \mathbb{C} \right\}. \]
For u, v ∈ C⁴, write u = (ui)_{i=1}^{4} and v = (vi)_{i=1}^{4}, and suppose that uv∗ = (ui v̄j)_{i,j=1}^{4} ∈ CI2 ⊗ S2. This requires
u1v̄3 = u1v̄4 = u2v̄3 = u2v̄4 = 0, giving u1 = u2 = 0 or v3 = v4 = 0,
and
u3v̄1 = u3v̄2 = u4v̄1 = u4v̄2 = 0, giving u3 = u4 = 0 or v1 = v2 = 0.
Since for uv∗ ∈ CI2 ⊗ S2 we also have
u1v̄1 = u2v̄2 = u3v̄3 = u4v̄4,
it must then hold that all these terms vanish. Similarly, u1v̄2 = u3v̄4, and these terms vanish because either v4 = 0 or u1 = 0. Finally, u2v̄1 = u4v̄3, and these terms both vanish because u4 = 0 or v1 = 0. We then have uv∗ = 0, and it follows that u = 0 or v = 0, so {u, v} is not a CI2 ⊗ S2-clique.
Our study of the non-commutative graph Sd also resolves the following question concerning the quantum sandwich theorem. Recall that cp(S)♯ ⊆ fp(S)♯ by Theorem 3.2.10 and that both cp(S)♯ and fp(S)♯ are non-commutative versions of fvp(G). The form of Theorem 1.4.5, the classical sandwich theorem, and a comparison with Theorem 3.2.37 invite us to ask if fp(S)♯ can be replaced by cp(S)♯ in Theorem 3.2.37. By considering the operator system S2, the following result shows that this is not possible.
Lemma 4.6.21. It holds that th(S2) ⊈ cp(S2)♯.
Proof. It is clear that {e1, e2} is an S2-clique, giving that I ∈ cp(S2) and, using Lemma 3.2.7, we obtain cp(S2) = B_{I2}. Anti-blocking gives cp(S2)♯ = A_{I2}. However, by Proposition 4.4.6 it holds that θ(S2) > 1, and the result follows.
4.7 Further questions
There is much scope for further work on the links between quantum information theory and
the theory of Md-convex corners. We now gather some of the open questions that have been
raised in this chapter, and we discuss potential directions of further research that may prove
to be promising.
• Section 4.1.4 discussed definitions of weighted parameters for non-commutative graphs;
work could usefully be undertaken on their properties.
• We examined a number of examples of non-commutative graphs in Section 4.6, and
much work could be undertaken in this direction. For self-adjoint M ∈Md, we note that
spanId,M is a non-commutative graph, and this example would be worth considering.
• Noting the importance of the fractional chromatic number, it would be interesting if the result lim_{n→∞} ⁿ√(χ(Gⁿ)) = χf(G) could be generalised to the quantum setting, as discussed in Question 4.5.15.
• For any non-commutative graph S, does it hold that θ(S) = θ(S)? This is probably the most important open question in Chapter 4.
• Section 4.5 also notes the need for further work on the value of lim_{n→∞} ⁿ√(ω(S^{⊗n})) and on the behaviour of the sequence (ω(S^{⊗n}))_{n∈N} for a non-commutative graph S.
Chapter 5
The classical source with memory
One of the central concepts of Chapter 1 was the graph entropy of a probabilistic graph, as introduced by Körner [23] to solve the source coding problem over a partially distinguishable alphabet. It is important to note that the analysis there was of an i.i.d. source. The theory described for the classical setting in Chapter 1 was generalised to the quantum setting in Chapters 2 to 4. We now return to the classical case, but seek to generalise the concept of graph entropy to the 'non-i.i.d.' source, where successively emitted symbols are not independent, but rather follow some joint distribution. This, of course, will be the case with any 'real-life' communication. Such a source is then said to possess memory. We begin by
summarising background material on the source with memory, in particular the concepts of
entropy and isomorphism. A short section containing some new graph theoretic results is
then followed by an attempt to generalise the theory of the source with memory to the situ-
ation of partial distinguishability. In this setting of partial distinguishability we will discuss
a generalisation of the Kolmogorov–Sinai Theorem and a notion of isomorphism, as well as
considering the Bernoulli and Markov shifts.
5.1 Entropy and the source with memory
A full development of the theory of the source with memory leading to the definition of
Kolmogorov–Sinai entropy and the proof of the Kolmogorov–Sinai Theorem can be found
in both [4] and [20]; in this section we summarise the important concepts and results. As
in Chapter 1, we let X denote a fixed, finite alphabet, and we consider a source emitting a
doubly infinite sequence ω = (. . . , ω−1, ω0, ω1, . . .) of elements of X . We denote by Ω the set
of all such sequences. We write Ω = X^Z and take Ω as our sample space. We note that it is
possible to develop an analogous theory for sequences of the form ω = (ω0, ω1, . . .), infinite
in one direction only [4]. Considering doubly infinite sequences in some ways simplifies the
analysis, and is the approach we will follow.
We begin by recalling some standard measure theory. A σ-algebra E on Ω is a collection
of subsets of Ω such that ∅ ∈ E and such that E is closed under countable unions, countable
intersections and taking complements. If E is a σ-algebra on Ω, then (Ω, E) is called a measurable space. A probability measure P on a measurable space (Ω, E) is a function E → [0, 1] satisfying P(Ω) = 1 and having the property of countable additivity, namely that P(⋃_{i=1}^{∞} Ei) = Σ_{i=1}^{∞} P(Ei) for all pairwise disjoint sets E1, E2, . . . ∈ E. It follows that P(∅) = 0, but we note that non-empty sets may also have zero measure; if A ∈ E satisfies P(A) = 0, then we say that A is a null set. If P is a probability measure on (Ω, E), we say that the triple (Ω, E, P) is a probability space. For measurable spaces (Ω, E) and (Ω′, E′), we say that a function f : Ω → Ω′ is measurable if f^{−1}(E) ∈ E for all E ∈ E′.
If C is a collection of subsets of Ω, then the σ-algebra on Ω generated by C is the intersection
of all σ-algebras on Ω containing C, or equivalently the smallest σ-algebra on Ω to contain C.
A set of the form
{ω ∈ Ω : ωi1 = j1, . . . , ωin = jn} for some n ∈ N, ik ∈ Z, jk ∈ X
will be known as a cylinder, and, following [4], we let F be the σ-algebra on Ω generated by
the cylinders. Letting P be a probability measure on F , we work in the probability space
(Ω,F , P ).
The bijection T : Ω → Ω defined by (Tω)n = ωn+1 is known as the shift transformation
on Ω. We can imagine that a given ω ∈ Ω is sent one coordinate at a time, with ωi being the
symbol sent at time i. Then Tω represents the same message, with the time origin shifted
forward by one unit. It is natural to insist that this shift of time origin should not affect
probabilities, so we require the probability measure P on F to satisfy P (T−1A) = P (A) for
every cylinder A. When P (A) = P (T−1A) for all A ∈ F , we say that T is measure preserving.
Since T is invertible, this is equivalent to the condition that P (TA) = P (A) for all A ∈ F .
Indeed, we then have P (TnA) = P (A) for all n ∈ Z.
It is shown in [4, Theorem 1.1] that if P (A) = P (T−1A) for every cylinder A, then for
all B ∈ F the sets TB, T−1B ∈ F satisfy P (TB) = P (B) = P (T−1B); in other words, in
this case we have that both T and T−1 are measurable and measure-preserving functions on
Ω. Throughout the sequel we insist that this condition holds. The Kolmogorov Existence Theorem, as discussed in [4, Example 1.2], shows that a measure P on F preserved by T is uniquely determined by specifying the measure it gives to each cylinder. We note here that
quadruples of the form (Ω,F , P, T ) where (Ω,F , P ) is a probability space and T : Ω→ Ω is
a measurable, measure-preserving function are known as dynamical systems and are widely
studied in many contexts, not just in information theory. (In this general setting it is not
necessary that the sample space be of the form Ω = X^Z.)
Defining a notion of entropy in this setting is complicated by the fact that Ω is uncount-
able, and furthermore that a given ω ∈ Ω will generally have measure 0. This is the motivation
to consider finite subalgebras. We say that a finite set B ⊂ F is a finite subalgebra of the
σ-algebra F if ∅ ∈ B and if B is closed under unions, intersections and taking complements.
If B is a finite subalgebra of F , then B is automatically a σ-algebra. For a finite subalgebra
B there exists a unique and finite set at(B) consisting of non-empty and pairwise disjoint
elements of B whose union is Ω, and such that every non-empty element of B can be uniquely
expressed as a union of elements of at(B). The elements of at(B) are known as the atoms of
B. (In fact, the atoms of B are those non-empty elements of B which have no proper subset
contained in B.) If B is a finite subalgebra, then TnB is a finite subalgebra for all n ∈ Z, and
at(T^nB) = T^n at(B). We denote by A0 the 'time-0' finite subalgebra whose atoms are the cylinders Ai = {ω : ω0 = i}, i ∈ X. For any k ∈ Z, it is clear that T^{−k}A0 is the finite subalgebra of F with atoms {ω : ωk = i}, i ∈ X.
Let A be a set of arbitrary cardinality, and for each α ∈ A let Bα be a finite subalgebra of F. We write ∨_{α∈A} Bα to denote the σ-algebra generated by ⋃_{α∈A} Bα. We then have F = ∨_{n=−∞}^{∞} T^nA0. We write ∨_{i=1}^{n} Bi = B1 ∨ · · · ∨ Bn. If B and C are finite subalgebras, then so is B ∨ C, and it is clear that
at(B ∨ C) = {B ∩ C : B ∩ C ≠ ∅, B ∈ at(B), C ∈ at(C)}.
Let B ⊂ F be the finite subalgebra given by B = ∨_{k=1}^{n} T^{−ik}A0, with i1, . . . , in ∈ Z. The atoms of B are then the cylinders
B_{j1,...,jn} = {ω : ωi1 = j1, . . . , ωin = jn}, with jk ∈ X for all k ∈ [n]. (5.1)
Because we can only ever observe a finite set of coordinates of any ω ∈ Ω, we can think of the 'physical' sets as those which are finite unions of cylinders. As in [4], we define the algebra F0 by
F0 = ⋃_{n=0}^{∞} ∨_{i=−n}^{n} T^iA0. (5.2)
If F ∈ F0, then F ∈ ∨_{i=−n}^{n} T^iA0 for some n ∈ N, and so F is a finite union of cylinders. We can thus think of F0 as the collection of all 'physical' sets. If F1, F2, . . . ∈ F0, then ⋃_{i=1}^{n} Fi ∈ F0 for all n ∈ N, but it may hold that ⋃_{i=1}^{∞} Fi ∉ F0. Thus, unlike F, we see that F0 is not a σ-algebra. Note that if B ∈ F0, then T^nB ∈ F0 for all n ∈ Z. We let S denote the set of all subalgebras of the form ∨_{k=1}^{n} T^{−ik}A0 with i1, . . . , in ∈ Z. 'Physical' finite subalgebras of F are those which are contained in F0, and we then observe that if B ⊆ F0 is a finite subalgebra of F, then B ⊆ C for some C ∈ S.
The following standard definitions will later be generalised to the case of partial distin-
guishability, in the way that Shannon entropy is generalised by graph entropy.
Definition 5.1.1. ([4], [20].) We work in the probability space (Ω, F, P) as described. Let B, C ⊂ F be finite subalgebras.
(i) We define the entropy of B by
H(B) = −Σ_{B∈at(B)} P(B) log P(B). (5.3)
(ii) The conditional entropy of C given B is defined by
H(C|B) = Σ_{B∈at(B)} P(B) Σ_{C∈at(C)} −P(C|B) log P(C|B), (5.4)
where P(C|B) = P(C ∩ B)/P(B) and the first summation ignores atoms of B of measure 0.
(iii) The entropy of B relative to T is given by
h(B, T) = lim sup_{n→∞} (1/n) H(∨_{k=0}^{n−1} T^{−k}B). (5.5)
(iv) The Kolmogorov–Sinai entropy of T is defined by
h(T) = sup{h(B, T) : B is a finite subalgebra of F}. (5.6)
Remark 5.1.2. (i) Note that H(B) is equal to the Shannon entropy of the probability distribution induced on at(B) by P.
(ii) The limit superior in the definition of h(B, T) can be shown to be a limit; a proof using conditional entropies is given in [4, p. 81]. Alternatively, this follows from Fekete's Lemma using the method we will use to prove Proposition 5.3.15.
(iii) If P(B ∩ C) = P(B)P(C) for all B ∈ at(B) and C ∈ at(C), then P(C|B) = P(C) and H(C|B) = H(C).
We summarise some well-known properties of these quantities.
Theorem 5.1.3. ([4, A′5, B3]) For a finite subalgebra B ⊂ F and u, v ∈ Z with u ≤ v, it holds that
(i) H(T^{−u}B) = H(B) and (ii) h(∨_{k=u}^{v} T^{−k}B, T) = h(B, T).
Theorem 5.1.4. ([4, A′2, B2 ]) If finite subalgebras B, C ⊂ F satisfy B ⊆ C, then
(i) H(B) ≤ H(C) and (ii) h(B, T ) ≤ h(C, T ).
Theorem 5.1.5. [4, A′1, A′4] For finite subalgebras B, C ⊂ F ,
H(B) +H(C|B) = H(B ∨ C) ≤ H(B) +H(C).
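The identity and inequality of Theorem 5.1.5 can be illustrated numerically: given a joint distribution P(B ∩ C) over the atoms of B and C (the toy table below is our own), the entropies of Definition 5.1.1 satisfy the chain rule and subadditivity. Logarithms are taken to base 2 here.

```python
from math import log2

# Toy joint distribution P(B ∩ C): rows indexed by atoms of B, columns by atoms of C.
joint = [[0.20, 0.10],
         [0.05, 0.25],
         [0.30, 0.10]]

pB = [sum(row) for row in joint]                 # marginal on at(B)
pC = [sum(col) for col in zip(*joint)]           # marginal on at(C)

def H(ps):  # Shannon entropy, base-2 logarithm
    return -sum(p * log2(p) for p in ps if p > 0)

H_B = H(pB)
H_C = H(pC)
H_join = H([p for row in joint for p in row])    # H(B ∨ C): atoms are the B ∩ C
H_C_given_B = sum(pB[i] * H([p / pB[i] for p in joint[i]])
                  for i in range(len(joint)) if pB[i] > 0)

assert abs(H_B + H_C_given_B - H_join) < 1e-12   # H(B) + H(C|B) = H(B ∨ C)
assert H_join <= H_B + H_C + 1e-12               # H(B ∨ C) <= H(B) + H(C)
```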
Now we recall the Kolmogorov–Sinai Theorem, a result which makes the computation of
h(T ) feasible in many cases.
Theorem 5.1.6. [4, Theorem 7.1] If a finite subalgebra B satisfies ∨_{n=−∞}^{∞} T^nB = F, then h(T) = h(B, T).
It immediately follows for the time-0 subalgebra A0 that
h(T ) = h(A0, T ). (5.7)
Indeed, the next proposition shows that the supremum in Definition 5.1.1 (iv) is achieved by
any B ∈ S.
Proposition 5.1.7. It holds that h(B, T ) = h(T ) for all B ∈ S.
Proof. By Definition 5.1.1 (iv), h(B, T) ≤ h(T). For the reverse inequality, observe that if B ∈ S, then for some n ∈ Z we have T^nA0 ⊆ B, giving that
h(T) = h(A0, T) = h(T^nA0, T) ≤ h(B, T)
by (5.7), Theorems 5.1.3 (i) and 5.1.4 (ii).
Two well-known special cases deserve particular attention.
(i) Bernoulli shifts [20, Section 1.3, Example 15]. Given a probability distribution p = (pi)i∈X
on alphabet X , let
P({ω : ωk = ik, m ≤ k ≤ n}) = ∏_{k=m}^{n} p_{ik} (5.8)
for all m,n ∈ Z with ik ∈ X , k = m, . . . , n. Since any cylinder is a finite union of cylinders of
the form appearing on the left of (5.8), this suffices to give the measure of any cylinder, and
thus, by the Kolmogorov Existence Theorem, to define a probability measure P on (Ω,F)
which is preserved by T . In this case T is called the p-Bernoulli shift. An i.i.d. source clearly
corresponds to a Bernoulli shift. In the case of the p-Bernoulli shift, [4, (7.2)] gives that
h(T ) = H(p), (5.9)
where H(p) is the Shannon entropy of p.
(ii) Markov shifts [20, Section 1.3, Example 17]. Set |X| = d and let the matrix Π = (pij) ∈ Md(R⁺) satisfy Σ_{j=1}^{d} pij = 1 for all i = 1, . . . , d. Further, let the row vector p = (pi) ∈ M1,d(R⁺) satisfy Σ_{i=1}^{d} pi = 1 and pΠ = p; such a p is known as an invariant distribution. Again we use the Kolmogorov Existence Theorem to specify a probability measure P on (Ω = X^Z, F) preserved by T by setting
P({ω : ωk = ik, ωk+1 = ik+1, . . . , ωl = il}) = p_{ik} p_{ik ik+1} · · · p_{il−1 il}. (5.10)
With P so defined, the shift T is called the (Π, p)-Markov shift. Note for any k ∈ Z that P({ω : ωk = i}) = pi and that P({ω : ωk = i, ωk+1 = j}) = pi pij; thus pij is the probability that ωn+1 is j, given that ωn was i. (Indeed, pij is the probability that ωn+1 is j, given any previous history culminating in ωn = i.) We note that setting pij = pj for all i, j ∈ [d] gives the p-Bernoulli shift. The Kolmogorov–Sinai entropy of the (Π, p)-Markov shift is given in [4, (7.3)] by
h(T) = −Σ_{i,j∈[d]} pi pij log pij. (5.11)
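Formula (5.11) is easy to evaluate in examples. The sketch below (our own toy two-state chain) checks that p is invariant for Π, evaluates h(T), and confirms that the Bernoulli special case pij = pj recovers h(T) = H(p) as in (5.9); logarithms are to base 2.

```python
from math import log2

# A toy two-state Markov shift: rows of Pi sum to 1 and p is invariant, p Pi = p.
Pi = [[0.9, 0.1],
      [0.4, 0.6]]
p = [0.8, 0.2]

for j in range(2):   # invariance check: (p Pi)_j = p_j
    assert abs(sum(p[i] * Pi[i][j] for i in range(2)) - p[j]) < 1e-12

# Kolmogorov-Sinai entropy of the (Pi, p)-Markov shift, formula (5.11).
h = -sum(p[i] * Pi[i][j] * log2(Pi[i][j]) for i in range(2) for j in range(2))
assert 0.0 < h < 1.0 + 1e-12          # at most log|X| = 1 bit per symbol

# Bernoulli special case p_ij = p_j: (5.11) collapses to h(T) = H(p), as in (5.9).
Pi_bern = [p, p]
h_bern = -sum(p[i] * Pi_bern[i][j] * log2(Pi_bern[i][j])
              for i in range(2) for j in range(2))
H_p = -sum(q * log2(q) for q in p)
assert abs(h_bern - H_p) < 1e-12
```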
The following definition describes the notion of isomorphism for systems (Ω,F , P, T ) and
(Ω′,F ′, P ′, T ′).
Definition 5.1.8. [20, Section 1.3, Definition 13] We say that the systems (Ω,F , P, T ) and
(Ω′,F ′, P ′, T ′) are isomorphic, or simply that T and T ′ are isomorphic, if there exists a
bijection φ : Ω→ Ω′ satisfying:
(i) for any A ⊆ Ω we have φ(A) ∈ F ′ if and only if A ∈ F , in which case P ′(φ(A)) = P (A);
and
(ii) for all ω ∈ Ω, it holds that φ(Tω) = T ′φ(ω).
In this case we write (Ω,F , P, T ) ∼= (Ω′,F ′, P ′, T ′), or simply T ∼= T ′, and φ is called an
isomorphism.
As isomorphic systems are in some ways equivalent, it would be desirable to have an en-
tropic quantity that is invariant under isomorphism; the Kolmogorov–Sinai entropy possesses
this property, as shown in the next theorem.
Theorem 5.1.9. [20, Section 1.3, Theorem 14] If (Ω,F , P, T ) ∼= (Ω′,F ′, P ′, T ′), then h(T ) =
h(T ′).
The following famous theorem of Ornstein [33] shows that among the Bernoulli shifts,
the converse of Theorem 5.1.9 also holds; we say that the Kolmogorov–Sinai entropy is a
complete invariant among the Bernoulli shifts.
Theorem 5.1.10. [33] The Bernoulli shifts T and T ′ are isomorphic if and only if h(T ) =
h(T ′).
5.2 Graph theoretic preliminaries
Here we give a number of definitions and results concerning co-normal and lexicographic
graph products and their graph entropies that will be needed in the sequel.
First we consider the co-normal product of graphs F and G. Let |V(F)| = c and |V(G)| = d, and let S ⊆ V(F) and T ⊆ V(G) have characteristic vectors v^{(S)} and v^{(T)} respectively. Then the characteristic vector of S × T is given by v^{(S×T)} = v^{(S)} ⊗ v^{(T)} ∈ R^c ⊗ R^d, and we will write (v^{(S×T)})_{(i,j)} = (v^{(S)})_i (v^{(T)})_j. Note that if S and T are stable in F and G respectively, then S × T is stable in F ∗ G.
By analogy with Definition 2.2.25, for A ⊆ R^d_+ we define
her(A) = {v ∈ R^d_+ : ∃ u ∈ A such that v ≤ u}.
Definition 5.2.1. For convex corners A ⊆ R^m and B ⊆ R^n, write
A ⊗max B = her(conv{a ⊗ b : a ∈ A, b ∈ B}).
The next lemma concerns the vertex packing polytope of a co-normal product.
Lemma 5.2.2. For graphs F and G, it holds that
VP(F ∗G) = VP(F )⊗max VP(G).
Proof. Let c = Σ_i ci v^{(Si)} ∈ VP(F) and d = Σ_j dj v^{(Tj)} ∈ VP(G), where the Si and Tj are stable sets of F and G respectively, and ci, dj ∈ R⁺ satisfy Σ_i ci = Σ_j dj = 1. Using that Σ_{i,j} ci dj = 1 and ci dj ≥ 0, it follows that
c ⊗ d = Σ_{i,j} ci dj v^{(Si)} ⊗ v^{(Tj)} = Σ_{i,j} ci dj v^{(Si×Tj)} ∈ VP(F ∗ G).
Now VP(F ∗ G) is convex by definition and hereditary by Lemma 1.3.7, and thus
VP(F) ⊗max VP(G) ⊆ VP(F ∗ G).
For the reverse inclusion, observe by Lemma 1.3.3 that each stable set of F ∗ G is contained in some kernel of the form S × T, where S and T are kernels of F and G respectively. Thus for each v ∈ VP(F ∗ G), there exist kernels Si of F and kernels Tj of G, and coefficients αi,j ∈ R⁺ satisfying Σ_{i,j} αi,j = 1, such that
v ≤ Σ_{i,j} αi,j v^{(Si×Tj)} = Σ_{i,j} αi,j v^{(Si)} ⊗ v^{(Tj)} ∈ conv{c ⊗ d : c ∈ VP(F), d ∈ VP(G)},
giving that v ∈ VP(F) ⊗max VP(G).
Remark 5.2.3. Note that it can hold that
{c ⊗ d : c ∈ VP(F), d ∈ VP(G)} ⊊ VP(F ∗ G).
We offer the following example to illustrate this. Let F and G be complete graphs on vertex sets {f1, f2} and {g1, g2} respectively. Then F has kernels F1 = {f1} and F2 = {f2}, and G has kernels G1 = {g1} and G2 = {g2}. We then have that
v = (1/2)v^{(F1×G1)} + (1/2)v^{(F2×G2)} = (1/2)v^{(F1)} ⊗ v^{(G1)} + (1/2)v^{(F2)} ⊗ v^{(G2)} = (1/2)(1, 0, 0, 1)ᵀ ∈ conv{c ⊗ d : c ∈ VP(F), d ∈ VP(G)} ⊆ VP(F ∗ G).
However, it is clear that v ∉ {c ⊗ d : c ∈ VP(F), d ∈ VP(G)}.
Furthermore, the following strict inclusion can hold:
conv{c ⊗ d : c ∈ VP(F), d ∈ VP(G)} ⊊ VP(F ∗ G).
As an example, take both F and G to be vertex-disjoint copies of the path graph on three vertices, and form their co-normal product F ∗ G, a graph on nine vertices. [Diagram: the paths F and G, and their co-normal product F ∗ G.] Being the characteristic vector of a stable set in F ∗ G,
v = (1, 0, 1, 0, 0, 0, 1, 0, 0)ᵀ ∈ VP(F ∗ G),
but it is easily seen that v ∉ conv{a ⊗ b : a ∈ VP(F), b ∈ VP(G)}.
Later in the chapter the concept of the lexicographic graph product will be needed.
Definition 5.2.4. ([15, p.17].) For graphs F and G, the lexicographic product F ·G is the
graph with vertex set V (F )× V (G) and in which (i1, j1) ∼ (i2, j2) if and only if i1 ∼ i2 in F
or i1 = i2 and j1 ∼ j2 in G.
Note in general that F ·G G ·F . It is clear that F ·G is a spanning subgraph of F ∗G,
and that F ·G = F ∗G if and only if F is a complete graph or G is an empty graph.
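The relationship between the two products can be verified mechanically. The sketch below (illustrative only; names are ours) checks on small examples that E(F · G) ⊆ E(F ∗ G), with equality when F is complete:

```python
from itertools import combinations

def conormal_edges(vf, ef, vg, eg):
    efs = {frozenset(e) for e in ef}
    egs = {frozenset(e) for e in eg}
    vs = [(i, j) for i in vf for j in vg]
    return {frozenset((x, y)) for x, y in combinations(vs, 2)
            if frozenset((x[0], y[0])) in efs or frozenset((x[1], y[1])) in egs}

def lexicographic_edges(vf, ef, vg, eg):
    efs = {frozenset(e) for e in ef}
    egs = {frozenset(e) for e in eg}
    vs = [(i, j) for i in vf for j in vg]
    return {frozenset((x, y)) for x, y in combinations(vs, 2)
            if frozenset((x[0], y[0])) in efs
            or (x[0] == y[0] and frozenset((x[1], y[1])) in egs)}

# F = path 1 - 2 - 3 (not complete), G = K2
vf, ef = [1, 2, 3], [(1, 2), (2, 3)]
vg, eg = ['a', 'b'], [('a', 'b')]
lex, con = lexicographic_edges(vf, ef, vg, eg), conormal_edges(vf, ef, vg, eg)
assert lex < con            # F . G is a proper spanning subgraph of F * G here

# when F is complete the two products coincide
vk, ek = [1, 2], [(1, 2)]
assert lexicographic_edges(vk, ek, vg, eg) == conormal_edges(vk, ek, vg, eg)
```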
Lemma 5.2.5. (See [14, Theorem 1].) If K ⊆ V(F · G), then K is a kernel of F · G if and only if K = ⋃_{i∈S} ({i} × T_i), where S is a kernel of F and T_i is a kernel of G for each i ∈ S. Furthermore, α(F · G) = α(F)α(G).

Proof. That ⋃_{i∈S} ({i} × T_i) is a kernel is clear. To show the converse we note that if K is a kernel, then the projection of K onto V(F) is contained in some kernel S of F. Furthermore, each element of K with first coordinate i must have second coordinate in some kernel T_i of G, giving K ⊆ ⋃_{i∈S} ({i} × T_i). Since K is maximally stable, it follows that K = ⋃_{i∈S} ({i} × T_i).

Now note that if S, T are kernels of F, G respectively, then S × T is a kernel of F · G, and so α(F · G) ≥ α(F)α(G). However, if a kernel K = ⋃_{i∈S} ({i} × T_i), then |K| = ∑_{i∈S} |T_i| ≤ α(F)α(G), completing the proof.
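The identity α(F · G) = α(F)α(G) can be confirmed by exhaustive search on small instances; a throwaway sketch (names ours):

```python
from itertools import combinations

def alpha(vertices, edges):
    """Independence number by brute force."""
    es = {frozenset(e) for e in edges}
    return max(len(s) for r in range(len(vertices) + 1)
               for s in combinations(vertices, r)
               if all(frozenset(p) not in es for p in combinations(s, 2)))

def lexicographic(vf, ef, vg, eg):
    efs = {frozenset(e) for e in ef}
    egs = {frozenset(e) for e in eg}
    vs = [(i, j) for i in vf for j in vg]
    es = [(x, y) for x, y in combinations(vs, 2)
          if frozenset((x[0], y[0])) in efs
          or (x[0] == y[0] and frozenset((x[1], y[1])) in egs)]
    return vs, es

# F = path on 3 vertices (alpha = 2), G = 4-cycle (alpha = 2)
vf, ef = [1, 2, 3], [(1, 2), (2, 3)]
vg, eg = [1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]
vs, es = lexicographic(vf, ef, vg, eg)
assert alpha(vs, es) == alpha(vf, ef) * alpha(vg, eg) == 4
```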
We will use the substitution lemma as outlined in [49] and first proved in [25]. Let F and
G be vertex disjoint graphs and v ∈ V (F ). By substituting G for v, as defined in [9, Section
5], is meant deleting from F the vertex v and all its incident edges, and then adding edges
from every vertex of G to those vertices of F which were adjacent to v in F . We denote
the resulting graph by F_{v←G}. Let p and q be probability distributions on V(F) and V(G) respectively. We define the probability distribution p_{v←q} on V(F_{v←G}) = V(G) ∪ (V(F)\{v}) by

p_{v←q}(x) = p(x) if x ∈ V(F)\{v}, and p_{v←q}(x) = p(v)q(x) if x ∈ V(G). (5.12)
Lemma 5.2.6 (Substitution lemma). ([49, Lemma 3.3], [25].) With the notation above, it holds that

H(F_{v←G}, p_{v←q}) = H(F, p) + p(v)H(G, q).
In [9, Section 5] it is shown how repeatedly substituting copies of a graph G for the
vertices of a graph F produces the graph F · G. The lemma below uses this technique to
find the graph entropy of the graph F · G with an arbitrary joint probability distribution r
on V (F ) × V (G). Recall that if r is a probability distribution on V (F ) × V (G), then the
marginal distribution of r on V(F) is given by p(i) = ∑_{j∈V(G)} r(i, j).
Lemma 5.2.7. For any graphs F and G and probability distribution r on V(F · G), we have

H(F · G, r) = H(F, p) + ∑_{v∈V(F)} p(v) H(G, r(·|v)),

where p is the marginal distribution of r on V(F), and r(·|v) denotes the conditional probability distribution on V(G) given v ∈ V(F), defined by r(x|v) = r(v, x)/p(v) for x ∈ V(G).
Proof. We take graph F with probability distribution p on V (F ) and |V (F )| vertex disjoint
copies of G, whose vertex sets are each given the probability distribution r(·|v) for a different
v ∈ V (F ). Now, for each v ∈ V (F ) in turn, we substitute for v the copy of G with probability
distribution r(·|v). It is clear that the resulting probabilistic graph is (F · G, r). Lemma 5.2.6 then
gives the required result.
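In the complete case, where graph entropy reduces to Shannon entropy, Lemma 5.2.7 reduces to the familiar chain rule H(r) = H(p) + ∑_v p(v)H(r(·|v)). A quick numerical check of that identity (illustrative Python, with an arbitrary joint distribution of our own choosing):

```python
from math import log2

def H(dist):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in dist if p > 0)

# an arbitrary joint distribution r on V(F) x V(G), F = {x, y}, G = {1, 2, 3}
r = {('x', 1): 0.10, ('x', 2): 0.25, ('x', 3): 0.15,
     ('y', 1): 0.20, ('y', 2): 0.05, ('y', 3): 0.25}

p = {v: sum(r[v, j] for j in (1, 2, 3)) for v in ('x', 'y')}         # marginal on V(F)
cond = {v: [r[v, j] / p[v] for j in (1, 2, 3)] for v in ('x', 'y')}  # r(.|v)

lhs = H(r.values())
rhs = H(p.values()) + sum(p[v] * H(cond[v]) for v in ('x', 'y'))
assert abs(lhs - rhs) < 1e-12   # the chain rule holds exactly
```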
Remark 5.2.8. We note that Lemma 5.2.7 can also be seen to follow from [53, Proposition
4.4].
Lemma 5.2.9. For graphs F and G and probability distribution r on V (F )×V (G), we have
H(F ∗G, r) ≥ H(F ·G, r).
Proof. Given that F ·G is a spanning subgraph of F ∗G, this follows from Lemma 1.3.19.
Lemma 5.2.10. If r is a probability distribution on V (F )×V (G) with marginal distributions
p and q on V (F ) and V (G) respectively, then
H(F ∗G, r) ≤ H(F, p) +H(G, q).
Proof. Let a = (a_i)_{i∈V(F)} ∈ VP(F) satisfy −∑_{i∈V(F)} p(i) log a_i = H(F, p), and let b = (b_j)_{j∈V(G)} ∈ VP(G) satisfy −∑_{j∈V(G)} q(j) log b_j = H(G, q). By Lemma 5.2.2, a ⊗ b ∈ VP(F ∗ G), and so

H(F ∗ G, r) ≤ −∑_{(i,j)∈V(F∗G)} r(i, j) log (a ⊗ b)_{(i,j)}
           = −∑_{(i,j)∈V(F∗G)} r(i, j) log a_i − ∑_{(i,j)∈V(F∗G)} r(i, j) log b_j
           = −∑_{i∈V(F)} p(i) log a_i − ∑_{j∈V(G)} q(j) log b_j
           = H(F, p) + H(G, q).
We note that Lemma 5.2.10 also follows from [53, Proposition 3.7].
Combining Lemmas 5.2.7, 5.2.9 and 5.2.10 gives the following proposition.
Proposition 5.2.11. If F and G are graphs and r is a probability distribution on V (F ) ×
V (G) with marginal distributions p and q on V (F ) and V (G) respectively, then
H(F · G, r) = H(F, p) + ∑_{v∈V(F)} p(v) H(G, r(·|v)) ≤ H(F ∗ G, r) ≤ H(F, p) + H(G, q).
If the probability distribution r on V (F ) × V (G) is given by r(i, j) = p(i)q(j) for the
marginal probability distributions p and q on V (F ) and V (G) respectively, we say that r is
a product distribution, and we write r = p × q. The next result shows that in the case of
a product distribution, equality holds throughout in Proposition 5.2.11. (We note that the
second equality in the Proposition below is equivalent to [11, Theorem 5.1].)
Proposition 5.2.12. If r is the product distribution on V (F )× V (G) given by r = p× q for
probability distributions p and q on V (F ) and V (G) respectively, then
H(F ·G, r) = H(F ∗G, r) = H(F, p) +H(G, q).
Proof. In this case the conditional probability distribution r(·|i) on V (G) given i ∈ V (F )
satisfies r(j|i) = q(j). For each v ∈ V (F ) we then have H(G, r(·|v)) = H(G, q). Thus in
Proposition 5.2.11 we have ∑_{v∈V(F)} p(v) H(G, r(·|v)) = H(G, q), and the result follows.
Remark 5.2.13. In [20, Section 1.1, Theorem 1] we have the equivalent result for Shannon
entropies, namely that
H(r) = H(p) + ∑_v p(v) H(r(·|v)) ≤ H(p) + H(q),
where equality holds if and only if r is a product distribution. Note, however, that it is not
necessary that r be a product distribution for equality to hold throughout in Proposition
5.2.11. To show this we offer the following example where r 6= p × q, but H(G, r(·|v)) =
H(G, q) for each v ∈ V (F ), whence equality in Proposition 5.2.11 immediately follows. Take
F = K_2 with V(F) = {x, y} and p(x) = p(y) = 1/2. Then let G be the cycle C_4 with V(G) = [4], and let the conditional probability distribution on V(G) given v ∈ V(F) be r(·|v), where

(r(i|x))_{i=1}^{4} = (1/4 + ε, 1/4, 1/4 − ε, 1/4)^T and (r(i|y))_{i=1}^{4} = (1/4 − ε, 1/4, 1/4 + ε, 1/4)^T

for 0 < ε < 1/4. Thus q = (1/4, 1/4, 1/4, 1/4)^T, but r ≠ p × q. Here G has kernels {1, 3} and {2, 4} and so, for any
distribution s = (s_i)_{i∈[4]} on V(G), by Remark 1.3.13,

H(G, s) = min_{α∈[0,1]} { −∑_{i=1}^{4} s_i log a_i : a = (α, 1 − α, α, 1 − α)^T }.
We now note for any a of the form given above that

−∑_{i=1}^{4} r(i|x) log a_i = −∑_{i=1}^{4} r(i|y) log a_i = −∑_{i=1}^{4} q_i log a_i.
We can therefore conclude that H(G, r(·|x)) = H(G, r(·|y)) = H(G, q), giving equality in
Proposition 5.2.11 as claimed.
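The computation behind this example is easy to reproduce. Since C_4 has exactly the two kernels {1, 3} and {2, 4}, the minimisation of Remark 1.3.13 is one-dimensional and a crude grid search suffices; the sketch below (illustrative only, names ours) confirms that H(G, r(·|x)) = H(G, r(·|y)) = H(G, q) = 1 bit:

```python
from math import log2

def entropy_C4(s, steps=2000):
    """H(C4, s) via the one-parameter family a = (alpha, 1-alpha, alpha, 1-alpha),
    minimised over a grid of alpha in (0, 1)."""
    def f(alpha):
        a = (alpha, 1 - alpha, alpha, 1 - alpha)
        return -sum(si * log2(ai) for si, ai in zip(s, a) if si > 0)
    return min(f(k / steps) for k in range(1, steps))

eps = 0.1
rx = (0.25 + eps, 0.25, 0.25 - eps, 0.25)
ry = (0.25 - eps, 0.25, 0.25 + eps, 0.25)
q = (0.25, 0.25, 0.25, 0.25)

# for all three distributions the objective is -(1/2)log a - (1/2)log(1-a)
assert abs(entropy_C4(rx) - entropy_C4(q)) < 1e-9
assert abs(entropy_C4(ry) - entropy_C4(q)) < 1e-9
assert abs(entropy_C4(q) - 1.0) < 1e-9   # the minimum, at alpha = 1/2, is 1 bit
```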
Corollary 5.2.14. If G^n, the nth co-normal power of G, has the probability distribution p^n on its vertex set given by

p^n(i_0, . . . , i_{n−1}) = ∏_{k=0}^{n−1} p_{i_k},

where p = (p_i)_{i∈V(G)} is a probability distribution on V(G), then

H(G^n, p^n) = nH(G, p).
Proof. Apply an induction argument to Proposition 5.2.12.
Remark 5.2.15. This also follows directly from the expression for graph entropy given in Definition 1.3.5:

H(G^k, p^k) = lim_{n→∞} (1/n) log min{ χ(G^{nk}_E) : E ⊆ X^{nk}, p^{nk}(E) > 1 − λ }
            = lim_{N→∞} (k/N) log min{ χ(G^N_E) : E ⊆ X^N, p^N(E) > 1 − λ }
            = kH(G, p),

where λ ∈ (0, 1).
5.3 Graph entropy for the source with memory
The necessary background is now in place for us to generalise the theory of the source with memory, as described in Section 5.1, to the situation of partial distinguishability. As there, we work in the probability space (Ω = X^Z, F, P), where X is a fixed finite alphabet and F is the σ-algebra on Ω generated by the cylinders.
First it is necessary to formalise the concept of distinguishability.
Definition 5.3.1. We take distinguishability to be a symmetric but not necessarily transitive
relation on Ω = X Z, and we construct an infinite graph G, known as the distinguishability
graph on Ω, with V (G) = Ω. For ω, ω′ ∈ V (G) we set ω ∼ ω′ in G when ω and ω′ are
distinguishable. Sets A,B ⊆ Ω are said to be distinguishable when a ∼ b for all a ∈ A
and b ∈ B, or equivalently when A × B ⊆ E(G). (Note that distinguishable sets are then
necessarily disjoint.) If all distinct ω, ω′ ∈ Ω are distinguishable, then we say graph G is
complete, and then all disjoint subsets of Ω are distinguishable.
Recall it was required that the shift transformation T be measure preserving, in order
that probabilities are unchanged by a shift of the time origin. In the same way we desire
that distinguishability is unaffected by a shift of the time origin; that is, we require that
ω ∼ ω′ in G if and only if Tω ∼ Tω′ in G. With this condition satisfied we say that T is
distinguishability preserving. If T is distinguishability preserving, then for all n ∈ Z,
ω ∼ ω′ in G ⇐⇒ Tnω ∼ Tnω′ in G. (5.13)
When (5.13) holds, we also say that graph G is shift invariant. Throughout the sequel it is
assumed that P is a probability measure on (Ω,F) and G a distinguishability graph on Ω
such that T is both measure preserving and distinguishability preserving.
5.3.1 The graph G[B] and its graph entropy
In Section 5.1 progress was made by considering atoms of finite subalgebras of F , and it
seems natural to use a similar approach here.
Definition 5.3.2. For a finite subalgebra B ⊂ F and a graph G as defined in Definition 5.3.1, we define G[B] to be the graph with vertex set at(B) = {B_1, . . . , B_k}, and such that B_i ∼ B_j in G[B] when B_i × B_j ⊆ E(G), that is, when B_i and B_j are distinguishable.
If B ⊂ F is a finite subalgebra, then a probability measure P on (Ω,F) induces a prob-
ability distribution on V (G[B]) = at(B), and we denote the resulting probabilistic graph by
(G[B], P ). The graph entropy H(G[B], P ) is now defined as in Section 1.3. If the atoms of B
are all mutually distinguishable, that is, G[B] ≅ K_{|at(B)|}, then (1.21) on page 19 and Remark 5.1.2 give that

H(G[B], P) = H(B), (5.14)
and H(G[B], P ) is equal to the Shannon entropy of the probability distribution on at(B)
induced by P. In general, Lemma 1.3.19 and (1.3) on page 3 give that

H(G[B], P) ≤ H(B) ≤ log |at(B)|. (5.15)
The next two straightforward propositions generalise Theorems 5.1.3 (i) and 5.1.4 (i)
respectively to establish some basic properties of H(G[B], P ).
Proposition 5.3.3. For any finite subalgebra B ⊂ F and n ∈ Z, we have

H(G[T^n B], P) = H(G[B], P).

Proof. We have V(G[B]) = at(B) and V(G[T^n B]) = at(T^n B) = T^n at(B). Now let B, C ∈ at(B). Since T preserves distinguishability, it holds that B ∼ C in G[B] if and only if T^n B ∼ T^n C in G[T^n B], so the graphs G[B] and G[T^n B] are isomorphic. Furthermore, T preserves measure, and hence P(T^n B) = P(B) for all B ∈ at(B). The result follows.
Proposition 5.3.4. If B, C ⊂ F are finite subalgebras satisfying B ⊆ C, then
H(G[B], P ) ≤ H(G[C], P ).
Proof. As B ⊆ C, each atom of B is a union of atoms of C. Lemma 1.3.27 shows that null vertices can be ignored, so without loss of generality choose B ∈ at(B) with P(B) > 0 and let B = ⋃_{j=1}^{n} C_j, where C_1, . . . , C_n ∈ at(C). Let F be the empty graph with vertices C_1, . . . , C_n. We give F the probability distribution Q on its vertices, where Q(C_j) = P(C_j)/P(B). In the graph G[B] we substitute the graph F for the vertex B to form the graph G[B]_{B←F}; note from (5.12) that the substitution algorithm gives to the vertex set of this graph the probability distribution induced by P. We apply Lemma 5.2.6 to yield

H(G[B]_{B←F}, P) = H(G[B], P).
Observe for any D ⊆ Ω that if B ×D ⊆ E(G) then Cj ×D ⊆ E(G) for all j = 1, . . . , n, so
the edges created by the substitution are between distinguishable elements of F . Repeating
for each atom of B thus yields a spanning subgraph of G[C] with graph entropy equal to that
of G[B]. The result follows by the monotonicity result in Lemma 1.3.19.
Corollary 5.3.5. For all finite subalgebras B, C ⊂ F ,
H(G[B ∨ C], P ) ≥ H(G[C], P ).
Proof. Since C ⊆ B ∨ C, this follows immediately from Proposition 5.3.4.
Remark 5.3.6. Although Definition 5.3.2 and the results above apply to any finite subalgebra
B ⊂ F and distinguishability graph G on Ω, we will often wish to impose further conditions:
(i) We will often specify that G, the distinguishability graph on Ω, arises from a distinguisha-
bility relation on X as follows. Let G0 be a distinguishability graph on the alphabet X as
described in Section 1.3. For ω, ω′ ∈ Ω we set ω ∼ ω′ in G if and only if ω_k ∼ ω′_k in G_0 for some k ∈ Z. The distinguishability graph G on Ω is then given by the infinite co-normal product G = · · · ∗ G_0 ∗ G_0 ∗ G_0 ∗ · · ·, which we will denote by G_0^Z. If G = G_0^Z, it is clear that G[T^n A_0] ≅ G_0 for all n ∈ Z, where A_0 is the time-0 subalgebra.
(ii) It may be desired to consider finite subalgebras of a more specific form. In Section 5.1 it was argued that the ‘physical’ case concerns finite subalgebras which are contained in the algebra F_0 = ⋃_{n=0}^{∞} ⋁_{i=−n}^{n} T^i A_0. We narrow our focus further to examine the set S of finite subalgebras of the form ⋁_{k=1}^{n} T^{−i_k} A_0. These subalgebras are of particular physical significance because if B ∈ S, then at(B) will be the set of cylinders of the form given in (5.1). Let G = G_0^Z and consider a finite subalgebra B = ⋁_{k=1}^{n} T^{−i_k} A_0 ∈ S with atoms B_{j_1,...,j_n} = {ω : ω_{i_1} = j_1, . . . , ω_{i_n} = j_n}, where j_k ∈ X for all k ∈ [n]. It is clear that B_{j_1,...,j_n} × B_{k_1,...,k_n} ⊆ E(G) if and only if j_l ∼ k_l in G_0 for some l ∈ [n]. We say that sets A, B ⊆ Ω are distinguishable on coordinate j ∈ Z if ω_j ∼ ω′_j in G_0 for all ω ∈ A and ω′ ∈ B. Thus, if B ∈ S and G = G_0^Z, then A, B ∈ at(B) satisfy A ∼ B in G[B] if and only if A and B are distinguishable on at least one coordinate.
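The adjacency rule just described — atoms are adjacent precisely when some coordinate carries a G_0-edge — is the co-normal-power rule restricted to a finite window, and is simple to realise in code. A sketch (G_0 here is an arbitrary example of ours, the path a — b — c):

```python
from itertools import product, combinations

# example G0: the path a - b - c
G0_edges = {frozenset(('a', 'b')), frozenset(('b', 'c'))}
X = ['a', 'b', 'c']

def atoms_adjacent(j, k):
    """Atoms B_j, B_k (tuples over the coordinate set) are distinguishable
    iff j_l ~ k_l in G0 for some coordinate l."""
    return any(frozenset((jl, kl)) in G0_edges for jl, kl in zip(j, k))

atoms = list(product(X, repeat=2))   # atoms of a subalgebra over two coordinates
edges = [(j, k) for j, k in combinations(atoms, 2) if atoms_adjacent(j, k)]

assert atoms_adjacent(('a', 'a'), ('c', 'b'))       # b ~ c on the second coordinate
assert not atoms_adjacent(('a', 'a'), ('c', 'c'))   # a and c not adjacent in G0
```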
For B, C ∈ F with P(B) ≠ 0, we write P(C|B) = P(B ∩ C)/P(B), and note that P(·|B) is then a probability measure on (Ω, F).
Definition 5.3.7. By analogy with the definition of the conditional entropy H(C|B), we define the conditional graph entropy of C given B by

H(G[C|B], P) = ∑_{B∈at(B) : P(B)>0} P(B) H(G[C], P(·|B)).
Note that if P (B ∩C) = P (B)P (C) for all B ∈ at(B) and C ∈ at(C), we have P (C|B) =
P (C) and H(G[C|B], P ) = H(G[C], P ).
We will call B = ⋁_{k=1}^{n} T^{−i_k} A_0 ∈ S the subalgebra over the coordinate set S_B = {i_1, . . . , i_n}. It is clear that when B, C ∈ S, then B ∨ C ∈ S and S_{B∨C} = S_B ∪ S_C. Under the ‘physical’
conditions of Remark 5.3.6, we have the following analogue of Theorem 5.1.5.
Proposition 5.3.8. When B, C ∈ S and G = G_0^Z,

H(G[B], P) + H(G[C|B], P) ≤ H(G[B ∨ C], P) ≤ H(G[B], P) + H(G[C], P). (5.16)
Furthermore, if P (B ∩C) = P (B)P (C) for all B ∈ at(B) and C ∈ at(C), then equality holds
throughout in (5.16).
Proof. The atoms of B∨C, and hence the vertices of G[B∨C], are the non-empty intersections
B ∩ C where B ∈ at(B) and C ∈ at(C). On the other hand,

V(G[B] ∗ G[C]) = {(B, C) : B ∈ at(B), C ∈ at(C)}.
We equip V (G[B] ∗G[C]) with the probability distribution r where
r((B,C)) = P (B ∩ C). (5.17)
We now show that (G[B]∗G[C], r) and (G[B∨C], P ) are related in the way (G, p) and (G′, p′)
are related in Lemma 1.3.27.
Any vertex (B,C) of G[B] ∗ G[C] satisfying B ∩ C = ∅ will have measure r((B,C)) =
P (B ∩ C) = 0. As in Lemma 1.3.27, we form (G[B] ∗ G[C])′ by deleting from G[B] ∗ G[C]
all vertices (B,C) with B ∩ C = ∅, along with their incident edges. We identify vertex
(B,C) ∈ V ((G[B] ∗ G[C])′) with vertex B ∩ C ∈ V (G[B ∨ C]) to put V ((G[B] ∗ G[C])′) and
V (G[B ∨ C]) in a natural one-to-one correspondence. Let Bi, Bj ∈ at(B) and Ck, Cl ∈ at(C)
satisfy Bi ∩ Ck 6= ∅ and Bj ∩ Cl 6= ∅. We claim the following are equivalent:
(1) (Bi, Ck) ∼ (Bj , Cl) in (G[B] ∗G[C])′;
(2) (Bi ∩ Ck) ∼ (Bj ∩ Cl) in G[B ∨ C].
To see this, first suppose that (1) holds. Then either Bi ∼ Bj in G[B], giving Bi ×
Bj ⊆ E(G), or Ck ∼ Cl in G[C], giving Ck × Cl ⊆ E(G). This is sufficient to show that
(Bi ∩ Ck)× (Bj ∩ Cl) ⊆ E(G), and hence (2) holds.
To prove the reverse implication, let B, C ∈ S be the subalgebras over the coordinate sets S_B and S_C respectively, so that B ∨ C is the subalgebra over S_B ∪ S_C. Now (2) implies that (B_i ∩ C_k) × (B_j ∩ C_l) ⊆ E(G), and, as noted in Remark 5.3.6 (ii), there exists t ∈ S_B ∪ S_C such that B_i ∩ C_k and B_j ∩ C_l are distinguishable on coordinate t. If t ∈ S_B, we have B_i ∼ B_j in G[B], and if t ∈ S_C, we have C_k ∼ C_l in G[C]. In either case, (B_i, C_k) ∼ (B_j, C_l) in
(G[B] ∗G[C])′, and (1) holds, as required.
Thus the graphs (G[B] ∗ G[C])′ and G[B ∨ C] are isomorphic, and recalling (5.17) and
regarding r as a probability distribution on V ((G[B] ∗G[C])′), we have
H((G[B] ∗G[C])′, r) = H(G[B ∨ C], P ).
Lemma 1.3.27 then gives
H(G[B ∨ C], P ) = H(G[B] ∗G[C], r). (5.18)
Note that the marginal distributions of r on V (G[B]) = at(B) and on V (G[C]) = at(C)
are both those induced by P and that r(C|B) = P (C|B). The result then follows from
Propositions 5.2.11 and 5.2.12, using Definition 5.3.7.
Remark 5.3.9. Note that the equality of Theorem 5.1.5 becomes an inequality in Proposition
5.3.8. This results in conditional graph entropy lacking some of the useful properties, and
hence also the applications, of conditional entropy; see Remark 5.3.21.
Although in the proof of Proposition 5.3.8 it holds that (1) ⇒ (2) without the conditions B, C ∈ S and G = G_0^Z, a simple counterexample shows that in general the reverse implication does not hold. Let Ω = X^Z where X = {1, 2, 3, 4}. We define finite subalgebras B, C ∉ S by at(B) = {B_1, B_2} and at(C) = {C_1, C_2}, where

B_1 = {ω : ω_0 = 1 or 2},   B_2 = {ω : ω_0 = 3 or 4},
C_1 = {ω : ω_0 = 1 or 3},   C_2 = {ω : ω_0 = 2 or 4}.
Let G = G_0^Z, where G_0 is the graph on the vertex set {1, 2, 3, 4} shown in the figure.

[Diagram: the graph G_0 on the vertices 1, 2, 3, 4.]
We have that B_1 ≁ B_2 in G[B] and C_1 ≁ C_2 in G[C], and so

G[B] and G[C] are both empty graphs on two vertices, and G[B] ∗ G[C] is the empty graph on four vertices.
(As no B_i ∩ C_j is empty, we need not delete any vertices to form (G[B] ∗ G[C])′.) However,

B_1 ∩ C_1 = {ω : ω_0 = 1},   B_1 ∩ C_2 = {ω : ω_0 = 2},
B_2 ∩ C_1 = {ω : ω_0 = 3},   B_2 ∩ C_2 = {ω : ω_0 = 4},

and thus the graph G[B ∨ C] is as shown below.

[Diagram: the graph G[B ∨ C] on the four atoms B_1 ∩ C_1, B_1 ∩ C_2, B_2 ∩ C_1, B_2 ∩ C_2.]
By defining a new type of infinite product graph, we give a further example of where G[B∨C]
and G[B] ∗G[C] differ.
Definition 5.3.10. For a given distinguishability graph G_0 on X, let the threshold-t co-normal product graph G_0^{Z,t} have vertex set Ω = X^Z, with ω ∼ ω′ in G_0^{Z,t} when ω_i ∼ ω′_i in G_0 for at least t distinct coordinates i ∈ Z.

(It is clear that the graph G_0^{Z,t} is shift invariant. Also note that G_0^{Z,1} = G_0^Z.)
Now take G = G_0^{Z,2}, let X = {a, b, c} and let G_0 be the path graph a — b — c. Clearly G[A_0] ≅ G[T^{−1}A_0] is the empty graph on three vertices, and thus G[A_0] ∗ G[T^{−1}A_0] is the empty graph on nine vertices. However, A_0 ∨ T^{−1}A_0 has atoms of the form {ω : ω_0 = i, ω_1 = j} with i, j ∈ X, and we note, for instance, that {ω : ω_0 = ω_1 = a} ∼ {ω : ω_0 = ω_1 = b}; hence G[A_0 ∨ T^{−1}A_0] is non-empty. Indeed, letting the vertex labelled ij denote the atom {ω : ω_0 = i, ω_1 = j}, it is easy to verify that G[A_0 ∨ T^{−1}A_0] is as shown:

[Diagram: G[A_0 ∨ T^{−1}A_0] on the nine atoms labelled aa, ab, ac, ba, bb, bc, ca, cb, cc.]
5.3.2 The quantity h(G[B], T ) and its properties
We now generalise the quantity h(B, T ) as defined in Definition 5.1.1 (iii) to the context of a
distinguishability graph G on Ω in probability space (Ω,F , P ).
Definition 5.3.11. We define the graph entropy of a finite subalgebra B ⊂ F relative to T by

h(G[B], T) = lim sup_{n→∞} (1/n) H(G[⋁_{k=0}^{n−1} T^{−k}B], P).
The next two propositions establish a monotonicity condition and an upper bound on
h(G[B], T ). (Proposition 5.3.12 can be seen as an analogue of Theorem 5.1.4 (ii).)
Proposition 5.3.12. If B ⊆ C, then h(G[B], T ) ≤ h(G[C], T ).
Proof. If B ⊆ C, then ⋁_{k=0}^{n−1} T^{−k}B ⊆ ⋁_{k=0}^{n−1} T^{−k}C, and by Proposition 5.3.4,

H(G[⋁_{k=0}^{n−1} T^{−k}B], P) ≤ H(G[⋁_{k=0}^{n−1} T^{−k}C], P)

for all n ∈ N. The result follows by Definition 5.3.11.
Proposition 5.3.13. For any finite subalgebra B ⊂ F, we have

h(G[B], T) ≤ h(B, T) ≤ log |at(B)|.

Proof. Noting that |at(⋁_{k=0}^{n−1} T^{−k}B)| ≤ |at(B)|^n, we apply (5.15) on page 188 to give

H(G[⋁_{k=0}^{n−1} T^{−k}B], P) ≤ H(⋁_{k=0}^{n−1} T^{−k}B) ≤ n log |at(B)|,

whence the result follows on dividing by n and taking limits superior.
Remark 5.3.14. If G is complete, then all the atoms of the finite subalgebra B ⊂ F are mutually distinguishable, and by (5.14) on page 187 we have

H(G[⋁_{k=0}^{n−1} T^{−k}B], P) = H(⋁_{k=0}^{n−1} T^{−k}B)

for all n ∈ N. We refer to this as the ‘complete case’, in which it is easily seen that h(G[B], T) = h(B, T).
We now use Fekete’s Lemma to show that the limit superior in Definition 5.3.11 becomes
a limit under conditions (i) and (ii) of Remark 5.3.6.
Proposition 5.3.15. For a finite subalgebra B ∈ S and G = G_0^Z we have

h(G[B], T) = lim_{n→∞} (1/n) H(G[⋁_{k=0}^{n−1} T^{−k}B], P).

Proof. Setting a_m = H(G[⋁_{k=0}^{m−1} T^{−k}B], P), we have

a_{m+n} = H(G[⋁_{k=0}^{m+n−1} T^{−k}B], P)
        = H(G[(⋁_{k=0}^{m−1} T^{−k}B) ∨ (⋁_{k=m}^{m+n−1} T^{−k}B)], P)
        ≤ H(G[⋁_{k=0}^{m−1} T^{−k}B], P) + H(G[T^{−m}(⋁_{k=0}^{n−1} T^{−k}B)], P)
        = H(G[⋁_{k=0}^{m−1} T^{−k}B], P) + H(G[⋁_{k=0}^{n−1} T^{−k}B], P)
        = a_m + a_n,

where we have used Propositions 5.3.8 and 5.3.3. Thus the sequence (a_n)_{n∈N} is sub-additive and the result follows from Lemma 3.1.6.
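Fekete's Lemma (Lemma 3.1.6) is itself easy to illustrate numerically: for a sub-additive sequence, a_n/n converges to inf_n a_n/n. The toy sequence below is our own choice, unrelated to the entropy sequence above; it merely shows the behaviour:

```python
from math import log2

def a(n):
    """a_n = n + log2(n + 1): sub-additive, since
    log2(m + n + 1) <= log2((m + 1) * (n + 1))."""
    return n + log2(n + 1)

# spot-check sub-additivity: a_{m+n} <= a_m + a_n
for m in range(1, 40):
    for n in range(1, 40):
        assert a(m + n) <= a(m) + a(n) + 1e-12

# a_n / n decreases towards its infimum, here the limit 1
ratios = [a(n) / n for n in (10, 100, 1000, 10000)]
assert all(x > y for x, y in zip(ratios, ratios[1:]))
assert abs(ratios[-1] - 1) < 0.01
```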
Under the same ‘physical’ conditions, the sub-additivity of h(G[B], T ) follows from that
of H(G[B], P ) as given in Proposition 5.3.8.
Proposition 5.3.16. When B, C ∈ S and G = G_0^Z,

h(G[B ∨ C], T) ≤ h(G[B], T) + h(G[C], T).

Proof. Applying Proposition 5.3.8 gives

H(G[⋁_{k=0}^{n−1} T^{−k}(B ∨ C)], P) = H(G[(⋁_{k=0}^{n−1} T^{−k}B) ∨ (⋁_{k=0}^{n−1} T^{−k}C)], P)
 ≤ H(G[⋁_{k=0}^{n−1} T^{−k}B], P) + H(G[⋁_{k=0}^{n−1} T^{−k}C], P).

The result follows on dividing by n and taking limits.
The proof of Theorem 5.1.3 (ii) given in [4, B3] can easily be extended to the graph setting
to yield the next lemma, which will be used in our generalisation of the Kolmogorov–Sinai
theorem.
Lemma 5.3.17. For any finite subalgebra B ⊂ F and u, v ∈ Z with u ≤ v, it holds that

h(G[⋁_{j=u}^{v} T^{−j}B], T) = h(G[B], T).

Proof. First note that

⋁_{i=0}^{n−1} T^{−i}(⋁_{j=u}^{v} T^{−j}B) = (⋁_{j=u}^{v} T^{−j}B) ∨ (⋁_{j=u}^{v} T^{−j−1}B) ∨ · · · ∨ (⋁_{j=u}^{v} T^{−j−n+1}B) = T^{−u}(⋁_{k=0}^{n+v−u−1} T^{−k}B).

Then

H(G[⋁_{i=0}^{n−1} T^{−i}(⋁_{j=u}^{v} T^{−j}B)], P) = H(G[T^{−u}(⋁_{k=0}^{n+v−u−1} T^{−k}B)], P) = H(G[⋁_{k=0}^{n+v−u−1} T^{−k}B], P),

where we have used Proposition 5.3.3. Then

(1/n) H(G[⋁_{i=0}^{n−1} T^{−i}(⋁_{j=u}^{v} T^{−j}B)], P) = ((n + v − u)/n) · (1/(n + v − u)) H(G[⋁_{k=0}^{n+v−u−1} T^{−k}B], P),

and taking limits superior as n → ∞ yields the result. (Recall that when lim_{n→∞} a_n exists, lim sup_{n→∞}(a_n b_n) = lim_{n→∞}(a_n) lim sup_{n→∞}(b_n).)
5.3.3 Generalising the Kolmogorov–Sinai Theorem
We now define a quantity h(G,T ) which generalises the Kolmogorov–Sinai entropy h(T ) to the
graph setting. We will discuss the Bernoulli and Markov shifts, and a notion of isomorphism
will be introduced, under which we will establish the invariance of h(G,T ).
Definition 5.3.18. Given a graph G on Ω and probability space (Ω, F, P), we define the graph entropy of the shift T by

h(G, T) = sup{h(G[B], T) : B ⊆ F_0 is a finite subalgebra of F}.
The reader will notice that, although h(T) was defined by a supremum over finite subalgebras of F, we have defined h(G, T) by a supremum over finite subalgebras of F contained in F_0, as given in (5.2) on page 177. The exclusion of finite subalgebras not contained in F_0 will be discussed in Remark 5.3.21 and Question 5.4.1. For now, note that since the time-0 subalgebra A_0 ⊂ F_0 and since (5.7) on page 178 gives h(A_0, T) = h(T), it is certainly true that

h(T) = sup{h(B, T) : B ⊆ F_0 is a finite subalgebra of F}.
(It can also be pointed out that, in only considering finite subalgebras of F , the definition of
h(T ) also excludes many finite subalgebras; here we have simply enlarged the set of excluded
subalgebras.)
The Kolmogorov–Sinai Theorem (Theorem 5.1.6) has the following analogue in this set-
ting.
Theorem 5.3.19. Suppose a finite subalgebra B ⊆ F_0 satisfies

⋃_{n=0}^{∞} ⋁_{k=−n}^{n} T^k B = F_0.

Then h(G, T) = h(G[B], T).

Proof. It is clearly sufficient to prove that h(G[C], T) ≤ h(G[B], T) for any finite subalgebra C ⊂ F_0. To this end, observe that if C ⊂ F_0 is a finite subalgebra, then C ⊆ ⋁_{k=−n}^{n} T^k B for some n ∈ N. Proposition 5.3.12 and Lemma 5.3.17 then give that

h(G[C], T) ≤ h(G[⋁_{k=−n}^{n} T^k B], T) = h(G[B], T),

as required.
as required.
The next result, analogous to Proposition 5.1.7, shows that the supremum in Definition
5.3.18 is achieved by any element of S; this makes the calculation of h(G,T ) feasible in some
cases.
Proposition 5.3.20. It holds that h(G[A0], T ) = h(G,T ) where A0 is the time-0 subalgebra.
Furthermore, h(G[B], T ) = h(G,T ) for all B ∈ S.
Proof. The first assertion is immediate from Theorem 5.3.19 and the definition of F0 in
(5.2). Then note for arbitrary B ∈ S that we have B ⊂ F0, and so by Definition 5.3.18
h(G[B], T) ≤ h(G, T). However, we also have that T^n A_0 ⊆ B for some n ∈ Z, and so

h(G, T) = h(G[A_0], T) = h(G[T^n A_0], T) ≤ h(G[B], T)

by Lemma 5.3.17 and Proposition 5.3.12, and the proof is complete.
Remark 5.3.21. The motivation for the exclusion of subalgebras not in F0 in the definition of
h(G,T ) is now clear; our proof of Theorem 5.3.19 does not yield an upper bound on h(G[C], T )
for a finite subalgebra C ⊂ F not contained in F0. The standard proof of the Kolmogorov–
Sinai Theorem [4, Theorem 7.1] uses properties of conditional entropies to consider the general
case of C ⊂ F , but the issue mentioned in Remark 5.3.9, where we noted that the equality
in Theorem 5.1.5 becomes an inequality in the graph case, seems to preclude employing an
equivalent strategy in this case.
By Proposition 5.3.13, for any distinguishability graph G on Ω, we have
h(G,T ) ≤ h(T ). (5.19)
In the case that G is complete, Remark 5.3.14 and (5.7) on page 178 give that h(G[A0], T ) =
h(A0, T ) = h(T ) and hence in the complete case
h(G,T ) = h(T ). (5.20)
Thus, as would be expected, when G is complete, the situation reduces to that considered in
Section 5.1.
Let p = (p_i)_{i∈X} be the probability distribution on X given by p_i = P({ω : ω_0 = i}) for each i ∈ X. Trivially, H(A_0) = H(p), and if G = G_0^Z, we have H(G[A_0], P) = H(G_0, p). By Theorems 5.1.5 and 5.1.3,

H(⋁_{i=0}^{n−1} T^{−i}A_0) ≤ ∑_{i=0}^{n−1} H(T^{−i}A_0) = nH(A_0) = nH(p),

whence by (5.7) and Definition 5.1.1 (iii) we have

h(T) = h(A_0, T) ≤ H(p).

It is straightforward to generalise these ideas to the graph case.
It is straightforward to generalise these ideas to the graph case.
Lemma 5.3.22. When G = G_0^Z and p = (p_i)_{i∈X} is the probability distribution on X given by p_i = P({ω : ω_0 = i}), we have

h(G, T) = h(G[A_0], T) ≤ H(G_0, p).

Proof. By Propositions 5.3.8 and 5.3.3, we have

H(G[⋁_{k=0}^{n−1} T^{−k}A_0], P) ≤ nH(G[A_0], P) = nH(G_0, p),

whence Proposition 5.3.20 and Definition 5.3.11 complete the proof.
whence Proposition 5.3.20 and Definition 5.3.11 complete the proof.
The following proposition generalises (5.9) on page 179 and shows that if T is the p-
Bernoulli shift, then equality holds in Lemma 5.3.22, and h(G,T ) reduces to the graph entropy
of the probabilistic graph (G0, p). (As a Bernoulli shift corresponds to an i.i.d. source, this
is as would intuitively be expected.)
Proposition 5.3.23. Let G = G_0^Z, and let T be the p-Bernoulli shift, where p is a probability distribution on V(G_0). Then h(G, T) = H(G_0, p).
Proof. The proof proceeds as for Lemma 5.3.22, but (5.8) on page 179 shows that in this case
the condition for equality in Proposition 5.3.8 is fulfilled.
We now consider the graph entropy of the (Π, p)-Markov shift to generalise (5.11) on page
179.
Proposition 5.3.24. Let G = G_0^Z and let T be the (Π, p)-Markov shift. Set X = [d] and let μ^{(i)} denote the probability distribution given by the ith row of Π = (p_{ij}), that is, μ^{(i)}_j = p_{ij}. Then

∑_{i=1}^{d} p_i H(G_0, μ^{(i)}) ≤ h(G, T) ≤ H(G_0, p).
Proof. The upper bound on h(G, T) is given by Lemma 5.3.22. For the lower bound, write A_n = ⋁_{k=0}^{n} T^{−k}A_0 and for n ≥ 1 apply Proposition 5.3.8 to A_n = A_{n−1} ∨ T^{−n}A_0 to give

H(G[A_{n−1}], P) + ∑_{A∈at(A_{n−1})} P(A) H(G[T^{−n}A_0], P(·|A)) ≤ H(G[A_n], P). (5.21)

Now if X = {ω : ω_n = j} ∈ at(T^{−n}A_0) and A ∈ at(A_{n−1}) with A ⊆ {ω : ω_{n−1} = i}, then (5.10) on page 179 gives P(X|A) = p_{ij} = μ^{(i)}_j, and so

H(G[T^{−n}A_0], P(·|A)) = H(G_0, μ^{(i)}).

In (5.21), for each i ∈ [d] we sum over those A ∈ at(A_{n−1}) contained in {ω : ω_{n−1} = i} to leave

H(G[A_{n−1}], P) + ∑_{i=1}^{d} p_i H(G_0, μ^{(i)}) ≤ H(G[A_n], P).

Rearranging gives

∑_{i=1}^{d} p_i H(G_0, μ^{(i)}) ≤ H(G[A_n], P) − H(G[A_{n−1}], P),

and summing over n from 1 to N − 1 yields

(N − 1) ∑_{i=1}^{d} p_i H(G_0, μ^{(i)}) ≤ H(G[A_{N−1}], P) − H(G[A_0], P).

Using that h(G, T) = h(G[A_0], T) = lim_{N→∞} (1/N) H(G[A_{N−1}], P) yields the required result.
Remark 5.3.25. Note that setting p_{ij} = p_j for all i, j ∈ [d] gives μ^{(i)} = p for all i, and we have the p-Bernoulli shift. In this case Proposition 5.3.24 immediately gives h(G, T) = H(G_0, p), agreeing with Proposition 5.3.23.
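In the complete case, where H(G_0, ·) is Shannon entropy and h(G, T) = h(T) equals the usual entropy rate ∑_i p_i H(μ^{(i)}) of the Markov chain, the two bounds in Proposition 5.3.24 visibly bracket h(G, T), with the lower bound attained. A sketch with a two-state chain of our own choosing:

```python
from math import log2

def H(dist):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in dist if p > 0)

# a hypothetical 2-state transition matrix Pi with stationary distribution p
Pi = [[0.9, 0.1],
      [0.4, 0.6]]
p = [0.8, 0.2]   # check stationarity: p Pi = p
assert abs(p[0] * Pi[0][0] + p[1] * Pi[1][0] - p[0]) < 1e-12

lower = sum(p[i] * H(Pi[i]) for i in range(2))   # entropy rate h(T) in the complete case
upper = H(p)                                     # H(G0, p) when G0 is complete
assert lower <= upper
```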
We have discussed the concept of isomorphism for systems of the form (Ω,F , P, T ); we
now extend this to systems of the form (Ω = X^Z, F, P, T, G). First we define an isomorphism in this setting, and we then show that h(G, T) is invariant under such isomorphisms. (For the system (Ω′ = X′^Z, F′, P′, T′, G′) we denote the time-0 subalgebra by A′_0, and write F′ = ⋁_{n=−∞}^{∞} T′^n A′_0 and F′_0 = ⋃_{n=0}^{∞} ⋁_{i=−n}^{n} T′^i A′_0.)
Definition 5.3.26. We say systems (Ω = X Z,F , P, T,G) and (Ω′ = X ′Z,F ′, P ′, T ′, G′) are
isomorphic if there exists a bijection φ : Ω→ Ω′ with the following properties:
(i) For A ⊆ Ω, we have φ(A) ∈ F ′ if and only if A ∈ F , in which case P (A) = P ′(φ(A));
(ii) φ(Tω) = T ′φ(ω) for all ω ∈ Ω;
(iii) For all ω, ω̃ ∈ Ω, it holds that ω ∼ ω̃ in G if and only if φ(ω) ∼ φ(ω̃) in G′;
(iv) For all B ⊆ Ω it holds that B ∈ F0 if and only if φ(B) ∈ F ′0.
In this case we write (Ω,F , P, T,G) ∼= (Ω′,F ′, P ′, T ′, G′) and say that φ is an isomorphism.
The first two properties are the conditions for T ∼= T ′ as in Definition 5.1.8. Condition
(iii) means that φ preserves distinguishability in the sense that for A,B ∈ F it holds that
A × B ⊆ E(G) if and only if φ(A) × φ(B) ⊆ E(G′). Condition (iv) ensures that a finite
subalgebra B of F is contained in F0 if and only if φ(B) is contained in F ′0. This is necessary
to ensure the invariance of h(G,T ) under isomorphism, where we recall that h(G,T ) is a
supremum over the finite subalgebras of F0.
We consider a straightforward example. Suppose that (Ω, F, P, T) ≅ (Ω′, F′, P′, T′) under an isomorphism φ in the sense of Definition 5.1.8 which satisfies (iv) of Definition 5.3.26. Then, given any shift invariant graph G on Ω, we can form G′ on Ω′ such that (Ω, F, P, T, G) ≅ (Ω′, F′, P′, T′, G′) simply by letting Definition 5.3.26 (iii) define G′. Defining G′ in this way and using the shift invariance of G, the following chain of equivalences for ω, ω̃ ∈ Ω shows that the resulting graph G′ on Ω′ is shift invariant:

φ(ω) ∼ φ(ω̃) in G′ ⟺ ω ∼ ω̃ in G ⟺ T^n ω ∼ T^n ω̃ in G
⟺ φ(T^n ω) ∼ φ(T^n ω̃) in G′ ⟺ T′^n φ(ω) ∼ T′^n φ(ω̃) in G′,
where the last step used condition (ii) of Definition 5.1.8.
Lemma 5.3.27. As defined in Definition 5.3.26, isomorphism is an equivalence relation.
Proof. The argument given in [4, Remark 3, p.54] for isomorphisms as in Definition 5.1.8
extends straightforwardly to this situation. That isomorphism is symmetric and reflexive is
clear; it remains to prove transitivity.
Suppose that
(1) (Ω,F , P, T,G) ∼= (Ω′,F ′, P ′, T ′, G′) under isomorphism φ : Ω→ Ω′, and
(2) (Ω′,F ′, P ′, T ′, G′) ∼= (Ω′′,F ′′, P ′′, T ′′, G′′) under isomorphism ψ : Ω′ → Ω′′.
We now show it follows that (Ω, F, P, T, G) ≅ (Ω″, F″, P″, T″, G″) under the isomorphism ψ ∘ φ : Ω → Ω″, by demonstrating that the bijection ψ ∘ φ satisfies conditions (i) to (iv) of Definition 5.3.26. Observe the following:

(i) A ∈ F ⟺ φ(A) ∈ F′ ⟺ ψ ∘ φ(A) ∈ F″, and when these hold, P(A) = P′(φ(A)) = P″(ψ ∘ φ(A));
(ii) ψ ∘ φ(Tω) = ψ(T′φ(ω)) = T″ ψ ∘ φ(ω);
(iii) ω ∼ ω̃ in G ⟺ φ(ω) ∼ φ(ω̃) in G′ ⟺ ψ ∘ φ(ω) ∼ ψ ∘ φ(ω̃) in G″;
(iv) B ∈ F_0 ⟺ φ(B) ∈ F′_0 ⟺ ψ ∘ φ(B) ∈ F″_0.
Just as h(T ) is invariant under isomorphisms as defined in Definition 5.1.8, the graph
entropy h(G,T ) is invariant under isomorphisms as defined in Definition 5.3.26.
Proposition 5.3.28. If (Ω,F , P, T,G) ∼= (Ω′,F ′, P ′, T ′, G′) under isomorphism φ, then
h(G,T ) = h(G′, T ′).
Proof. We can refine the method in [20, Section 1.3, Theorem 14]. Consider an arbitrary finite subalgebra B of F which is contained in F_0, and let at(B) = {B_1, . . . , B_m}. Then by property (iv) of Definition 5.3.26, B′ = φ(B) is a finite subalgebra of F′ which is contained in F′_0, with at(B′) = {φ(B_1), . . . , φ(B_m)}. The atoms of ⋁_{k=0}^{n−1} T^{−k}B are the non-empty sets ⋂_{k=0}^{n−1} T^{−k}B_{i_k}, i_k ∈ [m], and the atoms of ⋁_{k=0}^{n−1} T′^{−k}B′ are the non-empty sets ⋂_{k=0}^{n−1} T′^{−k}φ(B_{i_k}) = φ(⋂_{k=0}^{n−1} T^{−k}B_{i_k}), i_k ∈ [m]. Thus φ puts the atoms of ⋁_{k=0}^{n−1} T^{−k}B and ⋁_{k=0}^{n−1} T′^{−k}B′ into a one-to-one correspondence, and Definition 5.3.26 (iii) ensures that G[⋁_{k=0}^{n−1} T^{−k}B] ≅ G′[⋁_{k=0}^{n−1} T′^{−k}B′]. Furthermore, by Definition 5.3.26 (i), corresponding atoms of ⋁_{k=0}^{n−1} T^{−k}B and ⋁_{k=0}^{n−1} T′^{−k}B′ have equal measure under P and P′ respectively. It thus holds for all n ∈ N that

H(G[⋁_{k=0}^{n−1} T^{−k}B], P) = H(G′[⋁_{k=0}^{n−1} T′^{−k}B′], P′),

and so

h(G[B], T) = h(G′[B′], T′).

We can conclude from Definition 5.3.18 that h(G′, T′) ≥ h(G, T). An equivalent argument gives h(G, T) ≥ h(G′, T′), and the proof is complete.
Unlike the Kolmogorov–Sinai entropy h(T ), however, the graph entropy h(G,T ) is not a
complete invariant among the Bernoulli shifts.
Proposition 5.3.29. If systems (Ω,F , P, T,G) and (Ω′,F ′, P ′, T ′, G′) satisfy h(G,T ) =
h(G′, T ′) where T and T ′ are Bernoulli shifts, it does not follow that (Ω,F , P, T,G) ∼=
(Ω′,F ′, P ′, T ′, G′).
Proof. Recall from Proposition 5.3.23 that if T is the p-Bernoulli shift and G = G0^Z, then h(G, T) = H(G0, p). Our proposition is thus proved by finding a graph G0 with probability distribution p on its vertex set X, and a graph H0 with probability distribution q on its vertex set Y, such that H(G0, p) = H(H0, q), but such that

(Ω = X^Z, F, P, T, G = G0^Z) ≇ (Ω′ = Y^Z, F′, P′, T′, G′ = H0^Z),   (5.22)
where T is the p-Bernoulli shift and T′ the q-Bernoulli shift. Given that the conditions for isomorphism in Definition 5.1.8 are also conditions for isomorphism in Definition 5.3.26, it is clear that (5.22) holds when

(Ω, F, P, T) ≇ (Ω′, F′, P′, T′).   (5.23)

Now (5.9) gives h(T) = H(p) when T is the p-Bernoulli shift, and Theorem 5.1.10 shows that h(T) is a complete invariant among the Bernoulli shifts. Thus, if H(p) ≠ H(q), we can conclude that (5.23) and therefore (5.22) hold. So the proposition is proved if we can find probabilistic graphs (G0, p) and (H0, q) satisfying H(G0, p) = H(H0, q) but such that H(p) ≠ H(q).
We give the following example. Consider the graph G0 = K2, the complete graph on two vertices, and set p = (1/2, 1/2). Definition 1.1.1 and (1.21) give that H(G0, p) = H(p) = 1.

Then let H0 be the graph K2 · K̄2, the complete bipartite graph on four vertices, and set q = (1/4, 1/4, 1/4, 1/4), giving that H(q) = 2. The probabilistic graph (H0, q) can be formed from the probabilistic graph (K2, p) by substituting for each vertex of K2 a copy of the edgeless graph K̄2 with probability distribution p on its vertex set, according to Lemma 5.2.7 with r = p × p, which then gives H(H0, q) = H(K2, p) + H(K̄2, p) = 1 + 0 = 1.
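The numbers in this example can be checked computationally. Graph entropy admits the standard representation H(G, p) = min{∑_i p_i log₂(1/a_i) : a ∈ VP(G)}, where VP(G) is the vertex packing polytope of G, the convex hull of the indicator vectors of its independent sets. The sketch below is an illustration only: it takes H0 to be the complete bipartite graph on four vertices, and it exploits the fact that each of the two graphs involved has exactly two maximal independent sets, so a one-dimensional search over their convex combinations suffices.

```python
import math

def graph_entropy_two_sets(p, s1, s2, steps=10000):
    """Minimise sum_i p_i * log2(1 / a_i) over the segment
    a = t*ind(s1) + (1-t)*ind(s2), 0 < t < 1, which sweeps the
    relevant face of VP(G) when s1, s2 are the only maximal
    independent sets of G."""
    best = float("inf")
    for k in range(1, steps):
        t = k / steps
        a = [t * (i in s1) + (1 - t) * (i in s2) for i in range(len(p))]
        if any(ai == 0.0 for ai in a):
            continue  # log2(1/0) is infinite; skip degenerate points
        best = min(best, sum(pi * math.log2(1 / ai) for pi, ai in zip(p, a)))
    return best

# G0 = K2: maximal independent sets {0} and {1}; p = (1/2, 1/2).
H_G0 = graph_entropy_two_sets([0.5, 0.5], {0}, {1})
# H0 (complete bipartite on four vertices): maximal independent
# sets {0, 1} and {2, 3}; q uniform, so H(q) = 2.
H_H0 = graph_entropy_two_sets([0.25] * 4, {0, 1}, {2, 3})
print(H_G0, H_H0)  # both equal 1, although H(p) = 1 and H(q) = 2
```

Both minima are attained at t = 1/2, recovering H(G0, p) = H(H0, q) = 1 while H(p) ≠ H(q).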
Much of our work thus far has considered the graph G = G0^Z, where G0 is a fixed graph on vertex set X. However, there exist many shift invariant graphs G not of this form, for example the graph G = G0^{Z,t} as defined in Definition 5.3.10. Further work could be undertaken to analyse such graphs; here we work towards one straightforward, but interesting, result.

Recall that if G0 is complete, (5.20) gives that h(G0^Z, T) = h(T). More generally we have the following.

Lemma 5.3.30. If G0 is the complete graph, then for all t ∈ N we have h(G0^{Z,t}, T) = h(T).
Proof. Let G = G0^{Z,t}, where G0 is the complete graph with vertex set X, and write A_{n−1} = ∨_{k=0}^{n−1} T^{−k}A0. The atoms of A_{n−1} are the cylinders {ω : ω0 = i0, . . . , ω_{n−1} = i_{n−1}}, i_k ∈ X. For A, B ∈ at(A_{n−1}) we have A ∼ B in the graph G[A_{n−1}] if and only if A × B ⊆ E(G), that is, precisely when the cylinders A and B differ on at least t coordinates.

We proceed by finding an upper bound on α(G[A_{n−1}]) by counting, along with some fixed atom A = {ω : ω0 = i0, . . . , ω_{n−1} = i_{n−1}} ∈ at(A_{n−1}), the atoms of A_{n−1} not adjacent to A in G[A_{n−1}]. The number of atoms of A_{n−1} which differ from A on exactly r coordinates is C(n, r)(|X| − 1)^r, where C(n, r) denotes the binomial coefficient. For sufficiently large n, we have C(n, r) ≤ C(n, t−1) for all r = 0, . . . , t−1, and

α(G[A_{n−1}]) ≤ ∑_{r=0}^{t−1} C(n, r)(|X| − 1)^r ≤ t C(n, t−1)(|X| − 1)^{t−1} ≤ t n^{t−1}(|X| − 1)^{t−1},

giving

log α(G[A_{n−1}]) ≤ log t + (t − 1) log n + (t − 1) log(|X| − 1).

It then holds that

lim_{n→∞} (1/n) log α(G[A_{n−1}]) = 0.   (5.24)

We now apply Corollary 1.3.14 and use Remark 5.1.2 (i) to give

H(G[A_{n−1}], P) ≥ H(A_{n−1}) − log α(G[A_{n−1}]).   (5.25)

Using (5.24) and (5.25) together with Propositions 5.3.20 and 5.3.15 and (5.7) gives

h(G, T) = h(G[A0], T) = lim sup_{n→∞} (1/n) H(G[A_{n−1}], P) ≥ lim_{n→∞} (1/n) H(A_{n−1}) = h(T).
The proof is completed by recalling from (5.19) on page 197 that h(G,T ) ≤ h(T ).
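The polynomial bound on α(G[A_{n−1}]) at the heart of this proof is easy to examine numerically: the number of cylinders differing from a fixed one in fewer than t coordinates is ∑_{r=0}^{t−1} C(n, r)(|X| − 1)^r, and (1/n) log of this quantity tends to 0, as in (5.24). A small illustration (the parameters t = 3 and |X| = 4 are arbitrary choices):

```python
import math

def ball_size(n, t, q):
    """Number of length-n words over a q-letter alphabet differing
    from a fixed word in at most t-1 coordinates: the upper bound
    on alpha(G[A_{n-1}]) used in the proof."""
    return sum(math.comb(n, r) * (q - 1) ** r for r in range(t))

t, q = 3, 4
for n in (10, 100, 1000, 10000):
    print(n, math.log2(ball_size(n, t, q)) / n)
# the normalised logarithm decreases towards 0, as in (5.24)
```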
5.4 Further questions
The theory presented in this chapter raises a number of open questions which we now discuss:
each of them merits further work.
5.4.1 Finite subalgebras
Definition 5.3.18 and our attempt to generalise the Kolmogorov–Sinai Theorem to the setting
of partial distinguishability raise the following important open question:
Question 5.4.1. Do we have equality in the relation

h(G, T) = sup_{B⊂F0} h(G[B], T) ≤ sup_{B⊂F} h(G[B], T),

where the suprema are taken over the finite subalgebras of F contained in F0 and in F respectively?
Intuitively we might expect an affirmative answer, in which case Definition 5.3.18 could
be rewritten to express h(G,T ) as a supremum over all the finite subalgebras of F , in closer
analogy with the expression for h(T ) in Definition 5.1.1 (iv). An affirmative answer would
also allow condition (iv) in Definition 5.3.26 to be dropped.
5.4.2 Source coding with partial distinguishability
In what follows we discuss a type of regularity, known as ergodicity, that may be possessed by the shift T. We follow the definitions in [4, Chapter 1]: in the system (Ω, F, P, T), a set A ⊆ Ω is said to be invariant if P((A \ T⁻¹A) ∪ (T⁻¹A \ A)) = 0. The shift T is called ergodic if every invariant set has measure 0 or 1.
We let G = G0^Z and let p_n be the probability distribution on X^n such that

p_n(i0, . . . , i_{n−1}) = P({ω : ω0 = i0, . . . , ω_{n−1} = i_{n−1}}).

In Appendix C we show how the Shannon–McMillan–Breiman Theorem [4, Theorem 13.1] and the asymptotic equipartition property [4, Theorem 13.2] imply that if T is ergodic and 0 < λ < 1, then

h(T) = lim_{n→∞} (1/n) log ( min{|E| : E ⊆ X^n, p_n(E) ≥ 1 − λ} ).   (5.26)
This generalises (1.1) on page 3 and solves the ‘source coding’ problem in this setting. That
h(T ) has such an interpretation is well-known [16, Section 4.5].
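In the special case of the p-Bernoulli shift, where h(T) = H(p), the minimum in (5.26) can be computed exactly for moderate n by taking the most probable sequences first. The sketch below is illustrative only; it assumes a binary alphabet, and it groups sequences by their number of ones, since all sequences in such a class are equiprobable.

```python
import math

def min_code_size(n, p, lam):
    """Smallest |E| with E a subset of {0,1}^n and p_n(E) >= 1 - lam,
    for an i.i.d. Bernoulli(p) source: greedily take the most
    probable sequences first, class by class."""
    classes = [(p ** k * (1 - p) ** (n - k), math.comb(n, k))
               for k in range(n + 1)]           # (per-sequence prob, count)
    classes.sort(key=lambda c: -c[0])           # most probable first
    need, size = 1 - lam, 0
    for prob, count in classes:
        if need <= 0:
            break
        take = min(count, math.ceil(need / prob - 1e-12))
        size += take
        need -= take * prob
    return size

p, lam = 0.1, 0.05
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
for n in (10, 50, 200):
    print(n, math.log2(min_code_size(n, p, lam)) / n, "target", H)
# the rates decrease towards H(p) as n grows
```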
We denote by G0^n(E) the subgraph of G0^n induced by E ⊆ V(G0^n). If G0 is complete, then
|E| = χ(G0^n(E)). So when G0 is complete and T is ergodic, (5.20) and (5.26) yield

h(G, T) = lim_{n→∞} (1/n) log ( min{χ(G0^n(E)) : E ⊆ X^n, p_n(E) > 1 − λ} )   (5.27)

for all λ ∈ (0, 1). This motivates the following question.

Question 5.4.2. For G = G0^Z, under what conditions does (5.27) hold?

Where (5.27) holds, h(G, T) acquires a ‘source coding’ interpretation in the case of partial distinguishability, like that possessed by H(G, p) in Definition 1.3.5. We saw that (5.27) holds in the case that G0 is complete and T ergodic. For the p-Bernoulli shift we have p_n = p^n, the n-fold product distribution, and Proposition 5.3.23 and Definition 1.3.5 give

h(G, T) = H(G0, p) = lim_{n→∞} (1/n) log ( min{χ(G0^n(E)) : E ⊆ X^n, p_n(E) > 1 − λ} ),

and again (5.27) holds. It is unclear whether (5.27) holds outside of these two cases.
5.4.3 Distinguishability
Fundamental to our work on the source with partial distinguishability is Definition 5.3.1.
With this definition of the distinguishability of sets, we note that the transfer of just a single
element ω ∈ Ω from one atom of a finite subalgebra B to another can change the graph
G[B]. It is arguably desirable to introduce a definition of distinguishability that would leave
G[B] invariant under the transfer of any null set from one atom to another. This could be
achieved by refining Definition 5.3.1 to say that sets A,B ∈ F are distinguishable if and only
if P (A), P (B) > 0 and there exist sets A′, B′ ∈ F such that P (A\A′) = P (B\B′) = 0 and
A′ ×B′ ⊆ E(G).
However, such a refinement of the definition of distinguishability means that the distin-
guishability of sets in F will depend on the measure P ; it is not clear if this would be helpful.
As an example of the difficulties this refined definition would bring, consider a, b, c, d ∈ X where a ≁ b in G0 and c ∼ d in G0. Let G = G0^Z, and recall that in this case it seems natural to declare two cylinders distinguishable if and only if they are distinguishable on at least one coordinate. Suppose that P({ω : ω0 = a, ω1 = c}) = P({ω : ω0 = a}) and P({ω : ω0 = b, ω1 = d}) = P({ω : ω0 = b}). (Because T preserves measure, we would then have that a and b are almost always followed by c and d respectively.) By our refined distinguishability definition we have {ω : ω0 = a} ∼ {ω : ω0 = b}, but these cylinders are not distinguishable at any coordinate.
If we found a satisfactory definition of distinguishability that is invariant under the transfer of null sets between atoms, it would also be appropriate to modify Definition 5.3.26 to define a concept of ‘isomorphism modulo null sets’, as considered in [4, Chapter 2] for the complete case. There we write (Ω, F, P, T) ∼= (Ω′, F′, P′, T′) if there is an isomorphism Ω \ A → Ω′ \ A′ in the sense of Definition 5.1.8, where A ⊂ Ω and A′ ⊂ Ω′ satisfy P(A) = P′(A′) = 0.
5.4.4 Further generalisations
We recall that the study of dynamical systems is not unique to information theory. Suppose
for a general dynamical system (Ω,F , P, T ), in which Ω may not be of the form X Z, we have
a symmetric relation on Ω described by graph G. The definitions in this chapter lead to the
quantity h(G,T ), the graph entropy associated to the system (Ω,F , P, T,G); further work
could be undertaken to study the significance of this quantity in this more general context.
Finally, having generalised Korner’s graph entropy to non-commutative graphs in Chapter
4, and to the non-i.i.d. classical case in this chapter, it would be natural to ask if it can be
generalised to the non-i.i.d. quantum case. Given the complexities involved in generalising
even the Kolmogorov–Sinai entropy to this setting, this is likely to be a difficult problem.
Appendix A
Convexity and semi-continuity
Here we gather together some standard definitions and results concerning convexity and
semi-continuity.
Definition A.0.1. When T is a vector space, a set S ⊆ T is convex if

αu + (1 − α)v ∈ S for all u, v ∈ S and α ∈ [0, 1].

The convex hull of a set S is denoted conv(S) and is given by the intersection of all convex sets containing S; in other words, it is the smallest convex set containing S. Equivalently, conv(S) is the set of all finite convex combinations of elements of S, that is

conv(S) = { ∑_{i=1}^{k} λ_i s_i : k ∈ N, λ_i ∈ R₊, ∑_{i=1}^{k} λ_i = 1, s_i ∈ S }.
If set A is convex, then p ∈ A is called an extreme point of A when there do not exist
distinct points q, r ∈ A satisfying p = tq + (1 − t)r for some t ∈ (0, 1). The following is a
standard result due to Minkowski.
Theorem A.0.2. [54, Theorem 1.10] If K is a compact and convex subset of a finite dimen-
sional vector space, then K is the convex hull of its extreme points.
Definition A.0.3. When X is a convex subset of a vector space, function f : X → R is
called concave if for x, y ∈ X and α ∈ [0, 1]
f(αx+ (1− α)y) ≥ αf(x) + (1− α)f(y). (A.1)
The function f is strictly concave if the inequality (A.1) is strict when α ∈ (0, 1) and x ≠ y.
Function f is called (strictly) convex when −f is (strictly) concave.
We denote by R̄ the extended real line R̄ = R ∪ {−∞, ∞}.

Definition A.0.4. In a metric space X, the function f : X → R̄ is lower semi-continuous at x0 ∈ X if lim inf_{x→x0} f(x) ≥ f(x0). The function f is lower semi-continuous if it is lower semi-continuous at every point x ∈ X. (If f(x0) = ∞, lower semi-continuity at x0 requires lim_{x→x0} f(x) = ∞.)
Analogous to Definition A.0.4 is the following.
Definition A.0.5. For a metric space X, the function f : X → R̄ is upper semi-continuous at x0 ∈ X if lim sup_{x→x0} f(x) ≤ f(x0). The function f is upper semi-continuous if it is upper semi-continuous at every point x ∈ X. (Equivalently, f is upper semi-continuous at x0 when lim sup_{n→∞} f(x_n) ≤ f(x0) for every sequence (x_n)_{n∈N} in X converging to x0; an analogous statement holds for lower semi-continuity.)
We also note that versions of Definitions A.0.4 and A.0.5 apply more generally to topo-
logical spaces, but the forms as given suffice for the work here.
It is clear that a function is continuous if and only if it is both lower semi-continuous
and upper semi-continuous. We now give an important result concerning upper or lower
semi-continuous functions acting on compact spaces.
Theorem A.0.6. [1, Theorem 2.40] (Extreme value theorem.) Let X be compact.

(i) A lower semi-continuous function f : X → R ∪ {∞} is bounded below and attains its infimum. (If f(x) = ∞ for all x ∈ X, then we say inf_{x∈X} f(x) = ∞, which is attained at all x ∈ X.)

(ii) An upper semi-continuous function f : X → R ∪ {−∞} is bounded above and attains its supremum.
Theorem A.0.7. For any function f : X × Y → R̄,

inf_{x∈X} sup_{y∈Y} f(x, y) ≥ sup_{y∈Y} inf_{x∈X} f(x, y).   (A.2)
Proof. We denote the left hand side of (A.2) by L and the right hand side by R. First we consider the case −∞ < R < ∞. For any ε > 0, observe that R − ε is not an upper bound for the set {inf_{x∈X} f(x, y) : y ∈ Y}, and hence there exists y_ε ∈ Y such that inf_{x∈X} f(x, y_ε) > R − ε. Then f(x, y_ε) > R − ε for all x ∈ X, and so for every x ∈ X we have sup_{y∈Y} f(x, y) > R − ε. Letting ε → 0 yields L ≥ R. If R = −∞, the result is trivial. Finally, if R = ∞, then for arbitrarily large λ > 0 there exists y_λ ∈ Y such that f(x, y_λ) ≥ λ for all x ∈ X. Then for all x ∈ X we have sup_{y∈Y} f(x, y) ≥ λ and hence L ≥ λ. Letting λ → ∞ gives L = ∞.
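When X and Y are finite, (A.2) reduces to the familiar matrix fact min_i max_j f(i, j) ≥ max_j min_i f(i, j), which can be tested exhaustively (a quick illustration only):

```python
import random

random.seed(0)
for _ in range(1000):
    # a random "payoff" function f on a 4-point X and a 5-point Y
    f = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(4)]
    inf_sup = min(max(row) for row in f)                          # inf_x sup_y
    sup_inf = max(min(f[i][j] for i in range(4)) for j in range(5))  # sup_y inf_x
    assert inf_sup >= sup_inf
print("inf sup >= sup inf held in 1000 random trials")
```

The gap is typically strict; equality under convexity assumptions is exactly what Theorem A.0.8 provides.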
Here we state and prove the form of the minimax theorem used in this work. The statement and proof are based on [38], but we generalise to functions with codomain R ∪ {∞}, as indeed was suggested in [38].

Theorem A.0.8. Let K be a convex, compact subset of a normed vector space X, and let C be a convex subset of a vector space Y. Let the function f : K × C → R ∪ {∞} satisfy:

(i) x ↦ f(x, y) is convex and lower semi-continuous for each y ∈ C, and

(ii) y ↦ f(x, y) is concave for each x ∈ K.

Then

inf_{x∈K} sup_{y∈C} f(x, y) = sup_{y∈C} inf_{x∈K} f(x, y).   (A.3)
Proof. We denote the left hand side of (A.3) by L and the right hand side by R. By Theorem A.0.7 it holds that L ≥ R.

We now want to show that L ≤ R, or equivalently that for all M ≥ R and for all ε > 0, we have L ≤ M + ε. From the right hand side of (A.3) we see that, for M ≥ R,

inf_{x∈K} f(x, y) ≤ M for all y ∈ C.   (A.4)

For y ∈ C, let K_{y,t} = {x ∈ K : f(x, y) ≤ t}. Then for all ε > 0,

K_{y,M+ε} ≠ ∅ for all y ∈ C.   (A.5)

Write f_y(x) = f(x, y). We have K_{y,t} = f_y^{−1}({s : s ≤ t}), so K_{y,t} is the preimage of the half line (−∞, t] under a lower semi-continuous function and is thus closed (for this see, for instance, [26, Theorem 7.1.1(iii)]). But K is compact and K_{y,t} ⊆ K, giving that K_{y,t} is compact.
It is also true that K_{y,t} is convex. To see this, let v, w ∈ K_{y,t} and γ ∈ [0, 1]. By the convexity of f_y we have

f_y(γv + (1 − γ)w) ≤ γ f_y(v) + (1 − γ) f_y(w) ≤ t,

giving that γv + (1 − γ)w ∈ K_{y,t}.
We want to show for all M ≥ R and ε > 0 that

∩_{y∈C} K_{y,M+ε} ≠ ∅,   (A.6)

for then there will exist x0 ∈ K such that f(x0, y) ≤ M + ε for all y ∈ C, which yields sup_{y∈C} f(x0, y) ≤ M + ε. This gives L ≤ M + ε, and thus L ≤ R as required.

With M ≥ R and ε > 0 fixed, we replace f by f − (M + ε). Then (A.4) and (A.5) give that for all y ∈ C we have

inf_{x∈K} f(x, y) ≤ −ε and K_{y,0} ≠ ∅.   (A.7)

So it is now sufficient (see (A.6)) to show that

∩_{y∈C} K_{y,0} ≠ ∅.   (A.8)

Recall that a collection of sets is said to have the finite intersection property if all finite intersections of its members are non-empty. It is a standard result [43, Theorem 2.36] that if a space X is compact, then every collection of closed sets in X having the finite intersection property has non-empty intersection. Since K is compact and each K_{y,t} is closed, (A.8) will follow if we can show that

∩_{y∈C0} K_{y,0} ≠ ∅ for all finite C0 ⊆ C.   (A.9)

We begin by showing that

K_{y1,0} ∩ K_{y2,0} ≠ ∅ for all y1, y2 ∈ C,   (A.10)

and will then proceed by induction. For i = 1, 2 we write K_i = K_{y_i,0} and f_i(x) = f(x, y_i).
Suppose towards a contradiction that K1 ∩ K2 = ∅. We show that this implies there exists α ∈ [0, 1] such that

(1 − α)f1(x) + αf2(x) ≥ 0 for all x ∈ K,   (A.11)

whence the concavity of the function y ↦ f(x, y) gives f(x, (1 − α)y1 + αy2) ≥ 0 for all x ∈ K. Since (1 − α)y1 + αy2 ∈ C by the convexity of C, this contradicts (A.7).

Now (A.11) holds trivially for all α ∈ [0, 1] when x ∉ K1 ∪ K2, for then f1(x), f2(x) > 0. Supposing K1 ∩ K2 = ∅, for all x1 ∈ K1 we have f1(x1) ≤ 0 and f2(x1) > 0. Similarly, for all x2 ∈ K2 we have f1(x2) > 0 and f2(x2) ≤ 0. For (A.11) to hold for all x1 ∈ K1, we require α(f2(x1) − f1(x1)) ≥ −f1(x1) for all x1 ∈ K1, that is, we require

α ≥ sup{ −f1(x1) / (f2(x1) − f1(x1)) : x1 ∈ K1 }.   (A.12)

We note this supremum is non-negative. Similarly, for (A.11) to hold for all x2 ∈ K2, we require α(f1(x2) − f2(x2)) ≤ f1(x2) for all x2 ∈ K2, that is, we require

α ≤ inf{ f1(x2) / (f1(x2) − f2(x2)) : x2 ∈ K2 }.   (A.13)

We note this infimum is less than or equal to 1. We can thus find α ∈ [0, 1] satisfying (A.12) and (A.13) if and only if for all x1 ∈ K1 and all x2 ∈ K2 we have

−f1(x1) / (f2(x1) − f1(x1)) ≤ f1(x2) / (f1(x2) − f2(x2)).   (A.14)

Observe that if f2(x1) = ∞ or f1(x2) = ∞, then (A.14) holds immediately. Otherwise we need

f1(x1)f2(x2) ≤ f1(x2)f2(x1)   (A.15)

for all x1 ∈ K1 and all x2 ∈ K2. This is trivial if f1(x1) = 0 or f2(x2) = 0.

Otherwise let θ = −f1(x1) / (f1(x2) − f1(x1)). We have 0 < θ < 1 and

(1 − θ)f1(x1) + θf1(x2) = 0,   (A.16)

giving

θ / (1 − θ) = −f1(x1) / f1(x2).   (A.17)

By the convexity of f1, (A.16) gives that f1((1 − θ)x1 + θx2) ≤ 0, and so (1 − θ)x1 + θx2 ∈ K1. We also have 0 < f2((1 − θ)x1 + θx2) ≤ (1 − θ)f2(x1) + θf2(x2), where the first inequality follows from the assumption K1 ∩ K2 = ∅ (that is, (1 − θ)x1 + θx2 ∉ K2) and the second from the convexity of f2. This leads to

θ / (1 − θ) < f2(x1) / (−f2(x2)).   (A.18)

Then (A.17) and (A.18) lead to f1(x1)f2(x2) < f1(x2)f2(x1), so (A.15) holds for all x1 ∈ K1 and x2 ∈ K2, whence (A.11) holds, leading to the contradiction described. Thus K1 ∩ K2 ≠ ∅.
We now show that ∩_{i≤m} K_i ≠ ∅ for m ∈ N, where K_i = K_{y_i,0}. Let K′_i = K_{y_i,0} ∩ K_{y_1,0} for i = 2, . . . , m, and note that K′_i ≠ ∅ by the previous argument. Now take the restriction of f to K_{y_1,0} × C. Since K_{y_1,0} is compact and convex, we can apply the entire previous argument with K_{y_1,0} in place of K to obtain

K′_2 ∩ K′_3 = K_{y_1,0} ∩ K_{y_2,0} ∩ K_{y_3,0} ≠ ∅.

After m − 1 repetitions we reach ∩_{i≤m} K_i ≠ ∅, establishing (A.9) as required to complete the proof.
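As a concrete instance of Theorem A.0.8 (an illustrative example of our own choosing, not taken from [38]): let K = C = [−1, 1] and f(x, y) = x² + xy, which is convex and continuous in x and linear, hence concave, in y. Both sides of (A.3) then equal 0, attained at x = y = 0, as a grid approximation confirms:

```python
# f(x, y) = x^2 + x*y on K = C = [-1, 1] satisfies the hypotheses of
# Theorem A.0.8, so equality (A.3) holds; both sides equal 0.
N = 201
grid = [-1 + 2 * k / (N - 1) for k in range(N)]  # includes 0 exactly

def f(x, y):
    return x * x + x * y

inf_sup = min(max(f(x, y) for y in grid) for x in grid)
sup_inf = max(min(f(x, y) for x in grid) for y in grid)
print(inf_sup, sup_inf)  # 0.0 0.0
```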
Appendix B
Linear algebra
The set of m×n matrices with entries in S is denoted by Mm,n(S). We write Mm,n = Mm,n(C)
and Md = Md,d. By A = (aij) ∈ Mm,n we mean the element of Mm,n whose (i, j)-entry is
aij ∈ C for i ∈ [m] and j ∈ [n]. The trace of a matrix M = (mij) ∈ Md is given by Tr M = ∑_{i=1}^{d} m_{ii}. For A ∈ M_{n,m} and B ∈ M_{m,n}, the important cyclicity property Tr(AB) = Tr(BA) holds. For A = (aij) ∈ M_{m,n}, the matrix B = (bij) ∈ M_{n,m} where bij = aji is called the transpose of A and is denoted by B = Aᵗ. Similarly, the matrix C = (cij) ∈ M_{n,m} where cij = āji is called the Hermitian transpose of A and is denoted by C = A*. For matrices A ∈ M_{m,n} and B ∈ M_{n,p}, it holds that (AB)* = B*A*. A self-adjoint or Hermitian matrix A satisfies A = A*. The identity matrix in Md will be denoted by Id, or often just I where context allows. In Md the zero matrix will be denoted by 0 and the all-ones d × d matrix by Jd or just J. A unitary matrix U satisfies UU* = U*U = I. If {u1, . . . , ud} and {v1, . . . , vd} are orthonormal bases of C^d, then there exists a unitary matrix U such that ui = U vi for i = 1, . . . , d. If {v1, . . . , vd} is an orthonormal basis of C^d, then Id = ∑_{i=1}^{d} v_i v_i*.
We take inner products to be linear in the first argument and conjugate linear in the second. Specifically, for u, v ∈ C^d we take ⟨u, v⟩ = v*u = ∑_{i=1}^{d} u_i v̄_i = Tr(uv*), and the associated norm is given by ‖u‖ = √⟨u, u⟩. For A ∈ Md and u, v ∈ C^d it holds that ⟨u, Av⟩ = ⟨A*u, v⟩. A complex vector space with an inner product which is complete with respect to the norm induced by the inner product is called a Hilbert space. We will only consider finite dimensional Hilbert spaces. Every Hilbert space has an orthonormal basis, and for a Hilbert space H of dimension d there exists an isometric isomorphism from H onto C^d which preserves the inner product; in this sense C^d is essentially the only Hilbert space of dimension d. If H is a Hilbert space with subspace W ⊆ H, then we denote the dimension of W by dim(W), and the orthogonal complement of W is given by W⊥ = {v ∈ H : ⟨v, w⟩ = 0 for all w ∈ W}. It is well known that a subspace W of a finite dimensional Hilbert space H satisfies W⊥⊥ = W and dim(W) + dim(W⊥) = dim(H). The space Md is a Hilbert space, and for M = (mij), N = (nij) in Md we will use the Hilbert–Schmidt inner product ⟨M, N⟩ = ∑_{i,j=1}^{d} m_ij n̄_ij = Tr(MN*). Indeed, for any n, m ∈ N and P, Q ∈ M_{n,m} we define ⟨P, Q⟩ = Tr(PQ*). The associated Hilbert–Schmidt norm will be denoted ‖M‖₂ and is given by ‖M‖₂ = √⟨M, M⟩. We write ‖M‖ for the operator norm, given by ‖M‖ = sup{‖Mv‖ : v ∈ C^d, ‖v‖ = 1}. As Md is finite dimensional, standard theory gives that these norms are equivalent, in the sense that there exist positive reals c and C such that c‖M‖₂ ≤ ‖M‖ ≤ C‖M‖₂ for all M ∈ Md.

It is straightforward to see that the following hold for a, b, u, v ∈ C^d and A, B ∈ Md:

⟨Au, Bv⟩ = (Bv)*(Au) = ⟨B*Au, v⟩ = ⟨B*A, vu*⟩,
⟨au*, bv*⟩ = Tr(au*vb*) = ⟨v, u⟩⟨a, b⟩,
⟨uv*, A⟩ = ⟨u, Av⟩.
Definition B.0.1. Matrix M ∈Md is positive semi-definite if M = M∗ and 〈v,Mv〉 ≥ 0 for
all v ∈ Cd, and in this case we write M ≥ 0. This is equivalent to the condition that M is
Hermitian and has non-negative eigenvalues. We say a Hermitian matrix M ∈Md is positive
definite or strictly positive, and we write M > 0, if 〈v,Mv〉 > 0 for all non-zero v ∈ Cd, or
equivalently, if M has strictly positive eigenvalues.
If S is a subset of a vector space V, then we recall that the span of S, denoted span(S), is the set of all finite linear combinations of elements of S. Let M_d^+ and M_d^{++} denote the set of all positive semi-definite d × d matrices and the set of all strictly positive d × d matrices respectively. We write A ≥ B to mean A − B ≥ 0; similarly, A > B means A − B > 0. Let M_d^h denote the set of Hermitian d × d matrices. For M ∈ Md we define the range of M by

ran(M) = {v ∈ C^d : there exists u ∈ C^d such that Mu = v},

and the kernel of M by

ker(M) = {v ∈ C^d : Mv = 0}.
If M ∈ Md and a non-zero v ∈ C^d satisfies Mv = λv, then v is an eigenvector of M with eigenvalue λ. If M ∈ M_d^h, then M can be expressed as M = ∑_{i=1}^{d} λ_i v_i v_i*, where {v1, . . . , vd} is an orthonormal basis of C^d and v_i is an eigenvector of M with eigenvalue λ_i ∈ R, i = 1, . . . , d. It is then clear that Tr M = ∑_{i=1}^{d} λ_i. It may be that λ_i = 0 for some i ∈ [d]. If N ∈ M_d^h can be expressed as N = ∑_{i=1}^{k} λ_i v_i v_i*, where {v1, . . . , vk} is an orthonormal set and each λ_i ≠ 0, then the rank of N is given by rank(N) = k. Note that the range of N is then ran(N) = span{v1, . . . , vk}, and so rank(N) = dim(ran(N)).

The largest eigenvalue of M ∈ M_d^+ is given by

‖M‖ = max{⟨v, Mv⟩ : v ∈ C^d, ‖v‖ = 1} = max{⟨M, ρ⟩ : ρ ∈ M_d^+, Tr ρ = 1}.

Also note that for M = ∑_{i=1}^{d} λ_i v_i v_i* with {v1, . . . , vd} orthonormal, we have ‖M‖₂ = √(∑_{i=1}^{d} λ_i²) ≥ ‖M‖. A matrix M ∈ Md is called normal if MM* = M*M; equivalently, M is normal if and only if it can be written as M = ∑_{i=1}^{d} m_i v_i v_i*, where {v1, . . . , vd} is an orthonormal basis of C^d and m1, . . . , md ∈ C are the eigenvalues of M ([19, Theorem 2.5.3]). Matrices A, B ∈ Md commute if AB = BA, and a set S of matrices is said to be commutative if AB = BA for all A, B ∈ S. If A, B ∈ Md can be expressed as A = ∑_{i=1}^{d} a_i v_i v_i* and B = ∑_{i=1}^{d} b_i v_i v_i*, where {v1, . . . , vd} is an orthonormal basis of C^d and a_i, b_i ∈ C, then there is a unitary matrix U such that UAU* and UBU* are diagonal matrices, and we say that A and B are simultaneously unitarily diagonalisable. It is an important but standard result ([19, Theorem 2.5.5]) that a set of normal matrices is commutative if and only if the matrices in the set are simultaneously unitarily diagonalisable.
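The last equivalence can be illustrated with numpy (a numerical sketch, not the proof in [19]): build two matrices with a common orthonormal eigenbasis and check that they are normal, commute, and are diagonalised by the same unitary.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
# a random unitary: QR of a random complex matrix gives Q whose
# columns v_1, ..., v_d form an orthonormal basis of C^d
Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
a = rng.normal(size=d) + 1j * rng.normal(size=d)   # eigenvalues of A
b = rng.normal(size=d) + 1j * rng.normal(size=d)   # eigenvalues of B
A = Q @ np.diag(a) @ Q.conj().T                    # A = sum_i a_i v_i v_i*
B = Q @ np.diag(b) @ Q.conj().T

assert np.allclose(A @ A.conj().T, A.conj().T @ A)  # A is normal
assert np.allclose(A @ B, B @ A)                    # A and B commute
U = Q.conj().T                                      # U A U*, U B U* diagonal
for M in (A, B):
    D = U @ M @ U.conj().T
    assert np.allclose(D, np.diag(np.diag(D)))
print("commuting normal matrices, simultaneously unitarily diagonalised")
```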
Lemma B.0.2. The following are standard results in linear algebra.
For A,B,C ∈Md:
(i) A ≥ 0 ⇒ Tr A ≥ 0.

(ii) A, B ≥ 0 ⇒ A + B ≥ 0, and λA ≥ 0 for λ ∈ R₊.

(iii) A ≤ B and B ≤ C implies A ≤ C.

(iv) A, B ≥ 0 ⇒ Tr(AB) ≥ 0 (but note that AB ∉ M_d^+ in general).

(v) If 0 ≤ A ≤ C and B ≥ 0, then Tr(AB) ≤ Tr(CB).

(vi) M_d^{++} ⊂ M_d^+ ⊂ M_d^h.

(vii) If A ∈ M_d^+, there exists A^{1/2} ∈ M_d^+ such that A = (A^{1/2})².

(viii) If V is an orthonormal basis of C^d, then Tr A = ∑_{v∈V} ⟨Av, v⟩.

(ix) For A ∈ M_d^+ and k ∈ R₊ it holds that A ≤ k Id ⇐⇒ ‖A‖ ≤ k.

(x) If A, B ∈ M_d^+, then Tr(AB) = 0 ⇐⇒ AB = 0.
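Several items of the lemma are easy to sanity-check numerically; the sketch below (illustrative only) tests (iv) on random positive semi-definite matrices of the form XX*, and (x) on a pair of projections with orthogonal ranges.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
for _ in range(200):
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Y = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    A, B = X @ X.conj().T, Y @ Y.conj().T          # A, B >= 0
    t = np.trace(A @ B)
    assert t.real >= -1e-9 and abs(t.imag) < 1e-8  # (iv): Tr(AB) >= 0

# (x): for A, B >= 0, Tr(AB) = 0 if and only if AB = 0
P = np.diag([1.0, 1.0, 0.0, 0.0, 0.0])
R = np.diag([0.0, 0.0, 1.0, 1.0, 0.0])
assert np.isclose(np.trace(P @ R), 0.0) and np.allclose(P @ R, 0.0)
print("Lemma B.0.2 (iv) and (x) verified on examples")
```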
If v1, . . . , vn ∈ C^d are orthonormal vectors, then P = ∑_{i=1}^{n} v_i v_i* is a rank-n orthogonal projection and satisfies P² = P = P* ≥ 0 and Tr P = n. (In this thesis, the term projection will be used to mean orthogonal projection.) Choosing additional vectors v_{n+1}, . . . , v_d such that {v1, . . . , vd} is an orthonormal basis of C^d, it is easy to see that if v = ∑_{i=1}^{d} α_i v_i with α_i ∈ C, then Pv = ∑_{i=1}^{n} α_i v_i; that is, Pv is the projection of v onto span{v1, . . . , vn} = ran(P).

If projections P1, . . . , Pk ∈ Md satisfy ∑_{i=1}^{k} P_i = Id, then Tr(P_iP_j) = 0, and hence P_iP_j = 0, for distinct i, j ∈ [k], yielding

ran(P_i) ⊥ ran(P_j) for distinct i, j ∈ [k].   (B.1)

If {v_i : i ∈ [d]} is an orthonormal basis of C^d, then {v_i v_j* : i, j ∈ [d]} is an orthonormal basis of Md. For M = ∑_{i,j=1}^{d} m_ij v_i v_j* ∈ Md, we have m_ij = ⟨M v_j, v_i⟩. Choosing v_i = e_i gives the canonical basis {E_ij : i, j ∈ [d]} of Md, where E_ij denotes the matrix unit e_i e_j*.

We use the Kronecker delta δ_ij for i, j ∈ N, given by δ_ij = 1 if i = j and δ_ij = 0 if i ≠ j.
An important operation we must consider is the tensor product.
Definition B.0.3. (i) The tensor product of vectors u = (u_i)_{i∈[m]} ∈ C^m and v = (v_i)_{i∈[n]} ∈ C^n is given by

u ⊗ v = (u_i v)_{i∈[m]} ∈ M_{m,1}(C^n) ≅ C^{nm},

and for a, c ∈ C^m and b, d ∈ C^n it holds that ⟨a ⊗ b, c ⊗ d⟩ = ⟨a, c⟩⟨b, d⟩.

(ii) The tensor product of matrices A = (a_ij)_{i,j∈[m]} ∈ M_m and B = (b_ij)_{i,j∈[n]} ∈ M_n is given by

A ⊗ B = (a_ij B)_{i,j∈[m]} ∈ M_m(M_n) ≅ M_{nm},   (B.2)

and

Tr(A ⊗ B) = (Tr A)(Tr B) and ‖A ⊗ B‖ = ‖A‖ ‖B‖.

(Though we will normally be working with square matrices, (B.2) extends trivially to define tensor products of non-square matrices.)

(iii) The tensor product of vector spaces U and V is the vector space

U ⊗ V = span{u ⊗ v : u ∈ U, v ∈ V}.
It is useful to note the following results for matrices A,B,C,D and k ∈ C.
1. Bilinearity: If B + C exists then
A⊗ (B + C) = A⊗B +A⊗ C, (B + C)⊗A = B ⊗A+ C ⊗A,
and
(kA)⊗B = A⊗ (kB) = k(A⊗B).
2. Associativity:
(A⊗B)⊗ C = A⊗ (B ⊗ C).
3. Mixed product property: If the matrix products AC and BD exist, then
(A⊗B)(C ⊗D) = (AC)⊗ (BD).
4. Adjoint property:
(A⊗B)∗ = A∗ ⊗B∗.
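numpy's kron implements exactly the block form (B.2), so the properties above can be checked directly (an illustration; numpy.linalg.norm with ord=2 is the operator norm):

```python
import numpy as np

rng = np.random.default_rng(2)
A, C = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
B, D = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))

# mixed product property: (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
# adjoint property: (A ⊗ B)* = A* ⊗ B*
assert np.allclose(np.kron(A, B).conj().T, np.kron(A.conj().T, B.conj().T))
# multiplicativity of the trace and of the operator norm
assert np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))
assert np.isclose(np.linalg.norm(np.kron(A, B), 2),
                  np.linalg.norm(A, 2) * np.linalg.norm(B, 2))
print("tensor product identities verified")
```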
For A, B ⊆ V, where V is a vector space, we let A + B = {a + b : a ∈ A, b ∈ B}. For subspaces A, B ⊆ Md we have

(A + B)⊥ = A⊥ ∩ B⊥.   (B.3)

For subspaces S_i ⊆ M_{d_i}, i = 1, 2, it is clear that

(S1 ⊗ S2)⊥ = S1⊥ ⊗ M_{d_2} + M_{d_1} ⊗ S2⊥.   (B.4)
Appendix C
Source coding for the ergodic
source
Here we recall two related theorems for the ergodic source and show that they lead to (5.26).
We work in the dynamical system (Ω = X^Z, F, P, T). We let the probability distribution p_n on X^n be given by

p_n(i0, . . . , i_{n−1}) = P({ω : ω0 = i0, . . . , ω_{n−1} = i_{n−1}}).
Theorem C.0.1 (Shannon–McMillan–Breiman). [4, Theorem 13.1] For ω ∈ Ω we write ω = (ω_i)_{i∈Z}. If T is an ergodic shift, then

lim_{n→∞} −(1/n) log p_n(ω0, . . . , ω_{n−1}) = h(T) almost everywhere on Ω.
Theorem C.0.2 (Asymptotic equipartition property). [4, Theorem 13.2] Suppose that T is an ergodic shift and let h(T) = h. Then for any ε > 0 there exists n0(ε) ∈ N such that for all integers n ≥ n0(ε) there is a set B(n, ε) ⊆ X^n satisfying p_n(B(n, ε)) ≥ 1 − ε, and such that

2^{−n(h+ε)} < p_n(u) < 2^{−n(h−ε)}

for all u ∈ B(n, ε). Indeed, this can be achieved by setting

B(n, ε) = { u ∈ X^n : | −(1/n) log p_n(u) − h | < ε }.   (C.1)
From Theorems C.0.1 and C.0.2 we now prove (5.26), restated below.
Theorem C.0.3. If T is an ergodic shift, then for all ε ∈ (0, 1),

lim_{n→∞} (1/n) log ( min{|A| : A ⊆ X^n, p_n(A) ≥ 1 − ε} ) = h(T).
Proof. Write h(T) = h. Choose δ ∈ (0, ε) and form the set B(n, δ) as in (C.1), so that for sufficiently large n and for all u ∈ B(n, δ) we have p_n(u) > 2^{−n(h+δ)} and

p_n(B(n, δ)) > 1 − δ > 1 − ε.

This gives |B(n, δ)| < 2^{n(h+δ)}, and it follows that

lim sup_{n→∞} (1/n) log ( min{|A| : A ⊆ X^n, p_n(A) ≥ 1 − ε} ) ≤ lim sup_{n→∞} (1/n) log |B(n, δ)| ≤ h + δ.   (C.2)
Now take A ⊆ X^n satisfying p_n(A) ≥ 1 − ε. Then

p_n(A ∩ B(n, δ)) = 1 − p_n(A^c ∪ B(n, δ)^c) ≥ 1 − p_n(A^c) − p_n(B(n, δ)^c).   (C.3)

It is a standard result for finite measure spaces that convergence almost everywhere implies convergence in measure [44, p. 74], and so Theorem C.0.1 implies that there exists N ∈ N such that for all n > N,

P({ω : | −(1/n) log p_n(ω0, . . . , ω_{n−1}) − h | ≥ δ}) < δ,

that is, p_n(B(n, δ)^c) < δ. Then for n > N, (C.3) gives that

p_n(A ∩ B(n, δ)) > 1 − ε − δ,   (C.4)

and so

2^{n(h−δ)} p_n(A ∩ B(n, δ)) > 2^{n(h−δ)}(1 − ε − δ).   (C.5)
Denoting the left hand side of (C.5) by L, we have

L = 2^{n(h−δ)} ∑_{u ∈ A∩B(n,δ)} p_n(u),

and since p_n(u) < 2^{−n(h−δ)} for all u ∈ B(n, δ) for sufficiently large n, it follows that there exists n0 ∈ N such that L < |A ∩ B(n, δ)| ≤ |A| for all n ≥ n0. Returning to (C.5), we conclude that, for n ≥ n0, any such set A satisfies |A| > 2^{n(h−δ)}(1 − ε − δ). This gives that

lim inf_{n→∞} (1/n) log ( min{|A| : A ⊆ X^n, p_n(A) ≥ 1 − ε} ) ≥ lim inf_{n→∞} (1/n) log ( 2^{n(h−δ)}(1 − ε − δ) ) = h − δ.   (C.6)
Letting δ → 0 in (C.6) and (C.2) gives the result.
Bibliography
[1] Charalambos D. Aliprantis and Kim C. Border. Infinite-dimensional analysis. Springer-
Verlag, Berlin, second edition, 1999. A hitchhiker’s guide.
[2] William B. Arveson. Subalgebras of C∗-algebras. Acta Math., 123:141–224, 1969.
[3] Koenraad M. R. Audenaert and Jens Eisert. Continuity bounds on the quantum relative
entropy. J. Math. Phys., 46(10):102104, 21, 2005.
[4] Patrick Billingsley. Ergodic theory and information. John Wiley & Sons, Inc., New
York-London-Sydney, 1965.
[5] Gareth Boreland. A lower bound on graph entropy. Math. Proc. R. Ir. Acad., 118A(1):9–
20, 2018.
[6] Gareth Boreland, Ivan G. Todorov, and Andreas Winter. Sandwich theorems and ca-
pacity bounds for non-commutative graphs. arXiv:1907.11504, 2019.
[7] Jean Cardinal, Samuel Fiorini, and Gwenael Joret. Minimum entropy coloring. In
Algorithms and computation, volume 3827 of Lecture Notes in Comput. Sci., pages 819–
828. Springer, Berlin, 2005.
[8] Man Duen Choi. Completely positive linear maps on complex matrices. Linear Algebra
and Appl., 10:285–290, 1975.
[9] V. Chvatal. On certain polytopes associated with graphs. J. Comb. Theory B, 18:138–
154, 1975.
[10] Thomas M. Cover and Joy A. Thomas. Elements of information theory. Wiley-
Interscience [John Wiley & Sons], Hoboken, NJ, second edition, 2006.
[11] I. Csiszar, J. Korner, L. Lovasz, K. Marton, and G. Simonyi. Entropy splitting for
antiblocking corners and perfect graphs. Combinatorica, 10(1):27–40, 1990.
[12] R. Duan. Super-activation of zero error capacity of noisy quantum channels.
arXiv:0906.2527, 2009.
[13] Runyao Duan, Simone Severini, and Andreas Winter. Zero-error communication via
quantum channels, noncommutative graphs, and a quantum Lovasz number. IEEE
Trans. Inform. Theory, 59(2):1164–1174, 2013.
[14] Dennis Geller and Saul Stahl. The chromatic number and other functions of the lexico-
graphic product. J. Combinatorial Theory Ser. B, 19(1):87–95, 1975.
[15] Chris Godsil and Gordon Royle. Algebraic graph theory, volume 207 of Graduate Texts
in Mathematics. Springer-Verlag, New York, 2001.
[16] Robert M. Gray. Entropy and information theory. Springer, New York, second edition,
2011.
[17] M. Grötschel, L. Lovász, and A. Schrijver. Relaxations of vertex packing. J. Combin.
Theory Ser. B, 40(3):330–343, 1986.
[18] Martin Grötschel, László Lovász, and Alexander Schrijver. Geometric algorithms and
combinatorial optimization, volume 2 of Algorithms and Combinatorics: Study and Re-
search Texts. Springer-Verlag, Berlin, 1988.
[19] Roger A. Horn and Charles R. Johnson. Matrix analysis. Cambridge University Press,
Cambridge, second edition, 2013.
[20] Yuichiro Kakihara. Abstract methods in information theory, volume 10 of Series on
Multivariate Analysis. World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, second
edition, 2016.
[21] S. Kim and A. Mehta. Chromatic numbers and a Lovász type inequality for non-
commutative graphs. arXiv:1709.05595v1, 2017.
[22] Donald E. Knuth. The sandwich theorem. Electron. J. Combin., 1:Article 1, approx.
48 pp., 1994.
[23] J. Körner. Coding of an information source having ambiguous alphabet and the entropy
of graphs. In Transactions of the Sixth Prague Conference on Information Theory
(Prague, 1971), pages 411–425, 1973.
[24] János Körner. Fredman–Komlós bounds and information theory. SIAM J. Algebraic
Discrete Methods, 7(4):560–570, 1986.
[25] János Körner, Gábor Simonyi, and Zsolt Tuza. Perfect couples of graphs. Combinatorica,
12(2):179–192, 1992.
[26] Andrew J. Kurdila and Michael Zabarankin. Convex functional analysis. Systems &
Control: Foundations & Applications. Birkhäuser Verlag, Basel, 2005.
[27] R. Levene, V. Paulsen, and I. Todorov. Complexity and capacity bounds for quantum
channels. arXiv:1710.06456v1, 2017.
[28] László Lovász. On the Shannon capacity of a graph. IEEE Trans. Inform. Theory,
25(1):1–7, 1979.
[29] Katalin Marton. On the Shannon capacity of probabilistic graphs. J. Combin. Theory
Ser. B, 57(2):183–195, 1993.
[30] Robert J. McEliece and Edward C. Posner. Hide and seek, data storage, and entropy.
Ann. Math. Statist., 42:1706–1716, 1971.
[31] Robert E. Megginson. An introduction to Banach space theory, volume 183 of Graduate
Texts in Mathematics. Springer-Verlag, New York, 1998.
[32] Michael A. Nielsen and Isaac L. Chuang. Quantum computation and quantum informa-
tion. Cambridge University Press, Cambridge, 2000.
[33] Donald Ornstein. Bernoulli shifts with the same entropy are isomorphic. Advances in
Math., 4:337–352, 1970.
[34] V. Paulsen. Matrix analysis, 2015. Lecture notes, University of Waterloo, available at
http://www.math.uwaterloo.ca/~vpaulsen/matrixanal2-1.pdf, accessed 10-4-2019.
[35] V. Paulsen. Entanglement and non-locality, 2016. Lecture notes, University of Waterloo,
available at http://www.math.uwaterloo.ca/~vpaulsen/EntanglementAndNonlocality_LectureNotes_7.pdf,
accessed 17-8-2018.
[36] Vern Paulsen. Completely bounded maps and operator algebras, volume 78 of Cambridge
Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2002.
[37] Sven Polak and Alexander Schrijver. New lower bound on the Shannon capacity of C7
from circular graphs. arXiv:1808.07438, 2018.
[38] D. Pollard. Minimax theorem, 2003. Available at http://www.stat.yale.edu/
~pollard/Courses/602.spring07/MmaxThm.pdf, accessed 17-8-2018.
[39] S. Rezaei. Entropy and graphs. Master of Mathematics thesis, University of Waterloo,
arXiv:1311.5632, 2013.
[40] S. Rezaei and E. Chiniforooshan. Symmetric graphs with respect to graph entropy.
Electron. J. Combin., 24(1), 2017.
[41] Joseph V. Romanovsky. A simple proof of the Birkhoff-von Neumann theorem on bis-
tochastic matrices. In A tribute to Ilya Bakelman (College Station, TX, 1993), volume 3
of Discourses Math. Appl., pages 51–53. Texas A & M Univ., College Station, TX, 1994.
[42] Walter Rudin. Functional analysis. McGraw-Hill Book Co., New York-Düsseldorf-
Johannesburg, 1973. McGraw-Hill Series in Higher Mathematics.
[43] Walter Rudin. Principles of mathematical analysis. McGraw-Hill Book Co., New York-
Auckland-Düsseldorf, third edition, 1976. International Series in Pure and Applied Math-
ematics.
[44] Walter Rudin. Real and complex analysis. McGraw-Hill Book Co., New York, third
edition, 1987.
[45] Mary Beth Ruskai. Inequalities for quantum entropy: a review with conditions for
equality. J. Math. Phys., 43(9):4358–4375, 2002. Quantum information theory.
[46] Edward R. Scheinerman and Daniel H. Ullman. Fractional graph theory. Dover Publi-
cations, Inc., Mineola, NY, 2011. A rational approach to the theory of graphs, With a
foreword by Claude Berge, Reprint of the 1997 original.
[47] C. E. Shannon. A mathematical theory of communication. Bell System Tech. J., 27:379–
423, 623–656, 1948.
[48] Claude E. Shannon. The zero error capacity of a noisy channel. IRE Trans. Inform.
Theory, IT-2(September):8–19, 1956.
[49] Gábor Simonyi. Graph entropy: a survey. In Combinatorial optimization (New
Brunswick, NJ, 1992–1993), volume 20 of DIMACS Ser. Discrete Math. Theoret. Com-
put. Sci., pages 399–441. Amer. Math. Soc., Providence, RI, 1995.
[50] Gábor Simonyi. Perfect graphs and graph entropy. An updated survey. In Perfect graphs,
Wiley-Intersci. Ser. Discrete Math. Optim., pages 293–328. Wiley, Chichester, 2001.
[51] Dan Stahlke. Quantum zero-error source-channel coding and non-commutative graph
theory. IEEE Trans. Inform. Theory, 62(1):554–577, 2016.
[52] M. Tribus and E. C. McIrvine. Energy and information. Scientific American, 224, 1971.
[53] Péter Vrana. Probabilistic refinement of the asymptotic spectrum of graphs.
arXiv:1903.01857, 2019.
[54] John Watrous. The theory of quantum information. Cambridge University Press, 2018.
[55] Nik Weaver. A “quantum” Ramsey theorem for operator systems. Proc. Amer. Math.
Soc., 145(11):4595–4605, 2017.
[56] Alfred Wehrl. General properties of entropy. Rev. Modern Phys., 50(2):221–260, 1978.
[57] Mark M. Wilde. Quantum information theory. Cambridge University Press, Cambridge,
second edition, 2017.
[58] H. S. Witsenhausen. The zero-error side information problem and chromatic numbers.
IEEE Trans. Inform. Theory, IT-22(5):592–593, 1976.
Index
σ-algebra, 175
abelian projection, 97
abelian projection convex corner, 98
adjoint channel, 108
anti-blocker, 10
atom, 176
automorphism, 14
Bernoulli shift, 179, 180, 198, 201
bipartite graph, 23
c.p.t.p. map, 87
channel, 82
Choi matrix, 88
chromatic number, 14, 96, 138
clique, 13, 97
clique covering number, 14, 127
clique number, 13, 126
clique projection, 97
clique projection convex corner, 98
co-normal product, 14, 139
co-tensor product, 140
complement, 13
complete bipartite graph, 27
complete graph, 14
completely positive, 87
concave function, 207
conditional entropy, 177
conditional graph entropy, 189
confusability graph, 82
convex corner, 4, 35
convex function, 208
convex hull, 207
convex set, 207
cycle, 23
cylinder, 175
density matrix, 58
diagonal convex corner, 35
distinguishability, 15
distinguishability graph, 15, 187
distinguishability preserving transformation, 187
dynamical system, 176
eigenvalue, 214
eigenvector, 214
empty graph, 14
entropy, 2, 177
entropy over a convex corner, 5
extreme point, 207
Fekete’s Lemma, 83
finite subalgebra, 176
fractional chromatic number, 15, 128
fractional clique covering number, 128, 129
fractional clique number, 18, 126
fractional full covering number, 128
fractional vertex packing polytope, 30
full covering number, 127
full number, 126
full projection, 97
full projection convex corner, 98
full set, 97
graph colouring, 14
graph entropy, 13, 15, 187, 195
handled orthonormal labelling (h.o.n.l.), 30
handled projective orthogonal labelling (h.p.o.l.), 115
hereditary, 4
hereditary cover, 41
Hermitian matrix, 213
Hilbert space, 58, 90, 213
Hilbert–Schmidt inner product, 33, 214
Hilbert–Schmidt norm, 33, 214
homomorphism, 14, 134
i.i.d., 2
identity channel, 82, 159
independence number, 13
independent projection, 138
independent set, 13, 97, 138
induced subgraph, 13
inner product, 6, 33, 214
isomorphism, 14, 180, 199
kernel, 14
Kolmogorov Existence Theorem, 176
Kolmogorov–Sinai entropy, 177, 180, 195
Kolmogorov–Sinai Theorem, 178, 196
Kraus operators, 88
Kraus representation, 88
Kronecker delta, 216
Körner, 15
lexicographic product, 182
logarithm, 1
logarithm function, 1, 59
Lovász, 84
Lovász corner, 109
Lovász number, 31, 126, 146
lower semi-continuous function, 208
Markov shift, 179, 198
measurable function, 175
measurable space, 175
measure preserving transformation, 175
measurement system, 86
memory, 174
mixed state, 58, 59
noiseless channel, 2
noisy channel, 82
non-commutative graph, 90
non-commutative graph entropy, 142
norm, 6, 33, 214
normal matrix, 215
null set, 175
one-shot zero-error capacity, 83
operator anti-system, 137
operator norm, 33, 214
operator system, 90
orthogonal complement, 213
orthonormal labelling (o.n.l.), 30
packing number, 85
perfect graph, 23, 108
perfect matching, 27
positive definite matrix, 214
positive map, 87
positive semi-definite matrix, 214
probabilistic graph, 14
probability measure, 175
probability space, 175
projection, 216
projective orthogonal labelling (p.o.l.), 115
pure state, 58, 59
quantum channel, 87
quantum mechanics, 58, 86, 87
quantum relative entropy, 61
quantum system, 58
reflexivity, 37
relative entropy, 3
sandwich theorem, 31, 99, 120
second anti-blocker theorem, 10, 48, 55
second Lovász number, 147
self-adjoint matrix, 213
semi-continuity, 208
Shannon, 1
Shannon capacity, 83, 95
Shannon entropy, 2
shift, 175
shift invariant graph, 187
simultaneously unitarily diagonalisable matrices, 215
source coding, 2, 15
spanning subgraph, 13
stable set, 13
standard convex corner, 4, 38
state, 58
state space, 58
strong chromatic number, 138
strong product, 83
strongly independent set, 138
substitution lemma, 183
symmetric, 23
tensor product, 216
tensor product of channels, 94
tensor product of graphs, 140
theta corner, 30
threshold-t co-normal product, 192
trace, 213
trace preserving, 87
trivial channel, 160
unit corner, 4, 69
unit cube, 4, 69
upper semi-continuous function, 208
vertex packing polytope, 16
vertex transitive, 14
von Neumann entropy, 60
weighted independence number, 137
weighted Lovász number, 137
Witsenhausen rate, 85, 155
zero-error capacity, 83
zero-error information theory, 82