the max-entropy fallacy

Page 1: The max-entropy fallacy

KTH/CSC

Erik Aurell, KTH & Aalto U, October 13, 2014

The max-entropy fallacy

Erik Aurell

International Mini-workshop on Collective Dynamics in Information Systems 2014

Beijing, October 13, 2014

Kavli Institute for Theoretical Physics China (KITPC)

Loosely based on
G. Del Ferraro & E.A., J. Phys. Soc. Japan 83, 084001 (2014)
C. Feinauer, M. Skwark, A. Pagnani & E.A., PLoS Comp Biol 10, e1003847 (2014)

Page 2: The max-entropy fallacy


What entropy?

By entropy I will mean the Shannon entropy of a probability distribution:

$$S[p] = -\sum_i p_i \log p_i$$

What maximization?

Maximizing $S[p]$ subject to the constraint $\sum_i p_i E_i = E = \text{const.}$ gives

$$p_i = \frac{1}{Z} e^{-\beta E_i}$$

What max-entropy?

The idea that probability distributions other than those of equilibrium statistical mechanics can be derived by maximizing entropy given suitable constraints.
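The maximization step can be checked numerically. The following is a minimal sketch, assuming a hypothetical three-level system and a hypothetical value of the constrained mean energy (neither is from the talk): it finds the β for which the Boltzmann form meets the constraint, then compares its entropy with that of another distribution satisfying the same constraints.

```python
import math

def boltzmann(energies, beta):
    """Return p_i = exp(-beta * E_i) / Z for the given energy levels."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(energies, beta):
    """Average energy sum_i p_i E_i under the Boltzmann distribution."""
    return sum(p * e for p, e in zip(boltzmann(energies, beta), energies))

def entropy(p):
    """Shannon entropy S[p] = -sum_i p_i log p_i (natural logarithm)."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def solve_beta(energies, target, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection on beta; the mean energy decreases monotonically in beta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, mid) > target:
            lo = mid   # mean energy too high: a larger beta is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

energies = [0.0, 1.0, 2.0]   # hypothetical three-level system
target = 0.6                 # hypothetical value of the constraint E
beta = solve_beta(energies, target)
p_star = boltzmann(energies, beta)

# A competitor with the same normalization and the same mean energy:
# moving mass from the middle level to the outer two leaves sum_i p_i E_i fixed.
eps = 0.02
p_other = [p_star[0] + eps, p_star[1] - 2 * eps, p_star[2] + eps]

print("S[p_star]  =", entropy(p_star))
print("S[p_other] =", entropy(p_other))
```

Any perturbation that preserves both constraints lowers the entropy, which is what singles out the exponential form.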

Page 3: The max-entropy fallacy


Two reasons to give this talk

Max-entropy: E.T. Jaynes proposed in 1957 that both equilibrium and non-equilibrium statistical mechanics be based upon this criterion.

Max-entropy inference: in the last decade considerable attention has been given to learning pairwise interaction models from data, motivated by max-entropy arguments. This research is highly interesting, but does it support max-entropy?

[...] the probability distribution over microscopic states which has maximum entropy, subject to whatever is known, provides the most unbiased representation of our knowledge of the system.

E.T. Jaynes, “Information Theory and Statistical Mechanics II”, Physical Review 108, 171-190 (1957)

Page 4: The max-entropy fallacy


Why oppose max-entropy?

(1) is it a practical method to study non-equilibrium processes (say, on graphs)?

(2) does it give the right answers in principle?

(3) is it necessary to explain the recent successes in inference?

(4) is max-entropy inference a scientific methodology, e.g. in the sense of Popper?

Are probabilities in Physics objective or subjective?

“[...] one must recognize the fact that probability theory has developed in two very different directions as regards fundamental notions.”

“[...] the ‘objective’ school of thought [...]”

“[...] the ‘subjective’ school of thought [...]”

“[...] the probability of an event is merely a formal expression of our expectation that the event did or will occur, based on whatever information is available”

E.T. Jaynes, “Information Theory and Statistical Mechanics I”, Physical Review 106, 620-630 (1957)

Page 5: The max-entropy fallacy


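(1) is it practical?

We consider continuous-time dynamics on graphs. The dynamics could be driven out of equilibrium, or relaxing towards equilibrium.

True distribution: obeys the master equation

$$\frac{dP(s)}{dt} = \sum_j \left[ w_j(F_j s)\, P(F_j s) - w_j(s)\, P(s) \right]$$ [master equation]

where $F_j s$ is the configuration $s$ with spin $j$ flipped and $w_j(s)$ is the rate of that flip.

Auxiliary maximum entropy distribution: built from observables in the sense of max-entropy

$$O_1(s),\ O_2(s),\ O_3(s),\ \ldots$$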
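As a sketch of what an auxiliary maximum entropy distribution looks like, assume the constraints are the averages of $M$ observables $O_1, \ldots, O_M$ and write the Lagrange multipliers as $\beta_l$. The symbol $M$ and the sign convention are assumptions made here for illustration:

```latex
P_\beta(s) = \frac{1}{Z(\beta)} \exp\Big( \sum_{l=1}^{M} \beta_l O_l(s) \Big),
\qquad
Z(\beta) = \sum_{s} \exp\Big( \sum_{l=1}^{M} \beta_l O_l(s) \Big)
```

With this convention $\partial \log Z / \partial \beta_l = \langle O_l \rangle$ and $\partial \langle O_l \rangle / \partial \beta_k = \langle O_l O_k \rangle - \langle O_l \rangle \langle O_k \rangle$, so a change of the multipliers translates into a change of the averages through the covariance matrix of the observables.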

Page 6: The max-entropy fallacy


[Figure, three panels: graph of dynamics; overlaid possible terms; factor graph of auxiliary model]

This is a dimensional reduction

Dynamics of the observables according to the master equation (with $w_j(s)$ the rate of the flip $s \to F_j s$ of spin $j$):

$$\frac{d}{dt}\langle O_l \rangle = \sum_j \left\langle w_j(s) \left[ O_l(F_j s) - O_l(s) \right] \right\rangle$$

Dynamics of the observables according to the auxiliary distribution:

$$\frac{d}{dt}\langle O_l \rangle = \sum_k \frac{d\beta_k}{dt} \left[ \langle O_l O_k \rangle - \langle O_l \rangle \langle O_k \rangle \right]$$

If the auxiliary distribution is a good model, both ways of computing the dynamics must agree. In this way the changes of the β’s can be computed and the master equation reduced to a (complicated) finite-dimensional ODE.

The averages have to be computed by the cavity method (or something else).
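The reduction to a finite-dimensional ODE can be sketched on a very small system. The code below is an illustration under stated assumptions, not the method of the talk: it uses a ring of N = 6 Ising spins with Glauber flip rates, takes magnetization and nearest-neighbour bond sum as the two observables, and computes all averages by exact enumeration, standing in for the cavity method. The values of `N`, `K`, the step size and the initial condition are hypothetical.

```python
import itertools
import math

N = 6      # ring size, small enough for exact enumeration (assumption)
K = 0.5    # coupling beta*J in the Glauber rates (hypothetical value)

def observables(s):
    """O_1 = sum_i s_i and O_2 = sum_i s_i s_{i+1} on a ring."""
    return (float(sum(s)),
            float(sum(s[i] * s[(i + 1) % N] for i in range(N))))

def drift(s):
    """sum_j w_j(s) [O_l(F_j s) - O_l(s)] for l = 1, 2."""
    d1 = d2 = 0.0
    for j in range(N):
        nb = s[j - 1] + s[(j + 1) % N]               # sum of the two neighbours
        w = 0.5 * (1.0 - s[j] * math.tanh(K * nb))   # Glauber flip rate
        d1 += w * (-2.0 * s[j])                      # a flip changes O_1 by -2 s_j
        d2 += w * (-2.0 * s[j] * nb)                 # and O_2 by -2 s_j (s_{j-1} + s_{j+1})
    return d1, d2

# Per-state table (O_1, O_2, drift_1, drift_2); none of it depends on the betas.
TABLE = [observables(s) + drift(s)
         for s in itertools.product((-1, 1), repeat=N)]

def beta_dot(b1, b2):
    """Solve sum_k Cov(O_l, O_k) dbeta_k/dt = <drift_l> under the auxiliary model."""
    z = m1 = m2 = m11 = m12 = m22 = r1 = r2 = 0.0
    for o1, o2, d1, d2 in TABLE:
        w = math.exp(b1 * o1 + b2 * o2)   # unnormalized auxiliary weight
        z += w
        m1 += w * o1
        m2 += w * o2
        m11 += w * o1 * o1
        m12 += w * o1 * o2
        m22 += w * o2 * o2
        r1 += w * d1
        r2 += w * d2
    m1, m2, r1, r2 = m1 / z, m2 / z, r1 / z, r2 / z
    c11 = m11 / z - m1 * m1
    c12 = m12 / z - m1 * m2
    c22 = m22 / z - m2 * m2
    det = c11 * c22 - c12 * c12
    # Cramer's rule for the 2x2 linear system.
    return (r1 * c22 - r2 * c12) / det, (c11 * r2 - c12 * r1) / det

def integrate(b1, b2, dt=0.02, steps=2000):
    """Forward-Euler integration of the two-dimensional ODE for (beta_1, beta_2)."""
    for _ in range(steps):
        db1, db2 = beta_dot(b1, b2)
        b1, b2 = b1 + dt * db1, b2 + dt * db2
    return b1, b2

b1, b2 = integrate(0.2, 0.0)
print(b1, b2)   # should approach (0, K), the equilibrium fixed point
```

In this toy setting the equilibrium distribution lies inside the auxiliary family, so the reduced dynamics relaxes to β₁ = 0, β₂ = K. On larger graphs enumeration is impossible, which is where the cavity method comes in.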

Page 7: The max-entropy fallacy


The approach is not new, but it has not been (much) tested on single graphs.

Simplest non-trivial case: the 1D Ising spin chain

Simple ferromagnetic Hamiltonian

Obeys detailed balance

Essentially solved 51 years ago

Roy J. Glauber, “Time-dependent statistics of the Ising model”, Journal of Mathematical Physics 4, 294 (1963)

Simplest max-entropy theory built on observing magnetization and energy

Already this is not totally trivial to do… because of the averages…
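For reference, the exactly solvable part of Glauber's model can be reproduced on a small ring. This is a sketch under assumptions: a ring of N = 6 spins, rate constant set to 1, zero external field, and hypothetical values for `K` and the initial magnetization `M0`. It integrates the full master equation with a classical Runge-Kutta step and compares the magnetization with the exponential decay exp(-(1 - γ)t), γ = tanh(2K), known from Glauber's solution.

```python
import itertools
import math

N = 6                         # ring size (assumption: small, for exact enumeration)
K = 0.5                       # beta*J (hypothetical value)
GAMMA = math.tanh(2.0 * K)    # the parameter gamma of Glauber's solution

STATES = list(itertools.product((-1, 1), repeat=N))
INDEX = {s: n for n, s in enumerate(STATES)}

def rate(s, j):
    """Glauber rate (time unit set to 1) for flipping spin j in configuration s."""
    nb = s[j - 1] + s[(j + 1) % N]
    return 0.5 * (1.0 - s[j] * math.tanh(K * nb))

def flip(s, j):
    """The configuration F_j s, equal to s with spin j reversed."""
    return s[:j] + (-s[j],) + s[j + 1:]

# For each state s: (index of F_j s, rate out of s, rate from F_j s back into s).
MOVES = [[(INDEX[flip(s, j)], rate(s, j), rate(flip(s, j), j)) for j in range(N)]
         for s in STATES]

def master_rhs(p):
    """dP(s)/dt = sum_j [w_j(F_j s) P(F_j s) - w_j(s) P(s)]."""
    return [sum(w_in * p[t] - w_out * p[n] for t, w_out, w_in in MOVES[n])
            for n in range(len(p))]

def rk4_step(p, dt):
    """One classical fourth-order Runge-Kutta step for the master equation."""
    k1 = master_rhs(p)
    k2 = master_rhs([x + 0.5 * dt * k for x, k in zip(p, k1)])
    k3 = master_rhs([x + 0.5 * dt * k for x, k in zip(p, k2)])
    k4 = master_rhs([x + dt * k for x, k in zip(p, k3)])
    return [x + dt * (a + 2 * b + 2 * c + d) / 6.0
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)]

def magnetization(p):
    """Average magnetization per spin."""
    return sum(q * sum(s) for q, s in zip(p, STATES)) / N

M0 = 0.5   # initial magnetization per spin of a product state (hypothetical)
p = [math.prod((1.0 + M0 * x) / 2.0 for x in s) for s in STATES]
dt, steps = 0.01, 200
for _ in range(steps):
    p = rk4_step(p, dt)

exact = M0 * math.exp(-(1.0 - GAMMA) * dt * steps)
print(magnetization(p), exact)
```

The magnetization equation closes on itself, which is why this quantity is easy; the energy and longer-range correlations are what make the comparison with a max-entropy theory non-trivial.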

Page 8: The max-entropy fallacy


…plus, in every time-step, an implicit three-variable equation, relating the change from the master equation to averages computed by cavity.

Solving the equations by Newton… works reasonably well…

[Figure, two panels: energy vs time; difference to the Glauber theory, in energy vs time]
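The Newton step can be illustrated on a simpler, related problem. The sketch below does not reproduce the three-variable implicit equation of the talk. It assumes a two-parameter auxiliary model P ∝ exp(b1·O₁ + b2·O₂) on a ring of six spins with exact enumeration, and uses Newton's method to find the parameters that give prescribed averages. The Jacobian of the averages with respect to the parameters is the covariance matrix of the observables.

```python
import itertools
import math

N = 6   # ring size for exact enumeration (assumption; the talk uses cavity instead)

# (O_1, O_2) = (sum_i s_i, sum_i s_i s_{i+1}) for every configuration of the ring.
OBS = [(float(sum(s)), float(sum(s[i] * s[(i + 1) % N] for i in range(N))))
       for s in itertools.product((-1, 1), repeat=N)]

def moments(b1, b2):
    """Means and covariances of (O_1, O_2) under P ~ exp(b1*O_1 + b2*O_2)."""
    z = m1 = m2 = m11 = m12 = m22 = 0.0
    for o1, o2 in OBS:
        w = math.exp(b1 * o1 + b2 * o2)
        z += w
        m1 += w * o1
        m2 += w * o2
        m11 += w * o1 * o1
        m12 += w * o1 * o2
        m22 += w * o2 * o2
    m1, m2 = m1 / z, m2 / z
    return m1, m2, m11 / z - m1 * m1, m12 / z - m1 * m2, m22 / z - m2 * m2

def newton_match(t1, t2, b1=0.0, b2=0.0, tol=1e-10, max_iter=50):
    """Newton iteration for <O_1> = t1 and <O_2> = t2."""
    for _ in range(max_iter):
        m1, m2, c11, c12, c22 = moments(b1, b2)
        g1, g2 = t1 - m1, t2 - m2            # residuals of the two equations
        if max(abs(g1), abs(g2)) < tol:
            break
        det = c11 * c22 - c12 * c12
        b1 += (g1 * c22 - g2 * c12) / det    # solve Cov * delta = residual
        b2 += (c11 * g2 - c12 * g1) / det
    return b1, b2

# Generate targets from known (hypothetical) parameters, then recover them.
target = moments(0.1, 0.3)[:2]
b1, b2 = newton_match(*target)
print(b1, b2)
```

Plain Newton without damping is used here because the starting point is close enough. A harder problem, or one with more variables, may need a line search.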

Page 9: The max-entropy fallacy


Joint spin-field distribution…

A. C. C. Coolen, S. N. Laughton, and D. Sherrington, Physical Review B 53, 8184 (1996)

In principle similar, but needs three cavity fields and solving an eight-dimensional implicit equation at every step…

…and works better, though not perfectly. The longer the range in the auxiliary distribution, the more complicated the cavity calculation and equation.

[Figure: difference to the Glauber theory]

Page 10: The max-entropy fallacy


Internal consistency check…

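Consider again the two ways of computing the changes of observables, from the master equation (with $w_j(s)$ the rate of the flip $s \to F_j s$ of spin $j$) and from the auxiliary distribution:

$$\frac{d}{dt}\langle O_l \rangle = \sum_j \left\langle w_j(s) \left[ O_l(F_j s) - O_l(s) \right] \right\rangle$$

$$\frac{d}{dt}\langle O_l \rangle = \sum_k \frac{d\beta_k}{dt} \left[ \langle O_l O_k \rangle - \langle O_l \rangle \langle O_k \rangle \right]$$

They also work if $O_l$ is not in the theory. But then they do not have to agree, and the discrepancy between the two sides is an internal consistency check.

Simplest tests are for longer-range pairwise correlations, $c_{i,i+k} = \langle s_i s_{i+k} \rangle$.

[Figure, two panels: magnetization-energy theory; joint spin-field distribution theory]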
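A sketch of such a consistency check, under the same toy assumptions as a small exactly enumerated system: a ring of six spins with Glauber rates and a magnetization-energy auxiliary model. The test observable, hypothetically chosen here, is the next-nearest-neighbour sum O₃ = Σᵢ sᵢ sᵢ₊₂, which is not part of the theory. The function returns the difference between the two ways of computing d⟨O₃⟩/dt.

```python
import itertools
import math

N = 6      # ring size (assumption)
K = 0.5    # beta*J in the Glauber rates (hypothetical value)

def row(s):
    """(O_1, O_2, O_3) and their master-equation drifts for configuration s."""
    o = [0.0, 0.0, 0.0]
    d = [0.0, 0.0, 0.0]
    for j in range(N):
        nn = s[j - 1] + s[(j + 1) % N]     # nearest neighbours of j
        nnn = s[j - 2] + s[(j + 2) % N]    # next-nearest neighbours of j
        o[0] += s[j]
        o[1] += 0.5 * s[j] * nn            # each pair is visited twice, hence 1/2
        o[2] += 0.5 * s[j] * nnn
        w = 0.5 * (1.0 - s[j] * math.tanh(K * nn))   # Glauber flip rate
        d[0] += w * (-2.0 * s[j])
        d[1] += w * (-2.0 * s[j] * nn)
        d[2] += w * (-2.0 * s[j] * nnn)
    return o, d

TABLE = [row(s) for s in itertools.product((-1, 1), repeat=N)]

def discrepancy(b1, b2):
    """Auxiliary-model value minus master-equation value of d<O_3>/dt."""
    z = 0.0
    mean = [0.0] * 3
    second = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for o, d in TABLE:
        w = math.exp(b1 * o[0] + b2 * o[1])   # O_3 does not enter the model
        z += w
        for a in range(3):
            mean[a] += w * o[a]
            rhs[a] += w * d[a]
            for b in range(3):
                second[a][b] += w * o[a] * o[b]
    mean = [m / z for m in mean]
    rhs = [r / z for r in rhs]
    cov = [[second[a][b] / z - mean[a] * mean[b] for b in range(3)]
           for a in range(3)]
    # dbeta/dt from the two observables that are in the theory (2x2 solve).
    det = cov[0][0] * cov[1][1] - cov[0][1] ** 2
    db1 = (rhs[0] * cov[1][1] - rhs[1] * cov[0][1]) / det
    db2 = (cov[0][0] * rhs[1] - cov[0][1] * rhs[0]) / det
    return db1 * cov[2][0] + db2 * cov[2][1] - rhs[2]

print(discrepancy(0.0, K))    # at equilibrium both sides vanish
print(discrepancy(0.2, 0.2))  # a generic point of the auxiliary family
```

The discrepancy vanishes at the equilibrium point because both sides are zero there. Away from it, its size indicates how well the two-observable theory captures correlations it was not built on.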

Page 11: The max-entropy fallacy


Answer to: (1) is it practical?

No. It is complicated to implement, even in a simple 1D model of a dynamics relaxing towards equilibrium. One can consider successive approximations with longer “interactions”, but the complexity grows very quickly. Which brings us to the next question:

(2) does max-entropy give right answers outside equilibrium?

Page 12: The max-entropy fallacy


Traditionally there were no exact (relevant) results

If max-entropy is relevant for non-equilibrium then the probability distributions should, like the Gibbs-Boltzmann distribution, be exponential.

$$\Pr\nolimits_N(X) \sim e^{-N f(X)}$$ [Gibbs-Boltzmann distribution]

$$\Pr\nolimits_N(X, t) \sim e^{-N V(X, t) + \text{sub-leading}}$$ [putative non-equilibrium distribution]

It has now been known for 10-15 years that this is the case, but the functional $V$ is very non-trivial.

$$V_{\mathrm{SSEP}}[\rho] = \sup_F \int dx \left[ B(\rho(x), F(x)) + \log \frac{F'(x)}{\rho_b - \rho_a} \right]$$

$$B(r, F) = r \log \frac{r}{F} + (1 - r) \log \frac{1 - r}{1 - F}$$

B. Derrida, J. Stat. Mech. (2007) P07023
E. Akkermans et al, EPL 103 20001 (2013)

Recently extended to multi-dimensional systems, for the related question of fluctuations of the current.

Page 13: The max-entropy fallacy


Answer to: (2) does max-entropy give right answers outside equilibrium?

No, because there is no way that a complicated long-range effective interaction potential can be deduced from maximizing entropy and a limited number of simple constraints.

For the experts: both systems relaxing to equilibrium, such as the Ising spin chain, and the SSEP (and other such solved models) are covered by the macroscopic fluctuation theory of Jona-Lasinio and co-workers. But only SSEP-like systems lead to long-range effective interactions. For the relaxing Ising spin chain the max-entropy approach should hence probably eventually work, though remain computationally cumbersome.

Page 14: The max-entropy fallacy


(3) is it necessary to explain the recent successes in inference?

The main success is contact prediction in proteins. Folding proteins in silico is hard, and not a solved problem, unless you have an already solved structure as template. Predicting which amino acids are in contact in a structure can be done from co-variation in similar proteins.
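Co-variation between two positions can be quantified in several ways. As a minimal sketch, and not the method used in the works cited in this talk, the following computes the mutual information between two columns of a toy alignment. The four sequences are made up for illustration.

```python
import math
from collections import Counter

def column(msa, i):
    """Column i of an alignment given as a list of equal-length strings."""
    return [seq[i] for seq in msa]

def mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    fi = Counter(column(msa, i))                         # counts in column i
    fj = Counter(column(msa, j))                         # counts in column j
    fij = Counter(zip(column(msa, i), column(msa, j)))   # joint counts
    # sum_ab p_ab log( p_ab / (p_a p_b) ), written in terms of counts
    return sum((c / n) * math.log(c * n / (fi[a] * fj[b]))
               for (a, b), c in fij.items())

# Toy alignment (hypothetical): columns 0 and 1 co-vary perfectly,
# column 2 is conserved, column 3 varies independently of column 0.
msa = ["ACDA", "ACDC", "GTDA", "GTDC"]

print(mutual_information(msa, 0, 1))   # log 2: perfect co-variation
print(mutual_information(msa, 0, 3))   # 0: no co-variation
```

A pairwise score of this kind cannot tell direct from indirect correlations, which is the motivation for learning a global model of the whole sequence instead.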

Page 15: The max-entropy fallacy

Relation between positional correlation and structure known for 20 years

[Figure: graph on nodes X1, ..., X7]

Neher (1994)
Göbel, Sander, Schneider, Valencia (1994)

Lapedes et al 2001
Weigt et al PNAS 2009
Burger & van Nimwegen 2010
Balakrishnan et al 2011
Morcos et al PNAS 2011
Hopf et al Cell 2012
Jones et al Bioinformatics 2012
Ekeberg et al Phys Rev E 2013
Skwark et al Bioinformatics 2013
Kamisetty et al PNAS 2013

Page 16: The max-entropy fallacy


The recent success is to learn a Potts model from data


“The prediction method applies a maximum entropy approach to infer evolutionary covariation [...]”

T. Hopf et al, Cell 149:1607-21 (2012)

“The maximum-entropy approach to potentially solving the problem of protein structure prediction from residue covariation patterns [...]”

D. Marks et al, Nat Biotechnol. 30:1072-80 (2012)

“To disentangle direct and indirect couplings, we aim at inferring a statistical model P(A1, ..., AL) for entire protein sequences (A1, ..., AL) [...] aim at the most general, least-constrained model [...] achieved by applying the maximum-entropy principle”

F. Morcos et al, PNAS 108:E1293–E1301 (2011)

Page 17: The max-entropy fallacy


Actually we have all the data

It is a choice to reduce multiple sequence alignments to nucleotide frequencies and correlations for data analysis. But we start from all the data. The conceptual basis of max-entropy is therefore not there.

Furthermore, the best available methods to learn these Potts models use all the data.

M. Ekeberg et al, Phys. Rev. E (2013)
M. Ekeberg et al, J. Comp. Phys. (2014)

http://plmdca.csc.kth.se/
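To make "use all the data" concrete, here is a minimal sketch of a pseudo-likelihood objective for a Potts model. It is an illustration only and does not reproduce the plmDCA implementation: the toy alphabet, the alignment, the parameter layout (`h` as a list of per-site fields, `J` as a dictionary of coupling matrices for i < j) and the coupling values are all hypothetical, and regularization and sequence reweighting are left out.

```python
import math

ALPHABET = "ACDGT"   # toy alphabet (hypothetical); real protein alignments use 20 amino acids plus a gap
Q = len(ALPHABET)

def encode(seq):
    """Map a sequence string to a list of integer states."""
    return [ALPHABET.index(c) for c in seq]

def zero_params(length):
    """Fields h[i][a] and couplings J[(i, j)][a][b] for i < j, all set to zero."""
    h = [[0.0] * Q for _ in range(length)]
    J = {(i, j): [[0.0] * Q for _ in range(Q)]
         for i in range(length) for j in range(i + 1, length)}
    return h, J

def coupling(J, i, j, a, b):
    """J is stored for i < j only; use the transposed entry otherwise."""
    return J[(i, j)][a][b] if i < j else J[(j, i)][b][a]

def site_log_conditional(h, J, seq, i):
    """log P(seq[i] | all other positions of seq) in the Potts model."""
    scores = [h[i][a] + sum(coupling(J, i, j, a, seq[j])
                            for j in range(len(seq)) if j != i)
              for a in range(Q)]
    top = max(scores)   # subtract the maximum for a stable log-sum-exp
    log_norm = top + math.log(sum(math.exp(x - top) for x in scores))
    return scores[seq[i]] - log_norm

def neg_pseudo_loglik(h, J, msa):
    """Average over sequences of -sum_i log P(a_i | rest)."""
    total = 0.0
    for seq in msa:
        total -= sum(site_log_conditional(h, J, seq, i) for i in range(len(seq)))
    return total / len(msa)

msa = [encode(s) for s in ["ACDA", "ACDC", "GTDA", "GTDC"]]   # toy alignment
h, J = zero_params(4)
baseline = neg_pseudo_loglik(h, J, msa)

# Reward the two residue pairs that co-occur in columns 0 and 1.
J[(0, 1)][ALPHABET.index("A")][ALPHABET.index("C")] = 2.0
J[(0, 1)][ALPHABET.index("G")][ALPHABET.index("T")] = 2.0
coupled = neg_pseudo_loglik(h, J, msa)

print(baseline, coupled)
```

The objective is a sum over every sequence and every position, so it depends on the full alignment and not only on its one- and two-site frequencies. A learning procedure would minimize it over `h` and `J`.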

Page 18: The max-entropy fallacy


Learning better models...

Multiple sequence alignments generally have stretches of gaps. These are not generated with high probability by a Potts model.

Marcin Skwark and Christoph Feinauer, AISTATS (2014)
C. Feinauer, M. Skwark, A. Pagnani & E.A., PLoS Comp Biol 10, e1003847 (2014)

This (and the previous) slide show the real, but admittedly not very large, improvement in contact prediction from learning two models (gplmDCA and plmDCA20) which do take gap stretches into account.

Page 19: The max-entropy fallacy


Answer to: (3) is max-entropy necessary to explain the recent successes in inference?

No.

The successes are better explained by the distribution of amino acids in homologous proteins, as a result of all evolution of life, actually being in an exponential family, and rather close to a Potts model.

Why is that so, or why should it be so? Nobody knows! Perhaps an important problem for evolutionary theory? And perhaps it has other uses?

We are back to the objective / subjective interpretations of probability, from the start the most contentious issue surrounding max-entropy.

Page 20: The max-entropy fallacy


(4) is max-entropy inference a scientific methodology?

According to Popper science is based on falsifiability. This same basic idea has been stated by many others, before and after.

“We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.”

R.P. Feynman, as on famousquotes.org.

You cannot falsify anything by a single experiment or single data set with no theory of prediction beforehand to falsify.

Also according to Popper, scientific knowledge is built as a collective enterprise of scientists. Therefore, Jaynes’ conditional “[...] subject to whatever is known [...]” implicitly includes all human knowledge up to that time, which is not a simple constraint.

A similar philosophical objection can be made against Rissanen’s Minimum Description Length principle.

Page 21: The max-entropy fallacy


Thanks to

Gino Del Ferraro

Alexander Mozeika

Marcin Skwark

Christoph Feinauer

Andrea Pagnani

Magnus Ekeberg

Angelo Vulpiani