
Positive and Negative Randomness

Paul Vitanyi CWI, University of Amsterdam

Joint work with Kolya Vereshchagin

Non-Probabilistic Statistics

Classic Statistics--Recalled

Probabilistic Sufficient Statistic

Kolmogorov complexity

K(x) = length of shortest description of x.

K(x|y) = length of shortest description of x given y.

A string is random if K(x) ≥ |x|.

K(x) - K(x|y) is the information y knows about x.

Theorem (Mutual Information). K(x) - K(x|y) = K(y) - K(y|x).
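K and K(·|·) are uncomputable, but a real compressor gives computable upper bounds. The following minimal sketch (Python with zlib; the helper names C and C_cond are our own, and K(x|y) ≈ K(yx) - K(y) is only a chain-rule heuristic) illustrates the mutual-information symmetry on toy data.

```python
import zlib

def C(s: bytes) -> int:
    """Compressed length in bits: a computable upper bound standing in for K(s)."""
    return 8 * len(zlib.compress(s, 9))

def C_cond(x: bytes, y: bytes) -> int:
    """Heuristic for K(x|y) via the chain rule K(x|y) ~ K(yx) - K(y)."""
    return max(C(y + x) - C(y), 0)

x = b"the quick brown fox jumps over the lazy dog " * 20
y = b"the quick brown fox " * 40

ix = C(x) - C_cond(x, y)   # ~ information y knows about x
iy = C(y) - C_cond(y, x)   # ~ information x knows about y
print("K(x)~", C(x), "K(x|y)~", C_cond(x, y), "I(y:x)~", ix)
print("K(y)~", C(y), "K(y|x)~", C_cond(y, x), "I(x:y)~", iy)
# Mutual Information theorem: the two directions should agree up to
# the compressor's overhead.
```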

Randomness Deficiency

Algorithmic Sufficient Statistic where model is a set

Algorithmic sufficient statistic where model is a total computable function

Data is a binary string x; the model is a total computable function p; prefix complexity K(p) is the size of the smallest TM computing p; data-to-model code length l_x(p) = min_d {|d| : p(d) = x}.

x is typical for p if δ(x|p) = l_x(p) - K(x|p) is small.

p is a sufficient statistic for x if K(p) + l_x(p) = K(x) + O(1) and p(d) = x for the d that achieves l_x(p).

Theorem: If p is a sufficient statistic for x then x is typical for p.

p is a minimal sufficient statistic (sophistication) for x if K(p) is minimal.
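For a concrete feel of the two-part (model cost + data-to-model cost) bookkeeping, here is a toy sketch using a finite-set model in the spirit of the set-model slide: S = all strings of x's length with x's number of ones. The integer coder bits_for and the whole construction are illustrative assumptions; the real K is uncomputable.

```python
from math import comb, log2

def bits_for(m: int) -> float:
    """Crude self-delimiting code length for an integer (stand-in for its complexity)."""
    return 2 * log2(m + 2)

def two_part_code(x: str) -> tuple[float, float]:
    """Two-part code for x under the set model S = {strings of length n with k ones}:
    model cost ~ K(S) (describe n and k), data-to-model cost log|S| = log C(n, k)."""
    n, k = len(x), x.count("1")
    model_cost = bits_for(n) + bits_for(k)
    data_cost = log2(comb(n, k))
    return model_cost, data_cost

x = "0011010110" * 10          # toy data, fairly typical for its ones-count
m, d = two_part_code(x)
print(f"model ~{m:.1f} bits + data-to-model ~{d:.1f} bits = ~{m + d:.1f} bits total")
# If this total is close to K(x), S is a sufficient statistic for x in the
# sense above; K(x) is of course not computable, so this only shows the
# bookkeeping, not an actual verification.
```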

Graph Structure Function

[Figure: the structure function h_x(α) plotted against model complexity α, with log |S| on the vertical axis; lower bound h_x(α) = K(x) - α.]

Minimum Description Length estimator, Relations between estimators

Structure function h_x(α) = min_S {log |S| : x in S and K(S) ≤ α}.

MDL estimator λ_x(α)= min_S{log |S|+K(S): x in S and K(S)≤α}.

Best-fit estimator: β_x(α) = min_S {δ(x|S): x in S and K(S)≤α}.
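The sketch below traces the trade-off these three estimators capture for one illustrative family of set models (S_j = sets fixing the first j bytes of x). Compressed length stands in for K throughout, and a real evaluation of h_x, λ_x, β_x would minimize over all models within each complexity budget α, which is uncomputable; everything here is our own toy assumption.

```python
import zlib

def C(s: bytes) -> int:
    """Compressed length in bits: a computable stand-in for prefix complexity K."""
    return 8 * len(zlib.compress(s, 9))

def sweep(x: bytes, step: int = 16) -> None:
    """Toy sweep over the model family S_j = {strings of len(x) bytes that
    share x's first j bytes}, so log|S_j| = 8*(len(x) - j) bits.

    K(S_j) is approximated by C(prefix) and K(x|S_j) by C(suffix); this only
    traces one family rather than minimizing over all models."""
    n = len(x)
    for j in range(0, n + 1, step):
        K_S = C(x[:j]) if j else 0                  # ~ model cost alpha
        log_S = 8 * (n - j)                         # data-to-model cost log|S_j|
        K_x_S = C(x[j:]) if j < n else 0            # ~ K(x | S_j)
        delta = max(log_S - K_x_S, 0)               # randomness deficiency in S_j
        print(f"alpha~{K_S:5d}  log|S|={log_S:5d}  MDL~{K_S + log_S:5d}  delta~{delta}")

sweep(b"algorithmic statistics " * 8)   # toy data with visible regularity
```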

Individual characteristics: more detail, especially for meaningful (nonrandom) data.

We flip the graph so that log |·| is on the x-axis and K(·) is on the y-axis. This is essentially the rate-distortion graph for list (set) distortion.

Primogeniture of ML/MDL estimators

• ML/MDL estimators can be approximated from above.

• The best-fit estimator cannot be approximated, either from above or below, up to any precision.

• But the approximable ML/MDL estimators yield the best-fitting models: even though we don't know the quantity of goodness-of-fit, ML/MDL estimators implicitly optimize goodness-of-fit.

Positive and Negative Randomness, and Probabilistic Models

Precision of following a given function h(α)

[Figure: a prescribed function h(α) and the realized structure function h_x(α), within distance d of each other; horizontal axis: model cost α, vertical axis: data-to-model cost log |S|.]

Logarithmic precision is sharp

Lemma. Most strings of length n have structure functions close to the diagonal n - α. Those are the strings of high complexity K(x) > n.

For strings of low complexity, say K(x) < n/2, the number of appropriate functions is much greater than the number of strings. Hence there cannot be a string for every such function. But we show that there is a string for every approximate shape of function.

All degrees of neg. randomness

Theorem: For every length n there are strings x whose minimal sufficient statistic has complexity anywhere between 0 and n (up to a logarithmic term).

Proof. All shapes of the structure function are possible, as long as it starts from n-k, decreases monotonically, and is 0 at k, for some k ≤ n (up to the precision in the previous slide).

Are there natural examples of negative randomness

Question: Are there natural examples of strings with large negative randomness? Kolmogorov didn't think they exist, but we know they are abundant.

Maybe the information distance between strings x and y yields large negative randomness.

Information Distance:

• Information Distance (Li, Vitanyi, 96; Bennett, Gacs, Li, Vitanyi, Zurek, 98)

D(x,y) = min { |p|: p(x)=y & p(y)=x}

Binary program for a Universal Computer (Lisp, Java, C, Universal Turing Machine)

Theorem (i) D(x,y) = max {K(x|y),K(y|x)}

Kolmogorov complexity of x given y, defined as the length of the shortest binary program that outputs x on input y.

(ii) D(x,y) ≤ D'(x,y) for any computable distance D' satisfying ∑_y 2^(-D'(x,y)) ≤ 1 for every x.

(iii) D(x,y) is a metric.
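These properties motivated a practical, compression-based proxy developed in later work in this line of research: the normalized compression distance, in which a real compressor stands in for K. The sketch below uses zlib, and the test strings are made up purely for illustration; it is an upper-bound heuristic, not the information distance itself.

```python
import zlib

def C(s: bytes) -> int:
    """Compressed length: a computable upper-bound stand-in for K."""
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, a heuristic for the (normalized)
    information distance max{K(x|y), K(y|x)} / max{K(x), K(y)}."""
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"GATTACA" * 60
b = b"GATTACA" * 55 + b"CATCATC" * 5
c = bytes(range(256)) * 2                  # unrelated, fairly incompressible
print("ncd(a, b) =", round(ncd(a, b), 3))  # similar inputs: small distance
print("ncd(a, c) =", round(ncd(a, c), 3))  # dissimilar inputs: closer to 1
```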

Not between random strings

• The information distance between random strings x and y of length n doesn't yield negative randomness.

• If x, y satisfy K(x|y), K(y|x) > n, then p = x XOR y, where XOR means bitwise exclusive-or, serves as a program to translate x to y and y to x. But if x and y are positively random, it appears that p is so too (see the sketch below).
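A quick sanity check of the XOR observation (toy Python; os.urandom merely stands in for two random strings): the single string p = x XOR y converts each string into the other. It says nothing about the complexity of p, which is the point at issue on the slide.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise exclusive-or of two equal-length byte strings."""
    return bytes(u ^ v for u, v in zip(a, b))

n = 16
x, y = os.urandom(n), os.urandom(n)   # stand-ins for two random strings
p = xor(x, y)                         # the single "translation program"
assert xor(x, p) == y and xor(y, p) == x
print("p converts x -> y and y -> x:", xor(x, p) == y, xor(y, p) == x)
```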


Selected Bibliography

N.K. Vereshchagin, P.M.B. Vitanyi, A theory of lossy compression of individual data, http://arxiv.org/abs/cs.IT/0411014, submitted.

P.D. Grunwald, P.M.B. Vitanyi, Shannon information and Kolmogorov complexity, IEEE Trans. Information Theory, submitted.

N.K. Vereshchagin, P.M.B. Vitanyi, Kolmogorov's structure functions and model selection, IEEE Trans. Inform. Theory, 50:12(2004), 3265-3290.

P. Gacs, J. Tromp, P. Vitanyi, Algorithmic statistics, IEEE Trans. Inform. Theory, 47:6(2001), 2443-2463.

Q. Gao, M. Li, P.M.B. Vitanyi, Applying MDL to learning best model granularity, Artificial Intelligence, 121:1-2(2000), 1-29.

P.M.B. Vitanyi, M. Li, Minimum description length induction, Bayesianism, and Kolmogorov complexity, IEEE Trans. Inform. Theory, IT-46:2(2000), 446-464.
