
Page 1: An intuitive introduction to information theory

An intuitive introduction to information theory

Ivo Grosse

Leibniz Institute of Plant Genetics and Crop Plant Research Gatersleben

Bioinformatics Centre Gatersleben-Halle

Page 2: An intuitive introduction to information theory

Outline

Why information theory?

An intuitive introduction

Page 3: An intuitive introduction to information theory

History of biology

St. Thomas Monastery, Brno

Page 4: An intuitive introduction to information theory

Genetics

Gregor Mendel, 1822 – 1884

1866: Mendel's laws

Foundation of Genetics

Ca. 1900: Biology becomes a quantitative science

Page 5: An intuitive introduction to information theory

50 years later … 1953

James Watson & Francis Crick

Page 6: An intuitive introduction to information theory

50 years later … 1953

Page 7: An intuitive introduction to information theory


Page 8: An intuitive introduction to information theory

DNA

Watson & Crick, 1953

Double helix structure of DNA

1953: Biology becomes a molecular science

Page 9: An intuitive introduction to information theory

1953 – 2003 … 50 years of revolutionary discoveries

Page 10: An intuitive introduction to information theory

1989

Page 11: An intuitive introduction to information theory

1989

Goals:

Identify all of the ca. 30,000 genes

Identify all of the ca. 3,000,000,000 base pairs

Store all information in databases

Develop new software for data analysis

Page 12: An intuitive introduction to information theory

2003: Human Genome Project officially finished

2003: Biology becomes an information science

Page 13: An intuitive introduction to information theory

2003 – 2053 … biology = information science

Page 14: An intuitive introduction to information theory

2003 – 2053 … biology = information science

Systems Biology

Page 15: An intuitive introduction to information theory

What is information?

Many intuitive definitions

Most of them wrong

One clean definition since 1948

Requires 3 steps:
- Entropy
- Conditional entropy
- Mutual information

Page 16: An intuitive introduction to information theory

Before starting with entropy …

Who is the father of information theory?

Who is this?

Claude Shannon, 1916 – 2001

A Mathematical Theory of Communication. Bell System Technical Journal, 27, 379–423 & 623–656, 1948

Page 17: An intuitive introduction to information theory

Before starting with entropy …

Who is the grandfather of information theory?

Simon bar Kochba, ca. 100 – 135

Jewish guerrilla fighter against the Roman Empire (132 – 135)

Page 18: An intuitive introduction to information theory

Entropy

Given a text written in an alphabet of 32 letters (each letter equally probable)

Person A chooses a letter X (at random)
Person B wants to know this letter
B may ask only binary (yes/no) questions

Question: how many binary questions must B ask in order to learn which letter X was chosen by A?

Answer: the entropy H(X)

Here: H(X) = log2(32) = 5 bits
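A minimal sketch of this counting argument in Python (the entropy helper and the uniform alphabet are illustrative, not from the slides):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum_x p(x) * log2(p(x))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 32 equally probable letters: each yes/no question can halve the set
# of remaining candidates, so B needs log2(32) = 5 questions.
uniform = [1 / 32] * 32
print(entropy(uniform))  # 5.0
```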

Page 19: An intuitive introduction to information theory

Conditional entropy (1)

The sky is blu_

How many binary questions? 5?

No! Why? What’s wrong?

The context tells us “something” about the missing letter X

Page 20: An intuitive introduction to information theory

Conditional entropy (2)

Given a text written in an alphabet of 32 letters (each letter equally probable)

Person A chooses a letter X (at random)
Person B wants to know this letter
B may ask only binary questions
A may tell B the letter Y preceding X

E.g. L_ Q_

Question: how many binary questions must B ask in order to learn which letter X was chosen by A?

Answer: the conditional entropy H(X|Y)
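A sketch of how H(X|Y) can be estimated from (letter, preceding letter) samples; the toy "qu" text mirrors the Q_ example and is invented for illustration:

```python
import math
from collections import Counter

def cond_entropy(pairs):
    """H(X|Y) = -sum_{x,y} p(x,y) * log2(p(x|y)), estimated from (x, y) samples."""
    joint = Counter(pairs)
    y_counts = Counter(y for _, y in pairs)
    n = len(pairs)
    h = 0.0
    for (_, y), c in joint.items():
        h -= c / n * math.log2(c / y_counts[y])  # p(x,y) * log2 p(x|y)
    return h

# Toy text in which 'q' and 'u' strictly alternate: once B knows the
# preceding letter Y, there is no uncertainty left about X.
text = "ququququ"
pairs = [(text[i + 1], text[i]) for i in range(len(text) - 1)]  # (X, Y)
print(cond_entropy(pairs))  # 0.0
```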

Page 21: An intuitive introduction to information theory

Conditional entropy (3)

H(X|Y) <= H(X)

Clear!

In the worst case – namely, if B ignores all “information” in Y about X – B needs H(X) binary questions

Under no circumstances does B need more than H(X) binary questions

Knowledge of Y cannot increase the number of binary questions

Knowledge can never harm! (a mathematical statement, perhaps not true in real life)

Page 22: An intuitive introduction to information theory

Mutual information (1)

Compare two situations:

I: learn X without knowing Y
II: learn X while knowing Y

How many binary questions in case I? H(X)
How many binary questions in case II? H(X|Y)

Question: how many binary questions can B save by knowing Y?

Answer: I(X;Y) = H(X) – H(X|Y)

I(X;Y) = information in Y about X
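A sketch that combines the two quantities into I(X;Y) = H(X) – H(X|Y), estimated from samples (the tiny data set is made up for illustration):

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(X;Y) = H(X) - H(X|Y) in bits, estimated from (x, y) samples."""
    n = len(pairs)
    h_x = -sum(c / n * math.log2(c / n)
               for c in Counter(x for x, _ in pairs).values())
    y_counts = Counter(y for _, y in pairs)
    h_x_given_y = 0.0
    for (_, y), c in Counter(pairs).items():
        h_x_given_y -= c / n * math.log2(c / y_counts[y])
    return h_x - h_x_given_y

# Y is a noisy copy of X, so knowing Y saves some (but not all) questions.
samples = [("a", "a"), ("a", "a"), ("a", "b"), ("b", "b")]
print(mutual_information(samples))  # about 0.31 questions saved
```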

Page 23: An intuitive introduction to information theory

Mutual information (2)

H(X|Y) <= H(X), hence I(X;Y) >= 0

In the worst case – namely, if B ignores all information in Y about X, or if there is no information in Y about X – I(X;Y) = 0

Information in Y about X can never be negative

Knowledge can never harm! (a mathematical statement, perhaps not true in real life)
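For completeness: written out in full, I(X;Y) is a Kullback–Leibler divergence, which is never negative (a standard identity, not shown on the slide):

I(X;Y) = sum over x,y of p(x,y) log2 [ p(x,y) / (p(x) p(y)) ] >= 0

with equality exactly when p(x,y) = p(x) p(y) for all x and y, i.e. when X and Y are statistically independent.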

Page 24: An intuitive introduction to information theory

Mutual information (3)

Example 1: random sequence composed of A, C, G, T (equally probable)

I(X;Y) = ?

H(X) = 2 bits
H(X|Y) = 2 bits
I(X;Y) = H(X) – H(X|Y) = 0 bits

Example 2: deterministic sequence … ACGT ACGT ACGT ACGT …

I(X;Y) = ?

H(X) = 2 bits
H(X|Y) = 0 bits
I(X;Y) = H(X) – H(X|Y) = 2 bits
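Both examples can be checked numerically. A sketch, repeating the sample-based estimator from the previous sketch so the block stays self-contained (the sequence lengths and the letter_pairs helper are illustrative):

```python
import math
import random
from collections import Counter

def mutual_information(pairs):
    """I(X;Y) = H(X) - H(X|Y) in bits, estimated from (x, y) samples."""
    n = len(pairs)
    h_x = -sum(c / n * math.log2(c / n)
               for c in Counter(x for x, _ in pairs).values())
    y_counts = Counter(y for _, y in pairs)
    h_x_given_y = 0.0
    for (_, y), c in Counter(pairs).items():
        h_x_given_y -= c / n * math.log2(c / y_counts[y])
    return h_x - h_x_given_y

def letter_pairs(seq):
    """(X, Y) pairs in which Y is the letter preceding X."""
    return [(seq[i + 1], seq[i]) for i in range(len(seq) - 1)]

random.seed(0)
random_seq = "".join(random.choice("ACGT") for _ in range(100_000))
periodic_seq = "ACGT" * 25_000

print(mutual_information(letter_pairs(random_seq)))    # close to 0 bits
print(mutual_information(letter_pairs(periodic_seq)))  # close to 2 bits
```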

Page 25: An intuitive introduction to information theory

Mutual information (4)

I(X;Y) = I(Y;X)
Always! For any X and any Y!
Information in Y about X = information in X about Y
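The symmetry is one line of algebra, using the chain rule H(X,Y) = H(Y) + H(X|Y) (a standard derivation, not spelled out on the slide):

I(X;Y) = H(X) - H(X|Y)
       = H(X) + H(Y) - H(X,Y)
       = H(Y) - H(Y|X)
       = I(Y;X)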

Examples:

How much information is there in the amino acid sequence about the secondary structure? How much information is there in the secondary structure about the amino acid sequence?

How much information is there in the expression profile about the function of the gene? How much information is there in the function of the gene about the expression profile?

In each pair, both questions have the same answer: the mutual information

Page 26: An intuitive introduction to information theory

Summary

Entropy
Conditional entropy
Mutual information

There is no such thing as "information content"
Information is not defined for a single variable
Two random variables are needed to talk about information: the information in Y about X

I(X;Y) = I(Y;X): info in Y about X = info in X about Y

I(X;Y) >= 0: information is never negative; knowledge cannot harm

I(X;Y) = 0 if and only if X and Y are statistically independent; I(X;Y) > 0 otherwise