
Page 1:

Markov, Shannon, and Turbo Codes: The Benefits of Hindsight

Professor Stephen B. Wicker

School of Electrical Engineering

Cornell University

Ithaca, NY 14853

Page 2:

Introduction

Theme 1: Digital Communications, Shannon and Error Control Coding

Theme 2: Markov and the Statistical Analysis of Systems with Memory

Synthesis: Turbo Error Control: Parallel Concatenated Encoding and Iterative Decoding

Page 3:

Digital Telecommunication

The classical design problem: transmitter power vs. bit error rate (BER)

Complications:
– Physical Distance
– Co-Channel and Adjacent Channel Interference
– Nonlinear Channels

Page 4:

Shannon and Information Theory

Noisy Channel Coding Theorem (1948):
– Every channel has a capacity C.
– If we transmit at a data rate less than capacity, there exists an error control code that provides arbitrarily low BER.

For an AWGN channel:

$$C = W \log_2\!\left(1 + \frac{E_s}{N_0}\right) \text{ bits per second}$$
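A quick numerical sketch of the formula (the bandwidth and SNR values below are illustrative, not from the talk):

```python
import math

def awgn_capacity(bandwidth_hz, snr_linear):
    """C = W log2(1 + SNR) in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# Illustrative values: W = 1 MHz, Es/N0 = 10 dB (a linear factor of 10)
print(awgn_capacity(1e6, 10.0))   # roughly 3.46 Mbit/s
```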

Page 5:

Coding Gain

Coding Gain: $P_{\text{UNCODED}} - P_{\text{CODED}}$
– The difference in power required by the uncoded and coded systems to obtain a given BER.

NCCT: almost 10 dB is possible on an AWGN channel with binary signaling.

1993: NASA/ESA Deep Space Standard provides 7.7 dB.
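A minimal sketch of the coding-gain arithmetic; the Eb/N0 figures below are hypothetical, chosen only to illustrate the definition:

```python
def coding_gain_db(power_uncoded_db, power_coded_db):
    """Coding gain: power required uncoded minus power required coded, at the same BER."""
    return power_uncoded_db - power_coded_db

# Hypothetical example: if an uncoded system needs 9.6 dB and a coded system
# needs 1.9 dB to reach the same target BER, the coding gain is 7.7 dB.
print(coding_gain_db(9.6, 1.9))
```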

Page 6:

Classical Error Control Coding

MAP Sequence Decoding Problem:
– Find X that maximizes p(X | Y).
– Derive the estimate of U from the estimate of X.
– The general problem is NP-hard and related to many optimization problems.
– Polynomial-time solutions exist for special cases.

[Diagram: U = (u_1, ..., u_k) → Encoder → X → Noisy Channel → Y]
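A brute-force sketch of the MAP sequence decoding problem, assuming a binary symmetric channel and a toy repetition encoder; enumerating all 2^k information words is exactly what makes the general problem intractable:

```python
import itertools
import math

def bsc_log_likelihood(x, y, p):
    """log p(Y = y | X = x) for a binary symmetric channel with crossover probability p."""
    d = sum(xi != yi for xi, yi in zip(x, y))
    return d * math.log(p) + (len(x) - d) * math.log(1.0 - p)

def map_decode(encode, k, y, p):
    """Find the information word u whose codeword X(u) maximizes p(X | Y) (uniform prior)."""
    return max(itertools.product((0, 1), repeat=k),
               key=lambda u: bsc_log_likelihood(encode(u), y, p))

encode = lambda u: u * 3          # toy (3, 1) repetition code
print(map_decode(encode, 1, (1, 0, 1), p=0.1))   # -> (1,)
```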

Page 7:

Class P Decoding Techniques

Hard decision: MAP decoding reduces to minimum distance decoding.
– Example: Berlekamp algorithm (RS codes)

Soft decision: received signals are quantized.
– Example: Viterbi algorithm (convolutional codes)

These techniques do NOT minimize the information error rate.

Page 8:

Binary Convolutional Codes

Memory is incorporated into the encoder in an obvious way.

The resulting code can be analyzed using a state diagram.
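A minimal sketch of such an encoder, assuming the common rate-1/2, memory-2 code with octal generators (7, 5); the two register bits form the state that drives the state diagram and trellis:

```python
def conv_encode(bits):
    s1 = s2 = 0                   # shift-register contents (the encoder state)
    out = []
    for u in bits:
        out.append(u ^ s1 ^ s2)   # generator 1 + D + D^2 (octal 7)
        out.append(u ^ s2)        # generator 1 + D^2     (octal 5)
        s1, s2 = u, s1            # shift the register
    return out

print(conv_encode([1, 0, 1, 1]))
```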

Page 9:

Trellis for a Convolutional Code

Page 10:

Trees and Sequential Decoding

A convolutional code can be depicted as a tree. The tree and a metric define a metric space.
Sequential decoding is a local search of that metric space.
Search complexity is a polynomial function of memory order.
May not terminate in a finite amount of time.
Local search methodology to return...

Page 11:

Theme 2: Markov and Memory

Markov was, among many other things, a cryptanalyst.
– Interested in the structure of written text.
– Certain letters can only be followed by certain others.

Markov Chains:
– Let I be a countable set of states and let $\lambda$ be a probability measure on I.

Page 12:

– Let the random variable S range over I and set $\lambda_i = p(S = i)$.
– Let $P = \{p_{ij}\}$ be a stochastic matrix with rows and columns indexed by I.
– $S = (S_n)_{n \ge 0}$ is a Markov chain with initial distribution $\lambda$ and transition matrix P if
  - $S_0$ has distribution $\lambda$
  - $p(S_{n+1} = j \mid S_0, S_1, S_2, \ldots, S_{n-1}, S_n = i) = p(S_{n+1} = j \mid S_n = i) = p_{ij}$
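A short simulation matching the definition above; the three-state chain and its numbers are illustrative:

```python
import random

lam = [0.5, 0.3, 0.2]             # initial distribution on I = {0, 1, 2}
P = [[0.9, 0.1, 0.0],             # stochastic transition matrix (rows sum to 1)
     [0.2, 0.6, 0.2],
     [0.0, 0.3, 0.7]]

def draw(dist):
    return random.choices(range(len(dist)), weights=dist, k=1)[0]

def run_chain(n):
    s = draw(lam)                 # S_0 has distribution lam
    path = [s]
    for _ in range(n):
        s = draw(P[s])            # p(S_{n+1} | S_0, ..., S_n) = p(S_{n+1} | S_n)
        path.append(s)
    return path

print(run_chain(10))
```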

Page 13:

Hidden Markov Models

HMM:
– Markov chain $X = X_1, X_2, \ldots$
– Sequence of r.v.'s $Y = Y_1, Y_2, \ldots$ that are a probabilistic function f() of X.

Inference Problem: observe Y and infer:
– The initial state of X
– The state transition probabilities for X
– The probabilistic function f()
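A small sketch of an HMM as a probabilistic function of a Markov chain, with illustrative transition and emission matrices; only Y is observed, which is what makes the inference problem nontrivial:

```python
import random

A = [[0.95, 0.05],                # hidden-state transition matrix
     [0.10, 0.90]]
B = [[0.8, 0.2],                  # B[x][y] = p(Y = y | X = x), the function f()
     [0.3, 0.7]]
init = [0.5, 0.5]

def draw(dist):
    return random.choices(range(len(dist)), weights=dist, k=1)[0]

def sample_hmm(n):
    x = draw(init)
    xs, ys = [], []
    for _ in range(n):
        xs.append(x)
        ys.append(draw(B[x]))     # Y is observed; X stays hidden
        x = draw(A[x])
    return xs, ys

print(sample_hmm(10))
```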

Page 14:

Hidden Markov Models are Everywhere...

– Duration of eruptions by Old Faithful
– Movement of locusts (Locusta migratoria)
– Suicide rate in Cape Town, SA
– Progress of epidemics
– Econometric models
– Decoding of convolutional codes

Page 15:

Baum-Welch Algorithm

Lloyd Welch and Leonard Baum developed an iterative solution to the HMM inference problem (~1962).

The application-specific solution was classified for many years.

Published in general form:
– L. E. Baum and T. Petrie, "Statistical Inference for Probabilistic Functions of Finite State Markov Chains," Ann. Math. Stat., 37:1554-1563, 1966.

Page 16:

BW Overview

Member of the class of algorithms now known as "Expectation-Maximization," or "EM," algorithms.
– Initial hypothesis $\lambda^{(0)}$
– Series of estimates generated by the mapping $\lambda^{(i)} = T(\lambda^{(i-1)})$
– $P(\lambda^{(0)}) \le P(\lambda^{(1)}) \le P(\lambda^{(2)}) \le \cdots$, where $\lim_{i \to \infty} \lambda^{(i)}$ is the maximum likelihood parameter estimate.

Page 17:

Forward-Backward Algorithm: Exploiting the Markov Property

Goal: derive the probability measure $p(x_j, \mathbf{y})$.

The BW algorithm recursively computes the $\alpha$'s and $\beta$'s.

$$
\begin{aligned}
p(x_j, \mathbf{y}) &= p(x_j, y_j^-) \cdot p(y_j \mid x_j, y_j^-) \cdot p(y_j^+ \mid x_j, y_j, y_j^-) \\
&= p(x_j, y_j^-) \cdot p(y_j \mid x_j) \cdot p(y_j^+ \mid x_j) \\
&= \underbrace{\alpha(x_j)}_{\text{past}} \cdot \underbrace{\gamma(x_j)}_{\text{present}} \cdot \underbrace{\beta(x_j)}_{\text{future}}
\end{aligned}
$$

Page 18:

Forward and Backward Flow

Define the flow from $x_i$ to $x_j$ to be the probability that a random walk starting at $x_i$ will terminate at $x_j$.

$\alpha(x_j)$ is the forward flow to $x_j$ at time j.

$\beta(x_j)$ is the backward flow to $x_j$ at time j.

$$\alpha(x_j) = p(x_j, y_j^-) = \sum_{x_{j-1} \in X_{j-1}} \alpha(x_{j-1})\, Q(x_j \mid x_{j-1})\, \gamma(x_{j-1})$$

$$\beta(x_j) = p(y_j^+ \mid x_j) = \sum_{x_{j+1} \in X_{j+1}} Q(x_{j+1} \mid x_j)\, \gamma(x_{j+1})\, \beta(x_{j+1})$$
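A minimal implementation of these recursions for a discrete HMM in the slide's notation (Q for state transitions, γ for the emission term); the two-state model and the observation sequence are illustrative:

```python
def forward_backward(Q, emit, init, ys):
    """Return p(x_j, y) = alpha(x_j) * gamma(x_j) * beta(x_j) for every time j."""
    n, S = len(ys), len(init)
    gamma = [[emit[x][y] for x in range(S)] for y in ys]    # gamma[j][x] = p(y_j | x)

    alpha = [[0.0] * S for _ in range(n)]
    alpha[0] = list(init)                                   # no observations before j = 0
    for j in range(1, n):
        for x in range(S):
            alpha[j][x] = sum(alpha[j - 1][xp] * Q[xp][x] * gamma[j - 1][xp]
                              for xp in range(S))           # forward flow

    beta = [[1.0] * S for _ in range(n)]                    # nothing observed after the last symbol
    for j in range(n - 2, -1, -1):
        for x in range(S):
            beta[j][x] = sum(Q[x][xn] * gamma[j + 1][xn] * beta[j + 1][xn]
                             for xn in range(S))            # backward flow

    return [[alpha[j][x] * gamma[j][x] * beta[j][x] for x in range(S)]
            for j in range(n)]

Q = [[0.9, 0.1], [0.2, 0.8]]            # state transitions (illustrative)
emit = [[0.7, 0.3], [0.2, 0.8]]         # emit[x][y] = p(y | x)
print(forward_backward(Q, emit, [0.5, 0.5], [0, 1, 1, 0]))
```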

Page 19:

Earliest Reference to Backward-Forward Algorithm

Several of the woodsmen began to move slowly toward her and observing them closely, the little girl saw that they were turned backward, but really walking forward. “We have to go backward forward!” cried Dorothy. “Hurry up, before they catch us.”

– Ruth Plumly Thompson, The Lost King of Oz, pg. 120, The Reilly & Lee Co., 1925.

Page 20:

Generalization: Belief Propagation in Polytrees

Judea Pearl (1988).

Each node in a polytree separates the graph into two distinct subgraphs.

X d-separates the upper and lower variables, implying conditional independence.

Page 21:

Spatial Recursion and Message Passing

Page 22:

Synthesis: BCJR

1974: Bahl, Cocke, Jelinek, and Raviv apply a portion of the BW algorithm to trellis decoding for convolutional and block codes.
– Forward and backward trellis flow: the APP that a given branch is traversed.
– Info bit APP: the sum of the probabilities of the branches associated with a particular bit value.
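A tiny sketch of the second bullet: given already-computed a posteriori probabilities for the branches of one trellis section, the information-bit APP is the sum over branches labeled with that bit value (the branch list below is made up):

```python
# Each branch of a trellis section carries an input-bit label and an APP
# (alpha * gamma * beta, normalized); all numbers here are illustrative.
branches = [
    {"bit": 0, "app": 0.10},
    {"bit": 1, "app": 0.45},
    {"bit": 0, "app": 0.05},
    {"bit": 1, "app": 0.40},
]

def bit_app(branches, value):
    """Info-bit APP: sum of branch APPs whose input label equals `value`."""
    return sum(b["app"] for b in branches if b["bit"] == value)

print(bit_app(branches, 0), bit_app(branches, 1))   # 0.15 0.85
```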

Page 23:

BW/BCJR

[Trellis diagram annotated with $\alpha(u_j)$, $\gamma(u_j)$, and $\beta(u_j)$]

Page 24:

Synthesis Crescendo: Turbo Coding

May 25, 1993: C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon Limit Error-Correcting Coding and Decoding: Turbo Codes."

Two Key Elements:
– Parallel Concatenated Encoders
– Iterative Decoding

Page 25:

Parallel Concatenated Encoders

One “systematic” and two parity streams are generated from the information.

Recursive (IIR) convolutional encoders are used as “component” encoders.
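A structural sketch of such an encoder, assuming a memory-2 recursive component code and a pseudo-random interleaver rather than the specific encoder from the 1993 paper:

```python
import random

def rsc_parity(bits):
    """Recursive systematic component: feedback 1 + D + D^2, feedforward 1 + D^2 (assumed polynomials)."""
    s1 = s2 = 0
    parity = []
    for u in bits:
        fb = u ^ s1 ^ s2          # feedback bit (the IIR part)
        parity.append(fb ^ s2)    # feedforward tap
        s1, s2 = fb, s1
    return parity

def pce_encode(info, interleaver):
    """Systematic stream, parity from CC1, and parity from CC2 fed the interleaved info."""
    return list(info), rsc_parity(info), rsc_parity([info[i] for i in interleaver])

info = [random.randint(0, 1) for _ in range(16)]
interleaver = random.sample(range(16), 16)
systematic, parity1, parity2 = pce_encode(info, interleaver)
print(systematic, parity1, parity2)
```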

Page 26:

Recursive Binary Convolutional Encoders

Page 27:

Impact of the Interleaver

Only a small number of low-weight input sequences are mapped to low-weight output sequences.

The interleaver ensures that if the output of one component encoder has low weight, the output of the other probably will not.

PCC emphasis: minimize the number of low-weight code words, as opposed to maximizing the minimum weight.

Page 28:

The PCE Decoding Problem

[Encoder diagram: the information word $U = (u_1, \ldots, u_k)$ is sent directly as the systematic stream and encoded by component encoder CC1 to produce $X_1$; an interleaved copy of U is encoded by CC2 to produce $X_2$. The channel outputs are $Y^s = (y_1^s, \ldots, y_k^s)$, $Y^1 = (y_1^1, \ldots, y_k^1)$, and $Y^2 = (y_1^2, \ldots, y_k^2)$.]

$$
\mathrm{BEL}_i(a) = p(u_i = a \mid \mathbf{y})
= \underbrace{\lambda_i(a)}_{\substack{\text{systematic} \\ \text{term}}}
\cdot \underbrace{\pi_i(a)}_{\substack{\text{a priori} \\ \text{term}}}
\cdot \underbrace{\sum_{\mathbf{u}:\, u_i = a} p(\mathbf{y}^1 \mid \mathbf{x}^1)\, p(\mathbf{y}^2 \mid \mathbf{x}^2) \prod_{\substack{j=1 \\ j \ne i}}^{k} \lambda_j(u_j)\, \pi_j(u_j)}_{\text{extrinsic term}}
$$

Page 29:

Turbo Decoding

BW/BCJR decoders are associated with each component encoder.

Decoders take turns estimating and exchanging distributions on the information bits.

Page 30:

Alternating Estimates of Information APP

Decoder 1: BW/BCJR derives
$$
\mathrm{BEL}_i^1(a) = \alpha \underbrace{\lambda_i(a)}_{\substack{\text{systematic} \\ \text{term}}} \cdot \underbrace{\pi_i^2(a)}_{\substack{\text{updated} \\ \text{term}}} \cdot \underbrace{\sum_{\mathbf{u}:\, u_i = a} p(\mathbf{y}^1 \mid \mathbf{x}^1) \prod_{\substack{j=1 \\ j \ne i}}^{k} \lambda_j(u_j)\, \pi_j(u_j)}_{\text{extrinsic term}}
$$

Decoder 2: BW/BCJR derives
$$
\mathrm{BEL}_i^2(a) = \alpha \underbrace{\lambda_i(a)}_{\substack{\text{systematic} \\ \text{term}}} \cdot \underbrace{\pi_i^1(a)}_{\substack{\text{updated} \\ \text{term}}} \cdot \underbrace{\sum_{\mathbf{u}:\, u_i = a} p(\mathbf{y}^2 \mid \mathbf{x}^2) \prod_{\substack{j=1 \\ j \ne i}}^{k} \lambda_j(u_j)\, \pi_j(u_j)}_{\text{extrinsic term}}
$$

Page 31:

Converging Estimates

Information exchanged by the decoders must not be strongly correlated with systematic info or earlier exchanges.

$$
\pi_i^{(m)}(a) =
\begin{cases}
\dfrac{\alpha \Pr\{u_i = a \mid Y^s = y^s, Y^1 = y^1\}}{\lambda_i(a)\, \pi_i^{(m-1)}(a)} & \text{if } m \text{ is odd} \\[2ex]
\dfrac{\alpha \Pr\{u_i = a \mid Y^s = y^s, Y^2 = y^2\}}{\lambda_i(a)\, \pi_i^{(m-1)}(a)} & \text{if } m \text{ is even}
\end{cases}
$$
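A minimal sketch of that update for a single binary information bit: divide the posterior by the systematic and previous a priori terms, then renormalize; the probabilities below are illustrative:

```python
def next_prior(posterior, systematic, prior_prev):
    """Extrinsic information passed to the other decoder as its new a priori distribution."""
    ext = [p / (l * q) for p, l, q in zip(posterior, systematic, prior_prev)]
    total = sum(ext)
    return [e / total for e in ext]

# Distributions over the bit values {0, 1}
print(next_prior(posterior=[0.2, 0.8],
                 systematic=[0.4, 0.6],
                 prior_prev=[0.5, 0.5]))
```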

Page 32:

Impact and Questions

Turbo coding provides coding gain near 10 dB.
– Within 0.3 dB of the Shannon limit.
– NASA/ESA DSN: 1 dB = $80M in 1996.

Issues:
– Sometimes turbo decoding fails to correct all of the errors in the received data. Why?
– Sometimes the component decoders do not converge. Why?
– Why does turbo decoding work at all?

Page 33:

Cross-Entropy Between the Component Decoders

Cross entropy, or the Kullback-Leibler distance, is a measure of the distance between two distributions.

Joachim Hagenauer et al. have suggested using a cross-entropy threshold as a stopping condition for turbo decoders.

$$D = \sum_{j=1}^{N} \sum_{a=0}^{1} \pi^1(u_j = a \mid Y)\, \log \frac{\pi^1(u_j = a \mid Y)}{\pi^2(u_j = a \mid Y)}$$
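A short sketch of the stopping rule: compute D between the two decoders' bit distributions after each iteration and stop once it falls below a threshold (the threshold value here is an assumption, not from the talk):

```python
import math

def cross_entropy(pi1, pi2, eps=1e-12):
    """D = sum_j sum_a pi1[j][a] * log(pi1[j][a] / pi2[j][a])."""
    return sum(p1 * math.log((p1 + eps) / (p2 + eps))
               for d1, d2 in zip(pi1, pi2)
               for p1, p2 in zip(d1, d2))

def should_stop(pi1, pi2, threshold=1e-3):
    """Hagenauer-style stopping test; the threshold is illustrative."""
    return cross_entropy(pi1, pi2) < threshold

# Illustrative distributions over {0, 1} for three information bits
pi1 = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
pi2 = [[0.88, 0.12], [0.25, 0.75], [0.58, 0.42]]
print(cross_entropy(pi1, pi2), should_stop(pi1, pi2))
```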

Page 34:

Correlating Decoder Errors with Cross-Entropy

Page 35:

Neural Networks do the Thinking

Neural networks can implement any piecewise-continuous function.

Goal: Emulation of indicator functions for turbo decoder error and convergence.

Two Experiments:
– FEDN: Predict eventual error and convergence at the beginning of the decoding process.
– DEDN: Detect error and convergence at the end of the decoding process.

Page 36:

Network Performance

Missed detection occurs when the number of errors is small.

The average weight of error events in NN-assisted turbo decoding is far less than that of CRC-assisted turbo decoding.

When coupled with a code combining protocol, NN-assisted turbo is extremely reliable.

Page 37:

What Did the Networks Learn?

Examined the weights generated during training.
The network monitors the slope of the cross entropy (rate of descent).

Conjecture:
– Turbo decoding is a local search algorithm that attempts to minimize cross-entropy cycles.
– The topology of the search space is strongly determined by the initial cross entropy.

Page 38:

Exploring the Conjecture

Turbo Simulated Annealing (Buckley, Hagenauer, Krishnamachari, Wicker)
– Nonconvergent turbo decoding is nudged out of local minimum cycles by randomization (heat).

Turbo Genetic Decoding (Krishnamachari, Wicker)
– Multiple processes are started in different places in the search space.

Page 39:

Turbo Coding: A Change in Error Control Methodology

"Classical" response to Shannon:
– Derive a probability measure on the transmitted sequence, not the actual information.
– Explore optimal solutions to special cases of an NP-hard problem.
– Optimal, polynomial-time decoding algorithms limit the choice of codes.

Page 40:

"Modern": Exploit the Markov property to obtain temporal/spatial recursion:
– Derive a probability measure on the information, not the codeword.
– Explore suboptimal solutions to more difficult cases of the NP-hard problem.
– Iterative decoding
– Graph-theoretic interpretation of the code space
– Variations on local search

Page 41:

The Future

– Relation of cross entropy to the impact of cycles in belief propagation.
– Near-term abandonment of PCEs as unnecessarily restrictive.
– Increased emphasis on low-density parity-check codes and expander codes.
  – Decoding algorithms that look like solutions to the K-SAT problem.
  – Iteration between subgraphs.
  – Increased emphasis on decoding as local search.