
Page 1:

Using Neural Networks in Communication Problems

– Theory and Examples

Lizhong Zheng

MIT

Globecom, December 10, 2019

Lizhong Zheng (MIT) Using NN for Comm. Problems Globecom, December 10, 2019 1 / 32

Page 2:

Collaborators

Shao-lun Huang Xiangxiang Xu Anuran Makur David Qiu Mohamed AlHajri

Greg Wornell Lingjia Liu Zhou Zhou Jing Liang

Page 3:

Introduction

The Success of Neural Networks, and us

Computer Vision & NLP: complex problems with no clear model.

Page 4:

Introduction

Use NN in Communication Problems?

The next revolution in communication networks.

Difficulties

Uses too many resources, both computation and samples;

Domain knowledge and reusable solutions are hard to incorporate;

No guarantees of optimality or robustness;

Some problems are particularly hard for SGD;

“It is human nature to prefer simplicity.”

Page 5:

Introduction

In This Talk:

An information theoretic interpretation of learning in neural networks

An example to use NN for a physical layer communication problem

Simplify and Specialize

Page 6:

Information Theory for NN

Neural Networks as Information Processing

Processing result: features S(x) of the input and corresponding features v(y) of the output, combined into the model

Q_{Y|X} ∝ exp[S^T(x)·v(y)]

Not sufficient to represent the true model.

Page 7:

Information Theory for NN

A Little Machinery: Local Geometry

Variation of distribution: prior P0 → posterior Q, written in vector form:

Q − P0 = [Q(x) − P0(x), x ∈ X]

LLR(Q/P0) = [log(Q(x)/P0(x)), x ∈ X] ≈ [(Q(x) − P0(x))/P0(x), x ∈ X]

Information vector, with reference P0:

φ(Q) = [(Q(x) − P0(x))/√P0(x), x ∈ X]
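The local approximation above can be checked numerically; a minimal sketch, where the alphabet size, P0, and the perturbation direction are hypothetical choices:

```python
import numpy as np

# Hypothetical setup: uniform reference P0 on a 4-letter alphabet,
# perturbed by a small direction that sums to zero.
P0 = np.array([0.25, 0.25, 0.25, 0.25])
eps = 0.02
Q = P0 + eps * np.array([1.0, -1.0, 1.0, -1.0])

# Information vector with reference P0: phi(Q) = (Q - P0)/sqrt(P0).
phi = (Q - P0) / np.sqrt(P0)

# Locally, the LLR vector is entrywise close to (Q - P0)/P0.
llr = np.log(Q / P0)
approx = (Q - P0) / P0
print(np.max(np.abs(llr - approx)))  # O(eps^2): about 3e-3 here
```

Shrinking eps makes the gap between the LLR and its linearization vanish quadratically, which is the sense in which the geometry is "local".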

Page 8:

Local Geometry

Information Vector for Feature Functions

A feature function f : X → R; w.l.o.g. require E_P0[f(X)] = 0.

Recall: LLR(Q/P0) = [log(Q(x)/P0(x)), x ∈ X] ≈ [(Q(x) − P0(x))/P0(x), x ∈ X]

Evaluating any feature function is equivalent to computing an LLR, making a binary decision, or estimating a scalar parameter.

Information vector for a feature function:

φ(f) = [√P0(x) · f(x), x ∈ X]

Page 9:

Local Geometry

Turning Things Euclidean

In functional space:

Euclidean norm: ‖φ(f)‖² = var_P0[f(X)]

Inner product: ⟨φ(f1), φ(f2)⟩ = E_P0[f1(X)·f2(X)]; orthogonal features are uncorrelated (carry no repeated information).

In distribution space:

K-L divergence: D(P‖Q) ≈ ‖φ(P) − φ(Q)‖²; the length of an information vector measures the information volume.

Inner product ↔ Fisher information

Page 10:

Local Geometry

Empirical Average

X1, …, Xn i.i.d. from P0:

Empirical distribution P̂ ↔ information vector φ;

Feature function f ↔ information vector ν.

The empirical average:

(1/n) Σ_{i=1}^n f(x_i) = E_P̂[f(X)] = Σ_x P̂(x)·f(x)

= Σ_x (P̂(x) − P0(x))·f(x)    (using E_P0[f(X)] = 0)

= Σ_x [(P̂(x) − P0(x))/√P0(x)] · [√P0(x)·f(x)] = ⟨φ, ν⟩

Taking a feature is thus a projection of the information vector.
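The chain of equalities above is exact, not just a local approximation; a small numerical check, where the alphabet, P0, and f are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3-letter alphabet, reference P0, and a feature f
# centered so that E_P0[f(X)] = 0.
P0 = np.array([0.5, 0.3, 0.2])
f = np.array([1.0, -1.0, 0.0])
f = f - P0 @ f                      # center under P0

n = 1000
x = rng.choice(3, size=n, p=P0)
P_hat = np.bincount(x, minlength=3) / n

# Empirical average of the feature ...
emp_avg = f[x].mean()

# ... equals the inner product <phi, nu> of the two information vectors.
phi = (P_hat - P0) / np.sqrt(P0)    # info vector of the empirical distribution
nu = np.sqrt(P0) * f                # info vector of the feature
print(np.isclose(emp_avg, phi @ nu))  # True
```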

Page 11:

Local Geometry

Joint Distribution

For a joint distribution PXY

Use PXPY as the reference,

Weak dependence,

Canonical Dependence Matrix (CDM): B ∈ R|Y|×|X |

B(x, y) = (P_XY(x, y) − P_X(x)P_Y(y)) / √(P_X(x)P_Y(y)),  x ∈ X, y ∈ Y

Mutual information:

I(X; Y) = D(P_XY‖P_X·P_Y) ∝ ‖B‖²
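Under weak dependence the proportionality is concrete: I(X;Y) ≈ (1/2)‖B‖², the usual χ²-approximation constant being absorbed by ∝ on the slide. A sketch with a hypothetical weakly dependent 2×2 joint pmf:

```python
import numpy as np

# Hypothetical weakly dependent joint pmf (rows: x, cols: y).
eps = 0.02
P_xy = np.array([[0.25 + eps, 0.25 - eps],
                 [0.25 - eps, 0.25 + eps]])
P_x = P_xy.sum(axis=1)
P_y = P_xy.sum(axis=0)

# Canonical dependence matrix B(x, y) = (P_xy - P_x P_y) / sqrt(P_x P_y).
B = (P_xy - np.outer(P_x, P_y)) / np.sqrt(np.outer(P_x, P_y))

# Mutual information I(X;Y) = D(P_XY || P_X P_Y); locally I ~ (1/2)||B||_F^2.
I = np.sum(P_xy * np.log(P_xy / np.outer(P_x, P_y)))
print(I, 0.5 * np.linalg.norm(B) ** 2)  # nearly equal under weak dependence
```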

Page 12:

Local Geometry

Decomposition of Mutual Information

SVD: B = Σ_i σ_i · ψ_i φ_i^T

Recall I(X; Y) ∝ ‖B‖² = Σ_i σ_i²

The dependence between X and Y decomposes into a number of modes.

Similar statements hold for common information.

HGR maximal correlation: given X, Y ∼ P_XY,

max_{f,g} corr[f(X), g(Y)] = σ_1,  with  f*(x) = φ_1(x)/√P_X(x),  g*(y) = ψ_1(y)/√P_Y(y)

Related: CCA, correspondence analysis.
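The SVD characterization of HGR maximal correlation can be verified directly on a small pmf; the joint distribution below is a hypothetical example (here B is indexed B[x, y], so the x-side singular vectors are the columns of U):

```python
import numpy as np

# Hypothetical joint pmf (rows: x in {0,1,2}, cols: y in {0,1}).
P_xy = np.array([[0.20, 0.10],
                 [0.10, 0.20],
                 [0.25, 0.15]])
P_x = P_xy.sum(axis=1)
P_y = P_xy.sum(axis=0)

# Canonical dependence matrix, indexed B[x, y].
B = (P_xy - np.outer(P_x, P_y)) / np.sqrt(np.outer(P_x, P_y))

# SVD: the top singular pair gives the HGR maximal correlation and features.
U, s, Vt = np.linalg.svd(B)
sigma1 = s[0]
f_star = U[:, 0] / np.sqrt(P_x)      # f*(x) = phi_1(x)/sqrt(P_X(x))
g_star = Vt[0, :] / np.sqrt(P_y)     # g*(y) = psi_1(y)/sqrt(P_Y(y))

# corr[f*(X), g*(Y)] over the pmf equals sigma1 (means are 0, variances 1,
# because phi_1, psi_1 are unit vectors orthogonal to sqrt(P_X), sqrt(P_Y)).
corr = np.sum(P_xy * np.outer(f_star, g_star))
print(np.isclose(abs(corr), sigma1))  # True
```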

Page 13:

Local Geometry

Neural Network with Discrete Inputs

True P_XY and empirical P̂_XY; try to fit

min_Q D(P̂_XY‖Q_XY)

Ideal network. Softmax regression:

Q^(k)_XY(x, y) ∝ P_X(x)P_Y(y) exp[Σ_{i=1}^k s_i(x)v_i(y)]

Page 14:

Local Geometry

Guess What Would an Ideal NN Do?

Q^(k)_XY(x, y) ∝ P_X(x)P_Y(y) exp[Σ_{i=1}^k s_i(x)v_i(y)]

B̂^(k)(x, y) = Σ_{i=1}^k φ_i(x)·ψ_i(y),  with  φ_i(x) = √P_X(x)·s_i(x),  ψ_i(y) = √P_Y(y)·v_i(y)

Low-rank approximation: min ‖B̂^(k) − B‖², achieved at

s_i*(x) = φ_i(x)/√P_X(x),  v_i*(y) = ψ_i(y)/√P_Y(y)

with φ_i, ψ_i the top-k singular vectors of B.

Page 15:

Local Geometry

NN as SVD Solver

Capture the most significant k modes of dependence, without further bias:

S(x) = φ*(x)/√P_X(x),  v(y) = ψ*(y)/√P_Y(y)

Who did the SVD? Backprop – ACE – power method:

v(y) = E[S(X) | Y = y]  ↔  ψ = B·φ

S(x) = E[v(Y) | X = x]  ↔  φ = B^T·ψ
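The power-method reading of the two conditional-expectation maps can be sketched as an ACE-style iteration on a small pmf (the joint distribution below is a hypothetical example; the centering steps remove the trivial constant feature, which otherwise dominates the iteration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical joint pmf (rows: x, cols: y).
P_xy = np.array([[0.20, 0.10],
                 [0.10, 0.20],
                 [0.25, 0.15]])
P_x = P_xy.sum(axis=1)
P_y = P_xy.sum(axis=0)

# ACE-style power iteration: alternate the two conditional expectations,
# centering and normalizing at each step.
S = rng.standard_normal(3)
for _ in range(200):
    v = (P_xy * S[:, None]).sum(axis=0) / P_y   # v(y) = E[S(X) | Y = y]
    v -= P_y @ v                                 # center under P_Y
    S = (P_xy * v[None, :]).sum(axis=1) / P_x    # S(x) = E[v(Y) | X = x]
    S -= P_x @ S                                 # center under P_X
    S /= np.sqrt(P_x @ S**2)                     # normalize: var under P_X = 1

# The converged feature matches the top singular vector of the CDM.
B = (P_xy - np.outer(P_x, P_y)) / np.sqrt(np.outer(P_x, P_y))
U, s, Vt = np.linalg.svd(B)
print(np.allclose(np.abs(np.sqrt(P_x) * S), np.abs(U[:, 0]), atol=1e-6))
```

In information-vector coordinates each double step applies B^T·B, so the iteration converges to the top singular pair, which is the sense in which backprop on the ideal network "performs" an SVD.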

Page 16:

Local Geometry

Connections to Information Theory

Decomposition of mutual information, common randomness, ...

HGR maximal correlation;

Low rank matrix completion for models;

Universal feature: maximize average relevance to unknown queries.

The most “learn-able” partial model.

Page 17:

Local Geometry

More Realistic Neural Networks

Network structure puts a limit on what feature functions can be generated;

Approximate the ideal feature function with Euclidean errors;

Every quantity has a name;

Many equivalent representations: need a canonical basis (whitening).

Page 18:

Detection Problem

Use This in Communication Problems

Where to use?

When there is no clear model, no optimal solution;

Non-linear, non-Gaussian.

Physical layer vs. higher layer.

Focus the learning power

Combine with classical processing.

Page 19:

Detection Problem

Symbol Detection over Interference Channel

Y = h · X + W

X ∈ QAM / PAM, CSIR: h is known at the receiver;

W is non-Gaussian, with fixed unknown PDF pW .

Standard solution: linear MMSE with minimum-distance detection.

NN solution:

How do we use the knowledge of h and of the QAM structure?

Train with one channel realization / SNR, use for another?
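The standard baseline can be sketched as an LMMSE scaling of the received signal followed by minimum-distance quantization to the constellation. This sketch uses Gaussian noise for simplicity (the talk's W is non-Gaussian); the channel gain, SNR, and constellation are hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: Y = h*X + W with X uniform on 4-PAM.
h, snr = 0.9, 100.0
pam = np.array([-3.0, -1.0, 1.0, 3.0])
Es = np.mean(pam**2)                    # average symbol energy
sigma2 = Es * h**2 / snr                # noise variance from the target SNR

n = 10000
X = rng.choice(pam, size=n)
W = rng.standard_normal(n) * np.sqrt(sigma2)   # Gaussian stand-in for W
Y = h * X + W

# LMMSE estimate of X from Y, then nearest-constellation-point decision.
c = h * Es / (h**2 * Es + sigma2)
X_hat = pam[np.argmin(np.abs(c * Y[:, None] - pam[None, :]), axis=1)]
print((X_hat != X).mean())              # symbol error rate
```

With non-Gaussian W this linear front end is no longer optimal, which is exactly the gap the NN solution targets.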

Page 20:

Detection Problem

First Simple Step: Regularity of PAM

Y = h · X + W , X ∈ {−3,−1,+1,+3}

Observation: p(y|X = +1) = p(y + 2h|X = +3), ...

Reuse binary decision modules;

CNN over amplitude;

Reuse of training samples.
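The shift identity follows from Y = h·X + W, since p(y|X = x) = p_W(y − h·x); a quick numerical confirmation, using a hypothetical non-Gaussian (Gaussian-mixture) noise density:

```python
import numpy as np

# Hypothetical non-Gaussian noise density: a two-component Gaussian mixture,
# used only to illustrate the shift symmetry of the PAM likelihoods.
def p_W(w):
    g = lambda w, m, s: np.exp(-0.5 * ((w - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    return 0.5 * g(w, -1.0, 0.3) + 0.5 * g(w, 1.0, 0.3)

h = 0.7

def p_y_given_x(y, x):
    # Y = h*X + W  =>  p(y | X = x) = p_W(y - h*x)
    return p_W(y - h * x)

y = np.linspace(-5, 5, 101)
# p(y | X = +1) equals p(y + 2h | X = +3): the same binary decision module
# can be reused across constellation levels.
print(np.allclose(p_y_given_x(y, +1), p_y_given_x(y + 2 * h, +3)))  # True
```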

Page 21:

Detection Problem

Transferable Knowledge

Y = hs · X + W , X ∈ {−1,+1}

Suppose we trained a network for hs; can we use it for a different ht?

The knowledge of the pdf pW: where is it stored, and how do we use it?

Concrete example: the interference W comes from PAM, but the receiver does not know this.

Page 22:

Detection Problem

Source Problem: Linear Approach

Y = hs · X + W , X ∈ {−1,+1}

Page 23:

Detection Problem

Ground Truth for the Source Problem

Y = hs · X + W , X ∈ {−1,+1}

Page 24:

Detection Problem

What Does the NN Learn?

Y = hs · X + W , X ∈ {−1,+1}

Page 25:

Detection Problem

Make NN Work Harder

Lower the SNR:

Page 26:

Detection Problem

Make NN Work Harder

Throw in some “sand”:

Page 27:

Detection Problem

Make NN Work Harder

Throw in some “sand”:

Knowledge Region:

(y ± hs)^T K_Z^{−1} (y ± hs) < γ
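Testing membership in the knowledge region amounts to two Mahalanobis-distance checks, one around each trained signal point ±hs; a minimal sketch, where hs, K_Z, γ, and the test points are hypothetical values:

```python
import numpy as np

# Hypothetical 2-D instance of the knowledge region
# (y ± h_s)^T K_Z^{-1} (y ± h_s) < gamma.
h_s = np.array([1.0, 0.5])
K_Z = np.array([[0.20, 0.05],
                [0.05, 0.10]])       # noise covariance, assumed known
K_inv = np.linalg.inv(K_Z)
gamma = 4.0

def in_knowledge_region(y):
    # Mahalanobis distance to each trained signal point +h_s and -h_s.
    d_plus = (y - h_s) @ K_inv @ (y - h_s)
    d_minus = (y + h_s) @ K_inv @ (y + h_s)
    return min(d_plus, d_minus) < gamma

print(in_knowledge_region(np.array([1.1, 0.4])))   # True: near +h_s
print(in_knowledge_region(np.array([5.0, 5.0])))   # False: far from both
```

Outside this region the trained network has seen essentially no samples, which is where the transferred knowledge has to come from elsewhere.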

Page 28:

Detection Problem

Make NN Work Harder

Throw in some “sand”:

Knowledge Region:

(y ± hs)^T K_Z^{−1} (y ± hs) < γ

Page 29:

Detection Problem

Transfer of Knowledge

Trained for Y = hs · X + Z; target problem Y = ht · X + Z:

same interference structure;

different fading ht.

Page 30:

Detection Problem

Receiver Structure

Page 31:

Conclusion

Concluding Remarks

Focus the NN learning power on the non-linear, non-Gaussian, non-ideal part of the problem, and fill in the rest;

Theoretic understanding:

Processing at the input, output, or in the middle of NNs;

Choose features by the “relevance” metric;

Provable guarantees;

A spectrum of methods, from more “COMM” ones to more “NN” ones.
