multidimensional data analysis : the blind source separation problem. outline : blind source...

43
Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis : Karhunen-Loeve Basis Gaussianity Mainstream Independent Component Analysis Spectral Matching ICA Diversity and separability Multiscale representations and sparsity Applications : MCA, MMCA, ICAMCA, denoising ICA, etc.

Upload: dominic-garrett

Post on 23-Dec-2015

242 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Multidimensional Data Analysis :the Blind Source Separation problem.

Outline :• Blind Source Separation

• Linear mixture model

• Principal Component Analysis : Karhunen-Loeve Basis

• Gaussianity

• Mainstream Independent Component Analysis

• Spectral Matching ICA

• Diversity and separability

• Multiscale representations and sparsity

• Applications : MCA, MMCA, ICAMCA, denoising ICA, etc.

Page 2: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Blind Source SeparationExamples :• The cocktail party problem

Page 3: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Blind Source SeparationExamples :• The cocktail party problem

• Foregrounds in Cosmic Microwave Background imaging

– Detector noise– Galactic dust– Synchrotron– Free - Free– Point Sources– Thermal SZ– …

Page 4: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Blind Source SeparationExamples :• The cocktail party problem

• Foregrounds in Cosmic Microwave Background imaging

• Fetal heartbeat estimation

Page 5: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Blind Source SeparationExamples :• The cocktail party problem

• Foregrounds in Cosmic Microwave Background imaging

• Fetal heartbeat estimation

• Hyperspectral imaging for remote sensing

Page 6: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Blind Source SeparationExamples :• The cocktail party problem

• Foregrounds in Cosmic Microwave Background imaging

• Fetal heartbeat estimation

• Hyperspectral imaging for remote sensing

Differentes views on a single scene obtained with different instruments.COHERENT DATA PROCESSING IS REQUIRED

Page 7: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

A simple mixture model

• linear, static and instantaneous mixture• map observed in a given frequency band sums the

contributions of various components• the contributions of a component to two different detectors

differ only in intensity. • additive noise • all maps are at the same resolution

Page 8: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

A simple mixture model

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

23 GHz

33 GHz

41 GHz

94 GHz

Free-

free

CM

B

Syn

chro

tron

Page 9: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

A simple mixture model

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

QuickTime™ and aTIFF (Uncompressed) decompressor

are needed to see this picture.

23 GHz

33 GHz

41 GHz

94 GHz

Free-

free

CM

B

Syn

chro

tron

Page 10: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

• Processing coherent observations.

• All component maps are valuable !

• How much prior knowledge?– Mixing matrix, component and noise spatial covariances known

[Tegmark96, Hobson98].– Subsets of parameters known.– Blind Source Separation assuming independent processes

[Baccigalupi00, Maino92, Kuruoglu03, Delabrouille03].

• Huge amounts of high dimensional data.

Foreground removal as a source separation problem.

Page 11: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Simplifying assumption

• The noiseless case : X = AS

• Can A and S be estimated (up to sign, scale and permutation indeterminacy) from an observed sample of X?

Solution is not unique.Prior information is needed.

Page 12: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

• Estimation of A and S assuming uncorrelated sources and orthogonal mixing matrix.

• Practically : – Compute the covariance matrix of the data X– Perform an SVD : the estimated eigenvectors are the lines of A

• Adaptive approximation.• Dimensionality reduction;• Used for compression and noise reduction.

Karhunen-Loeve Transform a.k.a.Principal Component Analysis

Page 13: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Application :

original image

Page 14: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Application :

KL basis estimated from 12 by 12 patches.

Page 15: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Application :

keeping 10 percent of basis elementscompression

Page 16: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Application :

keeping 50 percent of basis elementscompression

Page 17: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

• Still in the noiseless case : X = AS• Goal is to find a matrix B such that the entries of Y = BX are

independent.

• Existence/Uniqueness : – Most often, such a decomposition does not exist.– Theorem [Darmois, Linnik 1950] : Let S be a random vector with

independent entries of which at most when is Gaussian, and C an invertible matrix. If the entries of Y = CS are independent, then C is almost the identity.

Independent Component Analysis

Goal will be to find a matrix B such that the entries of Y = BX are as independent as possible.

Page 18: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

• Independence :

Independence or De-correlation?

• De-correlation measures linear independence: E( (x - E(x)) (y - E(y)) ) = 0

• Mutual information or an approximation is used to assess statistical independence:

Page 19: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

ICA for gaussian processes?

Two linearly mixed independent Gaussian processes do not separate uniquely into two independent components. Need diversity from elsewhere.

• Algorithms based on non-gaussianity i.e. higher order statistics.Most mainstream ICA techniques: fastICA, Jade, Infomax, etc.

•Techniques based on the diversity (non proportionality) of variance (energy) profiles in a given representation such asin time, space, Fourier, wavelet : joint diagonalization of covariance matrices, SMICA, etc.

Different classes of ICA methods

Page 20: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Example : Jade • First, whitening : remove linear dependencies

Page 21: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Example : Jade • First, whitening : removes linear dependencies, and normalizes the

variance of the projected data.

• Then, minimize a measure of statistical dependence ie by finding a rotation that minimizes fourth order cross-cumulants.

Page 22: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Example : Jade • First, whitening : removes linear dependencies, and normalizes the

variance of the projected data.

• Then, minimize a measure of statistical dependence ie by finding a rotation that minimizes fourth order cross-cumulants.

Page 23: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Application to SZ cluster map restoration

Input maps

Page 24: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Application to SZ cluster map restoration

Output map Simulated map

Page 25: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

ICA of a mixture of independent Gaussian, stationary, colored processes (1)

• Same mixture model in Fourier Space : X(l) = A S(l)

• Parameters : the n*n mixing matrix A and the npower spectra DSi(l), for l = 1 to T .

• Whittle approximation to the likelihood :

Page 26: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

ICA of a mixture of independent Gaussian, stationary, colored processes (2)

• Assuming the spectra are constant over symmetric subintervals, maximizing the likelihood is the same as minimizing:

• D turns out to be the Kulback Leibler divergence between two gaussian distributions.

• After minimizing with respect to the source spectra, we are left with a joint diagonality criterion to be minimized wrt B = A-1. There are fast algorithms.

Page 27: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

A Gaussian stationary ICA approach to the foreground removal problem?

• CMB is modeled as a Gaussian Stationary process over the sky.• Working with covariance matrices achieves massive data

reduction, and the spatial power spectra are the main parameters of interest.

• Easy extension to the case of noisy mixtures.• Connection with Maximum-Likelihood guarantees some kind

of optimality.• Suggests using EM algorithm.

• [Snoussi2004, Cardoso2002, Delabrouille2003]

Page 28: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Spectral Matching ICA

• Back to the case of noisy mixtures :

• Fit the model spectral covariances :

• to estimated spectral covariances :

• by minimizing :

• with respect to :

Page 29: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Algorithm

• EM can become very slow, especially as noise parameters become small.

• Speed up convergence with a few BFGS steps.

• All /part of the parameters can be estimated.

Page 30: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Estimating component maps

• Wiener filtering in each frequency band :

• In case of high SNR, use pseudo inverse :

Page 31: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

A variation on SMICA using wavelets

• Motivations :– Some components, and noise are a priori not stationary.– Galactic components appear correlated.– Emission laws may not be constant over the whole sky.– Incomplete sky coverage.

Page 32: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

A variation on SMICA using wavelets

• Motivations :– Some components, and noise are a priori not stationary.– Galactic components are spatially correlated.– Emission laws may not be constant over the whole sky.– Incomplete sky coverage.

Page 33: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

A variation on SMICA using wavelets

• Motivations :– Some components, and noise are a priori not stationary.– Galactic components appear correlated.– Emission laws may not be constant over the whole sky.– Incomplete sky coverage.

Page 34: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

A variation on SMICA using wavelets

• Motivations :– Some components, and noise are a priori not stationary.– Galactic components appear correlated.– Emission laws may not be constant over the whole sky.– Incomplete sky coverage.

Use wavelets to preserve space-scale information.

Page 35: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

The à trous wavelet transform

• shift invariant transform

• coefficient maps are the same size as the original map

• isotropic

• small compact supports of wavelet and scaling functions (B3 spline)

• Fast algorithms

Page 36: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

wSMICA

Model structure is not changed :

Objective function :

Same minimization using EM algorithm and BFGS.

Page 37: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Experiments Simulated Planck HFI observations at high galactic latitudes :

Component maps

Mixing matrix

Planck HFI nominal noise levels

Page 38: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Experiments Simulated Planck HFI observations at high galactic latitudes :

Page 39: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Experiments

Contribution of each component to each mixture as a function of spatial frequency :

Page 40: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Results

Jade wSMICA initial maps

Page 41: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Results

Errors on the estimated emission laws :

Page 42: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Results

Rejection rates : where

Page 43: Multidimensional Data Analysis : the Blind Source Separation problem. Outline : Blind Source Separation Linear mixture model Principal Component Analysis

Multiscale transforms and the quest for sparsity