
NOTE TO USERS

Page(s) not included in the original manuscript are unavailable from the author or university. The manuscript was microfilmed as received. This reproduction is the best copy available.


MODEL-BASED CLUSTERING

ALGORITHMS, PERFORMANCE AND APPLICATION

JUN LIU

M.Eng., Shanghai Jiao Tong University

B.Sc., Huazhong University of Science & Technology

A Thesis

Submitted to the School of Graduate Studies

in Partial Fulfilment of the Requirements

for the Degree

Ph.D.

McMaster University


Acquisitions and Bibliographic Services

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


MODEL-BASED CLUSTERING

ALGORITHMS, PERFORMANCE AND APPLICATION


PH.D. (2000)                              McMASTER UNIVERSITY
(Electrical and Computer Engineering)     Hamilton, Ontario

TITLE:          Model-Based Clustering: Algorithms, Performance And Application

AUTHOR:         Jun Liu
                M.Eng., Shanghai Jiao Tong University
                B.Sc., Huazhong University of Science & Technology

SUPERVISOR(S):  Dr. K.M. Wong and Dr. Z.Q. Luo
                Professors, Department of Electrical and Computer Engineering

NUMBER OF PAGES: xv, 120


ABSTRACT

The main contributions of this thesis are the development of new clustering algorithms (with cluster validation), both off-line and on-line, the performance analysis of the new algorithms, and their application to intrapulse analysis.

Bayesian inference and minimum encoding inference, including Wallace's minimum message length (MML) and Rissanen's minimum description length (MDL), are reviewed for model selection. It is found that the MML coding length is more accurate than the other two from the viewpoint of quantization. By introducing a penalty weight, all criteria considered here are cast into the framework of a penalized likelihood method.

Based on minimum encoding inference, an appropriate measure of coding length is proposed for cluster validation, and the coding lengths under four different Gaussian mixture models are fully derived. This provides us with a criterion for the development of a new clustering algorithm. Judging from the performance comparison with other algorithms, the new clustering algorithm is more suitable for processing high dimensional data, with satisfactory performance on small and medium samples. This clustering algorithm is off-line because it requires all the data to be available at the same time.

The theoretical error performance of our clustering algorithm is evaluated under reasonable assumptions. It is shown here how the dimension of the data space, the sample size, the mixing proportion and the inter-cluster distance affect the ability of our clustering algorithm to detect the true number of clusters. Furthermore, we examine the impact of the penalty weight under the framework of the penalized likelihood method. It is found that there is a range of the penalty weight within which the best performance of our clustering algorithm can be achieved. Therefore, with some supervision we could adjust the penalty weight to further improve the performance of our clustering algorithm.

The application of our clustering algorithm to intrapulse analysis is investigated in detail. We first develop the pre-processing techniques, including data compression for received pulses, and formulate the problem of emitter number detection and pulse-emitter association as a multivariate clustering problem. After applying the above (off-line) clustering algorithm, we further develop two on-line clustering algorithms: one is based on some known thresholds while the other is based on a model-based detection scheme. Performance on intrapulse data using our pre-processing techniques and clustering algorithms is reported, and the results demonstrate that our new clustering algorithms are very effective for intrapulse analysis, especially the model-based on-line algorithm.

Finally, the DSP implementation for intrapulse analysis is considered. Some relevant physical parameters are estimated, such as the likely maximal incoming pulse rate. Then a suitable system diagram is proposed and its system requirements are investigated. Our on-line clustering algorithm is implemented as a core classification module on a TMS320C44 DSP board.


Acknowledgement

I have received both constant encouragement and expert supervision from my supervisors Dr. K.M. Wong and Dr. Z.Q. Luo. From both I have learned a great deal, of which only a part is in this thesis.

I wish to express sincere gratitude to my supervisory committee members Dr. J.P. Reilly, Dr. S. Qiao and Dr. T. Todd for their encouragement and supervision. I am especially pleased to thank my external advisor Dr. J.P.Y. Lee, from the Defence Research Establishment Ottawa, for his encouragement and expert guidance.

I was fortunate to have been part of the Advanced Signal Processing for Communications (ASPC) group led by Dr. K.M. Wong and Dr. Z.Q. Luo. I would like to thank fellow ASPCers Dr. T.N. Davidson, Dr. S.Q. Wu, Dr. S.W. Gao, Dr. J. Wu, Mr. L. Li and Mr. J. Zhang for their valuable help. I would also like to acknowledge the financial support provided by the Department of Electrical and Computer Engineering and the Defence Research Establishment Ottawa.

Finally, I am indebted to my parents Qiji Liu and Ande Chen, and especially my wife Dieqian Han, for their understanding and great support.


Acronyms

BIC     Bayesian Inference Criterion
MML     Minimum Message Length
MDL     Minimum Description Length
LRT     Likelihood Ratio Test
p.d.f.  probability density function
ML      Maximum Likelihood
LPF     Low Pass Filter
HPF     High Pass Filter
PRI     Pulse Repetition Interval
PRF     Pulse Repetition Frequency
RPM     rotation per minute
FLOP    floating point operation
SRAM    Static Random Access Memory
EDF     Empirical Distribution Function


Notations

log(·)            Natural logarithm
Σ_n               Summation over n
Π_n               Product over n
W^T               Transpose of W
W^{-1}            Inverse of W
tr(W)             Trace of W
|W|               Determinant of W
diag(W)           Diagonal matrix of W
||μ||             L2 norm of μ
E[R]              Expectation value of R
Var[R]            Variance of R
N(μ, Σ)           Multivariate normal distribution with mean vector μ and covariance matrix Σ
F(ν_1, ν_2; δ)    Noncentral F distribution with ν_1, ν_2 degrees of freedom and noncentrality parameter δ
W_M(n, Σ, Ω)      Noncentral Wishart distribution with the number of variates M, n degrees of freedom, covariance matrix Σ and noncentrality matrix Ω


Contents

ABSTRACT

Notations

1 Introduction
  1.1 Intrapulse Analysis
  1.2 Model-Based Cluster Analysis
  1.3 Major Contributions of The Thesis
  1.4 Outline of The Thesis

2 Model Selection Criteria
  2.1 Introduction
  2.2 Bayesian Inference
  2.3 Minimum Encoding Inference
    2.3.1 Rissanen's Description Length
    2.3.2 Wallace's Message Length
  2.4 Framework: Penalized Likelihood Method

3 Model-Based Clustering
  3.1 Introduction
  3.2 General Coding Length Measure
  3.3 Coding Lengths Under Gaussian Mixture Models
    3.3.1 Covariance Structure 1: Σ_k = σ²I, ∀k
    3.3.2 Covariance Structure 2: Σ_k = σ_k²I, ∀k
    3.3.3 Covariance Structure 3: Σ_k = D, ∀k
    3.3.4 Covariance Structure 4: Σ_k = D_k, ∀k
    3.3.5 Summary of Coding Lengths
  3.4 A Model-Based Clustering Algorithm
    3.4.1 Procedure
    3.4.2 Computational Complexity
  3.5 Comparison with SNOB
  3.6 Experimental Results
  3.7 Summary

4 Detection Performance Analysis
  4.1 Introduction
  4.2 Probability of A Miss
  4.3 Probability of A False Alarm
  4.4 Optimal Range of Penalty Weight
  4.5 Summary

5 Application to Intrapulse Analysis
  5.1 Introduction
  5.2 Signal Model And Pre-Processing of Received Pulses
    5.2.1 Amplitude Normalization
    5.2.2 Time Alignment Based on Thresholding
    5.2.3 Phase Adjustment Based on Polynomial Fitting
    5.2.4 Data Compression Using Wavelet Decomposition
  5.3 Clustering Algorithms for Intrapulse Analysis
  5.4 An On-line Clustering Algorithm Using Thresholds
    5.4.1 Procedure
    5.4.2 Computational Complexity
  5.5 An On-line Model-Based Clustering Algorithm
    5.5.1 Procedure
    5.5.2 Computational Complexity
  5.6 Numerical Experiments on Intra-pulse Data
    5.6.1 Pulse Generation
    5.6.2 Example 1
    5.6.3 Conclusions
  5.7 Summary

6 DSP Implementation
  6.1 Introduction
  6.2 Physical Scenario Analysis
    6.2.1 Probability of Receiving Overlapped Pulses
    6.2.2 Receiving Pulse Sequence
    6.2.3 Near-Far Phenomenon
  6.3 Maximal Incoming Pulse Rate
  6.4 System Diagram And Requirements
    6.4.1 Pre-processing
    6.4.2 Initial Grouping
    6.4.3 On-line Clustering
  6.5 C/DSP Coding of On-line Clustering

7 Conclusions
  7.1 Future Work

A The Value of S(N, K)

D Multivariate Normality
  D.1 EDF Statistics
  D.2 Multivariate Normality Test
  D.3 Gaussianity Test of Compressed Pre-processed Pulses


List of Figures

3.1 The diagram of our model-based clustering algorithm
3.2 Simulated data for M=22, where the x-axis is the index of data sample points and the y-axis is the amplitude of simulated data
4.1 Two Gaussian clusters
4.2 The illustration of P_m
4.3 Miss probability curves for two true clusters: M=1, c=0.5
4.4 Miss probability curves for two true clusters: M=1, c=0.2
4.5 Miss probability curves for two true clusters: M=2, c=0.5
4.6 Miss probability curves for two true clusters: M=2, c=0.2
4.7 Miss probability curves for two true clusters: M=22, c=0.5
4.8 Miss probability curves for two true clusters: M=22, c=0.2
4.9 One Gaussian cluster
4.10 The illustration of P_f
4.11 False alarm probability curves for one true cluster: M=1
4.12 False alarm probability curves for one true cluster: M=2
4.13 False alarm probability curves for one true cluster: M=22
5.1 Radar pulses received for intrapulse analysis
5.2 Polynomial fitting for phase adjustment
5.3 Data compression using wavelet decomposition
5.4 The diagram of our on-line clustering algorithm using thresholds
5.5 The diagram of our on-line model-based clustering algorithm
5.6 Amplitude and phase of 100 received pulses from 5 unknown emitters, where the x-axis is the index of data sample points
5.7 Amplitude and phase of the pre-processed pulses, where the x-axis is the index of data sample points
5.8 Amplitude and phase of the compressed pre-processed pulses, where the x-axis is the index of data sample points
5.9 Determination of the number of emitters using our (off-line) clustering algorithm
5.10 Determination of the number of emitters using the SNOB program
6.1 Physical scenario example 1
6.2 Physical scenario example 2
6.3 Physical scenario example 3
6.4 The DSP system diagram for intrapulse analysis
6.5 The tree structure for initial grouping
D.1 Real and imaginary parts of 50 simulated pulses, where the x-axis is the index of data sample points and the y-axis is the amplitude
D.2 Real and imaginary parts of the compressed pre-processed pulses, where the x-axis is the index of data sample points and the y-axis is the amplitude


List of Tables

3.1 Cluster validation results for two true clusters: M=1, c=0.5
3.2 Cluster validation results for two true clusters: M=1, c=0.2
3.3 Cluster validation results for one true cluster: M=1
3.4 Cluster validation results for two true clusters: M=2, c=0.5
3.5 Cluster validation results for two true clusters: M=2, c=0.2
3.6 Cluster validation results for one true cluster: M=2
3.7 Cluster validation results for two true clusters: M=22, c=0.5
3.8 Cluster validation results for two true clusters: M=22, c=0.2
3.9 Cluster validation results for one true cluster: M=22
3.10 Comparison of performance of SNOB and our algorithm, M=1
3.11 Comparison of performance of SNOB and our algorithm, M=2
3.12 Comparison of performance of SNOB and our algorithm, M=22
4.1 The critical distance
4.2 Limiting values
4.3 Optimal ranges of the penalty weight
4.4 Cluster validation results for two true clusters: M=22, c=0.5, λ = 1.1
4.5 Cluster validation results for one true cluster: M=22, λ = 1.1
5.1 Clustering results for Example 1 by the off-line model-based clustering algorithm
5.2 Clustering results for Example 1 by the SNOB algorithm
5.3 Clustering results for Example 1 by the on-line clustering algorithm using known thresholds
5.4 Clustering results for Example 1 by the on-line model-based clustering algorithm
DECCA Group SA and 12A relative motion marine radars
The benchmark of the DSP code of the on-line clustering
Six EDF statistics
Modified EDF statistics
Gaussianity test of original (simulated) pulses at significance level 0.05
Gaussianity test of compressed pre-processed pulses at significance level 0.05


Chapter 1

Introduction

1.1 Intrapulse Analysis

Radar emitter classification based on a collection of received radar signals is a subject of wide interest in both civil and military applications. The signals received usually consist of sequences of pulses emitted from multiple radar transmitters. If different radars transmit pulses with different carrier frequencies or pulse repetition intervals (PRIs), then it is not difficult to distinguish them from one another. However, in modern radar systems, more sophisticated signal waveforms have been used, and inter-pulse information alone may not be enough to separate the received pulses according to their origins. To classify radar emitters in such an environment, we need to explore the detailed structure inside each pulse, i.e. the so-called intrapulse information. This is because each emitter has its own electrical signal structure inside each of its transmitted pulses, due to both intentional and unintentional modulations. This structure motivates us to explore the possibility of using the intrapulse information of a collection of pulses to determine the number of emitters present and to classify those pulses according to their origins. In other words, the objectives of the research are to: (1) determine the number of emitters present; (2) classify the incoming pulses according to the emitters. The physical scenario is illustrated in detail in Section 5.2.


There are three important issues in the design of a processing algorithm for intrapulse analysis:

- The algorithm must be suitable for processing high dimensional data, because in most cases more than 40 data points are required to describe a pulse.

- The performance of the algorithm must be satisfactory for small or medium sample cases, since it is desirable to identify the emitters present by using only a few received pulses.

- The algorithm must be computationally effective, and on-line clustering is required for near real-time implementation.

In practice, a radar pulse intercepted by a passive receiver may be contaminated by an absolute amplitude and phase, a time delay and a residual carrier frequency. We first develop pre-processing techniques, including data compression for received pulses, and then formulate our objectives as a multivariate clustering problem. In the cluster analysis literature [1,23,31,39,49,69], the first objective is known as cluster validation while the second is called clustering. Generally speaking, current clustering methods range from those that are largely heuristic to more formal procedures based on statistical models. One major advantage of model-based methods is that they provide a precise theoretical framework for assessing the clustering structure of a given data set, especially for determining a relevant number of clusters. In the next section, model-based cluster analysis is discussed in detail.

1.2 Model-Based Cluster Analysis

In model-based cluster analysis [28,39], it is assumed that the data under consideration are generated from a finite mixture of probability distributions (e.g. normal distributions) and each component of the mixture represents a different cluster. Given N observations from K clusters, there are two ways to formulate the mixture model: one is the so-called classification approach, which assigns an observation to one of the K clusters deterministically, and the other is the so-called mixture approach, which assigns an observation to the K clusters probabilistically. An empirical comparison [15] in a finite sample setting between these two approaches suggested that the classification approach is preferred for small sample cases, although from the studies in [13,14,30], the mixture approach asymptotically tends to perform better than the classification approach when classifying ill-separated components with a sufficiently large sample.
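For concreteness, the two formulations can be written side by side (standard notation, not quoted from the thesis; the classification form is formalized later in Eq. (3.1)). The classification approach maximizes a classification likelihood over hard assignments α_n, while the mixture approach maximizes a mixture likelihood over mixing proportions π_k:

    L_C(\theta, \alpha) = \prod_{n=1}^{N} f_{\alpha_n}(y_n | \theta_{\alpha_n}),    \alpha_n \in \{1, ..., K\}

    L_M(\theta, \pi) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k f_k(y_n | \theta_k),    \pi_k \ge 0, \ \sum_k \pi_k = 1.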

Given a data set and underlying models, the first question to be considered is how to select a model which best fits the data. This is a critical question common to the fields of statistics, machine learning and artificial intelligence. Two principles can be applied in the search for the answer: one is Bayesian Inference [17,28,35,48,54] and the other is Minimum Encoding Inference [50,52,62,64]. In the Bayesian framework, a model is chosen as the best if it has the highest posterior probability. In the minimum encoding inference framework, the best model is the one that yields the minimal coding length of the data. Within the latter principle, there are two different approaches to interpreting the encoding process: one by Wallace [64], termed the Minimum Message Length (MML) criterion, and the other by Rissanen [52], termed the Minimum Description Length (MDL) criterion. A comparative study of Bayesian inference, MML and MDL was reported in [8,44,45]. There will be a further discussion of this important issue in the next chapter.

After a criterion, either ad hoc or model-based, has been chosen for cluster validation, there is the question of how to classify the observations under an assumed known number of clusters. A simple and common method is k-means [33], which minimizes the within-group sums of squares. It starts with an initial estimate and then regroups the given data set repeatedly until the cluster centers converge. The k-means method can be used for the classification approach. Another common method is the EM algorithm [66], which maximizes the underlying likelihood. Starting with an initial estimate, the EM algorithm alternates between two steps: the expectation step, which estimates the probabilities of each observation belonging to each cluster, and the maximization step, which re-estimates the model parameters so as to maximize the resulting likelihood. The EM algorithm is suitable for both the classification approach and the mixture approach. However, the EM algorithm is more computationally intensive than the k-means algorithm.
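As a concrete illustration of the k-means iteration just described, here is a minimal sketch in plain NumPy (our own illustration, not the thesis's implementation):

```python
import numpy as np

def kmeans(Y, K, n_iter=100, seed=0):
    """Minimal k-means: hard-assign each vector to the nearest center,
    then recompute centers, until the assignment stops changing."""
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=K, replace=False)].copy()
    assign = np.full(len(Y), -1)
    for _ in range(n_iter):
        # squared distance of every vector to every center, shape (N, K)
        d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_assign = d.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break                          # converged
        assign = new_assign
        for k in range(K):
            members = Y[assign == k]
            if len(members):               # keep old center if a cluster empties
                centers[k] = members.mean(axis=0)
    return assign, centers
```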

In model-based cluster analysis, cluster validation and clustering are combined by first formulating a statistical model for the problem which is parameterized by K, the number of clusters, and then selecting the hypothesis that best fits the data. Among statistical models for cluster analysis, Gaussian mixtures are widely used. Three Gaussian clustering algorithms are listed below:

1. MCLUST: Developed by Fraley and Raftery [6,27-29]. MCLUST incorporates eight different Gaussian mixture models in terms of the covariance matrix Σ, allows a choice of either the classification approach or the mixture approach, and applies an asymptotic criterion of Bayesian inference for assessing the number of clusters. This algorithm works well for medium and/or large sample cases but might not provide satisfactory results for small sample cases.

2. Autoclass: Developed by Cheeseman, Self and Kelly [16,17]. Autoclass assumes only the general covariance matrix structure, follows the mixture approach, and applies Bayesian inference for cluster validation. Since only the general covariance matrix structure is assumed, it weights the model complexity too heavily for high dimensional cases, so it is not suitable for processing high dimensional data.

3. SNOB: Developed by Wallace and his co-workers [10,11,61,63]. SNOB assumes the covariance matrix is diagonal, follows the mixture approach, and applies MML inference for cluster validation. It works for both high dimensional cases and small sample cases.

Another important issue is the error performance analysis. Following the terminology in statistical signal processing [58], if one cluster is present for a given data set but a clustering algorithm detects two clusters, then a false alarm occurs; on the other hand, if two clusters are present but the clustering algorithm reports only one, then a miss occurs. The error analysis of model-based clustering is related to the question of the number of components in a mixture, which is a problem that has not been completely solved in statistics. A general


method for this question is the Likelihood Ratio Test (LRT). Suppose that a random sample Y is available and we wish to test the following two hypotheses:

H_0: Y is generated from a mixture of K normals
H_1: Y is generated from a mixture of K' normals (K' > K)

However, the regularity conditions for the usual asymptotic theory fail when the null hypothesis H_0 is true; see details in [39, Page 21] and [56]. It is indeed a difficult problem, even for detecting a univariate normal mixture with two components. For the aforementioned "simple" case, empirical tabulation and an asymptotic analysis were presented in [24,32] following the classification approach; empirical tabulation was presented in [41,42] and an asymptotic analysis was derived in [9] following the mixture approach.
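For reference, the generic LRT statistic for this pair of hypotheses takes the usual form (standard notation, not quoted from the thesis); the difficulty noted above is precisely that its null distribution does not follow the usual chi-square asymptotics:

    \Lambda = -2 \log \frac{\max_{\theta \in \Theta_K} f(Y | \theta)}{\max_{\theta \in \Theta_{K'}} f(Y | \theta)}.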

1.3 Major Contributions of The Thesis

Motivated by exploring the possibility of using the intrapulse information of a collection of pulses to identify the emitters present, extensive research on model-based clustering has been conducted. The major contributions are twofold: model-based cluster analysis and intrapulse analysis.

Model-Based Cluster Analysis

Bayesian inference and minimum encoding inference, including Wallace's minimum message length (MML) and Rissanen's minimum description length (MDL), are reviewed and compared for model selection. It is found that the MML coding length is more accurate than the other two from the viewpoint of quantization. All model selection criteria considered here consist of two parts: one is the log-likelihood function, which measures the goodness of fit between the data and the model, and the other is a penalty function, which measures the complexity of the model. An inference aims to balance the trade-off between goodness of fit and model complexity. Hence, in practice, we can introduce a penalty weight for the penalty function to control the trade-off. We call this approach the penalized likelihood method.


Applying minimum encoding inference to the classification approach of model-based clustering, we propose an appropriate measure of coding length for cluster validation. The coding lengths under four different Gaussian mixture models in terms of the covariance matrix Σ are derived. The first covariance structure is the simplest and is mainly used for the purpose of a theoretical error performance analysis. The second and fourth are successfully applied to intrapulse analysis. The third one might be useful for other applications, so we include it for completeness. Correspondingly, we develop an effective clustering algorithm which starts by viewing the given data set as a single cluster and then repartitions and regroups the data to obtain a new cluster in each step. The new algorithm is off-line since it requires all the data to be available at the same time. Extensive empirical results show that the new clustering algorithm (with cluster validation) is more suitable than SNOB for processing high dimensional data, with better performance on small sample cases. In fact, our algorithm is well designed for the clustering problem in intrapulse analysis, in terms of the first two issues pointed out in Section 1.1.

The theoretical error performance of our clustering algorithm is evaluated under reasonable assumptions. It is shown in this thesis how the dimension of the data space, the sample size, the mixing proportion and the inter-cluster distance affect the performance of the clustering algorithm in detecting the true number of clusters. Furthermore, by introducing a penalty weight, we investigate our clustering algorithm as a penalized likelihood method. The impact of the penalty weight is investigated. With some supervision, we could adjust the penalty weight to further improve the performance of our algorithm. Testing our clustering algorithm on intrapulse data, we have found that the best performance is usually achieved by using the fourth covariance structure when no supervision is available (i.e., the penalty weight is 1, the default value), and that the best performance is usually achieved by using the second covariance structure when supervision is available (i.e., the penalty weight can be adapted).

Intrapulse Analysis

First, we develop pre-processing techniques to remove nuisance parameters from received pulses, such as an absolute amplitude and phase, a time delay and a residual carrier frequency. As a result, we formulate the problem of emitter number detection and pulse-emitter association as a multivariate clustering problem. In order to reduce the computational cost of clustering, a suitable data compression method based on a wavelet decomposition is also included in pre-processing. The pre-processing techniques are intuitive in nature and are carried out so that, after pre-processing, the pulses received from the same emitter maintain their resemblance to each other, while those from different emitters maintain their distinctive features.
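To make the flavor of such a pipeline concrete, here is a minimal sketch (our own illustration, not the thesis's implementation): it normalizes a pulse to unit energy and applies a one-level Haar transform for compression. The thesis's actual alignment, phase-adjustment and wavelet choices are detailed in Chapter 5.

```python
import numpy as np

def normalize_amplitude(pulse):
    """Remove the absolute amplitude by scaling the pulse to unit energy."""
    pulse = np.asarray(pulse, dtype=float)
    return pulse / np.linalg.norm(pulse)

def haar_compress(x):
    """One-level Haar decomposition: keep only the low-pass half,
    halving the dimension while preserving the pulse envelope."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                      # pad to even length
        x = np.append(x, x[-1])
    return (x[0::2] + x[1::2]) / np.sqrt(2.0)

# Example: a synthetic 64-sample pulse compressed to 32 samples
pulse = np.hanning(64)
compressed = haar_compress(normalize_amplitude(pulse))
```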

Second, after applying the above new clustering algorithm to the clustering problem, we investigate how to achieve on-line clustering, that is, how to perform classification dynamically as pulses arrive. To solve this problem, we propose to set up some thresholds and distance measures which can be used to indicate to which existing cluster an incoming pulse should be assigned, or whether it should form a new cluster. To achieve an accurate classification result, we have to adapt the thresholds as the statistics of the received pulses change in time. Unfortunately, it is usually difficult to modify the thresholds appropriately when a priori knowledge of the incoming pulses is not available. To overcome this drawback, a novel on-line algorithm based on a model-based detection scheme is developed in which no explicit thresholds are required. This new on-line algorithm dynamically incorporates cluster splitting, merging and regrouping operations by using the model-based detection. The performance of this on-line model-based clustering algorithm is almost the same as that of the off-line model-based algorithm, but it is much faster.
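A minimal sketch of the threshold-based idea (our own illustration, not the thesis's exact rule) looks like this:

```python
import numpy as np

def online_assign(pulse, centers, counts, threshold):
    """One step of threshold-based on-line clustering: assign the
    incoming pulse to the nearest existing cluster if it lies within
    `threshold`, otherwise open a new cluster. Cluster centers are
    updated as running means."""
    pulse = np.asarray(pulse, dtype=float)
    if centers:
        dists = [np.linalg.norm(pulse - c) for c in centers]
        k = int(np.argmin(dists))
        if dists[k] < threshold:
            counts[k] += 1
            centers[k] += (pulse - centers[k]) / counts[k]
            return k
    centers.append(pulse.copy())
    counts.append(1)
    return len(centers) - 1

# Usage: feed pulses one at a time as they arrive
centers, counts = [], []
for p in np.random.default_rng(0).normal(size=(10, 4)):
    online_assign(p, centers, counts, threshold=2.0)
```

The weakness motivating the model-based scheme is visible here: the quality of the result hinges entirely on choosing `threshold` well.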

Third, to effectively implement our pre-processing techniques and clustering algorithms for emitter number detection and pulse classification in near real-time, we estimate the relevant physical parameters, such as the likely maximal incoming pulse rate. Based on these estimates, we then propose a suitable system diagram and investigate the system requirements. Finally, we implement our on-line clustering algorithm as a core classification module on a TMS320C44 DSP board.


1.4 Outline of The Thesis

This introductory chapter has been mainly concerned with placing this thesis in context. We have reviewed the problems in intrapulse analysis and model-based cluster analysis, and outlined our major contributions to these two fields.

In the next chapter, we review some criteria for model selection, compare Bayesian inference and minimum encoding inference (including MDL and MML), and cast them into the framework of a penalized likelihood method.

In Chapter 3, applying minimum encoding inference to the classification approach to model-based clustering, we propose an appropriate measure of coding length for cluster validation, and derive the coding lengths under four different Gaussian mixture models. Then we describe a new clustering algorithm, compare it with SNOB, and demonstrate by extensive simulations that our algorithm is more suitable than SNOB for processing high dimensional data, with better performance on small sample cases. In addition, we also examine the performance of the coding length measure based on an asymptotic method for Bayesian inference.

In Chapter 4, we conduct the theoretical performance analysis of our clustering algorithm in terms of two types of errors: miss and false alarm. We also study the impact of the penalty weight under the framework of the penalized likelihood method. The conclusion is that there is a range of the penalty weight within which the best performance of our clustering algorithm can be achieved.

Intrapulse analysis is carried out in Chapter 5. We first describe the pre-processing techniques, including data compression for received pulses, and formulate our objectives in a multivariate clustering setting. After applying the model-based clustering algorithm developed in Chapter 3 to the clustering problem, we further develop two on-line clustering algorithms: one is based on known thresholds while the other is based on model-based detection. Performance on intrapulse data using our clustering algorithms and SNOB is reported, and the results demonstrate that our new clustering algorithms are very effective for intrapulse analysis, especially the on-line model-based algorithm.

In Chapter 6, the DSP implementation for intrapulse analysis is considered. We estimate the relevant physical parameters, such as the likely maximal incoming pulse rate, then propose a suitable system diagram and investigate the system requirements. The benchmark of the DSP coding of our on-line clustering algorithm is reported.

Finally, the last chapter concludes the thesis with a summary of what has been achieved, and outlines areas of future research.


Chapter 2

Model Selection Criteria

2.1 Introduction

In this chapter, we review Bayesian Inference [17,35,48,54] and Minimum Encoding Inference [50,52,62,64]. For Bayesian Inference, two inference techniques are introduced: one using Laplace's method [17,36] and the other an asymptotic method [54]. For Minimum Encoding Inference, there are two approaches: one is called the Minimum Description Length (MDL) criterion [52] and the other is called the Minimum Message Length (MML) criterion [64]. For MDL, the coding steps are briefly described and the idea of a universal prior is introduced. For MML, the coding steps are briefly described and a sensible prior is required. At the end of this chapter, Bayesian Inference, MDL and MML are cast into the framework of a penalized likelihood method.

2.2 Bayesian Inference

Given a data set Y = {y_1, ..., y_N} and a set of model classes¹ parameterized by K (K = 1, ..., K_max), let θ denote the model parameter vector under a model class K, let f(Y|θ, K) denote the conditional probability density function (p.d.f.) of the data given θ and K, and let h(θ|K) denote the prior p.d.f. of θ given K. Then the conditional p.d.f. of Y given K is

    f(Y | K) = \int f(Y | \theta, K) h(\theta | K) d\theta.    (2.1)

¹ For instance, in model-based cluster analysis, K is the number of clusters, and different partitions which form K clusters belong to the same model class.

Further, let f(Y) denote the p.d.f. of Y occurring and let P(K) be the prior probability of the model class K. Then Bayes' theorem gives P(K|Y), the posterior probability of K:

    P(K | Y) = f(Y | K) P(K) / f(Y).    (2.2)

If we take a uniform prior for K, then P(K|Y) is proportional to f(Y|K), i.e.,

    P(K | Y) \propto f(Y | K).    (2.3)

By using Laplace's method [17,36] to approximate the integral in Eq. (2.1), we have

    f(Y | K) \approx f(Y | \hat\theta, K) h(\hat\theta | K) (2\pi)^{\ell/2} |F_{\hat\theta}|^{-1/2}    (2.4)

where θ̂ is the maximum likelihood (ML) estimate of θ, ℓ is the number of free parameters in θ, |·| is the determinant of a matrix and F_θ̂ is the Fisher information matrix evaluated at θ̂. The Fisher information matrix is defined by

    F_\theta = -E[ \partial^2 \log f(Y | \theta, K) / \partial\theta \, \partial\theta^T ].    (2.5)

Usually we examine Eq. (2.4) in logarithmic form. Hence, the Bayesian inference criterion can be described by

    -\log f(Y | K) \approx -\log f(Y | \hat\theta, K) - \log h(\hat\theta | K) - (\ell/2) \log 2\pi + (1/2) \log |F_{\hat\theta}|.    (2.6)

It is shown in [54] that, asymptotically,

    -\log f(Y | K) \approx -\log f(Y | \hat\theta, K) + (\ell/2) \log N.    (2.7)

The above asymptotic criterion is usually referred to as the Bayesian Inference Criterion (BIC). The advantage of using BIC is that it does not depend on the prior distribution h(θ|K).
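As a toy illustration of selecting a model class with Eq. (2.7), the following sketch scores candidate values of K by BIC; the log-likelihood values and parameter counts are hypothetical placeholders, not results from the thesis:

```python
import numpy as np

N = 200                                                  # sample size
log_lik = {1: -512.3, 2: -461.8, 3: -458.9, 4: -457.5}   # placeholders
ell = {K: 3 * K - 1 for K in log_lik}                    # placeholder counts

# BIC-style coding length: -log f(Y | theta_hat, K) + (ell/2) log N
bic = {K: -log_lik[K] + 0.5 * ell[K] * np.log(N) for K in log_lik}
best_K = min(bic, key=bic.get)          # the smallest coding length wins
```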


However, this large-sample criterion may not work satisfactorily for small or medium sample cases. This drawback can be compensated to some extent by specifying a sensible prior probability h(θ|K) in Eq. (2.6) if we have some knowledge of the given data set.

A model selection is actually performed at two levels. The first level is to choose the best model class to fit the data, and the second level is to choose the best model under the chosen model class. The ML estimate θ̂ under the chosen model class is usually taken as the best model.

Notations

Throughout this thesis, if a model θ is considered, it is under some model class K. For notational simplicity, we choose not to record this dependence explicitly, with the understanding that θ depends on K implicitly.

2.3 Minimum Encoding Inference

There are two major approaches to minimum encoding inference: one by Wallace [62,64] and the other by Rissanen [50,52]. Wallace termed his inference method the Minimum Message Length (MML) criterion, while Rissanen termed his the Minimum Description Length (MDL) criterion. MDL appears more widely known in engineering fields [47,65,70-72]. Wallace's and Rissanen's Royal Statistical Society meeting review papers on MML and MDL appeared side by side in 1987 [52,64]. A comprehensive comparison between them was presented in 1994 [8]. The fundamental ideas of MML and MDL are the same:

Given a data set and a family of competing statistical models, the best model is the one that yields the minimal coding length of the data.

We assume that there are a data set Y = {y_1, ..., y_N} and a statistical model determined by ℓ parameters, described as θ, θ ∈ R^ℓ. To assess the goodness of fit between the model and the data, we construct a code length L(θ) of the model and a code length L(Y|θ) of the data in terms of the model under a proper encoding scheme. A good model is one yielding a concise total description length L(Y, θ), which is the sum of L(θ) and L(Y|θ)


— the shorter the better; i.e., minimum encoding inference seeks

    \theta^* = \arg\min_\theta L(Y, \theta).    (2.8)

Let f(Y|θ) be the conditional p.d.f. of Y given θ. We regard -log f(Y|θ), known in information theory as self-information, as the number of "nits"² it takes to encode Y with an ideal code relative to the assumed model of the data, i.e.,

    L(Y | \theta) = -\log f(Y | \theta).    (2.9)

To encode θ, we need to know its prior probability. In addition, we can only encode θ to a limited precision, so an optimal quantization is needed to make the total coding length as short as possible. Briefly, the differences between Rissanen's MDL and Wallace's MML lie in the view of the prior and the selection of the optimal quantization.
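As a quick numeric illustration of the "nit" unit (our own example, not the thesis's): an outcome of probability p carries −log p nits of self-information under the natural logarithm, or −log₂ p bits under the base-2 logarithm.

```python
import math

p = 0.125                   # probability of an outcome
nits = -math.log(p)         # self-information in nits (natural log)
bits = -math.log2(p)        # the same quantity in bits
# bits = nits / ln 2; here 3 bits ≈ 2.079 nits
```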

2.3.1 Rissanen's Description Length

The major steps of Rissanen's Minimum Description Length procedure [51] are described as follows:

1. Quantization: partition the parameter space into regions with centers θ_i, θ_i ∈ R^ℓ, i ∈ N, and quantization volumes V(θ_i).

2. Indexing: map θ_i to a positive integer j.

3. Encoding a prior: use a so-called universal prior for positive integers to encode j with length L(j).

4. Total description length:

    L(Y, \theta_i) = -\log f(Y | \theta_i) + L(j).    (2.11)

² The unit is called a "nit" when the natural logarithm is used. In fact, we are concerned with calculating the length of a description for the inference, but we do not actually need to transmit it. Therefore, we use codes that are efficient in terms of code length, but perhaps not efficient in the time required to encode/decode data.


Note that for each i, we have a probability model f(Y|θ_i).

The procedure is not complete until we specify:

- the optimal quantization volumes V(θ_i), i ∈ N;
- the mapping from θ_i to a positive integer j;
- the universal prior for integers.

A universal prior for integers

Since the codes of both j and Y are strings, we cannot just attach them next to each other. If the decoder always reads the code string from left to right, then a necessary and sufficient condition for the decoder to be able to separate the codeword from whatever string follows it is that the codewords for the integers form a prefix set. This means that no codeword is allowed to be a prefix of another. Based on this point, the optimum length of a positive integer j used by Rissanen is

    L(j) = \log^* j + \log c    (2.12)

where log* j = log j + log log j + log log log j + ..., including only the positive terms, and c is a small constant (≈ 2.865064). Therefore, the universal prior is defined as

    P(j) = e^{-L(j)}.    (2.13)

For simplicity, the first-order approximation of the length L(j) is used, i.e.,

    L(j) \approx \log j.    (2.14)
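A small sketch of the universal code length of Eqs. (2.12)–(2.14) (our own illustration):

```python
import math

def log_star(j):
    """log* j = log j + log log j + ..., keeping only the positive terms."""
    total, x = 0.0, math.log(j)
    while x > 0:
        total += x
        x = math.log(x)
    return total

def universal_length(j, c=2.865064):
    """Rissanen's code length for a positive integer j, in nits."""
    return log_star(j) + math.log(c)

# First-order approximation: universal_length(j) is close to log(j)
```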

Furthermore, a lattice quantization is used here. In his approximation to the optimal quantization, Rissanen obtained his description length as

    L(Y, \hat\theta) \approx -\log f(Y | \hat\theta) + (1/2) \log |F_{\hat\theta}| + L(j)    (2.15)

where θ̂ is the ML estimate and the Fisher information matrix is defined by

    F_\theta = -E[ \partial^2 \log f(Y | \theta) / \partial\theta \, \partial\theta^T ].    (2.16)


2.3.2 Wallace's Message Length

The major steps of Wallace's Minimum Message Length procedure [64] are briefly described as follows:

1. Quantization: partition the parameter space into regions with centers θ_i, θ_i ∈ R^ℓ, i ∈ N, and quantization volumes V(θ_i).

2. Choosing a prior: specify some sensible prior h(θ_i).

3. Total message length:

    L(Y, \theta_i) = -\log f(Y | \theta_i) - \log \int_{V(\theta_i)} h(\theta) d\theta.    (2.17)

Note that for each i, we have a probability model f(Y|θ_i). It is convenient to approximate the integral (for sufficiently small V(θ_i)) as follows:

    \int_{V(\theta_i)} h(\theta) d\theta \approx h(\theta_i) V(\theta_i).

The procedure is not complete until we specify:

- the optimal quantization volume V(θ_i);
- a prior probability density h(θ_i) over the parameter space.

By a calculation similar to the optimal lattice quantization, Wallace derived his message length as

    L(Y, \hat\theta) \approx -\log f(Y | \hat\theta) + (1/2) \log |F_{\hat\theta}| + \ell/2 + (\ell/2) \log G_\ell - \log h(\hat\theta)    (2.18)

where θ̂ is the ML estimate, |·| denotes the determinant of a matrix, G_ℓ is the ℓ-dimensional optimal lattice quantization constant which can be found in [18, Page 61], and h(θ) is the prior p.d.f. of θ.

2.4 Framework: Penalized Likelihood Method

By comparing Eqs. (2.6) and (2.18), we observe that the major difference between them is a quantization constant. Specifically, the hyper-sphere constant is used in Laplace's method for Bayesian inference, but the optimal lattice constant G_ℓ is used in the MML message length. From the viewpoint of optimal quantization, the MML coding length is more accurate. In addition, we believe that we should specify a sensible prior if we have some knowledge of a given data set, instead of some universal prior. Hence, we prefer the MML message length formula of Eq. (2.18) for our application to model-based cluster analysis.

We also notice that all model selection criteria considered here consist of two parts: one is the log-likelihood function, which measures the goodness of fit between the data and the model, and the other is a penalty function, which measures the complexity of the model. An inference aims to balance the trade-off between goodness of fit and model complexity. Hence, in practice, we can introduce a penalty weight for the penalty function to control the trade-off as follows:

    L(Y, \theta) = L(Y | \theta) + \lambda L(\theta)    (2.19)

where λ is the penalty weight.

Roughly speaking, an inference tends to underestimate when λ is large, and it tends to overestimate when λ is small. Therefore, we have to determine a suitable range of λ which guarantees the correct estimation. This is investigated in detail in Section 4.4.
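A minimal sketch of how the penalty weight in Eq. (2.19) acts as a knob (hypothetical scores, our own illustration): increasing λ favors simpler models and decreasing it favors more complex ones.

```python
def select_K(neg_log_lik, penalty, lam):
    """Pick the K that minimizes L(Y|theta) + lambda * L(theta)."""
    scores = {K: neg_log_lik[K] + lam * penalty[K] for K in neg_log_lik}
    return min(scores, key=scores.get)

# Hypothetical fitted values for K = 1..4 (placeholders, not thesis data)
neg_log_lik = {1: 512.3, 2: 461.8, 3: 458.9, 4: 457.5}
penalty = {K: 10.0 * K for K in neg_log_lik}

for lam in (0.1, 1.0, 6.0):
    print(lam, "->", select_K(neg_log_lik, penalty, lam))
# small lambda overestimates (picks K=4); large lambda underestimates (K=1)
```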


Chapter 3

Model-Based Clustering

3.1 Introduction

In model-based clustering [28,39], it is assumed that the data under consideration are generated from a finite mixture of probability distributions; each component of the mixture represents a different group or cluster. Therefore, given a set of observed data vectors, our objectives are:

- cluster validation: to determine the number of components in the mixture;
- clustering: to determine which data vectors arise from each component.

In the previous chapter, Bayesian inference and minimum encoding inference (MDL and MML) for model selection were discussed. In addition, different clustering algorithms for Gaussian mixture models were briefly compared in Section 1.2. These algorithms include MCLUST, Autoclass and SNOB. Only SNOB is suitable for both high dimensional cases and small sample cases.

In this chapter, we apply minimum encoding inference to model-based clustering, propose an appropriate coding length measure for cluster validation, and fully derive the coding lengths under four different Gaussian mixture models. Then we describe our model-based clustering algorithm, compare it with SNOB, and demonstrate by extensive simulations that our algorithm is more suitable than SNOB for processing high dimensional data, with better performance on small sample cases. In addition, we also investigate the performance of the coding length measure based on the asymptotic method for Bayesian inference. Part of this chapter has been published in [68].

3.2 General Coding Length Measure

Given a data set Y consisting of N observed data vectors y_1, y_2, ..., y_N, each of dimension M, the data vector y_n (n = 1, 2, ..., N) is to be assigned among K clusters. Let α = [α_1, α_2, ..., α_N]^T be an association parameter vector such that if α_n = k, then the data vector y_n is assigned to the k-th cluster. In a model-based method, the k-th cluster (k = 1, ..., K) is assumed to be a sample of a simple distribution, denoted by f_k(·) with its parameter vector θ_k. Therefore, the conditional density function for the data is

    f(Y | \theta, \alpha) = \prod_{n=1}^{N} f_{\alpha_n}(y_n | \theta_{\alpha_n})    (3.1)

where the mixture model parameter vector θ consists of the independent parameters in the set {θ_1, θ_2, ..., θ_K} and ℓ is the dimension of θ.

Now, from Shannon's coding theorem [19], the minimum code length is given by the entropy of the data. Thus, using the natural logarithm, the minimum code length in "nits" (see the footnote of Section 2.3) is

    L(Y, \alpha) = E[-\log f(Y)] \approx -\log f(Y | \hat\theta, \hat\alpha) - \log f(\hat\theta) - \log P(\hat\alpha).    (3.2)

In Eq. (3.2), we have used the evaluation of the coding length at the maximum likelihood (ML) estimates θ̂ and α̂ to approximate the expected coding length, where f(θ̂) is the probability density function evaluated at θ = θ̂, and P(α̂) is the probability of α = α̂.

First, let us examine the last term in Eq. (3.2). α is a particular association vector, the n-th element α_n of which denotes the association of the n-th data vector with the α_n-th cluster. Now, to partition N data vectors into K clusters, the number of different ways, as


shown in Appendix A, is

    S(N, K) = \frac{N!}{N_1! \, N_2! \cdots N_K! \, \prod_{n=1}^{N} L_n!}    (3.3)

where N_k is the number of data vectors assigned to the k-th cluster (k = 1, ..., K; N_1 + N_2 + ... + N_K = N), and L_n is the number of clusters with n data vectors (n = 1, 2, ..., N).

If a uniform a priori probability is assumed for α, then

    -\log P(\hat\alpha) = \log S(N, K).    (3.4)
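A small sketch that computes this count for a given cluster-size configuration, assuming the form of Eq. (3.3) reconstructed above:

```python
from math import factorial
from collections import Counter

def partition_count(sizes):
    """Number of distinct ways to split N items into unlabeled clusters
    with the given sizes: N! / (prod N_k! * prod L_n!), per Eq. (3.3)."""
    N = sum(sizes)
    ways = factorial(N)
    for Nk in sizes:                     # divide by each cluster's N_k!
        ways //= factorial(Nk)
    for Ln in Counter(sizes).values():   # L_n clusters sharing a size n
        ways //= factorial(Ln)
    return ways

# e.g. 4 items into two clusters of size 2: 4! / (2! 2! 2!) = 3 ways
assert partition_count([2, 2]) == 3
```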

The first and second terms in Eq. (3.2) can be described by the message length formula of Eq. (2.18). Therefore, we have the total coding length

    L(Y, \hat\alpha) = -\log f(Y | \hat\theta, \hat\alpha) + (1/2) \log |F_{\hat\theta}| + \ell/2 + (\ell/2) \log G_\ell - \log h(\hat\theta) + \log S(N, K)    (3.5)

where ℓ is the number of independent parameters in θ, G_ℓ is the ℓ-dimensional optimal lattice quantization constant which can be found in [18, Page 61], h(θ̂) is the prior p.d.f. of θ evaluated at θ̂, |·| is the determinant of a matrix and F_θ̂ is the Fisher information matrix defined by

    F_\theta = -E[ \partial^2 \log f(Y | \theta, \alpha) / \partial\theta \, \partial\theta^T ].    (3.6)

The first term in Eq. (3.5) is the negative log-likelihood, which measures the goodness of fit between the data and the model. We denote it by L(Y|θ̂, α̂), i.e.,

    L(Y | \hat\theta, \hat\alpha) = -\log f(Y | \hat\theta, \hat\alpha).    (3.7)

The rest of the terms in Eq. (3.5) form the penalty function, which measures the model complexity. If the dominant penalty term (ℓ/2) log N shown in Eq. (2.7) is used, we have the asymptotic coding length

    L(Y, \hat\alpha) \approx -\log f(Y | \hat\theta, \hat\alpha) + (\ell/2) \log N + \log S(N, K).    (3.8)

3.3 Coding Lengths Under Gaussian Mixture Models

In this section we investigate how to calculate each term in Eq. (3.5) under Gaussian mixture models. We start with L(Y|θ̂, â), the log-likelihood term. In this case, f_k(·) is assumed to


be the density function of a multivariate normal distribution with mean vector μ_k and covariance matrix Σ_k. Suppose that a particular association vector a = [a_1 a_2 ... a_N]^T partitions the N data vectors into K groups such that we have

the conditional density function for the data [43] is

Hence, the L(Y|θ̂, â) term is expressed as

The Fisher information matrix F_θ in the second term of Eq. (3.5) is very sophisticated for a general structure of Σ_k. Furthermore, the dimension M for our application is usually higher than 40, so there are more than 40 × 40 parameters in just one covariance matrix Σ_k! This general structure will generate severe numerical problems when only small or medium samples are available. To avoid the above limitations, we assume that the covariance matrix Σ_k is diagonal. In this case, it is easy to verify that

Therefore, F_θ is a diagonal matrix, so that


As we see, coding is usually based on θ̂ since the true θ is unknown. θ̂ is a vector with l elements θ̂_i, i = 1, ..., l. These elements are statistically independent when Σ_k, k = 1, ..., K, are diagonal, so each θ̂_i is quantized independently. In other words, the quantization here is actually performed in one dimension, instead of l dimensions. For the one-dimensional case, the optimal quantization constant is 1/12 [18, Page 61]. Hence, the optimal quantization constant we use is

Below we consider four different covariance structures:

• Covariance Structure 1: Σ_k = σ²I, ∀k

• Covariance Structure 2: Σ_k = σ_k²I, ∀k

• Covariance Structure 3: Σ_k = D, ∀k

• Covariance Structure 4: Σ_k = D_k, ∀k

The first covariance structure is the simplest, and is mainly used for the purpose of theoretical performance analysis in Chapter 4. The second and fourth structures have been successfully applied to intrapulse analysis in Chapter 5. The third one might be useful for other applications, so we include it here for completeness.

To fully derive a coding length, we assume a uniform prior probability for each parameter in μ_k = [μ_k1, ..., μ_kM]^T and the underlying covariance matrix Σ_k over certain regions. Let μ_0 and W_0 be the mean vector and the covariance matrix of the whole data set Y. Parametric regions will be determined by μ_0 and W_0 according to the underlying covariance structure, as detailed in the following subsections.


3.3.1 Covariance Structure 1: Σ_k = σ²I, ∀k

Under the assumed covariance structure, we have only one parameter σ to characterize all covariance matrices, and KM parameters {μ_km | k = 1, ..., K; m = 1, ..., M} to characterize all mean vectors. Therefore, the number of free parameters is KM + 1.

A. The log-likelihood term L(Y|θ̂, â)

Let σ̂ be the ML estimate of σ. From Eq. (3.9), we obtain

where tr(·) is the trace of a matrix and

Differentiating Eq. (3.8) with respect to σ and equating it to zero, we have the ML estimate

Substituting Eq. (3.20) into Eq. (3.18), we have

B. The Fisher term ½ log|F_θ|

Under this mixture model,

$$\frac{1}{2}\log|F_\theta| = \frac{1}{2}\log\frac{\partial^2 L(Y|\hat\theta,\hat a)}{\partial\sigma^2} + \frac{1}{2}\sum_{k=1}^{K}\sum_{m=1}^{M}\log\frac{\partial^2 L(Y|\hat\theta,\hat a)}{\partial\mu_{km}^2} \qquad (3.22)$$

Differentiating Eq. (3.8) twice with respect to σ and using Eq. (3.20), we have


Differentiating Eq. (3.8) twice with respect to μ_k and using Eq. (3.20), we have

Substituting Eqs. (3.23) and (3.24) into Eq. (3.22), we have

C. The prior term −log h(θ̂)

Denoting the ranges of μ̂_km and σ̂ by r_m and ρ respectively and assuming a uniform distribution for each, then

Let σ_0 be the standard deviation of the whole data set Y. From Eq. (3.20),

where W_0 has been defined by Eq. (3.16). For simplicity, we further assume that ρ = σ_0.

Hence, the prior term contributes to the coding length by

D. The total coding length L_1(Y, K)

Eqs. (3.21), (3.25), (3.17), (3.14) and (3.28) form the total coding length defined in Eq. (3.5). By removing the parts independent of K (because they are inconsequential to the subsequent minimization), we can simplify its expression to


where W, W_0 and S(N, â) are defined in Eqs. (3.19), (3.16) and (3.3), respectively.

3.3.2 Covariance Structure 2: Σ_k = σ_k²I, ∀k

Under the assumed covariance structure, we have K parameters {σ_k | k = 1, ..., K} to characterize all covariance matrices, and KM parameters {μ_km | k = 1, ..., K; m = 1, ..., M} to characterize all mean vectors. Therefore, the number of free parameters is K(M + 1).

A. The log-likelihood term L(Y|θ̂, â)

Let σ̂_k be the ML estimate of σ_k. From Eq. (3.9), we obtain

where W_k is given by Eq. (3.11).

Differentiating Eq. (3.8) with respect to σ_k and equating it to zero, we have the ML estimate

Substituting Eq. (3.32) into Eq. (3.31), we have

B. The Fisher term ½ log|F_θ|

Under this covariance structure,

$$\frac{1}{2}\log|F_\theta| = \frac{1}{2}\sum_{k=1}^{K}\log\frac{\partial^2 L(Y|\hat\theta,\hat a)}{\partial\sigma_k^2} + \frac{1}{2}\sum_{k=1}^{K}\sum_{m=1}^{M}\log\frac{\partial^2 L(Y|\hat\theta,\hat a)}{\partial\mu_{km}^2} \qquad (3.34)$$

Differentiating Eq. (3.8) twice with respect to σ_k and using Eq. (3.32), we have


Differentiating Eq. (3.8) twice with respect to μ_k and using Eq. (3.32), we have

Substituting Eqs. (3.35) and (3.36) into Eq. (3.34), we have

C. The prior term −log h(θ̂)

Denoting the ranges of μ̂_km and σ̂_k by r_m and ρ_k respectively and assuming a uniform distribution for each, then

Similarly to Covariance Structure 1, we further assume that

Hence, the prior term contributes to the coding length by

D. The total coding length L_2(Y, K)

Eqs. (3.33), (3.37), (3.30), (3.14) and (3.39) form the total coding length defined in Eq. (3.5). By removing the parts independent of K (because they are inconsequential to the subsequent minimization), we can simplify its expression to

$$L_2(Y,K) = \cdots + \frac{M+1}{2}\sum_{k=1}^{K}\log N_k + \frac{KM}{2}\log\frac{e}{3N} + \frac{K}{2}\log\frac{e}{6N} + \log S(N,\hat{a}) \qquad (3.40)$$

where W_k, W_0 and S(N, â) are defined in Eqs. (3.11), (3.16) and (3.3), respectively.


NOTE TO USERS

Page(s) not included in the original manuscript are unavailable from the author or university. The

manuscript was microfilmed as received.

This reproduction is the best copy available.


Differentiating Eq. (3.8) twice with respect to σ_km and using Eq. (3.55), we have

Differentiating Eq. (3.8) twice with respect to μ_k and using Eq. (3.55), we have

Substituting Eqs. (3.58) and (3.59) into Eq. (3.57), we have

$$\frac{1}{2}\log|F_\theta| = -\sum_{k=1}^{K}\log|\mathrm{diag}(W_k)| + 2M\sum_{k=1}^{K}\log N_k + \frac{KM}{2}\log 2. \qquad (3.60)$$

C. The prior term −log h(θ̂)

Denoting the ranges of μ̂_km and σ̂_km by r_km and ρ_km respectively and assuming a uniform distribution for each, then

Similarly to Covariance Structure 3, we further assume that

Hence, the prior term contributes to the coding length by

D. The total coding length L_4(Y, K)

Eqs. (3.56), (3.60), (3.53), (3.14) and (3.62) form the total coding length defined in Eq. (3.5). By removing the parts independent of K, we can simplify its expression to

$$L_4(Y,K) = \cdots + 2M\sum_{k=1}^{K}\log N_k + KM\log\frac{e}{6N} + \log S(N,\hat{a})$$

where W_k, W_0 and S(N, â) are defined in Eqs. (3.11), (3.16) and (3.3), respectively.


3.3.5 Summary of Coding Lengths

For easy comparison and reference, the coding lengths under the above four covariance structures are listed below:

Covariance Structure 1: Σ_k = σ²I, ∀k

$$L_1(Y,K) = \frac{MN}{2}\log \mathrm{tr}(W) + \frac{KM+1}{2}\log\frac{\mathrm{tr}(W_0)}{\mathrm{tr}(W)} + \frac{M}{2}\sum_{k=1}^{K}\log N_k + \frac{KM}{2}\log\frac{e}{3} + \log S(N,\hat{a})$$

where W, W_0 and S(N, â) are defined in Eqs. (3.19), (3.16) and (3.3), respectively.

Covariance Structure 2: Σ_k = σ_k²I, ∀k

where W_k, W_0 and S(N, â) are defined in Eqs. (3.11), (3.16) and (3.3), respectively.

Covariance Structure 3: Σ_k = D, ∀k

where W, W_0 and S(N, â) are defined in Eqs. (3.19), (3.16) and (3.3), respectively.

Covariance Structure 4: Σ_k = D_k, ∀k

L_4(Y, K) =

where W_k, W_0 and S(N, â) are defined in Eqs. (3.11), (3.16) and (3.3), respectively.
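For concreteness, the following minimal numpy/scipy sketch evaluates the Covariance Structure 1 coding length along the lines of the summary above. It is an illustration under stated assumptions, not the thesis code: we take W_k to be the scatter matrix of cluster k, W = Σ_k W_k, W_0 the scatter of the whole data set, and we approximate S(N, â) by the multinomial count N!/∏_k N_k! (the exact Eq. (3.3) is not reproduced legibly above).

```python
# A minimal sketch of L1(Y, K), assuming scatter-matrix definitions of W, W0
# and a multinomial approximation of S(N, a); not the original implementation.
import numpy as np
from scipy.special import gammaln

def scatter(X):
    """Scatter matrix sum_n (x_n - mean)(x_n - mean)^T of the rows of X."""
    D = X - X.mean(axis=0)
    return D.T @ D

def coding_length_L1(Y, labels):
    """Approximate L1(Y, K) for an N x M data matrix Y and integer labels."""
    N, M = Y.shape
    ks, counts = np.unique(labels, return_counts=True)
    K = len(ks)
    W0 = scatter(Y)
    W = sum(scatter(Y[labels == k]) for k in ks)   # pooled within-cluster scatter
    log_S = gammaln(N + 1) - sum(gammaln(Nk + 1) for Nk in counts)  # log N!/prod Nk!
    return (M * N / 2 * np.log(np.trace(W))
            + (K * M + 1) / 2 * np.log(np.trace(W0) / np.trace(W))
            + M / 2 * np.sum(np.log(counts))
            + K * M / 2 * np.log(np.e / 3)
            + log_S)
```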


3.4 A Model-Based Clustering Algorithm

Given N data vectors (denoted by Y) and K clusters, we need an appropriate clustering procedure to determine the optimal partition (or grouping) of these vectors Y into K clusters. Here by "optimal", we mean that, for a given K, the optimal partition â achieves the maximum likelihood. In other words, the negative log-likelihood L(Y|θ̂, â) is minimal.

The procedure we propose to optimally repartition K existing clusters into K + 1 new clusters is as follows: we obtain K candidate partitions, each by a binary splitting of one of the K existing clusters, followed by a regrouping of the data into K + 1 clusters. The optimal clustering among the K candidates is the one which achieves the maximum likelihood.

Therefore, the clustering analysis consists of two loops: (a) in the outer loop, K runs from 1 to K_max, a pre-selected upper bound; (b) in the inner loop, the optimal partition â of Y into K clusters is chosen and L(Y, K) is calculated according to θ̂, â. Finally, the number of clusters K* is selected if it yields the minimal L(Y, K). In the following, the procedure of the clustering algorithm is presented in Section 3.4.1 and its computational complexity is analyzed in Section 3.4.2.

3.4.1 Procedure

The flow chart of the clustering algorithm is shown in Fig. 3.1. The algorithm is off-line since it requires all the data to be available at the same time.

1. Start from K = 1, i.e., the whole data set Y is viewed as one cluster.

2. For each of the K existing clusters, compute the mean vector μ̂_k as the cluster center and the standard deviation vector σ̂_k as the cluster deviation, k = 1, ..., K.

3. In this step we will obtain K candidate partitions, each by a binary splitting of one of the K existing clusters, followed by a regrouping of the data into K + 1 clusters. For k = 1, ..., K, compute


(a) Use μ̂_1, ..., μ̂_{k−1}, μ̂_k − σ̂_k, μ̂_k + σ̂_k, μ̂_{k+1}, ..., μ̂_K as the initial centers to repartition the data into (K + 1) new clusters and obtain the association vector, according to the minimum distance principle. In other words, each data vector will be classified into the cluster whose center is the closest. Here the distance measure is the ℓ₂ norm.

(b) Compute all K + 1 new cluster centers, and repeat the repartition process a few times (say N_t times) until the cluster centers converge. Thus, the association vector â_k is obtained.

(c) Compute the negative log-likelihood L(Y|θ̂, â_k).

4. There are K different splittings in Step 3 to repartition the K existing clusters into K + 1 new clusters. The optimal splitting rule here is to choose the best splitting p which yields the minimal L(Y|θ̂, â_k), i.e.,

$$p = \arg\min_{k\in[1,\ldots,K]} L(Y|\hat\theta, \hat{a}_k).$$

Set â = â_p; i.e., â_p is the optimal association vector obtained from the above splittings and repartitions.

5. K ← K + 1 and L(Y, K) is calculated according to θ̂, â. Go to Step 2 until K reaches K_max.

6. Choose the optimal number K* such that L(Y, K*) is minimal among all L(Y, K).
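A compact Python sketch of Steps 1-6 is given below. It is a paraphrase of the procedure under stated assumptions, not the original implementation: the negative log-likelihood uses the per-cluster-variance form of Covariance Structure 2, criterion can be any of the coding lengths (e.g., the L1 sketch above), and every cluster is assumed to stay non-empty during refinement.

```python
import numpy as np

def kmeans_refine(Y, centers, n_iter=10):
    """Repartition rows of Y to the nearest center (l2 norm), then update centers."""
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        centers = np.array([Y[labels == k].mean(axis=0) if np.any(labels == k)
                            else centers[k] for k in range(len(centers))])
    return labels

def nll(Y, labels):
    """Negative log-likelihood, up to constants, with Sigma_k = sigma_k^2 I."""
    total = 0.0
    for k in np.unique(labels):
        Yk = Y[labels == k]
        total += 0.5 * Yk.size * np.log(((Yk - Yk.mean(axis=0)) ** 2).mean() + 1e-12)
    return total

def cluster(Y, K_max, criterion):
    labels = np.zeros(len(Y), dtype=int)
    history = [(criterion(Y, labels), labels)]                 # K = 1
    for K in range(1, K_max):
        centers = np.array([Y[labels == k].mean(axis=0) for k in range(K)])
        stds = np.array([Y[labels == k].std(axis=0) for k in range(K)])
        candidates = []
        for k in range(K):                                     # binary split of cluster k
            init = np.vstack([np.delete(centers, k, axis=0),
                              centers[k] - stds[k], centers[k] + stds[k]])
            lab = kmeans_refine(Y, init)
            candidates.append((nll(Y, lab), lab))
        labels = min(candidates, key=lambda t: t[0])[1]        # maximum likelihood split
        history.append((criterion(Y, labels), labels))
    value, best = min(history, key=lambda t: t[0])
    return best                    # association vector; K* = number of unique labels
```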

3.4.2 Computational Complexity

Given N M-dimensional data vectors, we start by viewing the whole data set as a cluster and then repartition the data to get a new cluster in each step until the number of clusters K reaches K_max. Below, one addition, subtraction, multiplication or division is counted as 1 flop (floating point operation).

As we observed in Section 3.4.1, the dominant cost occurs in Step 3. At the k-th stage (k = 1, ..., K):


Figure 3.1: The diagram of our model-based clustering algorithm. (a) Cluster validation. Legend: Y - data set; k - number of clusters assumed in Y; μ̂_k - mean vector of the k-th cluster; σ̂_k - standard deviation vector of the k-th cluster; K_max - maximal number of clusters assumed in Y; L(Y, K) - coding length of Y with K clusters assumed; â - association vector of Y; K* - optimal number of unknown clusters in Y.


(a) Repartition all N data vectors according to the K + 1 cluster centers.

To compute the squared distance between a data vector and a cluster center requires M subtractions, M multiplications and M additions. Thus, 3M(K + 1) flops are required for computing the squared distances between one data vector and all K + 1 cluster centers. To choose the minimum squared distance approximately takes log₂(K + 1) comparisons. The cost of these comparisons is negligible compared to 3M(K + 1) flops. Thus, to assign the N data vectors into K + 1 clusters requires 3MN(K + 1) flops.

(b) Update all K + 1 cluster centers, and repeat the repartition process N_t times until the cluster centers converge.

To compute the k-th cluster center requires MN_k additions and M divisions. Thus, to compute all K + 1 cluster centers requires

$$\sum_{k=1}^{K+1}(MN_k + M) = M(N + K + 1) \approx MN,$$

when we assume N ≫ K. Hence, to repeat (a) and (b) N_t times, it requires

(c) Compute the negative log-likelihood L(Y|θ̂, â). The computational cost in (c) is negligible compared to those in (a) and (b).

There are K candidate partitions in Step 3. Hence, the computational cost for Step 3 is

Step 3 is repeated from K = 1 to K = K_max − 1. Therefore, the total computational cost of the clustering algorithm described in Section 3.4.1 is approximately

$$MNN_t\left[(K_{\max}-1)^3 + 3.5(K_{\max}-1)^2 + 2.5(K_{\max}-1)\right]. \qquad (3.70)$$

¹To find the minimum distance of a data vector to all cluster centers, it is equivalent to find the minimum squared distance. In this way, one square root operation is saved.


It is shown in Eq. (3.70) that the computational complexity is approximately proportional to K³_max. Thus, the computational cost increases dramatically as K_max increases. To alleviate the computational burden of the model-based clustering scheme, a fast algorithm is developed in Section 5.5. In this chapter, we emphasize the development of our clustering scheme and its comparison with existing algorithms.
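As a quick sanity check, the estimate of Eq. (3.70) can be evaluated directly; the cubic growth in K_max is evident (a trivial helper, not part of the thesis):

```python
def total_flops(M, N, Nt, K_max):
    """Approximate flop count of the off-line algorithm, per Eq. (3.70)."""
    k = K_max - 1
    return M * N * Nt * (k**3 + 3.5 * k**2 + 2.5 * k)

# For example, with M = 22, N = 1000, Nt = 10, raising K_max from 5 to 10
# multiplies the cost by roughly a factor of 8 -- the cubic term dominates.
```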

3.5 Comparison with SNOB

Wallace [62] started his work on the minimum message length (MML) criterion from classification. Over the past three decades, Wallace and his co-workers [10,11,61,63] have developed and maintained a clustering program called SNOB. Basically, SNOB probabilistically assigns a data vector y_n to a cluster k, say with probability P(a_n = k). Hence, the data is conditionally modeled by Eq. (3.71), where

$$p_k = \frac{1}{N}\sum_{n=1}^{N} P(a_n = k).$$

It is also assumed in SNOB that each covariance matrix Σ_k is diagonal, and then the message length formula of Eq. (2.18) is directly applied for cluster validation. Detailed descriptions of the SNOB approach were presented in [46] and [7, Chapter 7]. Here the differences between SNOB and our method are summarized below:

• We follow the classification approach using the deterministic assignment, which only allows P(a_n = k) to be 0 or 1, instead of the mixture approach using the probabilistic assignment in the SNOB program, so the mixture models are different (compare Eqs. (3.71) and (3.1)).

• Our model parameter vector θ does not require p_k. This results in a simpler coding length.


• The theoretical analysis of our approach is mathematically tractable, but the analysis of SNOB's approach is far more difficult (see the next chapter for details).

• The deterministic assignment can be done by a simple procedure such as the k-means algorithm [33], but the probabilistic assignment requires a more complicated procedure such as the EM algorithm [66]. Therefore, our clustering algorithm is computationally simpler than SNOB. (A minimal contrast between the two assignment rules is sketched after this list.)

• For clustering, we start by viewing the whole data set as one cluster and then repartition and regroup the data to get a new cluster in each step. The SNOB program starts with a random initial estimate of the clustering structure and then splits a cluster or merges two clusters recursively.
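To make the deterministic/probabilistic distinction concrete, the following hedged sketch contrasts a k-means style hard assignment with an EM style soft assignment; it illustrates the two rules only, and is not SNOB's code.

```python
import numpy as np

def hard_assign(y, centers):
    """Deterministic rule: P(a_n = k) is 1 for the nearest center, 0 elsewhere."""
    return int(np.argmin(((centers - y) ** 2).sum(axis=1)))

def soft_assign(y, centers, sigma2=1.0):
    """Probabilistic rule: responsibilities proportional to Gaussian densities."""
    logp = -((centers - y) ** 2).sum(axis=1) / (2.0 * sigma2)
    p = np.exp(logp - logp.max())          # stabilized softmax
    return p / p.sum()                     # a full vector of P(a_n = k)
```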

3.6 Experimental Results

An empirical comparison of Autoclass, SNOB and two neural network classifiers (Kohonen's network and Adaptive Resonance Theory) was made in [59,60], where the conclusion is that, overall, statistical classifiers, especially SNOB, perform better than the neural network classifiers on both cluster validation and clustering. For our coding length measures described in Section 3.3, we demonstrated in [37,67] that they outperform some well-known non-parametric criteria. Here, to fairly compare the performance of our clustering algorithm with that of SNOB, we incorporate the same prior specification h(θ) as SNOB, which has been described in Section 3.3. We also examine the performance of the coding length measure of Eq. (3.7) based on the Bayesian Inference Criterion (BIC).

To simulate a mixture of two clusters, let N be the sample size of the two clusters in total and c be the mixing portion. Then the populations of the two clusters are cN and (1 − c)N respectively. We define the following measure for the inter-cluster distance:

$$D = \left\|\frac{\mu_1 - \mu_2}{\sigma}\right\|$$

For simplicity, we have chosen the covariance matrices of the two clusters to be identical, such that Σ_1 = Σ_2 = σ²I_M. Let the inter-cluster distance be D; we define μ_1 = [0, 0, ..., 0]^T


and μ_2 = [m, m, ..., m]^T, with m chosen so that ‖μ_1 − μ_2‖/σ = D. Furthermore, for the high dimension case (M = 22), the data is generated by adding Gaussian noise to two noiseless pulse patterns; such examples are shown in Fig. 3.2.
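A sketch of this data generator might look as follows. The placement of μ_2 along the all-ones direction with m = Dσ/√M is our reading of the definition above, chosen so that ‖μ_1 − μ_2‖/σ = D; it is illustrative rather than the original simulation code.

```python
import numpy as np

def make_mixture(N, M, c, D, sigma=1.0, seed=0):
    """Two spherical Gaussian clusters at normalized distance D."""
    rng = np.random.default_rng(seed)
    N1 = int(round(c * N))
    mu1 = np.zeros(M)
    mu2 = np.full(M, D * sigma / np.sqrt(M))   # ||mu1 - mu2|| = D * sigma
    return np.vstack([rng.normal(mu1, sigma, (N1, M)),
                      rng.normal(mu2, sigma, (N - N1, M))])
```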

In the following, we create various two-cluster mixtures with the data vector dimension M = 1, 2, 22, the mixing portion c = 0.5, 0.2, the sample size N = 40, 100, 1000 and the distance measure D = 2, 3, 4, 6, 8. We also create the case of one cluster with M = 1, 2, 22 and the sample size N = 40, 100, 1000. We employ our clustering algorithm under the four covariance structures, denoted by L1, L2, L3 and L4 respectively, and the SNOB algorithm to perform cluster validation and clustering. We also employ our clustering algorithm under the BIC of Eq. (3.7) based on Covariance Structure 4. In this way, we can fairly compare the performance of L4, BIC and SNOB. Ten trials of each mixture are carried out using L1, L2, L3, L4, BIC and SNOB. In each trial, we assume the number of clusters from 1 to 4 and then choose the number at which the corresponding criterion is minimal. The results of cluster validation for all criteria examined here, as represented by the number of times out of 10 trials that the correct number of clusters is determined, are shown in Tables 3.1 - 3.9. To further compare our clustering algorithm and SNOB, the accuracy of the corresponding clustering results is shown in Tables 3.10 - 3.12 for N = 100, for trials in which both algorithms have made the correct decision on cluster validation out of 10 trials.

From Tables 3.1 - 3.12, it can be observed that

1. None of the criteria is reliable for small and medium samples (N = 40, 100) when D < 3.

2. The performance of all criteria improves as the distance D and the sample size N increase.

3. The performance of L1, L2 and L3 is very similar to that of L4.

4. BIC is inferior to L4 and SNOB in cluster validation in general, but performs very well for the medium and large samples (N = 100, 1000).


5. The performance of L4 is superior to that of SNOB in cluster validation for the small and medium samples (N = 40, 100).

6. In the cases where both algorithms perform perfectly in cluster validation, SNOB yields slightly more accurate results in clustering than L4.

Observation 1 is easy to justify since, if D < 3, the overlap between the two clusters is very extensive, which makes it difficult for any of the criteria to work properly. Observation 2 is intuitively clear: the larger the inter-cluster distance D, the less overlap there is between the two clusters, and the higher the accuracy achieved in parameter estimation. Also, since the data here is generated using Σ_1 = Σ_2 = σ²I_M, the performance of L1, L2, L3 and L4 is likely to be more or less the same, as confirmed by Observation 3. Observation 4 is natural because the BIC of Eq. (3.7) is an asymptotic criterion. In fact, BIC has been used in many applications due to its simplicity and the fact that no prior knowledge is required.

From Observations 5 and 6, our clustering algorithm shows much higher reliability in cluster validation than SNOB, while sacrificing marginally on the accuracy in clustering.

Recall that an extra set of parameters p_k is included in SNOB, whereas such parameters have not been taken into consideration in the development of our algorithm. These parameters prescribe the probabilities of the nth data vector being generated by the kth cluster and must be estimated. This difference between SNOB and our algorithm has profound implications for their performance. For cluster validation, the probability that the nth data vector associates with the kth cluster is irrelevant, i.e., regardless of the values of these probabilities, the number of clusters remains the same. Therefore, for cluster validation, these extra parameters are nuisance parameters and their inclusion will lower the accuracy of the determination of the number of clusters; hence the new algorithm shows better performance than SNOB in cluster validation. On the other hand, the probabilities of associating the data vectors with the clusters are highly relevant parameters in the process of clustering. Therefore, their inclusion provides more information gained from the data and renders SNOB the more accurate algorithm in clustering.


The first column is for D = 4 and the second is for D = 8.

Figure 3.2: Simulated data for M=22, where the x-axis is the index of data sample points and the y-axis is the amplitude of the simulated data.


Table 3.1: Cluster validation results for two true clusters: M=1, c=0.5

Table 3.2: Cluster validation results for two true clusters: M=1, c=0.2

Table 3.3: Cluster validation results for one true cluster: M=1


Table 3.4: Cluster validation results for two true clusters: M=2, c=0.5

Table 3.5: Cluster validation results for two true clusters: M=2, c=0.2

Table 3.6: Cluster validation results for one true cluster: M=2


Table 3.7: Cluster validation results for two true clusters: M=22, c=0.5

Table 3.8: Cluster validation results for two true clusters: M=22, c=0.2

Table 3.9: Cluster validation results for one true cluster: M=22


Table 3.10: Comparison of performance of SNOB and our algorithm, M= 1

Table 3.11: Comparison of performance of SNOB and our algorithm, M=2

Table 3.12: Comparison of performance of SNOB and our algorithm, M=22


3.7 Summary

In this chapter, model-based clustering has been considered. Based on minimum encoding inference, an appropriate coding length measure is proposed for cluster validation, and the coding lengths under four different Gaussian mixture models are fully derived. The corresponding clustering algorithm is developed.

Judging from the performance comparison, our coding length measure outperforms the BIC in cluster validation since it is not based on the large sample assumption. More importantly, our clustering algorithm shows much higher reliability in cluster validation than SNOB, while sacrificing marginally on the accuracy in clustering. Indeed, our clustering algorithm is well designed to effectively process high dimensional data with satisfactory performance on small and medium samples. Thus, the new algorithm is an attractive approach for the clustering problem in intra-pulse analysis, in terms of the first two issues pointed out in Section 1.1.


Chapter 4

Detection Performance Analysis

4.1 Introduction

The error analysis of model-based clustering concerns the estimation accuracy of the number of components in a Gaussian mixture. This is a problem that has not been completely solved in statistics, as stated at the end of Section 1.2. In this chapter, we conduct a binary detection performance analysis of our clustering algorithm under Covariance Structure 1 described in Section 3.3.1, by estimating the two types of errors: miss and false alarm. We examine the impact of the penalty weight on the error probability under the framework of the penalized likelihood method described in Section 2.4.

Under Covariance Structure 1, each of the K clusters is assumed to be a sample from a multivariate normal distribution N(μ_k, σ²I_M). The log-likelihood function Eq. (3.21) is rewritten here

and the penalty function, obtained by combining Eqs. (3.25), (3.17), (3.14) and (3.28), is

$$\Pi(K) = \frac{KM+1}{2}\log\frac{\mathrm{tr}(W_0)}{\mathrm{tr}(W)} + \frac{M}{2}\sum_{k=1}^{K}\log N_k + \frac{KM}{2}\log\frac{e}{3} + \log S(N,\hat{a}) \qquad (4.2)$$

where N_k is the number of members in the kth cluster; W, W_0 and S(N, â) are defined in Eqs. (3.19), (3.16) and (3.3), respectively.


Therefore, the total description length is

$$DL(K) = L(Y|\hat\theta, \hat{a}) + \Pi(K). \qquad (4.3)$$

The detection will select the number of clusters to be K* if

$$K^* = \arg\min_{1\le K\le N} DL(K). \qquad (4.4)$$

We investigate the binary detection performance here. Given a data set with N observation vectors, let W be the sample covariance matrix when taking the whole data set as one cluster. By some classification, the data set is partitioned into two clusters, one with a sample size N_1 and a sample covariance matrix W_1, and the other with a sample size N_2 and a sample covariance matrix W_2. In addition, we let c be the mixing portion, i.e., N_1 = cN and N_2 = (1 − c)N, where 0 < c < 1. Define

and

$$\cdots + \frac{M}{2}\log\frac{c(1-c)Ne}{3} + \log\frac{N!}{(cN)!\,((1-c)N)!}. \qquad (4.6)$$

Therefore, to evaluate the error performance, we seek the distribution of the trace ratio t_w or its variation.

4.2 Probability of A Miss

A miss occurs when two clusters are embedded in a data set but the binary detection says that only one cluster exists. A simple illustration is shown in Fig. 4.1. Let H_1 and H_2 denote respectively the hypotheses of one cluster and two clusters; the miss probability given H_2 is


Figure 4.1: Two Gaussian clusters

Let

Different partition criteria may result in different values of R_a. Here we assume that our clustering algorithm can separate two Gaussian clusters perfectly. This assumption is reasonable when the two clusters are well separated. We expect more deviation from the assumption when the two clusters are closer to each other in distance. Under the perfect separation assumption, it is proven in Appendix B that R_a is distributed as a noncentral F distribution F_{M,M(N−2)}(δ) with the noncentral parameter

$$\delta = c(1-c)ND^2 \qquad (4.10)$$

and D is the normalized inter-cluster distance defined by

$$D = \left\|\frac{\mu_1 - \mu_2}{\sigma}\right\|.$$

Substituting Eqs. (4.5) and (4.6) into Eq. (4.8), we have

$$P_m = P\left\{\cdots + \frac{M}{2}\log\frac{c(1-c)Ne}{3} + \log\frac{N!}{(cN)!\,((1-c)N)!} > 0 \,\Big|\, H_2\right\}. \qquad (4.11)$$

Rewriting the above equation in terms of R_a, we then have


Figure 4.2: The illustration of Pm

where the threshold

and the value of P_m is represented by the shaded area in Fig. 4.2.

From [34, Chapter 30], the mean and variance of a noncentral F distribution with ν_1, ν_2 degrees of freedom and a noncentral parameter δ are given respectively by

$$E[F_{\nu_1,\nu_2}(\delta)] = \frac{\nu_2(\nu_1+\delta)}{\nu_1(\nu_2-2)} \qquad (\nu_2 > 2) \qquad (4.14)$$

$$\mathrm{Var}[F_{\nu_1,\nu_2}(\delta)] = 2\left(\frac{\nu_2}{\nu_1}\right)^2 \frac{(\nu_1+\delta)^2 + (\nu_1+2\delta)(\nu_2-2)}{(\nu_2-2)^2(\nu_2-4)} \qquad (\nu_2 > 4). \qquad (4.15)$$

Here ν_1 = M, ν_2 = M(N − 2) and δ is specified by Eq. (4.10). Substituting these values into Eqs. (4.14) and (4.15) and assuming that N ≫ 1, we have

$$E[R_a] \simeq 1 + \frac{\delta}{M} = 1 + \frac{c(1-c)D^2 N}{M}, \qquad (4.16)$$

$$\mathrm{Var}[R_a] \simeq \frac{\left[2c^2(1-c)^2 D^4 + 4c(1-c)D^2 M\right]N}{M^3}. \qquad (4.17)$$

Property 1: Given the dimension M and the mixing portion c, we define D_0 such that

If D > D_0, then P_m tends to 0 as the number of observations N increases. If D < D_0, then P_m tends to 1 as the number of observations N increases.


Proof: By some mathematical manipulations, we have

To obtain Eq. (4.19), we have used the fact that $\frac{N!}{(cN)!\,((1-c)N)!} \simeq \left[c^{-c}(1-c)^{-(1-c)}\right]^N$ for large N, as shown in Appendix C.

First, let us check the case when D > D_0. Given M and c, we know from Eq. (4.19) that F_{P_m} < E[R_a] asymptotically if and only if D > D_0. By using Chebyshev's inequality [53, Page 69], we have

From Eqs. (4.13), (4.16) and (4.17), we know that Var[R_a] is proportional to N but (E[R_a] − F_{P_m})² is proportional to N². Thus,

The left hand side of Eq. (4.20) is the area under the p.d.f. of R_a over the intervals (−∞, F_{P_m}] and [2E[R_a] − F_{P_m}, +∞). Hence,

Second, we check the case when D < D_0. In this case, F_{P_m} > E[R_a] for a sufficiently large N. Similarly, by using Chebyshev's inequality for N → ∞, we have

Thus,

$$\lim_{N\to\infty} P\{R_a \le 2E[R_a] - F_{P_m}\ \text{or}\ R_a \ge F_{P_m}\} = 0.$$

This further implies

$$\lim_{N\to\infty} P\{2E[R_a] - F_{P_m} < R_a < F_{P_m}\} = 1.$$

Consequently,

$$\lim_{N\to\infty} P_m = \lim_{N\to\infty} P\{R_a < F_{P_m}\} = 1.$$


Table 4.1: The critical distance D_0

Therefore, Property 1 holds. Q.E.D.

If it happens that D = D_0, then F_{P_m} is asymptotically equal to E[R_a] and P_m is a positive number between 0 and 1. Some values of D_0 given M and c are listed in Table 4.1.

Indeed, D_0 is a critical distance. When D > D_0, the two clusters are well separated, the probability density function (p.d.f.) of the mixture is bimodal, and our clustering algorithm can successfully separate these two clusters. When D ≤ D_0, the p.d.f. of the mixture becomes unimodal, the overlap between the two clusters is very extensive, and the similarity makes it difficult for our algorithm to work properly, as for other existing algorithms.

Property 2: P_m is a monotonically decreasing function of the normalized inter-cluster distance D = ‖(μ_1 − μ_2)/σ‖.

Proof: P{F_{M,M(N−2)}(δ) < F_{P_m}} is a decreasing function of δ; see [34, Page 193] for details. Property 2 holds since δ is proportional to the square of D, as shown in Eq. (4.10). Q.E.D.
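Both properties are easy to check numerically from the noncentral F distribution. The sketch below uses scipy; the threshold F_Pm is passed in as a parameter because the closed form of Eq. (4.13) is not reproduced legibly above.

```python
# A numerical sketch of Pm under the perfect-separation assumption; the
# argument F_Pm is a placeholder for the threshold of Eq. (4.13).
from scipy.stats import ncf

def miss_probability(M, N, c, D, F_Pm):
    """Pm = P{R_a < F_Pm | H2}, with R_a ~ F_{M, M(N-2)}(delta), Eq. (4.10)."""
    delta = c * (1 - c) * N * D**2
    return ncf.cdf(F_Pm, M, M * (N - 2), delta)

# ncf.stats(M, M*(N-2), delta, moments='mv') recovers the exact mean and
# variance that Eqs. (4.16)-(4.17) approximate for large N.
```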

Figs. 4.3 - 4.8 show the theoretical P_m and the testing P_m vs. the number of observations N for the mixtures of two clusters considered in Section 3.6. The dotted lines in these figures are the theoretical P_m curves and the solid lines are the testing P_m curves based on 100 trials for each N. From these figures, Properties 1 and 2 are clearly observed. We also notice that there is a discrepancy between the theoretical P_m and the testing one. The reason is that the actual partition may deviate more or less from the ideal partition which separates the two clusters perfectly. This discrepancy increases as the two clusters become closer in distance.


Figure 4.3: Miss probability curves for two true clusters: M=1, c=0.5

Figure 4.4: Miss probability curves for two true clusters: M=1, c=0.2


Figure 4.5: Miss probability curves for two true clusters: M=2, c=0.5

Figure 4.6: Miss probability curves for two true clusters: M=2, c=0.2


Figure 4.7: Miss probability curves for two true clusters: M=22, c=0.5

Figure 4.8: Miss probability curves for two true clusters: M=22, c=0.2


4.3 Probability of A False Alarm

A false alarm occurs when one cluster is embedded in a data set but the binary detection says that two clusters exist. A simple illustration is shown in Fig. 4.9. So the false alarm probability, given H_1 (the hypothesis of one cluster), is

In the miss case, P_m is exactly analyzed according to the Gaussian mixture model. However, in the false alarm case, the model is a mixture of truncated normal distributions, whose exact analysis is still incomplete in statistics.

Figure 4.9: One Gaussian cluster

The key to the analysis is the understanding of how our clustering algorithm will partition an N(μ, σ²I_M) sample into two clusters. Given a large sample, the sample mean vector μ̂ and the sample standard deviation vector σ̂ of the data are close to the true ones μ and σ respectively. By our partition described in Section 3.4, the two new clusters are centered at μ̂ − σ̂ and μ̂ + σ̂ respectively. Therefore, the partition is nearly symmetric about the true mean vector and produces two clusters of about equal size, i.e., the mixing portion c ≈ 0.5. Define


Assuming that the sample is reasonably large and following Hartigan's work [32], Hawkins [23, Page 339] suggested that asymptotically, R_f is approximately normal N(μ_{R_f}, σ²_{R_f}) with the mean

and the variance

$$\sigma^2_{R_f} = \frac{2\pi^2 M(\pi M - 2M - 1)}{N(\pi M - 2)^2}.$$

Substituting Eqs. (4.5) and (4.6) into Eq. (4.26), we have

$$P_f = P\left\{\cdots + \frac{M}{2}\log\frac{eN}{12} + \log\frac{N!}{(0.5N)!\,(0.5N)!} < 0 \,\Big|\, H_1\right\}.$$

Rewriting the above equation in terms of R_f, we then have

where the threshold

and the value of P_f is represented by the shaded area in Fig. 4.10.

Figure 4.10: The illustration of P_f

Property 1: P_f tends to 0 as the number of observations N increases.


Proof: It is easy to verify that

$$\lim_{N\to\infty}\left(F_{P_f} - \mu_{R_f}\right) > 0, \qquad (4.32)$$

$$\lim_{N\to\infty}\sigma^2_{R_f} = 0. \qquad (4.33)$$

Some of these limiting values are listed in Table 4.2. By using Chebyshev's inequality [53, Page 69], we have

From Eqs. (4.32) and (4.33), we know that

The left hand side of Eq. (4.34) is the area under the p.d.f. of R_f over the intervals (−∞, 2E[R_f] − F_{P_f}] and [F_{P_f}, +∞). Hence,

$$\lim_{N\to\infty} P_f = \lim_{N\to\infty} P\{R_f > F_{P_f}\} = 0. \qquad (4.36)$$

Therefore, P_f tends to 0 asymptotically. Q.E.D.

Figs. 4.11 - 4.13 show the testing P_f vs. the number of observations N for the one true cluster cases considered in Section 3.6. The solid lines are the testing P_f curves based on 1000 trials for each N. The corresponding theoretical values of P_f are vanishingly small. It is observed from Figs. 4.11 - 4.13 that the testing P_f is negligible for medium and large samples, although a small probability of false alarm does exist for small samples.


Figure 4.11: False alarm probability curves for one true cluster: M=l

Figure 4.12: False alarm probability curves for one true cluster: M=2

Figure 4.13: False alarm probability curves for one true cluster: M=22


4.4 Optimal Range of Penalty Weight

If the penalty function in Eq. (4.3) is multiplied by a penalty weight λ as described in Section 2.4, then under Covariance Structure 1, we have

By the same reasoning as that used for Property 1 in Section 4.2, we require that

in order to make P_m → 0 asymptotically. This requirement lets us determine an upper bound for the penalty weight.

Similarly, by the same reasoning used to prove Property 1 in Section 4.3, we require that

in order to make P_f → 0 asymptotically. This requirement lets us determine a lower bound for the penalty weight

$$\lambda_{\min} = -\frac{M}{2}\log\frac{\pi M - 2}{\pi M}. \qquad (4.40)$$

According to Eqs. (4.39) and (4.40), we can tabulate in Table 4.3 the (asymptotically) optimal penalty weight ranges for the cases considered in Section 3.6.

Table 4.3: Optimal ranges of the penalty weight

Apparently, penalty weights within the optimal ranges listed above may have different impacts on our clustering algorithm's performance for processing small or medium samples,


although their impacts are the same in the asymptotic sense. In practice, the number of clusters is searched from 1 to a pre-selected upper bound K_max (> 2); one true cluster might be partitioned into a few groups if the penalty weight is not large enough. In this case, we could choose the penalty weight slightly larger. For example, Tables 4.4 and 4.5 show the clustering results using λ = 1.1 for the case M = 22. It is observed that the performance of our clustering algorithm on small samples is slightly improved when the penalty weight increases from 1 (see Tables 3.7 and 3.9) to 1.1 (see Tables 4.4 and 4.5).

Table 4.4: Cluster validation results for two true clusters: M=22, c=0.5, λ = 1.1

Table 4.5: Cluster validation results for one true cluster: M=22, λ = 1.1

4.5 Summary

In this chapter, we have conducted a binary detection performance analysis of our clustering algorithm under Covariance Structure 1. We assumed that a partition can separate two Gaussian clusters perfectly in order to analyze the miss probability, and that the partition of one Gaussian cluster is nearly symmetric about the cluster center in order to analyze the false alarm probability. Extensive tests show that these two assumptions are satisfied fairly well by our clustering algorithm developed in Section 3.4. Among the four factors considered here (the dimension of the data space M, the sample size N, the mixing portion c and


the inter-cluster distance D), D is the most important factor. There is a critical distance D_0, defined in Eq. (4.18): when D > D_0, our clustering algorithm can successfully separate two clusters. On the other hand, when D ≤ D_0, the overlap between the two clusters is very extensive and it is difficult for our algorithm to work properly, as for other existing algorithms.

Furthermore, we have examined the impact of the penalty weight under the framework of the penalized likelihood method as described in Section 2.4. It is found that there is a range of the penalty weight within which the best performance of our clustering algorithm can be achieved. Therefore, with some supervision, we can adjust the penalty weight to further improve the performance of our clustering algorithm.


Chapter 5

Application to Intrapulse Analysis

5.1 Introduction

We consider the situation where a radar intercept receiver collects incoming pulse samples from a number of unknown emitters. Our objectives are to (1) determine the number of emitters present (cluster validation); (2) classify the incoming pulses according to the emitters from which they originate (clustering). The concept of intrapulse analysis has been introduced in Section 1.1. Briefly, the determination in intrapulse analysis is only based on intrinsic pulse shapes, without any inter-pulse information such as pulse repetition intervals, directions of arrival, carrier frequencies, or Doppler shifts.

In this chapter, we first describe the preprocessing techniques, including data compression, for received pulses, and then formulate the problem of emitter number detection and pulse-emitter association as a multivariate clustering problem. After applying the new clustering algorithm developed in Chapter 3 to the clustering problem, we develop two on-line clustering algorithms: one is based on known thresholds while the other is based on a model-based detection scheme. Performance on intrapulse data using our clustering algorithms and SNOB is reported, and the results demonstrate that our new clustering algorithms are very effective for intrapulse analysis, especially the on-line model-based algorithm.


5.2 Signal Model And Pre-Processing of Received Pulses

Let us first examine the signal representation of the received pulses. The physical scenario is illustrated in Fig. 5.1, in which there are, in total, K distinct emitters. The radar intercept receiver receives altogether N non-overlapping pulses from the emitters. We designate the nth received pulse by z_n(t; a_n), n = 1, ..., N. Here a_n is an association parameter which assumes an integer value, a_n ∈ {1, ..., K}, such that if a_n = k, then the nth pulse is determined to be from the kth emitter. We can therefore express the nth pulse as

where

• η_n denotes the absolute amplitude of the received pulse;

• φ_n denotes the added phase of the received pulse after transmission;

• τ_n denotes the time delay of the received pulse with respect to the reference;

• ω_n denotes the residual carrier frequency of the nth pulse;

• ν_n(t) is the Gaussian noise accompanying the nth pulse.

The received pulse in Eq. (5.1) contains several nuisance parameters: η_n, φ_n, τ_n and ω_n. These parameters are of no use in intra-pulse analysis and should be removed. This is carried out by the pre-processing techniques introduced in the following paragraphs. These pre-processing techniques are intuitive in nature and are carried out so that, after preprocessing, the pulses received from the same emitter maintain their resemblance to each other, while those from different emitters maintain their distinctive features.


Figure 5.1: Radar pulses received for intrapulse analysis (K distinct emitters; a receiver receiving N pulses; inter-band and inter-pulse information not usable).


Noise Suppression

The received pulses are passed through a band pass filter suppressing the out-of-band noise. We now define the amplitude and phase profiles of the received signal respectively as:

In the rest of the pre-processing, it is assumed that the SNR is reasonably large so that the noise contribution has negligible effects.

5.2.1 Amplitude Normalization

Let S_n(ω; a_n) and A_{a_n}(ω) be the Fourier transforms of s_n(t; a_n) and a_{a_n}(t), respectively. We remove the parameter η_n by a simple procedure of normalization resulting in S_n'(ω; a_n) such that

$$S_n'(\omega; a_n) = \frac{S_n(\omega; a_n)}{S_n(0; a_n)} = \frac{\eta_n A_{a_n}(\omega)e^{-j\omega\tau_n}}{\eta_n A_{a_n}(0)} = A_{a_n}^{-1}(0)\,A_{a_n}(\omega)\,e^{-j\omega\tau_n}. \qquad (5.4)$$

Therefore, ã_n(t; a_n), the inverse Fourier transform of S_n'(ω; a_n), can be viewed as the normalized amplitude profile.
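A minimal numpy sketch of this normalization, assuming complex baseband samples of one pulse:

```python
import numpy as np

def normalize_amplitude(pulse):
    """Divide the spectrum by its DC value, removing the amplitude factor eta_n."""
    S = np.fft.fft(pulse)
    return np.fft.ifft(S / S[0])    # S'(w) = S(w)/S(0), cf. Eq. (5.4)
```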

5.2.2 Time Alignment Based on Thresholding

After amplitude normalization, the removal of the time shift is considered. The time shift can be removed by locating the first point, ã_n(t_0; a_n), of ã_n(t; a_n) whose magnitude is larger than a pre-set threshold A, i.e.,

$$t_0 = \min\{t : |\tilde{a}_n(t; a_n)| > A\}. \qquad (5.5)$$

We then set τ_n = t_0 and define


We use the same threshold A for all amplitude profiles to align the pulses on the time axis.
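A sketch of the alignment step, where A is the common pre-set threshold:

```python
import numpy as np

def align(profile, A):
    """Shift a pulse so that its first sample with magnitude above A is at t = 0."""
    above = np.flatnonzero(np.abs(profile) > A)
    t0 = above[0] if above.size else 0     # Eq. (5.5)
    return profile[t0:]
```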

5.2.3 Phase Adjustment Based on Polynomial Fitting

After time alignment, the linear slope in the phase profile should be removed. The linear function can be estimated by polynomial curve fitting. Let

By minimizing the error between the fitted line and the phase profile in a least-squares sense, the coefficients ω_n and ψ_n can be determined. Then, the original phase φ_{a_n}(t) can be recovered in a new coordinate system with the time axis being ψ_n + ω_n t (Fig. 5.2) when we define

where γ = arctan ω_n.
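In code, this least-squares fit is a first-degree polynomial fit to the unwrapped phase; a minimal sketch:

```python
import numpy as np

def remove_linear_phase(pulse, T=1.0):
    """Fit psi_n + w_n * t to the unwrapped phase and subtract it."""
    t = np.arange(len(pulse)) * T
    phase = np.unwrap(np.angle(pulse))
    w_n, psi_n = np.polyfit(t, phase, 1)          # slope, intercept
    return np.abs(pulse) * np.exp(1j * (phase - psi_n - w_n * t))
```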

We denote by ỹ_n(t; a_n) the resulting nth pulse with noise after preprocessing to remove the nuisance parameters, i.e.,

where ν̃_n(t) is the noise after preprocessing and s_{a_n}(t) = A_{a_n}^{-1}(0) a_{a_n}(t) e^{jφ_{a_n}(t)} is the ideally preprocessed pulse waveform. Notice that s_{a_n}(t) is only dependent on a_n, the index of the emitter with which this pulse signal is associated.

In practice, the above preprocessing procedures for the received pulses are carried out in discrete time. Thus, Eq. (5.10) can be written as


Figure 5.2: Polynomial fitting for phase adjustment

where T is the sampling interval of the pulses and M' is the number of samples in a pulse; or in vector form

$$y_n(a_n) = s_{a_n} + \tilde{\nu}_n \qquad (5.12)$$

with

5.2.4 Data Compression Using Wavelet Decomposition

The number of samples M' in each preprocessed pulse is typically over 100. Thus, the number of samples to be processed for cluster validation and clustering is very large. In order to lighten the computational and processing load, we have to compress the data. We note that the classification of pulses by intrapulse analysis is based on pulse shapes (amplitude and phase). Now, the low frequency components of a pulse reflect its basic shape. A suitable technique for compressing the data while retaining the basic pulse shape is wavelet decomposition [21,38], by which the low frequency coefficients can be extracted, retaining the pulse shape information. Wavelet decomposition is carried out by using a chosen filter bank in which each one of the filters is followed by down-sampling by 2. In our case, only the low frequency components are needed and undergo further


Figure 5.3: Data compression using wavelet decomposition

decomposition, as shown in Fig. 5.3. Due to the process of down-sampling by 2, the size of each of the outputs of the low pass filter (LPF) and of the high pass filter (HPF) after one stage of decomposition is only half that of the input. Therefore, for a three-staged wavelet decomposition, the output sample size is only one-eighth of the original input sample size. The number of stages in a wavelet decomposition is a trade-off between the amount of data compression and the degree of pulse shape information retained. Here, we employ a three-staged filter bank with symlet filters [21]. The coefficients of these LP and HP filters are indicated in Fig. 5.3.

We denote the data vector of Eq. (5.12) after compression by y_n(a_n), which is comprised of compressed signal and noise. Each of these complex data vectors is of dimension M, which is only a fraction of the dimension of the original data vector.
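With PyWavelets, the three-stage compression can be sketched as below; 'sym4' is our stand-in for the symlet filter of [21], whose exact order is specified only in Fig. 5.3.

```python
import pywt

def compress(pulse, wavelet='sym4', level=3):
    """Keep only the final approximation (low frequency) coefficients."""
    lo_real = pywt.wavedec(pulse.real, wavelet, level=level)[0]
    lo_imag = pywt.wavedec(pulse.imag, wavelet, level=level)[0]
    return lo_real + 1j * lo_imag    # about 1/8 of the input length
```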


5.3 Clustering Algorithms for Intrapulse Analysis

After preprocessing, all received pulses are well aligned. Then intrapulse analysis is considered as a multivariate clustering problem: given a compressed data set Y consisting of N observed data vectors y_1(a_1), ..., y_N(a_N), each of dimension M, our objectives are

1. cluster validation: to determine the number of emitters present, K;

2. clustering: to determine the association parameter a_n so that, for a_n = k, y_n(a_n) is determined to be from the kth emitter.

The preprocessing is done in the complex domain because amplitudes and phases are considered separately. For clustering, we form each data vector in the real domain by putting its real part first and then its imaginary part, i.e.,

From now on, each data vector y_n is assumed to be in the real domain.
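The real-domain stacking is one line:

```python
import numpy as np

def to_real(y):
    """Stack the real part first, then the imaginary part, into one real vector."""
    return np.concatenate([y.real, y.imag])
```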

Generally, the noise accompanying a radar pulse vector is Gaussian. Hence, a set of pulse vectors emitted by the kth emitter is a sample of a multivariate normal distribution. Is Gaussianity still maintained after preprocessing? In Appendix D, Monte Carlo simulations show that compressed preprocessed pulses can still be assumed to be Gaussian with relatively high confidence. Hence, a set of pulse vectors from K distinct emitters can be modeled by a mixture of K multivariate normal distributions [43]. Therefore, the model-based clustering algorithm developed in Chapter 3 can be directly deployed.

As introduced in Section 1.1, a clustering algorithm for intrapulse analysis should be capable of processing high dimensional data and of producing satisfactory results for small or medium sample cases. In Chapter 3, we developed a suitable model-based clustering algorithm and demonstrated by extensive simulations that it outperforms other existing algorithms such as SNOB. Indeed, the model-based clustering algorithm we developed is well designed, in an off-line mode, to effectively process high dimensional data with satisfactory


performance on small or medium samples. In some cases, on-line clustering would be more desirable because we may wish to classify received pulses as they arrive dynamically. This motivates us to develop on-line clustering algorithms. One approach is to set up some thresholds, and then assign an incoming pulse to an existing cluster or form a new cluster according to the thresholds. One realization of this approach is described in Section 5.4. However, this simple approach may not provide satisfactory results when the statistics of the received data change in time. To overcome this drawback, we develop in Section 5.5 a novel on-line clustering algorithm in which no explicit thresholds are required. Performance on intrapulse data is reported in Section 5.6 using the model-based clustering algorithm, SNOB, and the two on-line algorithms.

5.4 An On-line Clustering Algorithm Using Thresholds

We first fix two thresholds t_1 and t_2, the possible candidates being the maximal intra-cluster dispersion and the minimal inter-cluster distance, respectively. Let d be the minimal distance between a pulse and each cluster center. Then, an incoming pulse is assigned to an existing cluster when d ≤ t_1, or assigned to a new cluster when d ≥ t_2, or held in a store for subsequent classification. This algorithm is on-line.

5.4.1 Procedure

The diagram of this algorithm is shown in Fig. 5.4.

1. Initialization:

   i) Choose two thresholds:
      t_1 - possibly maximal intra-cluster dispersion,
      t_2 - possibly minimal inter-cluster distance,
      with 0 < t_1 < t_2. These two thresholds can be determined experimentally using existing data (with known ground truth).

   ii) Take data vector y_1 and assign it to C_1;

Page 87: was - Library and Archives Canada · Acknowledgement I have received both constant encouragement and expert supervision fiom my supervisors Dr. K.M. Wong and Dr. Z.Q. Luo.hm both

m t 1; (m counts the number of clusters)

n t 1; (n counts the number of data vecton)

2. The min process:

Urhile theré is anothex data vector do

n t n + 1; Tdce -or un h m the iaput;

Compute d = dkE[i,...,m] ,]JJu,, - mcon of C&

Find k for which d is minimum;

if d 5 tl ,

wign y, to Ck;

e k i f d z t 2

m + m + l ;

store y,;

if the storage size 2 TI, then goto step 3;

end;

end;

end;

3. The storage pmeess:

If the store is empty,

et0 step 2;

else

n t O;

execute step 2 but now the vector y, is taken fiom the store;

if the store bas changed,

goto $tep 3;

else

Page 88: was - Library and Archives Canada · Acknowledgement I have received both constant encouragement and expert supervision fiom my supervisors Dr. K.M. Wong and Dr. Z.Q. Luo.hm both

mve the current y, to a mt-identification store;

end;

end;

4. The postiidentitication pmcess:

For each àata vector y, in the partidentification store,

assiga it to the cluster which is narmrt.
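To make the procedure concrete, the following is a minimal Python sketch of the threshold scheme above (the actual experiments in this thesis were run in MATLAB). The function name, the Euclidean distance and the running-mean center update are illustrative choices; only the t_1/t_2/T_s decision logic follows the procedure.

```python
import numpy as np

def threshold_clustering(pulses, t1, t2, store_limit):
    """On-line clustering with two thresholds (sketch of Section 5.4.1).

    pulses      : (N, M) array of preprocessed pulse vectors
    t1, t2      : intra-/inter-cluster thresholds, 0 < t1 < t2
    store_limit : store size T_s that triggers the storage process
    """
    centers, sizes = [], []              # running cluster means and counts
    labels = np.full(len(pulses), -1)
    store = []                           # indices of pulses held back

    def try_assign(i):
        y = pulses[i]
        if centers:
            d = [np.linalg.norm(y - c) for c in centers]
            k = int(np.argmin(d))
            if d[k] <= t1:               # close enough: join cluster k
                sizes[k] += 1
                centers[k] += (y - centers[k]) / sizes[k]
                labels[i] = k
                return True
            if d[k] < t2:                # ambiguous zone: hold the pulse
                return False
        centers.append(y.astype(float))  # far from everything: new cluster
        sizes.append(1)
        labels[i] = len(centers) - 1
        return True

    for i in range(len(pulses)):         # 2. the main process
        if not try_assign(i):
            store.append(i)
        if len(store) >= store_limit:    # 3. the storage process
            store = [j for j in store if not try_assign(j)]

    for j in store:                      # 4. post-identification
        labels[j] = int(np.argmin([np.linalg.norm(pulses[j] - c)
                                   for c in centers]))
    return np.array(centers), labels
```

As discussed next, the cost of the main loop is dominated by the distance computations against all existing centers.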

5.4.2 Computational Complexity

Suppose that after the on-line process N M-dimensional data vectors are assigned to K clusters, so that the k-th cluster has N_k data vectors (k = 1, ..., K). Obviously, N_1 + N_2 + ... + N_K = N. The store size T_s is usually set to be a small number so that the computational cost of the storage process is negligible compared to that of the main process. However, the computational cost of the main process may vary when the same set of pulses arrives in a different sequence. This makes an exact complexity analysis difficult. In this subsection, an upper complexity bound and an average complexity are analyzed; then an example is illustrated in the end.

Upper Bound

In the main process, there are two dominant operations:

1. For each incoming data vector, its distances to all existing k clusters are computed and then the minimum distance is chosen.

   As discussed in Step (a) of Section 3.4.2, this operation approximately requires 3Mk flops. There are at most K clusters, so the computational cost of this operation to process all N data vectors is upper-bounded by 3MNK flops.

2. After an incoming data vector is assigned to an existing cluster, the cluster center has to be updated.


Figure 5.4: The diagram of our on-line clustering algorithm using thresholds. (a) The main process. (b) The storage and post-identification processes.


As discussed in Step (b) of Section 3.4.2, if the n-th member of a cluster arrives, it requires M(n + 1) flops to update its center. Note that there is no cluster center update when the first member of a cluster arrives. Thus, the computational cost of this operation to process all N data vectors is upper-bounded by approximately 0.5MN^2 − 0.5MN flops, where we have used the facts that N_1 + ... + N_K = N and N_1^2 + ... + N_K^2 ≤ N^2.

Therefore, the upper complexity bound of the on-line clustering algorithm described in Section 5.4.1 is

    B_ul = MN(3K − 0.5) + 0.5MN^2.    (5.13)

Average Complexity

Let us examine the case when each cluster has the same number of members (i.e., N_1 = N_2 = ... = N_K = N/K). Since the computational cost of step 1 in the main process is sensitive to the sequence in which the pulses arrive, we assume that the members of the first cluster arrive first, and then those of the second cluster arrive next, and so on. The computational cost for this case is roughly an average complexity of the on-line clustering algorithm. Similarly as in the upper-bound analysis, the computational costs of Step 1 and Step 2 in the main process here are given respectively by approximately 1.5MN(K + 1) and 0.5MN^2/K.

Therefore, the average complexity of the on-line clustering algorithm described in Section 5.4.1 is approximately

    C_avg = 1.5MN(K + 1) + 0.5MN^2/K.    (5.14)


Example: Let us consider the example shown in Section 5.6.2; there are 100 44-dimensional preprocessed pulse vectors (N = 100, M = 44) and 5 emitters (K = 5). Substituting the values of N, M and K in Eqs. (5.13) and (5.14), we have the upper bound and the average complexity as follows:
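For a rough sense of scale, evaluating the expressions above with these numbers gives approximately

    B_ul = 44 × 100 × (15 − 0.5) + 0.5 × 44 × 100^2 ≈ 0.28 Mflops,
    C_avg = 1.5 × 44 × 100 × 6 + 0.5 × 44 × 100^2/5 ≈ 0.08 Mflops.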

Now consider using the off-line model-based algorithm developed in Section 3.4. Given that N_t = 5 (N_t is the number of iterations for cluster center convergence) and that K_max = 8 (K_max is the maximal number of clusters to be considered), then according to Eq. (3.70) we have

    C_d = 11.70 Mflops,

which is the computational cost of using the model-based algorithm for the above example. Obviously the on-line algorithm is much faster than the off-line model-based algorithm. Unfortunately, the performance of the on-line algorithm is usually inferior to that of the model-based algorithm, as will be shown in Section 5.6.2.

5.5 An On-line Model-Based Clustering Algorithm

A major advantage of the following new algorithm is that no explicit thresholds are required. The algorithm dynamically incorporates cluster splitting, merging and regrouping operations by using the model-based detection modified from the clustering algorithm in Section 3.4.

On-line Process

As incoming data vectors are being classified into clusters, the sizes of clusters will continue to grow. A size increment counter is set for each cluster, and a cluster is checked if its size increment counter reaches a preset threshold, T_c. In the very beginning, the first T_c pulses are assigned to the first cluster. Checking a cluster or two clusters involves a binary detection function defined as follows:

    [K*, C*_1, C*_2] = bdetector(data).

If data comes from a single cluster, the bdetector function returns
    either K* = 1, indicating no splitting of the cluster;
    or K* = 2, indicating splitting the cluster into C*_1 and C*_2.
If data comes from two clusters, the bdetector function returns
    either K* = 1, indicating merging the two clusters into one;
    or K* = 2, indicating regrouping into the clusters C*_1 and C*_2.

This binary detection function is easily formed by setting K_max = 2 in the clustering algorithm described in Section 3.4.

As mentioned previously, a cluster is checked when its size increment counter reaches a preset threshold T_c. If the returned value K* = 1, then the size increment counter is reset to 0; if K* = 2, then the cluster is split into two new ones C*_1 and C*_2. Since these two new clusters need to be merged or regrouped with other existing clusters, the closest existing cluster is found for each new cluster and both of them are sent to bdetector to check if they should be merged or regrouped. Therefore, as pulses arrive, the sizes of clusters increase, and the algorithm conducts cluster splitting, merging and regrouping operations appropriately by using the above scheme.
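For illustration, the sketch below shows one way such a binary detection function could be realized in Python. The thesis builds bdetector from the Chapter 3 coding-length criterion with K_max = 2; since that criterion is not restated in this chapter, a generic two-component spherical-Gaussian score with a BIC-style penalty is substituted here as a stand-in, so the decision rule (not just the interface) is an assumption.

```python
import numpy as np

def bdetector(data, seed=0):
    """Binary detection stand-in: decide 1 vs 2 clusters for `data`.

    Returns (k_star, part1, part2); part2 is None when k_star == 1.
    A spherical-Gaussian BIC score replaces the thesis's coding-length
    criterion, so the decision rule here is only illustrative.
    """
    n, m = data.shape
    if n < 4:                       # too few points to consider a split
        return 1, data, None
    rng = np.random.default_rng(seed)

    def score(groups, k):
        # negative log-likelihood (spherical Gaussians) + BIC penalty
        nll = 0.0
        for g in groups:
            var = max(g.var(), 1e-12)
            nll += 0.5 * g.size * (1 + np.log(2 * np.pi * var))
        return nll + 0.5 * k * (m + 1) * np.log(n)

    # one-cluster hypothesis
    s1 = score([data - data.mean(axis=0)], 1)

    # two-cluster hypothesis via a few iterations of 2-means
    centers = data[rng.choice(n, 2, replace=False)].astype(float)
    for _ in range(10):
        d = np.linalg.norm(data[:, None, :] - centers[None], axis=2)
        lab = d.argmin(axis=1)
        for j in (0, 1):
            if np.any(lab == j):
                centers[j] = data[lab == j].mean(axis=0)
    parts = [data[lab == 0], data[lab == 1]]
    if min(len(p) for p in parts) < 2:
        return 1, data, None
    s2 = score([p - p.mean(axis=0) for p in parts], 2)

    return (2, parts[0], parts[1]) if s2 < s1 else (1, data, None)
```

During the on-line process this function is called on a single cluster (split/no-split) or on the union of two clusters (merge/regroup), exactly as described above.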

Postprocessing

In fact, the same set of pulses arriving in a different sequence may result in a different clustering structure, since each pulse is classified into a cluster according to the previously arrived pulse information during the on-line process. In other words, the pure on-line process is crude, though it is very fast. Therefore, a post-processing scheme is introduced below to improve the classification accuracy. After the on-line process, we regroup all received pulses according to the existing cluster centers and then compute the new centers. The regrouping process is repeated a few times (say N_t times) until the centers converge. Then each cluster is checked to see whether it should be split, merged or regrouped with another cluster. This final splitting, merging and regrouping process is repeated a few times (say N_r times) until the clustering structure converges. After the post-processing, the performance of the on-line clustering is almost the same as that of the off-line algorithm in Section 3.4, but it is much faster.

5.5.1 Procedure

The flow diagram of this algorithm is shown in Fig. 5.5.

1. Initialization:

   i) Define a binary detection function: [K*, C*_1, C*_2] = bdetector(data).

   ii) Take data vector y_1 and assign it to C_1;
       m ← 1; (m counts the number of clusters)
       n ← 1; (n counts the number of pulse samples)
       isc_1 ← 0; (increment of the size of C_1)

2. The main process:

   While there is another sample do
   begin
      n ← n + 1; take data vector y_n from the input;
      Compute d = min_{k∈[1,...,m]} ||y_n − mean of C_k||;
      Find k for which d is minimum;
      Assign y_n to C_k and increase isc_k by 1;
      if isc_k < T_c, goto Step 2;
      else
         data = C_k; check data by the binary detector;
         if K* = 1, isc_k ← 0 (reset isc_k); goto Step 2; end
         if K* = 2, then goto Step 3;
      end;
   end;

3. The cluster splitting/regrouping/merging process:

   (After Step 3, the number of clusters, m, may either increase by 1, or decrease by 1, or remain the same. The counter index still ranges from 1 to m.)

   m ← m + 1;
   C_m ← C*_2;
   C_k ← C*_1;
   Find C_j which is closest to C_k;
   data = C_j + C_k; check data by the binary detector;
   if K* = 1 (indicates merging)
      C_j ← C_j + C_k; isc_j ← 0;
      C_k ← C_m; delete C_m; m ← m − 1;
      Find C_j which is closest to C_k;
      data = C_j + C_k; check data by the binary detector;
      if K* = 1 (indicates merging)
         C_j ← C_j + C_k; isc_j ← 0;
         C_k ← C_m; isc_k ← 0;
         delete C_m; m ← m − 1;
      end
      if K* = 2 (indicates regrouping)
         C_j ← C*_1; isc_j ← 0;
         C_k ← C*_2; isc_k ← 0;
      end
   end
   if K* = 2 (indicates regrouping)
      C_j ← C*_1; isc_j ← 0;
      C_k ← C*_2; isc_k ← 0;
      Find C_j which is closest to C_m;
      data = C_j + C_m; check data by the binary detector;
      if K* = 1 (indicates merging)
         C_j ← C_j + C_m; isc_j ← 0;
         delete C_m; m ← m − 1;
      end
      if K* = 2 (indicates regrouping)
         C_j ← C*_1; isc_j ← 0;
         C_m ← C*_2; isc_m ← 0;
      end
   end
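The following Python sketch shows how the main process and the checking scheme fit together, reusing the bdetector stand-in from the previous section. For brevity it checks only the newly created cluster against its nearest neighbour after a split, whereas Step 3 above also re-checks the other half; the data structures are illustrative.

```python
import numpy as np

def online_model_based(pulses, Tc=10):
    """On-line pass of the model-based algorithm (sketch of Section 5.5.1).

    Clusters are lists of vectors; isc[k] is cluster k's size-increment
    counter; `bdetector` is the two-component stand-in sketched earlier.
    """
    clusters, isc = [[pulses[0]]], [0]

    def mean(c):
        return np.mean(c, axis=0)

    def nearest(k):
        d = [np.linalg.norm(mean(c) - mean(clusters[k])) if j != k else np.inf
             for j, c in enumerate(clusters)]
        return int(np.argmin(d))

    def merge_or_regroup(k):
        # pair cluster k with its nearest neighbour; let the detector decide
        if len(clusters) < 2:
            return
        j = nearest(k)
        both = np.vstack([np.asarray(clusters[j]), np.asarray(clusters[k])])
        kstar, p1, p2 = bdetector(both)
        if kstar == 1:                     # merge the pair into cluster j
            clusters[j] = [v for v in both]
            isc[j] = 0
            clusters.pop(k); isc.pop(k)
        else:                              # regroup the pair's members
            clusters[j], clusters[k] = [v for v in p1], [v for v in p2]
            isc[j] = isc[k] = 0

    for y in pulses[1:]:                   # the main process
        k = int(np.argmin([np.linalg.norm(y - mean(c)) for c in clusters]))
        clusters[k].append(y); isc[k] += 1
        if isc[k] >= Tc:                   # time to check this cluster
            kstar, p1, p2 = bdetector(np.asarray(clusters[k]))
            if kstar == 1:
                isc[k] = 0
            else:                          # split, then check the new half
                clusters[k], isc[k] = [v for v in p1], 0
                clusters.append([v for v in p2]); isc.append(0)
                merge_or_regroup(len(clusters) - 1)
    return clusters
```

The post-processing pass then re-runs the regrouping and the final splitting/merging checks over all stored pulses, as described above.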

5.5.2 Computational Complexity

Suppose that after the on-line process N M-dimensional data vectors are assigned to K clusters, so that the k-th cluster has N_k data vectors (k = 1, ..., K). As mentioned in Section 5.4.2, an exact complexity analysis is difficult due to the fact that the computational cost of the on-line process may vary when the same set of pulses arrives in a different sequence. In this subsection, an upper complexity bound and an average complexity are analyzed; then an example is illustrated in the end.

Upper Bound

There are three steps in this on-line clustering algorithm:

1. The first is the main process.

   The main process here is the same as the one considered in Section 5.4.2. Hence, the computational cost of this process according to Eq. (5.13) is upper-bounded by MN(3K − 0.5) + 0.5MN^2.


Figure 5.5: The diagram of our on-line model-based clustering algorithm. (a) The main process. (b) The cluster splitting, merging and regrouping process.


2. The second is the cluster splitting/regrouping/merging process.

   The dominant complexity of the binary detector lies in partitioning the given number of data vectors into two clusters, i.e., K_max = 2. According to Eq. (3.70) in Section 3.4.2, this computational cost is 7MnN_t, where n is the number of data vectors sent to the binary detector and N_t is the number of iterations for cluster center convergence. We need to use the binary detector N/T_c times to check whether a cluster should be split, and such a check may result in one or two more binary detections to pursue cluster regrouping and merging. Thus, the total number of times the binary detector is used varies between N/T_c and 3N/T_c. Obviously the number of data vectors sent to the binary detector is at most N. Hence, the computational cost of the cluster splitting/regrouping/merging process is upper-bounded by 21MN^2 N_t/T_c.

3. The third is the post-process.

   To regroup all data vectors according to the K cluster centers N_t times (see Steps (a) and (b) in Section 3.4.2), the computational cost is

       N_t[MN + 3MNK] = MNN_t(3K + 1).    (5.15)

   The final splitting on K clusters requires using the binary detector K times, so the computational cost of the final splitting is 7MNN_t. There are 0.5K(K − 1) different combinations to send two clusters to the binary detector for the final regrouping/merging purpose, so the computational cost for the final regrouping/merging is 7MN(K − 1)N_t.


Furthermore, the final splitting/regrouping/merging operation is repeated N_r times. Hence, the computational cost of the final splitting/regrouping/merging operation is 7MNKN_tN_r.

Therefore, the total computational cost of the on-line clustering algorithm described in Section 5.5.1 is upper-bounded by

    B'_ul = MN(3K − 0.5) + 0.5MN^2 + 21MN^2 N_t/T_c + MNN_t(3K + 1) + 7MNKN_tN_r.    (5.16)

Average Complexity

As in Section 5.4.2, we examine the case where each cluster has the same number of members (i.e., N_1 = N_2 = ... = N_K = N/K). Since the computational cost of the on-line process is sensitive to the sequence in which the pulses arrive, we assume that the members of the first cluster arrive first, and then those of the second cluster arrive next, and so on. The computational cost for this case is roughly an average complexity of the on-line clustering algorithm. Now we examine the three steps in this on-line clustering algorithm:

1. The first is the main process. The computational cost of this process according to Eq. (5.14) is approximately 1.5MN(K + 1) + 0.5MN^2/K.

2. The second is the cluster splitting/regrouping/merging process. We need to estimate the number of times the binary detector is used, and the number of data vectors involved each time. For simplicity, an average number 1.5N/T_c is used here, and the average size of the data sent to the binary detector is assumed to be N/K. Hence, the computational cost for the cluster splitting/regrouping/merging process is approximately 10.5MN^2 N_t/(KT_c).


3. The third is the post-process. The computational cost of the post-process is the same as the one in the upper-bound analysis, i.e., MNN_t(3K + 1) + 7MNKN_tN_r.

Therefore, the average complexity of the on-line clustering algorithm described in Section 5.5.1 is approximately

    C'_avg = 1.5MN(K + 1) + 0.5MN^2/K + 10.5MN^2 N_t/(KT_c) + MNN_t(3K + 1) + 7MNKN_tN_r.    (5.17)

Example: Let us consider the same example as in Section 5.4.2; there are 100 44-dimensional preprocessed pulse vectors (N = 100, M = 44) and 5 emitters (K = 5). Given that T_c = 10, N_t = 5 and N_r = 2, substituting the values of N, M, K, T_c, N_t and N_r in Eqs. (5.16) and (5.17), we have the upper bound and the average complexity as follows:
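For a rough sense of scale, evaluating the expressions above with these values gives approximately

    B'_ul ≈ 0.28 + 4.62 + 0.35 + 1.54 ≈ 6.8 Mflops,
    C'_avg ≈ 0.08 + 0.46 + 0.35 + 1.54 ≈ 2.4 Mflops,

where the four terms correspond to the main process, the splitting/regrouping/merging checks, the post-process regrouping, and the final splitting/regrouping/merging.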

As illustrated in Section 5.4.2, the computational cost of the off-line model-based algorithm for the same example is 11.70 Mflops. Obviously the on-line algorithm is faster than the off-line algorithm while its performance is almost the same as that of the off-line counterpart, as will be shown in Section 5.6.2.

5.6 Numerical Experiments on Intra-pulse Data

To illustrate the data preprocessing techniques and the effectiveness of the clustering algorithms developed in this chapter, we have carried out numerous experiments using computer simulated data. All programs, including those for pre-processing and data compression, are written in MATLAB, and the simulations are run on a Pentium PC (400 MHz).

5.6.1 Pulse Generation

We generate the pulses according to the signal representation Eq. (5.1), such that

• the distribution of the absolute amplitude ρ_n is uniform in [0.5, 1];

• the distribution of the initial phase ψ_n is uniform in [−π, π];

• the distribution of the time delay τ_n is uniform in [0, 3T];

• the distribution of the carrier frequency ω_n is Gaussian and its standard deviation is 10 percent of its normalized mean value;

• the distribution of the additive noise v_n(t) is Gaussian with zero mean and standard deviation about 0.05.

From a given set of signature signals {s_k(t)}, we can generate received pulses using the above parameter distributions.
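As a rough illustration, the generator below draws the random parameters from the distributions listed above. The signature set, the delay range and the normalized carrier-frequency mean are placeholders rather than the thesis's exact values, as flagged in the comments.

```python
import numpy as np

def generate_pulses(signatures, n_pulses, T=128, rng=None):
    """Simulate received pulses under the Section 5.6.1 distributions.

    signatures : list of complex baseband signature arrays s_k(t), length T
    Returns the complex pulse matrix and the true emitter index per pulse.
    """
    rng = rng or np.random.default_rng(0)
    K = len(signatures)
    t = np.arange(T)
    pulses, labels = [], []
    for _ in range(n_pulses):
        k = rng.integers(K)
        rho = rng.uniform(0.5, 1.0)            # absolute amplitude
        psi = rng.uniform(-np.pi, np.pi)       # initial phase
        tau = rng.integers(0, 8)               # time delay (assumed range)
        w = rng.normal(0.05, 0.005)            # carrier: std = 10% of an
                                               # assumed normalized mean
        s = np.roll(signatures[k], tau)
        x = rho * s * np.exp(1j * (w * t + psi))
        x += rng.normal(0, 0.05, T) + 1j * rng.normal(0, 0.05, T)
        pulses.append(x); labels.append(int(k))
    return np.array(pulses), np.array(labels)
```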

5.6.2 Example 1

Fig. 5.6 (a) and (b) show the amplitude and phase of a group of 100 pulses received by the detector. These are from 5 different emitters, each transmitting 20 pulses. The five emitters are numbered 1, 2, ..., 5. If a pulse is from the k-th emitter (k = 1, 2, ..., 5), then its emitter index is k. Suppose that K clusters are determined by a clustering algorithm and that these K clusters are numbered 1, 2, ..., K. If a pulse is assigned to the k-th cluster (k = 1, 2, ..., K), then its cluster index is k. By cross-checking cluster indices against emitter indices, we can count the classification accuracy in percentage.

Pre-processing

For the pulses shown in Fig. 5.6 (a) and (b), the pre-processing techniques of amplitude normalization, time alignment and phase adjustment as described in Section 5.2 are applied to remove the nuisance parameters. The amplitude and phase of the pre-processed pulses are shown in Fig. 5.7 (a) and (b) respectively. Each pulse is represented by 128 time samples. A 3-level wavelet decomposition using symlet filters is then applied to each of these pre-processed pulses and only the low frequency filter output samples are retained. Each pulse is now represented by 22-dimensional samples (the number 22 > 128/8 is used because of the transient effect of the filters). As a result, data compression has been achieved. The amplitude and phase of the compressed pulses are shown in Fig. 5.8 (a) and (b) respectively. In the following, clustering is based on the compressed data. Furthermore, to compare our clustering algorithm and SNOB on a fair basis, Covariance Structure 4 is assumed.


Figure 5.6: Amplitude and phase of 100 received pulses from 5 unknown emitters, where the x-axis is the index of data sample points.

Figure 5.7: Amplitude and phase of the pre-processed pulses, where the x-axis is the index of data sample points.

Figure 5.8: Amplitude and phase of the compressed pre-processed pulses, where the x-axis is the index of data sample points.


Figure 5.9: Determination of the number of emitters using our (off-line) clustering algorithm

The off-line model-based clustering algorithm

The clustering algorithm developed in Section 3.4 is applied to the compressed data. The evaluation of L(Y, K) for various values of K is plotted in Fig. 5.9. The number of clusters, which is the value of K at which L(Y, K) is minimum, is correctly determined to be 5. The association of the pulses using the clustering algorithm is shown in Table 5.1. It can be observed that, apart from the seven pulses in Emitters 2 and 3, all the other pulses have been correctly associated. The classification accuracy in this case is 93%. For this example, the clustering algorithm takes approximately 20 seconds to produce the above results.

Table 5.1: Clustering results for Example 1 by the off-line model-based clustering algorithm (columns: cluster index, number of pulses, emitter index assigned to the pulses)

Figure 5.10: Determination of the number of emitters using the SNOB program

Table 5.2: Clustering results for Example 1 by the SNOB algorithm (columns: cluster index, number of pulses, emitter index assigned to the pulses)

SNOB

The SNOB algorithm has also been applied to this example; the evaluation for different values of K is plotted in Fig. 5.10 while the clustering results are shown in Table 5.2. It is observed in Table 5.2 that the SNOB algorithm fails to identify all the emitters, and the signals from Emitters 2 and 3 cannot be distinguished. SNOB is written in Fortran, so it is difficult to compare its speed with our MATLAB program. As discussed in Section 3.5, SNOB in principle is more complex than our off-line clustering algorithm.

The on-line algorithm using known thresholds

We apply the on-line algorithm developed in Section 5.4 to the example with t_1 = 0.04, t_2 = 0.08 and T_s = 20. The clustering result is shown in Table 5.3. The number of clusters is 6 and the classification accuracy is 88%. Comparing the results of the on-line with the off-line methods, we find that the on-line result is slightly inferior. However, from the computational standpoint, the on-line algorithm based on the two thresholds is much simpler than the off-line counterpart. This on-line algorithm takes less than 1 second to produce the above results. Recall that the off-line clustering procedure described in Section 3.4 includes cluster splitting and regrouping operations. Obviously, we may improve the performance of this original on-line algorithm by introducing cluster splitting, merging and regrouping operations appropriately. However, more thresholds would be needed.

Table 5.3: Clustering results for Example 1 by the on-line clustering algorithm using known thresholds (columns: cluster index, number of pulses, emitter index assigned to the pulses)

The on-line model-based algorithm

We apply the on-line algorithm developed in Section 5.5 to the example. The clustering result is shown in Table 5.4. The number of clusters is 5 and the classification accuracy is 93%. The results are the same as those in Table 5.1 produced by the off-line model-based algorithm. This on-line algorithm takes approximately 6 seconds to produce the above results. We also note that, for a given data set, the computation cost of the on-line algorithm is much lower than that of the off-line one. From Eq. (3.70), we know that the computational complexity of the off-line model-based algorithm is approximately proportional to K_max^2. The computational cost is reduced significantly by the binary partitions (K_max = 2) involved in the on-line model-based algorithm. In this sense, we conclude that this on-line algorithm is a fast version of the off-line model-based algorithm.

Table 5.4: Clustering results for Example 1 by the on-line model-based clustering algorithm (cluster indices 1-5 containing 20, 19, 21, 20, 20 pulses respectively; the misassociations occur among the pulses of Emitters 2 and 3)

5.6.3 Conclusions

Many other simulation examples have been carried out using different numbers of received signals, different numbers of emitters, and different distributions of the random signal parameters. The following are general observations drawn from the results of cluster validation and clustering:

1. The results of clustering employing compressed data from a 3-stage symlet wavelet filter bank are in general the same as those employing uncompressed data.

2. Judging from performance, our model-based off-line clustering algorithm shows much higher reliability in cluster validation than SNOB, while sacrificing marginally on the accuracy in clustering.

3. The performance of our model-based on-line clustering algorithm is almost the same as that of the model-based off-line algorithm but is much faster.

For Observation 1, it seems that the original pulse contains redundant information, so by compressing it, adequate information is still retained. The performance of the new off-line clustering algorithm and SNOB has been compared intensively in Section 3.6; the results there well justify the second observation. The last observation shows that our on-line model-based clustering algorithm is a faster version of model-based clustering while retaining the quality of performance.

Furthermore, it is found that the best performance of our clustering algorithm for intrapulse analysis is usually achieved by using Covariance Structure 4 described in Section 3.3.4 if the default penalty weight (λ = 1) is used, and that the best performance is usually achieved by using Covariance Structure 2 described in Section 3.3.2 when supervision is available (i.e., the penalty weight can be adapted).

5.7 Summary

First, we have developed pre-processing techniques to remove some nuisance parameters from received pulses. These include the absolute amplitude, initial phase, time delay and residual carrier frequency. As a result, we have formulated the problem of emitter number estimation and pulse-emitter association as a multivariate Gaussian clustering problem. In order to reduce the computational cost for clustering, a data compression method based on wavelet decomposition has also been included in pre-processing. The pre-processing techniques are intuitive in nature and are carried out so that after pre-processing, the pulses received from the same emitter maintain their resemblance to each other, while those from different emitters maintain their distinctive features.

After applying the new off-line clustering algorithm developed in Chapter 3 to the clustering problem, we have developed two on-line clustering algorithms: one is based on known thresholds while the other makes use of the model-based detection modified from the off-line clustering algorithm. Although the on-line clustering based on thresholds is computationally very effective, it is difficult to adapt the thresholds as the statistics of the received pulses change in time. In contrast, the on-line model-based clustering algorithm does not require explicit thresholds; it can dynamically incorporate cluster splitting, merging and regrouping operations as the statistics of the received pulses change.

The performance of our clustering algorithms and SNOB on intrapulse data has been reported. The results demonstrate that our new clustering algorithms are very effective for intrapulse analysis, especially the on-line model-based algorithm. Therefore, the on-line model-based clustering algorithm is suitable for near real-time implementation, which will be explored in the next chapter.


Chapter 6

DSP Implementation

6.1 Introduction

In the previous chapters, we have developed several radar pulse classification algorithms based on Minimum Encoding Inference, and described the framework of the penalized likelihood method. Extensive simulations show that the performance of our new algorithms is promising, especially the on-line model-based algorithm, which is well suited for dynamically classifying incoming radar pulses. As a result, we have implemented this on-line clustering algorithm as a core classification module on a TMS320C44 DSP board.

In this chapter, the DSP implementation for intrapulse analysis is described. We do a simple analysis of the physical scenarios at a radar interceptor and estimate the likely maximal incoming pulse rate. We then propose a suitable system diagram and investigate the system requirements. The benchmark of the DSP coding of our on-line clustering algorithm is reported.

6.2 Physical Scenario Analysis

In this section, we describe some examples and discuss how radar characteristics [57] can affect the operation of a typical radar interceptor. Here and throughout we assume that the radar interceptor is passive in all directions.


Figure 6.1: Physical scenario example 1

Figure 6.2: Physical scenario example 2

6.2.1 Probability of Receiving Overlapped Pulses

Given a radar, let the PRF (Pulse Repetition Frequency) be 1000 pulses/sec and c be the speed of light; then the maximum range of the radar is

    R_max = c/(2 × PRF) = 150 km.

Let the range resolution ΔR be 150 m; then the pulse width is

    τ = 2ΔR/c = 1 μs,

so the duty cycle is

    τ × PRF = 1/1000.

This means that even if a radar interceptor receives pulses from two radars with the same specifications given above, the probability of receiving overlapped pulses is only 1/1000, assuming perfect time synchronization. The illustration is shown in Fig. 6.1.


Figure 6.3: Physical scenario example 3

6.2.2 Receiving Pulse Sequence

Given a radar with antenna beamwidth BW, the average incoming pulse rate at the intercept receiver generated by the radar emitter is

    rate = PRF/(360°/BW) = PRF × BW/360°.    (6.1)

For a radar emitter with PRF = 1000 pulses/sec and BW = 1°, at the intercept receiver rate ≈ 3 pulses/sec. If the antenna rotation rate RPM is 30 rpm (rotations per minute), we will observe 6 incoming pulses at the intercept receiver in every two-second period. The illustration is shown in Fig. 6.2.

6.2.3 Near-Far Phenomenon

The signal power of an incoming pulse at the intercept receiver is inversely proportional to the square of the distance between the radar and the interceptor. Consider the case where one emitter is close to the interceptor and another is far away: it is possible that, at the interceptor, the incoming pulse amplitude generated by a sidelobe of the close emitter overwhelms the pulse amplitude generated by the mainlobe of the far away emitter. This means that in a given geographic area, the number of detectable emitters is affected by this near-far phenomenon. The illustration is shown in Fig. 6.3.

Table 6.1: DECCA Groups 9A and 12A relative motion marine radars

    Frequency band: 3 GHz or 9 GHz

    PRF         Pulse width (τ)   Duty cycle (τ × PRF)
    3.40 KHz    0.05 μs           0.17 × 10^-3
    1.70 KHz    0.25 μs           0.43 × 10^-3
    0.85 KHz    0.5 - 1 μs        0.43 - 0.85 × 10^-3

    Antenna beamwidth (BW): 0.8 - 2.5 degrees
    Antenna rotation rate (RPM): typically 30 rpm (rotations per minute)
    Sidelobes relative to main beam: around -25 dB

6.3 Maximal Incoming Pulse Rate

Some important parameters for a set of typical marine radars [73] are listed in Table 6.1.

Suppose there are a number of ships in the surveillance area, each equipped with two radars. Under normal circumstances, the number of ships may be around 20. However, the ships usually have different types of radars, which can be identified according to inter-pulse information such as carrier frequency and pulse width. It is very rare that 40 same-type radars are operating at the same time in the same area. Nevertheless, we assume for the worst case that the maximum number of same-type radars present is 40.

Assume that the radar parameters are given in Table 6.1. Then the maximal incoming pulse rate generated by a radar emitter is attained by using Eq. (6.1) when PRF = 3.4 KHz and BW = 2.5°:

    rate_max = 3400 × 2.5/360 ≈ 24 pulses/sec.

The minimal incoming pulse rate generated by a radar emitter is attained by using Eq. (6.1) when PRF = 0.85 KHz and BW = 0.8°:

    rate_min = 850 × 0.8/360 ≈ 2 pulses/sec.

Therefore, when 40 same-type radars are operating at the same time in the same area, the incoming pulse rate at the intercept receiver is at most 40 × 24 = 960 pulses/sec.
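As a quick sanity check of Eq. (6.1), the arithmetic can be reproduced in a couple of lines (a trivial helper, not part of the DSP code):

```python
# Eq. (6.1): average intercepted pulse rate = PRF * BW / 360 degrees.
def intercept_rate(prf_hz, beamwidth_deg):
    return prf_hz * beamwidth_deg / 360.0

print(intercept_rate(3400, 2.5))              # ~23.6, i.e. about 24 pulses/sec
print(intercept_rate(850, 0.8))               # ~1.9 pulses/sec
print(40 * round(intercept_rate(3400, 2.5)))  # 40 x 24 = 960 pulses/sec
```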


Figure 6.4: The DSP system diagram for intrapulse analysis

In the following sections of this chapter, we will assume the maximal incoming pulse rate at the intercept receiver is 1000 pulses/sec.

6.4 System Diagram And Requirements

As described in Section 5.5, the on-line clustering algorithm dynamically performs cluster splitting, merging and regrouping operations as pulses arrive. It was also suggested that we cannot determine the pulse-emitter association right away, since the clustering structure is being updated continuously. Here we assume that the maximal incoming pulse rate is 1000 pulses/sec and we output the pulse-emitter association after 20 seconds, i.e., a 20-second latency is introduced. Therefore, we need to process up to 20,000 pulses in every 20-second period. The whole process is divided into three independent modules: preprocessing, initial grouping and on-line clustering; see the system diagram in Fig. 6.4. Next let us discuss the computational cost and memory requirement of each module.

6.4.1 Preprocessing

The preprocessing described in Section 5.2 consists of four steps: curve rotation, amplitude normalization, time alignment and data compression. Counted by our MATLAB programs, it requires 25 Kflops to preprocess a pulse with around 128 sampling points into a 44-dimensional vector, so 20,000 pulses require 500 Mflops. To store all 20,000 preprocessed pulses, we need a memory of

    20,000 pulses × 44 floating points/pulse × 4 Bytes/floating point ≈ 3.5 MBytes.

6.4.2 Initial Grouping

The purpose of this module is to reduce the workload of the on-line clustering (Module 3). This is achieved by grouping the 20,000 pulses into a certain number (say 512) of groups. Then in Module 3, we only deal with the mean vectors of the 512 groups, instead of the whole data set. In the following, we consider how to group N M-dimensional vectors into K groups.

One simple approach uses the following batch scheme: split the whole data set into two groups, then split each group into another two groups, and so on; if a group size is less than 50, that group is not split any more (see the sketch following Fig. 6.5). The illustration is shown in Fig. 6.5. In the first stage, the N M-dimensional vectors are split into two groups with sizes N_1 and N_2 respectively (N_1 + N_2 = N). One effective method is the k-means algorithm, which was used in Section 3.4.1. Let N_t be the number of iterations for cluster center convergence; then the computational cost for this stage is roughly 7MNN_t by using Eq. (5.15). At the second stage, each group is further split into two new groups, and the computational cost is 7MN_1N_t + 7MN_2N_t = 7MNN_t. Assume that the average number of splitting stages is L; then the total computational cost for initial grouping is roughly 7MNN_tL.

To split 20,000 pulses into 512 groups, we need on average 9 splitting stages (512 = 2^9, i.e., L = 9). Hence, given N = 20,000, M = 44, K = 512, N_t = 5 and L = 9, the computational cost is roughly

    7MNN_t × L ≈ 280 Mflops.

To store the mean vectors of the 512 groups, we need a memory: 400 × 44 × 4 ≈ 70 KBytes.


Figure 6.5: The tree structure for initial grouping
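As an illustration, a minimal Python sketch of this recursive binary splitting is given below; the simple two-means inner loop stands in for the k-means step of Section 3.4.1, and the stopping size of 50 mirrors the rule above.

```python
import numpy as np

def initial_grouping(data, min_size=50, n_iter=5, rng=None):
    """Recursive binary splitting for initial grouping (Section 6.4.2 sketch).

    Repeatedly 2-means-splits every group whose size is at least `min_size`,
    mirroring the tree of Fig. 6.5; returns the group mean vectors.
    """
    rng = rng or np.random.default_rng(0)

    def two_means(x):
        c = x[rng.choice(len(x), 2, replace=False)].astype(float)
        for _ in range(n_iter):
            lab = np.linalg.norm(x[:, None] - c[None], axis=2).argmin(axis=1)
            for j in (0, 1):
                if np.any(lab == j):
                    c[j] = x[lab == j].mean(axis=0)
        return x[lab == 0], x[lab == 1]

    groups, done = [data], []
    while groups:
        g = groups.pop()
        if len(g) < min_size:                # small enough: stop splitting
            done.append(g)
            continue
        g1, g2 = two_means(g)
        if len(g1) == 0 or len(g2) == 0:     # degenerate split: stop here
            done.append(g)
        else:
            groups += [g1, g2]
    return np.array([g.mean(axis=0) for g in done])
```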

6.4.3 On-line Clustering

The procedure of the on-line clustering algorithm has been described in Section 5.5.1. When a pulse arrives, it is assigned to one of the existing clusters according to the minimum distance principle. A cluster is checked when its size increment counter reaches a preset number T_c. After the on-line process, the final regrouping is repeated N_t times and the final splitting/merging is repeated N_r times. The computational cost of the on-line algorithm has been analyzed in detail in Section 5.5.2. The maximal number of same-type radar emitters in a surveillance area is assumed to be 40 (see details in Section 6.3), so the largest number of clusters here is around 40. Given that N = 512, M = 44, K = 40, T_c = 10, N_t = 5 and N_r = 2, substituting them in Eq. (5.17) yields the average complexity for the on-line clustering module. The memory for keeping the clustering structure is 40 × 44 × 4 ≈ 7 KBytes < 10 KBytes.

6.5 C/DSP Coding of On-line Clustering

In this section, we briefly introduce our C/DSP coding of the model-based on-line clustering algorithm as a core clustering module on a TMS320C44 DSP board. The tools used to complete this task are

• TMS320C3x/C4x floating-point DSP code generation tools [74-77];


• Code Composer [78] (coding and debugging) for C and Assembly from GO DSP Corporation;

• a TMS320C44 (60 Mflops) development board (Dakac) and its SDK (software development kits) [79, 80] from Spectrum Signal Processing Inc.

Table 6.2: The benchmark of DSP codes of the on-line clustering

    Program size   Maximum data size   CPU processing time (Example 1 in Section 5.6)
    2.5 Kwords     448 Kwords          on-line processing: 0.2 seconds; post-processing: 0.3 seconds

The MATLAB programs developed before are redesigned into C programs, which are converted to DSP codes by the TI DSP code generation tools [74-77].

To efficiently allocate memory for data and program, we need to study the physical configuration of the DSP board [79, 80]. The local memory (SRAM) provided for the C44 is 2M bytes, i.e., 512K words. We could use the small memory model because all large arrays in our programs are allocated at run-time from a global pool, or heap (using malloc). Therefore, we partition the local 512K-word memory into two parts: one with 64K words is reserved for the small memory model and the other with the remaining 448K words is provided for the heap. With this memory allocation, our codes will run without any problem in the small memory model even when 10,000 pulses, each with 40 data points, are processed (note: 10,000 × 40 < 448K). In addition, the system stack is allocated in a 2K-word internal RAM of the C44 chip.

To process the compressed data of Example 1 in Section 5.6, the C44 processing time itself takes 0.5 seconds, counted by the profiling features of the Code Composer [78]. In fact, the on-line process takes 0.2 seconds and the post-processing takes the remaining 0.3 seconds. Thus, there is a tradeoff between the speed and the performance.

As shown in Table 6.2, our on-line clustering module for the TMS320C44 development board is very efficient in size and speed. It has the capability to process data arrays as large as 448K words. This on-line clustering module is ready for deployment on a multi-processing DSP board together with other modules such as preprocessing.


Chapter 7

Conclusions

In this thesis, we have reviewed Bayesian inference and minimum encoding inference, including Wallace's minimum message length (MML) and Rissanen's minimum description length (MDL), for model selection. It is found that the MML coding length is more accurate than the other two from the standpoint of quantization. All model selection criteria considered here consist of two parts: one is the log-likelihood function, which measures the goodness of fit between the data and the model, and the other is a penalty function, which measures the complexity of the model. An inference method aims to balance the tradeoff between goodness of fit and model complexity. As such, a penalty weight for the penalty function to control the trade-off has been introduced.

Based on minimum encoding inference, an appropriate measure of coding length has been proposed for cluster validation. Furthermore, the coding lengths under four different Gaussian mixture models have been fully derived. This provides us with a criterion for the development of a new clustering algorithm. Judging from the performance comparison, our coding length measure outperforms the Bayesian Inference Criterion (BIC) in cluster validation, since it is not based on the large sample assumption as is BIC. More importantly, the new clustering algorithm shows much higher reliability in cluster validation than the well-known clustering algorithm SNOB, while sacrificing only marginally on the accuracy in clustering. Indeed, our clustering algorithm is well designed to effectively process high dimensional data with satisfactory performance on small and medium samples.

The error performance of our clustering algorithm has been evaluated under reasonable assumptions. Two types of errors have been analyzed: misses and false alarms. Among the four factors considered here (the dimension of the data space M, the sample size N, the mixing proportion c and the inter-cluster distance D), D is the most important factor. There is a critical distance D_0 defined in Eq. (4.18) such that when D > D_0, our clustering algorithm can successfully separate two clusters, and when D < D_0, the extensive overlap between the two clusters will cause our algorithm to fail, just like other clustering algorithms such as SNOB. Furthermore, we have examined the impact of the penalty weight under the framework of the penalized likelihood method. It is found that there is a range of penalty weights within which the best performance of our clustering algorithm can be achieved. Therefore, it is suggested that with some supervision, we could adjust the penalty weight to further improve the performance of our clustering algorithm.

Another important contribution of this thesis is the application of our clustering algorithm to intrapulse analysis. We have developed pre-processing techniques to remove nuisance parameters from received pulses and formulated the problem of emitter number detection and pulse-emitter association as a multivariate clustering problem. In order to reduce the computational cost for clustering, a suitable data compression method based on wavelet decomposition has also been proposed. These pre-processing techniques are intuitive in nature and are carried out so that, after pre-processing, the pulses received from the same emitter maintain their resemblance to each other, while those from different emitters retain their distinctive features.

There are several factors that make the task of clustering a challenging one: (1) the dimension of the data vectors is high; (2) satisfactory performance on small samples is desirable; (3) near real-time implementation is required. The model-based clustering algorithm developed in Chapter 3 well addresses the first two factors. Furthermore, it is found that the best performance of our clustering algorithm for intrapulse analysis is usually achieved by using Covariance Structure 4 (see Section 3.3.4) when no supervision is available (i.e., the penalty weight is 1, the default value), and that the best performance is usually achieved by using Covariance Structure 2 (see Section 3.3.2) when supervision is available (i.e., the penalty weight can be adapted). To achieve on-line clustering, that is, to perform classification dynamically as pulses arrive, we have further developed two new algorithms: one is based on known thresholds while the other is based on a model-based detection scheme. Although the on-line algorithm based on thresholds is computationally very effective, it is difficult to adapt the thresholds as the statistics of the received pulses change in time. In contrast, the on-line model-based algorithm does not require explicit thresholds and dynamically incorporates cluster splitting, merging and regrouping operations as the statistics of the received pulses change. The performance of our clustering algorithms and SNOB on intrapulse data has been reported. It demonstrates that our new clustering algorithms (the on-line model-based algorithm in particular) are very effective for intrapulse analysis due to their low computation costs and high performance.

Our model-based clustering algorithms have been further implemented on a DSP board for intrapulse analysis. Some relevant physical parameters have been estimated, such as the likely maximal incoming pulse rate. Then a suitable system diagram has been proposed and its system requirements have been suggested. The on-line model-based algorithm has been implemented as a core classification module on a TMS320C44 DSP board.

7.1 Future Work

In this thesis, we have developed new model-based clustering algorithms, both off-line and on-line, and successfully applied them to intrapulse analysis. There are several issues worthy of further investigation in future research:

1. Applicability of other statistical models to clustering. In Chapter 5, Gaussian mixture models are applied to the clustering problem in intrapulse analysis since the noise accompanying a radar pulse is usually Gaussian. For different applications, other statistical models may be more suitable. For example, a mixture of uniform distributions was explored in [3-5], where the observations are edge elements in an image. Statistical models using von Mises distributions [26] were applied in [22] to find clusters in data of several thousand sets of protein dihedral angles.

2. Other promising criteria for cluster validation. As introduced in Section 1.2, the number of clusters in Gaussian clustering is the number of components in a Gaussian mixture. The determination of this number is formulated as a likelihood ratio test. However, the analysis for the null hypothesis case is very difficult since the regularity condition does not hold. In fact, the asymptotics that justify the penalized likelihood criteria (BIC, MDL and MML) are the same as those underlying the likelihood ratio test. One different approach is the use of Monte Carlo (bootstrap) tests. A study of bootstrapping for the case of Gaussian mixtures was reported in [40]. However, empirically observed rejection rates may not quite match the expected levels under the null hypothesis, and it would be of interest to investigate the discrepancies involved.

3. Radar pulse classification by using inter-pulse information. In the field of applications, we have focused on using intrapulse information of a collection of pulses to identify the emitters present. However, radar emitter classification in practice is also based on inter-pulse information. Our model-based clustering schemes can be directly applied if the statistics of the inter-pulse information are provided. Furthermore, it would be of great interest to achieve the maximum integration gain by combining inter-pulse and intrapulse information. One simple approach is layered classification, using inter-pulse and intrapulse information separately. Another approach is to form an integrated data vector for classification by using inter-pulse and intrapulse information together. If this is the case, the weighting between the inter-pulse part and the intrapulse part should be examined carefully.


Appendix A

The Value of S(N, K̂)

S(N, K̂) is the number of different ways to partition N data vectors into K groups, such that each group k (k = 1, ..., K) has N_k data vectors. Obviously, N_1 + N_2 + ... + N_K = N.

At the first stage, we take N_1 data vectors out of the whole data set; the number of different combinations is C_N^{N_1}. Then at the second stage, we take N_2 data vectors out of the rest of the whole data set; the number of different combinations is C_{N−N_1}^{N_2}, and so on. At the (K − 1)-th stage, we take N_{K−1} data vectors out of the rest; the number of different combinations is C_{N−N_1−...−N_{K−2}}^{N_{K−1}}. The last group is already determined in the end when the first K − 1 groups have been selected. Hence, the total number of different combinations is

    C_N^{N_1} C_{N−N_1}^{N_2} ... C_{N−N_1−...−N_{K−2}}^{N_{K−1}} = N!/(N_1! N_2! ... N_K!).

In fact, the same groups in a different order construct the same clustering structure. If several clusters contain the same number of data vectors, we can swap those clusters' order without changing the clustering structure. Let m_n be the number of clusters with n data vectors, n = 1, 2, ..., N; then the number of different partitions is

    S(N, K̂) = N!/(N_1! N_2! ... N_K!) × 1/(m_1! m_2! ... m_N!).
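As a quick sanity check of this count, take N = 4 vectors split into K = 2 groups of two each (so m_2 = 2 and all other m_n = 0):

    S(4, 2) = 4!/(2! × 2!) × 1/2! = 6/2 = 3,

which matches the three distinct ways of pairing four vectors into two unordered pairs.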


Appendix B

We need two theorems to derive it.

Theorem 1: Suppose that an association vector â partitions the N data vectors into K groups, and assume that these groups are samples from multivariate Gaussian distributions with different means but the same covariance matrix, respectively; i.e., the k-th group is a sample from N(μ_k, Σ). The sample mean μ̂_k and the sample within-group scatter matrix W_k for the k-th group are defined in Eqs. (3.10) and (3.11) respectively, and the total scatter matrix W for the whole data set is defined in Eq. (3.19).

A likelihood ratio criterion ([25, Page 165]) for testing the hypothesis H_0: μ_1 = μ_2 = ... = μ_K is based on the decomposition

    W = E + B,  with E = W_1 + W_2 + ... + W_K,

where B is the between-group scatter matrix. Then

• E follows the central Wishart distribution W_M(N − K, Σ);

• E and B are statistically independent.

Theorem 2: If A is W_M(N, Σ, Ω), where N is a positive integer and a (≠ 0) is an M × 1 fixed vector, then a^T A a / a^T Σ a ~ χ²_N(δ) with δ = a^T Ω a / a^T Σ a ([?], Theorem 10.3.6).

From Theorem 1, we know that, under the assumption of perfect separation, our situation is the special case of the above with K = 2. Thus in our case, E ~ W_M(N − 2, Σ) and B is a noncentral Wishart matrix W_M(1, Σ, Ω).

We can write

    tr B = e_1^T B e_1 + e_2^T B e_2 + ... + e_M^T B e_M,

where e_m denotes the m-th unit vector. By Theorem 2, e_m^T B e_m / σ_m² ~ χ²_1(δ_m) with

    δ_m = [N_1(μ_{1m} − μ̄_m)² + N_2(μ_{2m} − μ̄_m)²]/σ_m²,  m = 1, ..., M.    (B.7)

In addition, e_1^T B e_1, ..., e_M^T B e_M are mutually independent since Σ is diagonal. Therefore, for Σ = σ²I_M,

    tr B/σ² ~ χ²_M(δ)    (B.8)

and

    δ = δ_1 + δ_2 + ... + δ_M.    (B.9)

By the same reasoning as the above, it is obvious that

    tr E/σ² ~ χ²_{M(N−2)}.

Hence, since E and B are statistically independent according to Theorem 1, we have by the definition of a noncentral F distribution:

    [tr B/M] / [tr E/(M(N − 2))] ~ F_{M, M(N−2)}(δ),

where δ is given by (B.9).


Appendix C

We show that, for large N,

    N!/((cN)!((1 − c)N)!) ≈ [c^(−c)(1 − c)^(−(1−c))]^N.

The Stirling formula for the Gamma function ([12, Page 70]) states that

    x! = Γ(x + 1) ≈ √(2πx)(x/e)^x.

Thus for large N, we have

    N! ≈ √(2πN)(N/e)^N,
    (cN)! ≈ √(2πcN)(cN/e)^(cN),
    ((1 − c)N)! ≈ √(2π(1 − c)N)((1 − c)N/e)^((1−c)N).

Then by simple mathematical manipulations, we have

    N!/((cN)!((1 − c)N)!) ≈ [c^(−c)(1 − c)^(−(1−c))]^N / √(2πc(1 − c)N).

Therefore, up to a factor that grows only polynomially in N,

    N!/((cN)!((1 − c)N)!) ≈ [c^(−c)(1 − c)^(−(1−c))]^N.
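As a quick numerical check, at c = 1/2 the approximation reads

    N!/((N/2)!)² ≈ 2^N/√(πN/2);

for N = 20 the exact value is 184,756 while the right-hand side gives about 187,000, an error of roughly 1.3% that shrinks as N grows.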


Appendix D

Multivariate Normality

In this appendix, we introduce a set of empirical distribution function (EDF) statistics, and describe how to test multivariate normality based on the EDF statistics. Then we use Monte Carlo simulations to assess the Gaussianity of compressed pre-processed pulses when the original pulses are generated from a Gaussian distribution.

D.1 EDF Statistics

Suppose a given random sample of size N is x_1, x_2, ..., x_N and let x_(1) < x_(2) < ... < x_(N) be the order statistics. Let F(x) denote the distribution function of x; then the empirical distribution function (EDF) F_N(x) is defined by

    F_N(x) = (number of x_n ≤ x)/N.

A statistic measuring the difference between F_N(x) and F(x) is called an EDF statistic. To test a null hypothesis

    H_0: a random sample x_1, x_2, ..., x_N comes from a distribution F(x),

an EDF test procedure was presented in [20, Section 4.4]:

(a) Put the x_n in ascending order, x_(1) < x_(2) < ... < x_(N).

(b) Calculate z_(n) = F(x_(n)), n = 1, ..., N.

(c) Choose and calculate an appropriate test statistic listed in Table D.1.

(d) Modify the test statistic as in Table D.2. If the modified statistic exceeds the threshold in the upper tail at a given level α, H_0 is rejected at significance level α.

In Step (b), by the probability integral transformation (PIT), z = F(x), the new variable z is uniformly distributed between 0 and 1 when F(x) is the true distribution of x. Hence, the six EDF statistics [55] in Table D.1 are actually designed to test whether the new variable z comes from a uniform distribution between 0 and 1. In general, D⁺ and D⁻ are powerful in detecting whether the z-set tends to be close to 0 or to 1, respectively; W² and A² are powerful in detecting a shift of the mean in either direction; V and U² are powerful in detecting a change in variance, either a grouping of z values at one point, or a division into two groups near 0 and 1.

Table D.1: Six EDF statistics

    D⁺ = max_{1≤n≤N} { n/N − z_(n) }
    D⁻ = max_{1≤n≤N} { z_(n) − (n − 1)/N }
    V  = D⁺ + D⁻
    W² = Σ_{n=1}^{N} [ z_(n) − (2n − 1)/(2N) ]² + 1/(12N)
    U² = W² − N(z̄ − 1/2)²
    A² = −N − (1/N) Σ_{n=1}^{N} (2n − 1)[ ln z_(n) + ln(1 − z_(N+1−n)) ]

Table D.2: Modified EDF statistics (for all N ≥ 5)

    Statistic   Modified statistic                    Threshold in the upper tail at α = 0.25 / 0.10 / 0.05
    D⁺          D⁺(√N + 0.12 + 0.11/√N)              0.828 / 1.073 / 1.224
    D⁻          D⁻(√N + 0.12 + 0.11/√N)              0.828 / 1.073 / 1.224
    V           V(√N + 0.155 + 0.24/√N)              1.420 / 1.620 / 1.747
    W²          (W² − 0.4/N + 0.6/N²)(1.0 + 1.0/N)   0.209 / 0.347 / 0.461
    U²          (U² − 0.1/N + 0.1/N²)(1.0 + 0.8/N)   0.105 / 0.152 / 0.187
    A²          A² (no modification needed)           1.248 / 1.933 / 2.492

D.2 Multivariate Normality Test

The goodness-of-fit assessrnent is to test the nuil hypothesis

Hi : a mdtivariate 58mp1e yl, y2, , y~ cornes h m a multivariate n o d distribution.

Let the dimension of y be M, the sample mean be fi and the sample cOYBTi8I1ce matrix be

Ê, then under the n d hypothesis Hi the values

will have approximately a $\chi^2$ distribution with $M$ degrees of freedom. As suggested in [20, Section 9.7] and [2], instead of directly assessing multivariate normality, we test the following hypothesis:

$H_0''$: the values $z_1, z_2, \ldots, z_N$ come from a $\chi^2_M$ distribution.

Hence, the EDF test procedure described in Section D.1 can be directly applied to test $H_0''$, and correspondingly to assess $H_0'$.
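As a sketch of this reduction (built on the helpers above, with `scipy.stats.chi2` supplying the hypothesized CDF), the function below forms the approximate $\chi^2_M$ values $z_n$ and runs the Section D.1 procedure on them. It assumes real-valued samples and $N > M$ so that the sample covariance is invertible; complex pulses could, for example, be handled by stacking real and imaginary parts, though the thesis does not spell out that detail here.

```python
from scipy import stats as sps

def multinormality_test(Y, thresholds=THRESH_05):
    """Assess H0' for the rows of Y (an N x M real array) by testing
    H0'': the values z_n follow a chi-square law with M dof."""
    Y = np.asarray(Y, dtype=float)
    N, M = Y.shape
    mu = Y.mean(axis=0)                              # sample mean
    S_inv = np.linalg.inv(np.cov(Y, rowvar=False))   # needs N > M
    d = Y - mu
    z = np.einsum("nj,jk,nk->n", d, S_inv, d)        # the z_n of Section D.2
    mod = modified_statistics(
        edf_statistics(z, lambda t: sps.chi2.cdf(t, df=M)), N)
    # Per statistic: True means H0'' (and hence H0') is rejected at 0.05.
    return {k: mod[k] > thresholds[k] for k in mod}
```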

D.3 Gaussianity Test of Compressed Pre-processed Pulses

In this section, we use the multivariate normality test described in the previous section to examine the problem encountered in Section 5.3, i.e., to test whether or not Gaussianity is still maintained after received pulses are pre-processed and compressed.

Here we simulate received pulses by using the covariance matrix $C = \sigma\delta_{i,j}$, $\sigma = 0.05$. Figs. D.1 and D.2 show the real and imaginary parts of 50 simulated pulses and those of the compressed pre-processed pulses, respectively. In the following, we generate 100 sets of received pulses with $N = 20$, 50 and 100, respectively. To make the test more convincing,


Table D.3: Gaussianity test of original (simulated) pulses at significance level 0.05

  Original (simulated) pulses    $D^+$    $D^-$    $V$      $W^2$     $U^2$    $A^2$
  $M = 44$, $N = 20$             4/100    5/100    6/100    10/100    7/100    4/100
  (Rejection rates based on 100 trials)

Table D.4: Gaussianity test of compressed pre-processed pulses at significance level 0.05 (rejection rates for $D^+$, $D^-$, $V$, $W^2$, $U^2$, $A^2$ based on 100 trials)

all six EDF statistics introduced in Section D.1 are used. Tables D.3 and D.4 list the EDF test results at significance level 0.05 on the original (simulated) pulses and the compressed pre-processed pulses, respectively. In both tables, a result such as 2/100 means that Gaussianity is rejected 2 times out of 100 trials. From Tables D.3 and D.4, it can be observed at significance level 0.05 that

1. The rejection rates for the compressed pre-processed pulses are slightly higher than those of the original (simulated) pulses.

2. The rejection rates for the compressed pre-processed pulses are still smaller than 10/100.

Since the pre-processing and compression procedures presented in Section 5.2 include non-linear operations, Gaussianity of the compressed pre-processed pulses may not strictly hold, as reflected in Observation 1. On the other hand, by Observation 2 the rejection rates for the compressed pre-processed pulses are still very small (< 10%). Hence, Gaussianity of the compressed pre-processed pulses can still be assumed for simplicity.
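The Monte Carlo experiment behind Tables D.3 and D.4 can be sketched as follows for the original (simulated) pulses. The generator draws real i.i.d. Gaussian pulses with covariance $\sigma I$ as a simplified stand-in for the thesis's complex pulse model, and the default $N = 100$ keeps the sample covariance invertible (the $M = 44$, $N = 20$ case of Table D.3 would need a pseudo-inverse or another covariance estimate); the compressed pre-processed case would pass each simulated set through the Section 5.2 pre-processing and compression chain before testing.

```python
def rejection_rates(n_trials=100, N=100, M=44, sigma=0.05, seed=0):
    """Count, over n_trials independent sample sets, how often each EDF
    statistic rejects multivariate normality at significance level 0.05."""
    rng = np.random.default_rng(seed)
    counts = dict.fromkeys(THRESH_05, 0)
    for _ in range(n_trials):
        # N pulses of dimension M, i.i.d. Gaussian, covariance sigma * I.
        Y = rng.normal(scale=np.sqrt(sigma), size=(N, M))
        for stat, rejected in multinormality_test(Y).items():
            counts[stat] += int(rejected)
    return {stat: f"{c}/{n_trials}" for stat, c in counts.items()}
```

A run of `rejection_rates()` reports results in the same k/100 form as the entries of Tables D.3 and D.4.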


Figure D.1: Real and imaginary parts of 50 simulated pulses, where the x-axis is the index of data sample points and the y-axis is the amplitude.

Figure D.2: Real and imaginary parts of the compressed pre-processed pulses, where the x-axis is the index of data sample points and the y-axis is the amplitude.


Bibliography

[1] M.R. Anderberg, Cluster analysis for applications, Academic Press, 1973.

[2] D.F. Andrews, R. Gnanadesikan, and J.L. Warner, Transformations of multivariate data, Biometrics 27 (1971), 825-840.

[3] J.D. Banfield and A.E. Raftery, Automated tracking of ice floes: A statistical approach, IEEE Trans. on Geoscience and Remote Sensing 29 (1991), 905-911.

[4] ——, Ice floe identification in satellite images using mathematical morphology and clustering about principal curves, Journal of the American Statistical Association 87 (1992), 7-16.

[5] ——, Skeletal modeling of ice leads, IEEE Trans. on Geoscience and Remote Sensing 30 (1992), 918-923.

[6] ——, Model-based Gaussian and non-Gaussian clustering, Biometrics 49 (1993), 803-821.

[7] R.A. Baxter, Minimum message length inference: Theory and applications, Ph.D. thesis, Dept. of Computer Science, Monash University, Clayton 3168, Australia, December 1996.

[8] R.A. Baxter and J.J. Oliver, MDL and MML: similarities and differences, Technical Report 207, Dept. of Computer Science, Monash University, Vic 3168, Australia, 1994.


[9] A. Berdai and B. Garel, Detecting a univariate normal mixture with two components, Statistics & Decisions 14 (1996), 35-51.

[10] D.M. Boulton and C.S. Wallace, A program for numerical classification, The Computer Journal 13 (1970), no. 1, 63-69.

[11] ——, An information measure for hierarchic classification, The Computer Journal 16 (1973), no. 3, 254-261.

[12] N.G. de Bruijn, Asymptotic methods in analysis, North-Holland Publishing Company, 1970.

[13] P. Bryant, Large-sample results for optimization-based clustering methods, Journal of Classification 8 (1991), 31-44.

[14] P. Bryant and J.A. Williamson, Asymptotic behaviour of classification maximum likelihood estimates, Biometrika 65 (1978), 273-281.

[15] G. Celeux and G. Govaert, Comparison of the mixture and the classification maximum likelihood in cluster analysis, Journal of Statistical Computation and Simulation 47 (1993), 127-146.

[16] P. Cheeseman, J. Kelly, M. Self, W. Taylor, D. Freeman, and J. Stutz, AutoClass: A Bayesian classification system, Proceedings of the Fifth International Conference on Machine Learning (Ann Arbor, MI), 1988, pp. 54-64.

[17] P. Cheeseman, M. Self, J. Kelly, W. Taylor, D. Freeman, and J. Stutz, Bayesian classification, Seventh National Conference on Artificial Intelligence (Saint Paul, MN), 1988, pp. 607-611.

[18] J.H. Conway and N.J.A. Sloane, Sphere packings, lattices and groups, Springer-Verlag, New York, 1988.

[19] T.M. Cover and J.A. Thomas, Elements of information theory, John Wiley & Sons, 1991.


[20] R.B. D'Agostino and M.A. Stephens, Goodness-of-fit techniques, Marcel Dekker, Inc., 1986.

[21] I. Daubechies, Ten lectures on wavelets, SIAM, 1992.

[22] D.L. Dowe, L. Allison, T.I. Dix, L. Hunter, C.S. Wallace, and T. Edgoose, Circular clustering for protein dihedral angles by minimum message length, Proceedings of the 1st Pacific Symposium on Biocomputing (Hawaii, U.S.A.), 1996, pp. 242-255.

[23] D.M. Hawkins (Editor), Topics in applied multivariate analysis, ch. 6, Cluster analysis, pp. 301-351, Cambridge University Press, 1982.

[24] L. Engelman and J.A. Hartigan, Percentage points of a test for clusters, J. Amer. Statist. Assoc. 64 (1969), 1647-1648.

[25] K.T. Fang and Y.T. Zhang, Generalized multivariate analysis, John Wiley & Sons, 1990.

[26] N.I. Fisher, Statistical analysis of circular data, Cambridge University Press, 1993.

[27] C. Fraley, Algorithms for model-based Gaussian hierarchical clustering, SIAM Journal on Scientific Computing 20 (1998), 270-281.

[28] C. Fraley and A.E. Raftery, How many clusters? Which clustering method? Answers via model-based cluster analysis, Computer Journal 41 (1998), 578-588.

[29] ——, MCLUST: Software for model-based cluster analysis, Technical Report 342, Department of Statistics, University of Washington, November 1998; to appear in Journal of Classification.

[30] S. Ganesalingam, Classification and mixture approaches to clustering via maximum likelihood, Applied Statistics 38 (1989), 455-466.

[31] J.A. Hartigan, Clustering algorithms, John Wiley & Sons, 1975.


[32] ——, Asymptotic distributions for clustering criteria, The Annals of Statistics 6 (1978), 117-131.

[33] J.A. Hartigan and M.A. Wong, A k-means clustering algorithm, Applied Statistics 28 (1979), 100-108.

[34] N.L. Johnson and S. Kotz, Distributions in statistics: continuous univariate distributions - 2, John Wiley & Sons, 1970.

[35] R.E. Kass and A.E. Raftery, Bayes factors, Journal of the American Statistical Association 90 (1995), no. 430, 773-795.

[36] S.M. Lewis and A.E. Raftery, Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator, Technical Report 279, Department of Statistics, University of Washington, 1994.

[37] J. Liu, S.W. Gao, Z.Q. Luo, T.N. Davidson, and J.P.Y. Lee, The minimum description length criterion applied to emitter number detection and pulse classification, Proceedings of the IEEE Workshop on Statistical Signal and Array Processing (Portland, Oregon, USA), 1998.

[38] S. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. on PAMI 11 (1989), 674-693.

[39] G.J. McLachlan and K. Basford, Mixture models: inference and applications to clustering, Marcel Dekker, New York, 1988.

[40] G.J. McLachlan, On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture, Applied Statistics 36 (1987), 318-324.

[41] N.R. Mendell, S.J. Finch, and H.C. Thode, Where is the likelihood ratio test powerful for detecting two component normal mixtures? (the consultant's forum), Biometrics 49 (1993), 907-915.


[42] N.R. Mendell, H.C. Thode, and S.J. Finch, The likelihood ratio test for the two-component normal mixture problem: power and sample size analysis, Biometrics 47 (1991), 1143-1148.

[43] R.J. Muirhead, Aspects of multivariate statistical theory, John Wiley & Sons, 1982.

[44] J.J. Oliver and R.A. Baxter, MML and Bayesianism: similarities and differences, Technical Report 206, Dept. of Computer Science, Monash University, Vic 3168, Australia, 1994.

[45] J.J. Oliver and D.J. Hand, Introduction to minimum encoding inference, Technical Report 205, Dept. of Computer Science, Monash University, Vic 3168, Australia, 1994.

[46] J.D. Patrick, A program for discriminating between classes, Technical Report 151, Dept. of Computer Science, Monash University, Vic 3168, Australia, 1991.

[47] B. Pfahringer, Practical uses of the minimum description length principle in inductive learning, Ph.D. thesis, The Austrian Research Institute for Artificial Intelligence, 1995.

[48] S. Richardson and P.J. Green, On Bayesian analysis of mixtures with an unknown number of components (with discussion), J. Royal Statistical Soc., Ser. B 59 (1997), no. 4, 731-792.

[49] B.D. Ripley, Pattern recognition and neural networks, Cambridge University Press, 1996.

[50] J. Rissanen, Modeling by shortest data description, Automatica 14 (1978), 465-471.

[51] ——, A universal prior for integers and estimation by minimum description length, Ann. of Statistics 11 (1983), no. 2, 416-431.

[52] ——, Stochastic complexity, J. R. Statist. Soc. B 49 (1987), no. 3, 223-239.

[53] S.M. Ross, Introduction to probability models, fourth ed., Academic Press, Inc., 1989.


[54] G. Schwarz, Estimating the dimension of a model, Annals of Statistics 6 (1978), 461-464.

[56] D.M. Titterington, Some recent research in the analysis of mixture distributions, Statistics 21 (1990), 619-641.

[57] J.C. Toomay, Radar principles for the non-specialist, Wadsworth Inc., 1982.

[58] H.L. Van Trees, Detection, estimation and modulation theory, part I, New York: Wiley, 1968.

[59] M.A. Upal, Monte Carlo comparison of non-hierarchical unsupervised classifiers, Master's thesis, University of Saskatchewan, Saskatchewan, Canada, 1995.

[60] M.A. Upal and E.M. Neufeld, Comparison of unsupervised classifiers, Proceedings of the ISIS Information, Statistics and Induction in Science conference (Singapore), World Scientific, 20-23 August 1996, pp. 342-353.

[61] C.S. Wallace, Classification by Minimum Message Length inference, Advances in Computing and Information - ICCI 1990 (Niagara Falls) (S.G. Akl et al., eds.), 1990, pp. 72-81.

[62] C.S. Wallace and D.M. Boulton, An information measure for classification, Computer Journal 11 (1968), no. 2, 185-194.

[63] C.S. Wallace and D.L. Dowe, Intrinsic classification by MML - the Snob program, Proceedings of the 7th Australian Joint Conference on Artificial Intelligence (Singapore), World Scientific, 1994, pp. 37-44.

[64] C.S. Wallace and P.R. Freeman, Estimation and inference by compact coding, J. R. Statist. Soc. B 49 (1987), no. 3, 240-265.


[65] M. Wax and T. Kailath, Detection of signals by information theoretic criteria, IEEE Trans. on ASSP 33 (1985), no. 2, 387-392.

[66] J.H. Wolfe, Pattern clustering by multivariate mixture analysis, Multivariate Behavioral Research 5 (1970), 329-350.

[67] K.M. Wong and Z.Q. Luo, Emitter number detection and pulse classification in radar systems, Tech. Report 356, Communications Research Laboratory, McMaster University, Hamilton, Ontario, Canada, 1997.

[68] K.M. Wong, Z.Q. Luo, J. Liu, J.P.Y. Lee, and S.W. Gao, Radar emitter classification using intrapulse data, International Journal of Electronics and Communications, Germany (1999).

[69] J.K. Wu, Neural networks and simulation methods, New York: M. Dekker, 1994.

[70] J. Zhang and J.W. Modestino, A model-fitting approach to cluster validation with application to stochastic model-based image segmentation, IEEE Trans. on PAMI 12 (1990), no. 10, 1009-1017.

[71] Q.T. Zhang, K.M. Wong, P.C. Yip, and J.P. Reilly, Statistical analysis of the performance of information theoretic criteria in the detection of the number of signals in array processing, IEEE Trans. on ASSP 37 (1989), no. 10, 1557-1567.

[72] L.C. Zhao, P.R. Krishnaiah, and Z.D. Bai, On detection of the number of signals in presence of white noise, Journal of Multivariate Analysis 20 (1986), 1-25.

[73] DECCA Ship's Manual: Groups 9A and 1M Relative Motion Marine Radars, RACAL-DECCA Marine Radar Limited, 1975.

[74] TMS320C3x/C4x code generation tools getting started guide, Texas Instruments Inc., 1997.

[75] TMS320C3x/C4x optimizing C compiler user's guide, Texas Instruments Inc., 1997.


[76] TMS320C3x/C4x assembly language tools user's guide, Texas Instruments Inc., 1997.

[77] TMS320C4x user's guide, Texas Instruments Inc., 1997.

[78] Code Composer user's guide, GO DSP Corporation, 1997.

[79] Dakar F5 carrier board technical reference, Spectrum Signal Processing Inc., 1997.

[80] Dakar F5 carrier board user's guide, Spectrum Signal Processing Inc., 1997.