NOTE TO USERS
Page(s) not included in the original manuscript are unavailable from the author or university. The
manuscript was microfilmed as received.
This reproduction is the best copy available.
MODEL-BASED CLUSTERING
ALGORITHMS, PERFORMANCE AND APPLICATION

JUN LIU
M.Eng., Shanghai Jiao Tong University
B.Sc., Huazhong University of Science & Technology

A Thesis
Submitted to the School of Graduate Studies
in Partial Fulfilment of the Requirements
for the Degree
Ph.D.

McMaster University
National Library of Canada, Acquisitions and Bibliographic Services

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
MODEL-BASED CLUSTERING
ALGORITHMS, PERFORMANCE AND APPLICATION
PH.D. (2000) (Electrical and Computer Engineering)
McMaster University, Hamilton, Ontario

TITLE: Model-Based Clustering: Algorithms, Performance And Application

AUTHOR: Jun Liu, M.Eng. (Shanghai Jiao Tong University), B.Sc. (Huazhong University of Science & Technology)

SUPERVISORS: Dr. K.M. Wong and Dr. Z.Q. Luo, Professors, Department of Electrical and Computer Engineering

NUMBER OF PAGES: xv, 120
ABSTRACT
The main contributions of this thesis are the development of new clustering algorithms (with cluster validation), both off-line and on-line, the performance analysis of the new algorithms, and their applications to intrapulse analysis.

Bayesian inference and minimum encoding inference, including Wallace's minimum message length (MML) and Rissanen's minimum description length (MDL), are reviewed for model selection. It is found that the MML coding length is more accurate than the other two in the view of quantization. By introducing a penalty weight, all criteria considered here are cast into the framework of a penalized likelihood method.

Based on minimum encoding inference, an appropriate measure of coding length is proposed for cluster validation, and the coding lengths under four different Gaussian mixture models are fully derived. This provides us with a criterion for the development of a new clustering algorithm. Judging from the performance comparison with other algorithms, the new clustering algorithm is more suitable for processing high dimensional data, with satisfactory performance on small and medium samples. This clustering algorithm is off-line because it requires all the data to be available at the same time.

The theoretical error performance of our clustering algorithm is evaluated under reasonable assumptions. It is shown here how the dimension of the data space, the sample size, the mixing proportion and the inter-cluster distance affect the ability of our clustering algorithm to detect the true number of clusters. Furthermore, we examine the impact of the penalty weight under the framework of the penalized likelihood method. It is found that there is a range of the penalty weight within which the best performance of our clustering algorithm can be achieved. Therefore, with some supervision we could adjust the penalty weight to further improve the performance of our clustering algorithm.

The application of our clustering algorithm to intrapulse analysis is investigated in detail. We first develop the pre-processing techniques, including data compression for received pulses, and formulate the problem of emitter number detection and pulse-emitter association as a multivariate clustering problem. After applying the above (off-line) clustering algorithm here, we further develop two on-line clustering algorithms: one is based on some known thresholds while the other is based on a model-based detection scheme. Performance on intrapulse data using our pre-processing techniques and clustering algorithms is reported, and the results demonstrate that our new clustering algorithms are very effective for intrapulse analysis, especially the model-based on-line algorithm.

Finally, the DSP implementation for intrapulse analysis is considered. Some relevant physical parameters are estimated, such as the likely maximal incoming pulse rate. Then a suitable system diagram is proposed and its system requirements are investigated. Our on-line clustering algorithm is implemented as a core classification module on a TMS320C44 DSP board.
Acknowledgement
I have received both constant encouragement and expert supervision from my supervisors Dr. K.M. Wong and Dr. Z.Q. Luo. From both I have learned a great deal, of which only a part is in this thesis.

I wish to express sincere gratitude to supervisory committee members Dr. J.P. Reilly, Dr. S. Qiao and Dr. T. Todd for their encouragement and supervision. I am especially pleased to thank my external advisor Dr. J.P.Y. Lee, from the Defence Research Establishment Ottawa, for his encouragement and expert guidance.

I was fortunate to have been part of the Advanced Signal Processing for Communications (ASPC) group led by Dr. K.M. Wong and Dr. Z.Q. Luo. I would like to thank fellow ASPCers: Dr. T.N. Davidson, Dr. S.Q. Wu, Dr. S.W. Gao, Dr. J. Wu, Mr. L. Li and Mr. J. Zhang for their valuable help. I would also like to acknowledge the financial support provided by the Department of Electrical and Computer Engineering and the Defence Research Establishment Ottawa.

Finally, I am indebted to my parents Qiji Liu and Ande Chen, and especially my wife Dieqian Han, for their understanding and great support.
Acronyms

BIC     Bayesian Inference Criterion
MML     Minimum Message Length
MDL     Minimum Description Length
LRT     Likelihood Ratio Test
p.d.f.  probability density function
ML      Maximum Likelihood
LPF     Low Pass Filter
HPF     High Pass Filter
PRI     Pulse Repetition Interval
PRF     Pulse Repetition Frequency
rpm     rotation per minute
flop    floating point operation
SRAM    Static Random Access Memory
EDF     Empirical Distribution Function
Notations

ln              Natural logarithm
Σ_n             Summation over n
Π_n             Product over n
W^T             Transpose of W
W^{-1}          Inverse of W
tr(W)           Trace of W
|W|             Determinant of W
diag(W)         Diagonal matrix of W
||μ||           L2 norm of μ
E[R]            Expectation value of R
Var[R]          Variance of R
N(μ, Σ)         Multivariate normal distribution with mean vector μ and covariance matrix Σ
F(ν₁, ν₂; δ)    Noncentral F distribution with ν₁, ν₂ degrees of freedom and noncentrality parameter δ
W_M(n, Σ, Ω)    Noncentral Wishart distribution with M variates, n degrees of freedom, covariance matrix Σ and noncentrality matrix Ω
Contents

ABSTRACT iii
Notations vii

1 Introduction 1
1.1 Intrapulse Analysis 1
1.2 Model-Based Cluster Analysis 2
1.3 Major Contributions of The Thesis 5
1.4 Outline of The Thesis 8

2 Model Selection Criteria 10
2.1 Introduction 10
2.2 Bayesian Inference 10
2.3 Minimum Encoding Inference 12
2.3.1 Rissanen's Description Length 13
2.3.2 Wallace's Message Length 15
2.4 Framework: Penalized Likelihood Method 15

3 Model-Based Clustering 17
3.1 Introduction 17
3.2 General Coding Length Measure 18
3.3 Coding Lengths Under Gaussian Mixture Models 19
3.3.1 Covariance Structure 1: Σ_k = σ²I, ∀k 22
3.3.2 Covariance Structure 2: Σ_k = σ_k²I, ∀k 24
3.3.3 Covariance Structure 3: Σ_k = D, ∀k 26
3.3.4 Covariance Structure 4: Σ_k = D_k, ∀k 28
3.3.5 Summary of Coding Lengths 30
3.4 A Model-Based Clustering Algorithm 31
3.4.1 Procedure 31
3.4.2 Computational Complexity 32
3.5 Comparison with SNOB 35
3.6 Experimental Results 36
3.7 Summary 44

4 Detection Performance Analysis 45
4.1 Introduction 45
4.2 Probability of A Miss 46
4.3 Probability of A False Alarm 54
4.4 Optimal Range of Penalty Weight 58
4.5 Summary 59

5 Application to Intrapulse Analysis 61
5.1 Introduction 61
5.2 Signal Model And Pre-Processing of Received Pulses 62
5.2.1 Amplitude Normalization 64
5.2.2 Time Alignment Based on Thresholding 64
5.2.3 Phase Adjustment Based on Polynomial Fitting 65
5.2.4 Data Compression Using Wavelet Decomposition 66
5.3 Clustering Algorithms for Intrapulse Analysis 68
5.4 An On-line Clustering Algorithm Using Thresholds 69
5.4.1 Procedure 69
5.4.2 Computational Complexity 71
5.5 An On-line Model-Based Clustering Algorithm 74
5.5.1 Procedure 76
5.5.2 Computational Complexity 78
5.6 Numerical Experiments on Intra-pulse Data 82
5.6.1 Pulse Generation 82
5.6.2 Example 1 83
5.6.3 Conclusions 88
5.7 Summary 89

6 DSP Implementation 90
6.1 Introduction 90
6.2 Physical Scenario Analysis 90
6.2.1 Probability of Receiving Overlapped Pulses 91
6.2.2 Receiving Pulse Sequence 92
6.2.3 Near-Far Phenomenon 92
6.3 Maximal Incoming Pulse Rate 93
6.4 System Diagram And Requirements 94
6.4.1 Pre-processing 94
6.4.2 Initial Grouping 95
6.4.3 On-line Clustering 96
6.5 C/DSP Coding of On-line Clustering 96

7 Conclusions 99
7.1 Future Work 101

A The Value of S(N, δ)

D Multivariate Normality 108
D.1 EDF Statistics 108
D.2 Multivariate Normality Test 110
D.3 Gaussianity Test of Compressed Pre-processed Pulses 110
List of Figures

3.1 The diagram of our model-based clustering algorithm
3.2 Simulated data for M=22, where x-axis is the index of data sample points and y-axis is the amplitude of simulated data
4.1 Two Gaussian clusters
4.2 The illustration of Pm
4.3 Miss probability curves for two true clusters: M=1, c=0.5
4.4 Miss probability curves for two true clusters: M=1, c=0.2
4.5 Miss probability curves for two true clusters: M=2, c=0.5
4.6 Miss probability curves for two true clusters: M=2, c=0.2
4.7 Miss probability curves for two true clusters: M=22, c=0.5
4.8 Miss probability curves for two true clusters: M=22, c=0.2
4.9 One Gaussian cluster
4.10 The illustration of Pf
4.11 False alarm probability curves for one true cluster: M=1
4.12 False alarm probability curves for one true cluster: M=2
4.13 False alarm probability curves for one true cluster: M=22
5.1 Radar pulses received for intrapulse analysis
5.2 Polynomial fitting for phase adjustment
5.3 Data compression using wavelet decomposition
5.4 The diagram of our on-line clustering algorithm using thresholds
5.5 The diagram of our on-line model-based clustering algorithm 79
5.6 Amplitude and phase of 100 received pulses from 5 unknown emitters, where x-axis is the index of data sample points 84
5.7 Amplitude and phase of the pre-processed pulses, where x-axis is the index of data sample points 84
5.8 Amplitude and phase of the compressed pre-processed pulses, where x-axis is the index of data sample points 84
5.9 Determination of the number of emitters using our (off-line) clustering algorithm 85
5.10 Determination of the number of emitters using the SNOB program 86
6.1 Physical scenario example 1 91
6.2 Physical scenario example 2 91
6.3 Physical scenario example 3 92
6.4 The DSP system diagram for intrapulse analysis 94
6.5 The tree structure for initial grouping 96
D.1 Real and imaginary parts of 50 simulated pulses, where x-axis is the index of data sample points and y-axis is the amplitude 112
D.2 Real and imaginary parts of the compressed pre-processed pulses, where x-axis is the index of data sample points and y-axis is the amplitude 112
List of Tables

3.1 Cluster validation results for two true clusters: M=1, c=0.5 40
3.2 Cluster validation results for two true clusters: M=1, c=0.2 40
3.3 Cluster validation results for one true cluster: M=1 40
3.4 Cluster validation results for two true clusters: M=2, c=0.5 41
3.5 Cluster validation results for two true clusters: M=2, c=0.2 41
3.6 Cluster validation results for one true cluster: M=2 41
3.7 Cluster validation results for two true clusters: M=22, c=0.5 42
3.8 Cluster validation results for two true clusters: M=22, c=0.2 42
3.9 Cluster validation results for one true cluster: M=22 42
3.10 Comparison of performance of SNOB and our algorithm, M=1 43
3.11 Comparison of performance of SNOB and our algorithm, M=2 43
3.12 Comparison of performance of SNOB and our algorithm, M=22 43
4.1 The critical distance 50
4.2 Limiting values of (…) 56
4.3 Optimal ranges of the penalty weight 58
4.4 Cluster validation results for two true clusters: M=22, c=0.5, λ = 1.1 59
4.5 Cluster validation results for one true cluster: M=22, λ = 1.1 59
5.1 Clustering results for Example 1 by the off-line model-based clustering algorithm 85
5.2 Clustering results for Example 1 by the SNOB algorithm 86
Clustering results for Example 1 by the on-line clustering algorithm using known thresholds
Clustering results for Example 1 by the on-line model-based clustering algorithm
DECCA Group SA and 12A relative motion marine radars
The benchmark of DSP codes of the on-line clustering
Six EDF statistics
Modified EDF statistics
Gaussianity test of original (simulated) pulses at significance level 0.05
Gaussianity test of compressed pre-processed pulses at significance level 0.05
Chapter 1
Introduction
1.1 Intrapulse Analysis

Radar emitter classification based on a collection of received radar signals is a subject of wide interest in both civil and military applications. The signals received usually consist of sequences of pulses emitted from multiple radar transmitters. If different radars transmit pulses with different carrier frequencies or pulse repetition intervals (PRIs), then it is not difficult to distinguish them from one another. However, in modern radar systems, more sophisticated signal waveforms have been used, and inter-pulse information alone may not be enough to separate the received pulses according to their originations. To classify radar emitters in such an environment, we need to explore the detailed structure inside each pulse, i.e. the so-called intrapulse information. This is because each emitter has its own electrical signal structure inside each of its transmitted pulses, due to both intentional and unintentional modulations. This structure motivates us to explore the possibility of using the intrapulse information of a collection of pulses to determine the number of emitters present and to classify those pulses according to their originations. In other words, the objectives of the research are to: (1) determine the number of emitters present; (2) classify the incoming pulses according to the emitters. The physical scenario is illustrated in detail in Section 5.2.
There are three important issues in the design of a processing algorithm for intrapulse analysis:

- The algorithm should be suitable for processing high dimensional data, because in most cases more than 40 data points are required to describe a pulse.

- The performance of the algorithm should be satisfactory for small or medium sample cases, since it is desirable to identify the emitters present using only a few received pulses.

- The algorithm should be computationally efficient, and on-line clustering is required for near real-time implementation.
In practice, a radar pulse intercepted by a passive receiver may be contaminated by an absolute amplitude and phase, time delay and residual carrier frequency. We first develop pre-processing techniques, including data compression for received pulses, and then formulate our objectives as a multivariate clustering problem. In the cluster analysis literature [1, 23, 31, 39, 49, 69], the first objective is known as cluster validation while the second is called clustering. Generally speaking, the current clustering methods range from those that are largely heuristic to more formal procedures based on statistical models. One major advantage of model-based methods is that they provide a precise theoretical framework for assessing the clustering structure of a given data set, especially for determining a relevant number of clusters. In the next section, model-based cluster analysis is discussed in detail.
1.2 Model-Based Cluster Analysis
In model-based cluster analysis [28, 39], it is assumed that the data under consideration are generated from a finite mixture of probability distributions (e.g. normal distributions), and each component of the mixture represents a different cluster. Given N observations from K clusters, there are two ways to formulate the mixture model: one is the so-called classification approach, which assigns an observation to one of the K clusters deterministically, and the other is the so-called mixture approach, which assigns an observation to the K clusters probabilistically. An empirical comparison [15] in a finite sample setting between these two approaches suggested that the classification approach is preferred for small sample cases, although from the studies in [13, 14, 30], asymptotically, the mixture approach tends to perform better than the classification approach when classifying ill-separated components, given a sufficiently large sample.
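The contrast between the two formulations can be sketched in a few lines of Python. This is an illustration only, not the thesis's implementation: the two-component univariate mixture and its parameters below are hypothetical, and the mixture approach is represented by posterior (responsibility) computation while the classification approach picks the single most probable cluster.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Univariate normal density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def soft_assignment(x, weights, mus, sigmas):
    """Mixture approach: posterior probability of each cluster given x."""
    joint = [w * gauss_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(joint)
    return [j / total for j in joint]

def hard_assignment(x, weights, mus, sigmas):
    """Classification approach: assign x to the single most probable cluster."""
    post = soft_assignment(x, weights, mus, sigmas)
    return max(range(len(post)), key=lambda k: post[k])

# Two hypothetical clusters with equal mixing weights.
weights, mus, sigmas = [0.5, 0.5], [0.0, 3.0], [1.0, 1.0]
print(soft_assignment(1.5, weights, mus, sigmas))  # [0.5, 0.5] at the midpoint
print(hard_assignment(0.2, weights, mus, sigmas))  # cluster 0
```

The soft posteriors are exactly what the EM mixture approach iterates on, while thresholding them to a single label recovers the classification approach.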
Given a data set and underlying models, the first question to be considered is how to select the model which best fits the data. This is a critical question common to the fields of Statistics, Machine Learning and Artificial Intelligence. Two principles can be applied in the search for the answer: one is Bayesian Inference [17, 28, 35, 48, 54] and the other is Minimum Encoding Inference [44, 52, 62, 64]. In the Bayesian framework, a model is chosen as the best if it has the highest posterior probability. In the minimum encoding inference framework, the best model is the one that yields the minimal coding length of the data. For the latter principle, there are two different approaches to interpreting the encoding process: one by Wallace [64], termed the Minimum Message Length (MML) criterion, and the other by Rissanen [52], termed the Minimum Description Length (MDL) criterion. A comparative study of Bayesian inference, MML and MDL was reported in [8, 44, 45]. There will be a further discussion of this important issue in the next chapter.
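The minimum-encoding idea can be sketched as a penalized-likelihood rule: the data cost (negative log-likelihood) plus a weighted model-complexity cost. The (d/2) ln N penalty below is the classic MDL/BIC form used purely for illustration (the coding lengths actually derived in this thesis differ in detail per covariance structure), and the candidate fits are hypothetical numbers.

```python
import math

def penalized_code_length(log_likelihood, num_params, num_samples, weight=1.0):
    """MDL/BIC-style criterion: -log L plus a weighted (d/2) ln N complexity cost.
    `weight` plays the role of the penalty weight discussed in the text."""
    return -log_likelihood + weight * 0.5 * num_params * math.log(num_samples)

def select_model(candidates, num_samples, weight=1.0):
    """Pick the candidate (num_clusters, log_likelihood, num_params) with the
    smallest penalized code length."""
    return min(candidates,
               key=lambda c: penalized_code_length(c[1], c[2], num_samples, weight))

# Hypothetical fits: adding clusters always raises the likelihood, but the
# complexity penalty can still prefer a simpler model.
fits = [(1, -1250.0, 2), (2, -1180.0, 5), (3, -1176.0, 8)]
best = select_model(fits, num_samples=200, weight=1.0)
print(best[0])  # prints 2
```

Raising `weight` biases the rule toward fewer clusters, which is exactly the lever studied later when an optimal range of the penalty weight is derived.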
After a criterion, either ad hoc or model-based, has been established for cluster validation, there remains the question of how to classify the observations under an assumed known number of clusters. A simple and common method is k-means [33], which minimizes the within-group sums of squares. It starts with an initial estimate and then regroups the given data set a few times until the cluster centers converge. The k-means method can be used for the classification approach. Another common method is the EM algorithm [66], which maximizes the underlying likelihood. Starting with an initial estimate, the EM algorithm consists of two steps: the expectation step, which estimates the model parameters including the probabilities of an observation belonging to each cluster, and the maximization step, which evaluates the resulting likelihood. The EM algorithm is suitable for both the classification approach and the mixture approach. However, the EM algorithm is more computationally intensive than the k-means algorithm.
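The k-means loop just described can be sketched in pure Python. This is a one-dimensional illustration with made-up data and initial centers, not the thesis's code:

```python
def kmeans(points, centers, max_iter=100):
    """Plain k-means: alternate assigning each point to its nearest center and
    moving each center to its group mean, until the centers stop changing
    (this minimizes the within-group sums of squares)."""
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            groups[k].append(p)
        # Update step: each center moves to the mean of its group.
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:   # converged
            break
        centers = new_centers
    return centers, groups

# Two well-separated 1-D groups; the initial centers are arbitrary guesses.
data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
centers, groups = kmeans(data, centers=[0.0, 2.0])
print(centers)  # converges to approximately [1.0, 5.0]
```

The EM algorithm replaces the hard assignment step with the soft posterior probabilities and the center update with weighted maximum-likelihood estimates, which is where its extra cost comes from.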
In madel-baseà cluster analysis, duster validation and dustering are c o m b i i by fmst
formulating a statistical model for the problem which is pararneterized by k, the number of
clusters, then seiecting the hypothesis that best fits the data Among statistical models for
clusta analysis, Gawian mixtures are wïdely d. Three Gaussian clustering algorithma
are iisted below:
1. MCLUST: Developed by Fraley and Raf'tery [6,27-291. MCLUST incorporates eight
dinezent Gaussian mixture models in texms of the covariance matrix C, allows a
choiw of &ha the ciassification approach or the mixture approeeh, and applies an
asymptotic criterion of Bayesian i n f i c e for 888e98ing the number of dustem. This
aigorithm works weil for medium andior large sample cases but might not provide
satisktory results for small ample cases.
2. Autoclass: Developed by Cheeseman, Self and Kelly [16,17]. Autoclass only assumes
the general covariance matrix structure, follows the mixture approach, and applies
Bayesian inference for cluster validation. Since only the general covariance matrix
structure is assumed, it weights the model complexity too heavily in high dimensional
cases, so it is not suitable for processing high dimensional data.
3. SNOB: Developed by Wallace and his coworkers [10,11,61,63]. SNOB assumes the
covariance matrix is diagonal, follows the mixture approach, and applies MML inference
for cluster validation. It works for both high dimensional cases and small sample
cases.
Another important issue is the error performance analysis. Following the terminology in
statistical signal processing [56], if one cluster is present in a given data set but a clustering
algorithm detects two clusters, then a false alarm occurs; on the other hand, if two clusters
are present but the clustering algorithm reports only one, then a miss occurs. The error
analysis of model-based clustering is related to the question of the number of components
in a mixture, which is a problem that has not been completely solved in statistics. A general
method for this question is the Likelihood Ratio Test (LRT). Suppose that a random sample
Y is available and we wish to test the following two hypotheses:

H₀ : Y is generated from a mixture of K normals
H₁ : Y is generated from a mixture of K′ normals (K′ > K)

However, the regularity conditions for the usual asymptotic theory fail when the null hypothesis
H₀ is true; see details in [39, Page 21] and [56]. It is indeed a difficult problem, even
for detecting a univariate normal mixture with two components. For the aforementioned
"simple" case, empirical tabulation and an asymptotic analysis were presented in [24,32]
by following the classification approach; empirical tabulation was presented in [41,42] and
an asymptotic analysis was derived in [9] by following the mixture approach.
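Although the null distribution of the LRT statistic is nonstandard here, the statistic itself, -2 log Λ = 2[sup log L under H₁ - sup log L under H₀], is straightforward to compute; calibrating it is what the cited empirical tabulations address. The following Python sketch is purely illustrative: it fits H₀ (one normal) directly and H₁ (two normals) with a crude EM on hypothetical simulated univariate data.

```python
import math, random

def loglik_normal(x):
    # maximized log-likelihood of a single normal (H0, K = 1)
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def loglik_mix2(x, iters=200):
    # crude EM fit of a two-component univariate normal mixture (H1, K' = 2)
    n = len(x)
    mu1, mu2 = min(x), max(x)
    mean = sum(x) / n
    var1 = var2 = sum((v - mean) ** 2 for v in x) / n + 1e-9
    w = 0.5
    for _ in range(iters):
        # E-step: responsibilities (the common 1/sqrt(2*pi) factor cancels)
        r = []
        for v in x:
            p1 = w * math.exp(-(v - mu1) ** 2 / (2 * var1)) / math.sqrt(var1)
            p2 = (1 - w) * math.exp(-(v - mu2) ** 2 / (2 * var2)) / math.sqrt(var2)
            r.append(p1 / (p1 + p2))
        # M-step: weighted ML updates
        s1 = sum(r); s2 = n - s1
        mu1 = sum(ri * v for ri, v in zip(r, x)) / s1
        mu2 = sum((1 - ri) * v for ri, v in zip(r, x)) / s2
        var1 = sum(ri * (v - mu1) ** 2 for ri, v in zip(r, x)) / s1 + 1e-9
        var2 = sum((1 - ri) * (v - mu2) ** 2 for ri, v in zip(r, x)) / s2 + 1e-9
        w = s1 / n
    return sum(math.log(
        w * math.exp(-(v - mu1) ** 2 / (2 * var1)) / math.sqrt(2 * math.pi * var1)
        + (1 - w) * math.exp(-(v - mu2) ** 2 / (2 * var2)) / math.sqrt(2 * math.pi * var2))
        for v in x)

rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(4, 1) for _ in range(100)]
lrt = 2 * (loglik_mix2(x) - loglik_normal(x))   # -2 log(Lambda)
print(lrt > 0)  # the two-component fit dominates on bimodal data
```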
1.3 Major Contributions of The Thesis
Motivated by exploring the possibility of using intrapulse information from a collection of
pulses to identify the emitters present, extensive research on model-based clustering has
been conducted. The major contributions are twofold: model-based cluster analysis and
intrapulse analysis.
Model-Based Cluster Analysis
Bayesian inference and minimum encoding inference, including Wallace's minimum message
length (MML) and Rissanen's minimum description length (MDL), are reviewed and
compared for model selection. It is found that the MML coding length is more accurate
than the other two in the view of quantization. All model selection criteria considered here
consist of two parts: one is the log-likelihood function, which measures the goodness of fit
between the data and the model, and the other is a penalty function, which measures the
complexity of the model. An inference aims to balance the trade-off between goodness of fit
and model complexity. Hence in practice, we can introduce a penalty weight for the penalty
function to control the trade-off. We call this approach the penalized likelihood method.
Applying minimum encoding inference to the classification approach of model-based
clustering, we propose an appropriate measure of coding length for cluster validation. The
coding lengths under four different Gaussian mixture models in terms of the covariance
matrix Σ are derived. The first covariance structure is the simplest and is mainly used for the
purpose of a theoretical error performance analysis. The second and fourth are successfully
applied to intrapulse analysis. The third one might be useful for other applications, so we
include it for completeness. Correspondingly, we develop an effective clustering algorithm
which starts by viewing a given data set as one cluster and then repartitions and regroups
the data to obtain a new cluster in each step. The new algorithm is off-line since it requires
all the data to be available at the same time. Extensive empirical results show that the new
clustering algorithm (with cluster validation) is more suitable than SNOB for processing high
dimensional data, with better performance on small sample cases. In fact, our algorithm
is well designed for the clustering problem in intrapulse analysis, in terms of the first two
issues pointed out in Section 1.1.
The theoretical error performance of our clustering algorithm is evaluated under reasonable
assumptions. It is shown in this thesis how the dimension of the data space, the
sample size, the mixing proportion and the inter-cluster distance affect the performance of the
clustering algorithm in detecting the true number of clusters. Furthermore, by introducing a
penalty weight, we investigate our clustering algorithm as a penalized likelihood method.
The impact of the penalty weight is investigated. With some supervision, we could adjust
the penalty weight to further improve the performance of our algorithm. Testing our clustering
algorithm on intrapulse data, we have found that the best performance is usually
achieved by using the fourth covariance structure when no supervision is available (i.e., the
penalty weight is 1, the default value), and that the best performance is usually achieved by
using the second covariance structure when supervision is available (i.e., the penalty weight
can be adapted).
Intrapulse Analysis
First, we develop pre-processing techniques to remove nuisance parameters from received
pulses, such as the absolute amplitude and phase, time delay and residual carrier frequency.
As a result, we formulate the problem of emitter number detection and pulse-emitter association
into a multivariate clustering problem. In order to reduce the computational cost
of clustering, a suitable data compression method based on a wavelet decomposition is also
included in pre-processing. The pre-processing techniques are intuitive in nature and are
carried out so that, after pre-processing, the pulses received from the same emitter maintain
their resemblance to each other, while those from different emitters maintain their distinctive
features.
Second, after applying the above new clustering algorithm to the clustering problem, we
investigate how to achieve on-line clustering, that is, to perform classification dynamically
as pulses arrive. To solve this problem, we propose to set up some thresholds and distance
measures which can be used to indicate to which existing cluster an incoming pulse should
be assigned, or whether it should form a new cluster. To achieve an accurate classification
result, we have to adapt the thresholds as the statistics of the received pulses change in
time. Unfortunately, it is usually difficult to modify the thresholds appropriately when a
priori knowledge of the incoming pulses is not available. To overcome this drawback, a
novel on-line algorithm based on a model-based detection scheme is developed in which
no explicit thresholds are required. This new on-line algorithm dynamically incorporates
cluster splitting, merging and regrouping operations by using the model-based detection.
The performance of this on-line model-based clustering algorithm is almost the same as
that of the off-line model-based algorithm, but it is much faster.
Third, to effectively implement our pre-processing techniques and clustering algorithms
for the emitter number detection and the pulse classification in near real-time, we estimate
the relevant physical parameters such as the likely maximal incoming pulse rates. Based
on these estimates, we then propose a suitable system diagram and investigate the system
requirements. Finally, we implement our on-line clustering algorithm as a core classification
module on a TMS320C44 DSP board.
1.4 Outline of The Thesis
This introductory chapter has been mainly concerned with placing this thesis in context.
We have reviewed the problems in intrapulse analysis and model-based cluster analysis, and
outlined our major contributions to these two fields.
In the next chapter, we review some criteria for model selection, compare Bayesian inference
and minimum encoding inference (including MDL and MML), and cast them into the
framework of a penalized likelihood method.
In Chapter 3, applying minimum encoding inference to the classification approach to
model-based clustering, we propose an appropriate measure of coding length for cluster
validation, and derive the coding lengths under four different Gaussian mixture models.
Then we describe a new clustering algorithm, compare it with SNOB, and demonstrate
by extensive simulations that our algorithm is more suitable than SNOB for processing high
dimensional data, with better performance on small sample cases. In addition, we also
examine the performance of the coding length measure based on an asymptotic method for
Bayesian inference.
In Chapter 4, we conduct the theoretical performance analysis of our clustering algorithm,
in terms of two types of errors: miss and false alarm. We also study the impact of the
penalty weight under the framework of the penalized likelihood method. The conclusion
is that there is a range of the penalty weight within which the best performance of our
clustering algorithm can be achieved.
Intrapulse analysis is carried out in Chapter 5. We first describe the pre-processing
techniques, including data compression for received pulses, and formulate our objectives
into a multivariate clustering setting. After applying the model-based clustering algorithm
developed in Chapter 3 to the clustering problem, we further develop two on-line clustering
algorithms; one is based on thresholds while the other is based on a model-based
detection. Performance on intrapulse data using our clustering algorithms and SNOB is
reported, and the results demonstrate that our new clustering algorithms are very effective
for intrapulse analysis, especially the on-line model-based algorithm.
In Chapter 6, the DSP implementation for intrapulse analysis is considered. We estimate
the relevant physical parameters such as the likely maximal incoming pulse rate, then
propose a suitable system diagram and investigate the system requirements. The benchmark
of the DSP coding of our on-line clustering algorithm is reported.
Finally, the last chapter concludes the thesis with a summary of what has been achieved,
and outlines areas of future research.
Chapter 2
Model Selection Criteria
2.1 Introduction
In this chapter, we review Bayesian Inference [17,35,48,54] and Minimum Encoding Inference
[50,52,62,64]. For Bayesian Inference, two inference techniques are introduced: one
uses Laplace's method [17,36] and the other is an asymptotic method [54]. For Minimum
Encoding Inference, there are two approaches: one is called the Minimum Description
Length (MDL) criterion [52] and the other is called the Minimum Message Length (MML)
criterion [64]. For MDL, its coding steps are briefly described and the idea of a universal
prior is introduced. For MML, its coding steps are briefly described and a sensible prior is
required. At the end of this chapter, Bayesian Inference, MDL and MML are cast into the
framework of a penalized likelihood method.
2.2 Bayesian Inference
Given a data set Y = {y₁, · · · , y_N} and a set of model classes¹ parameterized by K (K =
1, · · · , K_max), let θ denote the model parameter vector under a model class, f(Y|θ, K)
denote the conditional probability density function (p.d.f.) of the data given θ and K, and
¹For instance, in model-based cluster analysis, K is the number of clusters, and different partitions which form K clusters belong to the same model class.
h(θ|K) denote the prior p.d.f. of θ given K. Then the conditional p.d.f. of Y given K is

$$ f(Y \mid K) = \int f(Y \mid \theta, K)\, h(\theta \mid K)\, d\theta. \qquad (2.1) $$

Further, let f(Y) denote the p.d.f. of Y and let P(K) be the prior probability of
the model class K. Then Bayes' theorem gives P(K|Y), the posterior probability of K:

$$ P(K \mid Y) = \frac{f(Y \mid K)\, P(K)}{f(Y)}. \qquad (2.2) $$

If we take a uniform prior for K, then P(K|Y) is proportional to f(Y|K), i.e.,

$$ P(K \mid Y) \propto f(Y \mid K). \qquad (2.3) $$
By using Laplace's method [17,36] to approximate the integral in Eq. (2.1), we have

$$ f(Y \mid K) \approx f(Y \mid \hat\theta, K)\, h(\hat\theta \mid K)\, (2\pi)^{\ell/2}\, |F_{\hat\theta}|^{-1/2}, \qquad (2.4) $$

where θ̂ is the maximum likelihood (ML) estimate of θ, ℓ is the number of free parameters
in θ, |·| is the determinant of a matrix, and F_θ̂ is the Fisher information matrix evaluated
at θ̂. The Fisher information matrix is defined by

$$ F_\theta = E\!\left[ -\frac{\partial^2 \log f(Y \mid \theta, K)}{\partial\theta\, \partial\theta^{T}} \right]. \qquad (2.5) $$

Usually we examine Eq. (2.4) in logarithmic form. Hence, the Bayesian inference criterion
can be described by

$$ -\log f(Y \mid K) \approx -\log f(Y \mid \hat\theta, K) - \log h(\hat\theta \mid K) - \frac{\ell}{2}\log 2\pi + \frac{1}{2}\log|F_{\hat\theta}|. \qquad (2.6) $$

It is shown in [54] that asymptotically

$$ -\log f(Y \mid K) \approx -\log f(Y \mid \hat\theta, K) + \frac{\ell}{2}\log N. \qquad (2.7) $$

The above asymptotic criterion is usually referred to as the Bayesian Inference Criterion (BIC).
The advantage of using BIC is that it does not depend on the prior distribution h(θ|K).
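For concreteness, Eq. (2.7) can be evaluated directly once the maximized log-likelihood and the number of free parameters are known; the model with the smaller criterion value is selected. The Python sketch below is an illustrative example with hypothetical data, comparing a fixed standard normal (ℓ = 0) against an ML-fitted normal (ℓ = 2).

```python
import math

def neg_loglik_normal(x, mu, var):
    # -log f(Y | theta) for i.i.d. univariate normal samples
    return sum(0.5 * math.log(2 * math.pi * var) + (v - mu) ** 2 / (2 * var)
               for v in x)

def bic(neg_loglik, n_params, n):
    # Eq. (2.7): -log f(Y | theta_hat, K) + (l/2) log N  (smaller is better)
    return neg_loglik + 0.5 * n_params * math.log(n)

x = [2.5, 3.1, 2.8, 3.4, 2.9, 3.2, 3.0, 2.7, 3.3, 3.1]  # hypothetical sample
n = len(x)
# Model 1: fixed standard normal, no free parameters (l = 0)
b1 = bic(neg_loglik_normal(x, 0.0, 1.0), 0, n)
# Model 2: normal with ML-fitted mean and variance (l = 2)
mu = sum(x) / n
var = sum((v - mu) ** 2 for v in x) / n
b2 = bic(neg_loglik_normal(x, mu, var), 2, n)
print(b2 < b1)  # the fitted model wins despite its penalty
```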
However, this large-sample criterion may not work satisfactorily for small or medium sample
cases. This drawback can be compensated to some extent by specifying a sensible prior
probability h(θ|K) in Eq. (2.6) if we have some knowledge of the given data set.

A model selection is actually performed at two levels. The first level is to choose the
best model class to fit the data, and the second level is to choose the best model under the
chosen model class. The ML estimate θ̂ under the chosen model class is usually taken as
the best model.
Notations
Throughout this thesis, if a model θ is considered, it is under some model class K. For
notational simplicity, we choose not to record this dependence explicitly, with the understanding
that θ depends on K implicitly.
2.3 Minimum Encoding Inference
There are two major approaches to minimum encoding inference: one by Wallace [62,64]
and the other by Rissanen [50,52]. Wallace termed his inference method the Minimum
Message Length (MML) criterion while Rissanen termed his the Minimum Description
Length (MDL) criterion. MDL appears more widely known in engineering fields [47,65,
70-72]. Wallace's and Rissanen's Royal Statistical Society meeting review papers on MML
and MDL appeared side by side in 1987 [52,64]. A comprehensive comparison between
them was presented in 1994 [8]. The fundamental ideas of MML and MDL are the same:

Given a data set and a family of competing statistical models, the best model
is the one that yields the minimal coding length of the data.
We assume that there is a data set Y = {y₁, · · · , y_N} and a statistical model determined
by ℓ parameters, described as θ, θ ∈ R^ℓ. To assess the goodness of fit between
the model and the data, we construct a code length L(θ) of the model and a code length
L(Y|θ) of the data in terms of the model, under a proper encoding scheme. A good model is
one leading to a concise total description length L(Y, θ), which is the sum of L(θ) and L(Y|θ)
(the shorter the better); i.e., minimum encoding inference is to find

$$ \theta^* = \arg\min_{\theta} L(Y, \theta). $$
Let f(Y|θ) be the conditional p.d.f. of Y given θ. We regard -log f(Y|θ), known in
information theory as self-information, as the number of "nits"² it takes to encode Y
with an ideal code relative to the assumed model of the data, i.e.,

$$ L(Y \mid \theta) = -\log f(Y \mid \theta). $$

To encode θ, we need to know its prior probability. In addition, we can only encode θ to a
limited precision, so an optimal quantization is needed to yield a total coding length as
short as possible. Briefly, the differences between Rissanen's MDL and Wallace's MML lie in
the view of the prior and the selection of the optimal quantization.
2.3.1 Rissanen's Description Length
The major steps of Rissanen's Minimum Description Length procedure [51] are described
as follows:

1. Quantization: partition the parameter space into regions with centers θᵢ, θᵢ ∈
R^ℓ, i ∈ N, and quantization volumes V(θᵢ).

2. Indexing: map θᵢ into a positive integer j.

3. Encoding a prior: use a so-called universal prior for positive integers to encode j
by the length L(j).

4. Total description length:

$$ L(Y, \theta_i) = -\log f(Y \mid \theta_i) + L(j). \qquad (2.11) $$

²The unit is called "nit" when the natural logarithm is used. In fact, we are concerned with calculating the length of a description for inference, but we do not need actually to transmit it. Therefore, we use codes that are efficient in terms of code length, but may not be efficient in the time required to encode/decode data.
Note that for each i, we have a probability model f(Y|θᵢ).

The procedure is not complete until we specify:

• the optimal quantization volumes V(θᵢ), i ∈ N;
• the mapping from θᵢ to a positive integer j;
• the universal prior for integers.
A universal prior for integers
Since the codes of both j and Y are strings, we cannot just attach them next to each other.
If the decoder always reads the code string from left to right, then a necessary and sufficient
condition for the decoder to be able to separate the codeword from whatever string follows
it is that the codewords for the integers form a prefix set. This means that no codeword
is allowed to be a prefix of another. Based on this point, the optimum length of a positive
integer j used by Rissanen is

$$ L(j) = \log^* j + \log c \qquad (2.12) $$

where log* j = log j + log log j + log log log j + · · ·, only including the positive terms, and c is a
small constant (≈ 2.865064). Therefore, the universal prior is defined as

$$ P(j) = e^{-L(j)} = \frac{1}{c}\, e^{-\log^* j}. \qquad (2.13) $$

For simplicity, the first-order approximation of the length L(j) is used, i.e.,

$$ L(j) \approx \log j. \qquad (2.14) $$
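The length function of Eq. (2.12) is easy to compute by iterating the logarithm while it stays positive. The following Python sketch is illustrative; the constant is the value quoted above.

```python
import math

def log_star(j):
    # log* j = log j + log log j + log log log j + ... (positive terms only)
    total, t = 0.0, math.log(j)
    while t > 0:
        total += t
        t = math.log(t)
    return total

C = 2.865064  # normalizing constant so the implied prior sums to (about) one

def universal_code_length(j):
    # Eq. (2.12): L(j) = log* j + log c, in nits
    return log_star(j) + math.log(C)

print(universal_code_length(1) < universal_code_length(100))  # longer codes for larger integers
```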
Furthermore, a lattice quantization is used here. In his approximation to the optimal
quantization, Rissanen obtained his description length, Eq. (2.15), where θ̂ is the ML
estimate and the Fisher information matrix is defined by

$$ F_\theta = E\!\left[ -\frac{\partial^2 \log f(Y \mid \theta)}{\partial\theta\, \partial\theta^{T}} \right]. \qquad (2.16) $$
2.3.2 Wallace's Message Length
The major steps of Wallace's Minimum Message Length procedure [64] are briefly described
as follows:

1. Quantization: partition the parameter space into regions with centers θᵢ, θᵢ ∈
R^ℓ, i ∈ N, and quantization volumes V(θᵢ).

2. Choosing a prior: specify some sensible prior h(θᵢ).

3. Total message length:

$$ L(Y, \theta_i) = -\log\!\int_{V(\theta_i)} h(\theta)\, d\theta \;-\; \log f(Y \mid \theta_i). \qquad (2.17) $$

Note that for each i, we have a probability model f(Y|θᵢ). It is convenient to
approximate the integral (for sufficiently small V(θᵢ)) as follows:

$$ \int_{V(\theta_i)} h(\theta)\, d\theta \approx h(\theta_i)\, V(\theta_i). $$
The procedure is not complete until we specify:

• the optimal quantization volumes V(θᵢ);
• a prior probability density h(θ) over the parameter space.
By a calculation similar to the optimal lattice quantization, Wallace derived his message
length as

$$ L(Y, \hat\theta) = -\log f(Y \mid \hat\theta) + \frac{1}{2}\log|F_{\hat\theta}| + \frac{\ell}{2} + \frac{\ell}{2}\log G_\ell - \log h(\hat\theta) \qquad (2.18) $$

where θ̂ is the ML estimate, |·| denotes the determinant of a matrix, G_ℓ is the ℓ-dimensional
optimal lattice quantization constant which can be found in [18, Page 61], and
h(θ̂) is the prior p.d.f. of θ evaluated at θ̂.
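As an illustrative sketch of Eq. (2.18), the message length of a univariate normal model (ℓ = 2 free parameters, μ and σ) can be computed as below. The uniform-prior ranges and the data are hypothetical assumptions, and G₂ = 5/(36√3) is the two-dimensional (hexagonal) lattice quantization constant.

```python
import math

G2 = 5 / (36 * math.sqrt(3))  # optimal 2-D (hexagonal) lattice quantization constant

def mml_length(x, prior_range_mu, prior_range_sigma):
    """Eq. (2.18) for a univariate normal with l = 2 free parameters (mu, sigma),
    assuming a uniform prior over the given (hypothetical) parameter ranges."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    sigma = math.sqrt(var)
    # -log f(Y | theta_hat)
    neg_loglik = 0.5 * n * (math.log(2 * math.pi * var) + 1.0)
    # |F| for n i.i.d. normal samples: (n / sigma^2) * (2n / sigma^2)
    log_det_fisher = math.log(2 * n * n / sigma ** 4)
    # -log h(theta_hat) under the uniform prior
    neg_log_prior = math.log(prior_range_mu * prior_range_sigma)
    l = 2
    return (neg_loglik + 0.5 * log_det_fisher + l / 2
            + (l / 2) * math.log(G2) + neg_log_prior)

x = [4.8, 5.2, 5.0, 4.9, 5.1, 5.0, 4.7, 5.3]  # hypothetical data
print(mml_length(x, prior_range_mu=10.0, prior_range_sigma=1.0))
```

Note how a vaguer prior (larger assumed ranges) lengthens the message, exactly the trade-off the criterion encodes.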
2.4 Framework: Penalized Likelihood Method
By comparing Eqs. (2.6) and (2.18), we observe that the major difference between them
is a quantization constant. Specifically, a hyper-sphere quantization constant appears in Laplace's
method for Bayesian inference, while the optimal lattice constant G_ℓ is used in the MML
message length. In the view of optimal quantization, the MML coding length is more
accurate. In addition, we believe that we should specify a sensible prior if we have some
knowledge of a given data set, instead of using some universal prior. Hence, we prefer the MML
message length formula, Eq. (2.18), for our application to model-based cluster analysis.
We also notice that all model selection criteria considered here consist of two parts:
one is the log-likelihood function, which measures the goodness of fit between the data and
the model, and the other is a penalty function, which measures the complexity of the model.
An inference aims to balance the trade-off between goodness of fit and model complexity.
Hence in practice, we can introduce a penalty weight for the penalty function to control the
trade-off as follows:

$$ L(Y, \theta) = L(Y \mid \theta) + \lambda\, L(\theta) \qquad (2.19) $$

where λ is the penalty weight.

Roughly speaking, an inference tends to underestimate when λ is large, and it tends to
overestimate when λ is small. Therefore, we have to determine a suitable range of λ which
guarantees the true estimate. This is investigated in detail in Section 4.4.
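The effect of λ in Eq. (2.19) can be demonstrated on a toy example. In the hypothetical Python sketch below, the candidate partitions, the per-cluster Gaussian likelihood and the asymptotic penalty (ℓ/2) log N are all illustrative choices, not the thesis's own coding lengths: a small λ selects two clusters, while a large λ collapses the inference to one.

```python
import math

def neg_loglik_1d(clusters):
    # L(Y|theta): classification log-likelihood of 1-D Gaussian clusters
    total = 0.0
    for c in clusters:
        mu = sum(c) / len(c)
        var = max(sum((v - mu) ** 2 for v in c) / len(c), 1e-6)
        total += 0.5 * len(c) * (math.log(2 * math.pi * var) + 1.0)
    return total

def best_k(partitions, lam):
    """Minimize L(Y|theta) + lambda * (l/2) log N over candidate K,
    with l = 2K (one mean and one variance per cluster, an assumption)."""
    n = sum(len(c) for c in next(iter(partitions.values())))
    scores = {k: neg_loglik_1d(cl) + lam * 0.5 * (2 * k) * math.log(n)
              for k, cl in partitions.items()}
    return min(scores, key=scores.get)

g1 = [0.8, 1.0, 1.2, 0.9, 1.1]          # hypothetical group near 1
g2 = [5.8, 6.0, 6.2, 5.9, 6.1]          # hypothetical group near 6
partitions = {1: [g1 + g2], 2: [g1, g2]}
print(best_k(partitions, lam=1.0), best_k(partitions, lam=20.0))  # prints: 2 1
```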
Chapter 3
Model-Based Clustering
3.1 Introduction
In model-based clustering [28,39], it is assumed that the data under consideration are
generated from a finite mixture of probability distributions; each component of the mixture
represents a different group or cluster. Therefore, given a set of observed data vectors, our
objectives are

• cluster validation: to determine the number of components in the mixture;
• clustering: to determine which data vectors arise from each component.
In the previous chapter, Bayesian inference and minimum encoding inference (MDL and
MML) for model selection were discussed. In addition, different clustering algorithms for
Gaussian mixture models were briefly compared in Section 1.2. These algorithms include
MCLUST, Autoclass and SNOB. Only SNOB is suitable for both high dimensional cases
and small sample cases.
In this chapter, we apply minimum encoding inference to model-based clustering, propose
an appropriate coding length measure for cluster validation, and fully derive the coding
lengths under four different Gaussian mixture models. Then we describe our model-based
clustering algorithm, compare it with SNOB, and demonstrate by extensive simulations that
our algorithm is more suitable than SNOB for processing high dimensional data, with better
performance on small sample cases. In addition, we also investigate the performance of the
coding length measure based on the asymptotic method for Bayesian inference. Part of this
chapter has been published in [68].
3.2 General Coding Length Measure
Given a data set Y consisting of N observed data vectors y₁, y₂, . . . , y_N, each of dimension
M, the data vector y_n (n = 1, 2, . . . , N) is to be assigned among K clusters. Let
a = [a₁, a₂, . . . , a_N]ᵀ be an association parameter vector such that if a_n = k, then the data
vector y_n is assigned to the kth cluster. In a model-based method, the kth cluster (k =
1, · · · , K) is assumed to be a sample of a simple distribution, denoted by f_k(·) with its
parameter vector θ_k. Therefore, the conditional density function for the data is

$$ f(Y \mid \theta, a) = \prod_{n=1}^{N} f_{a_n}(y_n \mid \theta_{a_n}) \qquad (3.1) $$

where the mixture model parameter vector θ consists of the independent parameters in the set
{θ₁, θ₂, . . . , θ_K} and ℓ is the dimension of θ.
Now, from Shannon's coding theorem [19], the minimum code length is given by the
entropy of the data. Thus, using the natural logarithm, the minimum code length in "nits"
(see the footnote in Section 2.3) is

$$ L(Y, a) = E[-\log f(Y)] \approx -\log f(Y \mid \hat\theta, \hat a) - \log f(\hat\theta) - \log P(\hat a). \qquad (3.2) $$

In Eq. (3.2), we have used the evaluation of the coding length at the maximum likelihood
(ML) estimates θ̂ and â to approximate the expected coding length, where f(θ̂) is the
probability density function evaluated at θ = θ̂, and P(â) is the probability of a = â.
First, let us examine the last term in Eq. (3.2). â is a particular association vector,
the nth element a_n of which denotes the association of the nth data vector with the a_nth
cluster. Now, to partition N data vectors into K clusters, the number of different ways, as
shown in Appendix A, is

$$ S(N, K) = \frac{N!}{\prod_{k=1}^{K} N_k! \, \prod_{n=1}^{N} R_n!} \qquad (3.3) $$

where N_k is the number of data vectors assigned to the kth cluster (k = 1, · · · , K; N₁ + N₂ + · · · + N_K = N), and R_n is the number of clusters with n data vectors (n = 1, 2, · · · , N).
If a uniform a priori probability is assumed for a, then

$$ -\log P(\hat a) = \log S(N, K). \qquad (3.4) $$
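The quantity log S(N, K) can be evaluated with log-gamma functions to avoid overflowing factorials. The sketch below assumes the multinomial-with-symmetry form implied by the definitions of N_k and R_n above (the exact derivation is in Appendix A).

```python
import math
from collections import Counter

def log_partitions(cluster_sizes):
    """log S(N, K): log of the number of distinct ways to partition N items
    into unordered clusters of the given sizes, computed as the multinomial
    count divided by permutations of equal-sized clusters."""
    n = sum(cluster_sizes)
    log_s = math.lgamma(n + 1)                      # log N!
    for nk in cluster_sizes:
        log_s -= math.lgamma(nk + 1)                # - log N_k!
    for count in Counter(cluster_sizes).values():
        log_s -= math.lgamma(count + 1)             # - log R_n!
    return log_s

# 4 items into two clusters of two: 4!/(2! 2!) / 2! = 3 distinct partitions
print(round(math.exp(log_partitions([2, 2]))))  # prints: 3
```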
The first and second terms in Eq. (3.2) can be described by the message length formula
of Eq. (2.18). Therefore, we have the total coding length

$$ L(Y, K) = -\log f(Y \mid \hat\theta, \hat a) + \frac{1}{2}\log|F_{\hat\theta}| + \frac{\ell}{2} + \frac{\ell}{2}\log G_\ell - \log h(\hat\theta) + \log S(N, K) \qquad (3.5) $$

where ℓ is the number of independent parameters in θ, G_ℓ is the ℓ-dimensional optimal
lattice quantization constant which can be found in [18, Page 61], h(θ̂) is the prior p.d.f. of
θ evaluated at θ̂, |·| is the determinant of a matrix and F_θ is the Fisher information matrix defined by

$$ F_\theta = E\!\left[ -\frac{\partial^2 \log f(Y \mid \theta, a)}{\partial\theta\, \partial\theta^{T}} \right]. \qquad (3.6) $$
The first term in Eq. (3.5) is the negative log-likelihood, which measures the goodness
of fit between the data and the model. We denote it by L(Y|θ̂, â), i.e.,

$$ L(Y \mid \hat\theta, \hat a) = -\log f(Y \mid \hat\theta, \hat a). \qquad (3.7) $$

The rest of the terms in Eq. (3.5) form the penalty function, which measures the model
complexity. If the dominant penalty term (ℓ/2) log N shown in Eq. (2.7) is used, we have the
asymptotic coding length

$$ L(Y, K) = L(Y \mid \hat\theta, \hat a) + \frac{\ell}{2}\log N + \log S(N, K). \qquad (3.8) $$
3.3 Coding Lengths Under Gaussian Mixture Models
In this section we investigate how to calculate each term in Eq. (3.5) under Gaussian mixture
models. We start with L(Y|θ̂, â), the log-likelihood term. In this case, f_k(·) is assumed to
be the density function of a multivariate normal distribution with mean vector μ_k and
covariance matrix Σ_k. Suppose that a particular association vector a = [a₁ a₂ · · · a_N]ᵀ
partitions the N data vectors into K groups, the kth of which contains the N_k data vectors
{y_n : a_n = k}; then the conditional density function for the data [43] is

$$ f(Y \mid \theta, a) = \prod_{k=1}^{K} \prod_{a_n = k} (2\pi)^{-M/2}\, |\Sigma_k|^{-1/2} \exp\!\left[ -\tfrac{1}{2} (y_n - \mu_k)^{T} \Sigma_k^{-1} (y_n - \mu_k) \right]. \qquad (3.9) $$
Hence, the L(Y|θ̂, â) term is expressed as

$$ L(Y \mid \hat\theta, \hat a) = \sum_{k=1}^{K} \left[ \frac{N_k M}{2}\log 2\pi + \frac{N_k}{2}\log|\hat\Sigma_k| + \frac{1}{2}\,\mathrm{tr}\!\left(\hat\Sigma_k^{-1} W_k\right) \right] \qquad (3.10) $$

where

$$ W_k = \sum_{a_n = k} (y_n - \hat\mu_k)(y_n - \hat\mu_k)^{T} \qquad (3.11) $$

and

$$ \hat\mu_k = \frac{1}{N_k} \sum_{a_n = k} y_n. \qquad (3.12) $$
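As an illustrative check, the classification log-likelihood above can be computed directly for diagonal covariances. The helper below and its two-cluster data are hypothetical, not the thesis's own code.

```python
import math

def class_neg_loglik(clusters):
    """L(Y | theta_hat, a_hat) for diagonal covariances: each cluster is a
    list of M-dimensional points; the ML mean and per-coordinate variances
    are plugged in, so the trace term reduces to N_k * M."""
    total = 0.0
    for pts in clusters:
        nk, m = len(pts), len(pts[0])
        mu = [sum(p[d] for p in pts) / nk for d in range(m)]
        var = [max(sum((p[d] - mu[d]) ** 2 for p in pts) / nk, 1e-12)
               for d in range(m)]
        log_det = sum(math.log(v) for v in var)  # log |Sigma_k| for diagonal Sigma_k
        total += (0.5 * nk * m * math.log(2 * math.pi)
                  + 0.5 * nk * log_det
                  + 0.5 * nk * m)                # tr term equals N_k * M at the ML fit
    return total

clusters = [[(0.9, 2.0), (1.1, 2.2), (1.0, 1.8)],   # hypothetical cluster 1
            [(5.0, 7.1), (5.2, 6.9)]]               # hypothetical cluster 2
print(class_neg_loglik(clusters))
```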
The Fisher information matrix F_θ̂ in the second term of Eq. (3.5) is very sophisticated
for a general structure of Σ_k. Furthermore, the dimension M for our application is usually
higher than 40, so there are more than 40 × 40 parameters in just one covariance matrix Σ_k!
This general structure will generate severe numerical problems when only small or medium
samples are available. To avoid the above limitations, we assume that the covariance matrix
Σ_k is diagonal. In this case, it is easy to verify that the mixed second derivatives vanish, i.e.,

$$ E\!\left[ -\frac{\partial^2 \log f(Y \mid \theta, a)}{\partial\theta_i\, \partial\theta_j} \right] = 0, \qquad i \neq j. $$

Therefore, F_θ is a diagonal matrix, so that

$$ |F_\theta| = \prod_{i=1}^{\ell} (F_\theta)_{ii}. $$
As we see, coding is usually based on θ̂ since the true θ is unknown. θ̂ is a vector
with ℓ elements θ̂ᵢ, i = 1, · · · , ℓ. These elements are statistically independent when
Σ_k, k = 1, · · · , K, are diagonal, so each θ̂ᵢ is quantized independently. In other words,
the quantization here is actually performed in one dimension, instead of ℓ dimensions. For
the one-dimensional case, the optimal quantization constant is 1/12 [18, Page 61]. Hence, the
optimal quantization constant we use is

$$ G_\ell = \frac{1}{12}. \qquad (3.14) $$
Below we consider four different covariance structures:

• Covariance Structure 1: Σ_k = σ²I, ∀k
• Covariance Structure 2: Σ_k = σ_k²I
• Covariance Structure 3: Σ_k = D, ∀k (D diagonal)
• Covariance Structure 4: Σ_k = D_k (D_k diagonal)
The first covariance structure is the simplest, and is mainly used for the purpose of theoretical
performance analysis in Chapter 4. The second and fourth structures have been successfully
applied to intrapulse analysis in Chapter 5. The third one might be useful for other
applications, so we include it here for completeness.
To fully derive a coding length, we assume a uniform prior probability for each parameter
in μ_k = [μ_{k1}, · · · , μ_{kM}]ᵀ and in the underlying covariance matrix Σ_k, over certain regions.
Let μ₀ and W₀ be the mean vector and the scatter matrix of the whole data set Y:

$$ \mu_0 = \frac{1}{N}\sum_{n=1}^{N} y_n \qquad (3.15) $$

$$ W_0 = \sum_{n=1}^{N} (y_n - \mu_0)(y_n - \mu_0)^{T}. \qquad (3.16) $$

Parametric regions will be determined by μ₀ and W₀ according to the underlying covariance
structure, as detailed in the following subsections.
3.3.1 Covariance Structure 1: Σ_k = σ²I, ∀k
Under the assumed covariance structure, we have only one parameter σ to characterize
all covariance matrices, and KM parameters {μ_{km} | k = 1, . . . , K; m = 1, . . . , M} to
characterize all mean vectors. Therefore, the number of free parameters is

$$ \ell = KM + 1. \qquad (3.17) $$
A. The log-likelihood term L(Y|θ̂, σ̂)

Let σ̂ be the ML estimate of σ. From Eq. (3.9), we obtain

$$ L(Y \mid \theta, a) = \frac{NM}{2}\log 2\pi + NM \log\sigma + \frac{1}{2\sigma^2}\,\mathrm{tr}(W) \qquad (3.18) $$

where tr(·) is the trace of a matrix and

$$ W = \sum_{k=1}^{K} W_k. \qquad (3.19) $$

Differentiating Eq. (3.18) with respect to σ and equating it to zero, we have the ML estimate

$$ \hat\sigma^2 = \frac{\mathrm{tr}(W)}{NM}. \qquad (3.20) $$

Substituting Eq. (3.20) into Eq. (3.18), we have

$$ L(Y \mid \hat\theta, \hat\sigma) = \frac{NM}{2}\log 2\pi + \frac{NM}{2}\log\frac{\mathrm{tr}(W)}{NM} + \frac{NM}{2}. \qquad (3.21) $$
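The pooled ML estimate and the minimized log-likelihood term above reduce to a short computation. The following Python sketch, with hypothetical one-dimensional clusters, is illustrative.

```python
import math

def structure1_loglik_terms(clusters):
    """Pooled trace tr(W), the ML estimate sigma_hat^2 = tr(W)/(N M), and the
    minimized log-likelihood term under Covariance Structure 1
    (Sigma_k = sigma^2 I for all k)."""
    n = sum(len(c) for c in clusters)
    m = len(clusters[0][0])
    tr_w = 0.0
    for pts in clusters:
        mu = [sum(p[d] for p in pts) / len(pts) for d in range(m)]
        tr_w += sum((p[d] - mu[d]) ** 2 for p in pts for d in range(m))
    sigma2 = tr_w / (n * m)
    # (NM/2) log 2pi + (NM/2) log sigma_hat^2 + NM/2
    neg_loglik = 0.5 * n * m * (math.log(2 * math.pi * sigma2) + 1.0)
    return sigma2, neg_loglik

clusters = [[(0.9,), (1.1,)], [(4.9,), (5.1,)]]  # hypothetical clusters
s2, nll = structure1_loglik_terms(clusters)
print(s2)
```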
B. The Fisher term (1/2) log|F_θ|
Under this mixture model,
(1/2) log|F_θ| = (1/2) log [∂²L(Y|θ̂, σ̂)/∂σ²] + (1/2) Σ_{k=1}^{K} Σ_{m=1}^{M} log [∂²L(Y|θ̂, σ̂)/∂μ_km²].   (3.22)
Differentiating Eq. (3.8) twice with respect to σ and using Eq. (3.20), we have
Differentiating Eq. (3.8) twice with respect to μ_k and using Eq. (3.20), we have
Substituting Eqs. (3.23) and (3.24) into Eq. (3.22), we have
C. The prior term − log h(θ̂)
Denoting the ranges of μ̂_km and σ̂ by r_km and ρ respectively and assuming a uniform
distribution for each, then
Let σ̂_0 be the standard deviation of the whole data set Y. From Eq. (3.20),
where W_0 has been defined by Eq. (3.16). For simplicity, we further assume that
ρ = σ̂_0.
Hence, the prior term contributes to the coding length by
D. The total coding length L₁(Y, K)
Eqs. (3.21), (3.25), (3.17), (3.14) and (3.28) form the total coding length defined in
Eq. (3.5). By removing the parts independent of K (because they are inconsequential to
the subsequent minimization), we can simplify its expression to
L₁(Y, K) = (MN/2) log tr(W) + ((KM + 1)/2) log [tr(W_0)/tr(W)] + (M/2) Σ_{k=1}^{K} log N_k + (KM/2) log(e/3) + log S(N, σ̂)
where W, W_0 and S(N, σ̂) are defined in Eqs. (3.19), (3.16) and (3.3), respectively.
3.3.2 Covariance Structure 2: Σ_k = σ_k²I, ∀k
Under the assumed covariance structure, we have K parameters {σ_k | k = 1, ..., K} to characterize all covariance matrices, and KM parameters {μ_km | k = 1, ..., K; m = 1, ..., M} to characterize all mean vectors. Therefore, the number of free parameters is
A. The log-likelihood term L(Y|θ̂, σ̂)
Let σ̂_k be the ML estimate of σ_k. From Eq. (3.9), we obtain
where W_k is given by Eq. (3.11).
Differentiating Eq. (3.8) with respect to σ_k and equating it to zero, we have the ML estimate
Substituting Eq. (3.32) into Eq. (3.31), we have
B. The Fisher term (1/2) log|F_θ|
Under this covariance structure,
(1/2) log|F_θ| = (1/2) Σ_{k=1}^{K} log [∂²L(Y|θ̂, σ̂)/∂σ_k²] + (1/2) Σ_{k=1}^{K} Σ_{m=1}^{M} log [∂²L(Y|θ̂, σ̂)/∂μ_km²].   (3.34)
Differentiating Eq. (3.8) twice with respect to σ_k and using Eq. (3.32), we have
Differentiating Eq. (3.8) twice with respect to μ_k and using Eq. (3.32), we have
Substituting Eqs. (3.35) and (3.36) into Eq. (3.34), we have
C. The prior term − log h(θ̂)
Denoting the ranges of μ̂_km and σ̂_k by r_km and ρ_k respectively and assuming a uniform
distribution for each, then
Similarly to Covariance Structure 1, we further assume that
Hence, the prior term contributes to the coding length by
D. The total coding length L₂(Y, K)
Eqs. (3.33), (3.37), (3.30), (3.14) and (3.39) form the total coding length defined in
Eq. (3.5). By removing the parts independent of K (because they are inconsequential to
the subsequent minimization), we can simplify its expression to
L₂(Y, K) = ⋯ + (M + 1) Σ_{k=1}^{K} log N_k + (KM/2) log(e/3N) + (K/2) log(e/6N) + log S(N, σ̂)   (3.40)
where W_k, W_0 and S(N, σ̂) are defined in Eqs. (3.11), (3.16) and (3.3), respectively.
NOTE TO USERS
Page(s) not included in the original manuscript are unavailable from the author or university. The
manuscript was microfilmed as received.
This reproduction is the best copy available.
Differentiating Eq. (3.8) twice with respect to σ_km and using Eq. (3.55), we have
Differentiating Eq. (3.8) twice with respect to μ_k and using Eq. (3.55), we have
Substituting Eqs. (3.58) and (3.59) into Eq. (3.57), we have
(1/2) log|F_θ| = − Σ_{k=1}^{K} log|diag(W_k)| + 2M Σ_{k=1}^{K} log N_k + (KM/2) log 2.   (3.60)
C. The prior term − log h(θ̂)
Denoting the ranges of μ̂_km and σ̂_km by r_km and ρ_km respectively and assuming a
uniform distribution for each, then
Similarly to Covariance Structure 3, we further assume that
Hence, the prior term contributes to the coding length by
D. The total coding length L₄(Y, K)
Eqs. (3.56), (3.60), (3.53), (3.14) and (3.62) form the total coding length defined in
Eq. (3.5). By removing the parts independent of K, we can simplify its expression to
L₄(Y, K) = ⋯ + 2M Σ_{k=1}^{K} log N_k + KM log(e/6N) + log S(N, σ̂)
where W_k, W_0 and S(N, σ̂) are defined in Eqs. (3.11), (3.16) and (3.3), respectively.
3.3.5 Summary of Coding Lengths
For easy comparison and reference, the coding lengths under the above four covariance
structures are listed below:
Covariance Structure 1: Σ_k = σ²I, ∀k
L₁(Y, K) = (MN/2) log tr(W) + ((KM + 1)/2) log [tr(W_0)/tr(W)] + (M/2) Σ_{k=1}^{K} log N_k + (KM/2) log(e/3) + log S(N, σ̂)
where W, W_0 and S(N, σ̂) are defined in Eqs. (3.19), (3.16) and (3.3), respectively.
Covariance Structure 2: Σ_k = σ_k²I, ∀k
where W_k, W_0 and S(N, σ̂) are defined in Eqs. (3.11), (3.16) and (3.3), respectively.
Covariance Structure 3: Σ_k = D, ∀k
where W, W_0 and S(N, σ̂) are defined in Eqs. (3.19), (3.16) and (3.3), respectively.
Covariance Structure 4: Σ_k = D_k, ∀k
L₄(Y, K) =
where W_k, W_0 and S(N, σ̂) are defined in Eqs. (3.11), (3.16) and (3.3), respectively.
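For concreteness, the summary lengths above can be evaluated numerically for a given hard partition. The sketch below is our own reconstruction under stated assumptions, not the thesis code: it implements only the structure-1 form L₁ as listed above, and it drops the S(N, σ̂) term, whose definition (Eq. (3.3)) lies outside this excerpt.

```python
import math

def coding_length_L1(data, labels, K):
    # Structure-1 coding length (S(N, sigma) term omitted; see lead-in):
    # (MN/2) log tr(W) + ((KM+1)/2) log[tr(W0)/tr(W)]
    #   + (M/2) sum_k log N_k + (KM/2) log(e/3)
    # Assumes every cluster label 0..K-1 is non-empty.
    M, N = len(data[0]), len(data)

    def scatter(points):
        # trace of the scatter matrix about the points' own mean
        mu = [sum(p[m] for p in points) / len(points) for m in range(M)]
        return sum(sum((x - u) ** 2 for x, u in zip(p, mu)) for p in points)

    trW0 = scatter(data)          # whole data set as one cluster
    trW, log_Nk = 0.0, 0.0
    for k in range(K):
        members = [p for p, l in zip(data, labels) if l == k]
        trW += scatter(members)   # pooled within-cluster scatter
        log_Nk += math.log(len(members))
    return (0.5 * M * N * math.log(trW)
            + 0.5 * (K * M + 1) * math.log(trW0 / trW)
            + 0.5 * M * log_Nk
            + 0.5 * K * M * math.log(math.e / 3))
```

For two well-separated clusters, the criterion evaluated at the true two-cluster partition comes out smaller than at the one-cluster partition, which is the behavior the minimization in Section 3.4 relies on.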
3.4 A Model-Based Clustering Algorithm
Given N data vectors (denoted by Y) and K clusters, we need an appropriate clustering
procedure to determine the optimal partition (or grouping) of these vectors Y into K
clusters. Here by "optimal" we mean that, for a given K, the optimal partition α achieves
the maximum likelihood. In other words, the negative log-likelihood L(Y|θ̂, σ̂) is minimal.
The procedure we propose to optimally repartition K existing clusters into K + 1 new
clusters is as follows: we obtain K candidate partitions, each by a binary splitting of one of
the K existing clusters, followed by a regrouping of the data into K + 1 clusters. The optimal
clustering among the K candidates is the one which achieves the maximum likelihood.
Therefore, the clustering analysis consists of two loops: (a) in the outer loop K runs
from 1 to K_max, a pre-selected upper bound; (b) in the inner loop the optimal partition
α of Y into K clusters is chosen and L(Y, K) is calculated according to θ̂, σ̂. Finally, the
number of clusters K* is selected if it yields the minimal L(Y, K). In the following, the
procedure of the clustering algorithm is presented in Section 3.4.1 and its computational
complexity is analyzed in Section 3.4.2.
3.4.1 Procedure
The flow chart of the clustering algorithm is shown in Fig. 3.1. The algorithm is off-line
since it requires all the data to be available at the same time.
1. Start from K = 1, i.e., the whole data set Y is viewed as one cluster.
2. For each of the K existing clusters, compute the mean vector μ̂_k as the cluster center
and the standard deviation vector σ̂_k as the cluster deviation, k = 1, ..., K.
3. In this step we will obtain K candidate partitions, each by a binary splitting of one
of the K existing clusters, followed by a regrouping of the data into K + 1 clusters.
For k = 1, ..., K, compute
(a) Use μ̂_1, ..., μ̂_K, with the k-th center μ̂_k replaced by the split pair μ̂_k ± σ̂_k, as the initial centers to repartition the data into (K + 1) new clusters and obtain the association vector, according to the minimum
distance principle. In other words, each data vector will be classified into the
cluster whose center is the closest. Here the distance measure is the ℓ₂ norm.
(b) Compute all K + 1 new cluster centers, and repeat the repartition process a few
times (say N_t times) until the cluster centers converge. Thus, the association
vector α̂_k is obtained.
(c) Compute the negative log-likelihood L(Y|θ̂, α̂_k).
4. There are K different splittings in Step 3 to repartition the K existing clusters into K + 1
new clusters. The optimal splitting rule here is to choose the best splitting p which
yields the minimal L(Y|θ̂, α̂_p), i.e.,
p = arg min_{i ∈ [1, ..., K]} L(Y|θ̂, α̂_i).
Set α̂ = α̂_p; i.e., α̂_p is the optimum association vector obtained from the above
splittings and repartitions.
5. Set K = K + 1 and calculate L(Y, K) according to θ̂, σ̂. Go to Step 2 until K reaches
K_max.
6. Choose the optimal number K* such that L(Y, K*) is minimal among all L(Y, K).
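The split-and-repartition core of these steps can be sketched in pure Python. This is a simplified illustration under assumptions of our own, not the thesis implementation: it takes the split of cluster k to be the pair μ̂_k ± σ̂_k, uses plain Lloyd-style regrouping under the ℓ₂ norm, and scores candidates with a structure-1 surrogate negative log-likelihood.

```python
import math

def mean_and_std(cluster):
    # per-dimension mean and standard deviation of a list of M-dim points
    M, n = len(cluster[0]), len(cluster)
    mu = [sum(p[m] for p in cluster) / n for m in range(M)]
    sd = [math.sqrt(sum((p[m] - mu[m]) ** 2 for p in cluster) / n) for m in range(M)]
    return mu, sd

def assign(data, centers):
    # minimum-distance principle with the l2 norm (squared distances suffice)
    return [min(range(len(centers)),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
            for p in data]

def neg_log_likelihood(data, labels, K):
    # structure-1 surrogate: (MN/2) log tr(W), W the pooled within-cluster scatter
    M, N = len(data[0]), len(data)
    trW = 1e-12
    for k in range(K):
        members = [p for p, l in zip(data, labels) if l == k]
        if members:
            mu, _ = mean_and_std(members)
            trW += sum(sum((x - u) ** 2 for x, u in zip(p, mu)) for p in members)
    return 0.5 * M * N * math.log(trW)

def split_step(data, centers, n_iter=10):
    # Step 3: try splitting each existing cluster k, keep the candidate
    # partition into K+1 clusters with minimal negative log-likelihood.
    K = len(centers)
    labels = assign(data, centers)
    best = None
    for k in range(K):
        members = [p for p, l in zip(data, labels) if l == k]
        if len(members) < 2:
            continue
        mu, sd = mean_and_std(members)
        cand = [c for j, c in enumerate(centers) if j != k]
        cand += [tuple(u + s for u, s in zip(mu, sd)),
                 tuple(u - s for u, s in zip(mu, sd))]
        for _ in range(n_iter):  # step 3(b): repeat repartition a few times
            lab = assign(data, cand)
            cand = [mean_and_std([p for p, l in zip(data, lab) if l == j])[0]
                    if lab.count(j) else cand[j] for j in range(K + 1)]
        nll = neg_log_likelihood(data, lab, K + 1)
        if best is None or nll < best[0]:
            best = (nll, cand, lab)
    return best  # (negative log-likelihood, K+1 centers, association vector)
```

Wrapping `split_step` in an outer loop over K = 1, ..., K_max − 1 and scoring each resulting partition with a coding length L(Y, K) gives the full validation procedure of Steps 1–6.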
3.4.2 Computational Complexity
Given N M-dimensional data vectors, we start by viewing the whole data set as a cluster
and then repartition the data to get a new cluster in each step until the number of clusters
K reaches K_max. Below, each addition, subtraction, multiplication or division is counted as
1 flop (floating point operation).
As we observed in Section 3.4.1, the dominant cost occurs in Step 3. At the k-th stage
(k = 1, ..., K):
(a) cluster validation
Figure 3.1: The diagram of our model-based clustering algorithm. Notation: Y – data set; k – number of clusters assumed in Y; K_max – maximal number of clusters assumed in Y; μ̂_k – mean vector of the k-th cluster; σ̂_k – standard deviation vector of the k-th cluster; L(Y, k) – coding length of Y with k clusters assumed; α̂ – association vector of Y into k clusters; K* – optimal number of unknown clusters in Y.
(a) Repartition all data vectors according to the K + 1 cluster centers.
To compute the squared distance between a data vector and a cluster center requires
M subtractions, M multiplications and M additions. Thus, 3M(K + 1)
flops are required for computing the squared distances between one data vector and
all K + 1 cluster centers. To choose the minimum squared distance approximately
takes log₂(K + 1) comparisons. The cost of these comparisons is negligible compared
to 3M(K + 1) flops. Thus, to assign the N data vectors into K + 1 clusters requires 3MN(K + 1) flops.
(b) Update all K + 1 cluster centers, and repeat the repartition process N_t times until
the cluster centers converge.
To compute the k-th cluster center requires MN_k additions and M divisions. Thus,
to compute all K + 1 cluster centers requires
Σ_{k=1}^{K+1} (MN_k + M) = M(N + K + 1) ≈ MN,
when we assume N ≫ K. Hence, to repeat (a) and (b) N_t times requires N_t [3MN(K + 1) + MN] = MNN_t(3K + 4) flops.
(c) Compute the negative log-likelihood L(Y|θ̂, α̂). The computational cost in (c) is negligible compared to those in (a) and (b).
There are K candidate partitions in Step 3. Hence, the computational cost for Step 3 is MNN_t K(3K + 4) flops.
Step 3 is repeated from K = 1 to K = K_max − 1. Therefore, the total computational cost of the clustering algorithm described in Section 3.4.1 is approximately
Σ_{K=1}^{K_max−1} MNN_t K(3K + 4) = MNN_t [(K_max − 1)³ + 3.5(K_max − 1)² + 2.5(K_max − 1)].   (3.70)
¹To find the minimum distance of a data vector to all cluster centers, it is equivalent to find the minimum squared distance. In this way, one square root operation is saved.
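The counting above is easy to sanity-check numerically. The short script below (our own check, assuming the per-stage cost MNN_t K(3K + 4) derived in (a)–(c)) confirms that summing over K = 1, ..., K_max − 1 reproduces the closed form of Eq. (3.70):

```python
def step3_cost(K, M, N, Nt):
    # K candidate splits, each with Nt rounds of assignment (3MN(K+1) flops)
    # plus center updates (about MN flops): K * Nt * MN(3K + 4)
    return K * Nt * (3 * M * N * (K + 1) + M * N)

def total_cost(Kmax, M, N, Nt):
    # sum of Step 3 over the outer loop K = 1, ..., Kmax - 1
    return sum(step3_cost(K, M, N, Nt) for K in range(1, Kmax))

def closed_form(Kmax, M, N, Nt):
    # Eq. (3.70)
    x = Kmax - 1
    return M * N * Nt * (x ** 3 + 3.5 * x ** 2 + 2.5 * x)
```

For instance, `total_cost(10, 22, 100, 5)` and `closed_form(10, 22, 100, 5)` coincide, and both grow like K_max³, which is the point made in the next paragraph.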
Eq. (3.70) shows that the computational complexity is approximately propor-
tional to K³_max. Thus, the computational cost increases dramatically as K_max increases.
To alleviate the computational burden of the model-based clustering scheme, a fast algo-
rithm is developed in Section 5.5. In this chapter, we emphasize the development of our
clustering scheme and its comparison with existing algorithms.
3.5 Comparison with SNOB
Wallace [62] started his work on the minimum message length (MML) criterion from clas-
sification. Over the past three decades, Wallace and his co-workers [10,11,61,63] have
developed and maintained a clustering program called SNOB. Basically, SNOB probabilis-
tically assigns a data vector y_n to a cluster k, say with probability P(a_n = k). Hence, the
data is conditionally modeled by
p_k = (1/N) Σ_{n=1}^{N} P(a_n = k).   (3.71)
It is also assumed in SNOB that each covariance matrix Σ_k is diagonal, and then the
message length formula of Eq. (2.18) is directly applied for cluster validation. Detailed
descriptions of the SNOB approach were presented in [46] and [7, Chapter 7]. Here the
differences between SNOB and our method are summarized below:
• We follow the classification approach using the deterministic assignment which only
allows P(a_n = k) to be 0 or 1, instead of the mixture approach using the probabilis-
tic assignment in the SNOB program, so the mixture models are different (compare
Eqs. (3.71) and (3.1)).
• Our model parameter vector θ does not require p_k. This results in a simpler coding
length.
• The theoretical analysis of our approach is mathematically tractable but the analysis
of SNOB's approach is far more difficult (see the next chapter for details).
• The deterministic assignment can be done by a simple procedure such as the k-means
algorithm [33], but the probabilistic assignment requires a more complicated procedure
such as the EM algorithm [66]. Therefore, our clustering algorithm is computationally
simpler than SNOB.
• For clustering, we start by viewing the whole data set as a cluster and then repartition
and regroup the data to get a new cluster in each step. The SNOB program starts
with a random initial estimate of the clustering structure and then splits a cluster or
merges two clusters recursively.
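The contrast between the two assignment schemes can be made concrete. Below is an illustrative sketch of our own (not code from the thesis or from SNOB) of a deterministic 0/1 assignment versus the posterior responsibilities an EM-style E-step would compute for a one-dimensional two-component Gaussian mixture with a known common variance:

```python
import math

def hard_assign(x, centers):
    # classification approach: P(a_n = k) is 0 or 1 (nearest center wins)
    k = min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
    return [1.0 if j == k else 0.0 for j in range(len(centers))]

def soft_assign(x, centers, weights, sigma=1.0):
    # mixture approach: posterior responsibilities P(a_n = k | x), as in EM
    dens = [w * math.exp(-((x - c) ** 2) / (2 * sigma ** 2))
            for c, w in zip(centers, weights)]
    total = sum(dens)
    return [d / total for d in dens]
```

For a point x = 0.9 between centers at 0 and 2, `hard_assign` returns [1.0, 0.0] while `soft_assign` splits the responsibility (about 0.55 / 0.45 with equal weights); those fractional responsibilities are exactly the extra quantities SNOB estimates and our deterministic scheme deliberately omits.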
3.6 Experimental Results
An empirical comparison of AutoClass, SNOB and two neural network classifiers (Kohonen's
network and Adaptive Resonance Theory) was made in [59,60], where the conclusion is that,
overall, statistical classifiers, especially SNOB, perform better than the neural network clas-
sifiers on both cluster validation and clustering. For our coding length measures described
in Section 3.3, we demonstrated that they outperform some well-known non-parametric cri-
teria in [37,67]. Here, to fairly compare the performance of our clustering algorithm with
that of SNOB, we incorporate the same prior specification h(θ) as SNOB, which has been
described in Section 3.3. We also examine the performance of the coding length measure of
Eq. (3.7) based on the Bayesian Inference Criterion (BIC).
To simulate a mixture of two clusters, let N be the sample size of the two clusters in
total and c be the mixing portion. Then the populations of the two clusters are cN and
(1 − c)N respectively. We define the following measure for the inter-cluster distance:
D = ||μ₁ − μ₂|| / σ.
For simplicity, we have chosen the covariance matrices of the two clusters to be identical such
that Σ₁ = Σ₂ = σ²I_M. Let the inter-cluster distance be D; we define μ₁ = [0, 0, ..., 0]^T
and place μ₂ along the all-ones direction so that ||μ₁ − μ₂||/σ = D. Furthermore, for the high-dimension case (M = 22), data
is generated by adding Gaussian noise to two noiseless pulse patterns; such examples are
shown in Fig. 3.2.
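A generator for such two-cluster mixtures can be sketched as follows. This is a hypothetical reconstruction: the exact form of μ₂ is illegible in our copy of the thesis, so we place it along the all-ones direction, which satisfies the distance definition ||μ₁ − μ₂||/σ = D.

```python
import math
import random

def two_cluster_sample(N, M, c, D, sigma=1.0, seed=0):
    # mu1 = 0; mu2 = (D*sigma/sqrt(M)) * (1, ..., 1), so ||mu1 - mu2||/sigma = D
    rng = random.Random(seed)
    shift = D * sigma / math.sqrt(M)
    n1 = round(c * N)  # cluster populations cN and (1 - c)N
    return [[rng.gauss(0.0 if i < n1 else shift, sigma) for _ in range(M)]
            for i in range(N)]
```

With N = 100, M = 2, c = 0.5 and D = 8, for example, this yields two well-separated spherical Gaussian clusters of 50 points each, matching the Σ₁ = Σ₂ = σ²I_M setting above.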
In the following, we create various two-cluster mixtures with data vector dimension
M = 1, 2, 22, mixing portion c = 0.5, 0.2, sample size N = 40, 100, 1000 and
distance measure D = 2, 3, 4, 6, 8. We also create the case of one cluster with M = 1, 2, 22
and sample size N = 40, 100, 1000. We employ our clustering algorithm under the four
covariance structures, denoted by L₁, L₂, L₃ and L₄ respectively, and the SNOB
algorithm, to perform cluster validation and clustering. We also employ our clustering
algorithm under the BIC of Eq. (3.7) based on Covariance Structure 4. In this way, we
can fairly compare the performance of L₄, BIC and SNOB. Ten trials of each mixture are
carried out using L₁, L₂, L₃, L₄, BIC and SNOB. In each trial, we assume the number
of clusters from 1 to 4 and then choose the number at which the corresponding criterion
is minimal. The results of cluster validation for all criteria examined here, as represented
by the number of times out of 10 trials that the correct number of clusters is determined,
are shown in Tables 3.1 – 3.9. To further compare our clustering algorithm and SNOB, the
accuracy of the corresponding clustering results is shown in Tables 3.10 – 3.12 for N = 100
when both algorithms have made the correct decision on cluster validation out of 10 trials.
From Tables 3.1 – 3.12, it can be observed that
1. None of the criteria is reliable for small and medium samples (N = 40, 100) when
D < 3.
2. The performance of all criteria is improved as the distance D and the sample size N
increase.
3. The performance of L₁, L₂ and L₃ is very similar to that of L₄.
4. BIC is inferior to L₄ and SNOB in cluster validation in general but performs very well
for the medium and large samples (N = 100, 1000).
5. The performance of L₄ is superior to that of SNOB in cluster validation for the small
and medium samples (N = 40, 100).
6. In the cases where both algorithms perform perfectly in cluster validation, SNOB
yields slightly more accurate results in clustering than L₄.
Observation 1 is easy to justify since if D < 3, the overlap between the two clusters is very
extensive, which makes it difficult for any of the criteria to work properly. Observation 2 is
intuitively clear: the larger the inter-cluster distance D, the less overlap there is between
the two clusters, and the higher the accuracy achieved in parameter estimation. Also, since
the data here is generated using Σ₁ = Σ₂ = σ²I_M, the performance of L₁, L₂, L₃ and
L₄ is likely to be more or less the same, as confirmed by Observation 3. Observation 4 is
natural because the BIC of Eq. (3.7) is an asymptotic criterion. In fact, BIC has been used
in many applications due to its simplicity and the fact that no prior knowledge is required.
From Observations 5 and 6, our clustering algorithm shows much higher reliability in
cluster validation than SNOB, while sacrificing marginally on the accuracy in clustering.
Recall that an extra set of parameters p_k is included in SNOB whereas such parameters
have not been taken into consideration in the development of our algorithm. These param-
eters prescribe the probabilities of the nth data vector being generated by the kth cluster
and must be estimated. This difference between SNOB and our algorithm has profound
implications for their performance. For cluster validation, the probability that the nth data
vector associates with the kth cluster is irrelevant, i.e., regardless of the values of these
probabilities, the number of clusters remains the same. Therefore, for cluster validation,
these extra parameters are nuisance parameters and their inclusion will lower the accuracy
of the determination of the number of clusters; hence the new algorithm shows better
performance than SNOB in cluster validation. On the other hand, the probabilities of as-
sociating the data vectors with the clusters are highly relevant parameters in the process of
clustering. Therefore, their inclusion provides more information gained from the data and
renders SNOB the more accurate algorithm in clustering.
The first column is for D = 4 and the second is for D = 8.
Figure 3.2: Simulated data for M = 22, where the x-axis is the index of data sample points and the y-axis is the amplitude of the simulated data.
Table 3.1: Cluster validation results for two true clusters: M = 1, c = 0.5
Table 3.2: Cluster validation results for two true clusters: M = 1, c = 0.2
Table 3.3: Cluster validation results for one true cluster: M = 1
Table 3.4: Cluster validation results for two true clusters: M = 2, c = 0.5
Table 3.5: Cluster validation results for two true clusters: M = 2, c = 0.2
Table 3.6: Cluster validation results for one true cluster: M = 2
Table 3.7: Cluster validation results for two true clusters: M = 22, c = 0.5
Table 3.8: Cluster validation results for two true clusters: M = 22, c = 0.2
Table 3.9: Cluster validation results for one true cluster: M = 22
Table 3.10: Comparison of performance of SNOB and our algorithm, M = 1
Table 3.11: Comparison of performance of SNOB and our algorithm, M = 2
Table 3.12: Comparison of performance of SNOB and our algorithm, M = 22
3.7 Summary
In this chapter, model-based clustering has been considered. Based on minimum encoding
inference, an appropriate coding length measure has been proposed for cluster validation, and
the coding lengths under four different Gaussian mixture models have been fully derived. The
corresponding clustering algorithm has been developed.
Judging from the performance comparison, our coding length measure outperforms the
BIC in cluster validation since it is not based on the large-sample assumption. More
importantly, our clustering algorithm shows much higher reliability in cluster validation than
SNOB, while sacrificing marginally on the accuracy in clustering. Indeed, our clustering
algorithm is well designed to effectively process high-dimensional data with satisfactory
performance on small and medium samples. Thus, the new algorithm is an attractive
approach for the clustering problem in intra-pulse analysis, in terms of the first two issues
pointed out in Section 1.1.
Chapter 4
Detection Performance Analysis
4.1 Introduction
The error analysis of model-based clustering concerns the estimation accuracy of
the number of components in a Gaussian mixture. This is a problem that has not been
completely solved in statistics, as stated at the end of Section 1.2. In this chapter, we con-
duct a binary detection performance analysis of our clustering algorithm under Covariance
Structure 1 described in Section 3.3.1, by estimating the two types of errors: miss and false
alarm. We examine the impact of the penalty weight on the error probability under
the framework of the penalized likelihood method described in Section 2.4.
Under Covariance Structure 1, each of the K clusters is assumed to be a sample from
a multivariate normal distribution N(μ, σ²I_M). The log-likelihood function of Eq. (3.21) is
rewritten here
and the penalty function, obtained by combining Eqs. (3.25), (3.17), (3.14) and (3.28), is
P(K) = ((KM + 1)/2) log [tr(W_0)/tr(W)] + (M/2) Σ_{k=1}^{K} log N_k + (KM/2) log(e/3) + log S(N, σ̂)   (4.2)
where N_k is the number of members in the kth cluster; W, W_0 and S(N, σ̂) are defined in
Eqs. (3.19), (3.16) and (3.3), respectively.
Therefore, the total description length is
The detection will select the number of clusters to be K* if
K* = arg min_{1≤K≤N} DL(K).
We investigate a binary detection performance here. Given a data set with N observation
vectors, let W be the sample covariance matrix when taking the whole data set as a cluster.
By some classification, the data set is partitioned into two clusters, one with a sample size
N₁ and a sample covariance matrix W₁, and the other with a sample size N₂ and a sample
covariance matrix W₂. In addition, we let c be the mixing portion, i.e., N₁ = cN and
N₂ = (1 − c)N, where 0 < c < 1. Define
and
⋯ + (M/2) log [c(1 − c)Ne/3] + log [N! / ((cN)! ((1 − c)N)!)]   (4.6)
Therefore, to evaluate the error performance, we need to know the distribution of the trace
ratio t_w or its variation.
4.2 Probability of a Miss
A miss occurs when two clusters are embedded in a data set but the binary detection says
that only one cluster exists. A simple illustration is shown in Fig. 4.1. Let H₁ and H₂
denote respectively the hypotheses of one cluster and two clusters; the miss probability
given H₂ is
Figure 4.1: Two Gaussian clusters
Let
Different partition criteria may result in different values of R_w. Here we assume that
our clustering algorithm can separate two Gaussian clusters perfectly. This assumption
is reasonable when the two clusters are well separated. We expect more deviation from the
assumption when the two clusters are closer to each other in distance. Under the perfect
separation assumption, it is proven in Appendix B that R_w is distributed as a noncentral
F distribution F_{M,M(N−2)}(δ) with the noncentrality parameter
δ = c(1 − c) N D²,   (4.10)
and D is the normalized inter-cluster distance defined by
D = ||μ₁ − μ₂|| / σ.   (4.11)
Substituting Eqs. (4.5) and (4.6) into Eq. (4.8), we have
P_m = P{ ⋯ + (M/2) log [c(1 − c)Ne/3] + log [N! / ((cN)! ((1 − c)N)!)] > 0 | H₂ }.
Rewriting the above equation in terms of R_w, we then have
Figure 4.2: The illustration of P_m
where the threshold
and the value of P_m is represented by the shaded area in Fig. 4.2.
From [34, Chapter 30], the mean and variance of a noncentral F distribution with ν₁, ν₂
degrees of freedom and a noncentrality parameter δ are given respectively by
E[F_{ν₁,ν₂}(δ)] = ν₂(ν₁ + δ) / (ν₁(ν₂ − 2))   (ν₂ > 2),   (4.14)
Var[F_{ν₁,ν₂}(δ)] = 2 (ν₂/ν₁)² [(ν₁ + δ)² + (ν₁ + 2δ)(ν₂ − 2)] / ((ν₂ − 2)²(ν₂ − 4))   (ν₂ > 4).   (4.15)
Here ν₁ = M, ν₂ = M(N − 2) and δ is specified by Eq. (4.10). Substituting these values
into Eqs. (4.14) and (4.15) and assuming that N ≫ 1, we have
E[R_w] ≈ 1 + δ/M = 1 + c(1 − c)D²N/M,   (4.16)
Var[R_w] ≈ [2c²(1 − c)²D⁴N + 4c(1 − c)D²MN] / M³.   (4.17)
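Eqs. (4.14)–(4.17) can be checked numerically. The sketch below is our own verification: it implements the exact moments and compares them with the large-N approximations, with example values M = 2, c = 0.3, D = 4 chosen purely for illustration.

```python
def ncf_mean(v1, v2, delta):
    # Eq. (4.14): mean of the noncentral F distribution, valid for v2 > 2
    return v2 * (v1 + delta) / (v1 * (v2 - 2))

def ncf_var(v1, v2, delta):
    # Eq. (4.15): variance of the noncentral F distribution, valid for v2 > 4
    return (2.0 * (v2 / v1) ** 2 *
            ((v1 + delta) ** 2 + (v1 + 2 * delta) * (v2 - 2)) /
            ((v2 - 2) ** 2 * (v2 - 4)))

def rw_moments_approx(M, N, c, D):
    # Eqs. (4.16)-(4.17) with delta = c(1-c) N D^2
    delta = c * (1 - c) * N * D ** 2
    mean = 1 + delta / M
    var = (2 * c ** 2 * (1 - c) ** 2 * D ** 4 * N
           + 4 * c * (1 - c) * D ** 2 * M * N) / M ** 3
    return mean, var
```

With ν₁ = M and ν₂ = M(N − 2), the exact and approximate moments agree to a fraction of a percent already for N in the thousands, which supports the N ≫ 1 simplification used above.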
Property 1: Given the dimension M and the mixing portion c, we define D₀ such that
If D > D₀, then P_m tends to 0 as the number of observations N increases. If D < D₀, then
P_m tends to 1 as the number of observations N increases.
Proof: By some mathematical manipulations, we have
To obtain Eq. (4.19), we have used the fact that N! / ((cN)! ((1 − c)N)!) ≈ [c^(−c) (1 − c)^(−(1−c))]^N for large N, as shown in Appendix C.
First, let us check the case when D > D₀. Given M and c, we know from Eq. (4.19) that
F_{P_m} < E[R_m] asymptotically if and only if D > D₀. By using Chebyshev's inequality [53, Page 69], we have
From Eqs. (4.13), (4.16) and (4.17), we know that Var[R_m] is proportional to N but (E[R_m] − F_{P_m})² is proportional to N². Thus,
The left hand side of Eq. (4.20) is the area under the p.d.f. of R_m over the intervals (−∞, F_{P_m}] and [2E[R_m] − F_{P_m}, +∞). Hence,
Second, we check the case when D < D₀. In this case, F_{P_m} > E[R_m] for a sufficiently large N. Similarly, by using Chebyshev's inequality for N → ∞, we have
Thus,
lim_{N→∞} P{ R_m ≤ 2E[R_m] − F_{P_m} or R_m ≥ F_{P_m} } = 0.
This further implies
lim_{N→∞} P{ 2E[R_m] − F_{P_m} < R_m < F_{P_m} } = 1.
Consequently,
lim_{N→∞} P_m = lim_{N→∞} P{ R_m < F_{P_m} } = 1.
Table 4.1: The critical distance D₀
Therefore, Property 1 holds. Q.E.D.
If it happens that D = D₀, then F_{P_m} is asymptotically equal to E[R_m] and P_m is a positive number between 0 and 1. Some values of D₀ given M and c are listed in Table 4.1. Indeed, D₀ is a critical distance. When D > D₀, the two clusters are well separated, the probability density function (p.d.f.) of the mixture is bimodal, and our clustering algorithm can successfully separate these two clusters. When D ≤ D₀, the p.d.f. of the mixture becomes unimodal, the overlap between the two clusters is very extensive, and the similarity makes it difficult for our algorithm to work properly, as for other existing algorithms.
Property 2: P_m is a monotonically decreasing function of the normalized inter-cluster distance D = || (μ₁ − μ₂)/σ ||.
Proof: P{ F_{M,M(N−2)}(δ) < F_{P_m} } is a decreasing function of δ; see [34, Page 193] for details. Property 2 holds since δ is proportional to the square of D, as shown in Eq. (4.10). Q.E.D.
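The key ingredient of Property 2, that the non-central F c.d.f. is decreasing in the non-centrality parameter, is easy to confirm numerically; the degrees of freedom and threshold below are illustrative choices only:

```python
from scipy.stats import ncf

# P{F_{v1,v2}(delta) < F} should decrease as the non-centrality delta grows.
# v1, v2 and the threshold F are arbitrary illustrative values.
v1, v2, F = 4, 60, 1.5
cdfs = [ncf.cdf(F, v1, v2, nc) for nc in (0.5, 2.0, 8.0, 32.0)]
assert all(a > b for a, b in zip(cdfs, cdfs[1:]))  # strictly decreasing
```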
Figs. 4.3 - 4.8 show the theoretical P_m and the testing P_m vs. the number of observations N for the mixtures of two clusters considered in Section 3.6. The dotted lines in these figures are the theoretical P_m curves and the solid lines are the testing P_m curves based on 100 trials for each N. From these figures, Properties 1 and 2 are clearly observed. We also notice that there is a discrepancy between the theoretical P_m and the testing one. The reason is that the actual partition may deviate more or less from the ideal partition which separates the two clusters perfectly. This discrepancy increases as the two clusters become closer in distance.
Figure 4.3: Miss probability curves for two true clusters: M=1 c=0.5
Figure 4.4: Miss probability curves for two true clusters: M=1 c=0.2
Figure 4.5: Miss probability curves for two true clusters: M=2 c=0.5
Figure 4.6: Miss probability curves for two true clusters: M=2 c=0.2
Figure 4.7: Miss probability curves for two true clusters: M=22 c=0.5
Figure 4.8: Miss probability curves for two true clusters: M=22 c=0.2
4.3 Probability of a False Alarm
A false alarm occurs when one cluster is embedded in a data set but the binary detection says that two clusters exist. A simple illustration is shown in Fig. 4.9. So the false alarm probability, given H₁ (the hypothesis of one cluster), is
In the miss case, P_m is exactly analyzed according to the Gaussian mixture model. However, in the false alarm case, the model is a mixture of truncated normal distributions, whose exact analysis is still incomplete in statistics.
Figure 4.9: One Gaussian cluster
The key to the analysis is the understanding of how our clustering algorithm will partition an N(μ, σ²I_M) data sample into two clusters. Given a large sample, the sample mean vector μ̂ and the sample standard deviation vector σ̂ of the data are close to the true ones μ and σ respectively. By our partition described in Section 3.4, the two new clusters are centered at μ̂ − σ̂ and μ̂ + σ̂ respectively. Therefore, the partition is nearly symmetric about the true mean vector and produces two clusters of about equal size, i.e., the mixing portion c ≈ 0.5.
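This near-symmetric split is easy to see in a small simulation; the sketch below (one-dimensional, with arbitrary μ and σ) assigns each sample to the nearer of μ̂ − σ̂ and μ̂ + σ̂ and recovers a mixing portion close to 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=100_000)       # one Gaussian cluster
mu, sd = x.mean(), x.std()
# Assign each sample to the nearer of the two candidate centers mu-sd, mu+sd.
left = np.abs(x - (mu - sd)) < np.abs(x - (mu + sd))
c = left.mean()
assert abs(c - 0.5) < 0.01                   # mixing portion close to 0.5
```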
Define
Assuming that the sample is reasonably large and following Hartigan's work [32], Hawkins [23, Page 339] suggested that asymptotically, R_I is approximately normal N(μ_{R_I}, σ²_{R_I}) with the mean
and the variance
σ²_{R_I} ≈ 2π²M(πM − 2M − 1) / (N(πM − 2)²).
Substituting Eqs. (4.5) and (4.6) into Eq. (4.26), we have
P_f = P{ (MN/2) log R_I + (M/2) log( eN/12 ) + log( N! / ((0.5N)!(0.5N)!) ) < 0 | H₁ }.
Rewriting the above equation in terms of R_I, we then have
where the threshold
and the value of P_f is represented by the shaded area in Fig. 4.10.
Figure 4.10: The illustration of P_f
Property 1: P_f tends to 0 as the number of observations N increases.
Proof: It is easy to verify that
lim_{N→∞} (F_{P_f} − μ_{R_I}) > 0,   (4.32)
lim_{N→∞} σ²_{R_I} = 0.   (4.33)
Some of these limiting values are listed in Table 4.2. By using Chebyshev's inequality [53, Page 69], we have
From Eqs. (4.32) and (4.33), we know that
The left hand side of Eq. (4.34) is the area under the p.d.f. of R_I over the intervals (−∞, 2E[R_I] − F_{P_f}] and [F_{P_f}, +∞). Hence,
lim_{N→∞} P_f = lim_{N→∞} P{ R_I > F_{P_f} } = 0.   (4.36)
Therefore, P_f tends to 0 asymptotically. Q.E.D.
Figs. 4.11 - 4.13 show the testing P_f vs. the number of observations N for the one true cluster cases considered in Section 3.6. The solid lines are the testing P_f curves based on 1000 trials for each N. The corresponding theoretical values of P_f are vanishingly small. It is observed from Figs. 4.11 - 4.13 that the testing P_f is negligible for medium and large samples, although a small probability of false alarm does exist for small samples.
Figure 4.11: False alarm probability curves for one true cluster: M=1
Figure 4.12: False alarm probability curves for one true cluster: M=2
Figure 4.13: False alarm probability curves for one true cluster: M=22
4.4 Optimal Range of Penalty Weight
If the penalty function in Eq. (4.3) is multiplied by a penalty weight λ as described in Section 2.4, then under Covariance Structure 1, we have
By the same reasoning as that used for Property 1 in Section 4.2, we require that
in order to make P_m → 0 asymptotically. This requirement lets us determine an upper bound for the penalty weight
Similarly, by the same reasoning used to prove Property 1 in Section 4.3, we require that
in order to make P_f → 0 asymptotically. This requirement lets us determine a lower bound for the penalty weight
λ_min = (M/2) log( πM / (πM − 2) ).
According to Eqs. (4.39) and (4.40), we can tabulate in Table 4.3 the (asymptotically) optimal penalty weight ranges for the cases considered in Section 3.6.
Table 4.3: Optimal ranges of the penalty weight
Apparently, penalty weights within the optimal ranges listed above may have different impacts on the performance of our clustering algorithm for processing small or medium samples, although their impacts are the same in the asymptotic sense. In practice, the number of clusters is searched from 1 to a preselected upper bound K_max (> 2); one true cluster might be partitioned into a few groups if the penalty weight is not large enough. In this case, we could choose the penalty weight slightly larger. For example, Tables 4.4 and 4.5 show the clustering results obtained by using λ = 1.1 for the case M = 22. It is observed that the performance of our clustering algorithm on small samples is slightly improved when the penalty weight increases from 1 (see Tables 3.7 and 3.9) to 1.1 (see Tables 4.4 and 4.5).
Table 4.4: Cluster validation results for two true clusters: M=22, c=0.5, λ=1.1
Table 4.5: Cluster validation results for one true cluster: M=22, λ=1.1
(The table entries are validation success counts out of 10 trials, e.g. 10/10, reported for criteria L₁-L₄ and BIC at several sample sizes.)
4.5 Summary
In this chapter, we have conducted a binary detection performance analysis of our clustering algorithm under Covariance Structure 1. We assumed that a partition can separate two Gaussian clusters perfectly in order to analyze the miss probability, and that the partition of one Gaussian cluster is nearly symmetric about the cluster center in order to analyze the false alarm probability. Extensive tests show that these two assumptions are satisfied fairly well by our clustering algorithm developed in Section 3.4. Among the four factors considered here (the dimension of the data space M, the sample size N, the mixing portion c and
the inter-cluster distance D), D is the most important factor. There is a critical distance D₀ defined in Eq. (4.18): when D > D₀, our clustering algorithm can successfully separate two clusters. On the other hand, when D ≤ D₀, the overlap between the two clusters is very extensive and it is difficult for our algorithm to work properly, as for other existing algorithms.
Furthermore, we have examined the impact of the penalty weight under the framework of the penalized likelihood method as described in Section 2.4. It is found that there is a range of the penalty weight within which the best performance of our clustering algorithm can be achieved. Therefore, with some supervision, we can adjust the penalty weight to further improve the performance of our clustering algorithm.
Chapter 5
Application to Intrapulse Analysis
5.1 Introduction
We consider the situation where a radar intercept receiver collects incoming pulse samples from a number of unknown emitters. Our objectives are to (1) determine the number of emitters present (cluster validation); (2) classify the incoming pulses according to the emitters from which they originate (clustering). The concept of intrapulse analysis has been introduced in Section 1.1. Briefly, the determination in intrapulse analysis is based only on intrinsic pulse shapes, without any inter-pulse information such as pulse repetition intervals, directions of arrival, carrier frequencies, or Doppler shifts.
In this chapter, we first describe the pre-processing techniques, including data compression for received pulses, and then formulate the problem of emitter number detection and pulse-emitter association as a multivariate clustering problem. After applying the new clustering algorithm developed in Chapter 3 to the clustering problem, we develop two on-line clustering algorithms: one is based on known thresholds while the other is based on a model-based detection scheme. Performance on intrapulse data using our clustering algorithms and SNOB is reported, and the results demonstrate that our new clustering algorithms are very effective for intrapulse analysis, especially the on-line model-based algorithm.
5.2 Signal Model and Pre-Processing of Received Pulses
Let us first examine the signal representation of the received pulses. The physical scenario is illustrated in Fig. 5.1, in which there are, in total, K distinct emitters. The radar intercept receiver receives altogether N non-overlapping pulses from the emitters. We designate the nth received pulse by x_n(t; a_n), n = 1, . . . , N. Here a_n is an association parameter which assumes an integer value, a_n ∈ {1, . . . , K}, such that if a_n = k, then the nth pulse is determined to be from the kth emitter. We can therefore express the nth pulse as
where
• α_n denotes the absolute amplitude of the received pulse;
• ψ_n denotes the added phase of the received pulse after transmission;
• τ_n denotes the time delay of the received pulse with respect to the reference;
• ω_n denotes the residual carrier frequency of the nth pulse;
• ν_n(t) is the Gaussian noise accompanying the nth pulse.
The received pulse in Eq. (5.1) contains several nuisance parameters: α_n, ψ_n, τ_n, ω_n, and ν_n(t).
These parameters are of no use in intrapulse analysis and should be removed. This is carried out by the pre-processing techniques introduced in the following paragraphs. These pre-processing techniques are intuitive in nature and are carried out so that, after preprocessing, the pulses received from the same emitter maintain their resemblance to each other, while those from different emitters maintain their distinctive features.
Figure 5.1: Radar pulses received for intrapulse analysis (K distinct emitters; a receiver receiving N pulses; inter-band and inter-pulse information not usable)
Noise Suppression
The received pulses are passed through a band-pass filter suppressing the out-of-band noise. We now define the amplitude and phase profiles of the received signal respectively as:
In the rest of the pre-processing, it is assumed that the SNR is reasonably large so that the noise contribution has negligible effects.
5.2.1 Amplitude Normalization
Let S̃_n(ω; a_n) and Ã_{a_n}(ω) be the Fourier transforms of s̃_n(t; a_n) and ã_{a_n}(t), respectively. We remove the parameter α_n by a simple procedure of normalization, resulting in S̃′_n(ω; a_n) such that
S̃′_n(ω; a_n) = S̃_n(ω; a_n) / S̃_n(0; a_n) = [α_n Ã_{a_n}(ω) e^{−jωτ_n}] / [α_n Ã_{a_n}(0)] = Ã_{a_n}^{−1}(0) Ã_{a_n}(ω) e^{−jωτ_n}.   (5.4)
Therefore, ã′_n(t; a_n), the inverse Fourier transform of S̃′_n(ω; a_n), can be viewed as the normalized amplitude profile.
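The effect of this normalization can be sketched numerically: dividing the spectrum by its value at ω = 0 makes the result invariant to the unknown received amplitude (the pulse profile below is arbitrary):

```python
import numpy as np

x = np.array([0.0, 0.4, 1.0, 0.7, 0.2, 0.0])   # an arbitrary amplitude profile
X1 = np.fft.fft(1.0 * x)                       # received with unit gain
X2 = np.fft.fft(3.5 * x)                       # same pulse, different gain
# Normalizing by the DC value S(0) removes the amplitude factor entirely.
assert np.allclose(X1 / X1[0], X2 / X2[0])
```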
5.2.2 Time Alignment Based on Thresholding
After amplitude normalization, the removal of the time shift is considered. The time shift can be removed by locating the first point, ã′_n(t₀; a_n), of ã′_n(t; a_n), whose magnitude is larger than a pre-set threshold A, i.e.,
t₀ = min{ t : ã′_n(t; a_n) > A }.   (5.5)
We then set τ_n = t₀, and define
We use the same threshold A for all amplitude profiles to align the pulses on the time axis.
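A minimal sketch of the thresholding rule of Eq. (5.5): the first sample whose magnitude exceeds the preset threshold A marks the pulse start, and the same A aligns differently delayed copies (the pulse values are illustrative):

```python
import numpy as np

def first_crossing(x, A):
    """Index of the first sample with |x[t]| > A, as in Eq. (5.5)."""
    return int(np.argmax(np.abs(x) > A))

p = np.array([0.0, 0.0, 0.5, 0.9, 0.4, 0.05, 0.0])
q = np.concatenate([np.zeros(3), p])   # the same pulse with extra delay
t0p, t0q = first_crossing(p, 0.1), first_crossing(q, 0.1)
# After alignment the two pulses agree sample by sample.
assert np.allclose(p[t0p:t0p + 3], q[t0q:t0q + 3])
```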
5.2.3 Phase Adjustment Based on Polynomial Fitting
After time alignment, the linear slope in the phase profile should be removed. The linear function can be estimated by polynomial curve fitting. Let
By minimizing the error between φ̃_n(t) and b_n(t; a_n) in a least-squares sense, the coefficients ω_n and ψ_n can be determined. Then, the original phase φ_{a_n}(t) can be recovered in a new coordinate system with the time axis being ψ_n + ω_n t (Fig. 5.2) when we define
where γ = arctan ω_n.
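The least-squares line fit can be sketched with a degree-1 polynomial fit; the phase profile below is synthetic, with an assumed intrinsic phase shape plus an added constant ψ_n and residual-frequency slope ω_n:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 200)
intrinsic = 0.3 * np.sin(2 * np.pi * 3 * t)    # assumed intrinsic phase shape
phi = intrinsic + 2.0 + 5.0 * t                # add psi_n + omega_n * t
w_n, psi_n = np.polyfit(t, phi, 1)             # degree-1 least-squares fit
adjusted = phi - (psi_n + w_n * t)             # remove the linear trend
assert np.max(np.abs(adjusted - intrinsic)) < 0.1
```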
We denote by ỹ_n(t; a_n) the resulting nth pulse with noise after preprocessing to remove the nuisance parameters, i.e.,
where ṽ_n(t) is the noise after preprocessing, and s_{a_n}(t) = Ã_{a_n}^{−1}(0) ã_{a_n}(t) e^{jφ_{a_n}(t)} is the ideally preprocessed pulse waveform. Notice that s_{a_n}(t) is only dependent on a_n, the index of the emitter with which this pulse signal is associated.
In practice, the above preprocessing procedures for the received pulses are carried out in discrete time. Thus, Eq. (5.10) can be written as
Figure 5.2: Polynomial fitting for phase adjustment
where T is the sampling interval of the pulses and M′ is the number of samples in a pulse; or in vector form
ỹ_n(a_n) = s_{a_n} + ṽ_n   (5.12)
with
5.2.4 Data Compression Using Wavelet Decomposition
The number of samples M′ in each preprocessed pulse is typically over 100. Thus, the number of samples to be processed for cluster validation and clustering is very large. In order to lighten the computational and processing load, we have to compress the data. We note that the classification of pulses by intrapulse analysis is based on pulse shapes (amplitude and phase). Now, the low frequency components of a pulse reflect its basic shape. A suitable technique for compressing the data while retaining the basic pulse shape is wavelet decomposition [21, 38], by which the low frequency coefficients can be extracted, retaining the pulse shape information. Wavelet decomposition is carried out by using a chosen filter bank in which each one of the filters is followed by down-sampling by 2. In our case, only the low frequency components are needed and undergo further
Figure 5.3: Data compression using wavelet decomposition
decomposition, as shown in Fig. 5.3. Due to the process of down-sampling by 2, the size of each of the outputs of the low pass filter (LPF) and of the high pass filter (HPF) after one stage of decomposition is only half that of the input. Therefore, for a three-staged wavelet decomposition, the output sample size is only one-eighth of the original input sample size. The number of stages in a wavelet decomposition is a trade-off between the amount of data compression and the degree of pulse shape information retained. Here, we employ a three-staged filter bank with symlet4 filters [21]. The coefficients of these LP and HP filters are indicated in Fig. 5.3.
We denote the data vector of Eq. (5.12) after compression by y_n(a_n), which is comprised of compressed signal and noise. Each of these complex data vectors is of dimension M, which is only a fraction of the dimension of the original data vector.
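The three-stage low-pass path of Fig. 5.3 can be sketched with a Haar low-pass filter standing in for the symlet filters of [21]; each stage halves the sample count, so three stages keep one-eighth of the samples:

```python
import numpy as np

def lowpass_stage(x):
    """One decomposition stage: low-pass filter, then down-sample by 2."""
    h = np.array([1.0, 1.0]) / np.sqrt(2.0)    # Haar low-pass (stand-in)
    return np.convolve(x, h, mode="full")[1::2]

pulse = np.sin(np.linspace(0.0, np.pi, 128))   # a 128-sample pulse shape
out = pulse
for _ in range(3):                             # three stages, as in Fig. 5.3
    out = lowpass_stage(out)
assert out.size == 128 // 8                    # one-eighth of the samples kept
```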
5.3 Clustering Algorithms for Intrapulse Analysis
After pre-processing, all received pulses are well aligned. Then intrapulse analysis is considered as a multivariate clustering problem: given a compressed data set Y consisting of N observed data vectors y₁(a₁), ..., y_N(a_N), each of dimension M, our objectives are
1. cluster validation: to determine the number of emitters present, K;
2. clustering: to determine the association parameter a_n so that for a_n = k, y_n(a_n) is determined to be from the kth emitter.
The pre-processing is done in the complex domain because amplitudes and phases are considered separately. For clustering, we form each data vector in the real domain by putting its real part first and then its imaginary part, i.e.,
From now on, each data vector y_n is assumed to be in the real domain.
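The real-domain stacking described above amounts to one line of code: the real part goes first, the imaginary part second:

```python
import numpy as np

y = np.array([1 + 2j, 3 - 1j])               # a complex data vector
y_real = np.concatenate([y.real, y.imag])    # real part first, then imaginary
assert np.array_equal(y_real, np.array([1.0, 3.0, 2.0, -1.0]))
```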
Generally, the noise accompanying a radar pulse vector is Gaussian. Hence, a set of pulse vectors emitted by the kth emitter is a sample from a multivariate normal distribution. Is Gaussianity still maintained after pre-processing? In Appendix D, Monte Carlo simulations show that compressed pre-processed pulses can still be assumed Gaussian with relatively high confidence. Hence, a set of pulse vectors from K distinct emitters can be modeled by a mixture of K multivariate normal distributions [43]. Therefore, the model-based clustering algorithm developed in Chapter 3 can be directly deployed.
As introduced in Section 1.1, a clustering algorithm for intrapulse analysis should be capable of processing high dimensional data and of producing satisfactory results for small or medium sample cases. In Chapter 3, we developed a suitable model-based clustering algorithm and demonstrated by extensive simulations that it outperforms other existing algorithms such as SNOB. Indeed, the model-based clustering algorithm we developed is well designed in an off-line mode to effectively process high dimensional data with satisfactory
performance on small or medium samples. In some cases, on-line clustering would be more desirable because we may wish to classify received pulses as they arrive dynamically. This motivates us to develop on-line clustering algorithms. One approach is to set up some thresholds, and then assign an incoming pulse to an existing cluster or form a new cluster according to the thresholds. One realization of this approach is described in Section 5.4. However, this simple approach may not provide satisfactory results when the statistics of the received data change in time. To overcome this drawback, we develop in Section 5.5 a novel on-line clustering algorithm in which no explicit thresholds are required. Performance on intrapulse data is reported in Section 5.6 by using the model-based clustering algorithm, SNOB, and the two on-line algorithms.
5.4 An On-line Clustering Algorithm Using Thresholds
We first fix two thresholds t₁ and t₂, the possible candidates being the maximal intra-cluster dispersion and the minimal inter-cluster distance, respectively. Let d be the minimal distance between a pulse and each cluster center. Then, an incoming pulse is assigned to an existing cluster when d ≤ t₁, or assigned to a new cluster when d ≥ t₂, or held in a store for subsequent classification. This algorithm is on-line.
5.4.1 Procedure
The diagram of this algorithm is shown in Fig. 5.4.
1. Initialization:
   i) Choose two thresholds
      t₁ – possibly the maximal intra-cluster dispersion,
      t₂ – possibly the minimal inter-cluster distance,
      with 0 < t₁ < t₂. These two thresholds can be determined experimentally using existing data (with known ground truth).
   ii) Take data vector y₁ and assign it to C₁;
      m ← 1; (m counts the number of clusters)
      n ← 1; (n counts the number of data vectors)
2. The main process:
   While there is another data vector do
      n ← n + 1; take vector y_n from the input;
      Compute d = min_{k∈[1,...,m]} ||y_n − mean of C_k||;
      Find the k for which this minimum is attained;
      if d ≤ t₁,
         assign y_n to C_k;
      elseif d ≥ t₂,
         m ← m + 1; assign y_n to the new cluster C_m;
      else
         store y_n;
         if the storage size ≥ T_s, then goto step 3;
      end;
   end;
3. The storage process:
   If the store is empty,
      goto step 2;
   else
      n ← 0;
      execute step 2, but now the vector y_n is taken from the store;
      if the store has changed,
         goto step 3;
      else
         move the current y_n to a post-identification store;
      end;
   end;
4. The post-identification process:
   For each data vector y_n in the post-identification store,
      assign it to the cluster which is nearest.
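The main process above can be sketched as follows (a hypothetical minimal realization: only the t₁/t₂ decision and the running-mean center update are shown; the storage and post-identification passes are omitted):

```python
import numpy as np

def online_threshold_cluster(vectors, t1, t2):
    """Assign each vector to the nearest cluster (d <= t1), open a new
    cluster (d >= t2), or hold it in the store (t1 < d < t2)."""
    centers, counts, store = [], [], []
    for y in vectors:
        if not centers:
            centers.append(np.array(y, float)); counts.append(1); continue
        d = [np.linalg.norm(y - c) for c in centers]
        k = int(np.argmin(d))
        if d[k] <= t1:                        # join existing cluster C_k
            counts[k] += 1
            centers[k] += (y - centers[k]) / counts[k]   # update running mean
        elif d[k] >= t2:                      # form a new cluster
            centers.append(np.array(y, float)); counts.append(1)
        else:                                 # ambiguous: hold for later
            store.append(y)
    return centers, store

data = [np.array([0.0, 0.0]), np.array([0.1, 0.0]),
        np.array([5.0, 5.0]), np.array([5.1, 5.0])]
centers, store = online_threshold_cluster(data, t1=1.0, t2=3.0)
assert len(centers) == 2 and not store
```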
Suppose that after the on-line process, N M-dimensional data vectors are assigned to K clusters, such that the kth cluster has N_k data vectors (k = 1, ..., K). Obviously, N₁ + N₂ + ... + N_K = N. The store size T_s is usually set to be a small number so that the computational cost of the storage process is negligible compared to that of the main process. However, the computational cost of the main process may vary when the same set of pulses arrives in a different sequence. This makes the exact complexity analysis difficult. In this subsection, an upper complexity bound and an average complexity are analyzed; an example is illustrated at the end.
Upper Bound
In the main process, there are two dominant operations:
1. For each incoming data vector, its distances to all existing k clusters are computed and then the minimum distance is chosen.
As discussed in Step (a) of Section 3.4.2, this operation approximately requires 3Mk flops. There are at most K clusters, so the computational cost of this operation to process all N data vectors is upper-bounded by
2. After an incoming data vector is assigned to an existing cluster, the cluster center has to be updated.
Figure 5.4: The diagram of our on-line clustering algorithm using thresholds: (a) the main process; (b) the storage and post-identification processes
As discussed in Step (b) of Section 3.4.2, if the nth member of a cluster arrives, it requires M(n + 1) flops to update its center. Note that there is no cluster center update when the first member of a cluster arrives. Thus, the computational cost of this operation to process all N data vectors is
where we have used the fact that
Therefore, the upper complexity bound of the on-line clustering algorithm described in Section 5.4.1 is
B_u = MN(3K − 0.5) + 0.5MN².   (5.13)
Average Complexity
Let us examine the case when each cluster has the same number of members (i.e., N₁ = N₂ = ... = N_K = N/K). Since the computational cost of Step 1 in the main process is sensitive to the sequence in which the pulses arrive, we assume that the members of the first cluster arrive first, then those of the second cluster arrive next, and so on. The computational cost for this case is roughly an average complexity of the on-line clustering algorithm. Similarly as in the upper-bound analysis, the computational costs of Step 1 and Step 2 in the main process here are given respectively by
Therefore, the average complexity of the on-line clustering algorithm described in Section 5.4.1 is approximately
Example: Let us consider the example shown in Section 5.6.2; there are 100 44-dimensional preprocessed pulse vectors (N = 100, M = 44) and 5 emitters (K = 5). Substituting the values of N, M and K into Eqs. (5.13) and (5.14), we obtain the upper bound and the average complexity.
Now consider using the off-line model-based algorithm developed in Section 3.4. Given that N_t = 5 (N_t is the number of iterations for cluster center convergence) and that K_max = 8 (K_max is the maximal number of clusters to be searched), then according to Eq. (3.70) we have
C_d = 11.70 Mflops
which is the computational cost of using the model-based algorithm for the above example. Obviously the on-line algorithm is much faster than the off-line model-based algorithm. Unfortunately, the performance of the on-line algorithm is usually inferior to that of the model-based algorithm, as will be shown in Section 5.6.2.
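Plugging the example's numbers into the upper bound of Eq. (5.13) (as reconstructed here, B_u = MN(3K − 0.5) + 0.5MN² flops, which is an assumption) gives a cost well below the quoted 11.70 Mflops of the off-line algorithm:

```python
# Example numbers from Section 5.6.2; the bound formula follows the
# reconstruction of Eq. (5.13) above and is an assumption.
M, N, K = 44, 100, 5
B_u = M * N * (3 * K - 0.5) + 0.5 * M * N**2
# B_u is about 0.28 Mflops, far below the off-line cost of 11.70 Mflops.
assert B_u == 283_800.0
assert B_u / 1e6 < 11.70
```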
5.5 An On-line Model-Based Clustering Algorithm
A major advantage of the following new algorithm is that no explicit thresholds are required. The algorithm dynamically incorporates cluster splitting, merging and regrouping operations by using the model-based detection modified from the clustering algorithm in Section 3.4.
On-line Process
As incoming data vectors are being classified into clusters, the sizes of the clusters will continue to grow. A size increment counter is set for each cluster, and a cluster is checked when its size increment counter reaches a preset threshold, T_c. In the very beginning, the first T_c pulses are assigned to the first cluster. Checking a cluster or two clusters involves a binary
detection function defined as follows:
[K*, C*1, C*2] = bdetector(data).
If data comes from a single cluster, the bdetector function returns
either K* = 1, indicating no splitting of the cluster;
or K* = 2, indicating splitting the cluster into C*1 and C*2.
If data comes from two clusters, the bdetector function returns
either K* = 1, indicating merging the two clusters into one;
or K* = 2, indicating regrouping into the clusters C*1 and C*2.
This binary detection function is easily formed by setting Kmax = 2 in the clustering
algorithm described in Section 3.4.
As mentioned previously, a cluster is checked when its size increment counter reaches a
preset threshold Tc. If the returned value K* = 1, then the size increment counter is reset
to 0; if K* = 2, then the cluster is split into two new ones C*1 and C*2. Since these two new
clusters need to be merged or regrouped with other existing clusters, the closest existing
cluster is found for each new cluster and both of them are sent to bdetector to check
if they should be merged or regrouped. Therefore, as pulses arrive, the sizes of clusters
increase, and the algorithm conducts cluster splitting, merging and regrouping operations
appropriately by using the above scheme.
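The counter-driven scheme can be sketched as follows (Python rather than the thesis's MATLAB; `detector` is a stand-in for bdetector passed in by the caller, and the merging/regrouping of a newly split cluster with its closest existing cluster is omitted here for brevity):

```python
import numpy as np

def online_assign(stream, Tc=10, detector=None):
    """Sketch of the on-line process: assign each incoming vector to the
    nearest cluster, bump that cluster's size-increment counter, and hand
    the cluster to the binary detector when the counter reaches Tc."""
    detector = detector or (lambda pts: (1, pts, None))  # default: never split
    clusters, counters = [], []          # member lists and isc counters
    for y in stream:
        if not clusters:                 # the first pulse starts cluster C1
            clusters.append([y]); counters.append(0)
            continue
        means = np.array([np.mean(c, axis=0) for c in clusters])
        k = int(np.argmin(((means - y) ** 2).sum(axis=1)))
        clusters[k].append(y); counters[k] += 1
        if counters[k] >= Tc:            # cluster k is due for a check
            K, g1, g2 = detector(np.array(clusters[k]))
            counters[k] = 0              # reset isc_k after the check
            if K == 2:                   # split into the two returned groups
                clusters[k] = list(g1)
                clusters.append(list(g2)); counters.append(0)
    return clusters, counters
```

With a toy detector that splits on the first coordinate, six 1-D pulses drawn from two well-separated values end up in two clusters of three.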
Postprocessing
In fact, the same set of pulses arriving in a different sequence may result in a different
clustering structure, since each pulse is classified into a cluster according to the previously
arrived pulse information during the on-line process. In other words, the pure on-line process
is crude, though it is very fast. Therefore, a post-processing scheme is introduced below
to improve the classification accuracy. After the on-line process, we regroup all received
pulses according to the existing cluster centers and then compute the new centers. The
regrouping process is repeated a few times (say Nt times) until the centers converge. Then
each cluster is checked to see whether it should be split, merged or regrouped with another cluster. This
final splitting, merging and regrouping process is repeated a few times (say Nr times) until
the clustering structure converges. After the post-processing, the performance of the on-line
clustering is almost the same as that of the off-line algorithm in Section 3.4, but it is much faster.
5.5.1 Procedure
The flow diagram of this algorithm is shown in Fig. 5.5.
1. Initialization:
i) Define a binary detection function: [K*, C*1, C*2] = bdetector(data).
ii) Take data vector y1 and assign it to C1;
m ← 1; (m counts the number of clusters)
n ← 1; (n counts the number of pulse samples)
isc1 ← 0; (increment of the size of C1)
2. The main process:
While there is another sample do
begin
    n ← n + 1; take data vector yn from the input;
    Compute d = min over k ∈ {1, ..., m} of ||yn − mean of Ck||;
    Find k for which d is minimum;
    Assign yn to Ck and increase isck by 1;
    if isck < Tc, goto Step 2;
    else
        data = Ck, check data by the binary detector;
        if K* = 1, isck ← 0 (reset isck);
            goto Step 2;
        end
        if K* = 2, then goto Step 3;
    end;
end;
3. The cluster splitting/regrouping/merging process:
After Step 3, the number of the clusters, m, may either increase by 1, or decrease by
1, or remain the same. The counter index still ranges from 1 to m.
m ← m + 1;
Cm ← C*2;
Ck ← C*1;
Find Cj which is closest to Ck;
data = Cj + Ck, check data by the binary detector;
if K* = 1 (indicates merging)
    Cj ← Cj + Ck; iscj ← 0;
    Ck ← Cm; delete Cm; m ← m − 1;
    Find Cj which is closest to Ck;
    data = Cj + Ck, check data by the binary detector;
    if K* = 1 (indicates merging)
        Cj ← Cj + Ck; iscj ← 0;
        Ck ← Cm; isck ← 0;
        delete Cm; m ← m − 1; end
    if K* = 2 (indicates regrouping)
        Cj ← C*1; iscj ← 0;
        Cm ← C*2; iscm ← 0;
    end
end
if K* = 2 (indicates regrouping)
    Cj ← C*1; iscj ← 0;
    Ck ← C*2; isck ← 0;
    Find Cj which is closest to Cm;
    data = Cj + Cm, check data by the binary detector;
    if K* = 1 (indicates merging)
        Cj ← Cj + Cm; iscj ← 0;
        delete Cm; m ← m − 1; end
    if K* = 2 (indicates regrouping)
        Cj ← C*1; iscj ← 0;
        Cm ← C*2; iscm ← 0;
    end
end
5.5.2 Computational Complexity
Suppose that after the on-line process, N M-dimensional data vectors are assigned to K
clusters, so that the kth cluster has Nk data vectors (k = 1, ..., K). As mentioned
in Section 5.4.2, the exact complexity analysis is difficult due to the fact that the computational cost of the on-line process may vary when the same set of pulses arrives in a different
sequence. In this subsection, an upper complexity bound and an average complexity are
analyzed; an example is then illustrated at the end.
Upper Bound
There are three steps in this on-line clustering algorithm:
1. The first is the main process.
The main process here is the same as the one considered in Section 5.4.2. Hence, the
computational cost of this process according to Eq. (5.13) is upper-bounded by
Figure 5.5: The diagram of our on-line model-based clustering algorithm: (a) the main process; (b) the cluster splitting, merging and regrouping process
2. The second is the cluster splitting/regrouping/merging process.
The dominant complexity of the binary detector is to partition the given number of
data vectors into two clusters, i.e., Kmax = 2. According to Eq. (3.70) in Section
3.4.2, this computational cost is 7MnNt flops, where n is the number of data vectors
sent to the binary detector and Nt is the number of iterations for cluster center
convergence. We need to use the binary detector N/Tc times to check if a cluster should
be split, and such a checking may result in one or two more binary detections to
pursue cluster regrouping and merging. Thus, the total number of times the binary
detector is used varies between N/Tc and 3N/Tc. Obviously the number of data vectors
sent to the binary detector is at most N. Hence, the computational cost of the cluster
splitting/regrouping/merging process is upper-bounded by
3. The third is the post-process.
To regroup all data vectors according to the K cluster centers Nt times (see Steps (a)
and (b) in Section 3.4.2), the computational cost is
Nt [MN + 2MNK] = MNNt(2K + 1). (5.15)
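As a quick arithmetic check of Eq. (5.15), the two sides agree; `regroup_cost` below is an illustrative helper (not a thesis routine), evaluated at the running example's values N = 100, M = 44, K = 5, Nt = 5.

```python
# Left-hand side of Eq. (5.15): Nt * (M*N + 2*M*N*K).
def regroup_cost(M, N, K, Nt):
    return Nt * (M * N + 2 * M * N * K)

# The right-hand side factors as M*N*Nt*(2K + 1); for the running example
# this is 44 * 100 * 5 * 11 = 242,000 flops, i.e. about 0.24 Mflops.
cost = regroup_cost(44, 100, 5, 5)
```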
The final splitting on K clusters requires using the binary detector K times, so the
computational cost of the final splitting is
There are 0.5K(K − 1) different combinations to send two clusters to the binary
detector for the final regrouping/merging purpose, so the computational cost for the
final regrouping/merging is
Furthermore, the final splitting/regrouping/merging operation is repeated Nr times.
Hence, the computational cost of the final splitting/regrouping/merging operation is
Therefore, the total computational cost of the on-line clustering algorithm described in
Section 5.5.1 is upper-bounded by
Average Complexity
As in Section 5.4.2, we examine the case where each cluster has the same number of
members (i.e., N1 = N2 = ... = NK = N/K). Since the computational cost of the on-line
process is sensitive to the sequence in which the pulses arrive, we assume that the members
of the first cluster arrive first, and then those of the second cluster arrive next, and so
on. The computational cost for this case is roughly an average complexity of the on-line
clustering algorithm. Now we examine the three steps in this on-line clustering algorithm:
1. The first is the main process. The computational cost of this process according to Eq.
(5.14) is approximately
2. The second is the cluster splitting/regrouping/merging process. We need to estimate
the number of times the binary detector is used, and the number of data vectors involved
each time. For simplicity, an average number of 1.5N/Tc detector calls is used here, and the average size
of the data sent to the binary detector is assumed to be N/K. Hence, the computational
cost for the cluster splitting/regrouping/merging process is approximately
3. The third is the post-process. The computational cost of the post-process is the same
as the one in the upper-bound analysis, i.e., MNNt(2K + 1) + 7MNNtNrK.
Therefore, the average complexity of the on-line clustering algorithm described in Section 5.5.1 is approximately
Example: Let us consider the same example as in Section 5.4.2; there are 100 44-
dimensional preprocessed pulse vectors (N = 100, M = 44) and 5 emitters (K = 5). Given
that Tc = 10, Nt = 5 and Nr = 2, substituting the values of N, M, K, Tc, Nt and Nr in
Eqs. (5.16) and (5.17), we have the upper bound and the average complexity as follows:
As illustrated in Section 5.4.2, the computational cost of using the off-line model-based
algorithm for the same example is 11.70 Mflops. Obviously the on-line algorithm is faster
than the off-line algorithm, while its performance is almost the same as that of the off-line
counterpart, as will be shown in Section 5.6.2.
5.6 Numerical Experiments on Intra-pulse Data
To illustrate the data preprocessing techniques and the effectiveness of the clustering algorithms developed in this chapter, we have carried out numerous experiments using computer-simulated data. All programs, including those for pre-processing and data compression, are
written in MATLAB and the simulations are run on a Pentium PC (400 MHz).
5.6.1 Pulse Generation
We generate the pulses according to the signal representation in Eq. (5.1), such that
• the distribution of the absolute amplitude an is uniform in [0.5, 1];
• the distribution of the initial phase ψn is uniform in [−π, π];
• the distribution of the time delay τn is uniform in [0, 5T];
• the distribution of the carrier frequency ωn is Gaussian and its standard deviation is 10
percent of its normalized mean value;
• the distribution of the additive noise vn(t) is Gaussian with zero mean and standard
deviation about 0.05.
From a given set of signature signals {sk(t)}, we can generate received pulses using the
above parameter distributions.
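Under these distributions a pulse generator can be sketched as follows (Python instead of the thesis's MATLAB). The exact signal model Eq. (5.1) is not reproduced in this section, so the modulation form, the sample-grid delay, and the nominal carrier of 0.2 rad/sample below are assumptions made for illustration.

```python
import numpy as np

def generate_pulses(signature, n_pulses, rng=None):
    """Sketch of the Section 5.6.1 pulse generator. `signature` is one
    emitter's envelope s_k(t) sampled at 128 points; the random parameters
    follow the bullet-point distributions above."""
    rng = np.random.default_rng(rng)
    n = len(signature)
    t = np.arange(n)
    out = []
    for _ in range(n_pulses):
        a = rng.uniform(0.5, 1.0)          # absolute amplitude in [0.5, 1]
        psi = rng.uniform(-np.pi, np.pi)   # initial phase in [-pi, pi]
        tau = int(rng.integers(0, 6))      # time delay (assumed sample grid)
        w = rng.normal(1.0, 0.10)          # carrier factor, std = 10% of mean
        v = rng.normal(0.0, 0.05, n)       # additive noise, std about 0.05
        s = np.roll(signature, tau)        # delayed signature
        out.append(a * s * np.exp(1j * (0.2 * w * t + psi)) + v)
    return np.array(out)
```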
Fig. 5.6 (a) and (b) show the amplitude and phase of a group of 100 pulses received by the
detector. These are from 5 different emitters, each transmitting 20 pulses. The five emitters
are numbered 1, 2, ..., 5. If a pulse is from the kth emitter (k = 1, 2, ..., 5), then its
emitter index is k. Suppose that K clusters are determined by a clustering algorithm and
that these K clusters are numbered 1, 2, ..., K. If a pulse is assigned to the kth cluster
(k = 1, 2, ..., K), then its cluster index is k. By cross-checking cluster indices against
emitter indices, we can count the classification accuracy in percentage.
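The cross-checking step can be sketched as a majority-vote count. This is an assumption: the thesis does not spell out its exact counting rule, and the label vectors in the usage below are made up for illustration, not taken from the thesis's experiment.

```python
from collections import Counter

def classification_accuracy(cluster_idx, emitter_idx):
    """Credit each cluster with its majority emitter and return the
    fraction of pulses carrying that emitter label."""
    clusters = {}
    for c, e in zip(cluster_idx, emitter_idx):
        clusters.setdefault(c, []).append(e)
    correct = sum(Counter(members).most_common(1)[0][1]
                  for members in clusters.values())
    return correct / len(cluster_idx)
```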
Preprocessing
For the pulses shown in Fig. 5.6 (a) and (b), the pre-processing techniques of amplitude
normalization, time alignment and phase adjustment as described in Section 5.2 are applied
to remove the nuisance parameters. The amplitude and phase of the pre-processed pulses are
shown in Fig. 5.7 (a) and (b) respectively. Each pulse is represented by 128 time samples.
A 3-level wavelet decomposition using symlet filters is then applied to each of these pre-processed pulses and only the low-frequency filter output samples are retained. Each pulse
is now represented by 22-dimensional samples (the number 22 > 128/8 is used because of
Figure 5.6: Amplitude and phase of 100 received pulses from 5 unknown emitters, where the x-axis is the index of data sample points.
Figure 5.7: Amplitude and phase of the pre-processed pulses, where the x-axis is the index of data sample points.
Figure 5.8: Amplitude and phase of the compressed pre-processed pulses, where the x-axis is the index of data sample points.
the transient effect of the filters). As a result, data compression has been achieved. The
amplitude and phase of the compressed pulses are shown in Fig. 5.8 (a) and (b) respectively.
In the following, clustering is based on the compressed data. Furthermore, to compare our
clustering algorithm and SNOB on a fair basis, Covariance Structure 4 is assumed.

Figure 5.9: Determination of the number of emitters using our (off-line) clustering algorithm

Table 5.1: Clustering results for Example 1 by the off-line model-based clustering algorithm
(Cluster index | Number of pulses | Emitter index assigned to the pulses)
1 | 20 | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

The off-line model-based clustering algorithm

The clustering algorithm developed in Section 3.4 is applied to the compressed data.
The evaluation of L(Y, K) for various values of K is plotted in Fig. 5.9. The number of
clusters, which is the value of K at which L(Y, K) is minimum, is correctly determined to be
5. The association of the pulses using the clustering algorithm is shown in Table 5.1. It can
be observed that apart from the seven pulses in Emitters 2 and 3, all the other pulses have
been correctly associated. The classification accuracy in this case is 93%. For this example,
Figure 5.10: Determination of the number of emitters using the SNOB program

Table 5.2: Clustering results for Example 1 by the SNOB algorithm
(Cluster index | Number of pulses | Emitter index)

the clustering algorithm takes approximately 20 seconds to produce the above results.
SNOB
The SNOB algorithm has also been applied to this example, and the evaluation for
different values of K is plotted in Fig. 5.10, while the clustering results are shown in Table
5.2. It is observed in Table 5.2 that the SNOB algorithm fails to identify all the emitters,
and the signals from Emitters 2 and 3 cannot be distinguished. SNOB is written in Fortran,
so it is difficult to compare its speed with our MATLAB program. As discussed in Section
3.5, SNOB in principle is more complex than our off-line clustering algorithm.
The on-line algorithm using known thresholds
We apply the on-line algorithm developed in Section 5.4 to the example with t1 =
0.0?, t2 = 0.08, Tc = 20. The clustering result is shown in Table 5.3. The number of
clusters is 6 and the classification accuracy is 8?%. Comparing the results of the on-line
with the off-line methods, we find that the on-line result is slightly inferior. However,
from the computational standpoint, the on-line algorithm based on the two thresholds is
much simpler than the off-line counterpart. This on-line algorithm takes less than 1 second
to produce the above results. Recall that the off-line clustering procedure described in
Section 3.4 includes cluster splitting and regrouping operations. Obviously, we may improve
the performance of this original on-line algorithm by introducing cluster splitting, merging
and regrouping operations appropriately. However, more thresholds are needed.

Table 5.3: Clustering results for Example 1 by the on-line clustering algorithm using known thresholds
(Cluster index | Number of pulses | Emitter index assigned to the pulses)
1 | 20 | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
The on-line model-based algorithm
We apply the on-line algorithm developed in Section 5.5 to the example. The clustering
result is shown in Table 5.4. The number of clusters is 5 and the classification accuracy is
93%. The results are the same as those in Table 5.1 produced by the off-line model-based
algorithm. This on-line algorithm takes approximately 6 seconds to produce the above
results. We also note that, for a given data set, the computation cost of the on-line algorithm
is much lower than that of the off-line. From Eq. (3.70), we know that the computational
complexity of the off-line model-based algorithm is approximately proportional to K²max.
The computational cost is reduced significantly by the binary partitions (Kmax = 2) involved in
the on-line model-based algorithm. In this sense, we conclude that this on-line algorithm is
a fast version of the off-line model-based algorithm.
![Page 105: was - Library and Archives Canada · Acknowledgement I have received both constant encouragement and expert supervision fiom my supervisors Dr. K.M. Wong and Dr. Z.Q. Luo.hm both](https://reader031.vdocuments.net/reader031/viewer/2022011903/5f152892b337b866b857c9fb/html5/thumbnails/105.jpg)
Table 5.4: Clustering results for Example 1 by the on-line model-based clustering algorithm
(Cluster index | Number of pulses | Emitter index assigned to the pulses)
1 | 20 | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 | 19 | 2 2 1 2 2 2 3 2 2 2 2 2 2 2 2 2 3 3 3
3 | 21 | 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 | 20 | 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
5 | 20 | 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

5.6.3 Conclusions

Many other simulation examples have been carried out using different numbers of received
signals, different numbers of emitters, and different distributions of random signal parameters. The following are general observations drawn from the results of cluster validation
and clustering:
1. The results of clustering employing compressed data from a 3-stage symlet wavelet
filter bank are in general the same as those employing uncompressed data.
2. Judging from performance, our model-based off-line clustering algorithm shows much
higher reliability in cluster validation than SNOB, while sacrificing marginally on the
accuracy in clustering.
3. The performance of our model-based on-line clustering algorithm is almost the same
as that of the model-based off-line algorithm, but it is much faster.
For Observation 1, it seems that the original pulse contains redundant information,
so by compressing it, adequate information is still retained. The performance of the new
off-line clustering algorithm and SNOB has been compared intensively in Section 3.6; the
results there well justify the second observation. The last observation shows that our on-line model-based clustering algorithm is a faster version of model-based clustering while
retaining the quality of performance.
Furthermore, it is found that the best performance of our clustering algorithm for intra-pulse analysis is usually achieved by using Covariance Structure 4 described in Section
3.3.4 if the default penalty weight (λ = 1) is used, and that the best performance is usually
achieved by using Covariance Structure 2 described in Section 3.3.2 when supervision is
available (i.e., the penalty weight can be adapted).
First, we have developed preprocessing techniques to remove some nuisance parameters
from received pulses. These include the absolute amplitude, initial phase, time delay and
residual carrier frequency. As a result, we have formulated the problem of emitter number
estimation and pulse-emitter association as a multivariate Gaussian clustering problem. In
order to reduce the computational cost for clustering, a data compression method based
on wavelet decomposition has also been included in pre-processing. The pre-processing
techniques are intuitive in nature and are carried out so that after preprocessing, the
pulses received from the same emitter maintain their resemblance to each other, while those
from different emitters maintain their distinctive features.
After applying the new off-line clustering algorithm developed in Chapter 3 to the clustering problem, we have developed two on-line clustering algorithms: one is based on known
thresholds while the other makes use of the model-based detection modified from the off-line
clustering algorithm. Although the on-line clustering based on thresholds is computationally very effective, it is difficult to adapt the thresholds as the statistics of the received pulses
change in time. In contrast, the on-line model-based clustering algorithm does not require
explicit thresholds; it can dynamically incorporate cluster splitting, merging and regrouping
operations as the statistics of the received pulses change.
The performance of our clustering algorithms and SNOB on intra-pulse data has been
reported. The results demonstrate that our new clustering algorithms are very effective
for intra-pulse analysis, especially the on-line model-based algorithm. Therefore, the on-line
model-based clustering algorithm is suitable for near real-time implementation, which will
be explored in the next chapter.
Chapter 6
DSP Implementation
6.1 Introduction
In the previous chapters, we have developed several radar pulse classification algorithms
based on Minimum Encoding Inference, and described the framework of the penalized likelihood method. Extensive simulations show that the performance of our new algorithms is
promising, especially the on-line model-based algorithm, which is well suited for dynamically
classifying incoming radar pulses. As a result, we have implemented this on-line clustering
algorithm as a core classification module on a TMS320C44 DSP board.
In this chapter, the DSP implementation for intra-pulse analysis is described. We do
a simple analysis of the physical scenarios at a radar interceptor and estimate the likely
maximal incoming pulse rate. We then propose a suitable system diagram and investigate
the system requirements. The benchmark of the DSP coding of our on-line clustering
algorithm is reported.
6.2 Physical Scenario Analysis
In this section, we describe some examples and discuss how radar characteristics [57] can
affect the operation of a typical radar interceptor. Here and throughout we assume that the
radar interceptor is passive in all directions.
Figure 6.1: Physical scenario example 1
Figure 6.2: Physical scenario example 2
6.2.1 Probability of Receiving Overlapped Pulses
Given a radar, let the PRF (Pulse Repetition Frequency) be 1000 pulses/sec and c be the speed
of light; then the maximum range of the radar is
Rmax = c / (2 × PRF) = 150 km.
Let the range resolution ΔR be 150 m; then the pulse width is
τ = 2ΔR / c = 1 µs.
So the duty cycle is
τ × PRF = 1/1000.
It means that even if a radar interceptor receives pulses from two radars with the same
specifications given above, the probability of receiving overlapped pulses is only 1/1000,
assuming a perfect time synchronization. The illustration is shown in Fig. 6.1.
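The three figures can be checked numerically; this is an illustrative sketch with the speed of light rounded to 3 × 10⁸ m/s, and `radar_figures` is a helper name introduced here.

```python
C = 3.0e8  # speed of light, m/s

def radar_figures(prf_hz, delta_r_m):
    r_max = C / (2 * prf_hz)   # maximum unambiguous range, Rmax = c/(2*PRF)
    tau = 2 * delta_r_m / C    # pulse width from range resolution, tau = 2*dR/c
    duty = tau * prf_hz        # duty cycle = tau * PRF
    return r_max, tau, duty

r_max, tau, duty = radar_figures(1000, 150)
# r_max = 150 km, tau = 1 microsecond, duty cycle = 1/1000
```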
Figure 6.3: Physical scenario example 3
6.2.2 Receiving Pulse Sequence
Given a radar with antenna beamwidth BW, the average incoming pulse rate at the intercept
receiver generated by the radar emitter is
rate = PRF / (360° / BW) = PRF × BW / 360°. (6.1)
Let a radar emitter have PRF = 1000 pulses/sec and BW = 1°; then at the intercept
receiver, rate ≈ 3 pulses/sec. If the antenna rotation rate RPM is 30 rpm (rotations per
minute), we will observe about 6 incoming pulses at the intercept receiver in every two-second
period. The illustration is shown in Fig. 6.2.
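Eq. (6.1) and the example numbers can be verified directly (an illustrative sketch; `intercept_rate` is a helper name introduced here):

```python
def intercept_rate(prf_hz, beamwidth_deg):
    """Average pulse rate at the intercept receiver, Eq. (6.1): the beam
    dwells on the receiver for BW/360 of each rotation."""
    return prf_hz * beamwidth_deg / 360.0

rate = intercept_rate(1000, 1.0)            # ~2.8 pulses/sec ("about 3")
rpm = 30                                    # one rotation every 2 seconds
pulses_per_rotation = rate * (60 / rpm)     # ~5.6, i.e. ~6 pulses per 2 s
```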
6.2.3 Near-Far Phenomenon
The signal power of an incoming pulse at the intercept receiver is inversely proportional
to the square of the distance between the radar and the interceptor. Consider the case
where one emitter is close to the interceptor and another is far away: it is possible that, at
the interceptor, the incoming pulse amplitude generated by a sidelobe of the close emitter
overwhelms the pulse amplitude generated by the mainlobe of the far-away emitter. This
means that in a given geographic area, the number of detectable emitters is affected by this
near-far phenomenon. The illustration is shown in Fig. 6.3.

Table 6.1: DECCA Groups 9A and 12A relative motion marine radars
Frequency Band: 3 GHz or 9 GHz
PRF | Pulse Width (τ) | Duty Cycle (τ × PRF)
3.40 KHz | 0.05 µs | 0.17 × 10⁻³
1.70 KHz | 0.25 µs | 0.43 × 10⁻³
0.85 KHz | 0.5–1 µs | 0.43–0.85 × 10⁻³
Antenna Beamwidth (BW): 0.8–2.5 degrees
Antenna Rotation (RPM): typically 30 rpm (rotations per minute)
Sidelobes relative to main beam: around −25 dB
6.3 Maximal Incoming Pulse Rate
Some important parameters for a set of typical marine radars [73] are listed in Table 6.1.
Suppose there are a number of ships in the surveillance area, each equipped with two
radars. Under normal circumstances, the number of ships may be around 20. However, the
ships usually have different types of radars, which can be identified according to inter-pulse
information such as carrier frequency and pulse width. It is very rare that 40 same-type
radars are operating at the same time in the same area. Nevertheless, we assume for the
worst case that the maximum number of same-type radars present is 40.
Assume that the radar parameters are given in Table 6.1. Then the maximal incoming pulse
rate generated by a radar emitter is attained by using Eq. (6.1) when PRF = 3.4 KHz and
BW = 2.5°:
rate_max = 3400 × 2.5 / 360 ≈ 24 pulses/sec.
The minimal incoming pulse rate generated by a radar emitter is attained by using Eq. (6.1)
when PRF = 0.85 KHz and BW = 0.8°:
rate_min = 850 × 0.8 / 360 ≈ 1.9 pulses/sec.
Therefore, when 40 same-type radars are operating at the same time in the same area, the
incoming pulse rate at the intercept receiver is at most 40 × 24 = 960 pulses/sec.
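With the Table 6.1 parameters, the extreme rates work out as follows (illustrative sketch; `intercept_rate` is simply Eq. (6.1) written as a helper):

```python
def intercept_rate(prf_hz, beamwidth_deg):
    # Eq. (6.1): rate = PRF * BW / 360
    return prf_hz * beamwidth_deg / 360.0

rate_max = intercept_rate(3400, 2.5)  # ~23.6, i.e. about 24 pulses/sec
rate_min = intercept_rate(850, 0.8)   # ~1.9 pulses/sec
worst_case = 40 * 24                  # 40 same-type radars -> 960 pulses/sec
```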
Figure 6.4: The DSP system diagram for intra-pulse analysis
In the following sections of this chapter, we will assume the maximal incoming pulse
rate at the intercept receiver is 1000 pulses/sec.
6.4 System Diagram And Requirements
As described in Section 5.5, the on-line clustering algorithm dynamically performs cluster
splitting, merging and regrouping operations as pulses arrive. It is also suggested that we
cannot determine the pulse-emitter association right away, since the clustering structure is
being updated continuously. Here we assume that the maximal incoming pulse rate is 1000
pulses/sec and we output the pulse-emitter association after 20 seconds, i.e., a 20-second
latency is introduced. Therefore, we need to process up to 20,000 pulses in every 20-second
period. The whole process is divided into three independent modules: preprocessing, initial
grouping and on-line clustering; see the system diagram in Fig. 6.4. Next let us discuss the
computational cost and memory requirement of each module.
6.4.1 Preprocessing
The preprocessing described in Section 5.2 consists of four steps: curve rotation, amplitude
normalization, time alignment and data compression. Counted by our MATLAB programs,
it requires 25 Kflops to preprocess a pulse with around 128 sampling points into a 44-dimensional vector. So 20,000 pulses require 500 Mflops. To store all 20,000 pre-processed pulses, we need a memory of
20,000 pulses × 44 floating points/pulse × 4 Bytes/floating point ≈ 3.5 MBytes.
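The memory figure can be checked directly: 20,000 × 44 × 4 = 3,520,000 bytes, which is about 3.4 MiB and which the text rounds to 3.5 MBytes.

```python
n_pulses, dim, bytes_per_float = 20_000, 44, 4
total_bytes = n_pulses * dim * bytes_per_float  # storage for all vectors
mbytes = total_bytes / 2**20                    # ~3.36 MiB, "about 3.5 MBytes"
```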
6.4.2 Initial Grouping
The purpœe of this module is to d u c e the workload of the on-line clustering (Modale 3).
This is achiemd by grouping 20,000 puises hto a certain number (say 512) of groups. Then
in Module 3, we only deal with the mean vectom for the 512 goups, instead of the whole
data set. In the foilmhg, we consider h m to group N M-dimensional vectors into K
P U P *
One simple approach is to use the following batch scheme: split the whole data set into two groups, and then split each group into another two groups, and so on. If a group's size is less than a threshold (here 50), then this group is not split any more. The illustration is shown in Fig. 6.5. In the first stage, N M-dimensional vectors are split into two groups of sizes N1 and N2 respectively (N1 + N2 = N). One effective method is the k-means algorithm which has been used in Section 3.4.1. Let Nt be the number of iterations for cluster center convergence; then the computational cost for this stage is roughly 7MNNt by using Eq. (5.15). At the second stage, each group is further split into two new groups, and the computational cost is

7M N1 Nt + 7M N2 Nt = 7M N Nt.

Assume that the average number of splitting stages is L; then the total computational cost for initial grouping is roughly 7MNNtL.

To split 20,000 pulses into 512 groups, we need on average 9 splitting stages (512 = 2^9, i.e., L = 9). Hence, given N = 20,000, M = 44, K = 512, Nt = 5, and L = 9, the computation cost is roughly

7MNNt × L ≈ 280 Mflops
To store the mean vectors of the 512 groups, we need a memory of 400 × 44 × 4 ≈ 70 KBytes.
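The batch scheme above can be sketched as follows. This is a minimal Python illustration, not the thesis implementation: `kmeans2` is a bare two-center Lloyd iteration standing in for the k-means algorithm of Section 3.4.1, the initial centers are chosen crudely, and `min_size` plays the role of the stopping threshold of 50.

```python
def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(x) / len(points) for x in zip(*points))

def kmeans2(points, n_iter=5):
    """One splitting stage: divide `points` into two groups with a few
    Lloyd (k-means, K = 2) iterations."""
    centers = [points[0], points[-1]]          # crude deterministic init
    groups = [points, []]
    for _ in range(n_iter):
        groups = [[], []]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d.index(min(d))].append(p)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in groups if g]

def initial_grouping(points, min_size=50):
    """Recursive binary splitting: a group whose size is below
    `min_size` (50 in the text) is not split any further."""
    if len(points) < min_size:
        return [points]
    halves = kmeans2(points)
    if len(halves) < 2:                        # degenerate split: stop
        return [points]
    return [g for h in halves for g in initial_grouping(h, min_size)]
```

With N = 20,000 vectors and roughly log2(512) = 9 splitting stages, each stage costing about 7MNNt flops, this scheme reproduces the 280 Mflops estimate above.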
Figure 6.5: The tree structure for initial grouping
6.4.3 On-line Clustering

The procedure of the on-line clustering algorithm has been described in Section 5.5.1. When a pulse arrives, it is assigned to one of the existing clusters according to the minimum-distance principle. A cluster is checked when its size-increment counter reaches a preset number Tc. After the on-line process, the final regrouping is repeated Nt times and the final splitting/merging is repeated N+ times. The computational cost of the on-line algorithm has been analyzed in detail in Section 5.5.2. The maximal number of same-type radar emitters in a surveillance area is assumed to be 40 (see details in Section 6.3), so that the largest number of clusters here is around 40. Given that N = 512, M = 44, K = 40, Tc = 10, Nt = 5 and N+ = 2, and substituting these into Eq. (5.17), we obtain the average complexity of the on-line clustering module. The memory for keeping the clustering structure is 40 × 44 × 4 ≈ 7 KBytes < 10 KBytes.
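The arrival-time step of this module can be sketched as below. This is our own illustration of only the minimum-distance assignment and the size-increment counter; the model-based split/merge check of Section 5.5.1 is abstracted into the returned flag, and the names `assign_online` and `t_check` are hypothetical, with `t_check` playing the role of Tc = 10.

```python
def assign_online(pulse, centers, counters, t_check=10):
    """Assign an arriving pulse to the nearest cluster mean (minimum-
    distance principle) and bump that cluster's size-increment counter.
    When the counter reaches t_check (the Tc of the text), the caller
    should run the model-based split/merge check on that cluster; the
    counter is then reset."""
    d = [sum((a - b) ** 2 for a, b in zip(pulse, c)) for c in centers]
    k = d.index(min(d))
    counters[k] += 1
    needs_check = counters[k] >= t_check
    if needs_check:
        counters[k] = 0
    return k, needs_check
```

A caller would keep one counter per cluster and, whenever the flag is raised, run the full model-based check (and possibly split or merge) before continuing.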
6.5 C/DSP Coding of On-line Clustering
In this section, we briefly introduce our C/DSP coding of the model-based on-line clustering algorithm as a core clustering module on a TMS320C44 DSP board. The tools used to complete this task are:
- TMS320C3x/C4x floating-point DSP code generation tools [74-77].

- Code Composer [78] (coding and debugging) for C and Assembly from GO DSP Corporation.

- TMS320C44 (60 Mflops) development board (Dakac) and its SDK (software development kits) [79,80] from Spectrum Signal Processing Inc.

Table 6.2: The benchmark of DSP codes of the on-line clustering (Example 1 in Section 5.6)

Program size: 2.5 Kwords
Maximum data size: 448 Kwords
CPU processing time: on-line processing 0.2 seconds; post-processing 0.3 seconds
The MATLAB programs developed before are redesigned into C programs, which are converted to DSP codes by the TI DSP code generation tools [74-77].

To efficiently load memory for data and program, we need to study the physical configuration of the DSP board [79,80]. The local memory (SRAM) provided for the C44 is 2M bytes, i.e., 512K words. We could use the small memory model because all large arrays in our programs are allocated at run-time from a global pool, or heap (using malloc). Therefore, we partition the local 512K-word memory into two parts: one with 64K words is reserved for the small memory model and the other with the remaining 448K words is provided for the heap. With this memory allocation, our codes will run without any problem in the small memory model even when 10,000 pulses, each with 40 data points, are processed (note: 10,000 × 40 < 448K). In addition, the system stack is allocated in a 2K-word internal RAM of the C44 chip.
To process the compressed data of Example 1 in Section 5.6, the C44 processing time itself takes 0.5 seconds, counted by the profiling features of the Code Composer [78]. In fact, the on-line process takes 0.2 seconds and the post-processing takes the remaining 0.3 seconds. Thus, there is a tradeoff between the speed and the performance.
As shown in Table 6.2, our on-line clustering module for the TMS320C44 development board is very efficient in size and speed. It has the capability to process data arrays as large as 448K words. The on-line clustering module is ready for deployment on a multi-processing DSP board together with other modules such as preprocessing.
Chapter 7
Conclusions
In this thesis, we have reviewed Bayesian inference and minimum encoding inference, including Wallace's minimum message length (MML) and Rissanen's minimum description length (MDL), for model selection. It is found that the MML coding length is more accurate than the other two from the standpoint of quantization. All model selection criteria considered here consist of two parts: one is the log-likelihood function which measures the goodness of fit between the data and the model, and the other is a penalty function which measures the complexity of the model. An inference method aims to balance the tradeoff between goodness of fit and model complexity. As such, a penalty weight for the penalty function to control the trade-off has been introduced.
Based on minimum encoding inference, an appropriate measure of coding length has been proposed for cluster validation. Furthermore, the coding lengths under four different Gaussian mixture models have been fully derived. This provides us with a criterion for the development of a new clustering algorithm. Judging from the performance comparison, our coding length measure outperforms the Bayesian Inference Criterion (BIC) in cluster validation since it is not based on the large sample assumption as is BIC. More importantly, the new clustering algorithm shows much higher reliability in cluster validation than the well-known clustering algorithm SNOB, while sacrificing only marginally on the accuracy in clustering. Indeed, our clustering algorithm is well designed to effectively process high-dimensional data with satisfactory performance on small and medium samples.
The error performance of our clustering algorithm has been evaluated under reasonable assumptions. Two types of errors have been analyzed: misses and false alarms. Among the four factors considered here (the dimension of the data space M, the sample size N, the mixing proportion c and the inter-cluster distance D), D is the most important factor. There is a critical distance D0 defined in Eq. (4.18) such that when D > D0, our clustering algorithm can successfully separate the two clusters, and when D < D0, the extensive overlap between the two clusters will cause our algorithm to fail, as it does other clustering algorithms such as SNOB. Furthermore, we have examined the impact of the penalty weight under the framework of the penalized likelihood method. It is found that there is a range of penalty weights within which the best performance of our clustering algorithm can be achieved. Therefore, it is suggested that with some supervision, we could adjust the penalty weight to further improve the performance of our clustering algorithm.
Another important contribution of this thesis is the application of our clustering algorithm to intrapulse analysis. We have developed the preprocessing techniques to remove nuisance parameters from received pulses and formulated the problem of emitter number detection and pulse-emitter association as a multivariate clustering problem. In order to reduce the computational cost for clustering, a suitable data compression method based on wavelet decomposition has also been proposed. These pre-processing techniques are intuitive in nature and are carried out so that after preprocessing, the pulses received from the same emitter maintain their resemblance to each other, while those from different emitters retain their distinctive features.
There are several factors that make the task of clustering a challenging one: (1) the dimension of the data vectors is high; (2) satisfactory performance on small samples is desirable; (3) near real-time implementation is required. The model-based clustering algorithm developed in Chapter 3 well addresses the first two factors. Furthermore, it is found that the best performance of our clustering algorithm for intrapulse analysis is usually achieved by using Covariance Structure 4 (see Section 3.3.4) when no supervision is available (i.e., the penalty weight is 1, the default value), and that the best performance is usually achieved by using Covariance Structure 2 (see Section 3.3.2) when supervision is available (i.e., the penalty weight can be adapted). To achieve on-line clustering, that is, to perform classification dynamically as pulses arrive, we have further developed two new algorithms: one is based on thresholds while the other is based on a model-based detection scheme. Although the on-line algorithm based on thresholds is computationally very effective, it is difficult to adapt the thresholds as the statistics of the received pulses change in time. In contrast, the on-line model-based algorithm does not require explicit thresholds and dynamically incorporates cluster splitting, merging and regrouping operations as the statistics of the received pulses change. The performance of our clustering algorithms and of SNOB on intrapulse data has been reported. It demonstrates that our new clustering algorithms (the on-line model-based algorithm in particular) are very effective for intrapulse analysis due to their low computation costs and high performance.
Our model-based clustering algorithms have been further implemented on a DSP board for intrapulse analysis. Some relevant physical parameters have been estimated, such as the likely maximal incoming pulse rate. Then a suitable system diagram has been proposed and its system requirements have been suggested. The on-line model-based algorithm has been implemented as a core classification module on a TMS320C44 DSP board.
7.1 Future Work
In this thesis, we have developed new model-based clustering algorithms, both off-line and on-line, and successfully applied them to intrapulse analysis. There are several issues worthy of further investigation in future research:
1. Applicability of other statistical models to clustering. In Chapter 5, Gaussian mixture models are applied to the clustering problem in intrapulse analysis since the noise accompanying a radar pulse is usually Gaussian. For different applications, other statistical models may be more suitable. For example, a mixture of uniform distributions was explored in [3-5], where the observations are edge elements in an image. Statistical models using von Mises distributions [26] were applied in [22] to find clusters in data of several thousand sets of protein dihedral angles.
2. Other promising criteria for cluster validation. As introduced in Section 1.2, the number of clusters in Gaussian clustering is the number of components in a Gaussian mixture. The determination of this number is formulated as a likelihood ratio test. However, the analysis for the null hypothesis case is very difficult since the regularity condition does not hold. In fact, the asymptotics that justify the penalized likelihood criteria (BIC, MDL and MML) are the same as those underlying the likelihood ratio test. One different approach is the use of Monte Carlo (bootstrap) tests. A study of bootstrapping in the case of Gaussian mixtures was reported in [40]. However, empirically observed rejection rates may not quite match the expected levels under the null hypothesis, and it would be of interest to investigate the discrepancies involved.
3. Radar pulse classification using inter-pulse information. In the field of applications, we have focused on using intrapulse information of a collection of pulses to identify the emitters present. However, radar emitter classification in practice is also based on interpulse information. Our model-based clustering schemes can be directly applied if the statistics of the interpulse information are provided. Furthermore, it would be of great interest to achieve the maximum integration gain by combining interpulse and intrapulse information. A simple approach is layered classification using interpulse and intrapulse information separately. Another approach is to form an integrated data vector for classification using interpulse and intrapulse information together. If this is the case, the weighting between the interpulse part and the intrapulse part should be examined carefully.
Appendix A
The Value of S(N, K)

S(N, K) is the number of different ways to partition N data vectors into K groups such that each group k (k = 1, ..., K) has Nk data vectors. Obviously, N1 + N2 + ··· + NK = N.
At the first stage, we take N1 data vectors out of the whole data set; the number of different combinations is C(N, N1). Then at the second stage, we take N2 data vectors out of the rest of the data set; the number of different combinations is C(N - N1, N2), and so on. At the (K - 1)-th stage, we take N_{K-1} data vectors out of the rest; the number of different combinations is C(N - N1 - ··· - N_{K-2}, N_{K-1}). The last group is already determined once the first K - 1 groups have been selected. Hence, the total number of different combinations is

C(N, N1) C(N - N1, N2) ··· C(N - N1 - ··· - N_{K-2}, N_{K-1}) = N! / (N1! N2! ··· NK!).
In fact, the same groups taken in a different order constitute the same clustering structure. If several clusters contain the same number of data vectors, we can swap these clusters' order without changing the clustering structure. Let m_n be the number of clusters with n data vectors, n = 1, 2, ..., N; then the number of different partitions is

S(N, K) = N! / (N1! N2! ··· NK!) × 1 / (m1! m2! ··· mN!).
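The count above can be checked by brute force for small N. The sketch below is our own illustration, not part of the thesis: `s_formula` evaluates N!/(N1!···NK! · m1!···mN!) for a given list of group sizes, and `brute_force` enumerates all set partitions directly.

```python
from math import factorial
from collections import Counter

def s_formula(sizes):
    """S(N, K) for the group sizes N1, ..., NK:
    N! / (N1! ... NK!), further divided by m_n! for every repeated size."""
    val = factorial(sum(sizes))
    for s in sizes:
        val //= factorial(s)
    for m in Counter(sizes).values():
        val //= factorial(m)
    return val

def set_partitions(items):
    """Enumerate all partitions of `items` into unordered nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                     # `first` starts a new block
        for i in range(len(part)):                 # or joins an existing block
            yield part[:i] + [part[i] + [first]] + part[i + 1:]

def brute_force(n, sizes):
    """Count set partitions of n items whose block sizes match `sizes`."""
    target = sorted(sizes)
    return sum(1 for p in set_partitions(list(range(n)))
               if sorted(len(b) for b in p) == target)
```

For example, six vectors split into three pairs give 15 distinct partitions, and the division by m2! = 3! is exactly what removes the ordering of the three equal-sized groups.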
Appendix B
We need two theorems to derive it.
Theorem 1: Suppose that an association vector â partitions the N data vectors into K groups, and assume that these groups are samples from multivariate Gaussian distributions with different means but the same covariance matrix; i.e., the k-th group is a sample from N(μk, Σ). The sample mean μ̂k and the sample within-group scatter matrix Wk for the k-th group are defined in Eqs. (3.10) and (3.11) respectively, and the total scatter matrix W for the whole data set is defined in Eq. (3.19). A likelihood ratio criterion ([25, Page 165]) tests the hypothesis H0 : μ1 = μ2 = ··· = μK.
![Page 122: was - Library and Archives Canada · Acknowledgement I have received both constant encouragement and expert supervision fiom my supervisors Dr. K.M. Wong and Dr. Z.Q. Luo.hm both](https://reader031.vdocuments.net/reader031/viewer/2022011903/5f152892b337b866b857c9fb/html5/thumbnails/122.jpg)
Then W can be decomposed as W = E + B, where E = W1 + ··· + WK is the within-group scatter and B = W - E is the between-group scatter, and:

- E follows the central Wishart distribution WM(N - K, Σ);

- E and B are statistically independent.
Theorem 2: If A is WM(N, Σ, Ω), where N is a positive integer and a (≠ 0) is an M × 1 fixed vector, then aᵀAa / aᵀΣa ~ χ²_N(δ) with δ = aᵀΩa / aᵀΣa ([43, Theorem 10.3.6]).
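Theorem 2 can be illustrated numerically in the central case (Ω = 0), where the ratio aᵀAa / aᵀΣa should be χ²_N distributed. The sketch below is our own Monte Carlo check under an assumed 2 × 2 covariance Σ: it builds A ~ W_2(N, Σ) as a sum of outer products of N(0, Σ) vectors and verifies that the sample mean of the ratio is close to N, the mean of a central χ²_N variable.

```python
import math
import random

random.seed(0)

M, N = 2, 10                       # dimension, Wishart degrees of freedom
SIGMA = [[2.0, 1.0], [1.0, 2.0]]   # assumed example covariance
L = [[math.sqrt(2.0), 0.0],        # Cholesky factor: L L' = SIGMA
     [1.0 / math.sqrt(2.0), math.sqrt(1.5)]]
a = [1.0, -2.0]                    # arbitrary fixed nonzero vector

aSa = sum(a[i] * SIGMA[i][j] * a[j] for i in range(M) for j in range(M))

def sample_q():
    """Draw A ~ W_M(N, SIGMA) as a sum of N outer products of N(0, SIGMA)
    vectors and return a'Aa / a'SIGMA a."""
    aAa = 0.0
    for _ in range(N):
        z = [random.gauss(0.0, 1.0) for _ in range(M)]
        x = [L[0][0] * z[0], L[1][0] * z[0] + L[1][1] * z[1]]
        ax = a[0] * x[0] + a[1] * x[1]
        aAa += ax * ax             # a'(x x')a = (a'x)^2
    return aAa / aSa

qs = [sample_q() for _ in range(2000)]
mean_q = sum(qs) / len(qs)
print(mean_q)   # should be close to N = 10, the chi-square_N mean
```

The noncentral case works the same way once the Gaussian vectors are given nonzero means, with the noncentrality parameter given by the theorem.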
From Theorem 1, we know that, under the assumption of perfect separation, our situation is the special case of the above with K = 2. Thus, in our case, E ~ WM(N - 2, Σ), and E and B are statistically independent. We can write tr B as the sum of its diagonal entries,

tr B = e1ᵀ B e1 + ··· + eMᵀ B eM,

where ei denotes the i-th standard basis vector.
eiᵀ B ei / σ² ~ χ²₁(δi),   δi = [N1(μ1i - μ̄i)² + N2(μ2i - μ̄i)²] / σ²,   i = 1, ..., M.   (B.7)

In addition, χ²₁(δ1), ..., χ²₁(δM) are mutually independent since Σ is diagonal. Therefore,

tr B / σ² ~ χ²_M(δ),   δ = δ1 + ··· + δM,   (B.8)
and
By the same reasoning as the above, it is obvious that
Hence,
Since E and B are statistically independent according to Theorem 1, we have, by the definition of a noncentral F distribution:
where
Appendix C
In this appendix we show that, for large N,

N! / ((cN)! ((1-c)N)!) ≈ [c^(-c) (1-c)^(-(1-c))]^N.
Stirling's formula for the Gamma function ([12, Page 70]) states that

Γ(x + 1) ≈ √(2πx) (x/e)^x.

Thus for large N, we have

N! ≈ √(2πN) (N/e)^N,   (cN)! ≈ √(2πcN) (cN/e)^(cN),   ((1-c)N)! ≈ √(2π(1-c)N) ((1-c)N/e)^((1-c)N).

Then by simple mathematical manipulations, we have

N! / ((cN)! ((1-c)N)!) ≈ [c^(-c) (1-c)^(-(1-c))]^N / √(2πNc(1-c)).

Therefore, ignoring the polynomial factor, which is negligible on the exponential scale,

N! / ((cN)! ((1-c)N)!) ≈ [c^(-c) (1-c)^(-(1-c))]^N.
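The dropped √(2πNc(1-c)) factor means the two sides agree to leading order in the logarithm. A quick numeric check of this (our own, using the log-Gamma function):

```python
import math

def log_binom(n, k):
    """log of n!/(k!(n-k)!), computed exactly via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log_approx(n, c):
    """log of [c^-c (1-c)^-(1-c)]^n, the leading exponential term."""
    return n * (-c * math.log(c) - (1 - c) * math.log(1 - c))

c = 0.3
for n in (100, 1000, 10000):
    exact, approx = log_binom(n, round(c * n)), log_approx(n, c)
    print(n, exact / approx)
```

The ratio of the logarithms tends to 1 as N grows, since the neglected term is only O(log N).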
Appendix D
Multivariate Normality
In this appendix, we introduce a set of empirical distribution function (EDF) statistics, and describe how to test multivariate normality based on the EDF statistics. Then we use Monte Carlo simulations to assess the Gaussianity of compressed pre-processed pulses when the original pulses are generated from a Gaussian distribution.
D.1 EDF Statistics
Suppose a given random sample of size N is x1, x2, ..., xN and let x(1) ≤ x(2) ≤ ··· ≤ x(N) be the order statistics. Let F(x) denote the distribution function of x; then the empirical distribution function (EDF) FN(x) is defined by

FN(x) = (number of x_n ≤ x) / N.

A statistic measuring the difference between FN(x) and F(x) is called an EDF statistic. To test the null hypothesis

H0 : a random sample x1, x2, ..., xN comes from a distribution F(x),
Table D.1: Six EDF statistics (for PIT-transformed values z(n))

D+  = max_n { n/N - z(n) }
D-  = max_n { z(n) - (n-1)/N }
V   = D+ + D-
W²  = Σ_n [z(n) - (2n-1)/(2N)]² + 1/(12N)
U²  = W² - N(z̄ - 1/2)²
A²  = -N - (1/N) Σ_n (2n-1) [ln z(n) + ln(1 - z(N+1-n))]

Table D.2: Modified EDF statistics and their upper-tail thresholds (for all N ≥ 5)

Statistic   Modified statistic                        α = 0.25   α = 0.10   α = 0.05
D+          D+ (√N + 0.12 + 0.11/√N)                  0.828      1.073      1.224
D-          D- (√N + 0.12 + 0.11/√N)                  0.828      1.073      1.224
V           V (√N + 0.155 + 0.24/√N)                  1.420      1.620      1.747
W²          (W² - 0.4/N + 0.6/N²)(1.0 + 1.0/N)        0.209      0.347      0.461
U²          (U² - 0.1/N + 0.1/N²)(1.0 + 0.8/N)        0.105      0.152      0.187
A²          A² (unmodified)                           1.248      1.933      2.492
An EDF test procedure is presented in [20, Section 4.4]:

(a) Put the x_n in ascending order, x(1) ≤ x(2) ≤ ··· ≤ x(N).

(b) Calculate z(n) = F(x(n)), n = 1, ..., N.

(c) Choose and calculate an appropriate test statistic listed in Table D.1.

(d) Modify the test statistic as in Table D.2. If the modified statistic exceeds the threshold in the upper tail at a given level α, H0 is rejected at significance level α.
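Steps (a)-(c) for the Kolmogorov-type statistics can be sketched as follows. The helper below is our own illustration: it computes D+ and D- from the Table D.1 expressions for PIT-transformed values z(n), from which V = D+ + D- also follows.

```python
def edf_d_stats(z):
    """D+ = max_n {n/N - z(n)} and D- = max_n {z(n) - (n-1)/N}
    for a sample z that should be Uniform(0, 1) under H0,
    i.e. after step (b), z = F(x)."""
    z = sorted(z)                    # step (a): order statistics
    n = len(z)
    d_plus = max((i + 1) / n - zi for i, zi in enumerate(z))
    d_minus = max(zi - i / n for i, zi in enumerate(z))
    return d_plus, d_minus

z = [0.1, 0.2, 0.3, 0.4, 0.5]        # a sample crowded near 0
d_plus, d_minus = edf_d_stats(z)
v = d_plus + d_minus                 # Kuiper's V from Table D.1
print(d_plus, d_minus, v)
```

As expected for values crowded near 0, D+ is large here (the empirical CDF runs ahead of the uniform CDF) while D- stays small, which matches the sensitivity pattern described below.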
In Step (b), by using the probability integral transformation (PIT), z = F(x), the new variable z is uniformly distributed between 0 and 1 when F(x) is the true distribution of x. Hence, the six EDF statistics [55] in Table D.1 are actually designed to test whether the new variable z is from a uniform distribution between 0 and 1. In general, D+ and D- are powerful in detecting whether the z-set tends to be close to 0 or to 1, respectively;
W² and A² are powerful in detecting a shift of the mean in either direction; V and U² are powerful in detecting a change in variance, either a grouping of z values at one point, or a division into two groups near 0 and 1.
D.2 Multivariate Normality Test
The goodness-of-fit assessment is to test the null hypothesis

H0′ : a multivariate sample y1, y2, ..., yN comes from a multivariate normal distribution.

Let the dimension of y be M, the sample mean be μ̂ and the sample covariance matrix be Σ̂; then under the null hypothesis H0′ the values

z_i = (y_i - μ̂)ᵀ Σ̂⁻¹ (y_i - μ̂),   i = 1, ..., N,

will have approximately a χ² distribution with M degrees of freedom. As suggested in [20, Section 9.7] and [2], instead of directly assessing multivariate normality, we test the following hypothesis:

H0″ : the values z1, z2, ..., zN come from a χ²_M distribution.

Hence, the EDF test procedure described in Section D.1 can be directly applied to test H0″, and correspondingly to assess H0′.
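The z_i above are squared Mahalanobis distances. A small 2-D sketch (our own, with an assumed sample of independent Gaussian components) shows the computation; note that when Σ̂ is the 1/N-normalized sample covariance, the sample mean of the z_i equals M exactly, consistent with the χ²_M mean of M.

```python
import random

random.seed(1)
M, N = 2, 500
# simulated sample: independent components with unequal variances (assumed)
ys = [(random.gauss(0.0, 1.0), random.gauss(0.0, 2.0)) for _ in range(N)]

mu = [sum(y[j] for y in ys) / N for j in range(M)]           # sample mean
S = [[sum((y[i] - mu[i]) * (y[j] - mu[j]) for y in ys) / N   # sample covariance
      for j in range(M)] for i in range(M)]
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
Sinv = [[S[1][1] / det, -S[0][1] / det],
        [-S[1][0] / det, S[0][0] / det]]

def z_value(y):
    """z = (y - mu)' S^{-1} (y - mu), the squared Mahalanobis distance."""
    d = [y[0] - mu[0], y[1] - mu[1]]
    return sum(d[i] * Sinv[i][j] * d[j] for i in range(M) for j in range(M))

zs = [z_value(y) for y in ys]
print(sum(zs) / N)   # equals M by construction; chi-square_M also has mean M
```

Feeding these z values into the EDF procedure of Section D.1, with F taken as the χ²_M distribution function, gives the multivariate normality test used below.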
D.3 Gaussianity Test of Compressed Pre-processed Pulses
In this section, we use the multivariate normality test described in the previous section to examine the problem encountered in Section 5.3, i.e., to test whether or not Gaussianity is still maintained after received pulses are pre-processed and compressed.

Here we simulate received pulses using the covariance matrix Σ whose (i, j) entry is σ²δ_{i,j}, with σ = 0.05. Figs. D.1 and D.2 show the real and imaginary parts of 50 simulated pulses and those of the compressed pre-processed pulses, respectively. In the following, we generate 100 sets of received pulses for N = 20, 50 and 100, respectively. To make the test more convincing,
Table D.3: Gaussianity test of original (simulated) pulses at significance level 0.05 (rejection rates based on 100 trials)

                    D+      D-      V       W²      U²      A²
M = 44, N = 20      4/100   5/100   6/100   10/100  7/100   4/100
all six EDF statistics introduced in Section D.1 are used. Tables D.3 and D.4 list the EDF test results at significance level 0.05 on original (simulated) pulses and compressed pre-processed pulses, respectively. In both tables, a result like 2/100 means that Gaussianity is rejected 2 times out of 100 trials. From Tables D.3 and D.4, it can be observed at significance level 0.05 that

1. The rejection rates for the compressed pre-processed pulses are slightly higher than those for the original (simulated) pulses.

2. The rejection rates for the compressed pre-processed pulses are still smaller than 10/100.

Since the pre-processing and compression procedures presented in Section 5.2 include non-linear operations, Gaussianity of the compressed pre-processed pulses may not strictly hold, as indicated by Observation 1. On the other hand, the rejection rates for the compressed pre-processed pulses are still very small (< 10%) by Observation 2. Hence, Gaussianity of the compressed pre-processed pulses can still be assumed for simplicity.
Figure D.1: Real and imaginary parts of 50 simulated pulses, where the x-axis is the index of data sample points and the y-axis is the amplitude.

Figure D.2: Real and imaginary parts of the compressed pre-processed pulses, where the x-axis is the index of data sample points and the y-axis is the amplitude.
Bibliography
[1] M.R. Anderberg, Cluster analysis for applications, Academic Press, 1973.

[2] D.F. Andrews, R. Gnanadesikan, and J.L. Warner, Transformations of multivariate data, Biometrics 27 (1971), 825-840.

[3] J.D. Banfield and A.E. Raftery, Automated tracking of ice floes: A statistical approach, IEEE Trans. on Geoscience and Remote Sensing 19 (1991), 905-911.

[4] J.D. Banfield and A.E. Raftery, Ice floe identification in satellite images using mathematical morphology and clustering about principal curves, Journal of the American Statistical Association 87 (1992), 7-16.

[5] J.D. Banfield and A.E. Raftery, Skeletal modeling of ice leads, IEEE Trans. on Geoscience and Remote Sensing 30 (1992), 918-923.

[6] J.D. Banfield and A.E. Raftery, Model-based Gaussian and non-Gaussian clustering, Biometrics 49 (1993), 803-821.

[7] R.A. Baxter, Minimum message length inference: Theory and applications, Ph.D. thesis, Dept. of Computer Science, Monash University, Clayton 3168, Australia, December 1996.

[8] R.A. Baxter and J.J. Oliver, MDL and MML: similarities and differences, Technical Report 207, Dept. of Computer Science, Monash University, Vic 3168, Australia, 1994.
[9] A. Berdai and B. Garel, Detecting a univariate normal mixture with two components, Statistics & Decisions 14 (1996), 35-51.

[10] D.M. Boulton and C.S. Wallace, A program for numerical classification, The Computer Journal 13 (1970), no. 1, 63-69.

[11] D.M. Boulton and C.S. Wallace, An information measure for hierarchic classification, The Computer Journal 16 (1973), no. 3, 254-261.

[12] N.G. de Bruijn, Asymptotic methods in analysis, North-Holland Publishing Company, 1970.

[13] P. Bryant, Large-sample results for optimization-based clustering methods, Journal of Classification 8 (1991), 31-44.

[14] P. Bryant and J.A. Williamson, Asymptotic behaviour of classification maximum likelihood estimates, Biometrika 65 (1978), 273-281.

[15] G. Celeux and G. Govaert, Comparison of the mixture and the classification maximum likelihood in cluster analysis, Journal of Statistical Computation and Simulation 47 (1993), 127-146.

[16] P. Cheeseman, J. Kelly, M. Self, W. Taylor, D. Freeman, and J. Stutz, AutoClass: A Bayesian classification system, Proceedings of the Fifth International Conference on Machine Learning (Ann Arbor, MI), 1988, pp. 54-64.

[17] P. Cheeseman, M. Self, J. Kelly, W. Taylor, D. Freeman, and J. Stutz, Bayesian classification, Seventh National Conference on Artificial Intelligence (Saint Paul, MN), 1988, pp. 607-611.

[18] J.H. Conway and N.J.A. Sloane, Sphere packings, lattices and groups, Springer-Verlag, New York, 1988.

[19] T.M. Cover and J.A. Thomas, Elements of information theory, John Wiley & Sons, 1991.
[20] R.B. D'Agostino and M.A. Stephens, Goodness-of-fit techniques, Marcel Dekker, Inc., 1986.

[21] I. Daubechies, Ten lectures on wavelets, SIAM, 1992.

[22] D.L. Dowe, L. Allison, T.I. Dix, L. Hunter, C.S. Wallace, and T. Edgoose, Circular clustering for protein dihedral angles by minimum message length, Proceedings of the 1st Pacific Symposium on Biocomputing (Hawaii, U.S.A.), 1996, pp. 242-255.

[23] D.M. Hawkins (Editor), Topics in applied multivariate analysis, ch. 6, Cluster analysis, pp. 301-351, Cambridge University Press, 1982.

[24] L. Engelman and J.A. Hartigan, Percentage points of a test for clusters, J. Amer. Statist. Assoc. 64 (1969), 1647-1648.

[25] K.T. Fang and Y.T. Zhang, Generalized multivariate analysis, John Wiley & Sons, 1990.

[26] N.I. Fisher, Statistical analysis of circular data, Cambridge University Press, 1993.

[27] C. Fraley, Algorithms for model-based Gaussian hierarchical clustering, SIAM Journal on Scientific Computing 20 (1998), 270-281.

[28] C. Fraley and A.E. Raftery, How many clusters? Which clustering method? Answers via model-based cluster analysis, Computer Journal 41 (1998), 578-588.

[29] C. Fraley and A.E. Raftery, MCLUST: Software for model-based cluster analysis, Technical Report 342, Department of Statistics, University of Washington, November 1998; to appear in Journal of Classification.

[30] S. Ganesalingam, Classification and mixture approaches to clustering via maximum likelihood, Applied Statistics 38 (1989), 455-466.

[31] J.A. Hartigan, Clustering algorithms, John Wiley & Sons, 1975.
[32] J.A. Hartigan, Asymptotic distributions for clustering criteria, The Annals of Statistics 6 (1978), 117-131.

[33] J.A. Hartigan and M.A. Wong, A k-means clustering algorithm, Applied Statistics 28 (1979), 100-108.

[34] N.L. Johnson and S. Kotz, Distributions in statistics: continuous univariate distributions - 2, John Wiley & Sons, 1970.

[35] R.E. Kass and A.E. Raftery, Bayes factors, Journal of the American Statistical Association 90 (1995), no. 430, 773-795.

[36] S.M. Lewis and A.E. Raftery, Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator, Technical Report 279, Department of Statistics, University of Washington, 1994.

[37] J. Liu, S.W. Gao, Z.Q. Luo, T.N. Davidson, and J.P.Y. Lee, The minimum description length criterion applied to emitter number detection and pulse classification, Proceedings of the IEEE Workshop on Statistical Signal and Array Processing (Portland, Oregon, USA), 1998.

[38] S. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. on PAMI 11 (1989), 674-693.

[39] G.J. McLachlan and K. Basford, Mixture models: inference and applications to clustering, Marcel Dekker, New York, 1988.

[40] G.J. McLachlan, On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture, Applied Statistics 36 (1987), 318-324.

[41] N.R. Mendell, S.J. Finch, and H.C. Thode, Where is the likelihood ratio test powerful for detecting two component normal mixtures? (the consultant's forum), Biometrics 49 (1993), 907-915.
[42] N.R. Mendell, H.C. Thode, and S.J. Finch, The likelihood ratio test for the two-component normal mixture problem: power and sample size analysis, Biometrics 47 (1991), 1143-1148.
[43] R.J. Muirhead, Aspects of multivariate statistical theory, John Wiley & Sons, 1982.
[44] J.J. Oliver and R.A. Baxter, MML and Bayesianism: similarities and differences, Technical Report 206, Dept. of Computer Science, Monash University, Vic 3168, Australia, 1994.
[45] J.J. Oliver and D.J. Hand, Introduction to minimum encoding inference, Technical Report 205, Dept. of Computer Science, Monash University, Vic 3168, Australia, 1994.
[46] J.D. Patrick, A program for discriminating between classes, Technical Report 151, Dept. of Computer Science, Monash University, Vic 3168, Australia, 1991.
[47] B. Pfahringer, Practical uses of the minimum description length principle in inductive learning, Ph.D. thesis, The Austrian Research Institute for Artificial Intelligence, 1995.
[48] S. Richardson and P.J. Green, On Bayesian analysis of mixtures with an unknown number of components (with discussion), J. Royal Statistical Soc., Ser. B 59 (1997), no. 4, 731-792.
[49] B.D. Ripley, Pattern recognition and neural networks, Cambridge University Press, 1996.
[50] J. Rissanen, Modeling by shortest data description, Automatica 14 (1978), 465-471.
[51] ——, A universal prior for integers and estimation by MDL, Ann. of Statistics 11 (1983), no. 3, 416-431.
[52] ——, Stochastic complexity, J. R. Statist. Soc. B 49 (1987), no. 3, 223-239.
[53] S.M. Ross, Introduction to probability models, fourth ed., Academic Press, Inc., 1989.
[54] G. Schwarz, Estimating the dimension of a model, Annals of Statistics 6 (1978), 461-464.
[56] D.M. Titterington, Some recent research in the analysis of mixture distributions, Statistics 21 (1990), 619-641.
[57] J.C. Toomay, Radar principles for the non-specialist, Wadsworth Inc., 1982.
[58] H.L. Van Trees, Detection, estimation and modulation theory, part I, New York: Wiley, 1968.
[59] M.A. Upal, Monte Carlo comparison of non-hierarchical unsupervised classifiers, Master's thesis, University of Saskatchewan, Saskatchewan, Canada, 1995.
[60] M.A. Upal and E.M. Neufeld, Comparison of unsupervised classifiers, Proceedings of ISIS: Information, Statistics and Induction in Science (Singapore), World Scientific, 20-23 August 1996, pp. 342-353.
[61] C.S. Wallace, Classification by Minimum Message Length inference, Advances in Computing and Information - ICCI 1990 (Niagara Falls) (S.G. Akl et al., eds.), 1990, pp. 72-81.
[62] C.S. Wallace and D.M. Boulton, An information measure for classification, Computer Journal 11 (1968), no. 2, 195-209.
[63] C.S. Wallace and D.L. Dowe, Intrinsic classification by MML - the Snob program, Proceedings of the 7th Australian Joint Conference on Artificial Intelligence (Singapore), World Scientific, 1994, pp. 37-44.
[64] C.S. Wallace and P.R. Freeman, Estimation and inference by compact coding, J. R. Statist. Soc. B 49 (1987), no. 3, 240-265.
[65] M. Wax and T. Kailath, Detection of signals by information theoretic criteria, IEEE Trans. on ASSP 33 (1985), no. 2, 387-392.
[66] J.H. Wolfe, Pattern clustering by multivariate mixture analysis, Multivariate Behavioral Research 5 (1970), 329-350.
[67] K.M. Wong and Z.Q. Luo, Emitter number detection and pulse classification in radar systems, Tech. Report 356, Communications Research Laboratory, McMaster University, Hamilton, Ontario, Canada, 1997.
[68] K.M. Wong, Z.Q. Luo, J. Liu, J.P.Y. Lee, and S.W. Gao, Radar emitter classification using intrapulse data, International Journal of Electronics and Communications, Germany (1999).
[69] J.K. Wu, Neural networks and simulation methods, New York: M. Dekker, 1994.
[70] J. Zhang and J.W. Modestino, A model-fitting approach to cluster validation with application to stochastic model-based image segmentation, IEEE Trans. on PAMI 12 (1990), no. 10, 1009-1017.
[71] Q.T. Zhang, K.M. Wong, P.C. Yip, and J.P. Reilly, Statistical analysis of the performance of information theoretic criteria in the detection of the number of signals in array processing, IEEE Trans. on ASSP 37 (1989), no. 10, 1557-1567.
[72] L.C. Zhao, P.R. Krishnaiah, and Z.D. Bai, On detection of the number of signals in presence of white noise, Journal of Multivariate Analysis 20 (1986), 1-25.
[73] DECCA Ship's Manual: Groups 9A and 1M Relative Motion Marine Radars, RACAL-DECCA Marine Radar Limited, 1975.
[74] TMS320C3x/C4x code generation tools getting started guide, Texas Instruments Inc., 1997.
[75] TMS320C3x/C4x optimizing C compiler user's guide, Texas Instruments Inc., 1997.
[76] TMS320C3x/C4x assembly language tools user's guide, Texas Instruments Inc., 1997.
[77] TMS320C4x user's guide, Texas Instruments Inc., 1997.
[78] Code Composer user's guide, GO DSP Corporation, 1997.
[79] Dakar F5 carrier board technical reference, Spectrum Signal Processing Inc., 1997.
[80] Dakar F5 carrier board user's guide, Spectrum Signal Processing Inc., 1997.