international journal of geographical information...

PLEASE SCROLL DOWN FOR ARTICLE

This article was downloaded by: [Inst of Geographical Sciences & Natural Resources Research]On: 19 April 2010Access details: Access Details: [subscription number 917206449]Publisher Taylor & FrancisInforma Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

International Journal of Geographical Information SciencePublication details, including instructions for authors and subscription information:http://www.informaworld.com/smpp/title~content=t713599799

Windowed nearest neighbour method for mining spatio-temporal clustersin the presence of noiseTao Pei a; Chenghu Zhou a; A-Xing Zhu a; Baolin Li a;Chengzhi Qin a

a State Key Laboratory of Resources and Environmental Information System, Institute of GeographicalSciences and Natural Resources Research, CAS, Beijing, China

Online publication date: 16 April 2010

To cite this Article Pei, Tao , Zhou, Chenghu , Zhu, A-Xing , Li, Baolin andQin, Chengzhi(2010) 'Windowed nearestneighbour method for mining spatio-temporal clusters in the presence of noise', International Journal of GeographicalInformation Science, 24: 6, 925 — 948To link to this Article: DOI: 10.1080/13658810903246155URL: http://dx.doi.org/10.1080/13658810903246155

Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf

This article may be used for research, teaching and private study purposes. Any substantial orsystematic reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply ordistribution in any form to anyone is expressly forbidden.

The publisher does not give any warranty express or implied or make any representation that the contentswill be complete or accurate or up to date. The accuracy of any instructions, formulae and drug dosesshould be independently verified with primary sources. The publisher shall not be liable for any loss,actions, claims, proceedings, demand or costs or damages whatsoever or howsoever caused arising directlyor indirectly in connection with or arising out of the use of this material.

http://www.informaworld.com/smpp/title~content=t713599799

http://dx.doi.org/10.1080/13658810903246155

http://www.informaworld.com/terms-and-conditions-of-access.pdf

Windowed nearest neighbour method for mining spatio-temporalclusters in the presence of noise

Tao Pei, Chenghu Zhou*, A-Xing Zhu, Baolin Li and Chengzhi Qin

State Key Laboratory of Resources and Environmental Information System, Institute of GeographicalSciences and Natural Resources Research, CAS, Beijing, China

(Received 11 May 2009; final version received 10 August 2009)

In a spatio-temporal data set, identifying spatio-temporal clusters is difficult because ofthe coupling of time and space and the interference of noise. Previous methods employeither the window scanning technique or the spatio-temporal distance technique toidentify spatio-temporal clusters. Although easily implemented, they suffer from thesubjectivity in the choice of parameters for classification. In this article, we use thewindowed kth nearest (WKN) distance (the geographic distance between an event and itskth geographical nearest neighbour among those events from which to the event thetemporal distances are no larger than the half of a specified time window width [TWW])to differentiate clusters from noise in spatio-temporal data. The windowed nearestneighbour (WNN) method is composed of four steps. The first is to construct a sequenceof TWW factors, with which the WKN distances of events can be computed at differenttemporal scales. Second, the appropriate values of TWW (i.e. the appropriate temporalscales, at which the number of false positives may reach the lowest value when classify-ing the events) are indicated by the local maximum values of densities of identifiedclustered events, which are calculated over varying TWW by using the expectation-maximization algorithm. Third, the thresholds of the WKN distance for classification arethen derived with the determined TWW. In the fourth step, clustered events identified atthe determined TWWare connected into clusters according to their density connectivityin geographic–temporal space. Results of simulated data and a seismic case study showedthat the WNN method is efficient in identifying spatio-temporal clusters. The novelty ofWNN is that it can not only identify spatio-temporal clusters with arbitrary shapes anddifferent spatio-temporal densities but also significantly reduce the subjectivity in theclassification process.

Keywords: nearest neighbour; DBSCAN; cluster; spatio-temporal; windowed; expectation-maximization

1. Introduction

In a temporal–spatial event set, clusters (features) are referred to as subgroups of events inrestrained spatio-temporal volumeswhose densities are denser than events outside the volumes(background events or noise). The identification of spatio-temporal clusters may help withrevealing the evolving patterns of spatial anomalies or locating the varying areas of temporalanomalies or detecting the time–space-coupled hot spots. Hence, the spatio-temporal cluster-ing method, known as one of the important branches of spatial data mining and knowledgediscovery, has been extensively examined and widely used in epidemiology, crime behaviour

International Journal of Geographical Information ScienceVol. 24, No. 6, June 2010, 925–948

*Corresponding author. Email: [email protected]

ISSN 1365-8816 print/ISSN 1362-3087 online# 2010 Taylor & FrancisDOI: 10.1080/13658810903246155http://www.informaworld.com

Downloaded By: [Inst of Geographical Sciences & Natural Resources Research] At: 03:28 19 April 2010

mailto:[email protected]

http://www.informaworld.com

prediction, seismicity research, and so on (Sabel et al. 2000, Johnson and Bowers 2004, Bastinet al. 2007, Lian et al. 2007, Grubesic and Mack 2008). Because of their broad applicationarea, it is very important to develop highly efficient spatio-temporal clustering methods.

Although very important, two obstacles exist in the identification of spatio-temporalclusters on account of the noise interference and the complexity caused by time–spacecoupling. The first is the determination of the membership of an event in spatio-temporaldata, which means classifying events into feature and noise. The crucial issue in the first isthe choice of the unit for the estimation of spatio-temporal density around an event (eithercell statistics or distance technique was used in recent approaches, which will be reviewedlater), which is used as a standard to classify events. The second is how to group spatial–temporal ‘close’ events into clusters, which depends on the mechanism for connectingfeature events. Although many methods were developed to identify clusters in spatio-temporal event sets, subjectivities in the choice of thresholds (either with cell statistics ordistance techniques) still remain, which may cause different classification results in terms ofcluster number, cluster shape and the number of false positives (i.e. the feature events ornoise misclassified as noise or feature events, respectively).

So far, clustering methods that were used to discover dense regions of events on the basisof the notion of density [namely, density-based method (Han et al. 2001)] are broadlyclassified into two groups. The first is the cell-based method and the second is the distance-based method. The cell-based methods first aggregate events into user-defined grid and thenidentify significantly clustering regions, which are made up of connected and dense cells.The cell-based methods for geographic clustering have been extensively investigated, forexample, STING, CLIQUE and MAFIA (Wang et al. 1997, Agrawal et al. 1998, Nageshet al. 2009). However, these methods can only deal with spatial objects. Another importantapproach is that of the scanning methods, such as the well-known SaTScan method(Kulldorff 1997), which is proposed for identifying the spatial clustering patterns thathave changed with time; it has been widely used in disease surveillance (Kulldorff andNagarwalla 1995, Kulldorff et al. 2005). In the SaTScan method, the scanning window,defined as a circle (with a spatial radius) or cylinder (with a circular geographic base and theheight corresponding to time), is moved in space and time to detect significantly clusteringregions (Kulldorff and Nagarwalla 1995, Kulldorff et al. 2005, Gaudart et al. 2008).Differing from the method we will discuss in this article, the scanning methods intend toidentify truly significant clusters contained in pre-defined regions by excluding ‘falseclusters’ (those are likely to have occurred by chance). As a result, the methods requiresimulated statistics, such as the Kulldorff scan statistics (Dwass 1957, Kulldorff 1997,Mostashari et al. 2003), or prior knowledge of the distribution of the events over the regionof interest with respect to varying baselines (such as the distribution of Bernoulli, Poisson orexponential) (Gangnon 2006, Yan and Clayton 2006). Nevertheless, they are incapable ofproviding precise information of clusters in terms of position and shape. Moreover, thesubjectivity in defining the window and target regions may cause significant differentclustering results. Although the kernel estimation was proposed later as an alternative forthe scanning method (Kelsall and Diggle 1995), it is not in itself a technique for detectingclustering. Instead, it can only generate distribution maps of density for further analysis.

The distance-basedmethods utilize the kth nearest distance, defined as the distance betweenan event and its kth nearest neighbour, to estimate local density instead of using cell-basedstatistics. Based on the relationship between the local density and the kth nearest distance, avariety of clustering models have been constructed for identifying spatial clustering patterns,such as DBSCAN, DENCLUE OPTICS, CHAMELEON and DECODE (Ester et al. 1996,Hinneburg andKeim 1998, Ankerst et al. 1999, Karypis et al. 1999, Pei et al. 2006, 2009). The

926 T. Pei et al.


distance-based methods are less subjective in estimating local density because they do not needto specify the size and shape of the cell. However, the methods mentioned above are limited tospatial data. To extend the distance-based methods to spatio-temporal data, methods based onspatio-temporal nearest distance were developed, such as Knox test (Knox 1964, Williams1984, Kulldorff and Hjalmars 1999) and Jacquez k-nearest neighbour (k-NN) test (Jacquez1996). In the Knox test, critical distances are defined as subjective threshold values that dictatethe spatial and temporal distance where events are deemed ‘close’ in both space and time. Thecounts of the pairs of events (i.e. two ‘close’ events) are then put to statistical test for clustering.Although the critical distances can be determined according to prior knowledge, the subjectivityin specifying the critical temporal and the critical spatial distance may result in significantlydifferent results. Jacquez k-NN test is constructed on a statistic index of spatio-temporal nearestneighbour (STNN) pairs in which the temporal and the spatial distance between the two eventsis less than the kth spatial and the kth temporal nearest distance for both events (Jacquez 1996).The k-NN utilizes the index Jk, which is referred to as the count of STNNpairs and varies with k,to measure the closeness of events in the data set. The k-NN technique has been used indiscovering crime point patterns (Cromwell et al. 1999, Ratcliffe 2005). It has the ability totrack a sequence of identical events through time in an epidemiological context and connect thedetected events to form a ‘chain’ of cases (Jacquez 1996). The chain begins with an index caseand is spread through a contagious process to other cases. Although the detected ‘chain’ mayreflect the sequence of contagious process, it is not a dense cluster (what we intend to discover).To identify spatio-temporal clustering patterns, Zaliapin et al. (2008) introduced the time–space–magnitude distance to extract aftershocks from background earthquakes. In theirresearch, the time–space–magnitude distance between two events is defined as a function ofspatial distance, time interval and magnitude difference between earthquakes. Although thetime–space–magnitude distance merges the space, time and magnitude factor in one concept,the time interval should be defined according to prior knowledge and the separation of featureand noise can only be realized interactively. Wang et al. (2006) and Birant and Kut (2007)proposed the ST-DBSCAN algorithm to identify clusters from spatio-temporal data in thepresence of noise objects. In ST-DBSCAN, clusters are formed by connecting objects whosekth temporal and kth spatial nearest distances are less than the temporal and the spatial thresh-olds, respectively, which are interactively estimated with the whole data. There are two defectsin ST-DBSCAN. The first is that the threshold may be underestimated and consequently lead tothe overestimation of clusters. The second is it cannot avoid interactive trials on the estimationof the spatial and the temporal thresholds.

In summation, existing methods either need to estimate local density in a subjective wayor need to calculate specific statistics for clustering testing, which requires more priorknowledge. In this article, we propose a new approach to spatio-temporal clustering: wind-owed nearest neighbour (WNN) clustering method. TheWNNmethod assumes that clustersand noise belong to different point processes. The one with a higher intensity can be deemedas clusters and the one with a lower intensity can be treated as noise. Our method iscomposed of the following steps. The first is to construct the sequence of temporal windowwidth (TWW) factors, which is generated in the sequence: wi ¼ ð�1Þiþ5

2�2floorði=2Þ

n oði ¼ 1; 2:::NÞ,

where floorðÞ is the function that maps a real number to the next smallest integer. TheTWW can thus be calculated by multiplying the time scope of data by TWW factors. In thesecond step, the spatio-temporal densities of feature are calculated at different TWWs; theappropriate values of TWW for the classification can be located at the local maximum valuesof the densities. In the third step, feature and noise are separated at the determined TWW byusing expectation-maximization (EM) algorithm. The final step is to connect events intoclusters according to their density connectivity in geographic–temporal space.

International Journal of Geographical Information Science 927


This article is structured as follows. Section 2 introduces basic concepts used in theWNN method. In Section 3, the algorithm of WNN is described and the estimation ofparameters is discussed in detail. The algorithm is then evaluated with simulated data inSection 4. A seismic case study is presented in Section 5. Section 6 provides a summary ofthis article as well as directions for future research.

2. Basic concepts

2.1. Windowed kth nearest distance

Recall that the kth nearest distance of event p in the geographic space is defined as thedistance between p and its kth nearest neighbour (Byers and Raftery 1998). In this article, weextend the concept to the temporal–geographic space and introduce the concept of wind-owed kth nearest (WKN) distance.

Definition 1 (spatio-temporal neighbourhood of event piðxi; tiÞ (A�S;�T ðpiÞ): A spatio-temporal cylinder centred at piwith a geographic radius of �S and a TWWof �T (Figure 1).

Definition 2 (WKN neighbour of event piðxi; tiÞ (piþkðxiþk ; tiþkÞ)): The kth geographicnearest neighbour within the spatio-temporal neighbourhood of piðxi; tiÞ (A1;�T ðpiÞ).

Note that the geographic radius of A1;�T ðpiÞ is infinity in Definition 2. We alsocan deduce that for k windowed nearest neighbours of piðxi; tiÞ piþ1ðxiþ1; tiþ1Þ;ðpiþ2ðxiþ2; tiþ2Þ; . . . ; piþkðxiþk ; tiþkÞÞ, tiþj � ti

�� T ðj ¼ 1; 2; . . . kÞ and xiþ1 � xik k� xiþ2 � xik k � � � � � xiþk � xik k.

Definition 3 (WKN distance of event piðxi; tiÞ) (dk;�T ðpiÞ): The geographic distancebetween pi and its WKN neighbour is piþkðxiþk ; tiþkÞ, that is, dk;�T ðpiÞ ¼ xiþk � xik k.

Definition 4 (spatio-temporal core event): A spatio-temporal core event with respect toð�S;�TÞ and MinPts is the event whose neighbourhood A�S;�T ðpiÞ contains at leastMinPts events.

Definition 5 (spatio-temporal density-connectivity): An event p is considered to bespatio-temporal density-connected to event q with respect to ð�S;�TÞ and MinPts ifthere is a collection of events p1, p2, . . . , pn (with p1 = q and pn = p) so thatpi�1 2 A�S;�T ðpiÞ (i = 2, 3, . . . , n) and A�S;�T ðpiÞ (i = 2, 3, . . . , n - 1) must contain atleast MinPts events.

Figure 1. Spatio-temporal neighbourhood of event pi.

928 T. Pei et al.


Figure 2 shows a sequence of spatio-temporal density-connected events. p1 is spatio-temporal density-connected to p6 with respect to ð�S;�TÞ and 3 (MinPts).

2.2. Assumption about spatio-temporal Poisson point process

A point process P can be deemed as a homogeneous Poisson point process if the number ofevents (k) in any unit of S � X with volume |S| follows the distribution below:

fl Sj jðkÞ ¼e�l Sj jðl Sj jÞk

k!(1)

where k is the number of events in Sðk ¼ PðSÞÞ, X is the support domain of P (the region inwhich the point process P is constrained) and l is the intensity of the process, which isdefined as the ratio between the number of events in P and |X| (Byers and Raftery 1998). Thissays that the events of the homogeneous Poisson point process are equally likely to occuranywhere within X and do not interact with each other. Based on the concept of homo-geneous Poisson point process, we give the definition of spatio-temporal homogeneousPoisson point process and �T intensity.

Definition 6 (spatio-temporal homogenous Poisson point process): For a homogeneousPoisson point process P, if the support domain of P is a subset of the geographic–temporalspace, we then denote P as a spatio-temporal homogeneous Poisson point process (for short,we use ST Poisson process hereafter). In this article, we use intensity for a point process anddensity for a cluster.

Definition 7 (�T intensity for a ST Poisson process): �T intensity (l�T) for a ST Poissonprocess is defined as

l�T ¼ �T � l (2)

where l is the intensity of the ST Poisson process. Note that l is a constant for an ST Poissonprocess. Given an event pi in an ST Poisson process, l�T of A�S;�T ðpiÞ can be computed as

Figure 2. Spatio-temporal density connectivity. Rectangles symbolize A�S;�T ðpiÞ (from a horizontalperspective) with respect to �S, �T and 3; arrows symbolize the linkages between events and theircore events; circles indicate the events not located in any A�S;�T ðpiÞ.



k=ðp�S2Þ, where k is the number of events in A�S;�T ðpiÞ, and l of A�S;�T ðpiÞ can becomputed as k=ðp ��S2 ��TÞ or simply as l�T=�T .

Assumption 1 If a spatio-temporal event set (B fpiðxi; tiÞ i ¼ 1; 2; :::Ng) contains clus-ters and noise, then clusters and noise can be treated as two distinctive ST Poisson pointprocesses Pj ðj ¼ 1; 2Þ

� �with different intensities, where xi is the geographic location of the

event pi and ti is the time attribute of the event pi. Clusters are distributed at the higherintensity and noise is distributed at the lower intensity.

Note that here we exclude the situation where the event set includes several clusters withdifferent densities. (In fact, our method can also handle data containing multiple processes[or clusters], which will be discussed in the latter part of this article.) The assumption enablesus to use the intensity (density) to differentiate between clustered events and noise. Thedefinition of ST Poisson point process shows that events in a homogeneous Poisson pointprocess are independently and uniformly distributed over X. As a result, for an ST Poissonprocess Pl piðxi; tiÞf g, the spatial point process (fxig) and the time point process (ftig) can beviewed as two independent homogeneous Poisson point processes, and the geographiclocation (xi) is independent of the time attribute (ti).

2.3. Probability density function of WKN distance

For a given event pi in an ST Poisson process, the probability density function (pdf) of itsWKN distance (d�T ;k) can be acquired through seeking the probability function of including0, 1, 2, . . ., k - 1 events within its spatio-temporal neighbourhood (A�S;�T ðpiÞ):

Pðd�T ;k � xÞ ¼Xk�1m¼0

e�l�Tpx2ðl�Tpx2Þm

m!¼ 1� Fd�T ;k

ðxÞ (3)

where Fd�T ;kðxÞ is the cumulative distribution function of d�T ;k and l�T is the�T intensity of

A�S;�T ðpiÞ. If dk;�T is larger than x, there must be 0 or 1 or 2 . . . k - 1 events withinA�S;�T ðpiÞ and its pdf fd�T ;k

ðxÞ� �

is the derivative of Fd�T ;kðxÞ:

fd�T ;kðxÞ ¼

dFd�T ;kðxÞ

dx¼ e�l�Tpx22ðl�TpÞkx2k�1

ðk � 1Þ! (4)

where l�T and k are the same as those in Equation (3). The pdf can be treated as a mixturepdf of gamma according to the definition of pdf of gamma, that is, Y,�ðk; l�TpÞ, whereY ¼ x2 (Byers and Raftery 1998).

3. Theory of windowed nearest neighbour

3.1. WNN method

The WNN cluster method is based on the concept of the WKN distance and is composed oftwo stages. The first is to separate clustered events (feature process) from noise; the second isto form distinctive clusters from the clustered events. In the first stage, the local spatio-temporal density can be employed to differentiate clustered events from noise based onAssumption 1. So the determination of the threshold of local spatio-temporal density is thekey to the first stage. In the WNN method, we use the WKN distance instead of spatio-temporal density. Once the threshold of the WKN distance (D�T ;k) is determined, that thresh-old and the parameters (i.e. k and �T) associated with it define a cylinder. The density of the

930 T. Pei et al.


cylinder is k=ðpD2�T ;k�TÞ. Thus, D�T ;k can be treated as a replacement of the threshold of

density. As a result, events are classified as feature if their WKN distances are no larger thanD�T ;k while events are classified as noise if their WKN distances are larger thanD�T ;k .D�T ;kcan be estimated by using EM algorithm (readers are referred to Appendix for more details).

In the second stage of WNN, the cylinder defined by D�T ;k, �T and k is then used forconnecting events into clusters by linking the spatio-temporal density-connected events(which was defined in Section 2). Interested readers may refer to Ester et al. (1996), Wanget al. (2006) and Birant and Kut (2007) for details. Here, we name the cylinder as the spatio-temporal density-connectivity unit. Below we first describe the WNN algorithm, and thendiscuss the parameter choice problem.

Algorithm:WNNCluster (Input: Data, k, DeltaT)

WKNDistance = CalculateWKNDistance(Data, DeltaT, k);[DeltaS, Flamda, M] = MixtureDecomByEM(WKNDistance, k);FeatureSet = Data(M.=0.5); // Events are classified as feature if M � 0.5.

Noise = Data(M,0.5); // Events are classified as noise if M , 0.5.ClusterId := 1;for i = 1 to FeatureSet.size() do

Event := FeatureSet.get(i); // Get each event from FeatureSetif Event.Id == UNCLASSIFIED THEN

if ExpandCluster(FeatureSet, Event, ClusterId, DeltaT, DeltaS, k)ClusterId := ClusterId + 1;

end ifend if

end forend algorithm;function CalculateWKNDistance(Data, DeltaT, k): Array

for i=1 to Data.size()Pi= Data.get(i);for j=1 to Data.size()

if j,.iPj= Data.get(j);if Pj.t,Pi.t + DeltaT /2 & Pj.t.Pi.t –DeltaT /2

DataT.Add(Pj);end if

end ifWKNDistance(i) = CalculateDis(Pi, DataT, k);

end forend forDataT.clear();return WKNDistance;

In function WNNCluster(), function CalculateWKNDistance() returns the WKN dis-tances of all events calculated at the specific DeltaT(�T). Function MixtureDecomByEM()estimates the threshold of the WKN distance (SThreshold) by EM algorithm and returns the�T intensity (Flamda) of the feature and the fuzzy membership values (M(i)) of eventsbelonging to feature at the specific DeltaT (for computational details, please refer to theAppendix). Function ExpandCluster() is to form clusters by linking spatio-temporal density-connected events (with respect to DeltaT, DaltaS and k) into clusters.



Nevertheless, two parameters (i.e. �T and k) need to be tuned before determining thethreshold of the WKN distance. To analyse the sensitivity of WNN to the parameters, inSection 3.2 we first use simulated data to illustrate the effect of �Tand then give the solutionregarding how to determine �T. The effect and choice of k are discussed in Section 3.3.

3.2. Determination of �T

Figure 3 displays a simulated spatio-temporal data that contains a ‘cubic’ feature and noise.We then use different values of �T, acquired by multiplying the time scope of data byvarying TWW factors sampled at the sequence wi ¼ ð�1Þiþ5

2�2floorði=2Þ

n oði ¼ 1; 2; . . . ;NÞ in (0 2] (i.e. 2,

3/2, 1, 3/4, 1/2, 3/8, 1/4, . . .), to classify the data. The results are found to be varying with theTWW factor; see Table 1. In other words, �T has a significant influence on the classificationin terms of the number of false positives and cluster number. In the following text, we first tryto explain the results based on the analysis of the relationship between the density ofidentified feature, the number of false positives and �T, and then provide the algorithmfor estimating �T.

3.2.1. Relationship between density of identified feature and �T

The curve of the density of identified feature versus TWW factor (k = 10) is shown inFigure 4 and that of the number of false positives versus TWW factor is shown in Figure 5.Note that the edge correction was made by simulating data with the same density as noise inthe buffered space, whose buffered distances are halves of the scopes of the data in X-, Y- andT-directions, respectively. Even though, as TWW factor decreases to an extremely smallvalue, events in the data may fail to find its WKN neighbour. If events in a data set, whichcannot find their WKN neighbours, are more than 30% of the total count, the value of TWWfactor will be highlighted (namely, symbols are indicated by circles in Figure 4 and there-after). From Table 1 and Figures 4 and 5, we find that (1) there exists one peak (which islocated in the interval of TWW factor between 3/64 and 3/16) on the curve of density of

Figure 3. Simulated spatio-temporal data.

932 T. Pei et al.


Table1.

Resultsof

simulated

dataatdifferentvalues

ofTWW

factor.

Tim

ewindo

wfactor

1/51

23/10

241/25

63/51

21/12

83/25

61/64

3/12

81/32

3/64

1/16

Thresho

ld(k

=10

)Null

474.6

468.4

434.2

397.3

363.7

335.2

281.2

236.5

183.3

156.7

Num

berof

falsepo

sitives

andclusters

k=10

432(0)

409(8)

356(18)

213(19)

113(9)

66(3)

41(1)

31(1)

23(1)

22(1)

21(1)

k=3

192(58)

143(51)

106(31)

95(24)

65(15)

64(12)

44(9)

50(10)

46(8)

40(8)

39(6)

k=20

Null

433(0)

433(0)

433(0)

394(9)

193(9)

108(1)

47(1)

36(1)

29(1)

27(1)

Density

(10-4

)k=10

1.93

3.08

3.38

3.85

4.01

4.34

4.89

5.34

5.65

6.17

6.30

k=3

6.15

5.05

5.68

5.95

6.19

6.85

6.87

6.93

7.05

7.64

7.28

k=20

Nodata

Nodata

1.39

1.93

1.78

2.58

3.28

4.20

4.67

5.14

5.54

Tim

ewindo

wfactor

3/32

1/8

3/16

1/4

3/8

1/2

3/4

13/2

2Thresho

ld(k

=10

)12

8.5

111.0

91.5

80.1

68.0

61.9

56.5

53.8

50.7

49.9

Num

berof

falsepo

sitives

andclusters

k=10

26(1)

29(1)

24(1)

35(1)

34(1)

46(1)

65(2)

83(2)

90(3)

92(3)

k=3

42(7)

53(9)

60(14)

55(14)

62(10)

66(6)

91(10)

109(13)

131(16)

151(19)

k=20

27(1)

30(1)

29(1)

36(1)

42(1)

48(1)

65(1)

81(1)

90(1)

91(1)

Density

(10–

4)

k=10

6.26

6.23

6.14

5.79

5.29

4.63

3.38

2.67

1.87

1.41

k=3

6.94

6.54

6.47

6.23

5.71

4.85

3.58

2.88

2.07

1.58

k=20

5.79

5.75

5.77

5.46

5.05

4.40

3.30

2.64

1.85

1.40

Note:Num

bersin

parenthesesareclusternu

mbers.



Figure 4. Density of identified feature varied with TWW factor (for k = 10, the local maximum valueof density is generated when TWW factor = 1/16; for k = 3, the local maximum value of density isgenerated when TWW factor = 3/64; for k = 20, the local maximum value of density is generated whenTWW factor = 3/32; symbols enclosed with circles indicate results in which more than 30% eventscannot find their WKN neighbours hereafter).

Figure 5. Number of false positives varied with TWW factor and k.

934 T. Pei et al.


identified feature and (2) the local maximum value of the density of identified feature and theminimum number of false positives are both generated as TWW factor = 1/16.

From the analysis, we conclude that the highest density of identified feature mightindicate a small number of false positives, essentially, the value of �T, which producesthe highest density, can be adopted for the classification.

To explain the dependence of the number of false positives on �T, we chose classifica-tions generated as TWW factor = 2, 1/16 and 3/128 (i.e. �T = 20, 5/8 and 15/64) and k = 10.Figure 6 sketches the rectangle feature of Figure 3 and different spatio-temporal density-connectivity units with the heights of �Ts mentioned above. Figure 7 shows the histogramsof d�T ;k of events in Figure 3 and the classifications generated at different �Ts. When �T islonger than the time scope of feature (e.g. Unit A in Figure 6, �T = 20), k windowed nearestneighbours of a given feature event pi inevitably contain noise events. Furthermore, noiseevents over and below a feature cannot be separated from the feature, thereby leading tomore false positives (Figure 7b). This can also be validated by the fitted curve of thehistogram of the WKN distance (Figure 7a), in which the proportion of feature is largerthan the theoretical (the left component) and that of noise is smaller than the theoretical (theright component). Because of the existence of many noise events in the neighbourhoods offeature events, �T intensity (l�T ) of identified feature events is underestimated.Consequently, the spatio-temporal density of identified feature, which equals l�T=�T(recall Definition 7), is lowered (1:41 · 10�4) and the feature is then overestimated interms of event number and cluster number (92 false positives and 3 clusters are produced).

As �T decreases (Unit B in Figure 6, �T = 5/8), fewer noise events are included in kwindowed nearest neighbours of feature events. Those near the centre of feature may evenexclude noise events from their k windowed nearest neighbours. The ‘purified’ k windowednearest neighbours may reduce the probability of events being misclassified, which is alsosupported by the more distinctive mixture histogram in Figure 7c (compared with that in

Figure 6. Spatio-temporal density-connectivity unit with varying�T (dots indicate the feature eventsand circles indicate noise; A, B and C are spatio-temporal density-connectivity units; their TWWfactors are 2, 1/16 and 3/128 respectively).



Figure 7a). Because events in neighbourhoods (Ad�T ;k ;�T ðpiÞ) of a given feature eventbecome purer as �T decreases, the density of identified feature will reach a higher value(6:3 · 10�4). At the same time, false positives are reduced and no false cluster is generated(21 false positives; see Table 1 and Figure 7d).

As �T decreases to a small value (e.g. Unit C in Figure 6, �T = 15/64), the WKNdistance of a feature event is forced to be larger than the theoretical event because of thedifficulty in finding enough nearest neighbours of same process within such a small �T. The

Figure 7. Results of simulated data using WNN: (a) mixture histogram of the WKN distance whenTWW factor = 2; (b) classification when TWW factor = 2 (�T = 20); (c) mixture histogram of theWKN distance when TWW factor = 1/16; (d) classification when TWW factor = 1/16 (�T = 5/8); (e)mixture histogram of the WKN distance when TWW factor = 3/128; (f) classification when TWWfactor = 3/128 (�T = 15/64).

936 T. Pei et al.


histogram of feature (the left component) shows significant right-bias (Figure 7e) comparedwith that in Figure 7c. This will lead to more false positives (31 false positives; Figure 7f).Because of the existence of noise events in the neighbourhood of a given feature event(Ad�T ;k ;�T ðpiÞ), the density of feature is underestimated (5:34 · 10�4; Table 1) and thenumber of false positives increases accordingly (31 false positives).

To sum up, densities calculated at differing �Ts may indicate different numbers of falsepositives. As �T decreases to a value at which most feature events could find theirk windowed nearest feature neighbours, the density of identified feature may reach its(local) maximum value(s), and at the same time the number of false positives may reach alow value (in this case, Unit B in Figure 7). Note that local maximum values can also indicateappropriate values of �Twhen clusters with different densities and predominant dimensionscoexist, which will be discussed in the results of other simulated data sets (see Section 4).

3.2.2. Algorithm for estimating �T

Based on the analysis of the relationship between the density of identified features and �T,we give the algorithm for estimating �T:

Function: [DeltaT] = DetermineDaltaT(Data, k)let TimeWindowWidthFactor = [1/512, 3/1024, 1/256, 3/512, 1/128, 3/256, 1/64, 3/128,1/32, 3/64, 1/16, 3/32, 1/8, 3/16, 1/4, 3/8, 1/2, 3/4, 1, 3/2, 2];for i = 1 to Length(TimeWindowWidthFactor)

TimeWindowWidth = TimeWindowWidthFactor(i)*Data.GetTScope(); //(1)WKNDistance = CalculateWKNDistance(Data, TimeWindowWidth, k);[DeltaS, Flamda, M] = MixtureDecomByEM(WKNDistance, k);Intensity(i) = Flamda / TimeWindowWidth; //(2)

end for[DeltaT] = LocalMaxIntensity(Intensity); //(3)

end algorithm //(4)

Regarding the algorithm, some points should be noted:

(1) TimeWindowWidth is the width of time window; function Data.GetTScope() returnsthe time scope of the spatio-temporal data.

(2) The spatio-temporal intensity (density) of feature is estimated by dividing Flamda(�T intensity) by TimeWindowWidth.

(3) Function LocalMaxIntensity() returns the appropriate value(s) of �T, at which thelocal maximum values of density of feature are generated. In LocalMaxIntensity(),the local maximum values of density are determined in the sequence of densitiesacquired over varying TWW factors.

(4) Because parameters (i.e. M, Flamda and DeltaS) associated with the estimated �Thave been determined in Function DetermineDaltaT(), the same steps inWNNCluster() can be omitted.

3.3. Choice of k

We then examine the influence of k on the classification. Besides k = 10, Figure 4 shows theplot of the density of identified features versus TWW factor when k = 3 and k = 20. Thenumbers of false positives and clusters can be found in Table 1 while the curves of thenumber of false positives are displayed in Figure 5. According to the comparison of the



numbers of false positives between k = 3 and k = 10 (Figure 5 and Table 1), results for k = 3show more false positives than those for k = 10 when TWW factor exceeds 1/64. Moreimportantly, the smallest number of false positives for k = 3 is larger than that for k = 10, bothof which were generated as TWW factor = 1/16. In addition, the number of clusters isoverestimated at all values of TWW factor as k = 3. Differing from those for k = 3, results fork = 20 show more false positives than those for k = 10 almost at all values of TWW factorexcept when it is larger than 1/2 (Table 1 and Figure 5). The smallest number of falsepositives for k = 20 is larger than that for k = 10. Nevertheless, results for k = 20 show wrongcluster number only when TWW factor is less than 1/64. The comparison of results betweenk = 3, k = 10 and k = 20 shows that the most appropriate value for k is 10.

The determination of k for the NN method has been discussed in Pei et al. (2007). Herewe only recapitulate the result that is also applicable to the scenario of WNN. As k is small,the histogram of the WKN distance is not clearly bimodal (Figure 8a), which leads to manyfalse positives being produced. As k is too large, because the WKN distances of events(either noise or feature) near the border between feature and noise may significantly becomesmaller (for noise) or larger (for feature) than those far away from the border, the mixturehistogram of the WKN distance deviates from the theoretical one (Figure 8b). The featurewill be overestimated while the noise will be underestimated. Therefore, the algorithm maynot produce a good result in this case. In conclusion, if one cares about the total number offalse positives we suggest k = 6–12; if one cares only about the number of false positives offeature (i.e. the number of feature events that have been classified as noise), for example, inthe case of predicting the susceptible area of natural disasters, a larger value of k will beappropriate. In this article, we are concernedmore about the number of false positives and setk to a medium large value in the examples of simulated data and the case study.

4. Results of other simulated data

4.1. Classification results of simulated data

To evaluate the WNN method, we simulated five data sets (Figure 9). The first data setconsists of five clusters and noise; the second consists of two ‘L-shaped’ clusters and noise;the third contains two clusters – one is contained in a cuboid and the other is limited in a thinplane extended in XY dimension; the fourth contains two clusters (with the same number ofevents) – one is in a high density, and the other is in a low density and prolongs along the

Figure 8. Histograms generated at different values of k (TWW factor = 1/16): (a) histogram of theWKN distance as k = 3; (b) histogram of the WKN distance as k = 20.

938 T. Pei et al.


Figure 9. Results of simulated data: (a) densities and numbers of false positives of the first data set;(b) classification (TWW factor = 3/64, false positive number = 10); (c) densities and numbers of falsepositives of the second data set; (d) classification (TWW factor = 1/8, false positive number = 30); (e)densities and numbers of false positives of the third data set; (f) classification (TWW factor = 3/8, falsepositive number = 160); (g) classification (TWW factor = 3/64, false positive number = 20); (h)classification (TWW factor = 3/512, false positive number = 170); (i) densities and numbers of falsepositives of the fourth data set; (j) classification (TWW factor = 3/16, false positive number = 55); (k)classification (TWW factor = 3/128, false positive number = 150); (l) densities and numbers of falsepositives of the fifth data set; (m) classification (TWW factor = 1/16, false positive number = 17); (n)densities and numbers of false positives of the fifth data set (identifying clusters in the rest of data inwhich the denser cluster has been taken out); (o) classification (TWW factor = 3/8, false positivenumber = 39); (p) final classification of the fifth data set. In (a), (c), (e), (i), (l) and (n), squares indicatespatio-temporal densities of identified features, dots indicate numbers of false positives; in the rest ofthe panels, noise is indicated by dots and false positives are highlighted by circles.



Figure 9. (Continued)

940 T. Pei et al.


T-axis; the fifth contains two cubic clusters – the small and dense one is located inside the bigand sparse one. Note that the fourth and the fifth data sets contain clusters of differentdensities.

Figure 9 displays classification results generated by WNN. Regarding the first two datasets, we found the densities reach their maximum values when TWW factor = 3/64 and 1/8,respectively (Figure 9a and c). The curves of the number of false positives also show thatwhen TWW factors mentioned above were adopted, the smallest numbers of false positives(12, 33) were produced for the first and the second data set. The classifications, displayed inFigure 9b and d, show that the clusters are clearly revealed.

The density plot of the third data set is shown in Figure 9e; local maximum values can belocated as TWW factor = 3/8, 3/64 and 3/512. Classifications acquired at these values aredisplayed in Figure 9f, g and h, in which the cubic cluster, both clusters and the plane clusterwere extracted in turns. Although the two clusters are in the same density, they showdifferent predominant dimensions. As a result, the one in the cuboid was identified ifTWW factor was large (the reason is that events in the cuboid have shorter WKN distancesthan those in the plane under such a TWW, and they were classified as feature while the restof events were classified as noise); the one in the plane was identified if TWW factor wassmall (the reason is that events in the plane have shorter WKN distances than those in thecuboid under such a small TWW, and the events in the plane were classified as ‘feature’while the rest of events were classified as ‘noise’); both were identified when TWW factorwas set to a moderate value.

Two local maximum values (generated when TWW factor = 3/16 and 3/128) can berecognized from the curve of the density of the fourth data set (Figure 9i). When TWW factor= 3/16, two clusters were identified (Figure 9j); as TWW factor decreased to 3/128, only thedenser cluster was extracted (Figure 9k). The reason why clusters with different densities canbe identified as TWW factor = 3/16 is that events in the cluster of lower density may have thesame length of theWKN distances on average as those in the clusters of higher density undersuch TWW. Hence, when TWW is large enough, noise can be separated from clusters, andclusters with different densities can be identified simultaneously; when TWW is small, onlythe cluster with the higher density can be extracted.

For the fifth data set, the curve of density shows onemaximumvalue (Figure 9l). The initialclassification only identified the denser cluster (with 17 false positives generated) and the othercluster was still hidden in noise (Figure 9m). In the next step, we treat noise and the hiddencluster as a new data set and run our algorithm again. The curve of density is displayed in

Figure 9. (Continued)



Figure 9n. The other cluster was identified as TWW = 3/8 (at which the maximum value ofdensity was generated); the number of false positives was 39 (Figure 9o). The final result,which is the combination of these two classifications, is shown in Figure 9p.

All results show that local maximum values of density of identified features may indicateappropriate values of�T for classification. In particular, those of the third and the fourth datasets show that a large TWW may enhance the capability for discerning clusters extendedalong T-axis (even with a low density) but may neglect those condensed in a narrow timeinterval (even with a high density), whereas a small TWW may have the opposite impact.Results of the fifth data set show that the clusters with different densities can be identifiedstepwise by applying our algorithm several times. To sum up, for clusters with differentdensities and various predominant dimensions, we can try our algorithm with differentTWWs (which is associated with the local maximal values of the density of identifiedfeature) first; if it does not work, we can apply the stepwise WNN strategy.

4.2. Comparison between WNN and ST-DBSCAN

We used the second data set as an example to compare the efficiency on cluster identificationbetween WNN and ST-DBSCAN. Figure 10a shows the spatial k-distance plot of the seconddata set while Figure 10b shows the temporal k-distance plot (k-distance plot displays the kthspatial/temporal nearest distance of each event lined up in an ascending order). In Figure 10aand b, asterisks indicate the threshold of the kth spatial nearest distance and that of the kthtemporal nearest distance (�S and �T), respectively, which are located at the first valleys ofthe plots, as described in Birant and Kut (2007). The ST-DBSCAN was then conducted with

Figure 10. Classification with ST-DBSCAN: (a) spatial thresholds determined by visual trials; (b)temporal thresholds determined by visual trials; (c) classification with thresholds determined by visualtrials (the number of false positive is 547).

942 T. Pei et al.


�S and �T. However, no clusters can be found. Because the k-distance plots are calculated atthe globe scale, the identified �S and �T are too small to discern any clusters. We thenincrease �S and �T, which are indicated by crosses located at valleys in Figure 10a and b,and the classification result is displayed in Figure 10c. Seven clusters were identified and 547events were misclassified. Furthermore, similar results were also found with other data sets ofFigure 9. (As we are limited by article length, we omit the details of the comparison.) As aresult, we have to admit that the interactive trial may fail to produce reasonable classifications.

4.3. Complexity of WNN method

The complexity of the WNNmethod is decided by three factors. The first is the computationof the WKN distance of each event, whose computational complexity is O(N2 + N*k). Thesecond is the decomposition of the mixture pdf of the WKN distance, whose computationalcomplexity is O(N*Ntime*Nwindow). The third is the forming of clusters, whose computa-tional complexity is O(N*log(N)) (Ester et al. 1996). Here, N is the number of events in thedata set, Ntime is the iteration times that the EM algorithm needs to run and Nwindow is thenumber of TWW factors that should be tried in WNN. Please note that Nwindow could beinfluenced by the data size (N). In detail, a large N may increase Nwindow (which means anevent can find its WKN neighbour even at a very small TWW factor if the N events are notconstrained in a limited spatial scope) while a small N may reduce Nwindow (which meansan event may not find its WKN neighbour even at a relatively large TWW factor).Nevertheless, the explicit relation between Nwindow and N cannot be easily determined.Totally, the complexity ofWNN is O(N(N + k +Ntime*Nwindow + log(N))). As a result, N isthe key factor that decides the run time of WNN.

5. A case study: determining spatio-temporal clusters of earthquakes

5.1. Clusters of earthquakes

Clustered earthquakes are usually perceived as foreshocks (if strong earthquakes occur afterthem) or aftershocks (if strong earthquakes occur before them) of strong earthquakes. As aresult, the detection of clustered earthquakes may help to predict strong earthquakes orunderstand the trend and the mechanism of strong earthquakes (Wu et al. 1990, Chen et al.1999, Ripepe et al. 2000).

As we know, clustered earthquakes are difficult to discover because of the interference ofbackground earthquakes. Nevertheless, clustered earthquakes differ from background earth-quakes because clustered earthquakes are not only spatially clustered but also temporallyclustered while background earthquakes are referred to as small earthquakes that release in alow stable rate (time) and intensity (space), which simultaneously occur and overlap with theclustered earthquakes (Wyss and Toya 2000, Pei et al. 2003). In this regard, backgroundearthquakes and clustered earthquakes can be deemed as two ST Poisson processes withdifferent intensities. The separation of these two types of earthquakes can be used for theevaluation of the WNN method.

5.2. Study area and seismic data

The study area is located between 101� and 106� E and 29� and 34� N (southwestern China)and is one of those areas with the most intensive seismicity in China (Figure 11). Twenty-fivedevastating earthquakes (M � 6.0) occurred in this area between 1970 and 2008, including



the Wenchuan earthquake that occurred at 103.0� E, 31.3� N on 12 May 2008, with themagnitude measured as 8.0 (China Seismograph Network Data Management Center 2009).The catalogue data used for the case study are from Feng and Huang (1980, 1989). Theselected earthquakes are from 15 January 1975 to 15 August 1976 and larger than 1.5 (M).Thus, 484 epicentres are obtained altogether.

5.3. Result of detection of clustered earthquakes

TheWNNmethod was then applied to the seismic data by setting k = 8. The curve of densityof clustered earthquakes (Figure 12) displays two platforms (i.e. TWW factor = 1/128–3/256, 1/16–2). Two local maximum values were obtained as TWW factor = 3/32 and 1/128.

Figure 11. Location of study area.

Figure 12. Densities of identified clustered earthquakes versus TWW factor.

944 T. Pei et al.


Three clusters were identified for the first value (Figure 13a). Are the clustered earthquakesaftershocks or foreshocks? The records of strong earthquakes can help to explain the result.According to Zhang (1990), Cluster 1 (with a higher density) occurred around and after theDaguan earthquake (M = 7.1, occurred at 28�060 N, 104�000 E on 11 May 1974) and can betreated as the aftershocks of the Daguan earthquake. In addition to the Daguan earthquake,four strong earthquakes (i.e. the Songpan earthquakes; M = 7.2, occurred around 32�400 N,104�000 E on 16 August 1976) occurred in and after Cluster 2 and Cluster 3 (these two with alower density) (Zhang 1986). In this regard, Cluster 2 and Cluster 3 can be treated as theforeshocks and indicative of the Songpan earthquakes. Interestingly, the foreshocks areshown as two distinctive clusters, which we cannot observe if applying spatial clusteringmethods.

The classification generated when TWW factor = 1/128 is displayed in Figure 13b. Fromthe figure, only aftershocks of the Daguan earthquake (Cluster 1 in Figure 13b) wereidentified, which is similar to the result of the fourth simulated data set. Both classificationsvalidate that when clusters of varied densities are different in predominant dimensions, alarge TWW may help to reveal clusters of different densities whereas a small TWW mayonly help to reveal clustered earthquakes of a higher density

6. Conclusions and future work

TheWNN algorithm was applied to the simulated data and the seismic data in this article andtestified as an efficient algorithm for discovering clustering patterns in spatio-temporal data.The novelty of the algorithm lies in three aspects. First, it can not only identify spatio-temporal clusters with arbitrary shapes but also provide fuzzy membership values of eventsbelonging to features. Second, only one parameter, that is, k, needs to be adjusted in theWNN algorithm, because TWW can be determined by locating the local maximum values ofdensity of identified feature events. As a result, WNN is a more objective process comparedwith other spatio-temporal clustering methods. Third, clusters with different densities andvarious predominant dimensions can be identified at different scales of TWW. The thirdpoint also shows that the WNN algorithm bears analogy to the Fourier transform, in whichthe components of different frequencies can be extracted through filtering under differentthresholds. Although the algorithm was only applied to the seismic catalogue in this article,

Figure 13. Classification of seismic data using the WNN method: (a) classification generated whenTWW factor = 3/32 (strong earthquakes are symbolized by circles; background earthquakes aresymbolized by dots; Cluster 1 is symbolized by crosses; Cluster 2 is symbolized by triangles;Cluster 3 is symbolized by squares); (b) classification generated when TWW factor = 1/128 (strongearthquakes are symbolized by circles; background earthquakes are symbolized by dots; Cluster 1 issymbolized by crosses).



we believe it may extend to other areas with respect to spatio-temporal data, such asinfectious disease cases and crime venues.

In this article, the algorithm assumes that only two ST Poisson processes exist in a dataset, that is, clusters and noise, which means that the algorithm can only deal with twoprocesses simultaneously when TWW is fixed, although the variant of WNN may adapt todata containing more than two Poisson processes. When the number of processes signifi-cantly increases, the complexity and the stability of the algorithm will be severely chal-lenged. In addition, only homogeneous point processes are allowed in WNN;inhomogeneous clusters, such as Gaussian process, could lead WNN to generate morefalse positives or even false clusters. As a result, more sophisticated models, which arecapable of dealing with more complex spatio-temporal data, deserve further research.

Acknowledgements

This study was funded through support from the National Key Basic Research and DevelopmentProgram of China (Project Number: 2006CB701305), a grant from National Natural ScienceFoundation of China (Project Number: 40830529), the Innovation Project of IGSNRR (ProjectNumber: 200905004) and a grant from the State Key Laboratory of Resource and EnvironmentInformation System (Project Number: 088RA400SA).

References

Agrawal, R., et al., 1998. Automatic subspace clustering of high dimensional data for data miningapplications. In: Proceedings of 1998 ACM-SIGMOD international conference on management ofdata, 4-5 June 1998. New York: ACM Press, 94–105.

Ankerst, M., et al., 1999. OPTICS: Ordering points to identify the clustering structure. In: Proceedingsof ACM-SIGMOD’99 international conference on management data, 27-29 May 1999.Philadelphia, PA: ACM Press, 46–60.

Bastin, L., et al., 2007. Spatial aspects of MRSA epidemiology: a case study using stochasticsimulation, kernel estimation and SaTScan. International Journal of Geographical InformationScience, 21, 811–836.

Birant, D. and Kut, A., 2007. ST-DBSCAN: an algorithm for clustering spatial–temporal data. Dataand Knowledge Engineering, 60, 208–221.

Byers, S. and Raftery, A.E., 1998. Nearest-neighbor clutter removal for estimating features in spatialpoint processes. Journal of the American Statistical Association, 93, 577–584.

Chen, Y., Liu, J., and Ge, H.K., 1999. Pattern characteristics of foreshock sequences. Pure and AppliedGeophysics, 155, 395–408.

China Seismograph Network Data Management Center 2009. China Seismograph Network (CSN)Catalog. Available from: http://www.csndmc.ac.cn [Accessed 5 January 2009].

Cromwell, P., Olson, J.N., and Avary, D.A.W., 1999. Decision strategies of residential burglars. In:P. Cromwell, ed. In their own words: criminals on crime (an anthology). Los Angeles: Roxbury,50–56.

Dwass, M., 1957. Modified randomization tests for nonparametric hypotheses. Annals ofMathematical Statistics, 28, 181–187.

Ester, M., et al., 1996. In: E. Simoudis, J.W. Han and U.M. Fayyad, eds. Proceedings of the 2ndinternational conference on knowledge discovery and data mining, 3-5 August. Portland: AAAIPress, 226–231.

Feng, H. and Huang, D.Y., 1980. A catalogue of earthquake in western China (1970–1975, M � 1).Beijing: Seismological Press (in Chinese).

Feng, H. and Huang, D.Y., 1989. Earthquake catalogue in West China (1976–1979, M � 1). Beijing:Seismological Press (in Chinese).

Gangnon, R.E., 2006. Impact of prior choice on local Bayes factors for cluster detection. Statistics inMedicine, 25, 883–895.

Gaudart, J., et al., 2008. Space–time clustering of childhood malaria at the household level: a dynamiccohort in a Mali village. BMC Public Health, 6, 1–13.

946 T. Pei et al.


http://www.csndmc.ac.cn

Grubesic, T.H. and Mack, E.A., 2008. Spatio-temporal interaction of urban crime. Journal ofQuantitative Criminology, 24, 285–306.

Han, J.W., Kamber, M., and Tung, A.K.H., 2001. Spatial clustering methods in data mining, In: H.J.Miller and J.W. Han, eds. Geographic data mining and knowledge discovery. London: Taylor andFrancis, 188–217.

Hinneburg, A. and Keim, D.A., 1998. An efficient approach to clustering in large multimedia databaseswith noise In: Proceedings of the fourth international conference on knowledge discovery and datamining, August. New York, 58–65.

Jacquez, G.M., 1996. A k nearest neighbour test for space–time interaction. Statistics in medicine, 15,1935–1949.

Johnson, S.D. and Bowers, K.J., 2004. The stability of space–time clusters of burglary. British Journalof Criminology, 44, 55–65.

Karypis, G., Han, E.-H., and Kumar, V., 1999. CHAMELEON: a hierarchical clustering algorithmusing dynamic modeling. IEEE Computer, 32, 68–75.

Kelsall, J. and Diggle, P., 1995. Non-parametric estimation of spatial variation in relative risk. Statisticsin Medicine, 14, 2335–2342.

Knox, E.G., 1964. The detection of space–time interactions. Applied Statistics, 13, 25–30.Kulldorff,M., 1997.A spatial scan statistic.Communications in statistics: theorymethods, 26, 1481–1496.Kulldorff, M. and Hjalmars, U., 1999. The Knox method and other tests for space–time interaction.

Biometrics, 55, 544–552.Kulldorff, M. and Nagarwalla, N., 1995. Spatial disease clusters: detection and inference. Statistics in

Medicine, 14, 799–810.Kulldorff, M., et al., 2005. A space–time permutation scan statistic for disease outbreak detection. Plos

Medicine, 2, 216–224.Lian, M., et al., 2007. Using geographic information systems and spatial and space–time scan statistics

for a population-based risk analysis of the 2002 equine West Nile epidemic in six contiguousregions of Texas. International Journal of Health Geographics, 6, doi:10.1186/1476-072X-6-42.

Mostashari, F., et al., 2003. Dead bird clusters as an early warning system for West Nile virus activity.Emerging Infectious Diseases, 9, 641–646.

Nagesh, H., Goil, S., and Choudhary, A., 1999. Mafia: efficient and scalable subspace clustering forvery large data sets. Technical report TR #9906-010, Northwestern University. Available from:http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.8684 [Accessed January 2009].

Pei, T., et al., 2003. Multi-scale expression of spatial activity anomalies of earthquakes and itsindicative significance on the space and time attributes of strong earthquakes. Acta SeismologicaSinica, 3, 292–303.

Pei, T., et al., 2006. A new approach to the nearest-neighbour method to discover cluster features inoverlaid spatial point processes. International Journal of Geographical Information Science, 20,153–168.

Pei, T., et al., 2007. Delineation of support domain of feature in the presence of noise. Computers andGeosciences, 33, 952–965.

Pei, T., et al., 2009. DECODE: A new method for discovering clusters of different densities in spatialdata. Data Mining and Knowledge Discovery, 18, 337–369.

Ratcliffe, J.H., 2005. Detecting spatial movement of intra-region crime patterns over time. Journal ofQuantitative Criminology, 21, 103–123.

Ripepe, M., Piccinini, D., and Chiaraluce, L., 2000. Foreshock sequence of September 26th, 1997Umbria–Marche earthquakes. Journal of Seismology, 4, 387–399.

Sabel, C., et al., 2000. Modelling exposure opportunities: estimating relative risk for motor neuronedisease in Finland. Social Science and Medicine, 50, 1121–1137.

Wang, M., Wang, A.P., and Li, A.B., 2006. Mining spatial–temporal clusters from geo-databases.Lecture Notes in Artificial Intelligence, 4093, 263–270.

Wang,W., Yang, J., andMuntz, R., 1997. STING: a statistical information grid approach to spatial datamining. In: Proceeding of the 23rd international conference on very large data bases (VLDB’97),25–29 August 1997. Athens: 186–195.

Williams, G.W., 1984. Time space clustering of disease, In: R.G. Cornell, ed. Statistical methods forcancer studies. New York: Marcel Dekker, 167–227.

Wu, K.T., et al., 1990. Panorama of seismic sequence. Beijing: Beijing University Press (in Chinese).Wyss, M. and Toya, Y., 2000. Is background seismicity produced at a stationary Poissonian rate?

Bulletin of the Seismological Society of America, 90, 1174–1187.



http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.8684

Yan, P. and Clayton, M.K., 2006. A cluster model for space–time disease counts. Statistics in Medicine,25, 867–881.

Zaliapin, I., et al., 2008. Clustering analysis of seismicity and aftershock identification. PhysicalReview Letters, 101, 018501.

Zhang, Z.C., 1986. Earthquake cases in China (1966–1975). Beijing: Seismological Press (inChinese).

Zhang, Z.C., 1990. Earthquake cases in China (1976–1980, M � 1). Beijing: Seismological Press (inChinese).

AppendixThe estimation of the threshold (D�T ;k) for discriminating between features and noise can be dividedinto two stages. The first is to construct the mixture pdf of d�T ;k of two ST Poisson point processes; thesecond is to estimate the parameters of the mixture of d�T ;k using the EM algorithm.

1. Mixture pdf of WKN distanceAccording to Assumption 1, noise and features can be represented by two overlapped ST Poissonprocesses with different intensities, say l1 and l2; the bimodal pdf of dk;�T can be expressed as follows:

d�T ;k,r�1=2ðk; l�T ;1pÞ þ ð1� rÞ�1=2ðk; l�T ;2pÞ (A1)

where r is the proportion coefficient, and l�T ;1 and l�T ;2 are the�T intensities of the two processes. r,l�T ;1 and l�T ;2 are the three parameters of the mixture pdf.

2. Estimation of parameters using EM algorithmThe EM algorithm for estimating the parameters of the mixture pdf can be divided into two major steps,namely, the expectation step (E-step) and the maximization step (M-step). The E-step aims at estimat-ing the expectation of the fuzzy membership value of each event subjecting to feature (Byers andRaftery 1998).

The E-step is

Eð�̂ðtþ1Þi Þ ¼r̂ðtÞfd�T ;k dð�T ;kÞ;i; l̂ðtÞ1

� �

r̂ðtÞfd�T ;kdð�T ;kÞ;i; l̂ðtÞ�T ;1

� �þ ð1� r̂ðtÞÞfdk;�T

dð�T ;kÞ;i; l̂ðtÞ�T ;2

� � (A2)

while the M-step is

l̂ðtþ1Þ�T ;1 ¼kPn

i¼1 �̂ðtþ1Þi

pPn

i¼1 d2ð�T ;kÞ;i�̂

ðtþ1Þi

and l̂ðtþ1Þ�T ;2 ¼kPn

i¼1 ð1� �̂ðtþ1Þi Þ

pPn

i¼1 d2ð�T ;kÞ;ið1� �̂

ðtþ1Þi Þ

with r̂ðtþ1Þ ¼Xni¼1

�̂ðtþ1Þi

n

(A3)

where n is the number of events, t is the iteration times and dð�T ;kÞ;i is the WKN distance of event pi. If

we define the process with l�T ;1 representing the feature, then events with �̂ðtþ1Þi � 0:5 belong to

feature and events with �̂ðtþ1Þi belong to noise. Here, �̂

ðtþ1Þi can be treated as the fuzzymembership value

of event pi belonging to feature.With all parameters of the mixture pdf of the WKN distance estimated, the threshold (D�T ;k) for

discriminating between feature and noise can be computed with the equation below:

D�T ;k ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffilnð1�rr Þ þ k ln

l�T ;2

l�T ;1

pðl�T ;2 � l�T ;1Þ

vuut(A4)

948 T. Pei et al.


international journal of geographical information...

Documents