Grouping Multivariate Time Series Variables: Applications to Chemical Process and Visual Field Data
Allan Tucker - Birkbeck College, Stephen Swift - Brunel University, Nigel Martin - Birkbeck College, Xiaohui Liu - Brunel University

Post on 23-Jan-2016

DESCRIPTION

Grouping Multivariate Time Series Variables: Applications to Chemical Process and Visual Field Data. Allan Tucker - Birkbeck College; Stephen Swift - Brunel University; Nigel Martin - Birkbeck College; Xiaohui Liu - Brunel University.

TRANSCRIPT

Page 1

Grouping Multivariate Time Series Variables: Applications to Chemical Process and Visual Field Data

Allan Tucker - Birkbeck College
Stephen Swift - Brunel University
Nigel Martin - Birkbeck College
Xiaohui Liu - Brunel University

Page 2

Introduction

• Present a methodology to group Multivariate Time Series (MTS) variables
• An MTS is a series of observations of several variables recorded over time
• Test on two real-world applications
• Grouping: partitioning a set of objects into a number of mutually exclusive subsets
• Many grouping problems, if not all, are NP-hard
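To give a feel for why exhaustive search over groupings is impractical, the sketch below counts the ways a set of n objects can be partitioned into mutually exclusive, non-empty subsets (the Bell numbers). It is an illustration only and not part of the presented methodology; the function name is hypothetical.

```python
def bell_number(n: int) -> int:
    """Number of ways to partition n labelled objects into non-empty,
    mutually exclusive groups, computed with the Bell triangle."""
    row = [1]                      # row 0 of the Bell triangle; B(0) = 1
    for _ in range(n):
        new_row = [row[-1]]        # each row starts with the last entry of the previous row
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]                  # first entry of row n is B(n)

# The count grows very quickly with the number of variables.
for n in (5, 10, 50):
    print(n, bell_number(n))
```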

Page 3

MTS Example

[Figure: example MTS plotted as Magnitude against Time, with time points 1 to 901]

Page 4

Grouping MTS - Introduction

• Desirable to model an MTS as a group of several lower-dimensional MTS
• Decompose the MTS into several lower-dimensional MTS based on dependencies in the data
• Large number of possible dependencies, because one variable may affect another after a certain time lag
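A lagged dependency of this kind can be illustrated with a small sketch. It computes Spearman's rank correlation (the measure named later in the slides) between one variable and a time-shifted copy of another, on synthetic data. The helper names, the toy data and the lag range 0 to 10 are all assumptions made for illustration.

```python
import random

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def lagged_spearman(x, y, lag):
    """Spearman rank correlation between x[t] and y[t + lag], for lag >= 0."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    return pearson(average_ranks(x), average_ranks(y))

# Toy data: y repeats x five steps later, plus a little noise.
rng = random.Random(0)
x = [rng.random() for _ in range(500)]
y = [0.0] * 5 + [v + 0.01 * rng.gauss(0, 1) for v in x[:-5]]

best_lag = max(range(0, 11), key=lambda lag: abs(lagged_spearman(x, y, lag)))
print(best_lag)  # the lag at which the dependency is strongest
```

Each (variable, variable, lag) combination is a separate candidate dependency, which is why the number of candidates grows so quickly.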

Page 5

Grouping MTS - Methodology

[Diagram: two-stage pipeline]
Input: one high-dimensional MTS (X)
1. Correlation Search (EP), producing Q: a list of triples numbered 1, 2, ..., Qlen, of the form (xa, xb, lag), (xc, xd, lag), ..., (xe, xf, lag)
2. Grouping Algorithm (GGA), producing G: a grouping of the variables, e.g. {0,3} {1,4,5} {2}
Output: several lower-dimensional MTS
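The final step of the pipeline, turning a grouping G into several lower-dimensional MTS, can be sketched as follows. The data layout (one list per variable) and the function name are assumptions; only the example grouping {0,3} {1,4,5} {2} comes from the slide.

```python
def split_by_groups(series, groups):
    """series[v] is the time series of variable v.
    groups is a list of lists of variable indices (the grouping G).
    Returns one lower-dimensional MTS (a list of series) per group."""
    return [[series[v] for v in group] for group in groups]

# Toy high-dimensional MTS X: 6 variables, 4 time points each.
X = [[v * 10 + t for t in range(4)] for v in range(6)]
G = [[0, 3], [1, 4, 5], [2]]  # the example grouping from the slide

sub_mts = split_by_groups(X, G)
print([len(m) for m in sub_mts])  # → [2, 3, 1]
```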

Page 6

Correlation Search

• Spearman's Rank Correlation used
• Entire search space is too large
• Invalid triples:
  • Autocorrelations
  • Duplicates irrespective of direction where lag = 0, e.g. (xi, xj, 0) and (xj, xi, 0)
• Evolutionary Programming approach found to be the most efficient
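The validity rules for triples can be written down as a small check. This sketch assumes an autocorrelation means a variable paired with itself at any lag, and it keeps only the i < j copy of each mirrored pair at lag 0; both are one reading of the slide rather than a confirmed detail. The sizes used in the count (50 variables, lags 0 to 120) are borrowed from the oil refinery description purely as an example.

```python
def is_valid_triple(i, j, lag):
    """Reject autocorrelations and, at lag 0, one of each mirrored pair."""
    if i == j:
        return False             # autocorrelation: a variable paired with itself
    if lag == 0 and i > j:
        return False             # (xj, xi, 0) duplicates (xi, xj, 0); keep i < j
    return True

def count_valid_triples(n_vars, max_lag):
    """Count the valid (i, j, lag) triples for lags 0..max_lag."""
    return sum(
        is_valid_triple(i, j, lag)
        for i in range(n_vars)
        for j in range(n_vars)
        for lag in range(max_lag + 1)
    )

print(count_valid_triples(50, 120))  # → 295225
```

Each of these candidates needs a rank correlation over the full length of the series, which is the cost a guided search such as Evolutionary Programming tries to avoid.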

Page 7

Grouping Genetic Algorithm - Representation and Operators

• Previously compared and contrasted different GA representations and operators
• Falkenauer's Crossover & Mutation ensure Schema Theory holds for grouping problems

[Diagram: variables 0 3 4 in Group 0, variables 1 2 6 in Group 1, variables 5 7 in Group 2]

Chromosome: 0 1 1 0 0 2 1 2 : 0 1 2
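The chromosome can be decoded back into groups as in the sketch below. It assumes the part before the colon gives, for each variable in order, the label of its group, and the part after the colon lists the group labels; this reading reproduces the three groups in the diagram. The function name is hypothetical.

```python
def decode_chromosome(chromosome):
    """Split 'genes : group labels' and collect the variables in each group."""
    gene_part, group_part = chromosome.split(":")
    genes = [int(g) for g in gene_part.split()]      # genes[v] = group of variable v
    labels = [int(g) for g in group_part.split()]    # the groups present
    groups = {label: [] for label in labels}
    for variable, label in enumerate(genes):
        groups[label].append(variable)
    return groups

print(decode_chromosome("0 1 1 0 0 2 1 2 : 0 1 2"))
# → {0: [0, 3, 4], 1: [1, 2, 6], 2: [5, 7]}
```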

Page 8

Grouping - The Grouping Metric Properties

• If Q is empty, then fitness is maximised when each variable is in a separate group
• If Q contains all pairings of variables (the entire search space), then fitness is maximised when all variables are in the same group
• If the data is from a mixed set of MTS, fitness is maximised when variables in the same group have as many correlations as possible in Q and variables in different groups have as few correlations as possible in Q

Page 9

Oil Refinery Data

• Oil refinery process in Scotland
• Data recorded every minute
• Hundreds of variables
• Years of data available on repository
• Selected 50 interrelated variables over 10000 time points
• Large time lags (up to 120 minutes between some variables)

Page 10

Visual Field Data

• The interval between tests is about 6 months
• Typically, 76 points are measured
• The number of tests can range between 10 and 44
• Values range between 60 = very good and 0 = blind

[Figure: grid of visual field test points for the right eye, annotated "Nerve Fibre Bundle (Right Eye)" and "Usual Position of Blind Spot (Right Eye)"; grid cells carry numeric labels and the blind spot cells are marked B]

Page 11

Oil Refinery Data - Results (1)

• Very rapid generation of groups (seconds)
• 3 major groups discovered, 2 relating to the upper and lower trays of the column
• Most of the single variables (those grouped alone) appear noisy
• Used as a method for pre-processing data before model building where time is short

Page 12

Oil Refinery Data - Results (2)

Group  Var  Description
A        2  ABSORB REFLUX TRAY-1
B       17  ABSORB TAIL-GAS H2 CHROM
B       27  M/FRACT TOP REFLUX
C       22  DE-PROP FEED
D       25  WASH WATER
E       32  J17-COMP SUCTN. PRESSURE
F       40  ABSRB STRIPPER BOTTOM
G        4  ABSORB TAIL-GAS
G       24  C3/C4 EX CDU3
G       36  AUTO/MAN STN TO GAS MAIN
G       37  AUTO/MAN STN TO GIRBOTOL
H        0  FRESH FEED A-PASS
H        1  FRESH FEED B-PASS
H        3  DEBUT FEED EX ABSORB
H        5  ABSORB REFLUX TO TRAY-13
H        6  ABSORB STRIPR WATER LVL
H        8  REACTOR INLET A
H        9  REACTOR INLET B
H       10  SPONGE OIL
H       12  ABSRB LEAN-OIL TO TRAY11
H       18  DEBUT O/HDS PCT C2
H       20  DEBUT OVERHEADS - C2
H       21  F8 H/CARBON TO ABSORB
H       23  PROPENE PRODUCT EX J102
H       26  REFRIDGE A201 TOTAL FEED
H       28  GAS FLOW TO ABSORB
H       30  F8 I/STAGE DRUM LEVEL
H       33  ABSORB SPONGE OIL TRAY11
H       34  M/F TOP REFLUX PRESS CTRL
H       35  DEBUT DIF PRESS TRAY1/19
H       38  J17-COMP SPEED
H       41  C11/3 INLET
H       42  J17 SUCTN.
H       43  J17 I/STAGE
H       44  J17 DISCH
I        7  ABSORB PRESSURE CONTROL
I       11  ABSORB STRIPPER O/HDS
I       13  ABSRB STRIP RBOIL OUTLET
I       14  E4 OVERHEADS - C3
I       15  ABSORB TAIL-GAS PCT C3
I       16  ABSORB T/GAS METH CHROM
I       19  ABSORB. H2 METHANE RATIO
I       29  ABSORB BASE LEVEL
I       31  ABS/STRP TRAY-10
I       39  ABSORB STRIPPER TRAY-6
I       45  M/FRACT TOP REFLUX D/OFF
I       46  M/FRACT TOP TO C06
I       47  ABSORB STRIPPER FEED
I       48  ABSORB STRIPPER TRAY-36
I       49  ABSRB STRIP RBOIL OUTLET

Page 13

Visual Field Data - Results (1) - Patient Group Comparison

• Patients are ordered on average sensitivity
• Patient 1 has the lowest and Patient 82 the highest
• Graph goes from light (BRHC) to dark (TLHC)
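The slides do not say how the groupings of two patients were compared. As one plausible illustration only, the sketch below uses the Rand index, a standard measure of agreement between two partitions of the same set of points; it is a swapped-in technique, not a confirmed part of the authors' method, and the patient groupings are made up.

```python
from itertools import combinations

def rand_index(groups_a, groups_b):
    """Fraction of variable pairs on which two groupings agree
    (both put the pair together, or both keep it apart)."""
    label_a = {v: g for g, group in enumerate(groups_a) for v in group}
    label_b = {v: g for g, group in enumerate(groups_b) for v in group}
    pairs = list(combinations(sorted(label_a), 2))
    agree = sum(
        (label_a[u] == label_a[v]) == (label_b[u] == label_b[v])
        for u, v in pairs
    )
    return agree / len(pairs)

# Hypothetical groupings of five visual field points for two patients.
patient_1 = [[0, 1, 2], [3, 4]]
patient_2 = [[0, 1], [2, 3, 4]]
print(rand_index(patient_1, patient_1))  # → 1.0
print(rand_index(patient_1, patient_2))  # → 0.6
```

Computing such a score for every pair of patients would give a patient-by-patient similarity matrix of the kind a shaded comparison graph could display.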

Page 14

Visual Field Data - Results (2)

• High sensitivity implies similar groups
  • Small groups in general
  • Points in the eye will be associated with similar nerve fibre bundles
• Low sensitivity implies dissimilar groups
  • Large groups in general
  • Different areas of the visual field may be deteriorating

Page 15

Conclusions

• Decomposing large, high-dimensional MTS is a challenging problem
• Proposed methodology very encouraging
• Oil Refinery Data: 3 relatively independent sub-systems rapidly identified
• Visual Field Data: discovered groups offer an ideal starting point for modelling as a vector autoregressive (VAR) process

Page 16

Future Work

• Experimenting with new datasets
  • Gene Expression Data
  • EEG Data
• Determining the ideal parameters
  • e.g. Qlen is very influential on final groupings
• Combining the two stages (correlation search and grouping) into one incremental process

Page 17

Acknowledgements

• Engineering and Physical Sciences Research Council, UK
• Moorfields Eye Hospital, UK
• Honeywell Technology Centre, USA
• Honeywell Hi-Spec Solutions, UK
• BP-Amoco, UK