Spatiotemporal Stream Mining Applied to Seismic+ Data
Margaret H. Dunham
CSE Department, Southern Methodist University
Dallas, Texas 75275 USA
mhd@engr.smu.edu
CTBTO Data Mining/Data Fusion Workshop, 9/15/2008
Outline
• CTBTO Data
• CTBTO Modeling Requirements
• EMM
CTBTO Data
As a data miner, I must first understand your DATA:
• Diverse: seismic, hydroacoustic, infrasound, radionuclide
• Spatial (source and sensor)
• Temporal
• STREAM data
From Sensors to Streams
• Stream data: data captured and sent by a set of sensors.
• A real-time sequence of encoded signals that contain the desired information.
• A continuous, ordered sequence of items (ordered implicitly by arrival time, or explicitly by timestamp or geographic coordinates).
• Stream data is infinite: the data keeps coming.
CTBTO & Data Mining
• Data mining techniques must be defined based on your data and applications.
• Predefined, fixed models and prediction/classification techniques cannot be used as-is.
• The large body of algorithms already created must not be redone.
CTBTO + DM Requirements
Model:
• Handle different data types (seismic, hydroacoustic, etc.)
• Spatial + temporal (spatiotemporal)
• Hierarchical
• Scalable
• Online
• Dynamic
Anomaly detection:
• Not just a specific wave type or data values
• Relationships between the arrival of waves/data
• Combined values of data from all sensors
EMM (Extensible Markov Model)
• Time-varying, discrete, first-order Markov model
• Nodes are clusters of real-world states
• Learning and validation phases overlap
• Learning:
  - Transition probabilities between nodes
  - Node labels (centroid or medoid of cluster)
  - Nodes are added and removed as data arrives
• Applications: prediction, anomaly detection
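The learning step described on the EMM slide can be sketched roughly as follows. This is a minimal illustration, not the implementation from the EMM papers: the class name `EMM`, the Euclidean distance, the running-mean centroid update, and the `threshold` parameter are all assumptions made for the example, and node removal is left out.

```python
import math
from collections import defaultdict


class EMM:
    """Toy EMM sketch: nodes are cluster centroids, edges are counted transitions."""

    def __init__(self, threshold):
        self.threshold = threshold           # assumed: max distance for joining a node
        self.centroids = []                  # node label = running mean of its members
        self.sizes = []                      # number of vectors absorbed by each node
        self.transitions = defaultdict(int)  # (from_node, to_node) -> count
        self.current = None                  # node that matched the previous input

    def _nearest(self, vector):
        """Return (node index, distance) of the closest centroid, or (None, inf)."""
        best, best_dist = None, math.inf
        for node, centroid in enumerate(self.centroids):
            dist = math.dist(vector, centroid)
            if dist < best_dist:
                best, best_dist = node, dist
        return best, best_dist

    def learn(self, vector):
        """Absorb one input vector; create a node if nothing is close enough."""
        node, dist = self._nearest(vector)
        if node is None or dist > self.threshold:
            self.centroids.append([float(x) for x in vector])
            self.sizes.append(1)
            node = len(self.centroids) - 1
        else:
            self.sizes[node] += 1
            n = self.sizes[node]
            self.centroids[node] = [
                c + (x - c) / n for c, x in zip(self.centroids[node], vector)
            ]
        if self.current is not None:
            self.transitions[(self.current, node)] += 1
        self.current = node
        return node

    def transition_probability(self, src, dst):
        """Estimate P(dst | src) from the transition counts seen so far."""
        total = sum(c for (s, _), c in self.transitions.items() if s == src)
        return self.transitions.get((src, dst), 0) / total if total else 0.0
```

Because nodes and counts are updated on every input, the same pass over the stream both learns the model and can be queried for predictions, which is one way to read "overlap of learning and validation phases".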
Research Objectives
• Apply a proven spatiotemporal modeling technique to seismic data
• Construct an EMM to model sensor data
  - Local EMM at a location or area
  - Hierarchical EMM to summarize lower-level models
  - Represent all data in one vector of values
  - EMM learns normal behavior
• Develop new similarity metrics to include all sensor data types (fusion)
• Apply anomaly detection algorithms
EMM Creation/Learning
[Figure: EMM built step by step from the input vectors below]
<18,10,3,3,1,0,0>
<17,10,2,3,1,0,0>
<16,9,2,3,1,0,0>
<14,8,2,3,1,0,0>
<14,8,2,3,0,0,0>
<18,10,3,3,1,1,0.>
Input Data Representation
• Vector of sensor values (numeric) at precise time points or aggregated over time intervals.
• Values need not come from the same sensor types.
• Similarity/distance between vectors is used to determine the creation of new nodes in the EMM.
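One way to picture "aggregated over time intervals" is the sketch below, which turns timestamped readings from mixed sensor types into one vector per interval. The function name `aggregate`, the use of the mean as the aggregate, the sensor names, and the choice of 0.0 for a sensor with no reading in an interval are all assumptions for illustration; the slides do not specify any of them.

```python
from collections import defaultdict
from statistics import mean


def aggregate(readings, sensors, interval):
    """Group (timestamp, sensor, value) readings into one mean-value vector per interval.

    Returns a dict mapping each interval's start time to a vector whose
    positions follow the order of `sensors`.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for timestamp, sensor, value in readings:
        buckets[int(timestamp // interval)][sensor].append(value)

    vectors = {}
    for bucket in sorted(buckets):
        per_sensor = buckets[bucket]
        # Assumption: a sensor with no reading in the interval contributes 0.0.
        vectors[bucket * interval] = [
            mean(per_sensor[s]) if per_sensor[s] else 0.0 for s in sensors
        ]
    return vectors


# Hypothetical readings: (seconds, sensor name, value)
readings = [
    (0, "seismic", 2.0),
    (5, "seismic", 4.0),
    (7, "infrasound", 1.0),
    (12, "seismic", 6.0),
]
vectors = aggregate(readings, sensors=["seismic", "infrasound"], interval=10)
```

Each resulting vector can then be compared with existing EMM node labels using whatever distance measure suits the data.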
Anomaly Detection with EMM
Objective: detect rare (unusual, surprising) events.
Advantages:
• Dynamically learns what is normal
• Based on this learning, can predict what is not normal
• Normal behavior does not have to be indicated a priori
Applications:
• Network intrusion
• Data: IP traffic data, automobile traffic data
Seismic:
• Unusual seismic events
• Automatically filter out normal events
[Figure: Minnesota DOT traffic data, weekdays vs. weekend; detected unusual weekend traffic pattern]
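A simple way to turn a learned model into a rare-event detector is to flag transitions whose estimated probability is low. The sketch below assumes that rule; the cutoff `min_probability`, the function names, and the traffic-state labels are hypothetical, and the published EMM work may score anomalies differently.

```python
def transition_probability(counts, src, dst):
    """Estimate P(dst | src) from a dict of (src, dst) -> observed count."""
    total = sum(n for (s, _), n in counts.items() if s == src)
    return counts.get((src, dst), 0) / total if total else 0.0


def is_anomalous(counts, src, dst, min_probability=0.05):
    """Flag a transition as rare when its learned probability is below the cutoff.

    A transition never seen before has probability 0.0 and is always flagged.
    """
    return transition_probability(counts, src, dst) < min_probability


# Hypothetical counts learned from traffic data
counts = {("light", "light"): 95, ("light", "heavy"): 4, ("light", "jam"): 1}

is_anomalous(counts, "light", "light")  # common transition, not flagged
is_anomalous(counts, "light", "jam")    # seen once in 100, flagged
```

Nothing here has to be told what "normal" looks like in advance: the counts themselves define it, and frequent transitions are filtered out automatically.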
EMM with Seismic Data
• Input: wave arrivals (all, or one per sensor)
• Identify states and changes of state in seismic data
• The waveform would first have to be converted into a series of vectors representing the activity at various points in time
• Initial testing with RDG data
• Use amplitude, period, and wave type
New Distance Measure
• Data = <amplitude, period, wave type>
• Different wave type = 100% difference
• For events of the same wave type:
  - 50% weight given to the difference in amplitude
  - 50% weight given to the difference in period
• If the distance is greater than the threshold, a state change is required.
$$\text{amplitude difference} = \frac{|\,\text{amplitude}_{\text{new}} - \text{amplitude}_{\text{average}}\,|}{\text{amplitude}_{\text{average}}}$$
$$\text{period difference} = \frac{|\,\text{period}_{\text{new}} - \text{period}_{\text{average}}\,|}{\text{period}_{\text{average}}}$$
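The distance measure on this slide can be written out as a short function. The function name `seismic_distance`, the tuple layout, and the sample values are assumptions for illustration; the 50/50 weighting and the 100% difference for mismatched wave types come from the slide. The slide does not say whether the result is capped, so this sketch leaves it uncapped and the value can exceed 1.0 for large relative differences.

```python
def seismic_distance(new, node):
    """Distance between a new event and a node's averages.

    Both arguments are (amplitude, period, wave_type) tuples; for `node`
    the amplitude and period are the node's running averages.
    """
    new_amplitude, new_period, new_type = new
    avg_amplitude, avg_period, node_type = node

    if new_type != node_type:
        return 1.0  # different wave type = 100% difference

    amplitude_diff = abs(new_amplitude - avg_amplitude) / avg_amplitude
    period_diff = abs(new_period - avg_period) / avg_period
    return 0.5 * amplitude_diff + 0.5 * period_diff


# Hypothetical node averages and incoming event
node = (10.0, 1.0, "P")
event = (12.0, 1.5, "P")
threshold = 0.3  # assumed value; the slide does not give one

distance = seismic_distance(event, node)  # 0.5 * 0.2 + 0.5 * 0.5
needs_state_change = distance > threshold
```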
EMM with Seismic Data
[Figure: example EMM] States 1, 2, and 3 correspond to Noise, Wave A, and Wave B, respectively.
Preliminary Testing
• RDG data, February 1, 1981: 6 earthquakes
• Find transition times close to known earthquakes
• 9 total nodes
• 652 total transitions
• Found all quakes
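The check "find transition times close to known earthquakes" could be carried out along the lines below. This is only a sketch: the function name `quakes_found`, the `tolerance` window, and all of the times are made up for illustration and are not the RDG data or the procedure used in the test.

```python
def quakes_found(transition_times, quake_times, tolerance):
    """Map each known quake time to True if some state transition falls within
    `tolerance` of it (all times in the same unit, e.g. seconds)."""
    return {
        quake: any(abs(t - quake) <= tolerance for t in transition_times)
        for quake in quake_times
    }


# Made-up times, in seconds from the start of the record
transitions = [100, 460, 905]
quakes = [98, 900, 2000]

found = quakes_found(transitions, quakes, tolerance=10)
all_found = all(found.values())
```

With the made-up values above, the third quake has no nearby transition, so `all_found` is false; the slide reports that on the actual data every quake was found.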
EMM Nodes
Node #   Average amplitude   Average period   Phase code
1        1.649m              0.119 sec        P (primary wave)
2        8.353m              0.803 sec        P (primary wave)
3        23.237m             0.898 sec        P (primary wave)
4        87.324m             0.997 sec        P (primary wave)
5        253.333m            1.282 sec        P (primary wave)
6        270.524m            0.96 sec         P (primary wave)
7        7.719m              20.4 sec         P (primary wave)
8        723.088m            1.962 sec        P (primary wave)
9        1938.772m           1.2 sec          P (primary wave)
Hierarchical EMM
Now What?
• DATA NEEDED
• NOISE MAY NOT BE BAD
• KDD CUP
• Interest DM COMMUNITY