Yanlei Diao, University of Massachusetts Amherst
Future Directions in Sensor Data Management
Yanlei Diao
University of Massachusetts, Amherst
Overview of Sensor Data Management
• Infrastructural work
  • Deploying a network of wireless sensing nodes
  • Optimizing energy efficiency
  • Communication-intensive vs. storage-intensive paradigms
• Sensor data management research
  • Diverse types of sensor data, e.g., temperature, light, GPS, RFID, radar data, astronomical data, …
  • Query processing on stored data and data streams
• Much work lies ahead
  • Supporting scientific applications: high-volume data, complex data types, data uncertainty, user-defined functions, …
  • Building a smart planet: platform, data integration, scale, …
Data Streams from Sensing Applications
Data: incomplete, imprecise, misleading
Results: unknown quality
CASA: Severe Weather Monitoring
• High-volume raw signal data: 1.66 million data items, 200 Mb per second
• Highly noisy data: environmental noise, device noise, transmit frequency, system clock, positioner, antenna, …
[Figure: four Sensing nodes]
Uncertain Data Processing
[Figure: four Sensing nodes, each followed by a Transformation & Averaging step]
• Transform raw data to tuples (time, area, velocity, reflectivity, …)
• Average tuples for reduced volume and smoothing
Uncertainty:
• What is the data quality of those tuples?
• What is the effect of averaging over uncertain data?
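One way to reason about the effect of averaging over uncertain data is a minimal sketch that assumes each tuple's attribute (say, velocity) is an independent Gaussian random variable. That is a simplifying assumption: real radar noise may be correlated across samples or non-Gaussian. The class `GaussianValue`, the function `average`, and the sample values are hypothetical illustrations, not the project's code.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussianValue:
    """An uncertain attribute value modeled as a Gaussian N(mean, var)."""
    mean: float
    var: float


def average(values):
    """Average n independent Gaussian values (e.g., one area's raw samples).

    The result is Gaussian with the mean of the means and, because the
    inputs are assumed independent, variance sum(var_i) / n**2.
    """
    n = len(values)
    if n == 0:
        raise ValueError("cannot average an empty group")
    mean = sum(v.mean for v in values) / n
    var = sum(v.var for v in values) / n ** 2
    return GaussianValue(mean, var)


# Hypothetical velocity samples (m/s), each with variance 4 (std 2).
samples = [GaussianValue(20.0, 4.0), GaussianValue(22.0, 4.0),
           GaussianValue(18.0, 4.0), GaussianValue(24.0, 4.0)]
avg = average(samples)
print(avg.mean, avg.var, math.sqrt(avg.var))  # 21.0 1.0 1.0
```

Under independence, averaging four samples halves the standard deviation (from 2 to 1). If the noise sources listed for CASA are correlated, the true variance of the average is larger than this formula reports, which is one reason the quality of averaged tuples is an open question.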
Final Tornado Detection
[Figure: Sensing, Transformation & Averaging, wireless transmission, and Detection/Prediction stages]
Quality of the final detection result?
Tornado Detection
Computational Astrophysics
• 10^8 stars and galaxies
• 0.5 TB to 20 TB nightly data rates
• Noisy observations from images
Observation tuples: (o_id, time, (x,y)^p, luminosity^p, color^p), where ^p marks a probabilistic attribute
SELECT group_id, max(O.luminosity)
FROM Observations O [RANGE 1 hour]
GROUP BY area_id(O.(x,y), AREA_DEF) as group_id
HAVING max(O.luminosity) > 20
Quality of the alert raised from the query answer?
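When luminosity is uncertain, the HAVING max(O.luminosity) > 20 condition holds only with some probability. A minimal sketch, assuming the luminosities in one group are independent Gaussians and that group membership is known (it ignores the uncertainty in (x,y) that drives area_id): the maximum stays at or below the threshold only if every value does, so P(max > 20) = 1 - Π P(L_i ≤ 20). The names `normal_cdf`, `prob_max_exceeds`, and the sample group are hypothetical.

```python
import math


def normal_cdf(x, mean, std):
    """P(X <= x) for X ~ N(mean, std**2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def prob_max_exceeds(observations, threshold):
    """P(max_i L_i > threshold) for independent Gaussian luminosities.

    max <= threshold only if every L_i <= threshold, so the result is
    1 - prod_i P(L_i <= threshold).
    """
    p_all_below = 1.0
    for mean, std in observations:
        p_all_below *= normal_cdf(threshold, mean, std)
    return 1.0 - p_all_below


# Hypothetical (mean, std) luminosities for one area group in the 1-hour window.
group = [(18.0, 1.0), (19.5, 0.5), (20.0, 2.0)]
p = prob_max_exceeds(group, 20.0)
print(round(p, 3))
```

Even though no observation's mean exceeds 20, the group passes the HAVING clause with probability of roughly 0.59, which is the kind of alert-quality figure a deterministic query cannot report.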
Object Tracking and Monitoring using RFID
• Incomplete, noisy RFID data streams
  • Electronic devices
  • Metal objects
  • Orientations of reading
• Raw data: <time, tag_id, reader_id> (not directly queryable)
• Data needed for querying: <time, tag_id, (x, y)^p, …>
• Fire monitoring: alert when a flammable object is exposed to a high temperature.
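To make the gap between raw readings and queryable tuples concrete, here is a deliberately simple sketch, not the project's inference method: it turns one window of raw <time, tag_id, reader_id> readings into a Gaussian location for a tag by moment-matching a mixture of Gaussians centered at the readers that detected it, weighted by read counts. The reader positions `READER_POS`, the read-range variance `READ_RANGE_VAR`, and `estimate_location` are hypothetical; a realistic model would also account for missed reads and reader orientation.

```python
from collections import Counter

# Hypothetical reader positions (x, y) in meters; not from the source.
READER_POS = {"r1": (0.0, 0.0), "r2": (4.0, 0.0), "r3": (0.0, 4.0)}

# Hypothetical variance (m^2) of a tag's offset from a reader that read it.
READ_RANGE_VAR = 1.0


def estimate_location(raw_readings, tag_id):
    """Map raw (time, tag_id, reader_id) readings in one window to a Gaussian
    location estimate (mean_x, mean_y, var_x, var_y) for the given tag.

    Moment-matches a mixture of Gaussians centered at each reader (weight =
    read count, per-component variance = READ_RANGE_VAR): the variance is the
    spread between readers plus the within-reader variance.
    """
    counts = Counter(r for (_, t, r) in raw_readings if t == tag_id)
    total = sum(counts.values())
    if total == 0:
        return None  # no reads in this window: no location evidence
    mx = sum(READER_POS[r][0] * c for r, c in counts.items()) / total
    my = sum(READER_POS[r][1] * c for r, c in counts.items()) / total
    vx = sum(c * (READER_POS[r][0] - mx) ** 2 for r, c in counts.items()) / total
    vy = sum(c * (READER_POS[r][1] - my) ** 2 for r, c in counts.items()) / total
    return (mx, my, vx + READ_RANGE_VAR, vy + READ_RANGE_VAR)


window = [(1, "tagA", "r1"), (2, "tagA", "r2"), (3, "tagA", "r1"),
          (4, "tagA", "r1"), (4, "tagB", "r3")]
print(estimate_location(window, "tagA"))  # (1.0, 0.0, 4.0, 1.0)
```

The output tuple plays the role of (x, y)^p: a query such as the fire-monitoring alert could then ask how likely a tagged object is to lie in a hot region, instead of joining on reader IDs.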
Scope of our Project
• Uncertain data modeled using continuous random variables
• High-volume data streams
• An end-to-end solution:
  1. From raw streams to queryable probabilistic tuple streams
  2. Relational processing of probabilistic tuple streams
Objectives:
• Query answers with bounded errors (existence probabilities, attribute distributions)
• Stream-speed processing
System Overview
[Figure: system diagram with components T1, T2, T3, A1, A2, A3, A4, and J1; tuples with lineage; archived tuples]
• Aggregates (SUM, COUNT, AVG)
• Joins (⋈):
  • Equijoin using a probabilistic view
  • Non-equijoin using a cross-product
• Projections (π)
• Selections (σ)
• Linear arithmetic operations
• Selection-Aggregation
• Group By-Aggregation (γ_{G,Aggr})
• Arbitrary arithmetic operations
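COUNT is among the supported aggregates, and over uncertain tuples it returns a distribution rather than a single number: after a selection, each tuple survives only with some existence probability. A minimal sketch, assuming tuples are independent (correlation between tuples is listed as future work), computes that distribution with the standard Poisson-binomial dynamic program. `count_distribution` and the probabilities are hypothetical illustrations, not the system's operator.

```python
def count_distribution(existence_probs):
    """Distribution of COUNT(*) over independent tuples with existence probs.

    dist[k] is P(COUNT = k). Each tuple either exists (prob p), shifting the
    count up by one, or is absent (prob 1 - p).
    """
    dist = [1.0]  # before any tuple, COUNT = 0 with probability 1
    for p in existence_probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)
            new[k + 1] += q * p
        dist = new
    return dist


probs = [0.9, 0.5, 0.2]  # hypothetical existence probabilities after a selection
dist = count_distribution(probs)
expected = sum(k * q for k, q in enumerate(dist))
print([round(q, 3) for q in dist], round(expected, 3))
```

The expected count equals the sum of the existence probabilities (1.6 here), but the full distribution is what lets a query report how confident an answer such as "COUNT = 2" is.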
Relational Processing under the GMM Model
• Closed-form distributions in GMMs (Gaussian mixture models)!
• Approximation with bounded errors!
• Truncated GMMs
• No commutativity
Major result: relational algebra under GMMs
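Two standard properties of Gaussian mixtures show where closed forms and truncated GMMs come from. The slide does not spell out the project's formulas, so this is a sketch under an independence assumption. First, the sum of two independent GMM-distributed attributes is again a GMM: weights multiply, and means and variances add. Second, a selection σ_{X > c} keeps a tuple with existence probability P(X > c) = Σ_i w_i (1 - Φ((c - μ_i)/σ_i)), and the surviving attribute follows the GMM truncated to (c, ∞). `Component`, `gmm_sum`, and `prob_greater` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One Gaussian component of a GMM: weight * N(mean, var)."""
    weight: float
    mean: float
    var: float


def gmm_sum(a, b):
    """Distribution of X + Y for independent X ~ GMM a and Y ~ GMM b.

    The convolution of two Gaussian mixtures is a Gaussian mixture with
    len(a) * len(b) components: weights multiply, means and variances add.
    """
    return [Component(ca.weight * cb.weight, ca.mean + cb.mean, ca.var + cb.var)
            for ca in a for cb in b]


def prob_greater(gmm, c):
    """P(X > c) for X ~ gmm: the existence probability after sigma_{X > c}.

    Uses 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)) for each component.
    """
    total = 0.0
    for comp in gmm:
        z = (c - comp.mean) / math.sqrt(comp.var)
        total += comp.weight * 0.5 * math.erfc(z / math.sqrt(2.0))
    return total


x = [Component(0.5, 10.0, 1.0), Component(0.5, 14.0, 1.0)]  # bimodal attribute
y = [Component(1.0, 2.0, 3.0)]                              # single Gaussian
s = gmm_sum(x, y)
print(s)
print(round(prob_greater(s, 14.0), 4))
```

Linear operations stay inside the GMM family, which is why their results can be exact; the component count can grow multiplicatively, and operations such as max or arbitrary functions leave the family, which is where approximation with bounded errors becomes necessary.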
© KSWO TV
© Patrick Marsh, May 8, 2007
Series of low-level circulations.
NWS tornado warnings: 7:16 pm, 7:39 pm, 8:29 pm
[Figure panels: 7:21 pm, 8:15 pm, 9:54 pm, 11:00 pm]
Velocity Maps
Boduo Li, Liping Peng, University of Massachusetts Amherst
Processing Speed and Accuracy
Data trace of 84 scans in 947 seconds

System   Data Analysis Time (sec)   Detection Time   False Positives
CASA     198                        4486             2137
PODS     579                        392              9
• More analysis time, but still at stream speed (579 s of analysis for a 947-second trace)
• Faster detection due to improved data quality
• Far fewer false alarms due to better data quality
• A signal-strength filter or smoothing improves detection time; GMM fitting and noise removal are what finally cut down the number of false positives.
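The comparison can be checked with a few lines of arithmetic on the table's own numbers. This sketch treats analysis time divided by trace duration as a rough "keeps up with the stream" test (a value below 1.0 means the total analysis fits within the 947 seconds the data took to arrive); that reading is an assumption, since the table reports totals, not per-scan latency. The detection-time unit is not stated, so only its ratio is computed.

```python
TRACE_SECONDS = 947  # data trace of 84 scans in 947 seconds

# system: (data analysis time in s, detection time, false positives), from the table
results = {
    "CASA": (198, 4486, 2137),
    "PODS": (579, 392, 9),
}

# Fraction of the trace duration spent on analysis; below 1.0 means the
# total analysis work fits within the time the data took to arrive.
load = {name: row[0] / TRACE_SECONDS for name, row in results.items()}

# Ratios between the two systems (detection time has no stated unit,
# so only the ratio is meaningful).
detection_speedup = results["CASA"][1] / results["PODS"][1]
fp_reduction = 1 - results["PODS"][2] / results["CASA"][2]

print({name: round(v, 2) for name, v in load.items()})
print(round(detection_speedup, 1), round(fp_reduction, 4))
```

PODS spends about 61% of the trace duration on analysis (versus about 21% for CASA), detects roughly 11 times faster, and removes more than 99% of the false positives.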
Future Work
• User-defined functions, e.g., fuzzy joins, tornado detection algorithms, …
• Array model for scientific applications
• Correlation between derived attributes and between tuples
• Query optimization: the cheapest plan that meets a query accuracy requirement
• Large-scale data analysis for scientific applications, leveraging cluster computing, …