Yanlei Diao, University of Massachusetts Amherst Future Directions in Sensor Data Management Yanlei Diao University of Massachusetts, Amherst

Page 1

Future Directions in Sensor Data Management

Yanlei Diao

University of Massachusetts, Amherst

Page 2

Overview of Sensor Data Management

Infrastructural work
• Deploying a network of wireless sensing nodes
• Optimizing energy efficiency
• Communication-intensive vs. storage-intensive paradigms

Sensor data management research
• Diverse types of sensor data, e.g., temperature, light, GPS, RFID, radar data, astronomical data, …
• Query processing on stored data and data streams

Much work lies ahead
• Supporting scientific applications: high-volume data, complex data types, data uncertainty, user-defined functions…
• Building a smart planet: platform, data integration, scale…

Page 3

Data Streams from Sensing Applications

Data: incomplete, imprecise, misleading

Results: unknown quality

Page 4

CASA: Severe Weather Monitoring

High-Volume Raw Signal Data:
• 1.66 million data items, 200 Mb per sec

Highly Noisy Data:
• Environmental noise
• Device noise
• Transmit frequency
• System clock
• Positioner
• Antenna…

[Figure: four "Sensing" nodes]

Page 5

Uncertain Data Processing

[Figure: four "Sensing" nodes, each feeding a "Transformation & Averaging" stage]

• Transform raw data to tuples (time, area, velocity, reflectivity, …)
• Average tuples for reduced volume and smoothing

Uncertainty:
• What is the data quality of those tuples?
• What is the effect of averaging over uncertain data?
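The effect of averaging over uncertain data can be illustrated under a simplifying assumption that the slides do not state: each reading is an independent Gaussian with a known mean and variance. Under that assumption the average is again Gaussian, with the mean of the means and a variance equal to the sum of the variances divided by n². The names `GaussianReading` and `average_readings` are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GaussianReading:
    """An uncertain attribute value (hypothetical type): mean and variance."""
    mean: float
    var: float


def average_readings(readings: Sequence[GaussianReading]) -> GaussianReading:
    """Average independent Gaussian readings.

    Assumption: readings are independent. Correlated noise (for example,
    from a shared device) would require covariance terms as well.
    """
    n = len(readings)
    if n == 0:
        raise ValueError("need at least one reading")
    mean = sum(r.mean for r in readings) / n
    # Var(sum / n) = sum(Var) / n^2 for independent variables.
    var = sum(r.var for r in readings) / (n * n)
    return GaussianReading(mean, var)


if __name__ == "__main__":
    window = [
        GaussianReading(10.0, 4.0),
        GaussianReading(12.0, 4.0),
        GaussianReading(11.0, 4.0),
        GaussianReading(13.0, 4.0),
    ]
    print(average_readings(window))
```

With four readings of equal variance, the variance of the average drops by a factor of four, which is the smoothing effect the slide refers to. The sketch says nothing about correlated noise, where the reduction would be smaller.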

Page 6

Final Tornado Detection

[Figure: pipeline of Sensing, Transformation & Averaging, wireless transmission, and Detection/Prediction stages, ending in Tornado Detection]

Quality of the final detection result?

Page 7

Computational Astrophysics

• 10^8 stars, galaxies
• 0.5 TB – 20 TB nightly data rates
• Noisy observations from images

(o_id, time, (x,y)^p, luminosity^p, color^p)

SELECT group_id, max(O.luminosity)
FROM Observations O [RANGE 1 hour]
GROUP BY area_id(O.(x,y), AREA_DEF) as group_id
HAVING max(O.luminosity) > 20

Query answer

Quality of alert?
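One way to attach a quality measure to an alert of the form `max(luminosity) > 20` is to compute the probability that the condition holds. The sketch below assumes, purely for illustration, that the luminosities in a group are independent Gaussians given as (mean, standard deviation) pairs; the function names are hypothetical and the slides do not prescribe this method.

```python
import math
from typing import Iterable, Tuple


def normal_cdf(x: float, mean: float, std: float) -> float:
    """CDF of a Gaussian, expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def prob_max_exceeds(observations: Iterable[Tuple[float, float]],
                     threshold: float) -> float:
    """P(max_i X_i > threshold) for independent Gaussian X_i.

    Uses P(max > t) = 1 - prod_i P(X_i <= t), which holds only under
    the independence assumption made in this sketch.
    """
    p_all_below = 1.0
    for mean, std in observations:
        p_all_below *= normal_cdf(threshold, mean, std)
    return 1.0 - p_all_below


if __name__ == "__main__":
    # Hypothetical group of three uncertain luminosity observations.
    group = [(18.0, 1.5), (19.5, 1.0), (17.0, 2.0)]
    print(f"P(max luminosity > 20) = {prob_max_exceeds(group, 20.0):.3f}")
```

An alert could then be reported together with this probability, or suppressed when it falls below a user-chosen confidence level.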

Page 8

Object Tracking and Monitoring using RFID

Incomplete, noisy RFID data streams
• Electronic devices
• Metal objects
• Orientations of reading

Raw data: <time, tag_id, reader_id>
Not directly queryable

vs.

Data needed for querying: <time, tag_id, (x, y)^p, …>

Fire monitoring

Alert when a flammable object is exposed to a high temperature.
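The gap between raw readings `<time, tag_id, reader_id>` and queryable tuples `<time, tag_id, (x, y)^p, …>` can be made concrete with a deliberately naive sketch. It assumes that reader positions are known and summarizes a tag's location by the mean and variance of the positions of the readers that saw it. This centroid estimate is a stand-in chosen for brevity, not the inference method used in the project, and all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[float, str, str]   # (time, tag_id, reader_id)
Position = Tuple[float, float]     # (x, y) of a reader, assumed known
# (latest_time, (mean_x, mean_y), (var_x, var_y))
LocationTuple = Tuple[float, Position, Position]


def to_location_tuples(readings: Iterable[Reading],
                       reader_pos: Dict[str, Position]
                       ) -> Dict[str, LocationTuple]:
    """Turn one window of raw RFID readings into per-tag location summaries."""
    times: Dict[str, List[float]] = defaultdict(list)
    points: Dict[str, List[Position]] = defaultdict(list)
    for time, tag_id, reader_id in readings:
        times[tag_id].append(time)
        points[tag_id].append(reader_pos[reader_id])

    result: Dict[str, LocationTuple] = {}
    for tag_id, pts in points.items():
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        result[tag_id] = (
            max(times[tag_id]),
            (mean(xs), mean(ys)),            # naive location estimate
            (pvariance(xs), pvariance(ys)),  # spread as a crude uncertainty
        )
    return result


if __name__ == "__main__":
    readers = {"r1": (0.0, 0.0), "r2": (2.0, 0.0)}
    raw = [(0.0, "tag7", "r1"), (1.0, "tag7", "r2")]
    print(to_location_tuples(raw, readers))
```

A tag seen by a single reader gets zero variance here, which understates the real uncertainty; a realistic model would account for read range and missed readings.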

Page 9

Scope of our Project

• Uncertain data modeled using continuous random variables
• High-volume data streams

An end-to-end solution
1. From raw streams to queryable probabilistic tuple streams
2. Relational processing of probabilistic tuple streams

Objectives:
• Query answers with bounded errors (existence prob., attribute dist.)
• Stream-speed processing

Page 10

System Overview

[Figure: system diagram with nodes T1–T3, A1–A4, and J1, and the labels "tuples w. lineage" and "Archived tuples"]

Page 11

Relational Processing under the GMM Model

Major result: Relational algebra under GMMs

• Aggregates (SUM, COUNT, AVG)
• Joins (⋈):
  • Equijoin using a probabilistic view
  • Non-equijoin using a cross-product
• Projections (π)
• Selections (σ)
• Linear arithmetic operations

• Selection - Aggregation
• Group By - Aggregation (G, Aggr)
• Arbitrary arithmetic operations

Closed-form Distributions in GMMs!

Approximation w. bounded errors!

Truncated GMMs

No commutativity
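A small example helps show why sums and linear arithmetic can stay in closed form under a Gaussian mixture model (GMM). For independent random variables, the sum of two GMMs is again a GMM whose components are the pairwise sums of the input components, and an affine map a·X + b transforms each component separately. The sketch below covers only the one-dimensional, independent case, and the type and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class Component:
    """One Gaussian component of a 1-D mixture (hypothetical type)."""
    weight: float
    mean: float
    var: float


GMM = List[Component]


def gmm_sum(x: GMM, y: GMM) -> GMM:
    """Distribution of X + Y for independent X and Y.

    Each pair of components contributes one Gaussian: weights multiply,
    means add, variances add.
    """
    return [
        Component(a.weight * b.weight, a.mean + b.mean, a.var + b.var)
        for a, b in product(x, y)
    ]


def gmm_linear(x: GMM, a: float, b: float) -> GMM:
    """Distribution of a * X + b, assuming a != 0."""
    return [Component(c.weight, a * c.mean + b, a * a * c.var) for c in x]


def gmm_mean(x: GMM) -> float:
    """Mean of the mixture: weighted average of component means."""
    return sum(c.weight * c.mean for c in x)


if __name__ == "__main__":
    x = [Component(0.5, 0.0, 1.0), Component(0.5, 4.0, 1.0)]
    y = [Component(1.0, 1.0, 2.0)]
    for component in gmm_sum(x, y):
        print(component)
```

Note that the number of components multiplies with each sum, so a long aggregation would need some form of component reduction to stay tractable; the sketch does not attempt that.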

Page 12

[Figure: storm photos. Credits: © KSWO TV; © Patrick Marsh. May 8, 2007]

Series of low-level circulations.

NWS Tornado Warnings: 7:16pm, 7:39pm, 8:29pm

[Figure panels: 7:21pm, 8:15pm, 9:54pm, 11:00pm]

Page 13

Velocity Maps

Boduo Li, Liping Peng, University of Massachusetts Amherst

Page 14

Processing Speed and Accuracy

Data trace of 84 scans in 947 seconds

        Data Analysis Time (sec)    Detection Time    False Positives
CASA    198                         4486              2137
PODS    579                         392               9

• More analysis time, still stream speed
• Faster detection due to improved data quality
• Far fewer false alarms due to better data quality
• Signal strength filter or smoothing improves detection time.
• GMM fitting & noise removal finally cuts down # false positives.
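The three claims can be checked directly from the reported figures. The short script below reads "stream speed" as "total analysis time does not exceed the duration of the trace", which is an interpretation rather than a definition given in the slides. It also leaves the unit of the detection-time column unstated, as the slide does, and uses that column only as a ratio.

```python
TRACE_SECONDS = 947  # duration of the data trace
SCANS = 84           # number of scans in the trace

# Figures as reported on the slide.
RESULTS = {
    "CASA": {"analysis_sec": 198, "detection_time": 4486, "false_positives": 2137},
    "PODS": {"analysis_sec": 579, "detection_time": 392, "false_positives": 9},
}


def keeps_up(analysis_sec: float, trace_sec: float = TRACE_SECONDS) -> bool:
    """True if the analysis finishes within the trace duration (assumed criterion)."""
    return analysis_sec <= trace_sec


def ratio(metric: str) -> float:
    """CASA value divided by PODS value for the given metric."""
    return RESULTS["CASA"][metric] / RESULTS["PODS"][metric]


if __name__ == "__main__":
    print(f"time budget per scan: {TRACE_SECONDS / SCANS:.2f} s")
    for system, row in RESULTS.items():
        share = row["analysis_sec"] / TRACE_SECONDS
        print(f"{system}: {share:.0%} of trace duration, "
              f"keeps up = {keeps_up(row['analysis_sec'])}")
    print(f"detection time ratio (CASA / PODS): {ratio('detection_time'):.1f}")
    print(f"false positive ratio (CASA / PODS): {ratio('false_positives'):.1f}")
```

On these numbers PODS spends roughly three times as long on analysis as CASA yet still finishes well inside the 947-second trace, while detection time and false positives both drop by large factors.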

Page 15

Future Work

• User-defined functions: e.g., fuzzy joins, tornado detection algorithms…
• Array model for scientific applications
• Correlation between derived attributes; between tuples
• Query optimization: cheapest plan that meets a query accuracy requirement
• Large-scale data analysis for scientific applications: leveraging cluster computing…
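The query optimization item can be read as a constrained choice: among candidate plans, pick the cheapest one whose error bound satisfies the accuracy requirement. The sketch below shows only that selection step. It assumes that cost and error estimates for each plan are already available, which is the hard part and is not addressed here; the `Plan` type and the plan names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Plan:
    """A candidate query plan with estimated cost and error bound (hypothetical)."""
    name: str
    cost: float
    error_bound: float


def cheapest_plan(plans: Iterable[Plan], max_error: float) -> Optional[Plan]:
    """Return the lowest-cost plan whose error bound is within max_error.

    Returns None when no plan meets the accuracy requirement.
    """
    feasible = [p for p in plans if p.error_bound <= max_error]
    return min(feasible, key=lambda p: p.cost, default=None)


if __name__ == "__main__":
    candidates = [
        Plan("coarse-approximation", cost=1.0, error_bound=0.20),
        Plan("moderate-approximation", cost=5.0, error_bound=0.05),
        Plan("fine-approximation", cost=9.0, error_bound=0.01),
    ]
    print(cheapest_plan(candidates, max_error=0.10))
```

A real optimizer would not enumerate a fixed list but search a plan space, and its error estimates would themselves be uncertain; the sketch only fixes the objective being optimized.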