HiFi: Network-centric Query Processing in the Physical World SAP Research Forum February 2005 Mike Franklin UC Berkeley

Page 1: HiFi: Network-centric Query Processing in the Physical World

HiFi: Network-centric Query Processing in the Physical World

SAP Research Forum, February 2005

Mike Franklin, UC Berkeley

Page 2: HiFi: Network-centric Query Processing in the Physical World

Mike Franklin UC Berkeley EECS

Introduction

• Receptors everywhere!
  • Wireless sensor networks, RFID technologies, digital homes, network monitors, ...

Large-scale deployments will take the form of High Fan-In Systems

Page 3: HiFi: Network-centric Query Processing in the Physical World

High Fan-in Systems

Large numbers of receptors = large data volumes
Hierarchical, successive aggregation

The “Bowtie”

Page 4: HiFi: Network-centric Query Processing in the Physical World

High Fan-in Example (SCM)

Levels, from the edge inward:
• Receptors (RFID)
• Dock doors, Shelves
• Warehouses, Stores
• Regional Centers
• Headquarters

Page 5: HiFi: Network-centric Query Processing in the Physical World

Properties

• High Fan-In, globally-distributed architecture.
• Large data volumes generated at edges.
  • Filtering and cleaning must be done there.
• Successive aggregation as you move inwards.
  • Summaries/anomalies continually, details later.
• Strong temporal focus.
• Strong spatial/geographic focus.
• Streaming data and stored data.
• Integration within and across enterprises.

Page 6: HiFi: Network-centric Query Processing in the Physical World

Design Space: Time

Time scale: seconds → years

• Filtering, Cleaning, Alerts
• Monitoring, Time-series
• Data mining (recent history)
• Archiving (provenance and schema evolution)

Processing styles: On-the-fly processing, Stream/Disk processing, Disk-based processing

Page 7: HiFi: Network-centric Query Processing in the Physical World

Design Space: Geography

Geographic scope: local → global

• Filtering, Cleaning, Alerts
• Monitoring, Time-series
• Data mining (recent history)
• Archiving (provenance and schema evolution)

Along the axis: Several Readers, Regional Centers, Central Office

Page 8: HiFi: Network-centric Query Processing in the Physical World

Design Space: Resources

Individual resources: tiny → huge

• Filtering, Cleaning, Alerts
• Monitoring, Time-series
• Data mining (recent history)
• Archiving (provenance and schema evolution)

Along the axis: Devices, Stargates/Desktops, Clusters/Grids

Page 9: HiFi: Network-centric Query Processing in the Physical World

Design Space: Data

Axis labels: Degree of Detail, Aggregate Data Volume

• Filtering, Cleaning, Alerts
• Monitoring, Time-series
• Data mining (recent history)
• Archiving (provenance and schema evolution)

Along the axis: Dup Elim (history: hrs), Interesting Events (history: days), Trends/Archive (history: years)

Page 10: HiFi: Network-centric Query Processing in the Physical World

State of the Art

• Current approaches: hand-coded, script-based
  • expensive, one-off, brittle, hard to deploy and keep running
• Piecemeal/stovepipe systems
  • Each type of receptor (RFID, sensors, etc) handled separately
• Standards-efforts not addressing this:
  • Protocol design bent
  • Different “data models” at each level
  • Reinventing “query languages” at each level

No end-to-end, integrated middleware for managing distributed receptor data

Page 11: HiFi: Network-centric Query Processing in the Physical World

HiFi

• A data management infrastructure for high fan-in environments
• Uniform Declarative Framework
  • Every node is a data stream processor that speaks SQL-ese: stream-oriented queries at all levels
  • Hierarchical, stream-based views as an organizing principle

Page 12: HiFi: Network-centric Query Processing in the Physical World

Why Declarative? (database dogma)

• Independence: data, location, platform
  • Allows the system to adapt over time
• Many optimization opportunities
  • In a complex system, automatic optimization is key.
  • Also, optimization across multiple applications.
• Simplifies Programming
  • ???

Page 13: HiFi: Network-centric Query Processing in the Physical World

Building HiFi

Page 14: HiFi: Network-centric Query Processing in the Physical World

Integrating RFID & Sensors (the “loudmouth” query)

Page 15: HiFi: Network-centric Query Processing in the Physical World

A Tale of Two Systems

• TinyDB
  • Declarative query processing for wireless sensor networks
  • In-network aggregation
  • Released as part of TinyOS Open Source Distribution
• TelegraphCQ
  • Data stream processor
  • Continuous, adaptive query processing with aggressive sharing
  • Built by modifying PostgreSQL
  • Open source “beta” release out now; new release soon

Page 16: HiFi: Network-centric Query Processing in the Physical World

TinyDB

• The Network is the Database:
  • Basic idea: treat the sensor net as a “virtual table”.
• System hides details/complexities of devices, changing topologies, failures, …
• System is responsible for efficient execution.
• Developed on TinyOS/Motes
  http://telegraph.cs.berkeley.edu/tinydb

SELECT MAX(mag)
FROM sensors
WHERE mag > thresh
SAMPLE PERIOD 64ms

[Diagram labels: App, TinyDB, Sensor Network; “Query, Trigger” and “Data” flow between the App and TinyDB.]
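The “virtual table” idea and in-network aggregation can be illustrated with a small Python sketch. This is not TinyDB code: the `Node` class, its fields, and the routing tree are hypothetical, and the function only mimics one sample period of the MAX(mag) query above, with each node forwarding a single partial maximum to its parent instead of raw readings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical sensor node in a routing tree (not a TinyDB type)."""
    mag: float                                   # reading for the current sample period
    children: List["Node"] = field(default_factory=list)


def partial_max(node: Node, thresh: float) -> Optional[float]:
    """Return MAX(mag) over the subtree rooted at `node`, counting only mag > thresh.

    Each node merges its children's partial results with its own reading,
    so only one value travels up each link.
    """
    candidates = [node.mag] if node.mag > thresh else []
    for child in node.children:
        sub = partial_max(child, thresh)
        if sub is not None:
            candidates.append(sub)
    return max(candidates) if candidates else None


# One sample period on a small three-level tree (made-up readings).
leaves = [Node(3.0), Node(9.5), Node(1.2)]
root = Node(4.0, [Node(7.0, leaves[:2]), Node(0.5, leaves[2:])])
result = partial_max(root, thresh=2.0)
```

A subtree with no reading above the threshold reports nothing (`None`), which is where the savings in radio traffic would come from in a real deployment.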

Page 17: HiFi: Network-centric Query Processing in the Physical World

TelegraphCQ: Data Stream Monitoring

• Streaming Data
  • Network monitors
  • Sensor Networks, RFID
  • News feeds, Stock tickers, …
• B2B and Enterprise apps
  • Trade Reconciliation, Order Processing etc.
• (Quasi) real-time flow of events and data
  • Manage these flows to drive business processes.
  • Can mine flows to create and adjust business rules.
  • Can also “tap into” flows for on-line analysis.

http://telegraph.cs.berkeley.edu

Page 18: HiFi: Network-centric Query Processing in the Physical World

Data Stream Processing

[Diagram: in a Traditional Database, Queries are applied to stored Data and yield Result Tuples; in a Data Stream Processor, Data flows through stored Queries and yields Result Tuples.]

• Data streams are unending
• Continuous, long running queries
• Real-time processing
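The inversion between stored data and stored queries can be sketched in a few lines of Python. The `StreamProcessor` class and its method names are hypothetical and are not TelegraphCQ's API; the point is only that queries are registered once and every arriving tuple is pushed through all of them.

```python
from typing import Callable, Dict, List

Tuple_ = Dict[str, object]            # a stream tuple, modelled as a plain dict


class StreamProcessor:
    """Toy continuous-query engine: queries are long-lived, data passes through."""

    def __init__(self) -> None:
        self.queries: Dict[str, Callable[[Tuple_], bool]] = {}

    def register(self, name: str, predicate: Callable[[Tuple_], bool]) -> None:
        """Install a standing query; it stays active until removed."""
        self.queries[name] = predicate

    def push(self, tup: Tuple_) -> List[str]:
        """Process one arriving tuple; return the names of the queries it satisfies."""
        return [name for name, pred in self.queries.items() if pred(tup)]


sp = StreamProcessor()
sp.register("hot", lambda t: t["temp"] > 30)
sp.register("berkeley", lambda t: t["city"] == "Berkeley")

first = sp.push({"city": "Berkeley", "temp": 35})
second = sp.push({"city": "Davis", "temp": 20})
```

Here `first` names both queries and `second` is empty; nothing is stored between tuples, which is what lets the same loop run over an unending stream.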

Page 19: HiFi: Network-centric Query Processing in the Physical World

Windowed Queries

A typical streaming query:

SELECT S.city, AVG(temp)
FROM SOME_STREAM S [range by '5 seconds' slide by '5 seconds']
WHERE S.state = 'California'
GROUP BY S.city

Window Clause:
• range by: “I want to look at 5 seconds worth of data”
• slide by: “I want a result tuple every 5 seconds”

[Diagram: a Window moves along the Data Stream and emits Result Tuple(s) at each position.]
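A minimal Python sketch of what this query computes, under some assumptions: the tuple layout `(timestamp_sec, state, city, temp)` is invented for illustration, timestamps arrive in order, and because range and slide are equal the windows do not overlap. It is not how TelegraphCQ evaluates windows internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Reading = Tuple[float, str, str, float]   # (timestamp_sec, state, city, temp); hypothetical schema


def windowed_avg(stream: Iterable[Reading], range_s: float = 5.0,
                 state: str = "California") -> Iterator[Tuple[float, Dict[str, float]]]:
    """Yield (window_end, {city: avg_temp}) for each non-empty 5-second window."""
    current = None                                   # index of the window being filled
    temps: Dict[str, List[float]] = defaultdict(list)
    for ts, st, city, temp in stream:
        idx = int(ts // range_s)
        if current is not None and idx != current and temps:
            # The window closed: emit one result tuple per city, then reset.
            yield (current + 1) * range_s, {c: sum(v) / len(v) for c, v in temps.items()}
            temps = defaultdict(list)
        current = idx
        if st == state:                              # WHERE S.state = 'California'
            temps[city].append(temp)                 # GROUP BY S.city
    if temps:
        yield (current + 1) * range_s, {c: sum(v) / len(v) for c, v in temps.items()}


readings = [
    (0.0, "California", "Berkeley", 60.0),
    (1.0, "California", "Berkeley", 62.0),
    (2.0, "Oregon", "Portland", 50.0),
    (4.5, "California", "Palo Alto", 70.0),
    (6.0, "California", "Berkeley", 64.0),
]
results = list(windowed_avg(readings))
```

With a slide smaller than the range, windows would overlap and each tuple would contribute to several results; this sketch deliberately covers only the equal case shown on the slide.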

Page 20: HiFi: Network-centric Query Processing in the Physical World

TelegraphCQ Architecture

[Architecture diagram; components shown:]
• Proxy
• TelegraphCQ Front End: Listener, Parser, Planner, Mini-Executor, Catalog
• TelegraphCQ Back End: CQEddy, Modules, Scans, Split
• TelegraphCQ Wrapper ClearingHouse: Wrappers
• Shared Memory: Query Plan Queue, Eddy Control Queue, Query Result Queues, Shared Memory Buffer Pool
• Disk

Page 21: HiFi: Network-centric Query Processing in the Physical World

The HiFi System

[System diagram; components shown: TelegraphCQ, TinyDB, PC, Stargates, RFID Wrappers, Sensor Networks & RFID Readers]

Page 22: HiFi: Network-centric Query Processing in the Physical World

Basic HiFi Architecture

• Hierarchical federation of nodes
• Each node:
  • Data Stream Query Processor (DSQP)
  • HiFi Glue
• Views drive system functionality
• Metadata Repository (MDR)

HiFi Glue:
• DSQP Management
• Query Planning
• Archiving
• Internode coordination and communication

[Diagram: a hierarchy of nodes, each pairing HiFi Glue with a DSQP, together with the MDR.]

Page 23: HiFi: Network-centric Query Processing in the Physical World

HiFi Processing Pipelines

The CSAVA Framework

Stage       CSAVA Generalization
Clean       Single Tuple
Smooth      Window
Arbitrate   Multiple Receptors
Validate    Join w/Stored Data
Analyze     On-line Data Mining

Page 24: HiFi: Network-centric Query Processing in the Physical World

CSAVA: Processing

Stage: Clean

CREATE VIEW cleaned_rfid_stream AS
(SELECT receptor_id, tag_id
 FROM rfid_stream rs
 WHERE read_strength >= strength_T)
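The Clean stage is a per-tuple filter, which a short Python sketch makes concrete. The `RfidRead` type and the sample values are assumptions made for illustration; only the column names and the threshold comparison come from the view above.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class RfidRead(NamedTuple):
    """One raw reading; mirrors the columns used by cleaned_rfid_stream."""
    receptor_id: str
    tag_id: str
    read_strength: float


def clean(stream: Iterable[RfidRead], strength_t: float) -> Iterator[Tuple[str, str]]:
    """Drop weak reads and project to (receptor_id, tag_id), one tuple at a time."""
    for read in stream:
        if read.read_strength >= strength_t:
            yield (read.receptor_id, read.tag_id)


raw = [
    RfidRead("shelf0", "tagA", 0.9),
    RfidRead("shelf1", "tagA", 0.2),   # weak cross-read, filtered out
    RfidRead("shelf1", "tagB", 0.5),   # exactly at the threshold, kept
]
cleaned = list(clean(raw, strength_t=0.5))
```

Because the filter needs no state, it can run as close to the reader as possible.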

Page 25: HiFi: Network-centric Query Processing in the Physical World

CSAVA: Processing

Pipeline: Clean → Smooth

Stage: Smooth

CREATE VIEW smoothed_rfid_stream AS
(SELECT receptor_id, tag_id
 FROM cleaned_rfid_stream [range by '5 sec', slide by '5 sec']
 GROUP BY receptor_id, tag_id
 HAVING count(*) >= count_T)
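A minimal sketch of the Smooth stage for a single window, assuming the window's contents have already been collected into a list of `(receptor_id, tag_id)` pairs; the function name and sample data are illustrative only.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]                 # (receptor_id, tag_id)


def smooth(window: Iterable[Pair], count_t: int) -> Set[Pair]:
    """Keep pairs seen at least count_t times in the window (GROUP BY ... HAVING)."""
    counts = Counter(window)
    return {pair for pair, n in counts.items() if n >= count_t}


# One 5-second window of cleaned reads: tagA is read steadily, tagB only once.
window = [("shelf0", "tagA")] * 4 + [("shelf0", "tagB")]
smoothed = smooth(window, count_t=3)
```

A tag that appears only sporadically within the window is treated as noise and dropped.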

Page 26: HiFi: Network-centric Query Processing in the Physical World

CSAVA: Processing

Pipeline: Clean → Smooth → Arbitrate

Stage: Arbitrate

CREATE VIEW arbitrated_rfid_stream AS
(SELECT receptor_id, tag_id
 FROM smoothed_rfid_stream rs [range by '5 sec', slide by '5 sec']
 GROUP BY receptor_id, tag_id
 HAVING count(*) >= ALL
   (SELECT count(*)
    FROM smoothed_rfid_stream [range by '5 sec', slide by '5 sec']
    WHERE tag_id = rs.tag_id
    GROUP BY receptor_id))
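The `>= ALL` subquery assigns each tag to the receptor that saw it most often in the window. A Python sketch of that logic for one window, with invented shelf and tag names:

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]                 # (receptor_id, tag_id)


def arbitrate(window: Iterable[Pair]) -> Set[Pair]:
    """For each tag, keep the receptor(s) whose read count is >= every other receptor's."""
    counts = Counter(window)                       # (receptor, tag) -> reads in window
    best: Dict[str, int] = {}                      # tag -> highest count at any receptor
    for (_, tag), n in counts.items():
        best[tag] = max(best.get(tag, 0), n)
    return {(rec, tag) for (rec, tag), n in counts.items() if n == best[tag]}


window = (
    [("shelf0", "tagA")] * 3 + [("shelf1", "tagA")]          # tagA belongs to shelf0
    + [("shelf0", "tagB")] * 2 + [("shelf1", "tagB")] * 2    # tie for tagB
)
owners = arbitrate(window)
```

As in the SQL, a tie leaves the tag attributed to both receptors; breaking ties would need an extra rule that the slide does not specify.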

Page 27: HiFi: Network-centric Query Processing in the Physical World

CSAVA: Processing

Pipeline: Clean → Smooth → Arbitrate → Validate

Stage: Validate

CREATE VIEW validated_tags AS
(SELECT tag_name
 FROM arbitrated_rfid_stream rs [range by '5 sec', slide by '5 sec'],
      known_tag_list tl
 WHERE tl.tag_id = rs.tag_id)
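Validate is a join between the stream and a stored table. In the Python sketch below the stored `known_tag_list` is modelled as a dict from `tag_id` to `tag_name`; that representation and the sample entries are assumptions, not part of HiFi.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]                 # (receptor_id, tag_id)


def validate(window: Iterable[Pair], known_tags: Dict[str, str]) -> List[str]:
    """Join arbitrated reads with the stored tag list; unknown tag ids are dropped."""
    return [known_tags[tag_id] for _, tag_id in window if tag_id in known_tags]


known_tag_list = {"tagA": "milk", "tagB": "soap"}      # hypothetical stored table
window = [("shelf0", "tagA"), ("shelf1", "tagB"), ("shelf1", "tagZ")]
names = validate(window, known_tag_list)
```

Reads of tags that are not in the stored list, such as `tagZ` here, simply fall out of the inner join.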

Page 28: HiFi: Network-centric Query Processing in the Physical World

CSAVA: Processing

Pipeline: Clean → Smooth → Arbitrate → Validate → Analyze

Stage: Analyze

CREATE VIEW tag_count AS
(SELECT tag_name, count(*)
 FROM validated_tags vt [range by '5 min', slide by '1 min']
 GROUP BY tag_name)
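Unlike the earlier stages, this view uses a range (5 min) longer than its slide (1 min), so consecutive windows overlap. A small Python sketch of that behaviour, assuming events are `(timestamp_sec, tag_name)` pairs and that a window covers the half-open interval `(now - range, now]`; both choices are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[float, str]              # (timestamp_sec, tag_name)


def tag_counts(events: List[Event], now: float, range_s: float = 300.0) -> Dict[str, int]:
    """count(*) per tag_name over the window (now - range_s, now]."""
    return dict(Counter(name for ts, name in events if now - range_s < ts <= now))


def sliding_counts(events: List[Event], start: float, end: float,
                   range_s: float = 300.0, slide_s: float = 60.0):
    """Emit one (time, counts) result every slide_s seconds from start to end."""
    out = []
    t = start
    while t <= end:
        out.append((t, tag_counts(events, t, range_s)))
        t += slide_s
    return out


events = [(10.0, "milk"), (70.0, "milk"), (130.0, "soap"), (400.0, "milk")]
results = sliding_counts(events, start=60.0, end=420.0)
```

Each event is counted in up to five consecutive results before it ages out of the window.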

Page 29: HiFi: Network-centric Query Processing in the Physical World

Ongoing Work

• Bridging the physical-digital divide
  • VICE – A “Virtual Device” Interface
• Hierarchical query processing
  • Automatic Query planning & dissemination
• Complex event processing
  • Unifying event and data processing

Page 30: HiFi: Network-centric Query Processing in the Physical World

Virtual Device (VICE) Layer

“Metaphysical* Data Independence”

*The branch of philosophy that deals with the ultimate nature of reality and existence. (name due to Shawn Jeffery)

Page 31: HiFi: Network-centric Query Processing in the Physical World

The Virtues of VICE

• A simple RFID Experiment
  • 2 Adjacent Shelves, 8 ft each
  • 10 EPC-tagged items each, plus 5 moved between them.
  • RFID antenna on each shelf.

Page 32: HiFi: Network-centric Query Processing in the Physical World

Ground Truth

Page 33: HiFi: Network-centric Query Processing in the Physical World

Raw RFID Readings

Page 34: HiFi: Network-centric Query Processing in the Physical World

After VICE Processing

Under the covers (in this case): Cleaning, Smoothing, and Arbitration

Page 35: HiFi: Network-centric Query Processing in the Physical World

Other VICE Uses

• Once you have the right abstractions:
  • “Soft Sensors”
  • Quality and lineage streams
  • Pushdown of external validation information
  • Power management and other optimizations
  • Data Archiving
  • Model-based sensing
  • “Non-declarative” code
  • …

Page 36: HiFi: Network-centric Query Processing in the Physical World

Hierarchical Query Processing

Example hierarchy:
• “I provide raw readings for Soda Hall”
• “I provide avg daily values for Berkeley”
• “I provide avg weekly values for California”
• “I provide national monthly values for the US”

• Continuous and Streaming
  • Automatic placement and optimization
• Hierarchical
  • Temporal granularity vs. geographic scope
• Sharing of lower-level streams
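The way each level consumes the stream produced by the level below it can be sketched in Python. The `roll_up` helper, the day-number keys, and the readings are all hypothetical; the sketch covers only the temporal side of the trade-off (raw readings to daily to weekly averages) for a single site.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

Reading = Tuple[int, float]            # (time bucket, value)


def roll_up(readings: Iterable[Reading],
            bucket_of: Callable[[int], int]) -> List[Reading]:
    """Average values per coarser bucket; the output can feed the next level up."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for key, value in readings:
        groups[bucket_of(key)].append(value)
    return sorted((bucket, mean(vals)) for bucket, vals in groups.items())


# Raw readings keyed by day number (made-up data for one building).
raw = [(0, 10.0), (0, 14.0), (1, 20.0), (8, 30.0)]
daily = roll_up(raw, lambda day: day)              # building level: one value per day
weekly = roll_up(daily, lambda day: day // 7)      # next level: reuses the daily stream
```

The weekly level never touches raw readings, which is the sharing of lower-level streams named on the slide. Note that an average of daily averages weights each day equally rather than each reading; a real system would carry counts or sums upward if exact averages were required.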

Page 37: HiFi: Network-centric Query Processing in the Physical World

Complex Event Processing

• Needed for monitoring and actuation
• Key to prioritization (e.g., of detail data)
• Exploit duality of data and events
• Shared Processing
• “Semantic Windows”
• Challenge: a single system that simultaneously handles events spanning seconds to years.

Page 38: HiFi: Network-centric Query Processing in the Physical World

Next Steps

• Archiving and Detail Data
  • Dealing with transient overloads
  • Rate matching between stored and streaming data
  • Scheduling large archive transfers
• System design & deployment
  • Tools for provisioning and evaluating receptor networks
• System monitoring & management
  • Leverage monitoring infrastructure for introspection

Page 39: HiFi: Network-centric Query Processing in the Physical World

Conclusions

• Receptors everywhere → High Fan-In Systems
• Current middleware solutions are complex & brittle
• Uniform declarative framework is the key
• The HiFi project is exploring this approach
• Our initial prototype
  • Leveraged TelegraphCQ and TinyDB
  • Demonstrated RFID/multiple sensor integration
  • Validated the HiFi approach
• We have an ambitious on-going research agenda
• See http://hifi.cs.berkeley.edu for more info.

Page 40: HiFi: Network-centric Query Processing in the Physical World

Acknowledgements

• Team HiFi: Shawn Jeffery, Sailesh Krishnamurthy, Frederick Reiss, Shariq Rizvi, Eugene Wu, Nathan Burkhart, Owen Cooper, Anil Edakkunni

• Experts in VICE: Gustavo Alonso, Wei Hong, Jennifer Widom

• Funding and/or Reduced-Price Gizmos from NSF, Intel, UC MICRO program, and Alien Technologies