HiFi: Network-centric Query Processing in the Physical World
Mike Franklin, UC Berkeley
SAP Research Forum, February 2005
Mike Franklin UC Berkeley EECS
Introduction
• Receptors everywhere!
  • Wireless sensor networks, RFID technologies, digital homes, network monitors, ...
• Large-scale deployments will take the form of High Fan-In Systems
High Fan-in Systems
• Large numbers of receptors = large data volumes
• Hierarchical, successive aggregation
[Figure: the “Bowtie”]
High Fan-in Example (SCM)
[Figure: supply-chain hierarchy, from edge to core]
• RFID receptors
• Dock doors, Shelves
• Warehouses, Stores
• Regional Centers
• Headquarters
Properties
• High Fan-In, globally-distributed architecture.
• Large data volumes generated at edges.
  • Filtering and cleaning must be done there.
• Successive aggregation as you move inwards.
  • Summaries/anomalies continually, details later.
• Strong temporal focus.
• Strong spatial/geographic focus.
• Streaming data and stored data.
• Integration within and across enterprises.
Design Space: Time
Time scale, from seconds to years:
• Filtering, Cleaning, Alerts
• Monitoring, Time-series
• Data mining (recent history)
• Archiving (provenance and schema evolution)
Processing mode along the same axis: on-the-fly processing, stream/disk processing, disk-based processing
Design Space: Geography
Geographic scope, from local to global:
• Filtering, Cleaning, Alerts
• Monitoring, Time-series
• Data mining (recent history)
• Archiving (provenance and schema evolution)
Locations along the same axis: several readers, regional centers, central office
Design Space: Resources
Individual resources, from tiny to huge:
• Filtering, Cleaning, Alerts
• Monitoring, Time-series
• Data mining (recent history)
• Archiving (provenance and schema evolution)
Platforms along the same axis: devices, Stargates/desktops, clusters/grids
Design Space: Data
Axes: degree of detail, aggregate data volume
• Filtering, Cleaning, Alerts
• Monitoring, Time-series
• Data mining (recent history)
• Archiving (provenance and schema evolution)
Data kept along the same axis:
• Dup Elim (history: hrs)
• Interesting Events (history: days)
• Trends/Archive (history: years)
State of the Art
• Current approaches: hand-coded, script-based
  • expensive, one-off, brittle, hard to deploy and keep running
• Piecemeal/stovepipe systems
  • Each type of receptor (RFID, sensors, etc.) handled separately
• Standards efforts not addressing this:
  • Protocol design bent
  • Different “data models” at each level
  • Reinventing “query languages” at each level
No end-to-end, integrated middleware for managing distributed receptor data
HiFi
• A data management infrastructure for high fan-in environments
• Uniform Declarative Framework
  • Every node is a data stream processor that speaks SQL-ese: stream-oriented queries at all levels
  • Hierarchical, stream-based views as an organizing principle
Why Declarative? (database dogma)
• Independence: data, location, platform
  • Allows the system to adapt over time
• Many optimization opportunities
  • In a complex system, automatic optimization is key.
  • Also, optimization across multiple applications.
• Simplifies Programming
  • ???
Building HiFi
Integrating RFID & Sensors (the “loudmouth” query)
A Tale of Two Systems
• TinyDB
  • Declarative query processing for wireless sensor networks
  • In-network aggregation
  • Released as part of TinyOS Open Source Distribution
• TelegraphCQ
  • Data stream processor
  • Continuous, adaptive query processing with aggressive sharing
  • Built by modifying PostgreSQL
  • Open source “beta” release out now; new release soon
TinyDB
• The Network is the Database:
  • Basic idea: treat the sensor net as a “virtual table”.
  • System hides details/complexities of devices, changing topologies, failures, …
  • System is responsible for efficient execution.
• Developed on TinyOS/Motes
http://telegraph.cs.berkeley.edu/tinydb
SELECT MAX(mag)
FROM sensors
WHERE mag > thresh
SAMPLE PERIOD 64ms
[Figure: the App sends a query or trigger to TinyDB, which runs it over the sensor network and returns data]
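The idea behind in-network aggregation for a query like SELECT MAX(mag) can be sketched in a few lines of Python. This is an illustration only, not TinyDB code: the `Mote` class, its `partial_max` method, and the sample tree are hypothetical names chosen for the example. Each node applies the WHERE predicate to its own reading, merges its children's partial results, and forwards a single value to its parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Mote:
    """Hypothetical sensor node in a routing tree (not a TinyDB API)."""
    mag: float
    children: List["Mote"] = field(default_factory=list)

    def partial_max(self, thresh: float) -> Optional[float]:
        # Apply the WHERE predicate to the local reading.
        candidates = [self.mag] if self.mag > thresh else []
        # Merge the partial results reported by the children.
        for child in self.children:
            child_max = child.partial_max(thresh)
            if child_max is not None:
                candidates.append(child_max)
        # At most one value travels up the tree from this node.
        return max(candidates) if candidates else None


# Assumed example topology: a root with one relay (two leaves) and one leaf.
relay = Mote(mag=1.0, children=[Mote(mag=3.0), Mote(mag=9.5)])
root = Mote(mag=4.2, children=[relay, Mote(mag=7.1)])

result = root.partial_max(thresh=2.0)
print(result)
```

The point of the design is that each node sends one value per sample period instead of relaying every raw reading, which is what makes aggregation inside the network cheaper than aggregation at the base station.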
TelegraphCQ: Data Stream Monitoring
• Streaming Data
  • Network monitors
  • Sensor Networks, RFID
  • News feeds, Stock tickers, …
• B2B and Enterprise apps
  • Trade Reconciliation, Order Processing, etc.
• (Quasi) real-time flow of events and data
  • Manage these flows to drive business processes.
  • Can mine flows to create and adjust business rules.
  • Can also “tap into” flows for on-line analysis.
http://telegraph.cs.berkeley.edu
Data Stream Processing
[Figure: Traditional Database vs. Data Stream Processor, each shown with Queries and Data as inputs and Result Tuples as output]
• Data streams are unending
• Continuous, long-running queries
• Real-time processing
Windowed Queries
A typical streaming query:
SELECT S.city, AVG(temp)
FROM SOME_STREAM S [range by '5 seconds' slide by '5 seconds']
WHERE S.state = 'California'
GROUP BY S.city
Window clause:
• range by: “I want to look at 5 seconds worth of data”
• slide by: “I want a result tuple every 5 seconds”
[Figure: a window moving along the data stream, producing result tuple(s) for each window position]
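To make the window semantics concrete, here is a minimal Python sketch of the query above for the case where range and slide are equal, so windows do not overlap. It is not TelegraphCQ code: the `windowed_avg` function, the tuple layout, and the sample readings are assumptions made for illustration, and the sketch only emits results for non-empty windows.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumed tuple layout: (timestamp_sec, state, city, temp)
Reading = Tuple[float, str, str, float]
WindowResult = Tuple[float, Dict[str, float]]


def _averages(temps: Dict[str, List[float]]) -> Dict[str, float]:
    # GROUP BY city with AVG(temp)
    return {city: sum(vals) / len(vals) for city, vals in temps.items()}


def windowed_avg(readings: Iterable[Reading],
                 range_sec: float = 5.0) -> Iterator[WindowResult]:
    """Yield (window_end, {city: avg_temp}); input must be in timestamp order."""
    window_end = None
    temps: Dict[str, List[float]] = defaultdict(list)
    for ts, state, city, temp in readings:
        if window_end is None:
            window_end = (ts // range_sec + 1) * range_sec
        # Close every window that ended before this reading arrived.
        while ts >= window_end:
            if temps:
                yield window_end, _averages(temps)
            temps = defaultdict(list)
            window_end += range_sec
        if state == "California":  # WHERE S.state = 'California'
            temps[city].append(temp)
    if temps:
        yield window_end, _averages(temps)


stream = [
    (0.5, "California", "Berkeley", 60.0),
    (1.0, "California", "Berkeley", 62.0),
    (2.0, "Oregon", "Portland", 50.0),
    (4.9, "California", "Fresno", 80.0),
    (5.1, "California", "Berkeley", 64.0),
    (11.0, "California", "Fresno", 90.0),
]
results = list(windowed_avg(stream))
for window_end, averages in results:
    print(window_end, averages)
```

A slide smaller than the range would produce overlapping windows, which this sketch does not handle.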
TelegraphCQ Architecture
[Figure: TelegraphCQ architecture, with these labeled components]
• TelegraphCQ Front End: Proxy, Listener, Parser, Planner, Mini-Executor, Catalog
• TelegraphCQ Wrapper ClearingHouse: Wrappers
• TelegraphCQ Back End (shown twice): CQEddy, Modules, Scans, Split
• Shared Memory: Query Plan Queue, Eddy Control Queue, Query Result Queues
• Shared Memory Buffer Pool
• Disk
The HiFi System
[Figure: components of the HiFi system: TelegraphCQ, TinyDB, PC, Stargates, RFID Wrappers, Sensor Networks & RFID Readers]
Basic HiFi Architecture
• Hierarchical federation of nodes
• Each node:
  • Data Stream Query Processor (DSQP)
  • HiFi Glue
• Views drive system functionality
• Metadata Repository (MDR)
HiFi Glue:
• DSQP Management
• Query Planning
• Archiving
• Internode coordination and communication
[Figure: a hierarchy of nodes, each pairing HiFi Glue with a DSQP, together with the MDR]
HiFi Processing Pipelines
The CSAVA Framework
Stages, in order of increasing generalization:
• Clean: single tuple
• Smooth: window
• Arbitrate: multiple receptors
• Validate: join w/ stored data
• Analyze: on-line data mining
CSAVA: Processing
Clean
CREATE VIEW cleaned_rfid_stream AS
(SELECT receptor_id, tag_id
 FROM rfid_stream rs
 WHERE read_strength >= strength_T)
CSAVA: Processing
Smooth
CREATE VIEW smoothed_rfid_stream AS
(SELECT receptor_id, tag_id
 FROM cleaned_rfid_stream [range by '5 sec', slide by '5 sec']
 GROUP BY receptor_id, tag_id
 HAVING count(*) >= count_T)
CSAVA: Processing
Arbitrate
CREATE VIEW arbitrated_rfid_stream AS
(SELECT receptor_id, tag_id
 FROM smoothed_rfid_stream rs [range by '5 sec', slide by '5 sec']
 GROUP BY receptor_id, tag_id
 HAVING count(*) >= ALL
   (SELECT count(*)
    FROM smoothed_rfid_stream [range by '5 sec', slide by '5 sec']
    WHERE tag_id = rs.tag_id
    GROUP BY receptor_id))
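The Clean, Smooth, and Arbitrate stages can be sketched in Python for a single window of readings. This is an illustration under stated assumptions, not the HiFi implementation: the function names, the tuple layout, the sample data, and the threshold values are hypothetical. The sketch also carries the per-window read count forward from Smooth so that Arbitrate can compare receptors by how often each one saw a tag, which is one reading of the intent of the views above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Read = Tuple[str, str, float]  # assumed layout: (receptor_id, tag_id, read_strength)
Pair = Tuple[str, str]         # (receptor_id, tag_id)


def clean(reads: Iterable[Read], strength_t: float) -> List[Pair]:
    # Clean: drop weak readings, one tuple at a time.
    return [(receptor, tag) for receptor, tag, strength in reads
            if strength >= strength_t]


def smooth(pairs: Iterable[Pair], count_t: int) -> Dict[Pair, int]:
    # Smooth: keep (receptor, tag) pairs seen at least count_t times in the window.
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n >= count_t}


def arbitrate(counts: Dict[Pair, int]) -> Set[Pair]:
    # Arbitrate: assign each tag to the receptor(s) that read it most often.
    best: Dict[str, int] = {}
    for (_, tag), n in counts.items():
        best[tag] = max(best.get(tag, 0), n)
    return {(receptor, tag) for (receptor, tag), n in counts.items()
            if n == best[tag]}


# One hypothetical 5-second window of raw readings.
window = (
    [("shelf0", "tagA", 0.9)] * 3
    + [("shelf1", "tagA", 0.8)] * 2
    + [("shelf1", "tagA", 0.2)]   # too weak, removed by clean
    + [("shelf1", "tagB", 0.7)]   # seen only once, removed by smooth
    + [("shelf1", "tagC", 0.9)] * 2
)
located = arbitrate(smooth(clean(window, strength_t=0.5), count_t=2))
print(sorted(located))
```

As with the >= ALL condition in the view, ties are kept: a tag read equally often by two receptors is reported at both.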
CSAVA: Processing
Validate
CREATE VIEW validated_tags AS
(SELECT tag_name
 FROM arbitrated_rfid_stream rs [range by '5 sec', slide by '5 sec'],
      known_tag_list tl
 WHERE tl.tag_id = rs.tag_id)
CSAVA: Processing
Analyze
CREATE VIEW tag_count AS
(SELECT tag_name, count(*)
 FROM validated_tags vt [range by '5 min', slide by '1 min']
 GROUP BY tag_name)
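Unlike the earlier stages, the Analyze view uses a window whose range (5 min) is longer than its slide (1 min), so consecutive windows overlap. A minimal Python sketch of that behavior follows. The function name, the half-open window boundaries, and the sample events are assumptions for illustration, and the sketch recomputes each window from scratch rather than sharing work between overlapping windows.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # assumed layout: (timestamp_sec, tag_name)


def sliding_tag_counts(events: List[Event], end_sec: int,
                       range_sec: int = 300,
                       slide_sec: int = 60) -> List[Tuple[int, Dict[str, int]]]:
    """Emit one {tag_name: count} result per slide, each covering range_sec."""
    results = []
    for window_end in range(slide_sec, end_sec + 1, slide_sec):
        window_start = window_end - range_sec
        # Count events in the half-open interval [window_start, window_end).
        counts = Counter(tag for ts, tag in events
                         if window_start <= ts < window_end)
        results.append((window_end, dict(counts)))
    return results


events = [(10, "milk"), (70, "milk"), (130, "soap"), (350, "soap")]
counts_per_minute = sliding_tag_counts(events, end_sec=360)
for window_end, counts in counts_per_minute:
    print(window_end, counts)
```

In the last window, which covers seconds 60 to 360, the reading at second 10 has aged out while the one at second 350 has entered, so a single event can contribute to up to five consecutive results.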
Ongoing Work
• Bridging the physical-digital divide
  • VICE – A “Virtual Device” Interface
• Hierarchical query processing
  • Automatic query planning & dissemination
• Complex event processing
  • Unifying event and data processing
Virtual Device (VICE) Layer
[Figure: RFID devices]
“Metaphysical* Data Independence”
*The branch of philosophy that deals with the ultimate nature of reality and existence. (name due to Shawn Jeffery)
The Virtues of VICE
• A simple RFID Experiment
  • 2 adjacent shelves, 8 ft each
  • 10 EPC-tagged items each, plus 5 moved between them.
  • RFID antenna on each shelf.
Ground Truth
Raw RFID Readings
After VICE Processing
Under the covers (in this case): Cleaning, Smoothing, and Arbitration
Other VICE Uses
• Once you have the right abstractions:
  • “Soft Sensors”
  • Quality and lineage streams
  • Pushdown of external validation information
  • Power management and other optimizations
  • Data Archiving
  • Model-based sensing
  • “Non-declarative” code
  • …
Hierarchical Query Processing
“I provide raw readings for Soda Hall”
“I provide avg daily values for Berkeley”
“I provide avg weekly values for California”
“I provide national monthly values for the US”
• Continuous and Streaming
  • Automatic placement and optimization
• Hierarchical
  • Temporal granularity vs. geographic scope
• Sharing of lower-level streams
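One practical detail when averages are rolled up through a hierarchy like this is that an average of averages is generally wrong when the groups differ in size. A common remedy, sketched below in Python, is for each level to pass up a mergeable partial state (sum and count) rather than a finished average. This is an illustrative assumption, not a description of how HiFi does it; the `Partial` class, the helper functions, the "Cory Hall" node, and all readings are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Partial:
    """Mergeable partial state for AVG: a running sum and a count."""
    total: float = 0.0
    count: int = 0

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.total + other.total, self.count + other.count)

    @property
    def avg(self) -> float:
        return self.total / self.count


def summarize(readings: Iterable[float]) -> Partial:
    # Lowest level: turn raw readings into a partial state.
    values = list(readings)
    return Partial(sum(values), len(values))


def roll_up(parts: Iterable[Partial]) -> Partial:
    # Higher levels: merge the partial states received from below.
    merged = Partial()
    for part in parts:
        merged = merged.merge(part)
    return merged


soda_hall = summarize([20.0, 22.0])   # raw readings for one building
cory_hall = summarize([30.0])         # hypothetical second building
berkeley = roll_up([soda_hall, cory_hall])
fresno = summarize([40.0])            # hypothetical second city
california = roll_up([berkeley, fresno])

print(berkeley.avg, california.avg)
```

Because the partial state is small and merges associatively, one lower-level stream can feed several higher-level queries, in the spirit of the sharing of lower-level streams noted above.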
Complex Event Processing
• Needed for monitoring and actuation
• Key to prioritization (e.g., of detail data)
• Exploit duality of data and events
• Shared Processing
• “Semantic Windows”
• Challenge: a single system that simultaneously handles events spanning seconds to years.
Next Steps
• Archiving and Detail Data
  • Dealing with transient overloads
  • Rate matching between stored and streaming data
  • Scheduling large archive transfers
• System design & deployment
  • Tools for provisioning and evaluating receptor networks
• System monitoring & management
  • Leverage monitoring infrastructure for introspection
Conclusions
• Receptors everywhere lead to High Fan-In Systems
• Current middleware solutions are complex & brittle
• Uniform declarative framework is the key
• The HiFi project is exploring this approach
• Our initial prototype
  • Leveraged TelegraphCQ and TinyDB
  • Demonstrated RFID/multiple sensor integration
  • Validated the HiFi approach
• We have an ambitious on-going research agenda
• See http://hifi.cs.berkeley.edu for more info.
Acknowledgements
• Team HiFi: Shawn Jeffery, Sailesh Krishnamurthy, Frederick Reiss, Shariq Rizvi, Eugene Wu, Nathan Burkhart, Owen Cooper, Anil Edakkunni
• Experts in VICE: Gustavo Alonso, Wei Hong, Jennifer Widom
• Funding and/or Reduced-Price Gizmos from NSF, Intel, UC MICRO program, and Alien Technologies