1 data dissemination peyman teymoori. 2 introduction data dissemination: the process by which...

1

Data Dissemination

Peyman Teymoori

2

Introduction

Data dissemination: the process by which queries or data routed in the network

Source: the node generating data Event: the information to be reported Sink: the node interested in an event Two general steps:

Interest propagation: temperature, intrusion Data propagation: routing, aggregation

3

Routing Models

Address Centric Each source independently send data to sink

Data Centric Routing nodes en-route look at data sent

Source 2

Source 1

Sink

BA

Source 2

Source 1

Sink

BA

AC Routing

DC Routing

4

Differences with Current Networks Difficult to pay special attention to any individual

node: Collecting information within the specified region Collaboration between neighbors

Sensors may be inaccessible: embedded in physical structures. thrown into inhospitable terrain.

Sensor networks deployed in very large ad hoc manner No static infrastructure

5

Differences with Current Networks They will suffer substantial changes as nodes fail:

battery exhaustion accidents new nodes are added.

User and environmental demands also contribute to dynamics: Nodes move Objects move

Data-centric and application-centric Location aware Time aware

6

One possible solution? Internet technology coupled with ad-hoc routing

mechanism Each node has one IP address Each node can run applications and services Nodes establish an ad-hoc network amongst

themselves when deployed Application instances running on each node can

communicate with each other

Overall Design of Sensor Networks

7

Why Different and Difficult?

A sensor node is not an identity (address) Content based and data centric

Where are nodes whose temperatures will exceed more than 10 degrees for next 10 minutes?

Tell me the location of the object ( with interest specification) every 100ms for 2 minutes.

Multiple sensors collaborate to achieve one goal. Intermediate nodes can perform data aggregation

and caching in addition to routing.where, when, how?

8

Energy-limited nodes Computation

Aggregate data Suppress redundant routing information

Communication Bandwidth-limited Energy-intensive

Scalability: ad-hoc deployment in large scale Robustness: unexpected sensor node failures Dynamic changes: no a-priori knowledge, mobility

Goal: Minimize energy dissipation

Challenges

9

Aggregation Many studies addressing not only the routing problem

but also representing and combining data more efficiently

Process of data while being forwarded toward the sink

Reducing the number of transmissions Definition:

Gathering & routing information in a multihop network Processing data at intermediate nodes to

Reduce resource consumption Increase network lifetime

10

Aggregation Two approaches:

In-network with size reduction More ability to reduce traffic Less accuracy Difficulty in reconstructing the original data

In-network without size reduction Merging some smaller packets into one

Requires: Networking protocol: routing Effective aggregation functions Data representation

11

Aggregation A different routing paradigm is required Data-centric routing: nodes route packets based on

packet content Taking into account:

The most suitable aggregation points Data type Priority of information

Timing strategies: Periodic simple aggregation Periodic per-hop aggregation Periodic per-hop adjusted aggregation

depends on the node’s position in the gathering tree

12

Aggregation

Aggregation functions: Lossy & lossless Duplicate sensitive vs. duplicate insensitive

AVG vs. MAX Data representation:

limited storage capabilities vary according to the application requirements distributed source coding: A method to deal

with data representation and compression

13

Network Protocols for In-Network Aggregation How to forward packets in order to facilitate

in-network aggregation Several approaches:

Tree-based (Hierarchical): SPTs rooted at sink, or nodes grouped into clusters

Multi-path routing: DAG, more robust Hybrid approaches

14

Flooding

Just broadcast what you receive and is not yours (consider the max hop count)

Disadvantages: Implosion

A

B C

D

(a)

(a)

(a)

(a)

A B

C (r,s)(q,r)

q sr

Data overlap

Resource blindness

15

Gossiping

A modified version of flooding Random selection of neighbors for broadcast Avoids implosion Disadvantages:

Taking a long time to propagate No delivering guarantee

16

Rumor Routing

An agent-based path creation algorithm Agents or “ants”:

long-lived entities created at random by nodes packets circulated to establish shortest paths

to events they encounter perform path optimization

Motivation Sometimes a non-optimal route is satisfactory

17

Rumor Routing Creating Paths:

Nodes having observed an event send out agents which leave routing info to the event as state in nodes

Agents attempt to travel in a straight line

If an agent crosses a path to another event, it begins to build the path to both

Agent also optimizes paths if they find shorter ones.

18

Rumor Routing Basis for algorithm:

Observation: Two lines in a bounded rectangle have a 69% chance of intersecting

Create a set of straight line gradients from event, then send query along a random straight line from source.

Event

Source

19

Rumor Routing

(a) (b)

20

Rumor Routing

Advantages: Tunable best effort delivery Tunable for a range of query/event ratios

Disadvantages: Optimal parameters depend heavily on

topology (but can be adaptively tuned) Does not guarantee delivery

21

Sensor Protocols for Information via Negotiation (SPIN) A data-centric routing approach Uses negotiation & resource adaptation to

address deficiencies of flooding Two basic ideas:

Exchanging sensor data may be expensive, but exchanging data about sensor data may not be.

Nodes need to monitor and adapt to changes in their own energy resources

22

Sensor Protocols for Information via Negotiation (SPIN) Data negotiation

Meta-data (data naming) Application-level control Model “ideal” data paths

SPIN messages ADV- advertise data REQ- request specific data DATA- requested data

Resource management

A B

A B

A B

ADV

REQ

DATA

23

Sensor Protocols for Information via Negotiation (SPIN)

B

AADVREQDATA

ADV

AD

VADV

ADV

AD

V ADV

REQ

REQ

REQ

REQ

REQ

DATADA

TA

DATA

DA

TA

DATA

24

Cost-Field Approach

Sets up minimum cost paths to a sink A two-phase process:

Set up the cost field (metrics such as delay) Data dissemination using the cost

At each node, cost = min cost to the sink No explicit path information

25

Cost-Field Approach Setting up the cost field:

Sink broadcasts: ADV + its cost as 0 A node N hears an ADV from M:

Its path Ln : total cost from node N to sink Lm : the cost of node M to sink Cnm : the cost from node N to M

If Ln was updated, the new cost is broadcasted (new ADV)

Flooding-based implementation of Dijkstra’s algorithm A back-off-based approach, Time to defer = γ * Cmn

γ is a parameter of the algorithm

min( , )N M NMcost L L C

26

Cost-Field Approach

An example of setting up the cost field γ = 10

27

Cost-Field Approach

Data dissemination: A source sends a message

cost = Cs cost-so-far = 0

In an intermediate node M (with cost = Cm): If cost-so-far + Cm = Cs then

Forward the packet

28

Directed Diffusion

A reactive data-centric protocol Suitable for monitoring Expressed in terms of named data Organized in three phases:

Interest dissemination Gradient setup Path reinforcement & forwarding

29

Directed Diffusion

Naming A list of attribute – value pairs Animal tracking:

Interest ( Task ) DescriptionType = four-legged animalInterval = 20 msDuration = 1 minuteLocation = [-100, -100; 200, 400]

RequestRequestNode dataType =four-legged animalInstance = elephantLocation = [125, 220]Confidence = 0.85Time = 02:10:35

ReplyReply

30

Directed Diffusion Interest

The sink periodically broadcasts interest messages

Every node maintains an interest cache Each item corresponds to a distinct interest No information about the sink Interest aggregation : identical type, completely

overlap rectangle attributes Each entry in the cache has several fields

Timestamp: last received matching interest Several gradients: data rate, duration, direction

31

Directed Diffusion Setting Up Gradient

Source

Sink

Interest = InterrogationGradient = Who is interested(data rate , duration, direction)

Neighbor’s choices :1. Flooding 2. Geographic routing3. Cache data to direct interests

32

Directed Diffusion Data propagation

Sensor node computes the highest requested event rate among all its outgoing gradients

When a node receives a data: Find a matching interest entry in its cache

Examine the gradient list, send out data by rate Cache keeps track of recent seen data items

(loop prevention) Data message is sent individually to the

relevant neighbors (unicast)

33

Directed Diffusion

Reinforcing the best path

Source

Sink

Low rate event Reinforcement = Increased interest

The neighbor reinforces a path:1. At least one neighbor2. Choose the one from whom it first received the latest event (low delay)3. Choose all neighbors from which new events were recently received

34

Directed Diffusion The reinforced path must be periodically

refreshed A trade off based on network dynamics:

Frequency of gradient setup Achieved performance

MAC layer issues: Keeping local (control) traffic at a low level

Avoid collision, delay Enhanced Directed Diffusion

Joint of Directed Diffusion & cluster-based arch.

35

Tiny AGgregation (TAG)

A tree-based data-centric approach Timing: periodic per hop adjusted Two main phases:

Distribution: disseminating queries Collection: aggregating & routing readings

Declarative interface for data collection and aggregation – SQL style

36

Tiny AGgregation (TAG) A sample query:

SELECT: an expression over one or more aggregation values expr: the name of a single attribute agg: aggregation function attrs: the attributes by which the sensor readings are partitioned WHERE, HAVING: filters out irrelevant readings GROUP BY: specifies an attribute based partitioning of readings EPOCH DURATION: time interval of aggr record computation

SELECT {agg(expr), attrs} from SENSOR WHERE {selPreds} GROUP BY {attrs} HAVING {havingPreds} EPOCH DURATION i

37


Distribution: The sink broadcasts an organizing message

Message contains level & distance from the root Node receiving the message:

If it doesn’t belongs to any level: Its level = message.level + 1 Its parent = the sender node Rebroadcasts the message adding its own ID & level

Broadcasting the query along the structure

38


Collection: Each parent waits for data Then sends its aggregation up the tree Epochs are divided slots equals to the max

depth of the tree – Sleep & Wake up Every epoch, new aggregate produced Most of the times, motes are idle and in low

power state

39


1 2 3 4 5

4 1

3

2

1

4

1

2 3

4

51

Sensor #

<- T

ime

SELECT COUNT(*) FROM sensors

40


1 2 3 4 5

4 1

3 2

2

1

4

1

2 3

4

5

2

Sensor #


<- T

ime

41


1 2 3 4 5

4 1

3 2

2 1 3

1

4

1

2 3

4

5

31

Sensor #


<- T

ime

42


1 2 3 4 5

4 1

3 2

2 1 3

1 5

4

1

2 3

4

5

5Sensor #


<- T

ime

43


1 2 3 4 5

4 1

3 2

2 1 3

1 5

4 1

1

2 3

4

51

Sensor #


<- T

ime

44


A grouping example

45

Synopsis Diffusion Synopsis Diffusion : In-network aggregation Synopsis Functions

Synopsis Generation: s=SG(r) Synopsis Fusion: s=SF(s1,s2) Synopsis Evaluation: r*=SE(s)

Efficient Topology for Synopsis Diffusion Rings (R0 R1 … Ri-1 Ri )

Duplicate Sensitive Aggregates Mapping DS Aggregates Order- and duplicate-

insensitive synopsis

46

Synopsis Diffusion

Aggregate : A metric of aggregationClass of Aggregates

Not T

AGCo

unt M

IN Hist

ogra

mAv

erag

e

Coun

t Dist

inct

Med

ian

Sink

47

Synopsis Diffusion

[ ]

48

Synopsis Diffusion Phases of SD:

Distribution Phase Aggregate query is flooded through the network Network node form a set of rings

Aggregation Phase Each node uses SG to convert local data to local

synopsis and then uses SF to merge two synopsis to create a new one. The query initiator uses the SE to generate the final result.

Adapting the Topology Ring Topology Adaptive Ring Topology Nodes moves up or down in the rings dependent upon

the messages it overhears.

49

Delay Bounded Medium Access Control (DB-MAC) A tree-based aggregation scheme A joint design of routing & MAC protocols Minimizes the latency for delay bounded

apps. Takes advantage of data aggregation Adopts CSMA/CA scheme based on

RTS/CTS/DATA/ACK handshake Suitable for cases where:

Different sources sense an event There is delay constraint

50

Delay Bounded Medium Access Control (DB-MAC) A message exchange example in DB-MAC

51

Delay Bounded Medium Access Control (DB-MAC) Dynamically aggregates data while forwarding toward

the sink Aggregation tree is built on the fly No knowledge of the network topology

RTS/CTS are exploited to do aggregation Back-off intervals respect priorities By overhearing CTSs, the relay node is chosen Advantages:

Flexible & distributed construction of the tree Suitable for dynamic topologies Energy-efficient due to cross-layer design

52

Tributaries and Deltas

A hybrid approach: tree-based & multipath Idea

In low packet loss rate regions, use tree (T) In high loss rate regions, use multipath (M)

How to link the regions: M nodes only send data to M nodes M nodes subgraph includes the sink

Thresholds for M node percentage

53

Tributaries and Deltas

1 data dissemination peyman teymoori. 2 introduction data dissemination: the process by which...

Documents