Communication and Compression in Dense Networks of Unreliable Nodes
by
Dragan Rade Petrović
B.S. (University of Illinois at Urbana-Champaign) 1999
M.S. (University of California, Berkeley) 2001
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy in
Engineering - Electrical Engineering and Computer Sciences
in the
GRADUATE DIVISION
of the
UNIVERSITY OF CALIFORNIA, BERKELEY
Committee in charge:
Professor Kannan Ramchandran, Chair
Professor Jan Rabaey
Professor Paul Wright
Spring 2005
The dissertation of Dragan Rade Petrović is approved:
Chair Date
Date
Date
University of California, Berkeley
Spring 2005
Communication and Compression in Dense Networks of Unreliable Nodes
Copyright 2005
by
Dragan Rade Petrović
Abstract
Communication and Compression in Dense Networks of Unreliable Nodes
by
Dragan Rade Petrović
Doctor of Philosophy in Engineering –
Electrical Engineering and Computer Sciences
University of California, Berkeley
Professor Kannan Ramchandran, Chair
The drive toward the implementation and massive deployment of wireless sensor
networks calls for ultra-low-cost, low-power and ever smaller nodes. While the
digital subsystems of the nodes are still experiencing exponential reduction of all of
these metrics as described by Moore's Law, there is no such trend regarding the
performance of analog components needed for the radios that enable the nodes to
communicate wirelessly with one another. This dissertation presents a two-part
approach to reducing the energy consumption of the radios. First, a new radio
architecture is presented that greatly reduces the power required to operate a
transceiver, as well as the cost and size of the nodes. Second, a novel
distributed compression scheme is introduced that allows the sensor nodes to
compress their data in order to reduce the amount of communication that the
radios must perform.
The dissertation presents a fully integrated architecture of both digital and analog
components (including local oscillator) that offers significant reduction in cost, size
and power consumption of the overall node. Even though such a radical
architecture cannot offer the reliable tuning of standard designs, it is shown that by
using random network coding, a dense network of such nodes can achieve
throughput linear in the number of channels available for communication.
Moreover, the ratio of the achievable throughput of the untuned network to the
throughput of a tuned network with perfect coordination is shown to be close to
1/e. By contrast, it is also shown that if coding is not used (i.e. if nodes are only
allowed to forward packets without processing them), the performance does not
improve with increased density and available spectrum.
To reduce the amount of communication required among nodes, a novel
approach to reducing energy consumption in sensor networks is proposed, based
on a distributed adaptive signal processing framework and an efficient algorithm.
Specifically, the dissertation presents a distributed way of continuously exploiting
existing correlations in sensor data based on adaptive signal processing and
distributed source coding principles. This approach enables sensor nodes to
blindly compress their readings with respect to one another without the need for
explicit and energy-expensive inter-sensor communication to effect this
compression. Furthermore, the distributed algorithm used by each sensor node
is extremely low in complexity and easy to implement (i.e., one modulo
operation), while an adaptive filtering framework is used at the data-gathering
unit to continuously learn the relevant correlation structures in the sensor data.
Applying the algorithm to testbed data resulted in energy savings of 10%-65% for
a multitude of sensor modalities.
Both the network coding for communication with untuned radios and the
distributed source coding schemes require minimal complexity from the low-
power sensor nodes. Instead, the complexity of the system is pushed toward the
edge of the network where a gateway between the wireless network and the
wired world resides.
TABLE OF CONTENTS
List of Figures ........................................................................................ iv
List of Tables.......................................................................................... vi
Acknowledgments................................................................................. vii
Chapter 1: Introduction ..........................................................................1
1.1 Trading-off Radio Power Dissipation for Tunability....................5
1.2 Distributed Compression of Sensor Node Data.........................8
1.3 Contributions ............................................................................12
Chapter 2: Reliable Communication Using Untuned Radios ..............14
2.1 Drawbacks of Untuned Radios ................................................17
2.2 Multi-Hop Communication Method...........................................22
2.3 Practical Justification of the Model...........................................26
2.4 Analysis ....................................................................................29
2.4.1 Transition Probabilities ....................................................30
2.4.2 Robustness Grows Exponentially with Density ...............33
2.4.3 Stability and Optimal Ratio of Density to
Channelization ................................................................37
2.5 System Level View...................................................................41
2.5.1 Comparison to Schemes Using Untuned Radios with
Wide Receive Filters.......................................................43
2.5.2 Comparison to Schemes Using Tuned Radios ...............43
Chapter 3: Throughput of Networks of Untuned Radios .....................46
3.1 Maximum throughput over One Hop........................................46
3.2 Maximum throughput over Many Hops....................................48
3.3 Achievable Throughput over Many Hops.................................49
3.3.1 Random Graph Representation of the Network
Connectivity ...............................................................................50
3.3.2 Max-Flow of the Random Graph .....................................51
3.4 Simulation Results ...................................................................59
Chapter 4: Distributed Compression ...................................................61
4.1 Background on Compression with Side Information................63
4.2 Code Construction for Distributed Compression .....................68
4.3 Correlation Tracking.................................................................72
4.3.1 Parameter Estimation ......................................................78
4.3.2 Decoding Error.................................................................83
4.4 Querying and Data Reporting Algorithm..................................85
4.4.1 Data-Gathering Node Algorithm......................................85
4.4.2 Sensor Node Algorithm ...................................................87
4.5 Simulation Results ...................................................................88
4.5.1 Correlation Tracking ........................................................88
4.5.2 Energy Savings ...............................................................93
4.5.3 Robustness to Errors.......................................................95
4.6 Conclusion................................................................................97
Chapter 5: Future Work .....................................................................100
Appendix A: Maximizing the Number of Channels with Exactly one
Transmitter ....................................................................101
Appendix B: Throughput of Routing without Coding in a Network of
Untuned Radios.............................................................104
Bibliography ........................................................................................109
LIST OF FIGURES
Number Page
1. An example sensor network topology in which considerable
gains can be achieved from distributed compression..............11
2. Signal bandwidth relative to process variation.........................19
3. Proposed multi-hop communication method ..........................23
4. Probability distribution of number of transmitters while the
packet is still alive.....................................................................34
5. Robustness v. Channelization for various values of α.............36
6. Probability of failing to transmit a packet over 10 hops v.
Channelization for various values of α.....................................37
7. Random graph representing connectivity in the network of
nodes with untuned radios .......................................................51
8. Simulation results: Throughput vs. number of input radios......60
9. Distributed compression set-up................................................62
10. Achievable rate regions in distributed compression ................66
11. A tree based construction for compression with side
information................................................................................69
12. An example of the tree based codebook .................................71
13. Adaptive filtering block used to form the side information
and decode the sensor reading................................................82
14. Tolerable noise and prediction noise for 18,000 samples of
humidity data ............................................................................90
15. Tolerable noise and prediction noise for 18,000 samples of
temperature data ......................................................................91
16. Tolerable noise and prediction noise for 18,000 samples of
light data ...................................................................................92
17. Random graph representing connectivity when only routing
is allowed................................................................................105
LIST OF TABLES
Number Page
1. Energy savings of LMS-based correlation tracking and
distributed compression scheme .............................................pp
2. Expected throughput of routing................................................pp
ACKNOWLEDGMENTS
The author…
Chapter 1
INTRODUCTION
Advances in wireless communication as well as embedded microprocessor design
and manufacturing have led to great interest in recent years in the possibility of
doing distributed sensing and control. As a result, the emerging field of wireless
sensor networks has become a very active area of both academic research [1]-[4]
and industrial development [5]-[9]. The goal is to design and produce tiny silicon-
based devices, usually referred to as “nodes,” that have some sensing capability
(e.g. light-sensor, thermometer, humidity-sensor, barometer, accelerometer,
magnetometer, etc.) along with some amount of memory and processing
capability, as well as the ability to communicate wirelessly with each other. These
nodes could then be deployed in some environment and used to observe, through
their sensing capability, some aspects of that environment. Deploying many such
nodes would provide fine-grained information about the state of the environment that
could then be used to take some action to control certain parameters. In order to
distribute the nodes densely, they must be made small. This size constraint also
imposes a limit on the processing power as well as the energy available to the
individual nodes. Therefore, performing complex tasks would require the
coordination and cooperation of many nodes.
The potential scenarios for use of sensor networks are far ranging. One of the
many applications for which sensor networks are already being used is monitoring
the structural integrity of buildings and bridges [10], in which nodes with
accelerometers are placed at key junctions of the structure to measure its
movement as a response to stresses due to seismic, tidal, and traffic activity.
Having this data allows for timely maintenance of the building or bridge that is also
more cost-effective than the traditional approach of performing maintenance and
reinforcement at certain time intervals, whether they are needed or not [10].
Another important application of sensor networks that is receiving a lot of attention
is environmental control within living and working spaces [11]. While fully a third of
all energy consumed in the developed world is spent on environmental control of
living and working spaces (the other two thirds are spent on agriculture,
manufacturing, and transportation), it is estimated that 80% of this energy is
wasted due to inadequate ability to measure the state of the environment and a
lack of fine-grained actuation to influence it [11]. Sensor networks are also used
for habitat monitoring in animal sanctuaries [12]. Using sensor networks allows
biologists to keep track of crucial environmental factors that affect the health of
animal and plant populations in different ecosystems. In more urban areas, sensor
networks are being deployed for highway traffic control [13]. Instead of tearing up
asphalt to install inductor coils to measure the flow of traffic, sensor nodes with
magnetometers are being used to monitor the traffic, and this data can be used in
real time to control the traffic by updating the patterns of traffic and metering lights
as well as suggested routes to trip-planning and global positioning system
(GPS)-based guidance systems [13]. Warehouse inventory tracking with sensor networks
is also gaining a lot of attention in industry [14].
In order to gain experience and early exposure, most initial efforts in the field of
sensor networks have relied on available “off the shelf” components (sensors,
processors, radios, and memory) to assemble the first sensor network nodes. The
main lesson of those early experiences was that the networks were very fragile.
The nodes frequently failed because they ran out of energy or because they were
damaged due to exposure to the environment (e.g. bumped or stepped on). The
communication channels between the nodes were also found to be unstable.
Depending on the topology of the environment, it is not uncommon to find nodes a
few meters apart that cannot reliably communicate with one another, while other
pairs of nodes that are tens of meters apart can. Perhaps most importantly, the
networks did not scale well because increased density resulted in greater
contention for the wireless communication medium causing collisions among the
transmissions. As a result of this, most commercial efforts have focused on
making the nodes more reliable and equipping them with better, more complex,
and more power-consuming radios, in order to make the links among the nodes
more reliable. These efforts have been successful in giving exposure in the
technical community to this emerging field through the deployment of wireless
sensor networks in the applications already mentioned. However, making the
nodes more reliable requires them to be costly, power-hungry, and large. This
dissertation argues that, in order to make wireless sensor networks truly
ubiquitous, the nodes must be made cheap, their power dissipation must be
comparable to the amount of power that can be scavenged from the environment,
and they must be made small so that they can be deployed with high density.
Also, a new class of protocols must be developed specifically for wireless sensor
networks that can benefit from, rather than suffer from, density. The benefit of such
an approach will only increase in the future with ever more demanding
applications.
The potential applications of sensor networks that are being considered in the
long-term include such science-fiction-like concepts as smart surfaces that can
respond to contact or serve as a communication backplane. Another possible
application in the long term is “skin” for airplanes that can provide real-time
monitoring of and alerts regarding the state of every square centimeter of the
aircraft’s surface. It may even be possible to eventually produce sensor nodes
small enough to be inserted into the blood-stream to provide real-time diagnostics
of factors such as blood pressure, blood flow, glucose and insulin levels, etc.
To make such deployments economically and technologically feasible, it is
necessary to drastically reduce the cost, size and energy consumption of the
nodes available today. Moore’s Law still provides for exponential reduction of
these metrics over time when it comes to the digital components that comprise the
memory, computation and coding in the nodes. However, there is no equivalent
trend to Moore’s Law that applies to the analog components needed for the radios
that enable the nodes to communicate with one another. This work introduces 1) a
new architecture for the analog radios and 2) a distributed source coding scheme
that allows the nodes to compress their readings in order to reduce the number of
bits that have to be transmitted by the radio, thereby reducing the energy
consumed.
1.1 TRADING-OFF RADIO POWER DISSIPATION FOR TUNABILITY
The proposed radio architecture can greatly reduce the cost, size (5x reduction)
and energy consumption (10x reduction) of the nodes. In fact it is expected that
the proposed architecture will allow the energy consumption of the nodes to be so
low that they could be fully powered by energy scavenged from the environment
[15]. The penalty for using such a radical architecture is that the radios become
untuned and it is no longer possible to guarantee that any arbitrary pair of nodes
will be able to communicate with each other. Instead, it becomes necessary to rely
on the density of nodes to make the overall network capable of providing reliable
communication.
Narrowband radios have been shown to be the architecture of choice for low-power
applications [6], [7], [16], as they are low in complexity and consume less power
than spread spectrum or other wide-band techniques. One fundamental
requirement of narrowband radios is that the transmitter’s carrier frequency and
the receiver’s detection frequency must be well-matched. This is traditionally
accomplished by employing a crystal at both the transmitter and receiver to
provide the same low frequency reference. This reference frequency is multiplied
via a phase-locked loop (PLL) to generate the carrier wave. However, the off-chip
crystal contributes significantly to the cost, size, and power consumption of such
transceivers. The cost is due to the external quartz crystals being more expensive
than the silicon used for the baseband signal processing as well as the need to
bond separate components. This problem is especially acute in the design of
highly integrated transceivers for wireless sensor networks. The size of traditional
low power transceivers is largely due to the external crystal reference and the
interface between the crystal and the silicon integrated circuit (IC). Additionally,
the power consumption of low power radios is dominated by the crystal referenced
PLL. Therefore, great savings in all three of these areas could be obtained by
eliminating the off-chip crystal and PLL.
Even when care is taken to ensure that all radios are tuned and are attempting to
communicate on the same frequency, reliable communication is not guaranteed.
Practical implementations of sensor networks are notorious for having unstable
links because narrowband communication is susceptible to deep fades between
nodes [17], [18]. Since it is not feasible to overcome these fades by transmitting
with more power (due to power-constraints), it has been proposed that randomized
algorithms be used to ensure reliable communication [19], [20]. Such algorithms
propose to provide reliable multi-hop communication by exploiting the broadcast
nature of wireless transmissions. The key idea is for a transmitting node to send a
beacon to many potential forwarding nodes and then select one node to be the
next hop for the packet among those that respond to the beacon. However,
collisions among the responses to the beacon as well as the time-varying quality of
the communication channels (a channel may be good during the beaconing, but
become bad during the response and/or data transmission) contribute significant
overhead to such schemes.
This dissertation proposes a fundamentally different way of designing and
operating a transceiver. The quartz crystal is eliminated and replaced by an on-
chip resonator such as an inductor-capacitor (LC)-circuit or a nano-
electromechanical resonant structure. This makes it possible to economically
produce millions of nodes and densely deploy them by, for example, weaving them
into fabrics or mixing them with paint. The proposed architecture allows a sensor
node to be developed entirely out of thin-film technologies (radio, digital
components, battery, energy scavenging, and sensing). However, the drawback
of such architectures is that the variations in the manufacturing process are large,
resulting in un-tuned radios. Therefore, two narrowband radios produced by such
a process are not likely to be able to communicate with each other. To address
this problem, a low-complexity communication protocol is proposed that makes
use of the high density of nodes to ensure reliable communication using such un-
tuned radios even without the need for handshaking protocols or re-transmission.
By eliminating the need for this kind of coordination, the protocol is also made
more robust to link failures, while the density that is made possible by such low
cost designs makes the network robust to the failure of individual nodes.
1.2 DISTRIBUTED COMPRESSION OF SENSOR NODE DATA
Motivated by the energy constraint in sensor networks, there has been
considerable recent interest in the area of energy-aware routing for ad hoc and
sensor networks [28]-[30] and efficient information processing [31], [32] to reduce
the energy usage of sensor nodes. For example, one method of conserving
energy in a sensor node is to aggregate packets along the sensor paths to reduce
header overhead. This dissertation proposes a fundamentally new method of
conserving energy in sensor networks that is orthogonal and complementary
to those approaches, and can be used in combination with them to further
reduce energy consumption.
The approach is based on judiciously exploiting existing sensor data correlations in
a distributed manner. Correlations in sensor data are brought about by the spatio-
temporal characteristics of the physical medium being sensed. Dense sensor
networks are particularly rich in correlations, where spatially dense nodes are
typically needed to acquire fine spatial resolution in the data being sensed, and for
fault-tolerance from individual node failures. Examples of correlated sensors
include temperature and humidity sensors in a similar geographic region, or
magnetometric sensors tracking a moving vehicle. Another interesting example of
correlated sensor data involves audio field sensors (microphones) that sense a
common event such as a concert or whale cries. Audio data is particularly
interesting in that it is rich in spatial correlation structure due to the presence of
echoes, causing multiple sensors to pick up attenuated and delayed versions of a
common sound origin.
This dissertation proposes to remove the redundancy caused by these inherent
correlations in the sensor data through a distributed compression algorithm which
obviates the need for the sensors to exchange their data among each other in
order to strip their common redundancy. Rather surprisingly, it will be shown that
compression can be effected in a fully blind manner without the sensor nodes ever
knowing what the other correlated sensor nodes have measured. This enables a
simple and inexpensive architecture for each sensor node and is in fact preferable
to an architecture based on each sensor knowing the other sensors'
measurements. The proposed paradigm is particularly effective for sensor
network architectures having two types of nodes: sensing nodes and data-
gathering nodes. The sensing nodes gather data of a specific type and transmit
this data upon being queried. The data-gathering node queries specific sensors in
order to gather information in which it is interested (see Figure 1). We will assume
this architecture and show that, for it, we can
devise compression algorithms that have very lightweight encoders, yet can
achieve significant savings. Note that this work targets very lightweight encoders
because we assume that the sensors have limited processing power, but the
constructions introduced here can be easily strengthened given greater
computational power at the sensors. The savings are achieved by having the
data-gathering node track the correlation structure among nodes and then use this
information to effect distributed sensor data compression. The correlation
structure is determined by using an adaptive prediction algorithm. The sensors,
however, do not need to know the correlation structure; they need to know only the
number of bits that they should use for encoding their measurements. As a result,
each sensor node is required to perform very few operations in order to encode its
data. The decoder, however, is considerably more complex, but it resides on the
data-gathering node, which is not assumed to be energy-constrained. Preliminary
results show that our distributed compression and adaptive prediction algorithms
perform well in realistic scenarios, achieving 10-65% energy savings for each
sensor in typical cases. In addition, our distributed compression architecture can
be combined with other energy saving methods such as packet/data aggregation
to achieve further gains [33].
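To make concrete how lightweight such an encoder can be, the following sketch implements modulo-based coset binning with a decoder that resolves the coset index against a prediction. This is a simplified toy version under stated assumptions (integer readings; a prediction supplied by the data-gathering node's tracking algorithm); the function names and the three-candidate search are illustrative, not the dissertation's exact construction.

```python
def encode(reading: int, num_bits: int) -> int:
    """Sensor side: a single modulo operation keeps only the coset (bin) index."""
    return reading % (1 << num_bits)


def decode(coset_index: int, prediction: int, num_bits: int) -> int:
    """Data-gathering side: return the member of the coset closest to the
    prediction.  Correct whenever |reading - prediction| < 2**num_bits / 2."""
    m = 1 << num_bits
    # The coset member nearest the prediction is one of three candidates.
    base = prediction - (prediction % m) + coset_index
    return min((base - m, base, base + m), key=lambda x: abs(x - prediction))
```

For example, a reading of 1003 encoded with 4 bits becomes the coset index 11; a decoder predicting 1000 recovers 1003 exactly. In the dissertation's setting, the adaptive filter at the data-gathering unit would supply the prediction and choose `num_bits` so that the prediction error stays within half a bin.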
Figure 1. An example sensor network topology in which considerable gains can be achieved from distributed compression. A computer acts as the data-gathering node and queries various sensors to collect data.
1.3 CONTRIBUTIONS
This dissertation focuses on reducing the energy spent by the radios of wireless
sensor network nodes. The first part proposes a new architecture for the radios
that eliminates the tuning elements (the off-chip crystal and the reference phase-
locked loop (PLL)). Such a radio requires considerably less power to operate than
traditional designs and is also much cheaper and smaller than traditional radios,
but it sacrifices the ability to tune the transceivers. The second part of the
dissertation presents a code construction and correlation tracking algorithm that
allow the sensor nodes to compress their readings in order to reduce the amount
of communication required among the nodes. The contributions of the dissertation
are:
• It is shown that the reliability of a network of untuned radios grows
exponentially with node density and available bandwidth.
• It is shown that by using random linear network coding (which is
computationally very simple), achievable throughput of a network of
untuned radios is linear in the density of nodes and the amount of spectrum
available for communication, the same as in a fully-coordinated network of
tuned radios.
• The ratio of the achievable throughput of the untuned network to the
achievable throughput of a tuned network is shown to be close to 1/e.
• It is also shown that if a network of untuned radios does not utilize any
coding in the network (i.e. the intermediate nodes only forward packets
without processing them), the throughput of the network does not grow with
node density and the amount of spectrum available for communication.
• A computationally inexpensive encoder is proposed that can support
multiple compression rates, allowing sensor nodes to compress their data
without having to do heavy processing.
• An adaptive correlation-tracking algorithm based on Least-Mean-Square
(LMS) filtering is presented. The algorithm can continuously track and
exploit both spatial and temporal correlation in the sensors' data.
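The random linear network coding mentioned in the contributions above can be sketched as follows. This toy version works over GF(2), so a relay's "random linear combination" is an XOR of a random subset of the packets it has heard, and the sink inverts the coefficient matrix by Gaussian elimination; practical schemes typically use a larger field such as GF(2^8). All names are illustrative.

```python
import random


def combine(packets, coeffs):
    """XOR together the packets selected by a GF(2) coefficient vector."""
    out = bytes(len(packets[0]))
    for c, pkt in zip(coeffs, packets):
        if c:
            out = bytes(a ^ b for a, b in zip(out, pkt))
    return out


def relay(packets):
    """An intermediate node forwards a random linear combination of what it
    has heard, together with the coefficient vector that describes it."""
    coeffs = [random.randint(0, 1) for _ in packets]
    return coeffs, combine(packets, coeffs)


def recover(coded):
    """The sink solves for the originals by Gaussian elimination over GF(2),
    given enough linearly independent (coeffs, payload) pairs."""
    rows = [(list(c), bytearray(p)) for c, p in coded]
    n = len(rows[0][0])  # number of original packets
    for col in range(n):
        # Find a pivot row for this column and swap it into place.
        pivot = next(r for r in range(col, len(rows)) if rows[r][0][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Eliminate this column from every other row (XOR row operations).
        for r in range(len(rows)):
            if r != col and rows[r][0][col]:
                rows[r] = (
                    [a ^ b for a, b in zip(rows[r][0], rows[col][0])],
                    bytearray(a ^ b for a, b in zip(rows[r][1], rows[col][1])),
                )
    return [bytes(p) for _, p in rows[:n]]
```

Each relay's work is a few XORs per packet, which is why the scheme asks so little of the nodes; all the matrix inversion happens at the sink.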
Chapter 2
RELIABLE COMMUNICATION USING UNTUNED RADIOS
With the emergence of ubiquitous wireless communication networks (such as
sensor networks) and ambient intelligence, the design of low data-rate short
distance wireless transceivers has gained prominence. Narrowband radios have
been shown to be the architecture of choice for such applications [6], [7], [16], as they are
low in complexity and consume less power than spread spectrum or other wide-
band techniques. One fundamental requirement of narrowband radios is that the
transmitter’s carrier frequency and the receiver’s detection frequency must be well-
matched. This is traditionally accomplished by employing a crystal at both the
transmitter and receiver that provides the same reference frequency for the carrier
wave. However, great savings in both manufacturing cost and power consumption
could be obtained by eliminating the off-chip crystal. In [16] a mechanical
resonator is used to provide the carrier frequency at lower cost and power
consumption relative to crystal-based designs. State-of-the-art quartz-crystal-
based radios, such as the Chipcon CC2420, typically require 50 mW of power in
both transmit (Tx) and receive (Rx) modes [6] for communication over 10 meters.
The micro-electromechanical systems (MEMS)-based radio shown in [16] provides
substantial power and integration savings over traditional designs, operating at
3.6 mW in Rx mode and 5.9 mW in Tx mode for communication over 30 meters.
The cost and power consumption of the transceiver could be further reduced by
two orders of magnitude if a fully integrated radio could be built that uses an on-
chip resonator to provide the reference frequency. The drawback of using an
integrated on-chip resonator (such as an LC circuit or a nano-electromechanical
resonant structure) is that the variations in the manufacturing process are large.
two narrowband radios produced by such a process are not guaranteed to be able
to communicate with each other.
To mitigate this problem, this dissertation proposes to exploit the high density of
nodes in the network that would be made economically feasible by using such a
low-cost design. The idea is to exploit the broadcast nature of wireless
communications. At high node density, this ensures that when a node attempts to
communicate, with high probability at least one of its neighbors can receive the
message even though not all the nodes are capable of communicating with each
other. There is a potential problem with using this approach for multi-hop
communication, however. If nodes just broadcast their packets and have any of a
number of candidate neighbors that hear the transmission forward the packet, it is
likely that multiple neighbors will receive and forward the packet. This would result
in an explosion in the number of copies of a packet as it propagates through the
network. To prevent this explosion in the number of packets, collisions - usually
considered a bane of wireless communications - could be exploited to ensure that
the number of received copies of a packet at each step does not grow
unboundedly. If too many nodes are transmitting a copy of the packet, they will
collide, resulting in a reduced number of receivers that successfully receive the
packet. If few transmitters are sending the packet, the number of collisions will be
low, increasing the number of successful receptions of the packet.
It will be shown analytically that these opposing forces cause the system to reach
a balanced equilibrium allowing for reliable communication through the network.
Increasing the density along with providing for channelization results in reliability of
the communication method that grows exponentially with density. An additional
benefit of distributing the responsibility for communication among many nodes is
that it makes the network much more robust to the inevitable failure or death of
individual nodes.
The radio architectures enabled by such an approach make it possible to envision
fully-integrated, single-chip nodes that would be small enough (10 mm³), cheap
enough (<$0.01), and low power enough (10 µW) that they could be woven into
fabrics or mixed in with wall-paint. This would enable the realization of “truly-
disappearing electronics”, that is, ubiquitous computing and communication
devices that can be effectively integrated into the environment, and disappear from
view. The availability of these integrated meso-scale compute nodes would open
the door for new concepts such as smart fabrics, intelligent surfaces and ambient
intelligence.
The rest of this chapter is organized as follows: Section 2.1 describes the issues
involved in using such low-cost, low-power, but unreliable radio architectures.
Section 2.2 proposes a communication method that exploits the high density of
nodes made feasible by using such low-cost and small-form-factor nodes to
overcome the inherent unreliability of the radios. Section 2.3 discusses the
practical considerations that lead to the modeling assumptions made in Section
2.2. Section 2.4 analyzes the reliability of the proposed scheme. Finally, Section
2.5 compares the performance of the proposed scheme to two benchmark
schemes.
2.1 DRAWBACKS OF UNTUNED RADIOS
The main drawback of using an on-chip resonator is that the variations in the
manufacturing process are large. To achieve frequency tolerance approaching a
quartz crystal, prohibitively expensive trimming would have to take place. In
addition, drift over time and temperature would quickly render the trimming
inaccurate. The other option is to leave the transmitters and receivers un-tuned,
which means that two narrowband radios produced by such a process are not
guaranteed to be able to communicate with each other. Figure 2 presents a
qualitative illustration of the challenges presented by this approach. In a traditional
narrow-band architecture using a quartz crystal reference, the signal bandwidth is
typically orders of magnitude larger than the center frequency tolerance. When
using un-tuned receivers, the situation is reversed and the carrier frequency
variation is orders of magnitude greater than the signal bandwidth. This means
that if two radios obtained from such a process attempt to communicate, and the
receiver is sensitive only to a narrow portion of the spectrum (as would be done
when using traditional, tuned radios), the probability that the transmitter is sending
a signal in this narrow portion of the spectrum is very low. The most
straightforward way of ensuring that radios produced by such an imprecise
process can communicate with each other is to allow the front-end filter of each
receiver to admit all the frequencies in the range of carrier frequencies that could
result from the manufacturing process. Such wide-band receivers would be able
to “hear” any of the radios produced by the process, but they would also admit all
the noise and interferers in the band-pass range of the front-end filter. This, in
turn, would either force the transmitter to output more power to provide the same
signal to noise ratio (SNR) at the receiver, or it would greatly reduce the
communication range of the nodes.
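The noise penalty of the wide-band approach can be quantified with the standard thermal-noise relation N = kTB: widening the front-end filter by a factor F raises the noise floor, and hence the transmit power required for a fixed SNR, by 10·log10(F) dB. The following sketch makes the comparison concrete; the 100 kHz and 100 MHz bandwidths are illustrative assumptions, not figures from this chapter.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K

def thermal_noise_dbm(bandwidth_hz: float, temp_k: float = 290.0) -> float:
    """Thermal noise floor kTB, expressed in dBm."""
    return 10.0 * math.log10(BOLTZMANN * temp_k * bandwidth_hz / 1e-3)

# Illustrative (assumed) numbers: a 100 kHz channel filter vs. a front end
# opened up to a 100 MHz process-variation range.
narrow_dbm = thermal_noise_dbm(100e3)
wide_dbm = thermal_noise_dbm(100e6)
penalty_db = wide_dbm - narrow_dbm  # extra TX power needed for equal SNR
```

With these assumed bandwidths the wide-open front end admits 1000 times more noise power, i.e. a 30 dB penalty in transmit power or, equivalently, a large reduction in range.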
[Figure: probability distribution P(f_carrier) of the carrier frequency, with the signal bandwidth much narrower than the 3σ process variation.]
Figure 2. Signal bandwidth relative to process variation when using on-chip LC resonators to provide the carrier frequency for narrowband radios. When using an LC resonator without a crystal reference, it is not possible for a radio to know exactly at what frequency it is operating.
However, there is an alternative to using the brute-force method of having a wide-
band front-end filter at the receiver that admits all the possible frequencies that
might contain the transmitted carrier frequency. The receiver could employ filtering
with the same bandwidth it would use in classical narrowband communication
when the transmitter’s carrier frequency is known with high accuracy. Of course,
in this case there is no guarantee that a particular transmitter would be able to
communicate with a particular receiver. If the input frequency range of a particular
receiver did admit the transmitter’s carrier frequency, the pair would be able to
communicate. Otherwise, the result would be the same as if the transmitter were
communicating on a channel orthogonal to the one that the receiver is monitoring.
The result is that unreliable manufacturing processes can be used to provide
channelization to the communication system.
Even though a particular transmit-receive pair may not be able to communicate
because they would be effectively tuned to different channels, a sufficiently high
density of nodes ensures a high probability that there are pairs of transmitters and
receivers that can communicate with one another. The number of channels
available in the system is determined by the ratio of the variations of the
manufacturing process to the receiver bandwidth. Narrowing the receiver input
bandwidth results in more communication channels. This would decrease the
probability that any given pair of nodes could communicate with each other, but it
would also decrease the amount of noise admitted by the receiver, increasing the
receiver sensitivity and reducing the necessary transmitted power level. It should
be noted that the receive bandwidth can be altered by using digital control to adjust
the receive radio parameters. As long as the bandwidth admitted by the receive
filters is greater than the signal bandwidth, there will be a non-zero probability that
two randomly selected nodes will be able to communicate with each other.
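As a rough numerical illustration of this trade-off, the chance that a randomly chosen transmitter/receiver pair shares a channel can be estimated by Monte Carlo. The 30:1 ratio of process-variation range to receiver bandwidth below is an assumed example (giving roughly K = 30 channels), not a measured value.

```python
import random

def pair_match_probability(variation_range: float, rx_bandwidth: float,
                           trials: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the chance that a random transmitter's
    carrier lands inside a random receiver's pass-band, with both center
    frequencies uniform over the manufacturing-variation range."""
    rng = random.Random(seed)
    hits = sum(abs(rng.uniform(0, variation_range)
                   - rng.uniform(0, variation_range)) <= rx_bandwidth / 2
               for _ in range(trials))
    return hits / trials

# Roughly a 1-in-30 chance per pair; slightly less because of edge
# effects at the ends of the variation range.
p = pair_match_probability(variation_range=30.0, rx_bandwidth=1.0)
```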
There is a potential problem with using this approach for multi-hop communication,
however. If nodes simply broadcast their packets and have any of a number of
candidate neighbors that hear the transmission forward the packet, it is likely that
multiple neighbors will receive and forward the packet. This would result in an
explosion in the number of copies of a packet as it propagates through the
network. This phenomenon is known as “broadcast storm.” To prevent this
explosion in the number of packets, collisions - usually considered a bane of
wireless communications - could be exploited to ensure that the number of
received copies of a packet at each step does not grow unboundedly. If too many
nodes are transmitting a copy of the packet, they will collide, resulting in a reduced
number of receivers that successfully receive the packet. If few transmitters are
sending the packet, the number of collisions will be low, increasing the number of
successful receptions of the packet. It will be shown in Section 2.4.3 that these
opposing forces cause the system to reach a balanced equilibrium allowing for
reliable communication through the network.
In the rest of this chapter and the next, this abstraction of having multiple channels
available for communication is made. Note that it is not necessary for the
channels to be orthogonal. What is important is the number of transmitter carrier
frequencies that fall within the bandwidth being monitored by a particular receiver.
The probability that any given transmitter falls within the receiver’s range is
dependent only on the ratio of the range of possible carrier frequencies to the
receiver bandwidth. This ratio is equivalent to the number of channels in the
analysis presented here because it is equal to the maximum number of
independent transmissions that can be made simultaneously without interfering
with one another.
2.2 MULTI-HOP COMMUNICATION METHOD
In order to convey the intuition of the proposed communication method, consider
the scenario shown in Figure 3. The source node has a packet to send to the
destination node. If the source and destination nodes are far enough apart, the
source node is not able to transmit the packet directly to the destination node.
Instead, it has to rely on nodes that lie between it and the destination to forward
the packet in a multi-hop fashion. The region between the source and destination
nodes can be divided into disjoint blocks as shown in Figure 3. This can be
accomplished by including in the packet header the coordinates of the corner
points of the block containing the next-hop nodes. If the size of the blocks is small
enough relative to the
transmission range of the nodes’ radios, it is possible for any node to communicate
with any other node in a neighboring block. This makes it possible to have the
packet hop from one block to the next until it reaches the destination.
If the transmitter selects a particular node in the next block to be the next hop on
the packet’s way to the destination, there is a danger that the transmitter and the
selected receiver are not communicating on the same channel. If the nodes are
allowed to duty cycle their radios in order to conserve their energy, it is also
possible for the selected receiver to be in the off state at the time when the
communication is attempted. Also, selecting only one node to route the packet
toward the destination does not make use of the broadcasting nature of wireless
communication. When a packet is transmitted, all of the nodes in the neighboring
blocks that are listening to the transmitter’s channel will be able to receive the
packet as long as they are on. This raises the following question: Is it possible for
the transmitter node to broadcast its packet and safely assume that at least one of
the nodes in the neighboring block has received it without any need for
acknowledging the reception?
[Figure: source node, intermediate blocks 1 through H, destination node.]
Figure 3. Proposed multi-hop communication method. Nodes in each block listen to L randomly selected frequency bands. If a node detects transmissions from the previous block in any of those bands, it combines the inputs using random linear network coding and broadcasts the result to the next block.
The potential problem is that, in order to ensure that at least one neighbor will
receive the packet with high probability even in the presence of unreliable
channels or un-tuned radios, the number of potential next hop neighbors must be
high. In this case, it is likely that many neighbors will receive the packet and
forward it onward. As the packet propagates toward the destination, there is a
danger that the number of copies of the packet will grow unboundedly because at
each step (block) there will be more and more nodes transmitting the packet. To
prevent this from happening, it may be possible to make use of collisions on the
channel to prevent the number of copies of the packet received in any block from
growing unboundedly.
Consider the scenario in which there are multiple channels available for
communication among the nodes (as is the case when using un-tuned radio
architectures), and each of the nodes can dynamically select a channel to transmit
on or listen to. If more than one transmitter is operating on a particular channel
concurrently, a collision will occur and any node listening to that channel will be
unable to decode any transmissions on that channel during that time. Note that
the assumption that nodes dynamically select a channel to communicate on can
be satisfied by either varying the reactance of an LC-resonator (or selecting one of
several available resonators) on a single node, or by duty-cycling the nodes
thereby making a random subset of them active in communication at any point in
time.
Consider the case when the network is divided into a virtual grid as shown in
Figure 3 and the number of nodes in each block is equal. Let us define the
following variables:
K ≡ number of channels
N ≡ number of nodes per block
T_b ≡ number of transmitters in block b
The proposed communication method operates as follows:
1. All nodes that are awake and do not have a packet to send are in receive
mode. They randomly and independently select, with uniform probability,
one of the K channels to monitor.
2. The source node randomly selects, with uniform probability, one of the K
channels to transmit on.
3. Any of the N nodes in Block 1 that are monitoring the channel on which the
source node is transmitting, will receive a copy of the packet and forward it
at the next time step. Let us call the number of nodes in Block 1 that
received the packet T_1.
4. At the next time step, all of the T_1 nodes in Block 1 that received a copy of
the packet from the source node forward it toward the destination. They
randomly and independently select, with uniform probability, one of the K
channels to transmit on and broadcast the packet on that channel.
5. If more than one transmitter broadcasts on a particular channel, the result is
a collision, and any receivers monitoring that channel are unable to receive
the packet. Thus, only those receivers in Block 2 that are listening to a
channel on which exactly one transmitter is sending will receive a copy of
the packet. We denote the number of nodes in Block 2 that receive a copy
of the packet as T_2. These nodes then proceed to forward the packet in the
same manner as described in Step 4.
The random process specified by T_b, the number of nodes in Block b that receive
a copy of the packet, forms a discrete time Markov chain with the state space
{0, 1, ..., N}, where the zero-state is absorbing and all others are transient. This
implies that the chain will eventually be absorbed in the zero-state with probability
one. What is of interest to us is how long it will take until the chain is absorbed in
the zero-state. This provides a measure of the robustness of the scheme because
it indicates how many hops the packet can traverse before it is lost. As long as
this number is greater than the distance between the source and the destination, it
is possible to use this method of communication. Section 2.4 considers the
robustness of the scheme as a function of K and N as well as the energy
required to provide this level of robustness.
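How long absorption takes can be explored with a direct Monte Carlo sketch of the model above (one uniformly chosen channel per transmitter and per receiver, a collision whenever more than one transmitter shares a channel). The K and N values below are small illustrative assumptions; at the operating points analyzed later the expected lifetime is far too large to simulate directly.

```python
import random
from collections import Counter

def hops_until_lost(n_nodes: int, k_channels: int, rng: random.Random,
                    max_hops: int = 10_000) -> int:
    """Simulate the chain T_b: start from a single source transmitter and
    count hops until no receiver in the next block gets a collision-free
    copy (absorption in the zero-state)."""
    num_tx = 1
    for hop in range(max_hops):
        # Each transmitter picks a channel; only channels with exactly one
        # transmitter ("good" channels) survive the collisions.
        counts = Counter(rng.randrange(k_channels) for _ in range(num_tx))
        good = {ch for ch, n in counts.items() if n == 1}
        # Each receiver in the next block monitors a uniform random channel.
        num_tx = sum(rng.randrange(k_channels) in good
                     for _ in range(n_nodes))
        if num_tx == 0:
            return hop + 1
    return max_hops

rng = random.Random(42)
lifetimes = [hops_until_lost(n_nodes=16, k_channels=5, rng=rng)
             for _ in range(500)]
avg = sum(lifetimes) / len(lifetimes)
```

Raising n_nodes and k_channels at a fixed ratio makes avg grow rapidly, consistent with the exponential-robustness result derived in Section 2.4.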
2.3 PRACTICAL JUSTIFICATION OF THE MODEL
The proposed communication method makes several assumptions that are
necessary to simplify the analysis. This section gives practical justification for
those assumptions, showing that they can be met in practice.
The analysis makes the simplifying assumption that the communication channels
are orthogonal to each other, even though this is clearly not the case in practice.
Channelization is obtained by having each of the receivers listen to a portion of
spectrum that is much smaller than that covered by the variations in the
manufacturing process of the radios. Since the transmitter carrier frequencies and
the center frequencies of the receive filters are continuous random variables,
portions of spectrum monitored by different receivers can partially overlap.
However, the probability of channel collisions is the same whether the channels
are orthogonal or not. A channel collision at a particular receiver occurs when
more than one transmitter broadcasts a signal within the frequency range
monitored by the receiver. The probability of this occurring is independent of the
frequency ranges monitored by other receivers. It only depends on the ratio of the
variations of the manufacturing process to the frequency range monitored by each
receiver. That ratio is taken to be the number of channels in this analysis.
The analysis also assumes that the transmitting and receiving nodes randomly
select a channel to transmit on and monitor. In practice, this can be achieved by
having a high density of nodes in the network and allowing most of them to sleep
much of the time and have only a random subset of them on at any time. Having a
random subset of nodes on has the same effect as allowing the nodes to randomly
choose their receive center frequencies. As long as the nodes have two different
on-chip resonators provide their receive center frequency and their transmit carrier
frequency, the effect will be the same as if the receiving nodes randomly choose a
channel on which to forward the packet they have received. Alternatively, the
random channel selection could be accomplished by having the nodes randomly
adjust the capacitances of their reference frequency generators. This would allow
the nodes to randomly select, with near-uniform probability, a frequency in the
range of interest, although the node would not be aware of the exact frequency it is
selecting.
The system is modeled as being discrete-time; i.e., it is assumed that the nodes in
each block will make their transmissions concurrently. This can be achieved in
practice by having each node that successfully receives the packet forward it as
soon as the node from which it heard the packet stops its transmission. Since the
proposed protocol does not require any handshaking and acknowledgements
there is no variable part in the latency, making it possible to maintain the discrete
time model. To initiate the process, it is possible for the node that originally
produced the packet to transmit the packet several times in succession so that any
nodes that wake up during this time and are tuned to its frequency will have an
opportunity to hear the packet and forward it. This can be used to reduce the
probability that the packet is lost at the first hop because there is only one
transmitter and no nodes may be listening to its channel.
The virtual grid used to divide the network into blocks is not really necessary in
practice. It is used to simplify the analysis by ensuring that all transmitters could
be heard by all receivers in the next hop; however, in practice this is not
necessary. (If required, the grid could be achieved by sending the coordinates of
the endpoints of the receiving block in the packet header. This would indicate
that only nodes that fall within those coordinates should forward the packet
towards the destination if they hear it.) Like all geographical routing methods, this
requires that the nodes have knowledge of their own locations relative to each
other. Fortunately, the performance of most position estimation algorithms for
sensor networks improves with increasing density [21].
2.4 ANALYSIS
As stated in Section 2.2, the number of copies of a packet being transmitted in
each block, T_b, using this communication method forms a Markov chain on the
state space {0, 1, ..., N}, where the zero-state is absorbing and all the other states
are transient. Therefore, the chain is guaranteed to eventually be absorbed in the
zero-state. This implies that if the distance (number of blocks) that the packet
attempts to traverse is unbounded, the packet will eventually be lost. There are
two ways in which this can happen. Consider the transmission of the packet from
Block b to Block b+1. There are T_b transmitters in Block b and N receivers in
Block b+1. One possibility is that all of the T_b transmitters are interfering with
each other. In other words, there is no channel that is being used by exactly one
transmitter. The other possibility is that there are channels on which exactly one
among the T_b transmitters is sending, but none of those channels are being
monitored by any of the N receivers.
What is of interest to us is how long it takes for the chain described by T_b to be
absorbed in the zero-state. This indicates how far (how many blocks) the packet
can traverse before being lost. Also important is the average number of
transmitters per block that are transmitting the packet. This is equal to the average
value of T_b before the chain is absorbed in the zero-state. This is important
because it provides a metric of how much energy is spent on transmitting the
packet for one block.
It is possible to determine both the average number of steps until the chain is
absorbed in the zero-state and the average value taken on by the chain before
absorption from the transition probability matrix of the Markov chain. The
challenge then becomes finding this transition probability matrix, P^{K,N}, as a
function of K and N.
2.4.1 Transition Probabilities
The probability, P^{K,N}_{l,l'}, that the Markov chain transitions from state l to state l' is
the probability that l transmitters broadcast the packet and l' receivers get a copy
of the packet. Let us denote with p_{j,l,K} the probability that there are exactly j
good channels (i.e. j channels with exactly one transmitter) given that there are l
transmitters and K channels, for j ∈ {1, ..., min(l, K)}. We know that the probability
of having l' successful receptions given N receivers and j good channels out of
a total of K channels is

\binom{N}{l'}\left(\frac{j}{K}\right)^{l'}\left(\frac{K-j}{K}\right)^{N-l'} \qquad (1)

for j ∈ {1, ..., min(l, K)} and l' ∈ {1, ..., N}. Thus, the probability of having l
transmitters broadcast the packet and l' receivers get a copy of the packet, for
l' ∈ {1, ..., N}, is

P^{K,N}_{l,l'} = \sum_{j=1}^{\min(l,K)} p_{j,l,K}\cdot\binom{N}{l'}\left(\frac{j}{K}\right)^{l'}\left(\frac{K-j}{K}\right)^{N-l'} \qquad (2)
The value p_{j,l,K} can be computed by counting the ways in which j out of l
transmitters can each select a unique channel out of K total channels, while the
other K − j channels each have no transmitters or more than one transmitter, and
dividing this total number of combinations by the total number of ways in which l
transmitters can select one of K channels. There are a total of K^l ways to
distribute l transmitters among K channels and \binom{l}{j}\binom{K}{j}\,j! ways to place each of
j out of l transmitters in a unique channel out of K total channels. The number
of ways of assigning the remaining l − j transmitters among the K − j channels such that no
channel has exactly one transmitter can be shown to be equal to the coefficient of
the x^{l-j}/(l-j)! term in the expression (e^x - x)^{K-j} [26]. This term is given by

\sum_{m=0}^{l-j}\binom{K-j}{m}(-1)^{m}(K-j-m)^{l-j-m}\frac{(l-j)!}{(l-j-m)!} \qquad (3)
Combining this with (2) gives us the transition probability matrix P^{K,N}. Once
P^{K,N} is computed, the expected time until the Markov chain is absorbed in the
zero state (i.e. the robustness of the scheme), given that it starts in state i, is
given by the i-th entry of the column vector

(I_N - Q)^{-1}\cdot\mathbf{1}_N \qquad (4)

where I_N is the N × N identity matrix, \mathbf{1}_N is a column vector of N ones, and Q
is the N × N matrix containing the transition probabilities among the transient
states (non-zero states) of the Markov chain [27]. (I_N - Q)^{-1} is an N × N matrix
whose (l, l') entry is the expected number of visits to state l' given that the chain
started in state l. From this, it is possible to compute the probability distribution of
T_b before the chain is absorbed in the zero-state given that it starts in state l. The
average value of T_b before the chain is absorbed in the zero-state given that it
starts in state l is equal to the l-th row of the column vector [27]:

(I_N - Q)^{-1}\cdot[1, 2, \ldots, N]^{T} \qquad (5)

This gives us the tools necessary to evaluate the robustness and energy
requirements of the communication scheme for any given values of K and N.
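Equations (1)-(5) can be evaluated directly. The sketch below is a straightforward transcription, not code from the dissertation: p_good implements p_{j,l,K} via the counting argument and equation (3), and expected_hops solves (I_N - Q)^{-1}·1_N from equation (4) by Gaussian elimination, assuming the chain starts from a single reception.

```python
from math import comb, factorial

def p_good(j: int, l: int, K: int) -> float:
    """p_{j,l,K}: probability that exactly j of the K channels carry exactly
    one of the l transmitters, each transmitter choosing a channel
    uniformly at random (counting argument of Section 2.4.1)."""
    a, b = l - j, K - j
    # Ways to spread the a leftover transmitters over the b leftover
    # channels with no channel getting exactly one: equation (3).
    d = sum((-1) ** m * comb(b, m) * (b - m) ** (a - m)
            * (factorial(a) // factorial(a - m))
            for m in range(min(a, b) + 1))
    return comb(l, j) * comb(K, j) * factorial(j) * d / K ** l

def expected_hops(K: int, N: int) -> float:
    """Expected hops before absorption starting from state 1, i.e. the
    first entry of (I_N - Q)^{-1} 1_N from equation (4)."""
    Q = []
    for l in range(1, N + 1):
        pj = [p_good(j, l, K) for j in range(min(l, K) + 1)]
        # Row of transition probabilities to states l' = 1..N, eq. (2).
        Q.append([sum(p * comb(N, lp) * (j / K) ** lp
                      * (1 - j / K) ** (N - lp)
                      for j, p in enumerate(pj))
                  for lp in range(1, N + 1)])
    # Solve (I - Q) t = 1 by Gauss-Jordan elimination with pivoting.
    A = [[(1.0 if r == c else 0.0) - Q[r][c] for c in range(N)] + [1.0]
         for r in range(N)]
    for c in range(N):
        piv = max(range(c, N), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(N):
            if r != c and A[r][c] != 0.0:
                f = A[r][c] / A[c][c]
                for k in range(c, N + 1):
                    A[r][k] -= f * A[c][k]
    return A[0][N] / A[0][0]

# Small illustrative example; the K = 30, N = 96 operating point analyzed
# in Section 2.4.2 yields astronomically large values.
hops = expected_hops(K=5, N=16)
```

As a sanity check, with K = 2 and N = 1 each hop succeeds with probability 1/2, so the expected lifetime is exactly 2 hops, which the solver reproduces.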
2.4.2 Robustness Grows Exponentially with Density
The key parameter in this communication scheme is the ratio of the node density
to the number of channels. Let us define the notation α ≡ N/K to
specify the ratio of node density (number of nodes/block) to the total number of
channels. This value is important because if α is too small, there will be few
transmitters and receivers compared to the number of channels, making it unlikely
that any pair of transmit and receive nodes will be attempting to communicate on
the same channel. On the other hand, if α is too large, there is a high probability
that the number of transmitters is so large that all of the channels will experience
collisions.
However, for a “reasonable” choice of α, the proposed communication scheme
possesses a nice self-regulating property in that the number of transmitters per
block, specified by the process T_b, resists growing either too large or too small.
This is because, if the number of transmitters in one block is small, there will be
few collisions, resulting in more good receptions, and T_{b+1} will be greater than T_b.
Conversely, if the number of transmitters in a block is too large, the result will be
many collisions and T_{b+1} will be less than T_b. Indeed, this intuition is confirmed by
Figure 4, which shows the probability distribution of the number of transmitters in a
given block provided that the packet reaches that block. This particular probability
distribution corresponds to the case when K = 30, N = 96, and α = N/K = 3.2.
It can be seen that having fewer than 10 or more than 60 transmitters in a block is
very unlikely, although there is a finite probability that the number of transmitters
will be an outlier, implying that eventually the number of transmitters will be zero,
resulting in the death of the packet (i.e. the Markov chain will be absorbed in the
zero-state).
Figure 4. Probability distribution of number of transmitters while the packet is still alive for K=30 channels and density N=96 nodes/block
Intuitively, it would seem that by increasing both K and N , while keeping the ratio
α constant, the law of large numbers would ensure that the deviation of the
number of transmitters about the mean decreases, resulting in a larger number of
hops that the packet can safely traverse. Figure 5 shows that this intuition is
correct. Here, the expected number of hops that a packet can travel (if the
destination is infinitely far away) before it is lost is shown as a function of the
number of channels used, given that for each curve α , the ratio of the number of
nodes per block to the number of channels, is kept constant. Four different curves,
corresponding to four different values of α , are plotted. In each case, the
robustness of the system grows exponentially with the number of channels.
Figure 5. Robustness v. Channelization for various values of α
The scenario in which the source and destination are infinitely far apart is
unrealistic, so a more practical demonstration of the performance of this approach
is to consider the probability that a packet will fail to reach its destination that is a
fixed distance away. Figure 6 shows the probability that a packet will not be able
to reach a destination 10 hops away from the source node. Again, it can be seen
that the reliability of this communication scheme can be made arbitrarily high by
increasing the density of nodes and the channelization of the available spectrum,
because the probability that a packet does not reach its destination decays
exponentially fast as the density of nodes and number of channels is increased.
Figure 6. Probability of failing to transmit a packet over 10 hops v. Channelization for various values of α.
2.4.3 Stability and Optimal Ratio of Density to Channelization
Figure 5 and Figure 6 demonstrate the exponential reliability of the proposed
communication scheme for various values of α . It can be seen that for certain
ratios of node density to number of channels the system performs better than for
others. In fact, some values of α are exponentially better than others. This is to
be expected due to reasons described in Section 2.4.2, where the intuition
regarding the performance of the system was given assuming a “reasonable”
choice of α . Let us now examine what choice of α is optimal in that it guarantees
the highest robustness at any given node density.
In order to find this optimal choice of α , note that the self-regulating property
described in the previous sub-section and shown in Figure 4 implies that there
should be an “equilibrium” point in the number of transmitters per block. Let’s call
this equilibrium value T . This value should be such that, if the number of
transmitters in block is equal to b T , the expected number of receptions in block
will also be 1+b T , meaning that the expected number of nodes that will be
forwarding the packet from block 1+b to block 2+b will also be T . Hence we
refer to this value as the “equilibrium” point. In order to find the optimal value of α,
we will first find the equilibrium point, T, for each α (this will be a function of α),
and then find the α that has the equilibrium point which minimizes the probability
of unsuccessful communication over one hop when T transmitters are attempting
to communicate over K channels with N = α · K receivers.
To simplify the notation, let us define the following variables that will be useful in
finding the equilibrium values:
R_b ≡ T_b / K
R ≡ T / K
Given that there are T_b transmitters in block b and K channels, the probability
that a randomly selected channel has exactly one transmitter on it is

\binom{T_b}{1}\left(\frac{1}{K}\right)\left(1-\frac{1}{K}\right)^{T_b-1} = R_b\left(1-\frac{1}{K}\right)^{R_b\cdot K-1} \qquad (6)

which asymptotically becomes, as K grows to infinity,

\Pr(\text{selected chnl. has exactly 1 trans.} \mid T_b, K) = R_b e^{-R_b} \qquad (7)
Knowing this probability that a randomly selected channel will have exactly one
transmission on it (i.e. no collisions), it is possible to find the expected number of
receptions given the number of transmitters. In particular we would like to find the
number of transmitters at equilibrium. Mathematically, we seek:

E[T_{b+1} \mid T_b = T, K, N]
= E[\#\text{ chnls. with no collisions} \mid T_b = T, K]\cdot E[\#\text{ receivers/chnl.} \mid N, K]
= K\cdot\Pr(\text{selected chnl. has no collisions} \mid T_b = T, K)\cdot N/K
= K\cdot R e^{-R}\cdot\alpha \qquad (8)

Expressing T as R · K and comparing the leftmost expression with the rightmost
expression in (8) gives us the desired relationship between the number of
transmitters at equilibrium and α:

R = \ln(\alpha) \iff \alpha = e^{R} \qquad (9)
It is straightforward to show that this equilibrium is a stable one by verifying, using
the derivation of (8), that E[T_{b+1} \mid T_b, K, N] > T_b for T_b < T and
E[T_{b+1} \mid T_b, K, N] < T_b for T_b > T. This is simply a result of the aforementioned self-
regulation property.
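The fixed point (9) and its stability are easy to check numerically from the asymptotic expression (8); the K = 30, N = 96 values below are the example operating point used with Figure 4.

```python
import math

def expected_next(T: float, K: float, N: float) -> float:
    """Asymptotic E[T_{b+1} | T_b = T, K, N] = K * R e^{-R} * alpha from
    equation (8), with R = T/K and alpha = N/K."""
    R, alpha = T / K, N / K
    return K * R * math.exp(-R) * alpha

K, N = 30.0, 96.0                # the alpha = 3.2 example of Figure 4
alpha = N / K
T_eq = math.log(alpha) * K       # equilibrium T = K * ln(alpha), from (9)

at_eq = expected_next(T_eq, K, N)        # maps back onto T_eq
below = expected_next(0.5 * T_eq, K, N)  # pushed back up toward T_eq
above = expected_next(2.0 * T_eq, K, N)  # pushed back down toward T_eq
```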
Once the relationship between α and the equilibrium point R is known, the
optimal value of α , which provides the greatest reliability in the communication
system, can be computed by finding what value of α minimizes the probability that
a selected channel is not utilized given that the system is in equilibrium. A channel
is unutilized if the number of transmitters on it is not equal to one or if no receiver
listens on that channel. The probability that the number of transmitters on a given
channel is not equal to one is simply the complement of (7), while the probability
that no receiver, out of N , listens to the channel given that there are K channels
is

\left(1-\frac{1}{K}\right)^{N} = \left(1-\frac{1}{K}\right)^{\alpha\cdot K} = e^{-\alpha} \qquad (10)
where the second equality holds asymptotically as K grows to infinity. Thus, the
optimal value of α can be computed as
\arg\min_{\alpha}\left[(1 - R e^{-R}) + R e^{-R}\cdot e^{-\alpha}\right]
= \arg\min_{\alpha}\left[(1 - \ln(\alpha)/\alpha) + (\ln(\alpha)/\alpha)\cdot e^{-\alpha}\right] \qquad (11)

The probability is minimized for α = 3.187. Indeed, Figure 5 and Figure 6 confirm
that this value of α maximizes the reliability of the proposed communication
scheme.
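A simple grid search over the one-hop failure probability in (11) recovers the stated optimum; this is a quick numerical check, not part of the original derivation.

```python
import math

def p_unutilized(alpha: float) -> float:
    """One-hop failure probability for a channel at equilibrium, eq. (11):
    either not exactly one transmitter on it, or no receiver listening."""
    r = math.log(alpha)          # equilibrium R = ln(alpha) from eq. (9)
    p_single = r * math.exp(-r)  # = ln(alpha)/alpha
    return (1.0 - p_single) + p_single * math.exp(-alpha)

# Grid search over alpha > 1 recovers the optimum quoted in the text.
alpha_opt = min((1.0 + i * 1e-4 for i in range(1, 100_000)),
                key=p_unutilized)
```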
It should be noted that this optimal value of α can be reached at run-time as both
the density of nodes and the number of available communication channels can be
adjusted dynamically. The effective density of nodes can be altered by changing
the duty cycling of nodes making more or fewer of them active at any given point in
time. The number of available communication channels can also be adjusted
dynamically via digital control of the radio front-end and/or baseband circuitry to
vary the receive noise bandwidth. This flexibility allows the system to adapt and
operate in different network conditions, providing an arbitrarily reliable
communication fabric even though the individual components are un-tuned and
unreliable.
2.5 SYSTEM LEVEL VIEW
It is important to compare the proposed communication scheme to certain
benchmarks in order to evaluate its system level performance. One of the two
most relevant benchmarks is comparing the proposed method to schemes that
employ tuned radios. While individual tuned radios are larger and much more
costly, as well as power-hungry, their relative reliability (due to the guarantee that
any two radios can communicate with each other if the channel is not in a deep
fade) may offer system level advantages that must be considered. The other
important benchmark is comparing the proposed method to one which uses the
same untuned radios, but with wide front-end receive filters that would again
ensure that any two such radios can potentially communicate, albeit at the cost of
significantly increased noise levels at the receiver.
The metrics to consider are overall cost; reliability, even in the presence of node
and link failures; total power consumption; and the ability to scavenge
energy from the environment. This last point is important because the energy that
can be scavenged by the network is proportional to the number of nodes in the
network and the size of the nodes. So, even if a system employing tuned radios
may be able to achieve the same level of reliability at a lower network-wide power
level than the one proposed here, the cost of such nodes may prohibit deploying
enough nodes to generate as much power as is consumed. The other way of
gathering more energy is by making the nodes larger (e.g. larger solar panels can
gather more energy), but having few large nodes would mean having less sensing
resolution and less robustness to the failure of individual nodes. The strengths as
well as weaknesses of the proposed scheme relative to the two alternatives are
discussed next.
2.5.1 Comparison to Schemes Using Untuned Radios with Wide Receive Filters
Even when using un-tuned radios, it is possible to ensure that any two such radios
can potentially communicate with each other by having the receivers admit all the
frequencies that could potentially contain the signal and then using self-mixing
(envelope detection) to recover the signal even though the frequency of the carrier
wave is not known. Such schemes still suffer from poor spectral efficiency since
only one packet at a time may be sent on any of the frequencies that fall within the
variance of the radio manufacturing process. This again requires the nodes to
coordinate who is transmitting when, incurring the latency and power overhead of
such coordination. Also, opening up the receive filter would admit more noise
(proportional to the BW of the filter) and interference at the receiver. Therefore,
the transmitter would have to output more power in order to achieve the same
SNR at the receiver. This would cancel out the advantage from having only one
node transmit a packet instead of many as in the scheme proposed in this paper.
2.5.2 Comparison to Schemes Using Tuned Radios
When using tuned narrowband radios for communication, it is possible to
guarantee that any two radios are capable of communicating with each other, as
long as the attenuation of the signal in the channel is not too high and the level of
interferers is not prohibitively large. However, as empirical evidence has shown, in
the short-distance communication required for sensor networks the link qualities
vary over time and deep fades can make the connectivity unstable [17],[18]. To
combat the instability of the links, it is necessary to deploy nodes with high density
so that alternate links are available. High node density is also required to address
the issue of power consumption of individual nodes. Because tuned radios
consume more power than the nodes can scavenge from the environment, it is
necessary to allow the nodes to sleep for much of the time. In order to preserve
network functionality even while most nodes are asleep, it is necessary to deploy
the nodes densely. However, the cost of producing nodes with tuned radios is
high, making the deployment of such high-density networks economically
prohibitive. Another obstacle to deploying dense networks of tuned nodes is the
large physical size of such nodes, which can make dense placement difficult. Note that it is
also possible to increase the reliability of the links and the duty cycling of nodes by
making more energy available to them either by having larger batteries or larger
energy-scavenging engines, but this would further increase the size of the nodes,
making it impossible to embed them in surfaces and losing out on the sensing-
resolution aspect of dense networks.
The second drawback to traditional schemes that ensure reliable multi-hop
communication by having the transmitter select one forwarding node to send to is
latency. If the transmitter selects the forwarding node without testing the channel
first, it is possible that the channel will be bad. If the transmitter tests the channel
first it must wait for the responses from the potential forwarding nodes before
sending the packet. Those responses may collide causing further delay. Also, the
need for acknowledgements and retransmissions increases the power
consumption of such schemes. The method proposed here eliminates the need
for such handshaking, thereby reducing the latency of the communication.
The one important advantage of using tuned radios is spectral efficiency. The
scheme described in this chapter requires the same packet to be transmitted over
multiple channels. In contrast, using tuned radios would make it possible to
transmit independent packets over those channels increasing the throughput of the
network. This disadvantage of the proposed scheme may be partially reduced by
employing distributed forward error correction codes. Using distributed channel
coding makes it possible for networks of untuned radios to achieve performance
close to that of a network of tuned radios. The next chapter shows how this can be
done and proves that a network of untuned radios can achieve a throughput of up
to 1/e of the maximum throughput of a network of tuned radios.
Chapter 3
THROUGHPUT OF NETWORKS OF UNTUNED RADIOS
3.1 MAXIMUM THROUGHPUT OVER ONE HOP
Consider using the nodes to form a communication backplane carrying data
between a source and a destination as described in Section 2.2. The data is
transported in a multi-hop fashion by a network of nodes that employ untuned
narrowband radios. We are interested in determining the throughput of this
communication backplane. In other words, we wish to find how many independent
packets can be transmitted simultaneously. To maximize the throughput of the
network, it is necessary to maximize the probability that during communication a
channel is occupied by exactly one transmitter. This will maximize the number of
channels that contain a decodable transmission. It can be shown that when there
are N channels available for communication and each transmitter is
independently and randomly assigned to a channel with the same probability
distribution, the probability that a channel is occupied by exactly one transmitter is
maximized when there are exactly N transmitters and each transmitter is equally
likely to be assigned to any of the channels. In this case, the probability that a
channel contains exactly one transmission is asymptotically, for large N, equal to
1/e (the result is proven in Appendix A). This implies that, in order to maximize
the throughput, the network should be operated with N active transmitters within
communication range of each other, in which case each transmission will
experience a collision with probability 1 − 1/e.
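The 1/e figure can be illustrated with a quick Monte Carlo sketch (N, the trial count, and the seed are arbitrary choices; Appendix A has the proof):

```python
import random

# Monte Carlo sketch (setup is illustrative): N transmitters each pick one of
# N channels uniformly at random; the fraction of channels holding exactly one
# transmitter should approach 1/e ~ 0.368 for large N.
def single_occupancy_fraction(n, trials=200, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        counts = [0] * n
        for _ in range(n):                     # N transmitters, N channels
            counts[rng.randrange(n)] += 1
        total += sum(1 for c in counts if c == 1) / n
    return total / trials

est = single_occupancy_fraction(1000)
print(est)  # close to 1/e ~ 0.3679
```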
Having N active transmitters within communication range maximizes the
probability that a receiver will have exactly one transmitter in the range of
frequencies it monitors; however, it may still be possible that a transmitter occupies
a unique channel, but no receiver is tuned to that channel. In order to increase the
probability that a non-colliding transmission is heard, as well as the probability that
a receiving node hears at least one packet, each node is equipped with several
receive radios, with each radio tuned to a different channel (by using a different
LC-circuit as the local oscillator for each radio; this also allows a node to
transmit on different random channels at different times by selecting any of its LC-
circuits to provide the carrier frequency for the transmitter).
Denote the number of receive-radios on each node by L. Section 3.3 derives
the relationship between the value of L and the throughput of the network. It
shows that it is possible to achieve throughput that is linear in the number of
channels even with a constant value of L . It is important to show that this is
achievable with a constant L because requiring L to grow with the number of
channels would correspond to requiring more hardware on each node, and this is
exactly what we are trying to avoid. It should be noted that the theoretical results
use bounding techniques, so the constants of this linear throughput that are
guaranteed by any particular value of L are pessimistic. To complement the
theoretical result and give guidelines for practical deployments, simulations are
used to estimate the throughput that can be achieved with different values of L .
3.2 MAXIMUM THROUGHPUT OVER MANY HOPS
Considering the multi-hop communication through a virtual grid as described in
Section 2.2, if every node in a block transmits on a random frequency, it is likely
that there will be transmissions at frequencies close to each other, thus any
receiver tuned to those frequencies will detect a collision and will not be able to
decode the individual transmissions. These collisions effectively erase some of
the packets, making it seem as if nodes in neighboring blocks communicate with
each other through an erasure channel. The question we are interested in is,
given N unit-capacity communication channels, how much data can
simultaneously be sent to the destination and have this data successfully received
and decoded by the destination that is H hops away (for now, let us assume that
H is a constant, though later it will be shown that H may be allowed to grow with
N, as long as log[H(N)]/N → 0 as N → ∞, without affecting the asymptotic
throughput). We want to compare this to a fully-coordinated network employing
tuned radios, in which case exactly N packets could be sent in each wave,
provided that there are N nodes in each block and every node selects a unique
frequency on which to communicate.
3.3 ACHIEVABLE THROUGHPUT OVER MANY HOPS
We will find the relative throughput of the untuned network by showing that the
connectivity of the network can be modeled as a random graph and then applying
known results from network coding literature. Namely, we make use of the result
that for communication in a graph of unit capacity links for which the connectivity is
not known a priori, a throughput equal to the max-flow between the source and the
destination is achievable with arbitrarily high probability by using random linear
network coding1 over a high enough field size [22]. However, in order to make use
of this result, we must find the max-flow of the graph. This is done by Result 1.
Since the connectivity of this random graph is not static (i.e. each wave of data will
encounter a different set of links), the packets have to carry the encoding vectors
in their headers to provide the destination with just the right information needed to
decode the source packets as in the scheme introduced in [23].
We will show that the throughput with network coding is linear in N over H(N)
hops as long as log[H(N)]/N → 0 as N → ∞. By contrast, simple random routing (in
which forwarding nodes are only allowed to randomly select one of the packets
from each wave to forward, rather than combining the packets they receive in each
wave to form the output packet) has constant throughput over H(N) = N hops
(see Appendix B).
1 In random linear network coding, forwarding nodes send on each outgoing link a random linear combination of the packets
they receive on the input links. Each input packet is multiplied by a randomly chosen element from some Galois field and these products are added together to form the outgoing packet [22].
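As a self-contained sketch of the footnote's idea (the prime field GF(257), the packet sizes, and the values are illustrative assumptions, not taken from [22]): a destination that collects enough independent combinations recovers the sources by solving a linear system using the encoding vectors carried in the headers.

```python
import random

P = 257  # small prime; arithmetic in GF(257) stands in for the Galois fields of [22]

def rand_combine(packets, rng):
    # A forwarding node emits a random linear combination of its input packets,
    # carrying the combination coefficients in the packet header.
    coeffs = [rng.randrange(P) for _ in packets]
    payload = [sum(c * pkt[i] for c, pkt in zip(coeffs, packets)) % P
               for i in range(len(packets[0]))]
    return coeffs, payload

def solve_mod_p(A, B):
    # Gauss-Jordan elimination mod P on [A | B]; returns X with A.X = B,
    # or None if A is singular (the destination then waits for more packets).
    n = len(A)
    M = [row[:] + rhs[:] for row, rhs in zip(A, B)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] % P), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], P - 2, P)        # Fermat inverse, P prime
        M[col] = [x * inv % P for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(x - f * y) % P for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

rng = random.Random(7)
sources = [[3, 1, 4], [1, 5, 9]]                 # two source packets, 3 symbols each
decoded = None
while decoded is None:                           # retry on (rare) singular draws
    coded = [rand_combine(sources, rng) for _ in range(2)]
    decoded = solve_mod_p([c for c, _ in coded], [p for _, p in coded])
print(decoded == sources)  # True
```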
3.3.1 Random Graph Representation of the Network Connectivity
We now create the random graph, shown in Figure 7, that models the connectivity
of the network. The N vertices in each column correspond to the N nodes in
each block during communication. The H columns correspond to attempting to
communicate over H blocks (for ease of notation, we first consider the case when
H is constant, rather than a function of N). Each of the vertices in columns
{2,...,H} has L incoming links corresponding to the L receivers on each node.
Each link connects a vertex to a randomly, independently chosen vertex in the
previous column. Since transmissions experience collisions with probability
1 − 1/e, each of the vertices in the graph is deleted with probability 1 − 1/e, in
which case all of its incoming and outgoing links are also deleted2. This means
that each of the links is deleted with probability 1 − 1/e because each receive-radio
has probability 1 − 1/e of being tuned to a frequency range that does not contain a
decodable transmission (i.e. either no transmission or more than one). The links
that are not deleted are equally likely to connect to any of the vertices in the
previous column that are not deleted because, given that a receive-radio has
exactly one transmission in its receive frequency range, the source of that
transmission is equally likely to be any of the transmitters that do not experience a
collision.
2 In the random graph, the vertices are deleted independently of one another. This is not the case in the network since the
collisions are not independent; however, this approximation becomes accurate as N tends to infinity. The independence assumption allows for analytical tractability in what follows.
We label the resulting random graph as G_{L,1−1/e} and show that the max-flow of
G_{L,1−1/e} is close to (1/e)·N if L is large enough.
Figure 7. Random graph representing connectivity in the network of nodes with untuned radios. Each vertex (node) in columns {2,…H} has L inputs, each one coming from a randomly and uniformly selected node in the previous column. Each node, along with its incoming and outgoing links, is deleted with probability 1-1/e.
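The construction of Figure 7 is easy to instantiate numerically; the sketch below (N, H, L, the trial count, and the seed are illustrative choices) builds the random graph and estimates the probability of event A, that at least one end-to-end path survives:

```python
import math
import random

# Monte Carlo sketch of the random graph of Figure 7: vertices are deleted
# with probability 1 - 1/e, and each vertex in columns 2..H draws L in-links
# to uniformly random vertices in the previous column. A vertex is "good" if
# it survives and reaches column 1 through surviving vertices.
def e2e_path_exists(N, H, L, rng):
    p_del = 1 - 1 / math.e
    alive = [[rng.random() > p_del for _ in range(N)] for _ in range(H)]
    good = alive[0][:]                      # column 1: good iff not deleted
    for j in range(1, H):
        good = [alive[j][i] and any(good[rng.randrange(N)] for _ in range(L))
                for i in range(N)]
    return any(good)

rng = random.Random(3)
trials = 50
hits = sum(e2e_path_exists(100, 5, 10, rng) for _ in range(trials))
print(hits / trials)  # near 1 here, since L*(1/e) > 1
```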
3.3.2 Max-Flow of the Random Graph
Result 1: For any constant β such that β < 1/e there exists a constant number
of inputs/node L such that the max-flow of G_{L,1−1/e} is greater than β·N with high
probability as N goes to infinity.
We will prove this result by applying a modified version of a technique used in
Percolation Theory. The first step is to relate the likelihood of having many disjoint
end-to-end (E2E) paths to the likelihood of having even a single path E2E. Let us
define the following notation:
Let A be the event that there exists a path E2E and let A_r be the event having the
following property: starting with any graph in A_r, the deletion of any r vertices will
still result in A. This is equivalent to saying that any graph in A_r has at least r + 1
vertex-disjoint paths E2E.
Lemma 1: Let r be a positive integer. Then
    1 − P_{p2}(A_r) ≤ ( q2/(q2 − q1) )^r · { 1 − P_{p1}(A) }    (12)
whenever 0 ≤ p2 ≤ p1 ≤ 1. Here, q1 = 1 − p1 and q2 = 1 − p2, and the notation
P_p(·) represents the probability that the event in parentheses occurs when vertices
in the graph are deleted with probability p.
Proof of Lemma 1: This proof is based on the proof in [1] of a similar result from
Percolation Theory. Let X_{i,j}, for i ∈ {1,...,N} and j ∈ {1,...,H}, be i.i.d. random
variables uniformly distributed in the interval [0,1], and to each vertex in row i and
column j of the grid assign the value X_{i,j}. To create graphs G_{L,p1} and
G_{L,p2} that have vertices deleted with probability p1 and p2 respectively, do the
following: first assign the values X_{i,j} to each vertex in the grid. Then assign L
links from each node in columns 2 through H to a randomly selected node in the
previous column. Finally, to create graph G_{L,p1}, for each vertex (i,j), delete it iff
X_{i,j} ≤ p1. To create graph G_{L,p2}, for each vertex (i,j), delete it iff X_{i,j} ≤ p2.
We are interested in relating the likelihood that A_r occurs in G_{L,p2} to the
likelihood that A occurs in G_{L,p1}. Note that if A_r does not occur in G_{L,p2},
then there must be a set of vertices, B, such that:
a) All of the vertices in the set B are not deleted in G_{L,p2}
b) |B| ≤ r
c) The graph G̅_{L,p2} obtained by deleting from G_{L,p2} the vertices in B
satisfies G̅_{L,p2} ∉ A.
There may exist many such sets B, in which case it is sufficient to pick any such
set. Suppose that G_{L,p2} ∉ A_r, and that every vertex (i,j) in the set B satisfies
p2 < X_{i,j} ≤ p1. It then follows from c) that G_{L,p1} ∉ A. Conditional on B, there is
a [(p1 − p2)/(1 − p2)]^{|B|} = [(q2 − q1)/q2]^{|B|} probability that p2 < X_{i,j} ≤ p1 for all
vertices in B; therefore,
    P( G_{L,p1} ∉ A | G_{L,p2} ∉ A_r ) ≥ ( (q2 − q1)/q2 )^r    (13)
Applying Bayes's theorem and the fact that P( G_{L,p1} ∉ A ∩ G_{L,p2} ∉ A_r ) ≤
P( G_{L,p1} ∉ A ) gives the result of Lemma 1. □
This result of Lemma 1 is particularly useful if we can show that the probability
that G_{L,p1} ∉ A decays exponentially (with N) to zero for some p1. In other
words, if we can show that { 1 − P_{p1}(A) } ≤ e^{−α(p1,L)·N}, then we have
    1 − P_{p2}(A_r) ≤ ( q2/(q2 − q1) )^r · e^{−α(p1,L)·N}    (14)
and applying r = β·N tells us that the probability of not having β·N (actually
β·N + 1) paths decays to zero exponentially as long as
    β < α(p1,L) / log( q2/(q2 − q1) )    (15)
The problem now becomes finding an appropriate bound on P( G_{L,p1} ∉ A ).
Lemma 2: If L(1 − p1) > 1, then P( G_{L,p1} ∉ A ) ≤ H · P( Y/N < 1 − Z* ), where Y is a
random variable drawn from the Binomial( N, 1 − (p1 + (1 − p1)·Z*^L) ) distribution
and Z* = [ 1/(L(1 − p1)) ]^{1/(L−1)}.
Lemma 2 allows us to relate the probability that no E2E path exists in G_{L,p1} to the
probability that the mean of N Bernoulli random variables deviates from its
expected value by some amount. Since this probability decays to zero
exponentially in N, this result, along with Lemma 1, will allow us to prove that the
number of vertex-disjoint paths in G_{L,p2} will be linear in N with high probability.
The constant, β, of this linear relationship will depend on the value of L. Note
that β also depends on the value p1; however, the value p1 is not fundamental to
the graph G_{L,p2}, and we are allowed to assign any value to p1, as long as it is
larger than p2, so as to maximize the bound on β guaranteed by Lemma 1 and
Lemma 2.
Also note that the condition that L(1 − p1) > 1 is imposed to ensure that 1 − Z* > 0.
It can be shown that if L(1 − p1) > 1, then G_{L,p1} ∈ A with high probability,
otherwise G_{L,p1} ∉ A with high probability. However, in our case it is not enough to
show that G_{L,p1} ∈ A with high probability for appropriate values of L and p1. We
must also bound the rate of this convergence (using Lemma 2) in order to apply
Lemma 1 to our original problem.
Proof of Lemma 2: Consider the number of vertices in each column of G_{L,p1} that
were not deleted and have a path back to column 1. Let us call these vertices
“good” and all the others “bad.” Conditioned on the number of bad (good) vertices
in column j, vertices in column j + 1 are themselves good or bad independently
of each other and with equal probability. Let the number of bad vertices in column
j be Z·N for some Z that satisfies 0 ≤ Z ≤ 1. Then vertices in column j + 1 are
themselves bad with probability p1 + (1 − p1)·Z^L.
Consider the probability that the number of bad nodes in column j + 1 is greater
than Z·N, given that the number of bad nodes in column j is Z·N. If
p1 + (1 − p1)·Z^L < Z, this probability should be exponentially small in N. This
probability is minimized when the difference Z − (p1 + (1 − p1)·Z^L) is greatest.
Setting the first derivative of this difference, which is concave in the interval [0,1),
to zero shows that it is maximized when Z = [ 1/(L(1 − p1)) ]^{1/(L−1)}.
Let R_j represent the ratio of bad nodes to the total number of nodes in column
j (i.e. the total number of bad nodes in column j is R_j·N). G_{L,p1} ∉ A is
equivalent to R_H = 1. Note that if R_j = 1 for some j ∈ [1,H), then R_k = 1 for all
k ∈ [j + 1, H], because if column j has no connectivity back to column 1,
then none of the columns after j will have connectivity to column 1 either. We
prove Lemma 2 by arguing that P( G_{L,p1} ∉ A ) is upper bounded by the probability
that there exists a j ∈ [1,H] for which R_j = 1, and this is upper bounded by the
probability that there exists a j ∈ [1,H] for which R_j > Z*. Mathematically,
    P( G_{L,p1} ∉ A ) < P( ∃ j ∈ [1,H] s.t. R_j > Z* )
                      ≤ P( R_1 > Z* ) + Σ_{j=2}^{H} P( R_j > Z* | R_{j−1} ≤ Z* )
                      = P( R_1 > Z* ) + Σ_{j=2}^{H} P( 1 − R_j < 1 − Z* | R_{j−1} ≤ Z* )
                      < H · P( Y/N < 1 − Z* )    (16)
where the second inequality is by the union bound, the last inequality is because
p1 < p1 + (1 − p1)·Z*^L, and Y is a random variable drawn from the
Binomial( N, 1 − (p1 + (1 − p1)·Z*^L) ) distribution. □
Now, we must find the rate at which P( Y/N < 1 − Z* ) decays to zero and apply
this to Lemma 1 to show that the number of vertex-disjoint paths in G_{L,p2} grows
linearly with N. Fortunately, this rate is well known [25]. We use the notation
q = 1 − [ p1 + (1 − p1)·Z*^L ] and ε = q − (1 − Z*) to write [25]:
    P( Y/N < 1 − Z* ) = P( q − Y/N > ε )
                      ≤ ( q/(q − ε) )^{N(q−ε)} · ( (1 − q)/(1 − q + ε) )^{N(1−q+ε)}    (17)
The right-hand side of the equation can also be expressed as
    exp( −N·[ (q − ε)·log( (q − ε)/q ) + (1 − q + ε)·log( (1 − q + ε)/(1 − q) ) ] )    (18)
giving us
    α(p1,L) = (q − ε)·log( (q − ε)/q ) + (1 − q + ε)·log( (1 − q + ε)/(1 − q) )    (19)
as the α(p1,L) we need to plug into (15). Evaluating (15) with a large enough but
constant value for L and an appropriately chosen value for p1 provides the
guarantee that the max-flow of G_{L,1−1/e} is at least β·N for any β < 1/e, proving
Result 1. ■
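The recipe can be made concrete numerically; in the sketch below (the scan range and step over p1 are illustrative assumptions, while p2 = 1 − 1/e follows the setting of Result 1), evaluating (19) and (15) for L = 10 over admissible p1 yields a guaranteed fraction β of roughly 2%, matching the figure quoted in Section 3.4.

```python
import math

L = 10
p2 = 1 - 1 / math.e          # vertices deleted with probability 1 - 1/e
q2 = 1 - p2                  # = 1/e

def beta_bound(p1):
    q1 = 1 - p1
    if L * q1 <= 1 or q1 >= q2:          # need L(1 - p1) > 1 and p1 > p2
        return 0.0
    z = (1 / (L * q1)) ** (1 / (L - 1))  # Z* from Lemma 2
    q = 1 - (p1 + q1 * z ** L)           # Binomial parameter in Lemma 2
    eps = q - (1 - z)
    if eps <= 0:
        return 0.0
    alpha = ((q - eps) * math.log((q - eps) / q)
             + (1 - q + eps) * math.log((1 - q + eps) / (1 - q)))   # (19)
    return alpha / math.log(q2 / (q2 - q1))                         # (15)

# Scan the free parameter p1 over (p2, 0.9) and keep the best guarantee.
best = max(beta_bound(p2 + 0.001 * k) for k in range(1, 268))
print(round(best, 3))  # roughly 0.02
```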
The throughput does not depend on H because we held H constant while letting
N go to infinity. However, (16) implies that H may grow with N without affecting
the throughput result as long as log[H(N)]/N goes to zero as N grows,
because all we need is for P( G_{L,p1} ∉ A ) to decay exponentially with N. Since
P( Y/N < 1 − Z* ) decays exponentially with N, so does P( G_{L,p1} ∉ A ) as long as
log[H(N)]/N goes to zero as N grows.
3.4 SIMULATION RESULTS
Note that the result is proven using bounding techniques, so it is not tight. For
example, for L = 10, the result proves that the throughput is guaranteed to be at
least 2%, whereas simulations show that L = 10 is good enough to give a
throughput of 30%. Figure 8 shows the result of simulations for various values of
L at N = 1000.
This max-flow result, together with the random network coding result of [22], tells us
that each wave of data can deliver nearly N/e packets from the sources to the
destination, compared to N packets that could be transported in each wave if the
network were composed of nodes with tuned radios and perfect coordination.
In practical deployments, having about 10 radios per node would be realistic. In
this case, the theoretical results tell us that a throughput of only 0.02·N can be
guaranteed. However, simulations show that a throughput of 0.3·N can be
expected. This is a good trade-off for many applications in which the demand on
bandwidth is not as strict as the demand for low-cost nodes that can operate at
power levels comparable to the power levels that can be supplied by the energy
scavenging mechanisms.
Figure 8. Simulation results showing the ratio of the throughput achievable with untuned radios and network coding to the throughput of a perfectly tuned and synchronized network, as a function of the number of inputs per node for 1000 channels and 1000 nodes per block.
Chapter 4
DISTRIBUTED COMPRESSION3
The appeal of using distributed compression lies in the fact that each sensor can
compress its data without knowing what the other sensors are measuring. In fact,
an individual sensor does not even need to know the correlation structure between
its data and that of the other sensors. This is especially desirable in a setting
where the sensor nodes are power-constrained, because each sensor node does
not need to spend power on an algorithm for learning the correlation structure
between its own measurement and other sensors' measurements. Moreover,
each sensor node does not need to spend power on receiving and processing
other sensors' measurements. As a result, an end-to-end compression system
that achieves a significant savings across the network can be built, where the
endpoints consist of the sensor node and the data-gathering node.
To build a distributed compression system, we propose to use an asymmetric
coding method among the sensors. Specifically, we propose to build upon the
architecture of Figure 9 which is designed for two nodes. In Figure 9, there are
two nodes, each of which measures data using an Analog-to-Digital (A/D)
converter. One of the sensor nodes will either transmit its data Y directly to the
data-gathering node or compress its readings with respect to its own previous
3 The work presented in this chapter was done in collaboration with Jim Chou.
readings while the other sensor node compresses its data X with respect to its
own previous readings and readings from other sensors and then transmits the
compressed data m to the data-gathering node. The decoder will then try to
decode m to X̂, given that Y is correlated to X. In specific cases (e.g., discrete
m X , given that Y is correlated to X . In specific cases (e.g., discrete
alphabet or continuous alphabet with i.i.d. Gaussian correlation), it can be shown
that the compression performance of the above architecture can match the case
where Y is available to the sensor node that is measuring X .
Figure 9. Distributed compression set-up: The encoder compresses X making use of the fact that the decoder has access to Y, which is correlated to X. This allows the encoder to compress X to fewer bits, without losing performance, than it would if the decoder did not have access to Y.
To extend the above architecture (Figure 9) to n nodes, one node can send its
data either uncoded (i.e., Y) or compressed with respect to its past. The data-
gathering node can decode this reading without receiving anything from the other
sensors. The other sensors can compress their data with respect to Y , without
even knowing their correlation structure with respect to Y . The data-gathering
node will keep track of the correlation structure and inform the sensors of the
number of bits that they shall use for encoding. In the compression literature, Y is
often referred to as side-information and the above architectures are often referred
to as compression with side information [34].
To develop code constructions for distributed compression, first some background
information on source coding with side information is provided and then a code
construction that achieves good performance at a low encoding cost is introduced.
4.1 BACKGROUND ON COMPRESSION WITH SIDE INFORMATION
In 1973, Slepian and Wolf presented a surprising result to the source coding
(compression) community [34]. The result states that if two discrete alphabet
random variables X and Y are correlated according to some arbitrary probability
distribution p(x,y), then X can be compressed without access to Y without
losing any compression performance with respect to the case where the encoder
of X does have access to Y. More formally, without having access to Y, X can
be compressed using H(X|Y) bits, where
    H(X|Y) = −Σ_y Σ_x P_Y(y)·P_{X|Y}(x|y)·log₂ P_{X|Y}(x|y).    (20)
The quantity H(X|Y) is often interpreted as the "uncertainty" remaining in the
random variable X given the observation of Y [35]. This is the same
compression performance that would be achieved if the encoder of X had access
to Y. To provide the intuition behind this result, we provide the following example.
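For a concrete instance of (20), the sketch below evaluates H(X|Y) for the correlation model of Example 1 below (3-bit words differing in at most one position, with all four offsets equally likely; the uniform joint distribution is an assumption made for illustration):

```python
import math
from itertools import product

# Evaluate (20) numerically: Y uniform over 3-bit words, X = Y xor e where e
# is one of four equally likely offsets of weight <= 1. Expect H(X|Y) = 2 bits.
words = list(product([0, 1], repeat=3))
offsets = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

def p_xy(x, y):
    diff = tuple(a ^ b for a, b in zip(x, y))
    return (1 / 8) * (1 / 4) if diff in offsets else 0.0

h = 0.0
for y in words:
    p_y = sum(p_xy(x, y) for x in words)
    for x in words:
        if p_xy(x, y) > 0:
            p_x_given_y = p_xy(x, y) / p_y
            h -= p_y * p_x_given_y * math.log2(p_x_given_y)
print(h)  # 2.0
```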
Example 1: Consider X and Y to be equiprobable 3-bit data sets which are
correlated in the following way: d_H(X,Y) ≤ 1, where d_H(X,Y) denotes the Hamming
distance between X and Y. When Y is known both at the encoder and decoder,
we can compress X to 2 bits, conveying the information about the uncertainty of
X given Y (i.e., the modulo-two sum of X and Y, which is one of (000), (100),
(010), and (001)). Now, if Y is known only at the decoder, we can surprisingly
still compress X to 2 bits. The method of construction stems from the following
argument: if the decoder knows that X = (000) or X = (111), then it is wasteful
to spend any bits to differentiate between the two. In fact, we can group
(000) and (111) into one coset (it is exactly the so-called principal
coset of the length-3 repetition code). In a similar fashion, we can partition the
remaining space of 3-bit binary codewords into 3 different cosets, with each coset
containing the original codewords offset by a unique and correctable error pattern.
Since there are 4 cosets, we need to spend only 2 bits to specify the coset to
which X belongs. The four cosets are given as:
Coset 1 = (000, 111)  Coset 2 = (100, 011)
Coset 3 = (010, 101)  Coset 4 = (001, 110)
The decoder can recover X perfectly by decoding Y to the closest (in Hamming
distance) codeword in the coset specified by the encoder. Thus the encoder does
not need to know the realization of Y for optimal encoding.
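The coset encoder and decoder of Example 1 can be written out directly; this sketch (function names are illustrative) checks the construction exhaustively over all pairs with Hamming distance at most 1:

```python
from itertools import product

# Coset (syndrome) code of Example 1, built on the length-3 repetition code:
# the encoder sends only a 2-bit coset index; the decoder recovers X from the
# index and the side information Y.
COSETS = {
    (0, 0): [(0, 0, 0), (1, 1, 1)],   # principal coset of the repetition code
    (0, 1): [(1, 0, 0), (0, 1, 1)],
    (1, 0): [(0, 1, 0), (1, 0, 1)],
    (1, 1): [(0, 0, 1), (1, 1, 0)],
}

def encode(x):
    # 2-bit index of the coset containing x.
    for idx, members in COSETS.items():
        if x in members:
            return idx

def decode(idx, y):
    # Pick the coset member closest to Y in Hamming distance.
    return min(COSETS[idx], key=lambda c: sum(a != b for a, b in zip(c, y)))

# Exhaustive check over every (X, Y) pair with d_H(X, Y) <= 1.
ok = True
for x in product([0, 1], repeat=3):
    for y in product([0, 1], repeat=3):
        if sum(a != b for a, b in zip(x, y)) <= 1:
            ok = ok and decode(encode(x), y) == x
print(ok)  # True
```

The decoding is unambiguous because the two members of each coset are bitwise complements, so when d_H(X,Y) ≤ 1 the wrong member is always at distance at least 2 from Y.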
In their paper [34], Slepian and Wolf established a set of achievable rate-tuples
that are needed to represent X and Y . These rate-tuples are represented as a
graph in Figure 10. From the graph, we can see that a minimum of
H(X) + H(Y|X) = H(X,Y) bits are needed to represent both X and Y. These
bits can be divided either evenly or unevenly in the encoding of X and Y as
shown in Figure 10. This dissertation focuses on the case where each sensor
node uses roughly the same number of bits for encoding its data, so that power
consumption will be evenly distributed among the nodes in the network. This rate
region corresponds to the straight line in the achievable region of Figure 10, and
can be achieved by using either symmetric codes (see [36]) or by time sharing
asymmetric codes. The latter solution is not only simpler but also more robust to
losses in the network as will be shown later. As an example of time sharing, let us
refer to Example 1. Assuming that node 1 is measuring data X, and node 2 is
measuring correlated data Y, we can have node 1 send its full data uncoded
during the even time instants and its data encoded (as in Example 1) during the
odd time instants. Similarly, node 2 will send its data uncoded during the odd time
instants and its data encoded (as in Example 1) during the even time instants. If
H(X) = H(Y), then node 1 and node 2 will use approximately the same amount
of power in sending their data. In the case that H(X) ≠ H(Y), we can have
node 1 send its data for a larger or smaller proportion of the time so that the
number of bits used by node 1 and node 2 are roughly equal over long durations
of time.
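To make the bookkeeping concrete for Example 1 (where H(X) = H(Y) = 3 bits and, by the symmetry of the correlation, H(X|Y) = H(Y|X) = 2 bits), alternating the coded and uncoded roles balances the per-node rates while meeting the Slepian-Wolf sum rate:

```python
# Per-instant rates for the alternating (time-sharing) scheme of Example 1.
H_X, H_Y = 3, 3                      # uncoded rates
H_X_given_Y = H_Y_given_X = 2        # coset-coded rates

# Even instants: node 1 uncoded (3 bits), node 2 coded (2 bits); odd: swapped.
node1_avg = (H_X + H_X_given_Y) / 2
node2_avg = (H_Y_given_X + H_Y) / 2
total = node1_avg + node2_avg        # should equal H(X,Y) = H(X) + H(Y|X) = 5
print(node1_avg, node2_avg, total)   # 2.5 2.5 5.0
```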
Figure 10. Achievable rate regions in distributed compression: The horizontal axis corresponds to the rate for encoding Y and the vertical axis corresponds to the rate for encoding X. The 45° line in the achievable region represents the region where the rate needed to encode X is equal to the rate needed to encode Y.
The above results were established only for lossless compression of discrete
random variables. In 1976, Wyner and Ziv extended the results of [34] to lossy
distributed compression by proving that under certain conditions [37], there are no
performance degradations for lossy compression with side information available at
the decoder as compared to lossy compression with side information available at
both the encoder and decoder.
The results established by [34] and [37] are theoretical, however, and
consequently do not provide intuition as to how one might achieve the predicted
theoretical bounds practically. In 1999, Pradhan and Ramchandran [38]
prescribed a constructive framework and practical constructions for distributed
compression in an attempt to achieve the bounds predicted by [34] and [37]. The
resulting codes perform well, but cannot be used directly for sensor networks
because they are not designed to support different compression rates. To achieve
distributed compression in a sensor network, it is desirable to have one underlying
codebook construction that is not changed among the sensors but can also
support multiple compression rates. The reason for needing a codebook that
supports multiple compression rates is that the compression rate is directly
dependent on the amount of correlation in the data, which might be time-varying.
Motivated by the above, this dissertation provides a tree-based distributed
compression code that can provide variable-rate compression without the need for
changing the underlying codebook construction.
4.2 CODE CONSTRUCTION FOR DISTRIBUTED COMPRESSION
This section describes a codebook construction that will allow an encoder to encode a
random variable X given that the decoder has access to a correlated random
variable Y . This construction can then be applied to a sensor network as shown
in Figure 9. The main design goal of the code construction is to support multiple
compression rates, in addition to being computationally inexpensive. In support of
the goal of minimizing the computations for each sensor node, code constructions
based on complicated error correction codes are not considered here. These
codes can, however, be easily incorporated into the construction but will lead to
more complexity for each sensor node. The uncoded code construction is as
follows. Start with a root codebook that contains 2^n representative values on the
real axis. Then partition the root codebook into two subsets consisting of the even-
indexed representations and the odd-indexed representations. Represent these
two sub-codebooks as children nodes of the root codebook. Further, partition
each of these nodes into sub-codebooks and represent them as children nodes in
the second level of the tree structure. Repeat this process n times, resulting in an
n-level tree structure that contains 2^n leaf nodes, each of which represents a sub-
codebook that contains one of the original 2^n values. An example partition is
given in Figure 11, where we use n = 4 and show only 2 levels of the partition.
Note from this tree-based codebook construction that if the spacing between
representative values is denoted by ∆, then each of the sub-codebooks at level-i
in the tree will contain representative values that are spaced apart by 2^i ∆. In a
sensor network, a reading will typically be represented as one of the 2^n values in
the root codebook, assuming that the sensor uses an n-bit A/D converter. Instead
of transmitting n bits to represent the sensor reading, as would traditionally be
done, it is possible to transmit i < n bits if the decoder has access to side-
information Y that is no further than 2^{i−1}∆ away from X. The encoder need
only transmit the i bits that specify the sub-codebook that X belongs to at level-i,
and the decoder will decode Y to the closest value in the sub-codebook that
the encoder specified. Because Y is no further than 2^{i−1}∆ from the
representation of X, the decoder will always decode Y to X. The functionality of
the encoder and decoder is described in detail below.
Figure 11. A tree-based construction for compression with side information: The root of the tree contains 2^4 values, and two partitions of the root quantizer are shown. The compressed value of a representative value, r_i, is given by the path through the tree taken to reach the group of values containing r_i. Taking more steps through the tree results in less compression and also less ambiguity about the observed representative value.
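The repeated even/odd partitioning above is simple to sketch in code. The following Python fragment is an illustrative sketch, not code from the dissertation; the function name `sub_codebook` is invented. It builds the level-i sub-codebook reached by a given i-bit path through the tree:

```python
def sub_codebook(root, i, path):
    """Level-i sub-codebook of `root`: all values whose index is
    congruent to `path` modulo 2**i (path = the i bits transmitted)."""
    return [r for idx, r in enumerate(root) if idx % (2 ** i) == path]

# Root codebook with n = 4: 2**4 = 16 representative values
# (indices double as values here for readability).
root = list(range(16))

# Level-1 children: even- and odd-indexed values, spaced 2*Delta apart.
evens = sub_codebook(root, 1, 0)   # [0, 2, 4, ..., 14]
odds = sub_codebook(root, 1, 1)    # [1, 3, 5, ..., 15]

# A level-2 sub-codebook has spacing 4*Delta, as in Figure 11.
print(sub_codebook(root, 2, 1))    # [1, 5, 9, 13]
```

Each additional level halves a sub-codebook and doubles its spacing, which is exactly why i transmitted bits tolerate side information up to 2^{i−1}∆ away.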
1. Encoder: The encoder will receive a request from the data-gathering node
requesting that it encode its readings using i bits. The first thing that the
encoder does is find the closest representation of the data from the 2^n
values in the root codebook (this is typically done by the A/D converter).
Next, the encoder determines the sub-codebook that X belongs to at level-
i. The path through the tree to this sub-codebook will specify the bits that
are transferred to the data-gathering node. The mapping from X to the
bits that specify the sub-codebook at level i can be done through the
following deterministic mapping: f(X) = index(X) mod 2^i, where f(X)
represents the bits to be transmitted to the decoder and index() is a
mapping from values in the root codebook to their respective indices. For a
given X and i, f(X) will be an i-bit value which the data-gathering node
will use to traverse the tree.

2. Decoder: The decoder (at the data-gathering node) will receive the i-bit
value, f(X), from the encoder and will traverse the tree starting with the
least-significant bit (LSB) of f(X) to determine the appropriate sub-
codebook, S, to use. The decoder will then decode the side-information,
Y, to the closest value in S: X̂ = argmin_{r_i ∈ S} |Y − r_i|, where r_i represents
the i-th codeword in S. Assuming that Y is less than 2^{i−1}∆ away from X,
where ∆ is the spacing in the root codebook, then the decoder will be able
to decode Y to the exact value of X, and recover X perfectly. The
following example will elucidate the encoding/decoding operations.
Example 2: Consider the 4-level tree codebook of Figure 12. Assume that the
data is represented by the value r_9 = 0.9 in the root codebook and the data-
gathering node asks the sensor to encode X using 2 bits. The index of r_9 is 9, so
f(X) = 9 mod 4 = 1. Thus, the encoder will send the two bits, 01, to the data-
gathering node (see Figure 12). The data-gathering node will receive 01 and
descend the tree using the least-significant bit first (i.e., 1 and then 0) to determine
the sub-codebook to decode the side-information with. In the example, the value
of the side-information, Y, is 0.8, and Y is decoded in the sub-codebook located
at 1,0 (where 1 represents the least significant bit and 0 represents the most
significant bit) in the tree to find the closest codeword. This codeword is r_9, which
is exactly the value representing X. Thus, 2 bits were used to convey the value
of X instead of the 4 bits that would have been needed if no encoding had
been done.
Figure 12. An example of the tree-based codebook. The encoder is asked to encode X using 2 bits, so it transmits 01 to the decoder. The decoder will use the bits 01 in ascending order from the least significant bit (LSB) to determine the path to the sub-codebook to use to decode Y with.
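Example 2 can be reproduced with a few lines of Python. This is a hedged sketch: `encode`, `decode`, and the parameter names are illustrative assumptions, not the dissertation's code.

```python
def encode(x, i, delta):
    """Sensor side: f(X) = index(X) mod 2**i, i.e. the i low-order
    bits of the root-codebook index of x."""
    return round(x / delta) % (2 ** i)

def decode(f, y, i, delta, n=4):
    """Data-gathering side: pick the closest codeword to the side
    information y inside the level-i sub-codebook selected by f."""
    sub = [k * delta for k in range(2 ** n) if k % (2 ** i) == f]
    return min(sub, key=lambda r: abs(y - r))

delta = 0.1                       # root codebook spacing, as in Figure 12
f = encode(0.9, 2, delta)         # index(0.9) = 9, so f = 9 mod 4 = 1 (bits 01)
x_hat = decode(f, 0.8, 2, delta)  # side information Y = 0.8
print(f, x_hat)                   # 2 bits suffice to recover X = 0.9 from Y
```

Since |Y − X| = 0.1 < 2^{2−1}∆ = 0.2, the decoder lands on r_9 exactly, as in the example.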
4.3 CORRELATION TRACKING
In the above encoding/decoding operations it is assumed that the decoder for
sensor j has available to it at time k some side-information Y_k^(j) that is
correlated to the sensor reading, X_k^(j). To maximize efficiency, all of the data that
is already available at the decoder and correlated to X_k^(j) should be used as the
side information. To do this effectively, a simple linear predictive model is proposed
here where Y_k^(j) is a linear combination of values that are available at the decoder:

Y_k^(j) = Σ_{l=1}^{M} α_l X_{k−l}^(j) + Σ_{i=1}^{j−1} β_i X_k^(i) ,   (21)

where X_{k−l}^(j) represents past readings for sensor j and X_k^(i) represents present
sensor readings from sensor i.⁴ The variables α_l and β_i are weighting
coefficients. Y_k^(j) can then be thought of as a linear prediction of X_k^(j) based on
past values (i.e., X_{k−l}^(j); l ∈ {1, ..., M}) and other sensor readings that have already
been decoded at the data-gathering node (i.e., X_k^(i); i ∈ {1, ..., j−1}, where i
indexes the sensor and j−1 represents the number of readings from other
sensors that have already been decoded). A linear predictive model is used here
because it is not only analytically tractable, but it is also optimal in the limiting case
where the innovations noise can be modeled as i.i.d. Gaussian random variables.

4 Note that for simplicity, the above prediction model is based on a finite number of past values and a single present value for each of the other sensor readings that have been decoded. This model can be generalized to the case where past values of other sensors are also included in the prediction, which might be useful for audio applications where echoes result in a strong temporal correlation.
In order to leverage the inter-node correlations, one of the sensors always sends
its data either uncoded or compressed with respect to its own past data.
Furthermore, the sensors are numbered in the order that they are queried. For
example, at each time instant, one of the sensors will send its reading X_k^(1), either
uncoded or coded with respect to its own past. If a sensor chooses to code its
present value with respect to its past, then it can also use the codebook
construction given in Figure 12, which will simplify the encoding architecture and
reduce power consumption because it will not need to spend power on correlation
modeling. The reading for sensor 2 can then be decoded with respect to

Y_k^(2) = Σ_{l=1}^{M} α_l X_{k−l}^(2) + β_1 X_k^(1) .   (22)

Each X_k^(j) that is decoded can then be used to form predictions for other sensor
readings according to (21). The prediction, Y_k^(j), determines the number of bits
needed to represent X_k^(j). In the extreme case that Y_k^(j) perfectly predicts X_k^(j)
(i.e., Y_k^(j) = X_k^(j)), then zero bits are needed to represent X_k^(j) because it is
perfectly predictable at the decoder. Thus, the main objective of the decoder is to
derive a good estimate of X_k^(j) for each sensor j, j ∈ {1, ..., L}, where L represents the
number of sensors. In more quantitative terms, the goal is for the decoder to be
able to find the α_l, l ∈ {1, ..., M}, and β_i, i ∈ {1, ..., j−1}, that minimize the mean
squared error between Y_k^(j) and X_k^(j).
To find the α_l and β_i that minimize the mean squared prediction error we can
utilize Wiener filter theory [39]. We start by representing the prediction error as a
random variable, N_j = Y_k^(j) − X_k^(j). We can then rewrite the mean squared error
as:

E[N_j²] = E[( X_k^(j) − ( Σ_{l=1}^{M} α_l X_{k−l}^(j) + Σ_{i=1}^{j−1} β_i X_k^(i) ) )²]
        = E[(X_k^(j))²] − 2 Σ_{l=1}^{M} α_l E[X_k^(j) X_{k−l}^(j)] − 2 Σ_{i=1}^{j−1} β_i E[X_k^(j) X_k^(i)]
          + Σ_{l,h=1}^{M} α_l α_h E[X_{k−l}^(j) X_{k−h}^(j)]
          + 2 Σ_{l=1}^{M} Σ_{i=1}^{j−1} α_l β_i E[X_{k−l}^(j) X_k^(i)]
          + Σ_{i,h=1}^{j−1} β_i β_h E[X_k^(i) X_k^(h)] .   (23)

Now, if we assume that X_k^(j) and X_k^(i) are pairwise jointly wide sense stationary
[39] for i ∈ {1, ..., j−1}, then we can re-write the mean squared error as:

E[N_j²] = r_{x_j x_j}(0) − 2 P_j^T Γ_j + Γ_j^T R_{z_j z_j} Γ_j ,   (24)

where
Γ_j = [ α_1, α_2, ..., α_M, β_1, β_2, ..., β_{j−1} ]^T ,

P_j = [ r_{x_j x_j}(1), r_{x_j x_j}(2), ..., r_{x_j x_j}(M), r_{x_j x_1}(0), r_{x_j x_2}(0), ..., r_{x_j x_{j−1}}(0) ]^T ,   (25)

and we use the notation r_{x_j x_i}(l) = E[X_k^(j) X_{k+l}^(i)]. With this notation, we can express
R_{z_j z_j} as:
R_{z_j z_j} = [ R_{x_j x_j}     R_{x_j x_i}
              R_{x_j x_i}^T   R_{x_i x_i} ] ,   (26)

where R_{x_j x_j} is given as:
R_{x_j x_j} = [ r_{x_j x_j}(0)      r_{x_j x_j}(1)      ...   r_{x_j x_j}(M−1)
               r_{x_j x_j}(1)      r_{x_j x_j}(0)      ...   r_{x_j x_j}(M−2)
               ...
               r_{x_j x_j}(M−1)   r_{x_j x_j}(M−2)   ...   r_{x_j x_j}(0) ]   (27)

and R_{x_j x_i} and R_{x_i x_i} are given as
R_{x_j x_i} = [ r_{x_j x_1}(1)   r_{x_j x_2}(1)   ...   r_{x_j x_{j−1}}(1)
               r_{x_j x_1}(2)   r_{x_j x_2}(2)   ...   r_{x_j x_{j−1}}(2)
               ...
               r_{x_j x_1}(M)   r_{x_j x_2}(M)   ...   r_{x_j x_{j−1}}(M) ]   (28)

and

R_{x_i x_i} = [ r_{x_1 x_1}(0)       r_{x_1 x_2}(0)       ...   r_{x_1 x_{j−1}}(0)
               r_{x_2 x_1}(0)       r_{x_2 x_2}(0)       ...   r_{x_2 x_{j−1}}(0)
               ...
               r_{x_{j−1} x_1}(0)   r_{x_{j−1} x_2}(0)   ...   r_{x_{j−1} x_{j−1}}(0) ]   (29)
To find the set of coefficients (represented by Γ_j) that minimize the mean squared
error, we differentiate (24) with respect to Γ_j to obtain:

∂E[N_j²]/∂Γ_j = −2 P_j + 2 R_{z_j z_j} Γ_j .   (30)

Setting the above equal to zero and solving for the optimal Γ_j, which we denote
by Γ_{j,opt}, we arrive at the standard Wiener estimate [39]:

Γ_{j,opt} = (R_{z_j z_j})^{−1} P_j .   (31)

If our assumption of stationarity holds, then the data-gathering node can request
uncoded data from all of the sensors for the first K rounds of requests and
calculate the Wiener estimate (31) once from these K rounds of samples. The
set of coefficients determined from the Wiener estimate can then be used to form
the side information for each future round of requests. In practice, however, the
statistics of the data may be time varying and, as a result, the coefficient vector,
Γ_j, must be continuously adjusted to minimize the mean-squared error. One
method of doing this is to move Γ_j in the opposite direction of the gradient of the
objective function (i.e., the mean squared error) for each new sample received
during round k+1:

Γ_j^(k+1) = Γ_j^(k) − μ ∇_j^(k) ,   (32)

where ∇_j^(k) is given by (30) and μ represents the amount to descend opposite to
the gradient. The goal of this approach is to descend to the global minimum of the
objective function. We are assured that such a minimum exists because the
objective function is convex. In fact, it has been shown that if μ is chosen
appropriately (e.g., μ < 2/λ_max, where λ_max represents the largest eigenvalue of
R_{z_j z_j}), then (32) will converge to the optimal solution [39]. In the following
subsection we will show how (32) can be calculated in practice and how to
incorporate adaptive prediction with the distributed source code discussed in the
previous section.
4.3.1 Parameter Estimation
From (30) and (32), we know that the coefficient vector should be updated as:

Γ_j^(k+1) = Γ_j^(k) − μ( −2 P_j + 2 R_{z_j z_j} Γ_j^(k) ) .   (33)

In practice, however, the data-gathering node will not have knowledge of P_j and
R_{z_j z_j} and will therefore need an efficient method for estimating P_j and R_{z_j z_j}. One
standard estimate is to use P_j = X_k^(j) Z_{k,j} and R_{z_j z_j} = Z_{k,j} Z_{k,j}^T, where

Z_{k,j} = [ X_{k−1}^(j), X_{k−2}^(j), ..., X_{k−M}^(j), X_k^(1), X_k^(2), ..., X_k^(j−1) ]^T ,   (34)

so that (33), with the factor of 2 absorbed into μ, becomes

Γ_j^(k+1) = Γ_j^(k) − μ( −Z_{k,j} X_k^(j) + Z_{k,j} Z_{k,j}^T Γ_j^(k) ) = Γ_j^(k) + μ Z_{k,j} N_{k,j} ,   (35)
where the second equality follows from the fact that Y_k^(j) = Z_{k,j}^T Γ_j^(k) and
N_{k,j} = X_k^(j) − Y_k^(j). The equation described by (35) is well known in the adaptive
filtering literature as the Least-Mean-Squares (LMS) algorithm, and the steps in
calculating the LMS solution are summarized below:

1. Y_k^(j) = Z_{k,j}^T Γ_j^(k) .
2. N_{k,j} = X_k^(j) − Y_k^(j) .
3. Γ_j^(k+1) = Γ_j^(k) + μ Z_{k,j} N_{k,j} .
In the above, μ should be chosen to be less than 2/λ_max, where λ_max is the largest
eigenvalue of the correlation matrix. To use the LMS algorithm, the data-gathering node will start by
querying all of the sensors for uncoded data for the first K rounds of requests.
The value of K should be chosen to be large enough to allow the LMS algorithm
to converge. After K rounds of requests have been completed, the data-
gathering node can then ask for coded values from the sensor nodes and decode
the coded value for sensor j with respect to its corresponding side information,
Y_k^(j) = Z_{k,j}^T Γ_j^(k). The value of Γ_j will continue to be updated to adjust to changes
in the statistics of the data. More specifically, for each round of requests and each
value reported by a sensor, the decoder will decode Y_k^(j) to the closest codeword
in the sub-codebook, S, specified by the corresponding sensor:

X̂_k^(j) = argmin_{r_i ∈ S} |Y_k^(j) − r_i| .   (36)
From Section 4.2, we know that X̂_k^(j) will always equal X_k^(j) as long as the sensor
node encodes X_k^(j) using i bits such that 2^{i−1}∆ > |N_{k,j}|. If 2^{i−1}∆ < |N_{k,j}|, however,
then a decoding error will occur. We can use Chebyshev's inequality [40] to bound
this probability of error:

P[ |N_{k,j}| > 2^{i−1}∆ ] ≤ σ²_{N_j} / (2^{i−1}∆)² ,   (37)

where N_{k,j} is drawn from a distribution with zero mean and variance σ²_{N_j}. Thus,
to ensure that P[ |N_{k,j}| > 2^{i−1}∆ ] is less than some probability of error, P_e, we can
choose P_e = σ²_{N_j} / (2^{i−1}∆)². The value of i that will ensure this probability of error is
then given as

i = ⌈ (1/2) log₂( σ²_{N_j} / (P_e ∆²) ) ⌉ + 1 .   (38)

Thus, for a given P_e, the data-gathering node should ask for i bits from each
sensor according to (38). Note that it is not necessary to be over-conservative
when choosing P_e because Chebyshev's inequality is a loose bound.
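The bit-allocation rule (38) is easy to sketch in code (a hedged illustration; the function and variable names are assumptions):

```python
import math

def bits_to_request(sigma2, p_e, delta):
    """Smallest i for which the Chebyshev bound sigma2 / (2**(i-1) * delta)**2
    falls below p_e, i.e. i = ceil(0.5 * log2(sigma2 / (p_e * delta**2))) + 1."""
    i = math.ceil(0.5 * math.log2(sigma2 / (p_e * delta ** 2))) + 1
    return max(i, 0)  # never request a negative number of bits

# Prediction-error variance 0.04, error target 1 in 100, spacing 0.1:
print(bits_to_request(0.04, 0.01, 0.1))  # -> 6
```

As a sanity check, with i = 6 the bound gives 0.04 / (2⁵ · 0.1)² ≈ 0.004 < 0.01, while i = 5 would give ≈ 0.016 > 0.01.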
From (38), we can see that the data-gathering node must maintain an estimate of
the variance of the prediction error, σ²_{N_j}, for each sensor in order to determine the
number of bits to request from each sensor. The data-gathering node can initialize
σ²_{N_j} as:

σ²_{N_j} = (1/(K−1)) Σ_{i=1}^{K} N_{i,j}²   (39)

during the first K rounds of requests. To update σ²_{N_j}, the data-gathering node
can form the following filtered estimate:

σ²_{N_j,new} = (1 − γ) σ²_{N_j,old} + γ N_{k,j}² ,   (40)

where σ²_{N_j,old} is the previous estimate of σ²_{N_j} and γ is a ``forgetting factor'' [39].
We choose to use a filtered estimate to adapt to changes in statistics. The block
diagram of the decoding structure at the data-gathering node is shown in Figure
13. It is different from standard LMS in that the encoder cannot mimic the
processing done by the decoder, because the encoder (i.e., sensor node) does not
have access to all the information used by the decoder (i.e., data-gathering node).
This approach is also different from previous works on distributed source coding in
that the correlated side information is not generated by a single correlated source
but is obtained by forming the prediction of the value to be decoded based on
information from other correlated sources and past values generated by the source
whose value is to be decoded.
Figure 13. Adaptive filtering block used to form the side information and decode the sensor reading. Both spatial and temporal correlations are exploited. The sensor measurement is represented by x[n], the coset information from the sensor node is given by c[n] and the measurements from other sensors are represented by the vector Y[n]. The number of bits used for encoding at time n+1 is given by i[n+1].
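The ``Noise Tracker'' block in Figure 13 amounts to the filtered estimate (40). A minimal sketch (the function name and the choice γ = 0.05 are illustrative assumptions):

```python
def update_variance(sigma2_old, n_kj, gamma=0.05):
    """Filtered prediction-error variance, eq. (40):
    sigma2_new = (1 - gamma) * sigma2_old + gamma * n_kj**2."""
    return (1.0 - gamma) * sigma2_old + gamma * n_kj ** 2

sigma2 = 1.0                         # initial estimate, e.g. from eq. (39)
for n in [0.1, -0.2, 0.15, 0.05]:    # a few observed prediction errors
    sigma2 = update_variance(sigma2, n)
print(sigma2)  # drifts down toward the recent squared-error level
```

A larger γ tracks changing statistics faster but smooths less, which is exactly the spike/responsiveness trade-off observed in the simulation results below.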
In the above, it is possible to improve upon these proposed methods for adapting
the prediction coefficients and the number of bits used for encoding. First, it was
assumed that the predictive model order is fixed. More specifically, the prediction
of each sensor measurement was modeled as a weighted combination of a fixed
number of measurements that are available at the decoder. In practice, by
choosing an appropriate model order, one can lower the prediction error. One
efficient way of doing this, is by maintaining a bank of predictors that are of
different model orders and then using a weighted combination of these predictors
as the final prediction estimate. Singer and Feder [41] showed that this method of
adaptive prediction is not only efficient but has some desirable ``universal''
properties that allow one to claim performance equivalence with respect to a ``fully-
knowledgeable'' fixed system. In the interests of keeping things simple and getting
the main point across, this work does not focus on details of the predictor, instead
choosing a predictor with a fixed model order. More sophisticated prediction
methods are left as a subject for future work. Another area of future work is to
improve upon our proposed bounding technique. In this work, a universal but
loose bound (i.e., Chebyshev’s inequality) was used for bounding the probability of
error. One can find better bounds for bounding the probability of error if one can
suitably model the prediction error. For example, if the prediction error can be
modeled as Gaussian, then a tight bound on the probability of error can be
determined. It was decided to use a loose bound in this work, to highlight the fact
that even if no assumptions are made on the data, one can still attain significant bit
savings.
4.3.2 Decoding Error
As mentioned above, it is always possible for the data-gathering node to make a
decoding error if the magnitude of the correlation noise, N_{k,j}, is larger than 2^{i−1}∆,
where i is the number of bits used to encode the sensor reading for sensor j at
time k. Two approaches for dealing with such errors are proposed. One method
is to use error detection codes and the other method entails using error correction
codes.
To use error detection, each sensor node can transmit a cyclic redundancy check
(CRC) [42], which is computed based on the original measurements, for every m
readings that it transmits. The data-gathering node will decode the m readings
using the tree-structured codebook as above and compare its own calculation of
the CRC (based on the m readings it decodes) to the CRC transmitted by the
sensor. If an error is detected (i.e., the CRC does not match), then the data-
gathering node can either drop the m readings or ask for a retransmission of the
m readings. Whether the data-gathering node drops the readings or asks for a
retransmission is application dependent, and we do not address this issue here.
Furthermore, by using Chebyshev's inequality (37), the data-gathering
node can make the probability of decoding error as small as it desires, which
translates directly into a lower probability of data drops or retransmissions.
The other method of guarding against decoding errors is to use error-correction
codes, such as an (M, K) Reed-Solomon code [43] that can operate on K
sensor readings and generate (M − K) parity check symbols, which are
calculated based on the original readings. These (M − K) parity check symbols
can be transmitted to the data-gathering node along with the K encoded sensor
readings. The data-gathering node will decode the K sensor readings using the
tree-based structure mentioned above and, upon receiving the (M − K) parity
check symbols, it can correct any errors that occurred in the K sensor
readings. If more than (M − K)/2 errors exist in the K sensor readings, then the
Reed-Solomon decoder will declare that the errors cannot be corrected and, in this
case, the data must be either dropped or retransmitted.
4.4 QUERYING AND DATA REPORTING ALGORITHM
This section combines the concepts of the previous sections to formulate the
algorithms to be used by the data-gathering node and by the sensor node.
4.4.1 Data-Gathering Node Algorithm
The data-gathering node will, in general, make N rounds of queries to the sensor
nodes. In the first K rounds of queries, the data-gathering node will ask the
sensors to send their data uncoded. The reason for this is that the data-gathering
node needs to ``learn'' the correlation structure between sensor readings before
asking for compressed readings. Thus, the data-gathering node will use the first
K rounds of readings for calculating the correlation structure in accordance with
Section 4.3. After K rounds of readings, the data-gathering node will have an
estimate of the prediction coefficients to be used for each sensor (see (32)). Note
that K should be chosen large enough to allow the LMS algorithm to converge.
For each round after K, one node will be asked to send its reading
``uncompressed'' with respect to the other sensors.5 The data-gathering node will
continuously maintain a counter for the number of bits that each sensor node has
sent and request for the node that has sent the least number of bits to send its
data ``uncompressed''. This is a method for insuring that each of the sensor nodes
will send approximately the same number of bits and hence use approximately the
same amount of energy. In theoretical terms, this is a method for achieving a
symmetric distributed code construction by using time sharing (see Section 4.1).
Upon receiving a transmission from a sensor, the data-gathering node will decode
it (if it is a compressed reading) with respect to a linear estimate of the data for that
sensor (see (21)). After each round of requests, the correlation parameters of
each sensor (see (32) and (40)) are updated. Pseudocode for the data-gathering
node is given below.
Pseudocode for data-gathering node:
Initialization:
    for (i = 0; i < K; i++)
        for (j = 0; j < num_sensors; j++)
            Ask sensor j for its uncoded reading
    for each pair of sensors i, j
        update correlation parameters using (32) and (40)

Main Loop:
    for (k = K; k < N; k++)
        Request a sensor for uncoded reading
        for each remaining sensor
            Determine number of bits, i, to request using (38)
            Request i bits
        Decode data for each sensor
        Update correlation parameters for each sensor
5 Note that the sensor may still send its data compressed with respect to its own past readings.
The decoding is done in accordance with Section 4.2 and the correlation
parameters are estimated according to (32) and (40).
4.4.2 Sensor Node Algorithm
The algorithm incorporated into each sensor node is considerably simpler than the
algorithm incorporated into the data-gathering node. The sensor node will simply
listen for requests from the data-gathering node. The data-gathering node will
specify to the sensor the number of bits that it requests the sensor to encode the
data with. Each sensor will be equipped with an A/D converter that represents the
data using n bits. Upon receiving a request from the data-gathering node, the
sensor will encode the n-bit value from the A/D converter using i bits, where i is
specified by the data-gathering node. This i-bit value is sent back to the data-
gathering node. Pseudocode for the sensor node is given below.

Pseudocode for sensor node:

    for each request
        Extract i from the request
        Get X[n] from A/D converter
        Transmit n mod 2^i
In the above algorithm, we denote X[n] as the value returned from the A/D
converter and n as the index to this value. Note that the only extra operation with
respect to an uncoded system is for the sensor nodes to perform a modulo
operation. This makes it extremely cheap for a sensor node to encode its data.
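The sensor's entire encoding step is that single modulo operation. As a sketch (the function name is an illustrative assumption):

```python
def sensor_respond(i, n):
    """Sensor-side encoder: given the requested bit count i and the
    A/D index n of the current reading, transmit n mod 2**i."""
    return n % (2 ** i)

print(sensor_respond(2, 9))  # index 9 encoded with 2 bits -> 1 (binary 01)
```

All of the adaptive-filtering and variance-tracking work stays at the data-gathering node, which is what keeps the sensor side this cheap.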
4.5 SIMULATION RESULTS
The simulations were performed for measurements of light, temperature and
humidity. The energy savings (due to bit transmissions) as well as the robustness
of the correlation tracking algorithm to errors were measured. The data-gathering
node algorithm and the sensor node algorithm described in Section 4.4 were
implemented. In the first set of simulations, the sensor nodes simulated the
measurement of data by reading from a file previously recorded readings from
actual sensors. The data measured by the sensors were for light, humidity and
temperature. The readings were made by a 12-bit A/D converter with a dynamic
range of [−128, 128]. The simulated network had a star topology where the data-
gathering node queried 5 sensor nodes directly.
4.5.1 Correlation Tracking
The first simulation tested the correlation tracking algorithm (see Section 4.3). The
data observed by sensor j was modeled as:

Y_k^(j) = Σ_{l=1}^{4} α_l X_{k−l}^(j) + X_k^(m) ,   (41)

where m ≠ j. In other words, the prediction of the reading for sensor j is derived
from its past values and one other sensor. To test the correlation tracking
algorithm, the tolerable noise that the correlation tracking algorithm calculates at
each time instant was measured. The tolerable noise is the amount of noise that
can exist between the prediction of a sensor reading and the actual sensor reading
without inducing a decoding error. Tolerable noise is calculated by using (38), and
noting that the tolerable noise will be given as 2^{i−1}∆, where i is the number of bits
that are requested from the sensor and ∆ is the spacing of values in the A/D
converter. The bound on the probability of decoding error was set to be less than 1 in
100, and the data-gathering algorithm and the sensor node algorithms were
simulated over 18,000 samples of light, temperature and humidity for each sensor
(a total of 90,000 samples). A plot of the tolerable noise vs. actual prediction
noise is given in Figure 14 for humidity and in Figure 15 and Figure 16 for
temperature and light respectively. In each of the graphs, the top curve represents
the tolerable noise and the bottom curve represents the actual prediction noise.
From the plots it can be seen that the tolerable noise is much larger than the actual
prediction noise. The reason for this is that the parameters for estimating the
number of bits to request from the sensors were chosen conservatively. The
tolerable noise can be lowered to achieve higher efficiency but this also leads to a
higher probability of decoding error. For the simulations that were run and
presented here, zero decoding errors were made over 90,000 samples of humidity,
temperature and light.
Figure 14. Tolerable noise and prediction noise for 18,000 samples of humidity data. The tolerable noise is the amount of noise that can exist between the prediction of a sensor reading and the actual sensor reading without inducing a decoding error.
Figure 15. Tolerable noise and prediction noise for 18,000 samples of temperature data. The tolerable noise is the amount of noise that can exist between the prediction of a sensor reading and the actual sensor reading without inducing a decoding error.
Figure 16. Tolerable noise and prediction noise for 18,000 samples of light data. The tolerable noise is the amount of noise that can exist between the prediction of a sensor reading and the actual sensor reading without inducing a decoding error.

A couple of things to note from the plots are that (1) there are many spikes in the
tolerable noise plots, and (2) the actual noise plot for temperature is lower than the
actual noise plot for humidity, which is lower than the actual noise plot for light. The
spikes in the tolerable noise plots occur because an aggressive weighting factor
was chosen for calculating (40), which leads to less smoothing in the estimation.
These spikes can be reduced by weighting the current distortion less in the
estimation of the overall distortion (see (40)), but this will lead to slower responses
to variations in distortion and will therefore introduce more decoding errors for
noisy data. The actual noise plot for temperature is less noisy than the actual
noise plots for humidity and light. This matches our intuition because, in general,
we would expect temperature measurements to be more correlated than humidity
measurements and light measurements. For example, if there are many light
sensors in a room and one of the light sensors is in a shaded area, then it will be
less correlated with the light sensors that are in a lighted area. On the other hand,
if there are many temperature sensors in a room, even if one of the temperature
sensors is in a shaded area, its measurements will not vary too much from the
measurements obtained from sensors in lighted areas. Irrespective of the variation
in correlation, however, it can be seen from the plots that our correlation tracking
algorithm performs well in estimating the actual distortion for a variety of data sets.

4.5.2 Energy Savings

The next set of simulations was run to measure the amount of energy savings that
the sensor nodes achieved. The energy savings were calculated to be the total
reduction in energy that resulted from transmission and reception. Note that for
reception, energy expenditure is actually not reduced but increased because the
sensor nodes need to receive the extra bits that specify the number of bits to
encode each sensor reading. For an n -bit A/D converter, an extra ( )nlog bits
need to be received each time the data gathering node informs a sensor of the
number of bits needed for encoding. It is reasonable to assume that the energy
-
94
used to transmit a bit is equivalent to the energy used to receive a bit. To reduce
the extra energy needed for reception, the data-gathering node only specified the
number of encoding bits periodically. In the simulations, this period was chosen to
be 100 samples for each sensor node. Each of the 5 sensor nodes was alternately
queried to send back readings that were compressed only with respect to its own
past readings so that compressed readings from other sensors could be decoded
with respect to these readings. The overall average savings in energy is given in
Table 1. To assess the performance of the algorithm, the work of [44] was used as
a benchmark for comparison. The work of [44] is also based on a distributed
coding framework, but the prediction algorithm uses a filtered estimate for the
prediction coefficients instead of using a gradient descent algorithm such as LMS
to determine the prediction coefficients. Furthermore, in [44] the prediction
algorithm only uses one measurement from a neighboring sensor to form the
prediction estimate. Thus, in order to perform a fair comparison, the model of (41)
was changed to only use one measurement from another sensor to form the
prediction estimate and, surprisingly, was able to achieve roughly the same
performance as given in Table 1. The results for humidity are approximately 24%
better than the results cited in [44] for the same data set. Similarly, the results for
temperature and light are approximately 16% and 3% better, respectively, than the
results cited in [44] for the respective data sets. Thus, it is clear that the LMS
algorithm is better suited for tracking correlations than the methods given in [44],
and if one uses even better correlation tracking algorithms (see [41]) one should
expect further energy savings. The topic of investigating more elaborate
correlation tracking algorithms is left for future work.

TABLE I
ENERGY SAVINGS OF THE LMS-BASED CORRELATION TRACKING SCHEME

Data Set                Temperature  Humidity  Light
Average Energy Savings  66.6%        44.9%     11.7%

Table 1. Average energy savings of the LMS based correlation tracking and
distributed data compression scheme over an uncoded system for sensor nodes
measuring temperature, humidity and light.

In the above, one can also achieve more significant energy savings by using a
less conservative estimate of the bits needed for encoding (see (38)). This will,
however, lead to more decoding errors. In the simulations presented here, the
bound on the probability of decoding error was set so low that it resulted in no
decoding errors over 90,000 samples for each of the data sets. In the following
subsection, the robustness of the algorithm to errors is evaluated.

4.5.3 Robustness to Errors

There are two types of errors that can occur in this framework. The first type of
error is a packet loss. The second type of error is an actual decoding error, which
results from the code not being able to correct for the prediction noise. The
following subsections consider each type of error.
1. Packet loss: A packet loss may occur if a measurement is lost due to a
malfunction of the sensor or if there is a transmission loss. In such a case,
it might appear that this loss would affect the prediction estimates that depend
on this measurement (see (41)). This is not true, however, because the
prediction algorithm may replace this measurement with a previous
measurement from the same sensor to form the prediction estimate. In fact,
tests were run in which the packet drop rate was set to 10%, and the same
compression rate was achieved with zero decoding errors. Thus, the
proposed algorithm has the additional feature that it is robust to packet loss.
This is evident from the plots, because the ``tolerable'' noise is much larger
than the actual noise.
2. Decoding error: The other type of error is a decoding error. Recall, in
Section 4.3.2, it was mentioned that it is possible to make a decoding error
if the actual prediction noise between the prediction estimate and the
sensor reading exceeds the tolerable noise specified by the data-gathering
node. One can bound this probability of decoding error by using
Chebyshev's inequality to specify the number of bits needed for encoding
(see (38)). But Chebyshev's inequality is a loose bound, as can be seen
from Figure 14, and as a result, it is difficult to determine the minimal
number of bits that need to be sent by each sensor without inducing a
decoding error. It can therefore be seen that there is a delicate trade-off
between energy savings and decoding error.
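As an illustrative sketch of this trade-off (hypothetical parameter names; this is not a reproduction of (38)), Chebyshev's inequality P(|e| >= t) <= sigma^2/t^2 can be inverted to choose a tolerable noise level t, and hence a bit budget, for a target decoding-error probability:

```python
import math

def bits_needed(sigma2, eps, step):
    # Chebyshev: P(|prediction error| >= t) <= sigma2 / t**2.
    # Choosing t = sqrt(sigma2 / eps) makes this bound equal to eps,
    # so the code must tolerate prediction errors up to t.
    t = math.sqrt(sigma2 / eps)
    # Conveying a reading known to the decoder to within +/- t, at
    # quantizer resolution `step`, takes about log2(2t / step) bits.
    return math.ceil(math.log2(2 * t / step))
```

Tightening the target probability `eps` buys robustness at the cost of extra bits per reading, which is exactly the trade-off described above.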
To achieve both large energy savings and robustness, the data-gathering node
can use an aggressive estimate of the number of bits that is needed from each
sensor and each sensor can apply an error detection code or error correction code
to its readings so that the data-gathering node can handle decoding errors
appropriately. The other alternative is for the data-gathering node to over-estimate
the number of bits needed for encoding to decrease the decoding error. This is the
approach taken in the simulations presented (the bound was chosen such that the
decoding error was 0), but the downside to this approach is that there is a
corresponding decrease in energy savings for the sensor nodes. On the other
hand, if the application is insensitive to decoding error, then a more aggressive
strategy should be chosen to further reduce power consumption in the sensor
nodes.
4.6 CONCLUSION
This chapter of the dissertation proposes a method of reducing energy
consumption in sensor networks by using distributed compression and adaptive
prediction. Distributed compression leverages the fact that there exist inherent
correlations between sensor readings and hence sensor readings can be
compressed with respect to past sensor readings and sensor readings measured
by other nodes. A novel method was introduced for allowing nodes to compress
their readings to different levels without having the nodes know what the other
nodes are measuring. Adaptive prediction is used to track the correlation structure
of the sensor network and ultimately determines the number of bits that need to be
spent by the sensor nodes. This approach appears to be promising, as test results
on real-world data gathered by a sensor network testbed show that an average
energy savings per sensor node of 10-65% can be achieved using this
algorithm.
The energy savings achieved in the tests presented here are a conservative
estimate of what can be achieved in practice. In practice, one can use more
complex (and more accurate) models at the data-gathering node to describe the
correlation structure among the nodes in the sensor network. A simple predictive
model was chosen in the tests presented here to demonstrate the power of the
approach. In addition, the proposed algorithm can be combined with other energy-
saving approaches such as data aggregation to achieve additional gains. Future
work remains in exploring more robust codes for the sensor nodes and better
predictive models (such as those presented in [41]) for the data-gathering node
along with incorporating the algorithm with energy-saving routing algorithms.
Future work also remains in determining methods for exploiting the correlation
structure in richer data sets such as audio or video signals. It is expected that
audio and video sensors can benefit greatly from distributed coding algorithms
because of the large amounts of redundancy that are present in their
measurements. Thus, there appear to be exciting possibilities for utilizing the
methods in this chapter for enhancing the various modalities of sensor networks of
the future.
Chapter 5

FUTURE WORK
Distributed Sources
Can multiple rounds of transmission by sources eliminate the need for log(N) links?
Does fusion of variables make it possible to decode some packets when rank is
lost?
Trading off throughput for decoding complexity by having sparser matrices
Correlation tracking using Singer’s filters
Propagation of error after failed decoding
Resiliency to dropped packets
Appendix A

MAXIMIZING THE NUMBER OF CHANNELS WITH EXACTLY ONE TRANSMITTER
Consider placing k balls into n bins such that each ball randomly selects a bin in
which to be placed independently of all other balls, and all of the random
selections are made using the same probability mass function (pmf). Given the
number of bins, n, we wish to find the number of balls, k, and the pmf such that
the expected number of bins containing exactly one ball is maximized.
Result 2: The expected number of bins containing exactly one ball is maximized
when the number of balls is equal to the number of bins and each ball is equally
likely to select any of the bins. In this case, the expected number of bins
containing exactly one ball, in the limit as n grows to infinity, is n/e.
Proof: We first write the expression for the expected number of bins containing
exactly one ball using b_i to denote the event that bin i contains exactly one ball,
p_i to denote the probability that any given ball chooses bin i, and 1{.} to denote
the indicator function, which takes on the value 1 if the expression in the braces is
true and the value 0 otherwise.
$$E\left[\sum_{i=1}^{n} 1\{b_i\}\right]
= \sum_{i=1}^{n} E\left[1\{b_i\}\right]
= \sum_{i=1}^{n} P(b_i)
= \sum_{i=1}^{n} \binom{k}{1} p_i (1-p_i)^{k-1}
= k \sum_{i=1}^{n} p_i (1-p_i)^{k-1} \qquad (42)$$
Therefore, we seek to find the

$$\arg\max_{p_1,\dots,p_n,\;k}\; k \sum_{i=1}^{n} p_i (1-p_i)^{k-1}
\qquad \text{s.t.} \quad \sum_{i=1}^{n} p_i = 1 \qquad (43)$$
Let us first find the value of p_i that maximizes each of the summands in (43).
Solving

$$\frac{d}{dp}\left(p(1-p)^{k-1}\right)
= (1-p)^{k-1} - (k-1)\,p\,(1-p)^{k-2} = 0 \qquad (44)$$
gives p = 1/k as a local maximum of the summands in (43). This value also gives
the global maximum in the range [0, 1] because the function has a strictly positive
first derivative in the range [0, 1/k) and a strictly negative first derivative in the
range (1/k, 1).
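As a quick numerical sanity check of this stationary point (illustrative only, not part of the proof), p = 1/k indeed beats nearby values of p for several k:

```python
def summand(p, k):
    # One summand of (43): p * (1 - p)^(k - 1)
    return p * (1 - p) ** (k - 1)

# p = 1/k should beat nearby values of p for any k >= 2
for k in (2, 5, 50):
    p_star = 1.0 / k
    assert summand(p_star, k) > summand(0.9 * p_star, k)
    assert summand(p_star, k) > summand(1.1 * p_star, k)
```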
Since the value of p_i in the range [0, 1] that individually maximizes each of the
summands of (43) is 1/k, the overall sum is maximized when p_i = 1/k for all
i in {1, ..., n} (i.e. each of the balls is equally likely to be in any of the bins).
However, due to the constraint that the p_i's form a valid pmf (i.e.
p_1 + ... + p_n = 1), this is only possible when k = n. Therefore, the expected
number of bins containing exactly one ball is maximized when the number of balls
is equal to the number of bins and each ball is equally likely to be placed in any of
the bins.
The expected number of bins containing exactly one ball can be computed as:

$$k \sum_{i=1}^{n} p_i (1-p_i)^{k-1}
= n \sum_{i=1}^{n} \frac{1}{n}\left(1-\frac{1}{n}\right)^{n-1}
= n \cdot n \cdot \frac{1}{n}\left(1-\frac{1}{n}\right)^{n-1}
= n\left(1-\frac{1}{n}\right)^{n-1} \qquad (45)$$
Since $\lim_{n \to \infty}\left(1 - 1/n\right)^{n} = 1/e$, the expression in (45) asymptotically goes
to n/e as n grows large. ■
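Result 2 can also be checked empirically with a small Monte Carlo simulation (a sketch; the bin count, trial count, and seed are arbitrary illustrative choices):

```python
import random

def expected_singleton_bins(n, k, trials=20000, seed=1):
    # Monte Carlo estimate of the expected number of bins holding
    # exactly one ball when k balls are thrown uniformly into n bins.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        counts = [0] * n
        for _ in range(k):
            counts[rng.randrange(n)] += 1
        total += sum(1 for c in counts if c == 1)
    return total / trials
```

For n = k = 20 the estimate lands near the exact value n(1 - 1/n)^(n-1), roughly 7.55, which is already close to n/e, roughly 7.36.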
Appendix B

THROUGHPUT OF ROUTING WITHOUT CODING IN A NETWORK OF UNTUNED RADIOS
To analyze the throughput of a routing scheme in which the output of each
intermediate node is simply a copy of one of its inputs (the node randomly selects
which of the inputs to forward), rather than a function of all of the inputs, consider
the random graph that corresponds to this routing scheme. The connectivity of the
network is still the same and can be represented by Figure 7; however, because
each intermediate node only forwards a packet from a randomly selected input
link, the graph must be modified to reflect this. Because only one of the incoming
packets is forwarded, the information coming in on the other links is not
propagated; therefore, those links can be deleted from the graph without altering
the throughput. In other words, starting with the graph in Figure 7, for each vertex
that is not deleted and has at least one incoming link (i.e. at least one of the
original L incoming links was connected to a surviving vertex in the previous
column), randomly choose one of those incoming links to keep and delete all the
others. Which link is kept at each vertex is chosen uniformly and independently
of the other vertices. The problem then becomes finding the end-to-end max-
flow of the resulting graph.
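The keep-one-link construction just described can be simulated directly. The sketch below (illustrative sizes and seed) builds the graph for the bounding case in which no vertices are deleted, and counts how many column-1 vertices still have a descendent in the last column:

```python
import random

def routing_max_flow(N, H, seed=0):
    # Each vertex in columns 2..H keeps exactly one incoming link,
    # chosen uniformly from the N vertices of the previous column.
    rng = random.Random(seed)
    parent = [[rng.randrange(N) for _ in range(N)] for _ in range(H - 1)]
    # Trace every column-H vertex back to its unique column-1 ancestor.
    # The number of distinct ancestors is the number of column-1
    # vertices whose packets can still reach column H.
    ancestors = set()
    for v in range(N):
        for col in range(H - 2, -1, -1):
            v = parent[col][v]
        ancestors.add(v)
    return len(ancestors)
```

Averaging this count over many random graphs gives the expected throughput analyzed below.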
To make the problem more analytically tractable, we consider the bounding case
in which none of the vertices in the graph are deleted. In this case, the random
graph simply consists of vertices in an N x H grid with each vertex in columns
{2, ..., H} connecting to only one randomly selected (with uniform probability)
vertex in the previous column. An instance of this random graph is shown in
Figure 17. Since deleting vertices (along with the links associated with those
vertices) can only decrease the max-flow of a graph, any upper bound on the max-
flow from column 1 to column H is also an upper bound on the max-flow of the
same graph in which certain vertices are deleted (remember that the performance
of a routing scheme in which only forwarding is allowed corresponds to the max-
flow of the graph in Figure 17 with each vertex deleted with probability 1 - 1/e).

Figure 17. Random graph representing connectivity when only routing is allowed. Each vertex in columns {2,…,H} connects to one randomly chosen vertex in the previous column.
To find the max-flow of this graph, consider a particular vertex in Column 1. Let us
label this vertex as vertex A, as shown in Figure 17. Consider the number of
vertices in columns {2, ..., H} that have a connection back to vertex A (note that,
since each vertex has only one incoming link, each one has a connection to only
one vertex in the first column). Since each vertex randomly and independently
connects to a vertex in the previous column, the number of vertices in each
column that connect back to vertex A only depends on the number of vertices in
the previous column that connect to A; therefore, the number of vertices in each
column with a connection back to A forms a Markov chain. The transition
probability matrix, P, of this Markov chain has the form:
$$P_{i,j} = \binom{N}{j}\left(\frac{i}{N}\right)^{j}\left(1 - \frac{i}{N}\right)^{N-j} \qquad (46)$$

for i, j in {0, ..., N}.
This Markov chain has two absorbing states 0 and N , while all the others are
transient. Therefore, the chain is guaranteed eventually to be absorbed in one of
these two states. Since the number of descendents of each vertex in column 1
forms a Markov chain with the same transition probability matrix and the number of
descendents of each of these vertices has to add up to N , eventually all but one
of the chains will be absorbed in the zero state, while one of the chains will be
absorbed in state N . In other words, eventually there will be a column in which all
of the vertices connect back to the same vertex of column 1, and all subsequent
columns will likewise only connect back to that vertex.
Indeed, row 1 (corresponding to starting in state 1, with only one vertex connecting
back to A) of P^∞ has the form [1 - 1/N, 0, ..., 0, 1/N]. The entry P^∞_{i,j} gives the
probability of starting in state i and ending in state j after infinitely many steps.
Row 1 of P^∞ shows that any vertex in column 1 will eventually either have no
descendents (this happens with probability 1 - 1/N) or all of the vertices in later
columns will be its descendents (this happens with probability 1/N). This means
that, as H goes to infinity, the max-flow of the graph will be 1.
But what happens over a finite number of steps? The expected max-flow over H
steps is given by:

$$N \cdot P(\text{given vertex has descendents after } H \text{ steps})
= N\left(1 - P^{H}_{1,0}\right) \qquad (47)$$
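Equation (47) can be evaluated by numerically iterating the transition matrix of (46) (a sketch; the values of N and H are illustrative):

```python
import math

def expected_max_flow(N, H):
    # Transition matrix of (46): from i back-connected vertices,
    # the next column has j of them with probability
    # C(N, j) * (i/N)^j * (1 - i/N)^(N - j).
    P = [[math.comb(N, j) * (i / N) ** j * (1 - i / N) ** (N - j)
          for j in range(N + 1)] for i in range(N + 1)]
    dist = [0.0] * (N + 1)
    dist[1] = 1.0                      # start: one vertex connects back
    for _ in range(H):
        dist = [sum(dist[i] * P[i][j] for i in range(N + 1))
                for j in range(N + 1)]
    # Expected max-flow per (47): N * P(vertex still has descendents)
    return N * (1 - dist[0])
```

For N = 100 and H = N this yields roughly 2.3, consistent with Table 2, and for large H the value approaches 1, matching the absorption argument above.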
The problem is that the closed-form expression for P^H is unmanageably
elaborate; therefore, we rely on a computer to evaluate it for some values of N
and H. Table 2 shows the expected max-flow between column 1 and column H
for various values of N in the graph corresponding to a scheme that only allows
routing at intermediate nodes and in the case when no collisions occur (i.e. no
vertices are deleted from the graph). As can be seen from the table, the
performance of a routing-only scheme allows for a constant throughput over N
hops, while network coding allows for throughput that is linear in N over H hops
as long as H log(N)/N → 0 as N → ∞.

TABLE II
EXPECTED THROUGHPUT OF ROUTING OVER H STEPS

N      H = N   H = 10N
100    2.304   1.0001
400    2.351   1.0001
700    2.359   1.0001
1000   2.362   1.0001
2000   2.362   1.0001

Table 2. The throughput over N steps remains constant, even as N grows. Over
10N steps, only one packet can be sent end-to-end with routing, even if no
collisions occur.
BIBLIOGRAPHY
[1] http://www.tinyos.net/
[2] http://webs.cs.berkeley.edu/nest-index.html
[3] http://bwrc.eecs.berkeley.edu/Research/Pico_Radio/Default.htm
[4] http://www.cens.ucla.edu/
[5] http://www.intel-research.net/berkeley/index.asp
[6] http://www.zigbee.org
[7] http://www.xbow.com
[8] http://www.ember.com
[9] http://www.dust-inc.com/flash-index.shtml
[10] http://www.cs.berkeley.edu/~binetude/ggb/
[11] http://www.cbe.berkeley.edu/research/briefs-Wireless.htm
[12] A. Mainwaring, J. Polastre, R. Szewczyk, and D. Culler, "Wireless Sensor Networks for Habitat Monitoring," Intel Research, IRB-TR-02-006, Jun. 19, 2002.
[13] S. Coleri, S. Cheung, and P. Varaiya, "Sensor Networks for Monitoring Traffic," 42nd Annual Allerton Conference on Communication, Control, and Computing, September 2004.
[14] http://www.wherenet.com/
[15] S. Roundy, B. Otis, Y. H. Chee, J. Rabaey, and P. Wright, "A 1.9GHz RF Transmit Beacon using Environmentally Scavenged Energy," Dig. IEEE Int. Symposium on Low Power Elec. and Devices, Seoul, Korea, 2003.
[16] B. Otis, Y. H. Chee, R. Lu, N. Pletcher, and J. Rabaey, "An Ultra-Low Power MEMS-Based Two-Channel Transceiver for Wireless Sensor Networks," IEEE Symp. on VLSI Circuits, Honolulu, HI, June 2004.
[17] A. Willig et al., "Measurements of a Wireless Link in an Industrial Environment using an IEEE 802.11-Compliant Physical Layer," IEEE Trans. on Industrial Electronics, vol. 43, Dec. 2002.
[18] N. Patwari, Y. Wang, and R. O'Dea, "The Importance of the Multipoint-to-Multipoint Indoor Radio Channel in Ad Hoc Networks," IEEE Wireless Communication and Networking Conference (WCNC), Orlando, FL, March 2002.
[19] M. Zorzi and R. R. Rao, "Geographic Random Forwarding (GeRaF) for ad hoc and sensor networks: energy and latency performance," IEEE Trans. on Mobile Computing, vol. 2, Oct.-Dec. 2003.
[20] J. Van Greuen, D. Petrović, A. Bonivento, J. Rabaey, K. Ramchandran, and A. Sangiovanni-Vincentelli, "Adaptive Sleep Discipline for Energy Conservation and Robustness in Dense Sensor Networks," ICC 2004.
[21] L. Doherty, L. El Ghaoui, and K. Pister, "Convex Position Estimation in Wireless Sensor Networks," Infocom 2001, Anchorage, AK, April 2001.
[22] T. Ho, M. Medard, J. Shi, M. Effros, and D. Karger, "On randomized network coding," Proceedings of the 41st Annual Allerton Conference on Communication, Control, and Computing, October 2003.
[23] P. Chou, Y. Wu, and K. Jain, "Practical network coding," Allerton Conference on Communication, Control, and Computing, Monticello, IL, October 2003. Invited paper.
[24] M. Aizenman, J. T. Chayes, L. Chayes, J. Fröhlich, and L. Russo, "On a sharp transition from area law to perimeter law in a system of random surfaces," Communications in Mathematical Physics, vol. 92, pp. 19-69, 1983.
[25] H. Chernoff, "A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations," Annals of Mathematical Statistics, vol. 23, pp. 493-507, 1952.
[26] F. Roberts, Applied Combinatorics, Prentice Hall, 1984.
[27] G. Lawler, Introduction to Stochastic Processes, Chapman & Hall, 1995.
[28] C. Toh, "Maximum battery life routing to support ubiquitous mobile computing in wireless ad hoc networks," IEEE Communications Magazine, pp. 138-147, June 2001.
[29] R. Shah and J. Rabaey, "Energy aware routing for low energy ad hoc sensor networks," Proc. of IEEE WCNC, Mar. 2002.
[30] C. Intanagonwiwat, R. Govindan, and D. Estrin, "Directed diffusion: A scalable and robust communication paradigm for sensor networks," Proc. of IEEE MobiCom, Aug. 2000.
[31] G. Pottie and W. Kaiser, "Wireless sensor networks," Communications of the ACM, 2000.
[32] M. Chu, H. Haussecker, and F. Zhao, "Scalable information-driven sensor querying and routing for ad hoc heterogeneous sensor networks," IEEE Journal of High Performance Computing Applications, to appear, 2002.
[33] D. Petrović, R. Shah, K. Ramchandran, and J. Rabaey, "Data funneling: routing with aggregation and compression for wireless sensor networks," IEEE Sensor Network Protocols and Applications, May 2003.
[34] D. Slepian and J. K. Wolf, "Noiseless encoding of correlated information sources," IEEE Trans. on Inform. Theory, vol. IT-19, pp. 471-480, July 1973.
[35] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
[36] S. S. Pradhan and K. Ramchandran, "Group-theoretic construction and analysis of generalized coset codes for symmetric/asymmetric distributed source coding," Proceedings of the Data Compression Conference (DCC), March 2000.
[37] A. D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. on Inform. Theory, vol. IT-22, pp. 1-10, January 1976.
[38] S. S. Pradhan and K. Ramchandran, "Distributed source coding using syndromes: Design and construction," Proceedings of the Data Compression Conference (DCC), March 1999.
[39] S. Haykin, Adaptive Filter Theory, Prentice Hall, Upper Saddle River, 1996.
[40] H. Stark and J. Woods, Probability, Random Processes and Estimation Theory for Engineers, Prentice Hall, Englewood Cliffs, 1994.
[41] A. Singer and M. Feder, "Universal linear prediction by model order weighting," IEEE Transactions on Signal Processing, vol. 10, pp. 2685-2699, October 1999.
[42] T. Ramabadran and S. Gaitonde, "A tutorial on CRC computations," IEEE Micro, vol. 45, pp. 62-74, Aug. 1988.
[43] R. Blahut, Theory and Practice of Data Transmission Codes, 1995.
[44] J. Chou, D. Petrović, and K. Ramchandran, "Tracking and exploiting correlations in dense sensor networks," Proceedings of the Asilomar Conference on Signals, Systems and Computers, November 2002.