Communication and Compression in Dense Networks of Unreliable Nodes
by
Dragan Rade Petrović
B.S. (University of Illinois at Urbana-Champaign) 1999
M.S. (University of California, Berkeley) 2001
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy in
Engineering - Electrical Engineering and Computer Sciences
in the
GRADUATE DIVISION
of the
UNIVERSITY OF CALIFORNIA, BERKELEY
Committee in charge:
Professor Kannan Ramchandran, Chair
Professor Jan Rabaey
Professor Paul Wright
Spring 2005
The dissertation of Dragan Rade Petrović is approved:
Chair Date
Date
Date
University of California, Berkeley
Spring 2005
Communication and Compression in Dense Networks of Unreliable Nodes
Copyright 2005
by
Dragan Rade Petrović
Abstract
Communication and Compression in Dense Networks of Unreliable Nodes
by
Dragan Rade Petrović
Doctor of Philosophy in Engineering –
Electrical Engineering and Computer Sciences
University of California, Berkeley
Professor Kannan Ramchandran, Chair
The drive toward the implementation and massive deployment of wireless sensor
networks calls for ultra-low-cost, low-power and ever smaller nodes. While the
digital subsystems of the nodes are still experiencing exponential reduction of all of
these metrics as described by Moore's Law, there is no such trend regarding the
performance of analog components needed for the radios that enable the nodes to
communicate wirelessly with one another. This dissertation presents a two-part
approach to reducing the energy consumption of the radios. First, a new radio
architecture is presented that greatly reduces the power required to operate a
transceiver, as well as the cost and size of the nodes. Second, a novel
distributed compression scheme is introduced that allows the sensor nodes to
compress their data in order to reduce the amount of communication that the
radios must perform.
The dissertation presents a fully integrated architecture of both digital and analog
components (including local oscillator) that offers significant reduction in cost, size
and power consumption of the overall node. Even though such a radical
architecture cannot offer the reliable tuning of standard designs, it is shown that by
using random network coding, a dense network of such nodes can achieve
throughput linear in the number of channels available for communication.
Moreover, the ratio of the achievable throughput of the untuned network to the
throughput of a tuned network with perfect coordination is shown to be close to
1/e. By contrast, it is also shown that if coding is not used (i.e. if nodes are only
allowed to forward packets without processing them), the performance does not
improve with increased density and available spectrum.
To reduce the amount of communication required among nodes, a novel
approach to reducing energy consumption in sensor networks is proposed, based
on a distributed adaptive signal processing framework and an efficient algorithm.
Specifically, the dissertation presents a distributed way of continuously exploiting
existing correlations in sensor data based on adaptive signal processing and
distributed source coding principles. This approach enables sensor nodes to
blindly compress their readings with respect to one another without the need for
explicit and energy-expensive inter-sensor communication to effect this
compression. Furthermore, the distributed algorithm used by each sensor node
is extremely low in complexity and easy to implement (i.e., one modulo
operation), while an adaptive filtering framework is used at the data-gathering
unit to continuously learn the relevant correlation structures in the sensor data.
Applying the algorithm to testbed data resulted in energy savings of 10%-65% for
a multitude of sensor modalities.
Both the network coding for communication with untuned radios and the
distributed source coding schemes require minimal complexity from the low-
power sensor nodes. Instead, the complexity of the system is pushed toward the
edge of the network where a gateway between the wireless network and the
wired world resides.
TABLE OF CONTENTS
List of Figures ........................................................................................ iv
List of Tables.......................................................................................... vi
Acknowledgments................................................................................. vii
Chapter 1: Introduction ..........................................................................1
1.1 Trading-off Radio Power Dissipation for Tunability....................5
1.2 Distributed Compression of Sensor Node Data.........................8
1.3 Contributions ............................................................................12
Chapter 2: Reliable Communication Using Untuned Radios ..............14
2.1 Drawbacks of Untuned Radios ................................................17
2.2 Multi-Hop Communication Method...........................................22
2.3 Practical Justification of the Model...........................................26
2.4 Analysis ....................................................................................29
2.4.1 Transition Probabilities ....................................................30
2.4.2 Robustness Grows Exponentially with Density ...............33
2.4.3 Stability and Optimal Ratio of Density to
Channelization ................................................................37
2.5 System Level View...................................................................41
2.5.1 Comparison to Schemes Using Untuned Radios with
Wide Receive Filters.......................................................43
2.5.2 Comparison to Schemes Using Tuned Radios ...............43
Chapter 3: Throughput of Networks of Untuned Radios .....................46
3.1 Maximum throughput over One Hop........................................46
3.2 Maximum throughput over Many Hops....................................48
3.3 Achievable Throughput over Many Hops.................................49
3.3.1 Random Graph Representation of the Network
Connectivity ...............................................................................50
3.3.2 Max-Flow of the Random Graph .....................................51
3.4 Simulation Results ...................................................................59
Chapter 4: Distributed Compression ...................................................61
4.1 Background on Compression with Side Information................63
4.2 Code Construction for Distributed Compression .....................68
4.3 Correlation Tracking.................................................................72
4.3.1 Parameter Estimation ......................................................78
4.3.2 Decoding Error.................................................................83
4.4 Querying and Data Reporting Algorithm..................................85
4.4.1 Data-Gathering Node Algorithm......................................85
4.4.2 Sensor Node Algorithm ...................................................87
4.5 Simulation Results ...................................................................88
4.5.1 Correlation Tracking ........................................................88
4.5.2 Energy Savings ...............................................................93
4.5.3 Robustness to Errors.......................................................95
4.6 Conclusion................................................................................97
Chapter 5: Future Work .....................................................................100
Appendix A: Maximizing the Number of Channels with Exactly one
Transmitter ....................................................................101
Appendix B: Throughput of Routing without Coding in a Network of
Untuned Radios.............................................................104
Bibliography ........................................................................................109
LIST OF FIGURES
Number Page
1. An example sensor network topology in which considerable
gains can be achieved from distributed compression..............11
2. Signal bandwidth relative to process variation.........................19
3. Proposed multi-hop communication method ..........................23
4. Probability distribution of number of transmitters while the
packet is still alive.....................................................................34
5. Robustness v. Channelization for various values of α.............36
6. Probability of failing to transmit a packet over 10 hops v.
Channelization for various values of α.....................................37
7. Random graph representing connectivity in the network of
nodes with untuned radios .......................................................51
8. Simulation results: Throughput vs. number of input radios......60
9. Distributed compression set-up................................................62
10. Achievable rate regions in distributed compression ................66
11. A tree based construction for compression with side
information................................................................................69
12. An example of the tree based codebook .................................71
13. Adaptive filtering block used to form the side information
and decode the sensor reading................................................82
14. Tolerable noise and prediction noise for 18,000 samples of
humidity data ............................................................................90
15. Tolerable noise and prediction noise for 18,000 samples of
temperature data ......................................................................91
16. Tolerable noise and prediction noise for 18,000 samples of
light data ...................................................................................92
17. Random graph representing connectivity when only routing
is allowed................................................................................105
LIST OF TABLES
Number Page
1. Energy savings of LMS-based correlation tracking and
distributed compression scheme .............................................pp
2. Expected throughput of routing................................................pp
ACKNOWLEDGMENTS
The author…
Chapter 1
INTRODUCTION
Advances in wireless communication as well as embedded microprocessor design
and manufacturing have led to great interest in recent years in the possibility of
doing distributed sensing and control. As a result, the emerging field of wireless
sensor networks has become a very active area of both academic research [1]-[4]
and industrial development [5]-[9]. The goal is to design and produce tiny silicon-
based devices, usually referred to as “nodes,” that have some sensing capability
(e.g. light-sensor, thermometer, humidity-sensor, barometer, accelerometer,
magnetometer, etc.) along with some amount of memory and processing
capability, as well as the ability to communicate wirelessly with each other. These
nodes could then be deployed in some environment and used to observe, through
their sensing capability, some aspects of that environment. Deploying many such
nodes would provide fine-grained information about the state of the environment that
could then be used to take some action to control certain parameters. In order to
distribute the nodes densely, they must be made small. This size constraint also
imposes a limit on the processing power as well as the energy available to the
individual nodes. Therefore, performing complex tasks would require the
coordination and cooperation of many nodes.
The potential scenarios for use of sensor networks are far ranging. One of the
many applications for which sensor networks are already being used is monitoring
the structural integrity of buildings and bridges [10], in which nodes with
accelerometers are placed at key junctions of the structure to measure its
movement as a response to stresses due to seismic, tidal, and traffic activity.
Having this data allows for timely maintenance of the building or bridge that is also
more cost-effective than the traditional approach of performing maintenance and
reinforcement at certain time intervals, whether they are needed or not [10].
Another important application of sensor networks that is receiving a lot of attention
is environmental control within living and working spaces [11]. While fully a third of
all energy consumed in the developed world is spent on environmental control of
living and working spaces (the other two thirds are spent on agriculture,
manufacturing, and transportation), it is estimated that 80% of this energy is
wasted due to inadequate ability to measure the state of the environment and a
lack of fine-grained actuation to influence it [11]. Sensor networks are also used
for habitat monitoring in animal sanctuaries [12]. Using sensor networks allows
biologists to keep track of crucial environmental factors that affect the health of
animal and plant populations in different ecosystems. In more urban areas, sensor
networks are being deployed for highway traffic control [13]. Instead of tearing up
asphalt to install inductor coils to measure the flow of traffic, sensor nodes with
magnetometers are being used to monitor the traffic, and this data can be used in
real time to control the traffic by updating the patterns of traffic and metering lights
as well as suggested routes to trip-planning and global positioning system
(GPS)-based guidance systems [13]. Warehouse inventory tracking with sensor networks
is also gaining a lot of attention in industry [14].
In order to gain experience and early exposure, most initial efforts in the field of
sensor networks have relied on available “off the shelf” components (sensors,
processors, radios, and memory) to assemble the first sensor network nodes. The
main lesson of those early experiences was that the networks were very fragile.
The nodes frequently failed because they ran out of energy or because they were
damaged due to exposure to the environment (e.g. bumped or stepped on). The
communication channels between the nodes were also found to be unstable.
Depending on the topology of the environment, it is not uncommon to find nodes a
few meters apart that cannot reliably communicate with one another, while other
pairs of nodes that are tens of meters apart can. Perhaps most importantly, the
networks did not scale well because increased density resulted in greater
contention for the wireless communication medium causing collisions among the
transmissions. As a result of this, most commercial efforts have focused on
making the nodes more reliable and equipping them with better, more complex,
and more power-consuming radios, in order to make the links among the nodes
more reliable. These efforts have been successful in giving exposure in the
technical community to this emerging field through the deployment of wireless
sensor networks in the applications already mentioned. However, making the
nodes more reliable requires them to be costly, power-hungry, and large. This
dissertation argues that, in order to make wireless sensor networks truly
ubiquitous, the nodes must be made cheap, their power dissipation must be
comparable to the amount of power that can be scavenged from the environment,
and they must be made small so that they can be deployed with high density.
Also, a new class of protocols must be developed specifically for wireless sensor
networks that can benefit from, rather than suffer from, density. The benefit of such
an approach will only increase in the future with ever more demanding
applications.
The potential applications of sensor networks that are being considered in the
long-term include such science-fiction-like concepts as smart surfaces that can
respond to contact or serve as a communication backplane. Another possible
application in the long term is “skin” for airplanes that can provide real-time
monitoring of and alerts regarding the state of every square centimeter of the
aircraft’s surface. It may even be possible to eventually produce sensor nodes
small enough to be inserted into the blood-stream to provide real-time diagnostics
of factors such as blood pressure, blood flow, glucose and insulin levels, etc.
To make such deployments economically and technologically feasible, it is
necessary to drastically reduce the cost, size and energy consumption of the
nodes available today. Moore’s Law still provides for exponential reduction of
these metrics over time when it comes to the digital components that comprise the
memory, computation and coding in the nodes. However, there is no equivalent
trend to Moore’s Law that applies to the analog components needed for the radios
that enable the nodes to communicate with one another. This work introduces 1) a
new architecture for the analog radios and 2) a distributed source coding scheme
that allows the nodes to compress their readings in order to reduce the number of
bits that have to be transmitted by the radio, thereby reducing the energy
consumed.
1.1 TRADING-OFF RADIO POWER DISSIPATION FOR TUNABILITY
The proposed radio architecture can greatly reduce the cost, size (5x reduction)
and energy consumption (10x reduction) of the nodes. In fact it is expected that
the proposed architecture will allow the energy consumption of the nodes to be so
low that they could be fully powered by energy scavenged from the environment
[15]. The penalty for using such a radical architecture is that the radios become
untuned and it is no longer possible to guarantee that any arbitrary pair of nodes
will be able to communicate with each other. Instead, it becomes necessary to rely
on the density of nodes to make the overall network capable of providing reliable
communication.
Narrowband radios have been shown to be the architecture of choice for low-power
applications [6], [7], [16], as they are low in complexity and consume less power
than spread spectrum or other wide-band techniques. One fundamental
requirement of narrowband radios is that the transmitter’s carrier frequency and
the receiver’s detection frequency must be well-matched. This is traditionally
accomplished by employing a crystal at both the transmitter and receiver to
provide the same low frequency reference. This reference frequency is multiplied
via a phase-locked loop (PLL) to generate the carrier wave. However, the off-chip
crystal contributes significantly to the cost, size, and power consumption of such
transceivers. The cost is due to the external quartz crystals being more expensive
than the silicon used for the baseband signal processing as well as the need to
bond separate components. This problem is especially acute in the design of
highly integrated transceivers for wireless sensor networks. The size of traditional
low power transceivers is largely due to the external crystal reference and the
interface between the crystal and the silicon integrated circuit (IC). Additionally,
the power consumption of low power radios is dominated by the crystal referenced
PLL. Therefore, great savings in all three of these areas could be obtained by
eliminating the off-chip crystal and PLL.
Even when care is taken to ensure that all radios are tuned and are attempting to
communicate on the same frequency, reliable communication is not guaranteed.
Practical implementations of sensor networks are notorious for having unstable
links because narrowband communication is susceptible to deep fades between
nodes [17], [18]. Since it is not feasible to overcome these fades by transmitting
with more power (due to power-constraints), it has been proposed that randomized
algorithms be used to ensure reliable communication [19], [20]. Such algorithms
propose to provide reliable multi-hop communication by exploiting the broadcast
nature of wireless transmissions. The key idea is for a transmitting node to send a
beacon to many potential forwarding nodes and then select one node to be the
next hop for the packet among those that respond to the beacon. However,
collisions among the responses to the beacon as well as the time-varying quality of
the communication channels (a channel may be good during the beaconing, but
become bad during the response and/or data transmission) contribute significant
overhead to such schemes.
This dissertation proposes a fundamentally different way of designing and
operating a transceiver. The quartz crystal is eliminated and replaced by an on-
chip resonator such as an inductor-capacitor (LC)-circuit or a nano-
electromechanical resonant structure. This makes it possible to economically
produce millions of nodes and densely deploy them by, for example, weaving them
into fabrics or mixing them with paint. The proposed architecture allows a sensor
node to be developed entirely out of thin-film technologies (radio, digital
components, battery, energy scavenging, and sensing). However, the drawback
of such architectures is that the variations in the manufacturing process are large,
resulting in un-tuned radios. Therefore, two narrowband radios produced by such
a process are not likely to be able to communicate with each other. To address
this problem, a low-complexity communication protocol is proposed that makes
use of the high density of nodes to ensure reliable communication using such un-
tuned radios even without the need for handshaking protocols or re-transmission.
By eliminating the need for this kind of coordination, the protocol is also made
more robust to link failures, while the density that is made possible by such low
cost designs makes the network robust to the failure of individual nodes.
1.2 DISTRIBUTED COMPRESSION OF SENSOR NODE DATA
Motivated by the energy constraint in sensor networks, there has been
considerable recent interest in the area of energy-aware routing for ad hoc and
sensor networks [28]-[30] and efficient information processing [31], [32] to reduce
the energy usage of sensor nodes. For example, one method of conserving
energy in a sensor node is to aggregate packets along the sensor paths to reduce
header overhead. This dissertation proposes a fundamentally new method of
conserving energy in sensor networks that is orthogonal and complementary
to those approaches, and can be used in combination with them to further
reduce energy consumption.
The approach is based on judiciously exploiting existing sensor data correlations in
a distributed manner. Correlations in sensor data are brought about by the spatio-
temporal characteristics of the physical medium being sensed. Dense sensor
networks are particularly rich in correlations, where spatially dense nodes are
typically needed to acquire fine spatial resolution in the data being sensed, and for
fault-tolerance from individual node failures. Examples of correlated sensors
include temperature and humidity sensors in a similar geographic region, or
magnetometric sensors tracking a moving vehicle. Another interesting example of
correlated sensor data involves audio field sensors (microphones) that sense a
common event such as a concert or whale cries. Audio data is particularly
interesting in that it is rich in spatial correlation structure due to the presence of
echoes, causing multiple sensors to pick up attenuated and delayed versions of a
common sound origin.
This dissertation proposes to remove the redundancy caused by these inherent
correlations in the sensor data through a distributed compression algorithm which
obviates the need for the sensors to exchange their data among each other in
order to strip their common redundancy. Rather surprisingly, it will be shown that
compression can be effected in a fully blind manner without the sensor nodes ever
knowing what the other correlated sensor nodes have measured. This enables a
simple and inexpensive architecture for each sensor node and is in fact preferable
to an architecture based on each sensor knowing the other sensors'
measurements. The proposed paradigm is particularly effective for sensor
network architectures having two types of nodes: sensing nodes and data-
gathering nodes. The sensing nodes gather data of a specific type and transmit
this data upon being queried. The data-gathering node queries specific sensors in
order to gather information in which it is interested (see Figure 1). We will assume
this architecture and show that, for it, we can
devise compression algorithms that have very lightweight encoders, yet can
achieve significant savings. Note that this work targets very lightweight encoders
because we assume that the sensors have limited processing power, but the
constructions introduced here can be easily strengthened given greater
computational power at the sensors. The savings are achieved by having the
data-gathering node track the correlation structure among nodes and then use this
information to effect distributed sensor data compression. The correlation
structure is determined by using an adaptive prediction algorithm. The sensors,
however, do not need to know the correlation structure; they need to know only the
number of bits that they should use for encoding their measurements. As a result,
each sensor node is required to perform very few operations in order to encode its
data. The decoder, however, is considerably more complex, but it resides on the
data-gathering node, which is not assumed to be energy-constrained. Preliminary
results show that our distributed compression and adaptive prediction algorithms
perform well in realistic scenarios, achieving 10-65% energy savings for each
sensor in typical cases. In addition, our distributed compression architecture can
be combined with other energy saving methods such as packet/data aggregation
to achieve further gains [33].
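To make concrete how lightweight such an encoder can be, the following sketch implements modulo-based coset binning with a decoder that resolves the coset index against a prediction. This is a simplified toy version under stated assumptions (integer readings; a prediction supplied by the data-gathering node's tracking algorithm); the function names and the three-candidate search are illustrative, not the dissertation's exact construction.

```python
def encode(reading: int, num_bits: int) -> int:
    """Sensor side: a single modulo operation keeps only the coset (bin) index."""
    return reading % (1 << num_bits)


def decode(coset_index: int, prediction: int, num_bits: int) -> int:
    """Data-gathering side: return the member of the coset closest to the
    prediction.  Correct whenever |reading - prediction| < 2**num_bits / 2."""
    m = 1 << num_bits
    # The coset member nearest the prediction is one of three candidates.
    base = prediction - (prediction % m) + coset_index
    return min((base - m, base, base + m), key=lambda x: abs(x - prediction))
```

For example, a reading of 1003 encoded with 4 bits becomes the coset index 11; a decoder predicting 1000 recovers 1003 exactly. In the dissertation's setting, the adaptive filter at the data-gathering unit would supply the prediction and choose `num_bits` so that the prediction error stays within half a bin.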
Figure 1. An example sensor network topology in which considerable gains can be achieved from distributed compression. A computer acts as the data-gathering node and queries various sensors to collect data.
1.3 CONTRIBUTIONS
This dissertation focuses on reducing the energy spent by the radios of wireless
sensor network nodes. The first part proposes a new architecture for the radios
that eliminates the tuning elements (the off-chip crystal and the reference phase-
locked loop (PLL)). Such a radio requires considerably less power to operate than
traditional designs and is also much cheaper and smaller than traditional radios,
but it sacrifices the ability to tune the transceivers. The second part of the
dissertation presents a code construction and correlation tracking algorithm that
allow the sensor nodes to compress their readings in order to reduce the amount
of communication required among the nodes. The contributions of the dissertation
are:
• It is shown that the reliability of a network of untuned radios grows
exponentially with node density and available bandwidth.
• It is shown that by using random linear network coding (which is
computationally very simple), achievable throughput of a network of
untuned radios is linear in the density of nodes and the amount of spectrum
available for communication, the same as in a fully-coordinated network of
tuned radios.
• The ratio of the achievable throughput of the untuned network to the
achievable throughput of a tuned network is shown to be close to 1/e.
• It is also shown that if a network of untuned radios does not utilize any
coding in the network (i.e. the intermediate nodes only forward packets
without processing them), the throughput of the network does not grow with
node density and the amount of spectrum available for communication.
• A computationally inexpensive encoder is proposed that can support
multiple compression rates, allowing sensor nodes to compress their data
without having to do heavy processing.
• An adaptive correlation-tracking algorithm based on Least-Mean-Square
(LMS) filtering is presented. The algorithm can continuously track and
exploit both spatial and temporal correlation in the sensors' data.
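The random linear network coding mentioned in the contributions above can be sketched as follows. This toy version works over GF(2), so a relay's "random linear combination" is an XOR of a random subset of the packets it has heard, and the sink inverts the coefficient matrix by Gaussian elimination; practical schemes typically use a larger field such as GF(2^8). All names are illustrative.

```python
import random


def combine(packets, coeffs):
    """XOR together the packets selected by a GF(2) coefficient vector."""
    out = bytes(len(packets[0]))
    for c, pkt in zip(coeffs, packets):
        if c:
            out = bytes(a ^ b for a, b in zip(out, pkt))
    return out


def relay(packets):
    """An intermediate node forwards a random linear combination of what it
    has heard, together with the coefficient vector that describes it."""
    coeffs = [random.randint(0, 1) for _ in packets]
    return coeffs, combine(packets, coeffs)


def recover(coded):
    """The sink solves for the originals by Gaussian elimination over GF(2),
    given enough linearly independent (coeffs, payload) pairs."""
    rows = [(list(c), bytearray(p)) for c, p in coded]
    n = len(rows[0][0])  # number of original packets
    for col in range(n):
        # Find a pivot row for this column and swap it into place.
        pivot = next(r for r in range(col, len(rows)) if rows[r][0][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Eliminate this column from every other row (XOR row operations).
        for r in range(len(rows)):
            if r != col and rows[r][0][col]:
                rows[r] = (
                    [a ^ b for a, b in zip(rows[r][0], rows[col][0])],
                    bytearray(a ^ b for a, b in zip(rows[r][1], rows[col][1])),
                )
    return [bytes(p) for _, p in rows[:n]]
```

Each relay's work is a few XORs per packet, which is why the scheme asks so little of the nodes; all the matrix inversion happens at the sink.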
Chapter 2
RELIABLE COMMUNICATION USING UNTUNED RADIOS
With the emergence of ubiquitous wireless communication networks (such as
sensor networks) and ambient intelligence, the design of low data-rate short
distance wireless transceivers has gained prominence. Narrowband radios have
been shown to be the architecture of choice for such applications [6], [7], [16], as they are
low in complexity and consume less power than spread spectrum or other wide-
band techniques. One fundamental requirement of narrowband radios is that the
transmitter’s carrier frequency and the receiver’s detection frequency must be well-
matched. This is traditionally accomplished by employing a crystal at both the
transmitter and receiver that provides the same reference frequency for the carrier
wave. However, great savings in both manufacturing cost and power consumption
could be obtained by eliminating the off-chip crystal. In [16] a mechanical
resonator is used to provide the carrier frequency at lower cost and power
consumption relative to crystal-based designs. State-of-the-art quartz-crystal-
based radios, such as the Chipcon CC2420, typically require 50 mW of power in
both transmit (Tx) and receive (Rx) modes [6] for communication over 10 meters.
The micro-electromechanical systems (MEMS)-based radio shown in [16] provides
substantial power and integration savings over traditional designs, operating at
3.6 mW in Rx mode and 5.9 mW in Tx mode for communication over 30 meters.
The cost and power consumption of the transceiver could be further reduced by
two orders of magnitude if a fully integrated radio could be built that uses an on-
chip resonator to provide the reference frequency. The drawback of using an
integrated on-chip resonator (such as an LC circuit or a nano-electromechanical
resonant structure) is that the variations in the manufacturing process are large.
two narrowband radios produced by such a process are not guaranteed to be able
to communicate with each other.
To mitigate this problem, this dissertation proposes to exploit the high density of
nodes in the network that would be made economically feasible by using such a
low-cost design. The idea is to exploit the broadcast nature of wireless
communications. At high node density, this ensures that when a node attempts to
communicate, with high probability at least one of its neighbors can receive the
message even though not all the nodes are capable of communicating with each
other. There is a potential problem with using this approach for multi-hop
communication, however. If nodes just broadcast their packets and have any of a
number of candidate neighbors that hear the transmission forward the packet, it is
likely that multiple neighbors will receive and forward the packet. This would result
in an explosion in the number of copies of a packet as it propagates through the
network. To prevent this explosion in the number of packets, collisions - usually
considered a bane of wireless communications - could be exploited to ensure that
the number of received copies of a packet at each step does not grow
unboundedly. If too many nodes are transmitting a copy of the packet, they will
collide, resulting in a reduced number of receivers that successfully receive the
packet. If few transmitters are sending the packet, the number of collisions will be
low, increasing the number of successful receptions of the packet.
It will be shown analytically that these opposing forces cause the system to reach
a balanced equilibrium allowing for reliable communication through the network.
Increasing the density along with providing for channelization results in reliability of
the communication method that grows exponentially with density. An additional
benefit of distributing the responsibility for communication among many nodes is
that it makes the network much more robust to the inevitable failure or death of
individual nodes.
The radio architectures enabled by such an approach make it possible to envision
fully-integrated, single-chip nodes that would be small enough (10 mm³), cheap
enough (<$0.01), and low power enough (10 µW) that they could be woven into
fabrics or mixed in with wall-paint. This would enable the realization of “truly-
disappearing electronics”, that is, ubiquitous computing and communication
devices that can be effectively integrated into the environment, and disappear from
view. The availability of these integrated meso-scale compute nodes would open
the door for new concepts such as smart fabrics, intelligent surfaces and ambient
intelligence.
The rest of this chapter is organized as follows: Section 2.1 describes the issues
involved in using such low-cost, low-power, but unreliable radio architectures.
Section 2.2 proposes a communication method that exploits the high density of
nodes made feasible by using such low-cost and small-form-factor nodes to
overcome the inherent unreliability of the radios. Section 2.3 discusses the
practical considerations that lead to the modeling assumptions made in Section
2.2. Section 2.4 analyzes the reliability of the proposed scheme. Finally, Section
2.5 compares the performance of the proposed scheme to two benchmark
schemes.
2.1 DRAWBACKS OF UNTUNED RADIOS
The main drawback of using an on-chip resonator is that the variations in the
manufacturing process are large. To achieve frequency tolerance approaching a
quartz crystal, prohibitively expensive trimming would have to take place. In
addition, drift over time and temperature would quickly render the trimming
inaccurate. The other option is to leave the transmitters and receivers un-tuned,
which means that two narrowband radios produced by such a process are not
guaranteed to be able to communicate with each other. Figure 2 presents a
qualitative illustration of the challenges presented by this approach. In a traditional
narrow-band architecture using a quartz crystal reference, the signal bandwidth is
typically orders of magnitude larger than the center frequency tolerance. When
using un-tuned receivers, the situation is reversed and the carrier frequency
variation is orders of magnitude greater than the signal bandwidth. This means
that if two radios obtained from such a process attempt to communicate, and the
receiver is sensitive only to a narrow portion of the spectrum (as would be done
when using traditional, tuned radios), the probability that the transmitter is sending
a signal in this narrow portion of the spectrum is very low. The most
straightforward way of ensuring that radios produced by such an imprecise
process can communicate with each other is to allow the front-end filter of each
receiver to admit all the frequencies in the range of carrier frequencies that could
result from the manufacturing process. Such wide-band receivers would be able
to “hear” any of the radios produced by the process, but they would also admit all
the noise and interferers in the band-pass range of the front-end filter. This, in
turn, would either force the transmitter to output more power to provide the same
signal to noise ratio (SNR) at the receiver, or it would greatly reduce the
communication range of the nodes.
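The noise penalty of the wide-band approach can be quantified with the standard thermal-noise relation N = kTB: widening the front-end filter by a factor F raises the noise floor, and hence the transmit power required for a fixed SNR, by 10·log10(F) dB. The following sketch makes the comparison concrete; the 100 kHz and 100 MHz bandwidths are illustrative assumptions, not figures from this chapter.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K

def thermal_noise_dbm(bandwidth_hz: float, temp_k: float = 290.0) -> float:
    """Thermal noise floor kTB, expressed in dBm."""
    return 10.0 * math.log10(BOLTZMANN * temp_k * bandwidth_hz / 1e-3)

# Illustrative (assumed) numbers: a 100 kHz channel filter vs. a front end
# opened up to a 100 MHz process-variation range.
narrow_dbm = thermal_noise_dbm(100e3)
wide_dbm = thermal_noise_dbm(100e6)
penalty_db = wide_dbm - narrow_dbm  # extra TX power needed for equal SNR
```

With these assumed bandwidths the wide-open front end admits 1000 times more noise power, i.e. a 30 dB penalty in transmit power or, equivalently, a large reduction in range.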
[Figure: probability distribution P(f_carrier) of the carrier frequency, with the signal bandwidth much narrower than the 3σ process variation.]
Figure 2. Signal bandwidth relative to process variation when using on-chip LC resonators to provide the carrier frequency for narrowband radios. When using an LC resonator without a crystal reference, it is not possible for a radio to know exactly at what frequency it is operating.
However, there is an alternative to using the brute-force method of having a wide-
band front-end filter at the receiver that admits all the possible frequencies that
might contain the transmitted carrier frequency. The receiver could employ filtering
with the same bandwidth it would use in classical narrowband communication
when the transmitter’s carrier frequency is known with high accuracy. Of course,
in this case there is no guarantee that a particular transmitter would be able to
communicate with a particular receiver. If the input frequency range of a particular
receiver did admit the transmitter’s carrier frequency, the pair would be able to
communicate. Otherwise, the result would be the same as if the transmitter were
communicating on a channel orthogonal to the one that the receiver is monitoring.
The result is that unreliable manufacturing processes can be used to provide
channelization to the communication system.
Even though a particular transmit-receive pair may not be able to communicate
because they would be effectively tuned to different channels, a sufficiently high
density of nodes ensures a high probability that there are pairs of transmitters and
receivers that can communicate with one another. The number of channels
available in the system is determined by the ratio of the variations of the
manufacturing process to the receiver bandwidth. Narrowing the receiver input
bandwidth results in more communication channels. This would decrease the
probability that any given pair of nodes could communicate with each other, but it
would also decrease the amount of noise admitted by the receiver, increasing the
receiver sensitivity and reducing the necessary transmitted power level. It should
be noted that the receive bandwidth can be altered by using digital control to adjust
the receive radio parameters. As long as the bandwidth admitted by the receive
filters is greater than the signal bandwidth, there will be a non-zero probability that
two randomly selected nodes will be able to communicate with each other.
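As a rough numerical illustration of this trade-off, the chance that a randomly chosen transmitter/receiver pair shares a channel can be estimated by Monte Carlo. The 30:1 ratio of process-variation range to receiver bandwidth below is an assumed example (giving roughly K = 30 channels), not a measured value.

```python
import random

def pair_match_probability(variation_range: float, rx_bandwidth: float,
                           trials: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the chance that a random transmitter's
    carrier lands inside a random receiver's pass-band, with both center
    frequencies uniform over the manufacturing-variation range."""
    rng = random.Random(seed)
    hits = sum(abs(rng.uniform(0, variation_range)
                   - rng.uniform(0, variation_range)) <= rx_bandwidth / 2
               for _ in range(trials))
    return hits / trials

# Roughly a 1-in-30 chance per pair; slightly less because of edge
# effects at the ends of the variation range.
p = pair_match_probability(variation_range=30.0, rx_bandwidth=1.0)
```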
There is a potential problem with using this approach for multi-hop communication,
however. If nodes simply broadcast their packets and have any of a number of
candidate neighbors that hear the transmission forward the packet, it is likely that
multiple neighbors will receive and forward the packet. This would result in an
explosion in the number of copies of a packet as it propagates through the
network. This phenomenon is known as “broadcast storm.” To prevent this
explosion in the number of packets, collisions - usually considered a bane of
wireless communications - could be exploited to ensure that the number of
received copies of a packet at each step does not grow unboundedly. If too many
nodes are transmitting a copy of the packet, they will collide, resulting in a reduced
number of receivers that successfully receive the packet. If few transmitters are
sending the packet, the number of collisions will be low, increasing the number of
successful receptions of the packet. It will be shown in Section 2.4.3 that these
opposing forces cause the system to reach a balanced equilibrium allowing for
reliable communication through the network.
In the rest of this chapter and the next, this abstraction of having multiple channels
available for communication is made. Note that it is not necessary for the
channels to be orthogonal. What is important is the number of transmitter carrier
frequencies that fall within the bandwidth being monitored by a particular receiver.
The probability that any given transmitter falls within the receiver’s range is
dependent only on the ratio of the range of possible carrier frequencies to the
receiver bandwidth. This ratio is equivalent to the number of channels in the
analysis presented here because it is equal to the maximum number of
independent transmissions that can be made simultaneously without interfering
with one another.
2.2 MULTI-HOP COMMUNICATION METHOD
In order to convey the intuition of the proposed communication method, consider
the scenario shown in Figure 3. The source node has a packet to send to the
destination node. If the source and destination nodes are far enough apart, the
source node is not able to transmit the packet directly to the destination node.
Instead, it has to rely on nodes that lie between it and the destination to forward
the packet in a multi-hop fashion. The region between the source and destination
nodes can be divided into disjoint blocks as shown in Figure 3. This can be
accomplished by including in the packet header the coordinates of the corner
points of the block containing the next-hop nodes. If the size of the blocks is small
enough relative to the
transmission range of the nodes’ radios, it is possible for any node to communicate
with any other node in a neighboring block. This makes it possible to have the
packet hop from one block to the next until it reaches the destination.
If the transmitter selects a particular node in the next block to be the next hop on
the packet’s way to the destination, there is a danger that the transmitter and the
selected receiver are not communicating on the same channel. If the nodes are
allowed to duty cycle their radios in order to conserve their energy, it is also
possible for the selected receiver to be in the off state at the time when the
communication is attempted. Also, selecting only one node to route the packet
toward the destination does not make use of the broadcasting nature of wireless
communication. When a packet is transmitted, all of the nodes in the neighboring
blocks that are listening to the transmitter’s channel will be able to receive the
packet as long as they are on. This raises the following question: Is it possible for
the transmitter node to broadcast its packet and safely assume that at least one of
the nodes in the neighboring block has received it without any need for
acknowledging the reception?
[Figure: source node, intermediate blocks 1 through H, destination node.]
Figure 3. Proposed multi-hop communication method. Nodes in each block listen to L randomly selected frequency bands. If a node detects transmissions from the previous block in any of those bands, it combines the inputs using random linear network coding and broadcasts the result to the next block.
The potential problem is that, in order to ensure that at least one neighbor will
receive the packet with high probability even in the presence of unreliable
channels or un-tuned radios, the number of potential next hop neighbors must be
high. In this case, it is likely that many neighbors will receive the packet and
forward it onward. As the packet propagates toward the destination, there is a
danger that the number of copies of the packet will grow unboundedly because at
each step (block) there will be more and more nodes transmitting the packet. To
prevent this from happening, it may be possible to make use of collisions on the
channel to prevent the number of copies of the packet received in any block from
growing unboundedly.
Consider the scenario in which there are multiple channels available for
communication among the nodes (as is the case when using un-tuned radio
architectures), and each of the nodes can dynamically select a channel to transmit
on or listen to. If more than one transmitter is operating on a particular channel
concurrently, a collision will occur and any node listening to that channel will be
unable to decode any transmissions on that channel during that time. Note that
the assumption that nodes dynamically select a channel to communicate on can
be satisfied by either varying the reactance of an LC-resonator (or selecting one of
several available resonators) on a single node, or by duty-cycling the nodes
thereby making a random subset of them active in communication at any point in
time.
Consider the case when the network is divided into a virtual grid as shown in
Figure 3 and the number of nodes in each block is equal. Let us define the
following variables:
K ≡ number of channels
N ≡ number of nodes per block
T_b ≡ number of transmitters in block b
The proposed communication method operates as follows:
1. All nodes that are awake and do not have a packet to send are in receive
mode. They randomly and independently select, with uniform probability,
one of the K channels to monitor.
2. The source node randomly selects, with uniform probability, one of the K
channels to transmit on.
3. Any of the N nodes in Block 1 that are monitoring the channel on which the
source node is transmitting, will receive a copy of the packet and forward it
at the next time step. Let us call the number of nodes in Block 1 that
received the packet T_1.
4. At the next time step, all of the T_1 nodes in Block 1 that received a copy of
the packet from the source node forward it toward the destination. They
randomly and independently select, with uniform probability, one of the K
channels to transmit on and broadcast the packet on that channel.
5. If more than one transmitter broadcasts on a particular channel, the result is
a collision, and any receivers monitoring that channel are unable to receive
the packet. Thus, only those receivers in Block 2 that are listening to a
channel on which exactly one transmitter is sending will receive a copy of
the packet. We denote the number of nodes in Block 2 that receive a copy
of the packet as T_2. These nodes then proceed to forward the packet in the
same manner as described in Step 4.
The random process specified by T_b, the number of nodes in Block b that receive
a copy of the packet, forms a discrete time Markov chain with the state space
{0, 1, ..., N}, where the zero-state is absorbing and all others are transient. This
implies that the chain will eventually be absorbed in the zero-state with probability
one. What is of interest to us is how long it will take until the chain is absorbed in
the zero-state. This provides a measure of the robustness of the scheme because
it indicates how many hops the packet can traverse before it is lost. As long as
this number is greater than the distance between the source and the destination, it
is possible to use this method of communication. Section 2.4 considers the
robustness of the scheme as a function of K and N as well as the energy
required to provide this level of robustness.
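How long absorption takes can be explored with a direct Monte Carlo sketch of the model above (one uniformly chosen channel per transmitter and per receiver, a collision whenever more than one transmitter shares a channel). The K and N values below are small illustrative assumptions; at the operating points analyzed later the expected lifetime is far too large to simulate directly.

```python
import random
from collections import Counter

def hops_until_lost(n_nodes: int, k_channels: int, rng: random.Random,
                    max_hops: int = 10_000) -> int:
    """Simulate the chain T_b: start from a single source transmitter and
    count hops until no receiver in the next block gets a collision-free
    copy (absorption in the zero-state)."""
    num_tx = 1
    for hop in range(max_hops):
        # Each transmitter picks a channel; only channels with exactly one
        # transmitter ("good" channels) survive the collisions.
        counts = Counter(rng.randrange(k_channels) for _ in range(num_tx))
        good = {ch for ch, n in counts.items() if n == 1}
        # Each receiver in the next block monitors a uniform random channel.
        num_tx = sum(rng.randrange(k_channels) in good
                     for _ in range(n_nodes))
        if num_tx == 0:
            return hop + 1
    return max_hops

rng = random.Random(42)
lifetimes = [hops_until_lost(n_nodes=16, k_channels=5, rng=rng)
             for _ in range(500)]
avg = sum(lifetimes) / len(lifetimes)
```

Raising n_nodes and k_channels at a fixed ratio makes avg grow rapidly, consistent with the exponential-robustness result derived in Section 2.4.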
2.3 PRACTICAL JUSTIFICATION OF THE MODEL
The proposed communication method makes several assumptions that are
necessary to simplify the analysis. This section gives practical justification for
those assumptions, showing that they can be met in practice.
The analysis makes the simplifying assumption that the communication channels
are orthogonal to each other, even though this is clearly not the case in practice.
Channelization is obtained by having each of the receivers listen to a portion of
spectrum that is much smaller than that covered by the variations in the
manufacturing process of the radios. Since the transmitter carrier frequencies and
the center frequencies of the receive filters are continuous random variables,
portions of spectrum monitored by different receivers can partially overlap.
However, the probability of channel collisions is the same whether the channels
are orthogonal or not. A channel collision at a particular receiver occurs when
more than one transmitter broadcasts a signal within the frequency range
monitored by the receiver. The probability of this occurring is independent of the
frequency ranges monitored by other receivers. It only depends on the ratio of the
variations of the manufacturing process to the frequency range monitored by each
receiver. That ratio is taken to be the number of channels in this analysis.
The analysis also assumes that the transmitting and receiving nodes randomly
select a channel to transmit on and monitor. In practice, this can be achieved by
having a high density of nodes in the network and allowing most of them to sleep
much of the time and have only a random subset of them on at any time. Having a
random subset of nodes on has the same effect as allowing the nodes to randomly
choose their receive center frequencies. As long as the nodes have two different
on-chip resonators provide their receive center frequency and their transmit carrier
frequency, the effect will be the same as if the receiving nodes randomly choose a
channel on which to forward the packet they have received. Alternatively, the
random channel selection could be accomplished by having the nodes randomly
adjust the capacitances of their reference frequency generators. This would allow
the nodes to randomly select, with near-uniform probability, a frequency in the
range of interest, although the node would not be aware of the exact frequency it is
selecting.
The system is modeled as being discrete-time; i.e., it is assumed that the nodes in
each block will make their transmissions concurrently. This can be achieved in
practice by having each node that successfully receives the packet forward it as
soon as the node from which it heard the packet stops its transmission. Since the
proposed protocol does not require any handshaking and acknowledgements
there is no variable part in the latency, making it possible to maintain the discrete
time model. To initiate the process, it is possible for the node that originally
produced the packet to transmit the packet several times in succession so that any
nodes that wake up during this time and are tuned to its frequency will have an
opportunity to hear the packet and forward it. This can be used to reduce the
probability that the packet is lost at the first hop because there is only one
transmitter and no nodes may be listening to its channel.
The virtual grid used to divide the network into blocks is not really necessary in
practice. It is used to simplify the analysis by ensuring that all transmitters could
be heard by all receivers in the next hop; however, in practice this is not
necessary. (If required, the grid could be achieved by sending the coordinates of
the endpoints of the receiving block in the packet header. This would indicate
that only nodes that fall within those coordinates should forward the packet
towards the destination if they hear it.) Like all geographical routing methods, this
requires that the nodes have knowledge of their own locations relative to each
other. Fortunately, the performance of most position estimation algorithms for
sensor networks improves with increasing density [21].
2.4 ANALYSIS
As stated in Section 2.2, the number of copies of a packet being transmitted in
each block, T_b, using this communication method forms a Markov chain on the
state space {0, 1, ..., N}, where the zero-state is absorbing and all the other states
are transient. Therefore, the chain is guaranteed to eventually be absorbed in the
zero-state. This implies that if the distance (number of blocks) that the packet
attempts to traverse is unbounded, the packet will eventually be lost. There are
two ways in which this can happen. Consider the transmission of the packet from
Block b to Block b+1. There are T_b transmitters in Block b and N receivers in
Block b+1. One possibility is that all of the T_b transmitters are interfering with
each other. In other words, there is no channel that is being used by exactly one
transmitter. The other possibility is that there are channels on which exactly one
among the T_b transmitters is sending, but none of those channels are being
monitored by any of the N receivers.
What is of interest to us is how long it takes for the chain described by T_b to be
absorbed in the zero-state. This indicates how far (how many blocks) the packet
can traverse before being lost. Also important is the average number of
transmitters per block that are transmitting the packet. This is equal to the average
value of T_b before the chain is absorbed in the zero-state. This is important
because it provides a metric of how much energy is spent on transmitting the
packet for one block.
It is possible to determine both the average number of steps until the chain is
absorbed in the zero-state and the average value taken on by the chain before
absorption from the transition probability matrix of the Markov chain. The
challenge then becomes finding this transition probability matrix, P^{K,N}, as a
function of K and N.
2.4.1 Transition Probabilities
The probability, P^{K,N}_{l,l'}, that the Markov chain transitions from state l to state l' is
the probability that l transmitters broadcast the packet and l' receivers get a copy
of the packet. Let us denote with p_{j,l,K} the probability that there are exactly j
good channels (i.e. j channels with exactly one transmitter) given that there are l
transmitters and K channels, for j ∈ {1, ..., min(l, K)}. We know that the probability
of having l' successful receptions given N receivers and j good channels out of
a total of K channels is

\binom{N}{l'}\left(\frac{j}{K}\right)^{l'}\left(\frac{K-j}{K}\right)^{N-l'} \qquad (1)

for j ∈ {1, ..., min(l, K)} and l' ∈ {1, ..., N}. Thus, the probability of having l
transmitters broadcast the packet and l' receivers get a copy of the packet, for
l' ∈ {1, ..., N}, is

P^{K,N}_{l,l'} = \sum_{j=1}^{\min(l,K)} p_{j,l,K}\cdot\binom{N}{l'}\left(\frac{j}{K}\right)^{l'}\left(\frac{K-j}{K}\right)^{N-l'} \qquad (2)
The value p_{j,l,K} can be computed by counting the ways in which j out of l
transmitters can each select a unique channel out of K total channels, while the
other K − j channels each have no transmitters or more than one transmitter, and
dividing this total number of combinations by the total number of ways in which l
transmitters can select one of K channels. There are a total of K^l ways to
distribute l transmitters among K channels and \binom{l}{j}\binom{K}{j}\,j! ways to place each of
j out of l transmitters in a unique channel out of K total channels. The number
of ways of assigning the remaining l − j transmitters among the K − j channels such that no
channel has exactly one transmitter can be shown to be equal to the coefficient of
the x^{l-j}/(l-j)! term in the expression (e^x - x)^{K-j} [26]. This term is given by

\sum_{m=0}^{l-j}\binom{K-j}{m}(-1)^{m}(K-j-m)^{l-j-m}\frac{(l-j)!}{(l-j-m)!} \qquad (3)
Combining this with (2) gives us the transition probability matrix P^{K,N}. Once
P^{K,N} is computed, the expected time until the Markov chain is absorbed in the
zero state (i.e. the robustness of the scheme), given that it starts in state i, is
given by the i-th entry of the column vector

(I_N - Q)^{-1}\cdot\mathbf{1}_N \qquad (4)

where I_N is the N × N identity matrix, \mathbf{1}_N is a column vector of N ones, and Q
is the N × N matrix containing the transition probabilities among the transient
states (non-zero states) of the Markov chain [27]. (I_N - Q)^{-1} is an N × N matrix
whose (l, l') entry is the expected number of visits to state l' given that the chain
started in state l. From this, it is possible to compute the probability distribution of
T_b before the chain is absorbed in the zero-state given that it starts in state l. The
average value of T_b before the chain is absorbed in the zero-state given that it
starts in state l is equal to the l-th row of the column vector [27]:

(I_N - Q)^{-1}\cdot[1, 2, \ldots, N]^{T} \qquad (5)

This gives us the tools necessary to evaluate the robustness and energy
requirements of the communication scheme for any given values of K and N.
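Equations (1)-(5) can be evaluated directly. The sketch below is a straightforward transcription, not code from the dissertation: p_good implements p_{j,l,K} via the counting argument and equation (3), and expected_hops solves (I_N - Q)^{-1}·1_N from equation (4) by Gaussian elimination, assuming the chain starts from a single reception.

```python
from math import comb, factorial

def p_good(j: int, l: int, K: int) -> float:
    """p_{j,l,K}: probability that exactly j of the K channels carry exactly
    one of the l transmitters, each transmitter choosing a channel
    uniformly at random (counting argument of Section 2.4.1)."""
    a, b = l - j, K - j
    # Ways to spread the a leftover transmitters over the b leftover
    # channels with no channel getting exactly one: equation (3).
    d = sum((-1) ** m * comb(b, m) * (b - m) ** (a - m)
            * (factorial(a) // factorial(a - m))
            for m in range(min(a, b) + 1))
    return comb(l, j) * comb(K, j) * factorial(j) * d / K ** l

def expected_hops(K: int, N: int) -> float:
    """Expected hops before absorption starting from state 1, i.e. the
    first entry of (I_N - Q)^{-1} 1_N from equation (4)."""
    Q = []
    for l in range(1, N + 1):
        pj = [p_good(j, l, K) for j in range(min(l, K) + 1)]
        # Row of transition probabilities to states l' = 1..N, eq. (2).
        Q.append([sum(p * comb(N, lp) * (j / K) ** lp
                      * (1 - j / K) ** (N - lp)
                      for j, p in enumerate(pj))
                  for lp in range(1, N + 1)])
    # Solve (I - Q) t = 1 by Gauss-Jordan elimination with pivoting.
    A = [[(1.0 if r == c else 0.0) - Q[r][c] for c in range(N)] + [1.0]
         for r in range(N)]
    for c in range(N):
        piv = max(range(c, N), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(N):
            if r != c and A[r][c] != 0.0:
                f = A[r][c] / A[c][c]
                for k in range(c, N + 1):
                    A[r][k] -= f * A[c][k]
    return A[0][N] / A[0][0]

# Small illustrative example; the K = 30, N = 96 operating point analyzed
# in Section 2.4.2 yields astronomically large values.
hops = expected_hops(K=5, N=16)
```

As a sanity check, with K = 2 and N = 1 each hop succeeds with probability 1/2, so the expected lifetime is exactly 2 hops, which the solver reproduces.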
2.4.2 Robustness Grows Exponentially with Density
The key parameter in this communication scheme is the ratio of the node density
to the number of channels. Let us define the notation α ≡ N/K to
specify the ratio of node density (number of nodes/block) to the total number of
channels. This value is important because if α is too small, there will be few
transmitters and receivers compared to the number of channels, making it unlikely
that any pair of transmit and receive nodes will be attempting to communicate on
the same channel. On the other hand, if α is too large, there is a high probability
that the number of transmitters is so large that all of the channels will experience
collisions.
However, for a “reasonable” choice of α, the proposed communication scheme
possesses a nice self-regulating property in that the number of transmitters per
block, specified by the process T_b, resists growing either too large or too small.
This is because, if the number of transmitters in one block is small, there will be
few collisions, resulting in more good receptions, and T_{b+1} will be greater than T_b.
Conversely, if the number of transmitters in a block is too large, the result will be
many collisions and T_{b+1} will be less than T_b. Indeed, this intuition is confirmed by
Figure 4, which shows the probability distribution of the number of transmitters in a
given block provided that the packet reaches that block. This particular probability
distribution corresponds to the case when K = 30, N = 96, and α = N/K = 3.2.
It can be seen that having fewer than 10 or more than 60 transmitters in a block is
very unlikely, although there is a finite probability that the number of transmitters
will be an outlier, implying that eventually the number of transmitters will be zero,
resulting in the death of the packet (i.e. the Markov chain will be absorbed in the
zero-state).
Figure 4. Probability distribution of number of transmitters while the packet is still alive for K=30 channels and density N=96 nodes/block
Intuitively, it would seem that by increasing both K and N , while keeping the ratio
α constant, the law of large numbers would ensure that the deviation of the
number of transmitters about the mean decreases, resulting in a larger number of
hops that the packet can safely traverse. Figure 5 shows that this intuition is
correct. Here, the expected number of hops that a packet can travel (if the
destination is infinitely far away) before it is lost is shown as a function of the
number of channels used, given that for each curve α , the ratio of the number of
nodes per block to the number of channels, is kept constant. Four different curves,
corresponding to four different values of α , are plotted. In each case, the
robustness of the system grows exponentially with the number of channels.
Figure 5. Robustness v. Channelization for various values of α
The scenario in which the source and destination are infinitely far apart is
unrealistic, so a more practical demonstration of the performance of this approach
is to consider the probability that a packet will fail to reach its destination that is a
fixed distance away. Figure 6 shows the probability that a packet will not be able
to reach a destination 10 hops away from the source node. Again, it can be seen
that the reliability of this communication scheme can be made arbitrarily high by
increasing the density of nodes and the channelization of the available spectrum,
because the probability that a packet does not reach its destination decays
exponentially fast as the density of nodes and number of channels is increased.
Figure 6. Probability of failing to transmit a packet over 10 hops v. Channelization for various values of α.
2.4.3 Stability and Optimal Ratio of Density to Channelization
Figure 5 and Figure 6 demonstrate the exponential reliability of the proposed
communication scheme for various values of α . It can be seen that for certain
ratios of node density to number of channels the system performs better than for
others. In fact, some values of α are exponentially better than others. This is to
be expected due to reasons described in Section 2.4.2, where the intuition
regarding the performance of the system was given assuming a “reasonable”
choice of α . Let us now examine what choice of α is optimal in that it guarantees
the highest robustness at any given node density.
In order to find this optimal choice of α , note that the self-regulating property
described in the previous sub-section and shown in Figure 4 implies that there
should be an “equilibrium” point in the number of transmitters per block. Let’s call
this equilibrium value T . This value should be such that, if the number of
transmitters in block is equal to b T , the expected number of receptions in block
will also be 1+b T , meaning that the expected number of nodes that will be
forwarding the packet from block 1+b to block 2+b will also be T . Hence we
refer to this value as the “equilibrium” point. In order to find the optimal value of α,
we will first find the equilibrium point, T, for each α (this will be a function of α),
and then find the α that has the equilibrium point which minimizes the probability
of unsuccessful communication over one hop when T transmitters are attempting
to communicate over K channels with N = α · K receivers.
To simplify the notation, let us define the following variables that will be useful in
finding the equilibrium values:
R_b ≡ T_b / K
R ≡ T / K
Given that there are T_b transmitters in block b and K channels, the probability
that a randomly selected channel has exactly one transmitter on it is

\binom{T_b}{1}\left(\frac{1}{K}\right)\left(1-\frac{1}{K}\right)^{T_b-1} = R_b\left(1-\frac{1}{K}\right)^{R_b\cdot K-1} \qquad (6)

which asymptotically becomes, as K grows to infinity,

\Pr(\text{selected chnl. has exactly 1 trans.} \mid T_b, K) = R_b e^{-R_b} \qquad (7)
Knowing this probability that a randomly selected channel will have exactly one
transmission on it (i.e. no collisions), it is possible to find the expected number of
receptions given the number of transmitters. In particular we would like to find the
number of transmitters at equilibrium. Mathematically, we seek:

E[T_{b+1} \mid T_b = T, K, N]
= E[\#\text{ chnls. with no collisions} \mid T_b = T, K]\cdot E[\#\text{ receivers/chnl.} \mid N, K]
= K\cdot\Pr(\text{selected chnl. has no collisions} \mid T_b = T, K)\cdot N/K
= K\cdot R e^{-R}\cdot\alpha \qquad (8)

Expressing T as R · K and comparing the leftmost expression with the rightmost
expression in (8) gives us the desired relationship between the number of
transmitters at equilibrium and α:

R = \ln(\alpha) \iff \alpha = e^{R} \qquad (9)
It is straightforward to show that this equilibrium is a stable one by verifying, using
the derivation of (8), that E[T_{b+1} \mid T_b, K, N] > T_b for T_b < T and
E[T_{b+1} \mid T_b, K, N] < T_b for T_b > T. This is simply a result of the aforementioned self-
regulation property.
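The fixed point (9) and its stability are easy to check numerically from the asymptotic expression (8); the K = 30, N = 96 values below are the example operating point used with Figure 4.

```python
import math

def expected_next(T: float, K: float, N: float) -> float:
    """Asymptotic E[T_{b+1} | T_b = T, K, N] = K * R e^{-R} * alpha from
    equation (8), with R = T/K and alpha = N/K."""
    R, alpha = T / K, N / K
    return K * R * math.exp(-R) * alpha

K, N = 30.0, 96.0                # the alpha = 3.2 example of Figure 4
alpha = N / K
T_eq = math.log(alpha) * K       # equilibrium T = K * ln(alpha), from (9)

at_eq = expected_next(T_eq, K, N)        # maps back onto T_eq
below = expected_next(0.5 * T_eq, K, N)  # pushed back up toward T_eq
above = expected_next(2.0 * T_eq, K, N)  # pushed back down toward T_eq
```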
Once the relationship between α and the equilibrium point R is known, the
optimal value of α , which provides the greatest reliability in the communication
system, can be computed by finding what value of α minimizes the probability that
a selected channel is not utilized given that the system is in equilibrium. A channel
is unutilized if the number of transmitters on it is not equal to one or if no receiver
listens on that channel. The probability that the number of transmitters on a given
channel is not equal to one is simply the complement of (7), while the probability
that no receiver, out of N , listens to the channel given that there are K channels
is

\left(1-\frac{1}{K}\right)^{N} = \left(1-\frac{1}{K}\right)^{\alpha\cdot K} = e^{-\alpha} \qquad (10)
where the second equality holds asymptotically as K grows to infinity. Thus, the
optimal value of α can be computed as
\arg\min_{\alpha}\left[(1 - R e^{-R}) + R e^{-R}\cdot e^{-\alpha}\right]
= \arg\min_{\alpha}\left[(1 - \ln(\alpha)/\alpha) + (\ln(\alpha)/\alpha)\cdot e^{-\alpha}\right] \qquad (11)

The probability is minimized for α = 3.187. Indeed, Figure 5 and Figure 6 confirm
that this value of α maximizes the reliability of the proposed communication
scheme.
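A simple grid search over the one-hop failure probability in (11) recovers the stated optimum; this is a quick numerical check, not part of the original derivation.

```python
import math

def p_unutilized(alpha: float) -> float:
    """One-hop failure probability for a channel at equilibrium, eq. (11):
    either not exactly one transmitter on it, or no receiver listening."""
    r = math.log(alpha)          # equilibrium R = ln(alpha) from eq. (9)
    p_single = r * math.exp(-r)  # = ln(alpha)/alpha
    return (1.0 - p_single) + p_single * math.exp(-alpha)

# Grid search over alpha > 1 recovers the optimum quoted in the text.
alpha_opt = min((1.0 + i * 1e-4 for i in range(1, 100_000)),
                key=p_unutilized)
```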
It should be noted that this optimal value of α can be reached at run-time as both
the density of nodes and the number of available communication channels can be
adjusted dynamically. The effective density of nodes can be altered by changing
the duty cycling of nodes making more or fewer of them active at any given point in
time. The number of available communication channels can also be adjusted
dynamically via digital control of the radio front-end and/or baseband circuitry to
vary the receive noise bandwidth. This flexibility allows the system to adapt and
operate in different network conditions, providing an arbitrarily reliable
communication fabric even though the individual components are un-tuned and
unreliable.
2.5 SYSTEM LEVEL VIEW
It is important to compare the proposed communication scheme to certain
benchmarks in order to evaluate its system level performance. One of the two
most relevant benchmarks is comparing the proposed method to schemes that
employ tuned radios. While individual tuned radios are larger and much more
costly, as well as power-hungry, their relative reliability (due to the guarantee that
any two radios can communicate with each other if the channel is not in a deep
fade) may offer system level advantages that must be considered. The other
important benchmark is comparing the proposed method to one which uses the
same untuned radios, but with wide front-end receive filters that would again
ensure that any two such radios can potentially communicate, albeit at the cost of
significantly increased noise levels at the receiver.
The metrics to consider are overall cost; reliability, even in the presence of node
and link failures; total power consumption; and the ability to scavenge
energy from the environment. This last point is important because the energy that
can be scavenged by the network is proportional to the number of nodes in the
network and the size of the nodes. So, even if a system employing tuned radios
may be able to achieve the same level of reliability at a lower network-wide power
level than the one proposed here, the cost of such nodes may prohibit deploying
enough nodes to generate as much power as is consumed. The other way of
gathering more energy is by making the nodes larger (e.g. larger solar panels can
gather more energy), but having few large nodes would mean having less sensing
resolution and less robustness to the failure of individual nodes. The strengths as
well as weaknesses of the proposed scheme relative to the two alternatives are
discussed next.
2.5.1 Comparison to Schemes Using Untuned Radios with Wide Receive Filters
Even when using un-tuned radios, it is possible to ensure that any two such radios
can potentially communicate with each other by having the receivers admit all the
frequencies that could potentially contain the signal and then using self-mixing
(envelope detection) to recover the signal even though the frequency of the carrier
wave is not known. Such schemes still suffer from poor spectral efficiency since
only one packet at a time may be sent on any of the frequencies that fall within the
variance of the radio manufacturing process. This again requires the nodes to
coordinate who is transmitting when, incurring the latency and power overhead of
such coordination. Also, opening up the receive filter would admit more noise
(proportional to the BW of the filter) and interference at the receiver. Therefore,
the transmitter would have to output more power in order to achieve the same
SNR at the receiver. This would cancel out the advantage from having only one
node transmit a packet instead of many as in the scheme proposed in this paper.
2.5.2 Comparison to Schemes Using Tuned Radios
When using tuned narrowband radios for communication, it is possible to
guarantee that any two radios are capable of communicating with each other, as
long as the attenuation of the signal in the channel is not too high and the level of
interferers is not prohibitively large. However, as empirical evidence has shown, in
the short-distance communication required for sensor networks the link qualities
vary over time and deep fades can make the connectivity unstable [17],[18]. To
combat the instability of the links, it is necessary to deploy nodes with high density
so that alternate links are available. High node density is also required to address
the issue of power consumption of individual nodes. Because tuned radios
consume more power than the nodes can scavenge from the environment, it is
necessary to allow the nodes to sleep for much of the time. In order to preserve
network functionality even while most nodes are asleep, it is necessary to deploy
the nodes densely. However, the cost of producing nodes with tuned radios is
high, making the deployment of such high-density networks economically
prohibitive. Another obstacle to deploying dense networks of tuned nodes is the
large physical size of such nodes, which can make dense placement difficult. Note that it is
also possible to increase the reliability of the links and the duty cycling of nodes by
making more energy available to them either by having larger batteries or larger
energy-scavenging engines, but this would further increase the size of the nodes,
making it impossible to embed them in surfaces and losing out on the sensing-
resolution aspect of dense networks.
The second drawback to traditional schemes that ensure reliable multi-hop
communication by having the transmitter select one forwarding node to send to is
latency. If the transmitter selects the forwarding node without testing the channel
first, it is possible that the channel will be bad. If the transmitter tests the channel
first it must wait for the responses from the potential forwarding nodes before
sending the packet. Those responses may collide causing further delay. Also, the
need for acknowledgements and retransmissions increases the power
consumption of such schemes. The method proposed here eliminates the need
for such handshaking, thereby reducing the latency of the communication.
The one important advantage of using tuned radios is spectral efficiency. The
scheme described in this chapter requires the same packet to be transmitted over
multiple channels. In contrast, using tuned radios would make it possible to
transmit independent packets over those channels increasing the throughput of the
network. This disadvantage of the proposed scheme may be partially reduced by
employing distributed forward error correction codes. Using distributed channel
coding makes it possible for networks of untuned radios to achieve performance
close to that of a network of tuned radios. The next chapter shows how this can be
done and proves that a network of untuned radios can achieve a throughput of up
to 1/e of the maximum throughput of a network of tuned radios.
Chapter 3
THROUGHPUT OF NETWORKS OF UNTUNED RADIOS
3.1 MAXIMUM THROUGHPUT OVER ONE HOP
Consider using the nodes to form a communication backplane carrying data
between a source and a destination as described in Section 2.2. The data is
transported in a multi-hop fashion by a network of nodes that employ untuned
narrowband radios. We are interested in determining the throughput of this
communication backplane. In other words, we wish to find how many independent
packets can be transmitted simultaneously. To maximize the throughput of the
network, it is necessary to maximize the probability that during communication a
channel is occupied by exactly one transmitter. This will maximize the number of
channels that contain a decodable transmission. It can be shown that when there
are N channels available for communication and each transmitter is
independently and randomly assigned to a channel with the same probability
distribution, the probability that a channel is occupied by exactly one transmitter is
maximized when there are exactly N transmitters and each transmitter is equally
likely to be assigned to any of the channels. In this case, the probability that a
channel contains exactly one transmission is asymptotically, for large N, equal to
1/e (the result is proven in Appendix A). This implies that, in order to maximize
the throughput, the network should be operated with N active transmitters within
communication range of each other, in which case each transmission will
experience a collision with probability 1 − 1/e.
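The 1/e figure can be illustrated with a quick Monte Carlo sketch (N, the trial count, and the seed are arbitrary choices; Appendix A has the proof):

```python
import random

# Monte Carlo sketch (setup is illustrative): N transmitters each pick one of
# N channels uniformly at random; the fraction of channels holding exactly one
# transmitter should approach 1/e ~ 0.368 for large N.
def single_occupancy_fraction(n, trials=200, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        counts = [0] * n
        for _ in range(n):                     # N transmitters, N channels
            counts[rng.randrange(n)] += 1
        total += sum(1 for c in counts if c == 1) / n
    return total / trials

est = single_occupancy_fraction(1000)
print(est)  # close to 1/e ~ 0.3679
```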
Having N active transmitters within communication range maximizes the
probability that a receiver will have exactly one transmitter in the range of
frequencies it monitors; however, it may still be possible that a transmitter occupies
a unique channel, but no receiver is tuned to that channel. In order to increase the
probability that a non-colliding transmission is heard, as well as the probability that
a receiving node hears at least one packet, each node is equipped with several
receive radios, with each radio tuned to a different channel (by using a different
LC-circuit as the local oscillator for each radio; this also allows a node to
transmit on different random channels at different times by selecting any of its LC-
circuits to provide the carrier frequency for the transmitter).
Denote the number of receive-radios on each node by L. Section 3.3 derives
the relationship between the value of L and the throughput of the network. It
shows that it is possible to achieve throughput that is linear in the number of
channels even with a constant value of L . It is important to show that this is
achievable with a constant L because requiring L to grow with the number of
channels would correspond to requiring more hardware on each node, and this is
exactly what we are trying to avoid. It should be noted that the theoretical results
use bounding techniques, so the constants of this linear throughput that are
guaranteed by any particular value of L are pessimistic. To complement the
theoretical result and give guidelines for practical deployments, simulations are
used to estimate the throughput that can be achieved with different values of L .
3.2 MAXIMUM THROUGHPUT OVER MANY HOPS
Considering the multi-hop communication through a virtual grid as described in
Section 2.2, if every node in a block transmits on a random frequency, it is likely
that there will be transmissions at frequencies close to each other, thus any
receiver tuned to those frequencies will detect a collision and will not be able to
decode the individual transmissions. These collisions effectively erase some of
the packets, making it seem as if nodes in neighboring blocks communicate with
each other through an erasure channel. The question we are interested in is,
given N unit-capacity communication channels, how much data can
simultaneously be sent to the destination and have this data successfully received
and decoded by the destination that is H hops away (for now, let us assume that
H is a constant, though later it will be shown that H may be allowed to grow with
N, as long as log[H(N)]/N → 0 as N → ∞, without affecting the asymptotic
throughput). We want to compare this to a fully-coordinated network employing
tuned radios, in which case exactly N packets could be sent in each wave,
provided that there are N nodes in each block and every node selects a unique
frequency on which to communicate.
3.3 ACHIEVABLE THROUGHPUT OVER MANY HOPS
We will find the relative throughput of the untuned network by showing that the
connectivity of the network can be modeled as a random graph and then applying
known results from network coding literature. Namely, we make use of the result
that for communication in a graph of unit capacity links for which the connectivity is
not known a priori, a throughput equal to the max-flow between the source and the
destination is achievable with arbitrarily high probability by using random linear
network coding1 over a high enough field size [22]. However, in order to make use
of this result, we must find the max-flow of the graph. This is done by Result 1.
Since the connectivity of this random graph is not static (i.e. each wave of data will
encounter a different set of links), the packets have to carry the encoding vectors
in their headers to provide the destination with just the right information needed to
decode the source packets as in the scheme introduced in [23].
We will show that the throughput with network coding is linear in N over H(N)
hops as long as log[H(N)]/N → 0 as N → ∞. By contrast, simple random routing (in
which forwarding nodes are only allowed to randomly select one of the packets
from each wave to forward, rather than combining the packets they receive in each
wave to form the output packet) has constant throughput over H(N) = N hops
(see Appendix B).
1 In random linear network coding, forwarding nodes send on each outgoing link a random linear combination of the packets
they receive on the input links. Each input packet is multiplied by a randomly chosen element from some Galois field and these products are added together to form the outgoing packet [22].
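As a self-contained sketch of the footnote's idea (the prime field GF(257), the packet sizes, and the values are illustrative assumptions, not taken from [22]): a destination that collects enough independent combinations recovers the sources by solving a linear system using the encoding vectors carried in the headers.

```python
import random

P = 257  # small prime; arithmetic in GF(257) stands in for the Galois fields of [22]

def rand_combine(packets, rng):
    # A forwarding node emits a random linear combination of its input packets,
    # carrying the combination coefficients in the packet header.
    coeffs = [rng.randrange(P) for _ in packets]
    payload = [sum(c * pkt[i] for c, pkt in zip(coeffs, packets)) % P
               for i in range(len(packets[0]))]
    return coeffs, payload

def solve_mod_p(A, B):
    # Gauss-Jordan elimination mod P on [A | B]; returns X with A.X = B,
    # or None if A is singular (the destination then waits for more packets).
    n = len(A)
    M = [row[:] + rhs[:] for row, rhs in zip(A, B)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] % P), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], P - 2, P)        # Fermat inverse, P prime
        M[col] = [x * inv % P for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(x - f * y) % P for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

rng = random.Random(7)
sources = [[3, 1, 4], [1, 5, 9]]                 # two source packets, 3 symbols each
decoded = None
while decoded is None:                           # retry on (rare) singular draws
    coded = [rand_combine(sources, rng) for _ in range(2)]
    decoded = solve_mod_p([c for c, _ in coded], [p for _, p in coded])
print(decoded == sources)  # True
```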
3.3.1 Random Graph Representation of the Network Connectivity
We now create the random graph, shown in Figure 7, that models the connectivity
of the network. The N vertices in each column correspond to the N nodes in
each block during communication. The H columns correspond to attempting to
communicate over H blocks (for ease of notation, we first consider the case when
H is constant, rather than a function of N). Each of the vertices in columns
{2,...,H} has L incoming links corresponding to the L receivers on each node.
Each link connects a vertex to a randomly, independently chosen vertex in the
previous column. Since transmissions experience collisions with probability
1 − 1/e, each of the vertices in the graph is deleted with probability 1 − 1/e, in
which case all of its incoming and outgoing links are also deleted2. This means
that each of the links is deleted with probability 1 − 1/e because each receive-radio
has probability 1 − 1/e of being tuned to a frequency range that does not contain a
decodable transmission (i.e. either no transmission or more than one). The links
that are not deleted are equally likely to connect to any of the vertices in the
previous column that are not deleted because, given that a receive-radio has
exactly one transmission in its receive frequency range, the source of that
transmission is equally likely to be any of the transmitters that do not experience a
collision.
2 In the random graph, the vertices are deleted independently of one another. This is not the case in the network since the
collisions are not independent; however, this approximation becomes accurate as N tends to infinity. The independence assumption allows for analytical tractability in what follows.
We label the resulting random graph as G_{L,1−1/e} and show that the max-flow of
G_{L,1−1/e} is close to (1/e)·N if L is large enough.
Figure 7. Random graph representing connectivity in the network of nodes with untuned radios. Each vertex (node) in columns {2,…H} has L inputs, each one coming from a randomly and uniformly selected node in the previous column. Each node, along with its incoming and outgoing links, is deleted with probability 1-1/e.
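The construction of Figure 7 is easy to instantiate numerically; the sketch below (N, H, L, the trial count, and the seed are illustrative choices) builds the random graph and estimates the probability of event A, that at least one end-to-end path survives:

```python
import math
import random

# Monte Carlo sketch of the random graph of Figure 7: vertices are deleted
# with probability 1 - 1/e, and each vertex in columns 2..H draws L in-links
# to uniformly random vertices in the previous column. A vertex is "good" if
# it survives and reaches column 1 through surviving vertices.
def e2e_path_exists(N, H, L, rng):
    p_del = 1 - 1 / math.e
    alive = [[rng.random() > p_del for _ in range(N)] for _ in range(H)]
    good = alive[0][:]                      # column 1: good iff not deleted
    for j in range(1, H):
        good = [alive[j][i] and any(good[rng.randrange(N)] for _ in range(L))
                for i in range(N)]
    return any(good)

rng = random.Random(3)
trials = 50
hits = sum(e2e_path_exists(100, 5, 10, rng) for _ in range(trials))
print(hits / trials)  # near 1 here, since L*(1/e) > 1
```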
3.3.2 Max-Flow of the Random Graph
Result 1: For any constant β such that β < 1/e there exists a constant number
of inputs/node L such that the max-flow of G_{L,1−1/e} is greater than β·N with high
probability as N goes to infinity.
We will prove this result by applying a modified version of a technique used in
Percolation Theory. The first step is to relate the likelihood of having many disjoint
end-to-end (E2E) paths to the likelihood of having even a single path E2E. Let us
define the following notation:
Let A be the event that there exists a path E2E and let A_r be the event having the
following property: starting with any graph in A_r, the deletion of any r vertices will
still result in A. This is equivalent to saying that any graph in A_r has at least r + 1
vertex-disjoint paths E2E.
Lemma 1: Let r be a positive integer. Then
    1 − P_{p2}(A_r) ≤ ( q2/(q2 − q1) )^r · { 1 − P_{p1}(A) }    (12)
whenever 0 ≤ p2 ≤ p1 ≤ 1. Here, q1 = 1 − p1 and q2 = 1 − p2, and the notation
P_p(·) represents the probability that the event in parentheses occurs when vertices
in the graph are deleted with probability p.
Proof of Lemma 1: This proof is based on the proof in [1] of a similar result from
Percolation Theory. Let X_{i,j}, for i ∈ {1,...,N} and j ∈ {1,...,H}, be i.i.d. random
variables uniformly distributed in the interval [0,1], and to each vertex in row i and
column j of the grid assign the value X_{i,j}. To create graphs G_{L,p1} and
G_{L,p2} that have vertices deleted with probability p1 and p2 respectively, do the
following: first assign the values X_{i,j} to each vertex in the grid. Then assign L
links from each node in columns 2 through H to a randomly selected node in the
previous column. Finally, to create graph G_{L,p1}, for each vertex (i,j), delete it iff
X_{i,j} ≤ p1. To create graph G_{L,p2}, for each vertex (i,j), delete it iff X_{i,j} ≤ p2.
We are interested in relating the likelihood that A_r occurs in G_{L,p2} to the
likelihood that A occurs in G_{L,p1}. Note that if A_r does not occur in G_{L,p2},
then there must be a set of vertices, B, such that:
a) All of the vertices in the set B are not deleted in G_{L,p2}
b) |B| ≤ r
c) The graph G̅_{L,p2} obtained by deleting from G_{L,p2} the vertices in B
satisfies G̅_{L,p2} ∉ A.
There may exist many such sets B, in which case it is sufficient to pick any such
set. Suppose that G_{L,p2} ∉ A_r, and that every vertex (i,j) in the set B satisfies
p2 < X_{i,j} ≤ p1. It then follows from c) that G_{L,p1} ∉ A. Conditional on B, there is
a [(p1 − p2)/(1 − p2)]^{|B|} = [(q2 − q1)/q2]^{|B|} probability that p2 < X_{i,j} ≤ p1 for all
vertices in B; therefore,
    P( G_{L,p1} ∉ A | G_{L,p2} ∉ A_r ) ≥ ( (q2 − q1)/q2 )^r    (13)
Applying Bayes's theorem and the fact that P( G_{L,p1} ∉ A ∩ G_{L,p2} ∉ A_r ) ≤
P( G_{L,p1} ∉ A ) gives the result of Lemma 1. □
This result of Lemma 1 is particularly useful if we can show that the probability
that G_{L,p1} ∉ A decays exponentially (with N) to zero for some p1. In other
words, if we can show that { 1 − P_{p1}(A) } ≤ e^{−α(p1,L)·N}, then we have
    1 − P_{p2}(A_r) ≤ ( q2/(q2 − q1) )^r · e^{−α(p1,L)·N}    (14)
and applying r = β·N tells us that the probability of not having β·N (actually
β·N + 1) paths decays to zero exponentially as long as
    β < α(p1,L) / log( q2/(q2 − q1) )    (15)
The problem now becomes finding an appropriate bound on P( G_{L,p1} ∉ A ).
Lemma 2: If L(1 − p1) > 1, then P( G_{L,p1} ∉ A ) ≤ H · P( Y/N < 1 − Z* ), where Y is a
random variable drawn from the Binomial( N, 1 − (p1 + (1 − p1)·Z*^L) ) distribution
and Z* = [ 1/(L(1 − p1)) ]^{1/(L−1)}.
Lemma 2 allows us to relate the probability that no E2E path exists in G_{L,p1} to the
probability that the mean of N Bernoulli random variables deviates from its
expected value by some amount. Since this probability decays to zero
exponentially in N, this result, along with Lemma 1, will allow us to prove that the
number of vertex-disjoint paths in G_{L,p2} will be linear in N with high probability.
The constant, β, of this linear relationship will depend on the value of L. Note
that β also depends on the value p1; however, the value p1 is not fundamental to
the graph G_{L,p2}, and we are allowed to assign any value to p1, as long as it is
larger than p2, so as to maximize the bound on β guaranteed by Lemma 1 and
Lemma 2.
Also note that the condition that L(1 − p1) > 1 is imposed to ensure that 1 − Z* > 0.
It can be shown that if L(1 − p1) > 1, then G_{L,p1} ∈ A with high probability,
otherwise G_{L,p1} ∉ A with high probability. However, in our case it is not enough to
show that G_{L,p1} ∈ A with high probability for appropriate values of L and p1. We
must also bound the rate of this convergence (using Lemma 2) in order to apply
Lemma 1 to our original problem.
Proof of Lemma 2: Consider the number of vertices in each column of G_{L,p1} that
were not deleted and have a path back to column 1. Let us call these vertices
“good” and all the others “bad.” Conditioned on the number of bad (good) vertices
in column j, vertices in column j + 1 are themselves good or bad independently
of each other and with equal probability. Let the number of bad vertices in column
j be Z·N for some Z that satisfies 0 ≤ Z ≤ 1. Then vertices in column j + 1 are
themselves bad with probability p1 + (1 − p1)·Z^L.
Consider the probability that the number of bad nodes in column j + 1 is greater
than Z·N, given that the number of bad nodes in column j is Z·N. If
p1 + (1 − p1)·Z^L < Z, this probability should be exponentially small in N. This
probability is minimized when the difference Z − (p1 + (1 − p1)·Z^L) is greatest.
Setting the first derivative of this difference, which is concave in the interval [0,1),
to zero shows that it is maximized when Z = [ 1/(L(1 − p1)) ]^{1/(L−1)}.
Let R_j represent the ratio of bad nodes to the total number of nodes in column
j (i.e. the total number of bad nodes in column j is R_j·N). G_{L,p1} ∉ A is
equivalent to R_H = 1. Note that if R_j = 1 for some j ∈ [1,H), then R_k = 1 for all
k ∈ [j + 1, H], because if column j has no connectivity back to column 1,
then none of the columns after j will have connectivity to column 1 either. We
prove Lemma 2 by arguing that P( G_{L,p1} ∉ A ) is upper bounded by the probability
that there exists a j ∈ [1,H] for which R_j = 1, and this is upper bounded by the
probability that there exists a j ∈ [1,H] for which R_j > Z*. Mathematically,
    P( G_{L,p1} ∉ A ) < P( ∃ j ∈ [1,H] s.t. R_j > Z* )
                      ≤ P( R_1 > Z* ) + Σ_{j=2}^{H} P( R_j > Z* | R_{j−1} ≤ Z* )
                      = P( R_1 > Z* ) + Σ_{j=2}^{H} P( 1 − R_j < 1 − Z* | R_{j−1} ≤ Z* )
                      < H · P( Y/N < 1 − Z* )    (16)
where the second inequality is by the union bound, the last inequality is because
p1 < p1 + (1 − p1)·Z*^L, and Y is a random variable drawn from the
Binomial( N, 1 − (p1 + (1 − p1)·Z*^L) ) distribution. □
Now, we must find the rate at which P( Y/N < 1 − Z* ) decays to zero and apply
this to Lemma 1 to show that the number of vertex-disjoint paths in G_{L,p2} grows
linearly with N. Fortunately, this rate is well known [25]. We use the notation
q = 1 − [ p1 + (1 − p1)·Z*^L ] and ε = q − (1 − Z*) to write [25]:
    P( Y/N < 1 − Z* ) = P( q − Y/N > ε )
                      ≤ ( q/(q − ε) )^{N(q−ε)} · ( (1 − q)/(1 − q + ε) )^{N(1−q+ε)}    (17)
The right-hand side of the equation can also be expressed as
    exp( −N·[ (q − ε)·log( (q − ε)/q ) + (1 − q + ε)·log( (1 − q + ε)/(1 − q) ) ] )    (18)
giving us
    α(p1,L) = (q − ε)·log( (q − ε)/q ) + (1 − q + ε)·log( (1 − q + ε)/(1 − q) )    (19)
as the α(p1,L) we need to plug into (15). Evaluating (15) with a large enough but
constant value for L and an appropriately chosen value for p1 provides the
guarantee that the max-flow of G_{L,1−1/e} is at least β·N for any β < 1/e, proving
Result 1. ■
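The recipe can be made concrete numerically; in the sketch below (the scan range and step over p1 are illustrative assumptions, while p2 = 1 − 1/e follows the setting of Result 1), evaluating (19) and (15) for L = 10 over admissible p1 yields a guaranteed fraction β of roughly 2%, matching the figure quoted in Section 3.4.

```python
import math

L = 10
p2 = 1 - 1 / math.e          # vertices deleted with probability 1 - 1/e
q2 = 1 - p2                  # = 1/e

def beta_bound(p1):
    q1 = 1 - p1
    if L * q1 <= 1 or q1 >= q2:          # need L(1 - p1) > 1 and p1 > p2
        return 0.0
    z = (1 / (L * q1)) ** (1 / (L - 1))  # Z* from Lemma 2
    q = 1 - (p1 + q1 * z ** L)           # Binomial parameter in Lemma 2
    eps = q - (1 - z)
    if eps <= 0:
        return 0.0
    alpha = ((q - eps) * math.log((q - eps) / q)
             + (1 - q + eps) * math.log((1 - q + eps) / (1 - q)))   # (19)
    return alpha / math.log(q2 / (q2 - q1))                         # (15)

# Scan the free parameter p1 over (p2, 0.9) and keep the best guarantee.
best = max(beta_bound(p2 + 0.001 * k) for k in range(1, 268))
print(round(best, 3))  # roughly 0.02
```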
The throughput does not depend on H because we held H constant while letting
N go to infinity. However, (16) implies that H may grow with N without affecting
the throughput result as long as log[H(N)]/N goes to zero as N grows,
because all we need is for P( G_{L,p1} ∉ A ) to decay exponentially with N. Since
P( Y/N < 1 − Z* ) decays exponentially with N, so does P( G_{L,p1} ∉ A ) as long as
log[H(N)]/N goes to zero as N grows.
3.4 SIMULATION RESULTS
Note that the result is proven using bounding techniques, so it is not tight. For
example, for L = 10, the result proves that the throughput is guaranteed to be at
least 2%, whereas simulations show that L = 10 is good enough to give a
throughput of 30%. Figure 8 shows the result of simulations for various values of
L at N = 1000.
This max-flow result, together with the random network coding result of [22], tells us
that each wave of data can deliver nearly N/e packets from the sources to the
destination, compared to N packets that could be transported in each wave if the
network were composed of nodes with tuned radios and perfect coordination.
In practical deployments, having about 10 radios per node would be realistic. In
this case, the theoretical results tell us that a throughput of only 0.02·N can be
guaranteed. However, simulations show that a throughput of 0.3·N can be
expected. This is a good trade-off for many applications in which the demand on
bandwidth is not as strict as the demand for low-cost nodes that can operate at
power levels comparable to the power levels that can be supplied by the energy
scavenging mechanisms.
Figure 8. Simulation results showing the ratio of the throughput achievable with untuned radios and network coding to the throughput of a perfectly tuned and synchronized network, as a function of the number of inputs per node for 1000 channels and 1000 nodes per block.
Chapter 4
DISTRIBUTED COMPRESSION3
The appeal of using distributed compression lies in the fact that each sensor can
compress its data without knowing what the other sensors are measuring. In fact,
an individual sensor does not even need to know the correlation structure between
its data and that of the other sensors. This is especially desirable in a setting
where the sensor nodes are power-constrained, because each sensor node does
not need to spend power on an algorithm for learning the correlation structure
between its own measurement and other sensors' measurements. Moreover,
each sensor node does not need to spend power on receiving and processing
other sensors' measurements. As a result, an end-to-end compression system
that achieves a significant savings across the network can be built, where the
endpoints consist of the sensor node and the data-gathering node.
To build a distributed compression system, we propose to use an asymmetric
coding method among the sensors. Specifically, we propose to build upon the
architecture of Figure 9 which is designed for two nodes. In Figure 9, there are
two nodes, each of which measures data using an Analog-to-Digital (A/D)
converter. One of the sensor nodes will either transmit its data Y directly to the
data-gathering node or compress its readings with respect to its own previous
3 The work presented in this chapter was done in collaboration with Jim Chou.
readings while the other sensor node compresses its data X with respect to its
own previous readings and readings from other sensors and then transmits the
compressed data m to the data-gathering node. The decoder will then try to
decode m to X̂, given that Y is correlated to X. In specific cases (e.g., discrete
m X , given that Y is correlated to X . In specific cases (e.g., discrete
alphabet or continuous alphabet with i.i.d. Gaussian correlation), it can be shown
that the compression performance of the above architecture can match the case
where Y is available to the sensor node that is measuring X .
Figure 9. Distributed compression set-up: The encoder compresses X making use of the fact that the decoder has access to Y, which is correlated to X. This allows the encoder to compress X to fewer bits, without losing performance, than it would if the decoder did not have access to Y.
To extend the above architecture (Figure 9) to n nodes, one node can send its
data either uncoded (i.e., Y) or compressed with respect to its past. The data-
gathering node can decode this reading without receiving anything from the other
sensors. The other sensors can compress their data with respect to Y , without
even knowing their correlation structure with respect to Y . The data-gathering
node will keep track of the correlation structure and inform the sensors of the
number of bits that they shall use for encoding. In the compression literature, Y is
often referred to as side-information and the above architectures are often referred
to as compression with side information [34].
To develop code constructions for distributed compression, first some background
information on source coding with side information is provided and then a code
construction that achieves good performance at a low encoding cost is introduced.
4.1 BACKGROUND ON COMPRESSION WITH SIDE INFORMATION
In 1973, Slepian and Wolf presented a surprising result to the source coding
(compression) community [34]. The result states that if two discrete alphabet
random variables X and Y are correlated according to some arbitrary probability
distribution p(x,y), then X can be compressed without access to Y without
losing any compression performance with respect to the case where the encoder
of X does have access to Y. More formally, without having access to Y, X can
be compressed using H(X|Y) bits, where
    H(X|Y) = −Σ_y Σ_x P_Y(y)·P_{X|Y}(x|y)·log₂ P_{X|Y}(x|y).    (20)
The quantity H(X|Y) is often interpreted as the "uncertainty" remaining in the
random variable X given the observation of Y [35]. This is the same
compression performance that would be achieved if the encoder of X had access
to Y. To provide the intuition behind this result, we provide the following example.
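For a concrete instance of (20), the sketch below evaluates H(X|Y) for the correlation model of Example 1 below (3-bit words differing in at most one position, with all four offsets equally likely; the uniform joint distribution is an assumption made for illustration):

```python
import math
from itertools import product

# Evaluate (20) numerically: Y uniform over 3-bit words, X = Y xor e where e
# is one of four equally likely offsets of weight <= 1. Expect H(X|Y) = 2 bits.
words = list(product([0, 1], repeat=3))
offsets = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]

def p_xy(x, y):
    diff = tuple(a ^ b for a, b in zip(x, y))
    return (1 / 8) * (1 / 4) if diff in offsets else 0.0

h = 0.0
for y in words:
    p_y = sum(p_xy(x, y) for x in words)
    for x in words:
        if p_xy(x, y) > 0:
            p_x_given_y = p_xy(x, y) / p_y
            h -= p_y * p_x_given_y * math.log2(p_x_given_y)
print(h)  # 2.0
```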
Example 1: Consider X and Y to be equiprobable 3-bit data sets which are
correlated in the following way: d_H(X,Y) ≤ 1, where d_H(X,Y) denotes the Hamming
distance between X and Y. When Y is known both at the encoder and decoder,
we can compress X to 2 bits, conveying the information about the uncertainty of
X given Y (i.e., the modulo-two sum of X and Y, which is one of (000), (100),
(010), and (001)). Now, if Y is known only at the decoder, we can surprisingly
still compress X to 2 bits. The method of construction stems from the following
argument: if the decoder knows that X = (000) or X = (111), then it is wasteful
to spend any bits to differentiate between the two. In fact, we can group
(000) and (111) into one coset (it is exactly the so-called principal
coset of the length-3 repetition code). In a similar fashion, we can partition the
remaining space of 3-bit binary codewords into 3 different cosets, with each coset
containing the original codewords offset by a unique and correctable error pattern.
Since there are 4 cosets, we need to spend only 2 bits to specify the coset to
which X belongs. The four cosets are given as:
Coset 1 = (000, 111)  Coset 2 = (100, 011)
Coset 3 = (010, 101)  Coset 4 = (001, 110)
The decoder can recover X perfectly by decoding Y to the closest (in Hamming
distance) codeword in the coset specified by the encoder. Thus the encoder does
not need to know the realization of Y for optimal encoding.
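The coset encoder and decoder of Example 1 can be written out directly; this sketch (function names are illustrative) checks the construction exhaustively over all pairs with Hamming distance at most 1:

```python
from itertools import product

# Coset (syndrome) code of Example 1, built on the length-3 repetition code:
# the encoder sends only a 2-bit coset index; the decoder recovers X from the
# index and the side information Y.
COSETS = {
    (0, 0): [(0, 0, 0), (1, 1, 1)],   # principal coset of the repetition code
    (0, 1): [(1, 0, 0), (0, 1, 1)],
    (1, 0): [(0, 1, 0), (1, 0, 1)],
    (1, 1): [(0, 0, 1), (1, 1, 0)],
}

def encode(x):
    # 2-bit index of the coset containing x.
    for idx, members in COSETS.items():
        if x in members:
            return idx

def decode(idx, y):
    # Pick the coset member closest to Y in Hamming distance.
    return min(COSETS[idx], key=lambda c: sum(a != b for a, b in zip(c, y)))

# Exhaustive check over every (X, Y) pair with d_H(X, Y) <= 1.
ok = True
for x in product([0, 1], repeat=3):
    for y in product([0, 1], repeat=3):
        if sum(a != b for a, b in zip(x, y)) <= 1:
            ok = ok and decode(encode(x), y) == x
print(ok)  # True
```

The decoding is unambiguous because the two members of each coset are bitwise complements, so when d_H(X,Y) ≤ 1 the wrong member is always at distance at least 2 from Y.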
In their paper [34], Slepian and Wolf established a set of achievable rate-tuples
that are needed to represent X and Y . These rate-tuples are represented as a
graph in Figure 10. From the graph, we can see that a minimum of
H(X) + H(Y|X) = H(X,Y) bits are needed to represent both X and Y. These
bits can be divided either evenly or unevenly in the encoding of X and Y as
shown in Figure 10. This dissertation focuses on the case where each sensor
node uses roughly the same number of bits for encoding its data, so that power
consumption will be evenly distributed among the nodes in the network. This rate
region corresponds to the straight line in the achievable region of Figure 10, and
can be achieved by using either symmetric codes (see [36]) or by time sharing
asymmetric codes. The latter solution is not only simpler but also more robust to
losses in the network as will be shown later. As an example of time sharing, let us
refer to Example 1. Assuming that node 1 is measuring data X, and node 2 is
measuring correlated data Y, we can have node 1 send its full data uncoded
during the even time instants and its data encoded (as in Example 1) during the
odd time instants. Similarly, node 2 will send its data uncoded during the odd time
instants and its data encoded (as in Example 1) during the even time instants. If
H(X) = H(Y), then node 1 and node 2 will use approximately the same amount
of power in sending their data. In the case that H(X) ≠ H(Y), we can have
node 1 send its data for a larger or smaller proportion of the time so that the
number of bits used by node 1 and node 2 are roughly equal over long durations
of time.
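To make the bookkeeping concrete for Example 1 (where H(X) = H(Y) = 3 bits and, by the symmetry of the correlation, H(X|Y) = H(Y|X) = 2 bits), alternating the coded and uncoded roles balances the per-node rates while meeting the Slepian-Wolf sum rate:

```python
# Per-instant rates for the alternating (time-sharing) scheme of Example 1.
H_X, H_Y = 3, 3                      # uncoded rates
H_X_given_Y = H_Y_given_X = 2        # coset-coded rates

# Even instants: node 1 uncoded (3 bits), node 2 coded (2 bits); odd: swapped.
node1_avg = (H_X + H_X_given_Y) / 2
node2_avg = (H_Y_given_X + H_Y) / 2
total = node1_avg + node2_avg        # should equal H(X,Y) = H(X) + H(Y|X) = 5
print(node1_avg, node2_avg, total)   # 2.5 2.5 5.0
```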
Figure 10. Achievable rate regions in distributed compression: The horizontal axis corresponds to the rate for encoding Y and the vertical axis corresponds to the rate for encoding X. The 45° line in the achievable region represents the region where the rate needed to encode X is equal to the rate needed to encode Y.
The above results were established only for lossless compression of discrete
random variables. In 1976, Wyner and Ziv extended the results of [34] to lossy
distributed compression by proving that under certain conditions [37], there are no
performance degradations for lossy compression with side information available at
the decoder as compared to lossy compression with side information available at
both the encoder and decoder.
The results established by [34] and [37] are theoretical, however, and
consequently do not provide intuition as to how one might achieve the predicted
theoretical bounds practically. In 1999, Pradhan and Ramchandran [38]
prescribed a constructive framework and practical constructions for distributed
compression in an attempt to achieve the bounds predicted by [34] and [37]. The
resulting codes perform well, but cannot be used directly for sensor networks
because they are not designed to support different compression rates. To achieve
distributed compression in a sensor network, it is desirable to have one underlying
codebook construction that is not changed among the sensors but can also
support multiple compression rates. The reason for needing a codebook that
supports multiple compression rates is that the compression rate is directly
dependent on the amount of correlation in the data, which might be time-varying.
Motivated by the above, this dissertation provides a tree-based distributed
compression code that can provide variable-rate compression without the need for
changing the underlying codebook construction.
4.2 CODE CONSTRUCTION FOR DISTRIBUTED COMPRESSION
This section describes a codebook construction that will allow an encoder to encode a
random variable X given that the decoder has access to a correlated random
variable Y . This construction can then be applied to a sensor network as shown
in Figure 9. The main design goal of the code construction is to support multiple
compression rates, in addition to being computationally inexpensive. In support of
the goal of minimizing the computations for each sensor node, code constructions
based on complicated error correction codes are not considered here. These
codes can, however, be easily incorporated into the construction but will lead to
more complexity for each sensor node. The uncoded code construction is as
follows. Start with a root codebook that contains 2^n representative values on the
real axis. Then partition the root codebook into two subsets consisting of the even-
indexed representations and the odd-indexed representations. Represent these
two sub-codebooks as children nodes of the root codebook. Further, partition
each of these nodes into sub-codebooks and represent them as children nodes in
the second level of the tree structure. Repeat this process n times, resulting in an
n-level tree structure that contains 2^n leaf nodes, each of which represents a sub-
codebook that contains one of the original 2^n values. An example partition is
given in Figure 11, where we use n = 4 and show only 2 levels of the partition.
Note from this tree-based codebook construction that if the spacing between
representative values is denoted by ∆, then each of the sub-codebooks at level-i
in the tree will contain representative values that are spaced apart by 2^i ∆. In a
sensor network, a reading will typically be represented as one of the 2^n values in
the root codebook, assuming that the sensor uses an n-bit A/D converter. Instead
of transmitting n bits to represent the sensor reading, as would traditionally be
done, it is possible to transmit i < n bits if the decoder has access to side-
information Y that is no further than 2^{i−1}∆ away from X. The encoder need
only transmit the i bits that specify the sub-codebook that X belongs to at level-i,
and the decoder will decode Y to the closest value in the sub-codebook that
the encoder specified. Because Y is no further than 2^{i−1}∆ from the
representation of X, the decoder will always decode Y to X. The functionality of
the encoder and decoder is described in detail below.
Figure 11. A tree-based construction for compression with side information: The root of the tree contains 2^4 values, and two partitions of the root quantizer are shown. The compressed value of a representative value, r_i, is given by the path through the tree taken to reach the group of values containing r_i. Taking more steps through the tree results in less compression and also less ambiguity about the observed representative value.
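The repeated even/odd partitioning above is simple to sketch in code. The following Python fragment is an illustrative sketch, not code from the dissertation; the function name `sub_codebook` is invented. It builds the level-i sub-codebook reached by a given i-bit path through the tree:

```python
def sub_codebook(root, i, path):
    """Level-i sub-codebook of `root`: all values whose index is
    congruent to `path` modulo 2**i (path = the i bits transmitted)."""
    return [r for idx, r in enumerate(root) if idx % (2 ** i) == path]

# Root codebook with n = 4: 2**4 = 16 representative values
# (indices double as values here for readability).
root = list(range(16))

# Level-1 children: even- and odd-indexed values, spaced 2*Delta apart.
evens = sub_codebook(root, 1, 0)   # [0, 2, 4, ..., 14]
odds = sub_codebook(root, 1, 1)    # [1, 3, 5, ..., 15]

# A level-2 sub-codebook has spacing 4*Delta, as in Figure 11.
print(sub_codebook(root, 2, 1))    # [1, 5, 9, 13]
```

Each additional level halves a sub-codebook and doubles its spacing, which is exactly why i transmitted bits tolerate side information up to 2^{i−1}∆ away.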
1. Encoder: The encoder will receive a request from the data-gathering node
requesting that it encode its readings using i bits. The first thing that the
encoder does is find the closest representation of the data from the 2^n
values in the root codebook (this is typically done by the A/D converter).
Next, the encoder determines the sub-codebook that X belongs to at level-
i. The path through the tree to this sub-codebook will specify the bits that
are transferred to the data-gathering node. The mapping from X to the
bits that specify the sub-codebook at level i can be done through the
following deterministic mapping: f(X) = index(X) mod 2^i, where f(X)
represents the bits to be transmitted to the decoder and index() is a
mapping from values in the root codebook to their respective indices. For a
given X and i, f(X) will be an i-bit value which the data-gathering node
will use to traverse the tree.

2. Decoder: The decoder (at the data-gathering node) will receive the i-bit
value, f(X), from the encoder and will traverse the tree starting with the
least-significant bit (LSB) of f(X) to determine the appropriate sub-
codebook, S, to use. The decoder will then decode the side-information,
Y, to the closest value in S: X̂ = argmin_{r_i ∈ S} |Y − r_i|, where r_i represents
the i-th codeword in S. Assuming that Y is less than 2^{i−1}∆ away from X,
where ∆ is the spacing in the root codebook, then the decoder will be able
to decode Y to the exact value of X, and recover X perfectly. The
following example will elucidate the encoding/decoding operations.
Example 2: Consider the 4-level tree codebook of Figure 12. Assume that the
data is represented by the value r_9 = 0.9 in the root codebook and the data-
gathering node asks the sensor to encode X using 2 bits. The index of r_9 is 9, so
f(X) = 9 mod 4 = 1. Thus, the encoder will send the two bits, 01, to the data-
gathering node (see Figure 12). The data-gathering node will receive 01 and
descend the tree using the least-significant bit first (i.e., 1 and then 0) to determine
the sub-codebook to decode the side-information with. In the example, the value
of the side-information, Y, is 0.8, and Y is decoded in the sub-codebook located
at 1,0 (where 1 represents the least significant bit and 0 represents the most
significant bit) in the tree to find the closest codeword. This codeword is r_9, which
is exactly the value representing X. Thus, 2 bits were used to convey the value
of X instead of the 4 bits that would have been needed if no encoding had
been done.
Figure 12. An example of the tree-based codebook. The encoder is asked to encode X using 2 bits, so it transmits 01 to the decoder. The decoder will use the bits 01 in ascending order from the least significant bit (LSB) to determine the path to the sub-codebook to use to decode Y with.
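Example 2 can be reproduced with a few lines of Python. This is a hedged sketch: `encode`, `decode`, and the parameter names are illustrative assumptions, not the dissertation's code.

```python
def encode(x, i, delta):
    """Sensor side: f(X) = index(X) mod 2**i, i.e. the i low-order
    bits of the root-codebook index of x."""
    return round(x / delta) % (2 ** i)

def decode(f, y, i, delta, n=4):
    """Data-gathering side: pick the closest codeword to the side
    information y inside the level-i sub-codebook selected by f."""
    sub = [k * delta for k in range(2 ** n) if k % (2 ** i) == f]
    return min(sub, key=lambda r: abs(y - r))

delta = 0.1                       # root codebook spacing, as in Figure 12
f = encode(0.9, 2, delta)         # index(0.9) = 9, so f = 9 mod 4 = 1 (bits 01)
x_hat = decode(f, 0.8, 2, delta)  # side information Y = 0.8
print(f, x_hat)                   # 2 bits suffice to recover X = 0.9 from Y
```

Since |Y − X| = 0.1 < 2^{2−1}∆ = 0.2, the decoder lands on r_9 exactly, as in the example.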
4.3 CORRELATION TRACKING
In the above encoding/decoding operations it is assumed that the decoder for
sensor j has available to it at time k some side-information Y_k^(j) that is
correlated to the sensor reading, X_k^(j). To maximize efficiency, all of the data that
is already available at the decoder and correlated to X_k^(j) should be used as the
side information. To do this effectively, a simple linear predictive model is proposed
here where Y_k^(j) is a linear combination of values that are available at the decoder:

Y_k^(j) = Σ_{l=1}^{M} α_l X_{k−l}^(j) + Σ_{i=1}^{j−1} β_i X_k^(i) ,   (21)

where X_{k−l}^(j) represents past readings for sensor j and X_k^(i) represents present
sensor readings from sensor i.⁴ The variables α_l and β_i are weighting
coefficients. Y_k^(j) can then be thought of as a linear prediction of X_k^(j) based on
past values (i.e., X_{k−l}^(j); l ∈ {1, ..., M}) and other sensor readings that have already
been decoded at the data-gathering node (i.e., X_k^(i); i ∈ {1, ..., j−1}, where i
indexes the sensor and j−1 represents the number of readings from other
sensors that have already been decoded). A linear predictive model is used here
because it is not only analytically tractable, but it is also optimal in the limiting case
where the innovations noise can be modeled as i.i.d. Gaussian random variables.

4 Note that for simplicity, the above prediction model is based on a finite number of past values and a single present value for each of the other sensor readings that have been decoded. This model can be generalized to the case where past values of other sensors are also included in the prediction, which might be useful for audio applications where echoes result in a strong temporal correlation.
In order to leverage the inter-node correlations, one of the sensors always sends
its data either uncoded or compressed with respect to its own past data.
Furthermore, the sensors are numbered in the order that they are queried. For
example, at each time instant, one of the sensors will send its reading X_k^(1), either
uncoded or coded with respect to its own past. If a sensor chooses to code its
present value with respect to its past, then it can also use the codebook
construction given in Figure 12, which will simplify the encoding architecture and
reduce power consumption because it will not need to spend power on correlation
modeling. The reading for sensor 2 can then be decoded with respect to

Y_k^(2) = Σ_{l=1}^{M} α_l X_{k−l}^(2) + β_1 X_k^(1) .   (22)

Each X_k^(j) that is decoded can then be used to form predictions for other sensor
readings according to (21). The prediction, Y_k^(j), determines the number of bits
needed to represent X_k^(j). In the extreme case that Y_k^(j) perfectly predicts X_k^(j)
(i.e., Y_k^(j) = X_k^(j)), then zero bits are needed to represent X_k^(j) because it is
perfectly predictable at the decoder. Thus, the main objective of the decoder is to
derive a good estimate of X_k^(j) for each sensor j, j ∈ {1, ..., L}, where L represents the
number of sensors. In more quantitative terms, the goal is for the decoder to be
able to find the α_l, l ∈ {1, ..., M}, and β_i, i ∈ {1, ..., j−1}, that minimize the mean
squared error between Y_k^(j) and X_k^(j).
To find the α_l and β_i that minimize the mean squared prediction error we can
utilize Wiener filter theory [39]. We start by representing the prediction error as a
random variable, N_j = Y_k^(j) − X_k^(j). We can then rewrite the mean squared error
as:

E[N_j²] = E[( X_k^(j) − ( Σ_{l=1}^{M} α_l X_{k−l}^(j) + Σ_{i=1}^{j−1} β_i X_k^(i) ) )²]
        = E[(X_k^(j))²] − 2 Σ_{l=1}^{M} α_l E[X_k^(j) X_{k−l}^(j)] − 2 Σ_{i=1}^{j−1} β_i E[X_k^(j) X_k^(i)]
          + Σ_{l,h=1}^{M} α_l α_h E[X_{k−l}^(j) X_{k−h}^(j)]
          + 2 Σ_{l=1}^{M} Σ_{i=1}^{j−1} α_l β_i E[X_{k−l}^(j) X_k^(i)]
          + Σ_{i,h=1}^{j−1} β_i β_h E[X_k^(i) X_k^(h)] .   (23)

Now, if we assume that X_k^(j) and X_k^(i) are pairwise jointly wide sense stationary
[39] for i ∈ {1, ..., j−1}, then we can re-write the mean squared error as:

E[N_j²] = r_{x_j x_j}(0) − 2 P_j^T Γ_j + Γ_j^T R_{z_j z_j} Γ_j ,   (24)

where
Γ_j = [ α_1, α_2, ..., α_M, β_1, β_2, ..., β_{j−1} ]^T ,

P_j = [ r_{x_j x_j}(1), r_{x_j x_j}(2), ..., r_{x_j x_j}(M), r_{x_j x_1}(0), r_{x_j x_2}(0), ..., r_{x_j x_{j−1}}(0) ]^T ,   (25)

and we use the notation r_{x_j x_i}(l) = E[X_k^(j) X_{k+l}^(i)]. With this notation, we can express
R_{z_j z_j} as:
R_{z_j z_j} = [ R_{x_j x_j}     R_{x_j x_i}
              R_{x_j x_i}^T   R_{x_i x_i} ] ,   (26)

where R_{x_j x_j} is given as:
R_{x_j x_j} = [ r_{x_j x_j}(0)      r_{x_j x_j}(1)      ...   r_{x_j x_j}(M−1)
               r_{x_j x_j}(1)      r_{x_j x_j}(0)      ...   r_{x_j x_j}(M−2)
               ...
               r_{x_j x_j}(M−1)   r_{x_j x_j}(M−2)   ...   r_{x_j x_j}(0) ]   (27)

and R_{x_j x_i} and R_{x_i x_i} are given as
R_{x_j x_i} = [ r_{x_j x_1}(1)   r_{x_j x_2}(1)   ...   r_{x_j x_{j−1}}(1)
               r_{x_j x_1}(2)   r_{x_j x_2}(2)   ...   r_{x_j x_{j−1}}(2)
               ...
               r_{x_j x_1}(M)   r_{x_j x_2}(M)   ...   r_{x_j x_{j−1}}(M) ]   (28)

and

R_{x_i x_i} = [ r_{x_1 x_1}(0)       r_{x_1 x_2}(0)       ...   r_{x_1 x_{j−1}}(0)
               r_{x_2 x_1}(0)       r_{x_2 x_2}(0)       ...   r_{x_2 x_{j−1}}(0)
               ...
               r_{x_{j−1} x_1}(0)   r_{x_{j−1} x_2}(0)   ...   r_{x_{j−1} x_{j−1}}(0) ]   (29)
To find the set of coefficients (represented by Γ_j) that minimize the mean squared
error, we differentiate (24) with respect to Γ_j to obtain:

∂E[N_j²]/∂Γ_j = −2 P_j + 2 R_{z_j z_j} Γ_j .   (30)

Setting the above equal to zero and solving for the optimal Γ_j, which we denote
by Γ_{j,opt}, we arrive at the standard Wiener estimate [39]:

Γ_{j,opt} = (R_{z_j z_j})^{−1} P_j .   (31)

If our assumption of stationarity holds, then the data-gathering node can request
uncoded data from all of the sensors for the first K rounds of requests and
calculate the Wiener estimate (31) once from these K rounds of samples. The
set of coefficients determined from the Wiener estimate can then be used to form
the side information for each future round of requests. In practice, however, the
statistics of the data may be time varying and, as a result, the coefficient vector,
Γ_j, must be continuously adjusted to minimize the mean-squared error. One
method of doing this is to move Γ_j in the opposite direction of the gradient of the
objective function (i.e., the mean squared error) for each new sample received
during round k+1:

Γ_j^(k+1) = Γ_j^(k) − μ ∇_j^(k) ,   (32)

where ∇_j^(k) is given by (30) and μ represents the amount to descend opposite to
the gradient. The goal of this approach is to descend to the global minimum of the
objective function. We are assured that such a minimum exists because the
objective function is convex. In fact, it has been shown that if μ is chosen
appropriately (e.g., μ < 2/λ_max, where λ_max represents the largest eigenvalue of
R_{z_j z_j}), then (32) will converge to the optimal solution [39]. In the following
subsection we will show how (32) can be calculated in practice and how to
incorporate adaptive prediction with the distributed source code discussed in the
previous section.
4.3.1 Parameter Estimation
From (30) and (32), we know that the coefficient vector should be updated as:

Γ_j^(k+1) = Γ_j^(k) − μ( −2 P_j + 2 R_{z_j z_j} Γ_j^(k) ) .   (33)

In practice, however, the data-gathering node will not have knowledge of P_j and
R_{z_j z_j} and will therefore need an efficient method for estimating P_j and R_{z_j z_j}. One
standard estimate is to use P_j = X_k^(j) Z_{k,j} and R_{z_j z_j} = Z_{k,j} Z_{k,j}^T, where

Z_{k,j} = [ X_{k−1}^(j), X_{k−2}^(j), ..., X_{k−M}^(j), X_k^(1), X_k^(2), ..., X_k^(j−1) ]^T ,   (34)

so that (33), with the factor of 2 absorbed into μ, becomes

Γ_j^(k+1) = Γ_j^(k) − μ( −Z_{k,j} X_k^(j) + Z_{k,j} Z_{k,j}^T Γ_j^(k) ) = Γ_j^(k) + μ Z_{k,j} N_{k,j} ,   (35)
where the second equality follows from the fact that Y_k^(j) = Z_{k,j}^T Γ_j^(k) and
N_{k,j} = X_k^(j) − Y_k^(j). The equation described by (35) is well known in the adaptive
filtering literature as the Least-Mean-Squares (LMS) algorithm, and the steps in
calculating the LMS solution are summarized below:

1. Y_k^(j) = Z_{k,j}^T Γ_j^(k) .
2. N_{k,j} = X_k^(j) − Y_k^(j) .
3. Γ_j^(k+1) = Γ_j^(k) + μ Z_{k,j} N_{k,j} .
In the above, μ should be chosen to be less than 2/λ_max, where λ_max is the largest
eigenvalue of the correlation matrix. To use the LMS algorithm, the data-gathering node will start by
querying all of the sensors for uncoded data for the first K rounds of requests.
The value of K should be chosen to be large enough to allow the LMS algorithm
to converge. After K rounds of requests have been completed, the data-
gathering node can then ask for coded values from the sensor nodes and decode
the coded value for sensor j with respect to its corresponding side information,
Y_k^(j) = Z_{k,j}^T Γ_j^(k). The value of Γ_j will continue to be updated to adjust to changes
in the statistics of the data. More specifically, for each round of requests and each
value reported by a sensor, the decoder will decode Y_k^(j) to the closest codeword
in the sub-codebook, S, specified by the corresponding sensor:

X̂_k^(j) = argmin_{r_i ∈ S} |Y_k^(j) − r_i| .   (36)
From Section 4.2, we know that X̂_k^(j) will always equal X_k^(j) as long as the sensor
node encodes X_k^(j) using i bits such that 2^{i−1}∆ > |N_{k,j}|. If 2^{i−1}∆ < |N_{k,j}|, however,
then a decoding error will occur. We can use Chebyshev's inequality [40] to bound
this probability of error:

P[ |N_{k,j}| > 2^{i−1}∆ ] ≤ σ²_{N_j} / (2^{i−1}∆)² ,   (37)

where N_{k,j} is drawn from a distribution with zero mean and variance σ²_{N_j}. Thus,
to ensure that P[ |N_{k,j}| > 2^{i−1}∆ ] is less than some probability of error, P_e, we can
choose P_e = σ²_{N_j} / (2^{i−1}∆)². The value of i that will ensure this probability of error is
then given as

i = ⌈ (1/2) log₂( σ²_{N_j} / (P_e ∆²) ) ⌉ + 1 .   (38)

Thus, for a given P_e, the data-gathering node should ask for i bits from each
sensor according to (38). Note that it is not necessary to be over-conservative
when choosing P_e because Chebyshev's inequality is a loose bound.
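The bit-allocation rule (38) is easy to sketch in code (a hedged illustration; the function and variable names are assumptions):

```python
import math

def bits_to_request(sigma2, p_e, delta):
    """Smallest i for which the Chebyshev bound sigma2 / (2**(i-1) * delta)**2
    falls below p_e, i.e. i = ceil(0.5 * log2(sigma2 / (p_e * delta**2))) + 1."""
    i = math.ceil(0.5 * math.log2(sigma2 / (p_e * delta ** 2))) + 1
    return max(i, 0)  # never request a negative number of bits

# Prediction-error variance 0.04, error target 1 in 100, spacing 0.1:
print(bits_to_request(0.04, 0.01, 0.1))  # -> 6
```

As a sanity check, with i = 6 the bound gives 0.04 / (2⁵ · 0.1)² ≈ 0.004 < 0.01, while i = 5 would give ≈ 0.016 > 0.01.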
From (38), we can see that the data-gathering node must maintain an estimate of
the variance of the prediction error, σ²_{N_j}, for each sensor in order to determine the
number of bits to request from each sensor. The data-gathering node can initialize
σ²_{N_j} as:

σ²_{N_j} = (1/(K−1)) Σ_{i=1}^{K} N_{i,j}²   (39)

during the first K rounds of requests. To update σ²_{N_j}, the data-gathering node
can form the following filtered estimate:

σ²_{N_j,new} = (1 − γ) σ²_{N_j,old} + γ N_{k,j}² ,   (40)

where σ²_{N_j,old} is the previous estimate of σ²_{N_j} and γ is a ``forgetting factor'' [39].
We choose to use a filtered estimate to adapt to changes in statistics. The block
diagram of the decoding structure at the data-gathering node is shown in Figure
13. It is different from standard LMS in that the encoder cannot mimic the
processing done by the decoder, because the encoder (i.e., sensor node) does not
have access to all the information used by the decoder (i.e., data-gathering node).
This approach is also different from previous works on distributed source coding in
that the correlated side information is not generated by a single correlated source
but is obtained by forming the prediction of the value to be decoded based on
information from other correlated sources and past values generated by the source
whose value is to be decoded.
Figure 13. Adaptive filtering block used to form the side information and decode the sensor reading. Both spatial and temporal correlations are exploited. The sensor measurement is represented by x[n], the coset information from the sensor node is given by c[n] and the measurements from other sensors are represented by the vector Y[n]. The number of bits used for encoding at time n+1 is given by i[n+1].
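The ``Noise Tracker'' block in Figure 13 amounts to the filtered estimate (40). A minimal sketch (the function name and the choice γ = 0.05 are illustrative assumptions):

```python
def update_variance(sigma2_old, n_kj, gamma=0.05):
    """Filtered prediction-error variance, eq. (40):
    sigma2_new = (1 - gamma) * sigma2_old + gamma * n_kj**2."""
    return (1.0 - gamma) * sigma2_old + gamma * n_kj ** 2

sigma2 = 1.0                         # initial estimate, e.g. from eq. (39)
for n in [0.1, -0.2, 0.15, 0.05]:    # a few observed prediction errors
    sigma2 = update_variance(sigma2, n)
print(sigma2)  # drifts down toward the recent squared-error level
```

A larger γ tracks changing statistics faster but smooths less, which is exactly the spike/responsiveness trade-off observed in the simulation results below.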
In the above, it is possible to improve upon these proposed methods for adapting
the prediction coefficients and the number of bits used for encoding. First, it was
assumed that the predictive model order is fixed. More specifically, the prediction
of each sensor measurement was modeled as a weighted combination of a fixed
number of measurements that are available at the decoder. In practice, by
choosing an appropriate model order, one can lower the prediction error. One
efficient way of doing this, is by maintaining a bank of predictors that are of
different model orders and then using a weighted combination of these predictors
as the final prediction estimate. Singer and Feder [41] showed that this method of
adaptive prediction is not only efficient but has some desirable ``universal''
properties that allow one to claim performance equivalence with respect to a ``fully-
knowledgeable'' fixed system. In the interests of keeping things simple and getting
the main point across, this work does not focus on details of the predictor, instead
choosing a predictor with a fixed model order. More sophisticated prediction
methods are left as a subject for future work. Another area of future work is to
improve upon our proposed bounding technique. In this work, a universal but
loose bound (i.e., Chebyshev’s inequality) was used for bounding the probability of
error. One can find better bounds for bounding the probability of error if one can
suitably model the prediction error. For example, if the prediction error can be
modeled as Gaussian, then a tight bound on the probability of error can be
determined. It was decided to use a loose bound in this work, to highlight the fact
that even if no assumptions are made on the data, one can still attain significant bit
savings.
4.3.2 Decoding Error
As mentioned above, it is always possible for the data-gathering node to make a
decoding error if the magnitude of the correlation noise, N_{k,j}, is larger than 2^{i−1}∆,
where i is the number of bits used to encode the sensor reading for sensor j at
time k. Two approaches for dealing with such errors are proposed. One method
is to use error detection codes and the other method entails using error correction
codes.
To use error detection, each sensor node can transmit a cyclic redundancy check
(CRC) [42], which is computed based on the original measurements, for every m
readings that it transmits. The data-gathering node will decode the m readings
using the tree-structured codebook as above and compare its own calculation of
the CRC (based on the m readings it decodes) to the CRC transmitted by the
sensor. If an error is detected (i.e., the CRC does not match), then the data-
gathering node can either drop the m readings or ask for a retransmission of the
m readings. Whether the data-gathering node drops the readings or asks for a
retransmission is application dependent, and we do not address this issue here.
Furthermore, by using Chebyshev's inequality (37), the data-gathering
node can make the probability of decoding error as small as it desires, which
translates directly into a lower probability of data drops or retransmissions.
The other method of guarding against decoding errors is to use error-correction
codes, such as an (M, K) Reed-Solomon code [43] that can operate on K
sensor readings and generate (M − K) parity check symbols, which are
calculated based on the original readings. These (M − K) parity check symbols
can be transmitted to the data-gathering node along with the K encoded sensor
readings. The data-gathering node will decode the K sensor readings using the
tree-based structure mentioned above and, upon receiving the (M − K) parity
check symbols, it can correct any errors that occurred in the K sensor
readings. If more than (M − K)/2 errors exist in the K sensor readings, then the
Reed-Solomon decoder will declare that the errors cannot be corrected and, in this
case, the data must be either dropped or retransmitted.
4.4 QUERYING AND DATA REPORTING ALGORITHM
This section combines the concepts of the previous sections to formulate the
algorithms to be used by the data-gathering node and by the sensor node.
4.4.1 Data-Gathering Node Algorithm
The data-gathering node will, in general, make N rounds of queries to the sensor
nodes. In the first K rounds of queries, the data-gathering node will ask the
sensors to send their data uncoded. The reason for this is that the data-gathering
node needs to ``learn'' the correlation structure between sensor readings before
asking for compressed readings. Thus, the data-gathering node will use the first
K rounds of readings for calculating the correlation structure in accordance with
Section 4.3. After K rounds of readings, the data-gathering node will have an
estimate of the prediction coefficients to be used for each sensor (see (32)). Note
that K should be chosen large enough to allow the LMS algorithm to converge.
For each round after K, one node will be asked to send its reading
``uncompressed'' with respect to the other sensors.5 The data-gathering node will
continuously maintain a counter for the number of bits that each sensor node has
sent and request for the node that has sent the least number of bits to send its
data ``uncompressed''. This is a method for insuring that each of the sensor nodes
will send approximately the same number of bits and hence use approximately the
same amount of energy. In theoretical terms, this is a method for achieving a
symmetric distributed code construction by using time sharing (see Section 4.1).
Upon receiving a transmission from a sensor, the data-gathering node will decode
it (if it is a compressed reading) with respect to a linear estimate of the data for that
sensor (see (21)). After each round of requests, the correlation parameters of
each sensor (see (32) and (40)) are updated. Pseudocode for the data-gathering
node is given below.
Pseudocode for data-gathering node:
Initialization:
    for (i = 0; i < K; i++)
        for (j = 0; j < num_sensors; j++)
            Ask sensor j for its uncoded reading
    for each pair of sensors i, j
        update correlation parameters using (32) and (40)

Main Loop:
    for (k = K; k < N; k++)
        Request a sensor for uncoded reading
        for each remaining sensor
            Determine number of bits, i, to request using (38)
            Request i bits
        Decode data for each sensor
        Update correlation parameters for each sensor
5 Note that the sensor may still send its data compressed with respect to its own past readings.
The decoding is done in accordance with Section 4.2 and the correlation
parameters are estimated according to (32) and (40).
4.4.2 Sensor Node Algorithm
The algorithm incorporated into each sensor node is considerably simpler than the
algorithm incorporated into the data-gathering node. The sensor node will simply
listen for requests from the data-gathering node. The data-gathering node will
specify to the sensor the number of bits that it requests the sensor to encode the
data with. Each sensor will be equipped with an A/D converter that represents the
data using n bits. Upon receiving a request from the data-gathering node, the
sensor will encode the n-bit value from the A/D converter using i bits, where i is
specified by the data-gathering node. This i-bit value is sent back to the data-
gathering node. Pseudocode for the sensor node is given below.

Pseudocode for sensor node:

    for each request
        Extract i from the request
        Get X[n] from A/D converter
        Transmit n mod 2^i
In the above algorithm, we denote X[n] as the value returned from the A/D
converter and n as the index to this value. Note that the only extra operation with
respect to an uncoded system is for the sensor nodes to perform a modulo
operation. This makes it extremely cheap for a sensor node to encode its data.
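The sensor's entire encoding step is that single modulo operation. As a sketch (the function name is an illustrative assumption):

```python
def sensor_respond(i, n):
    """Sensor-side encoder: given the requested bit count i and the
    A/D index n of the current reading, transmit n mod 2**i."""
    return n % (2 ** i)

print(sensor_respond(2, 9))  # index 9 encoded with 2 bits -> 1 (binary 01)
```

All of the adaptive-filtering and variance-tracking work stays at the data-gathering node, which is what keeps the sensor side this cheap.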
4.5 SIMULATION RESULTS
The simulations were performed for measurements of light, temperature and
humidity. The energy savings (due to bit transmissions) as well as the robustness
of the correlation tracking algorithm to errors were measured. The data-gathering
node algorithm and the sensor node algorithm described in Section 4.4 were
implemented. In the first set of simulations, the sensor nodes simulated the
measurement of data by reading from a file previously recorded readings from
actual sensors. The data measured by the sensors were for light, humidity and
temperature. The readings were made by a 12-bit A/D converter with a dynamic
range of [−128, 128]. The simulated network had a star topology where the data-
gathering node queried 5 sensor nodes directly.
4.5.1 Correlation Tracking
The first simulation tested the correlation tracking algorithm (see Section 4.3). The
data observed by sensor j was modeled as:

Y_k^(j) = Σ_{l=1}^{4} α_l X_{k−l}^(j) + X_k^(m) ,   (41)

where m ≠ j. In other words, the prediction of the reading for sensor j is derived
from its past values and one other sensor. To test the correlation tracking
algorithm, the tolerable noise that the correlation tracking algorithm calculates at
each time instant was measured. The tolerable noise is the amount of noise that
can exist between the prediction of a sensor reading and the actual sensor reading
without inducing a decoding error. Tolerable noise is calculated by using (38), and
noting that the tolerable noise will be given as 2^{i−1}∆, where i is the number of bits
that are requested from the sensor and ∆ is the spacing of values in the A/D
converter. The bound on the probability of decoding error was set to be less than 1 in
100, and the data-gathering algorithm and the sensor node algorithms were
simulated over 18,000 samples of light, temperature and humidity for each sensor
(a total of 90,000 samples). A plot of the tolerable noise vs. actual prediction
noise is given in Figure 14 for humidity and in Figure 15 and Figure 16 for
temperature and light respectively. In each of the graphs, the top curve represents
the tolerable noise and the bottom curve represents the actual prediction noise.
From the plots it can be seen that the tolerable noise is much larger than the actual
prediction noise. The reason for this is that the parameters for estimating the
number of bits to request from the sensors were chosen conservatively. The
tolerable noise can be lowered to achieve higher efficiency but this also leads to a
higher probability of decoding error. For the simulations that were run and
presented here, zero decoding errors were made over 90,000 samples of humidity,
temperature and light.
Figure 14. Tolerable noise and prediction noise for 18,000 samples of humidity data. The tolerable noise is the amount of noise that can exist between the prediction of a sensor reading and the actual sensor reading without inducing a decoding error.
Figure 15. Tolerable noise and prediction noise for 18,000 samples of temperature data. The tolerable noise is the amount of noise that can exist between the prediction of a sensor reading and the actual sensor reading without inducing a decoding error.
Figure 16. Tolerable noise and prediction noise for 18,000 samples of light data. The tolerable noise is the amount of noise that can exist between the prediction of a sensor reading and the actual sensor reading without inducing a decoding error.

A couple of things to note from the plots are that (1) there are many spikes in the
tolerable noise plots, and (2) the actual noise plot for temperature is lower than the
actual noise plot for humidity, which is lower than the actual noise plot for light. The
spikes in the tolerable noise plots occur because an aggressive weighting factor
was chosen for calculating (40), which leads to less smoothing in the estimation.
These spikes can be reduced by weighting the current distortion less in the
estimation of the overall distortion (see (40)), but this will lead to slower responses
to variations in distortion and will therefore introduce more decoding errors for
noisy data. The actual noise plot for temperature is less noisy than the actual
noise plots for humidity and light. This matches our intuition because, in general,
we would expect temperature measurements to be more correlated than humidity
measurements and light measurements. For example, if there are many light
sensors in a room and one of the light sensors is in a shaded area, then it will be
less correlated with the light sensors that are in a lighted area. On the other hand,
if there are many temperature sensors in a room, even if one of the temperature
sensors is in a shaded area, its measurements will not vary too much from the
measurements obtained from sensors in lighted areas. Irrespective of the variation
in correlation, however, it can be seen from the plots that our correlation tracking
algorithm performs well in estimating the actual distortion for a variety of data sets.

4.5.2 Energy Savings

The next set of simulations was run to measure the amount of energy savings that
the sensor nodes achieved. The energy savings were calculated to be the total
reduction in energy that resulted from transmission and reception. Note that for
reception, energy expenditure is actually not reduced but increased because the
sensor nodes need to receive the extra bits that specify the number of bits to
encode each sensor reading. For an n -bit A/D converter, an extra ( )nlog bits
need to be received each time the data gathering node informs a sensor of the
number of bits needed for encoding. It is reasonable to assume that the energy
-
94
used to transmit a bit is equivalent to the energy used to receive a bit. To reduce
the extra energy needed for reception, the data-gathering node only specified the
number of encoding bits periodically. In the simulations, this period was chosen to
be 100 samples for each sensor node. Each of the 5 sensor nodes was alternately
queried to send back readings that were compressed only with respect to its own
past readings so that compressed readings from other sensors could be decoded
with respect to these readings. The overall average savings in energy is given in
Table 1. To assess the performance of the algorithm, the work of [44] was used as
a benchmark for comparison. The work of [44] is also based on a distributed
coding framework, but the prediction algorithm uses a filtered estimate for the
prediction coefficients instead of using a gradient descent algorithm such as LMS
to determine the prediction coefficients. Furthermore, in [44] the prediction
algorithm only uses one measurement from a neighboring sensor to form the
prediction estimate. Thus, in order to perform a fair comparison, the model of (41)
was changed to only use one measurement from another sensor to form the
prediction estimate and, surprisingly, was able to achieve roughly the same
performance as given in Table 1. The results for humidity are approximately 24%
better than the results cited in [44] for the same data set. Similarly, the results for
temperature and light are approximately 16% and 3% better, respectively, than the
results cited in [44] for the respective data sets. Thus, it is clear that the LMS
algorithm is better suited for tracking correlations than the methods given in [44],
and if one uses even better correlation tracking algorithms (see [41]) one should
expect further energy savings. The topic of investigating more elaborate
correlation tracking algorithms is left for future work.

TABLE I
ENERGY SAVINGS OF THE LMS-BASED CORRELATION TRACKING SCHEME

Data Set                Temperature  Humidity  Light
Average Energy Savings  66.6%        44.9%     11.7%

Table 1. Average energy savings of the LMS based correlation tracking and
distributed data compression scheme over an uncoded system for sensor nodes
measuring temperature, humidity and light.

In the above, one can also achieve more significant energy savings by using a
less conservative estimate of the bits needed for encoding (see (38)). This will,
however, lead to more decoding errors. In the simulations presented here, the
bound on the probability of decoding error was set so low that it resulted in no
decoding errors over 90,000 samples for each of the data sets. In the following
subsection, the robustness of the algorithm to errors is evaluated.

4.5.3 Robustness to Errors

There are two types of errors that can occur in this framework. The first type of
error is a packet loss. The second type of error is an actual decoding error, which
results from the code not being able to correct for the prediction noise. The
following subsections consider each type of error.
1. Packet loss: A packet loss may occur if a measurement is lost due to a
malfunction of the sensor or if there is a transmission loss. In such a case,
it might appear that this loss would affect the prediction estimates that depend
on this measurement (see (41)). This is not true, however, because the
prediction algorithm may replace this measurement with a previous
measurement from the same sensor to form the prediction estimate. In fact,
tests were run in which the packet drop rate was set to 10%, and the same
compression rate was achieved with zero decoding errors. Thus, the
proposed algorithm has the additional feature that it is robust to packet loss.
This is evident from the plots, because the ``tolerable'' noise is much larger
than the actual noise.
2. Decoding error: The other type of error is a decoding error. Recall, in
Section 4.3.2, it was mentioned that it is possible to make a decoding error
if the actual prediction noise between the prediction estimate and the
sensor reading exceeds the tolerable noise specified by the data-gathering
node. One can bound this probability of decoding error by using
Chebyshev's inequality to specify the number of bits needed for encoding
(see (38)). But Chebyshev's inequality is a loose bound, as can be seen
from Figure 14, and as a result, it is difficult to determine the minimal
number of bits that need to be sent by each sensor without inducing a
decoding error. It can therefore be seen that there is a delicate trade-off
between energy savings and decoding error.
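As an illustrative sketch of this trade-off (hypothetical parameter names; this is not a reproduction of (38)), Chebyshev's inequality P(|e| >= t) <= sigma^2/t^2 can be inverted to choose a tolerable noise level t, and hence a bit budget, for a target decoding-error probability:

```python
import math

def bits_needed(sigma2, eps, step):
    # Chebyshev: P(|prediction error| >= t) <= sigma2 / t**2.
    # Choosing t = sqrt(sigma2 / eps) makes this bound equal to eps,
    # so the code must tolerate prediction errors up to t.
    t = math.sqrt(sigma2 / eps)
    # Conveying a reading known to the decoder to within +/- t, at
    # quantizer resolution `step`, takes about log2(2t / step) bits.
    return math.ceil(math.log2(2 * t / step))
```

Tightening the target probability `eps` buys robustness at the cost of extra bits per reading, which is exactly the trade-off described above.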
To achieve both large energy savings and robustness, the data-gathering node
can use an aggressive estimate of the number of bits that is needed from each
sensor and each sensor can apply an error detection code or error correction code
to its readings so that the data-gathering node can handle decoding errors
appropriately. The other alternative is for the data-gathering node to over-estimate
the number of bits needed for encoding to decrease the decoding error. This is the
approach taken in the simulations presented (the bound was chosen such that the
decoding error was 0), but the downside to this approach is that there is a
corresponding decrease in energy savings for the sensor nodes. On the other
hand, if the application is insensitive to decoding error, then a more aggressive
strategy should be chosen to further reduce power consumption in the sensor
nodes.
4.6 CONCLUSION
This chapter of the dissertation proposes a method of reducing energy
consumption in sensor networks by using distributed compression and adaptive
prediction. Distributed compression leverages the fact that there exist inherent
correlations between sensor readings and hence sensor readings can be
compressed with respect to past sensor readings and sensor readings measured
by other nodes. A novel method was introduced for allowing nodes to compress
their readings to different levels without having the nodes know what the other
nodes are measuring. Adaptive prediction is used to track the correlation structure
of the sensor network and ultimately determines the number of bits that need to be
spent by the sensor nodes. This approach appears to be promising, as test results
on real-world data gathered by a sensor network testbed show that an average
energy savings per sensor node of 10-65% can be achieved using this
algorithm.
The energy savings achieved in the tests presented here are a conservative
estimate of what can be achieved in practice. In practice, one can use more
complex (and more accurate) models at the data-gathering node to describe the
correlation structure among the nodes in the sensor network. A simple predictive
model was chosen in the tests presented here to demonstrate the power of the
approach. In addition, the proposed algorithm can be combined with other energy-
saving approaches such as data aggregation to achieve additional gains. Future
work remains in exploring more robust codes for the sensor nodes and better
predictive models (such as those presented in [41]) for the data-gathering node
along with incorporating the algorithm with energy-saving routing algorithms.
Future work also remains in determining methods for exploiting the correlation
structure in richer data sets such as audio or video signals. It is expected that
audio and video sensors can benefit greatly from distributed coding algorithms
because of the large amounts of redundancy that are present in their
measurements. Thus, there appear to be exciting possibilities for utilizing the
methods in this chapter for enhancing the various modalities of sensor networks of
the future.
Chapter 5

FUTURE WORK
Distributed Sources
Can multiple rounds of transmission by sources eliminate the need for log(N) links?
Does fusion of variables make it possible to decode some packets when rank is
lost?
Trading off throughput for decoding complexity by having sparser matrices
Correlation tracking using Singer’s filters
Propagation of error after failed decoding
Resiliency to dropped packets
Appendix A

MAXIMIZING THE NUMBER OF CHANNELS WITH EXACTLY ONE TRANSMITTER
Consider placing k balls into n bins such that each ball randomly selects a bin in
which to be placed independently of all other balls, and all of the random
selections are made using the same probability mass function (pmf). Given the
number of bins, n, we wish to find the number of balls, k, and the pmf such that
the expected number of bins containing exactly one ball is maximized.
Result 2: The expected number of bins containing exactly one ball is maximized
when the number of balls is equal to the number of bins and each ball is equally
likely to select any of the bins. In this case, the expected number of bins
containing exactly one ball, in the limit as n grows to infinity, is n/e.
Proof: We first write the expression for the expected number of bins containing
exactly one ball using b_i to denote the event that bin i contains exactly one ball,
p_i to denote the probability that any given ball chooses bin i, and 1{.} to denote
the indicator function, which takes on the value 1 if the expression in the braces is
true and the value 0 otherwise.
$$E\left[\sum_{i=1}^{n} 1\{b_i\}\right]
= \sum_{i=1}^{n} E\left[1\{b_i\}\right]
= \sum_{i=1}^{n} P(b_i)
= \sum_{i=1}^{n} \binom{k}{1} p_i (1-p_i)^{k-1}
= k \sum_{i=1}^{n} p_i (1-p_i)^{k-1} \qquad (42)$$
Therefore, we seek to find the

$$\arg\max_{p_1,\dots,p_n,\;k}\; k \sum_{i=1}^{n} p_i (1-p_i)^{k-1}
\qquad \text{s.t.} \quad \sum_{i=1}^{n} p_i = 1 \qquad (43)$$
Let us first find the value of p_i that maximizes each of the summands in (43).
Solving

$$\frac{d}{dp}\left(p(1-p)^{k-1}\right)
= (1-p)^{k-1} - (k-1)\,p\,(1-p)^{k-2} = 0 \qquad (44)$$
gives p = 1/k as a local maximum of the summands in (43). This value also gives
the global maximum in the range [0, 1] because the function has a strictly positive
first derivative in the range [0, 1/k) and a strictly negative first derivative in the
range (1/k, 1).
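As a quick numerical sanity check of this stationary point (illustrative only, not part of the proof), p = 1/k indeed beats nearby values of p for several k:

```python
def summand(p, k):
    # One summand of (43): p * (1 - p)^(k - 1)
    return p * (1 - p) ** (k - 1)

# p = 1/k should beat nearby values of p for any k >= 2
for k in (2, 5, 50):
    p_star = 1.0 / k
    assert summand(p_star, k) > summand(0.9 * p_star, k)
    assert summand(p_star, k) > summand(1.1 * p_star, k)
```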
Since the value of p_i in the range [0, 1] that individually maximizes each of the
summands of (43) is 1/k, the overall sum is maximized when p_i = 1/k for all
i in {1, ..., n} (i.e. each of the balls is equally likely to be in any of the bins).
However, due to the constraint that the p_i's form a valid pmf (i.e.
p_1 + ... + p_n = 1), this is only possible when k = n. Therefore, the expected
number of bins containing exactly one ball is maximized when the number of balls
is equal to the number of bins and each ball is equally likely to be placed in any of
the bins.
The expected number of bins containing exactly one ball can be computed as:

$$k \sum_{i=1}^{n} p_i (1-p_i)^{k-1}
= n \sum_{i=1}^{n} \frac{1}{n}\left(1-\frac{1}{n}\right)^{n-1}
= n \cdot n \cdot \frac{1}{n}\left(1-\frac{1}{n}\right)^{n-1}
= n\left(1-\frac{1}{n}\right)^{n-1} \qquad (45)$$
Since $\lim_{n \to \infty}\left(1 - 1/n\right)^{n} = 1/e$, the expression in (45) asymptotically goes
to n/e as n grows large. ■
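Result 2 can also be checked empirically with a small Monte Carlo simulation (a sketch; the bin count, trial count, and seed are arbitrary illustrative choices):

```python
import random

def expected_singleton_bins(n, k, trials=20000, seed=1):
    # Monte Carlo estimate of the expected number of bins holding
    # exactly one ball when k balls are thrown uniformly into n bins.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        counts = [0] * n
        for _ in range(k):
            counts[rng.randrange(n)] += 1
        total += sum(1 for c in counts if c == 1)
    return total / trials
```

For n = k = 20 the estimate lands near the exact value n(1 - 1/n)^(n-1), roughly 7.55, which is already close to n/e, roughly 7.36.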
Appendix B

THROUGHPUT OF ROUTING WITHOUT CODING IN A NETWORK OF UNTUNED RADIOS
To analyze the throughput of a routing scheme in which the output of each
intermediate node is simply a copy of one of its inputs (the node randomly selects
which of the inputs to forward), rather than a function of all of the inputs, consider
the random graph that corresponds to this routing scheme. The connectivity of the
network is still the same and can be represented by Figure 7; however, because
each intermediate node only forwards a packet from a randomly selected input
link, the graph must be modified to reflect this. Because only one of the incoming
packets is forwarded, the information coming in on the other links is not
propagated; therefore, those links can be deleted from the graph without altering
the throughput. In other words, starting with the graph in Figure 7, for each vertex
that is not deleted and has at least one incoming link (i.e. at least one of the
original L incoming links was connected to a surviving vertex in the previous
column), randomly choose one of those incoming links to keep and delete all the
others. Which link is kept at each vertex is chosen uniformly and independently
of the other vertices. The problem then becomes finding the end-to-end max-
flow of the resulting graph.
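The keep-one-link construction just described can be simulated directly. The sketch below (illustrative sizes and seed) builds the graph for the bounding case in which no vertices are deleted, and counts how many column-1 vertices still have a descendent in the last column:

```python
import random

def routing_max_flow(N, H, seed=0):
    # Each vertex in columns 2..H keeps exactly one incoming link,
    # chosen uniformly from the N vertices of the previous column.
    rng = random.Random(seed)
    parent = [[rng.randrange(N) for _ in range(N)] for _ in range(H - 1)]
    # Trace every column-H vertex back to its unique column-1 ancestor.
    # The number of distinct ancestors is the number of column-1
    # vertices whose packets can still reach column H.
    ancestors = set()
    for v in range(N):
        for col in range(H - 2, -1, -1):
            v = parent[col][v]
        ancestors.add(v)
    return len(ancestors)
```

Averaging this count over many random graphs gives the expected throughput analyzed below.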
To make the problem more analytically tractable, we consider the bounding case
in which none of the vertices in the graph are deleted. In this case, the random
graph simply consists of vertices in an N x H grid with each vertex in columns
{2, ..., H} connecting to only one randomly selected (with uniform probability)
vertex in the previous column. An instance of this random graph is shown in
Figure 17. Since deleting vertices (along with the links associated with those
vertices) can only decrease the max-flow of a graph, any upper bound on the max-
flow from column 1 to column H is also an upper bound on the max-flow of the
same graph in which certain vertices are deleted (remember that the performance
of a routing scheme in which only forwarding is allowed corresponds to the max-
flow of the graph in Figure 17 with each vertex deleted with probability 1 - 1/e).

Figure 17. Random graph representing connectivity when only routing is allowed. Each vertex in columns {2,…,H} connects to one randomly chosen vertex in the previous column.
To find the max-flow of this graph, consider a particular vertex in Column 1. Let us
label this vertex as vertex A, as shown in Figure 17. Consider the number of
vertices in columns {2, ..., H} that have a connection back to vertex A (note that,
since each vertex has only one incoming link, each one has a connection to only
one vertex in the first column). Since each vertex randomly and independently
connects to a vertex in the previous column, the number of vertices in each
column that connect back to vertex A only depends on the number of vertices in
the previous column that connect to A; therefore, the number of vertices in each
column with a connection back to A forms a Markov chain. The transition
probability matrix, P, of this Markov chain has the form:
$$P_{i,j} = \binom{N}{j}\left(\frac{i}{N}\right)^{j}\left(1 - \frac{i}{N}\right)^{N-j} \qquad (46)$$

for i, j in {0, ..., N}.
This Markov chain has two absorbing states 0 and N , while all the others are
transient. Therefore, the chain is guaranteed eventually to be absorbed in one of
these two states. Since the number of descendents of each vertex in column 1
forms a Markov chain with the same transition probability matrix and the number of
descendents of each of these vertices has to add up to N , eventually all but one
of the chains will be absorbed in the zero state, while one of the chains will be
absorbed in state N . In other words, eventually there will be a column in which all
of the vertices connect back to the same vertex of column 1, and all subsequent
columns will likewise only connect back to that vertex.
Indeed, row 1 (corresponding to starting in state 1, with only one vertex connecting
back to A) of P^∞ has the form [1 - 1/N, 0, ..., 0, 1/N]. The entry P^∞_{i,j} gives the
probability of starting in state i and ending in state j after infinitely many steps.
Row 1 of P^∞ shows that any vertex in column 1 will eventually either have no
descendents (this happens with probability 1 - 1/N) or all of the vertices in later
columns will be its descendents (this happens with probability 1/N). This means
that, as H goes to infinity, the max-flow of the graph will be 1.
But what happens over a finite number of steps? The expected max-flow over H
steps is given by:

$$N \cdot P(\text{given vertex has descendents after } H \text{ steps})
= N\left(1 - P^{H}_{1,0}\right) \qquad (47)$$
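Equation (47) can be evaluated by numerically iterating the transition matrix of (46) (a sketch; the values of N and H are illustrative):

```python
import math

def expected_max_flow(N, H):
    # Transition matrix of (46): from i back-connected vertices,
    # the next column has j of them with probability
    # C(N, j) * (i/N)^j * (1 - i/N)^(N - j).
    P = [[math.comb(N, j) * (i / N) ** j * (1 - i / N) ** (N - j)
          for j in range(N + 1)] for i in range(N + 1)]
    dist = [0.0] * (N + 1)
    dist[1] = 1.0                      # start: one vertex connects back
    for _ in range(H):
        dist = [sum(dist[i] * P[i][j] for i in range(N + 1))
                for j in range(N + 1)]
    # Expected max-flow per (47): N * P(vertex still has descendents)
    return N * (1 - dist[0])
```

For N = 100 and H = N this yields roughly 2.3, consistent with Table 2, and for large H the value approaches 1, matching the absorption argument above.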
The problem is that the closed-form expression for P^H is unmanageably
elaborate; therefore, we rely on a computer to evaluate it for some values of N
and H. Table 2 shows the expected max-flow between column 1 and column H
for various values of N in the graph corresponding to a scheme that only allows
routing at intermediate nodes and in the case when no collisions occur (i.e. no
vertices are deleted from the graph). As can be seen from the table, the
performance of a routing-only scheme allows for a constant throughput over N
hops, while network coding allows for throughput that is linear in N over H hops
as long as H log(N)/N → 0 as N → ∞.

TABLE II
EXPECTED THROUGHPUT OF ROUTING OVER H STEPS

N      H = N   H = 10N
100    2.304   1.0001
400    2.351   1.0001
700    2.359   1.0001
1000   2.362   1.0001
2000   2.362   1.0001

Table 2. The throughput over N steps remains constant, even as N grows. Over
10N steps, only one packet can be sent end-to-end with routing, even if no
collisions occur.
BIBLIOGRAPHY
[1] http://www.tinyos.net/
[2] http://webs.cs.berkeley.edu/nest-index.html
[3] http://bwrc.eecs.berkeley.edu/Research/Pico_Radio/Default.htm
[4] http://www.cens.ucla.edu/
[5] http://www.intel-research.net/berkeley/index.asp
[6] http://www.zigbee.org
[7] http://www.xbow.com
[8] http://www.ember.com
[9] http://www.dust-inc.com/flash-index.shtml
[10] http://www.cs.berkeley.edu/~binetude/ggb/
[11] http://www.cbe.berkeley.edu/research/briefs-Wireless.htm
[12] A. Mainwaring, J. Polastre, R. Szewczyk, and D. Culler, "Wireless Sensor Networks for Habitat Monitoring," Intel Research, IRB-TR-02-006, Jun. 19, 2002.
[13] S. Coleri, S. Cheung, and P. Varaiya, "Sensor Networks for Monitoring Traffic," 42nd Annual Allerton Conference on Communication, Control, and Computing, September 2004.
[14] http://www.wherenet.com/
[15] S. Roundy, B. Otis, Y. H. Chee, J. Rabaey, and P. Wright, "A 1.9GHz RF Transmit Beacon using Environmentally Scavenged Energy," Dig. IEEE Int. Symposium on Low Power Elec. and Devices, Seoul, Korea, 2003.
[16] B. Otis, Y. H. Chee, R. Lu, N. Pletcher, and J. Rabaey, "An Ultra-Low Power MEMS-Based Two-Channel Transceiver for Wireless Sensor Networks," IEEE Symp. on VLSI Circuits, Honolulu, HI, June 2004.
[17] A. Willig et al., "Measurements of a Wireless Link in an Industrial Environment using an IEEE 802.11-Compliant Physical Layer," IEEE Trans. on Industrial Electronics, vol. 43, Dec. 2002.
[18] N. Patwari, Y. Wang, and R. O'Dea, "The Importance of the Multipoint-to-Multipoint Indoor Radio Channel in Ad Hoc Networks," IEEE Wireless Communication and Networking Conference (WCNC), Orlando, FL, March 2002.
[19] M. Zorzi and R. R. Rao, "Geographic Random Forwarding (GeRaF) for ad hoc and sensor networks: energy and latency performance," IEEE Trans. on Mobile Computing, vol. 2, Oct.-Dec. 2003.
[20] J. Van Greuen, D. Petrović, A. Bonivento, J. Rabaey, K. Ramchandran, and A. Sangiovanni-Vincentelli, "Adaptive Sleep Discipline for Energy Conservation and Robustness in Dense Sensor Networks," ICC 2004.
[21] L. Doherty, L. El Ghaoui, and K. Pister, "Convex Position Estimation in Wireless Sensor Networks," Infocom 2001, Anchorage, AK, April 2001.
[22] T. Ho, M. Medard, J. Shi, M. Effros, and D. Karger, "On randomized network coding," Proceedings of the 41st Annual Allerton Conference on Communication, Control, and Computing, October 2003.
[23] P. Chou, Y. Wu, and K. Jain, "Practical network coding," Allerton Conference on Communication, Control, and Computing, Monticello, IL, October 2003. Invited paper.
[24] M. Aizenman, J. T. Chayes, L. Chayes, J. Fröhlich, and L. Russo, "On a sharp transition from area law to perimeter law in a system of random surfaces," Communications in Mathematical Physics, vol. 92, pp. 19-69, 1983.
[25] H. Chernoff, "A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations," Annals of Mathematical Statistics, vol. 23, pp. 493-507, 1952.
[26] F. Roberts, Applied Combinatorics, Prentice Hall, 1984.
[27] G. Lawler, Introduction to Stochastic Processes, Chapman & Hall, 1995.
[28] C. Toh, "Maximum battery life routing to support ubiquitous mobile computing in wireless ad hoc networks," IEEE Communications Magazine, pp. 138-147, June 2001.
[29] R. Shah and J. Rabaey, "Energy aware routing for low energy ad hoc sensor networks," Proc. of IEEE WCNC, Mar. 2002.
[30] C. Intanagonwiwat, R. Govindan, and D. Estrin, "Directed diffusion: A scalable and robust communication paradigm for sensor networks," Proc. of IEEE MobiCom, Aug. 2000.
[31] G. Pottie and W. Kaiser, "Wireless sensor networks," Communications of the ACM, 2000.
[32] M. Chu, H. Haussecker, and F. Zhao, "Scalable information-driven sensor querying and routing for ad hoc heterogeneous sensor networks," IEEE Journal of High Performance Computing Applications, to appear, 2002.
[33] D. Petrović, R. Shah, K. Ramchandran, and J. Rabaey, "Data funneling: routing with aggregation and compression for wireless sensor networks," IEEE Sensor Network Protocols and Applications, May 2003.
[34] D. Slepian and J. K. Wolf, "Noiseless encoding of correlated information sources," IEEE Trans. on Inform. Theory, vol. IT-19, pp. 471-480, July 1973.
[35] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
[36] S. S. Pradhan and K. Ramchandran, "Group-theoretic construction and analysis of generalized coset codes for symmetric/asymmetric distributed source coding," Proceedings of the Data Compression Conference (DCC), March 2000.
[37] A. D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. on Inform. Theory, vol. IT-22, pp. 1-10, January 1976.
[38] S. S. Pradhan and K. Ramchandran, "Distributed source coding using syndromes: Design and construction," Proceedings of the Data Compression Conference (DCC), March 1999.
[39] S. Haykin, Adaptive Filter Theory, Prentice Hall, Upper Saddle River, 1996.
[40] H. Stark and J. Woods, Probability, Random Processes and Estimation Theory for Engineers, Prentice Hall, Englewood Cliffs, 1994.
[41] A. Singer and M. Feder, "Universal linear prediction by model order weighting," IEEE Transactions on Signal Processing, vol. 10, pp. 2685-2699, October 1999.
[42] T. Ramabadran and S. Gaitonde, "A tutorial on CRC computations," IEEE Micro, vol. 45, pp. 62-74, Aug. 1988.
[43] R. Blahut, Theory and Practice of Data Transmission Codes, 1995.
[44] J. Chou, D. Petrović, and K. Ramchandran, "Tracking and exploiting correlations in dense sensor networks," Proceedings of the Asilomar Conference on Signals, Systems and Computers, November 2002.