A
PROJECT
ON
SUPER RESOLUTION RECONSTRUCTION OF
COMPRESSED LOW RESOLUTION IMAGES
USING WAVELET LIFTING SCHEMES
ABSTRACT
Due to factors like processing power limitations and channel capacity, images are often
downsampled and transmitted at low bit rates, resulting in low resolution compressed images.
High resolution images can be reconstructed from several blurred, noisy, and downsampled low
resolution images using a computational process known as super resolution reconstruction. The
problem of recovering a high resolution image from a sequence of low resolution compressed
images is considered. In this paper, we propose lifting schemes for intentionally downsampling
the high resolution image sequence before compression, and then utilize super resolution
techniques to generate a high resolution image at the decoder.
The lifting wavelet transform has an advantage over the ordinary wavelet transform in that it
reduces the memory required for its implementation. This is possible because the lifting
transform uses in-place computation: the lifting coefficients replace the image samples present
in the respective memory locations. In our proposed approach, forward lifting is applied to the
high resolution images, which are then compressed using Set Partitioning in Hierarchical Trees
(SPIHT), resulting in low resolution images at the encoder. The compressed images are
transmitted, and at the decoder super resolution techniques are applied: lifting scheme based
fusion is performed, the images are decoded using DSPIHT, and inverse lifting is applied. To
remove noise from the reconstructed image, soft thresholding is performed; the blur is removed
using blind de-convolution; and the result is finally interpolated using our novel interpolation
technique. We have performed both objective and subjective analysis of the reconstructed image,
and the resultant image has a better super resolution factor and higher ISNR and PSNR values.
INTRODUCTION
Super resolution is a process of producing a high spatial resolution image from one or
more low resolution (LR) observations. It includes alias-free upsampling of the image, thereby
increasing the maximum spatial frequency, and removal of the degradations that arise during
image capture, viz. blur and noise. It is the ability to use multiple noisy and blurred images
obtained by low resolution cameras and together generate a higher resolution image with greater
detail than could be obtained from any single image. The main reason a single high resolution
image can be reconstructed from multiple low resolution images is that, although the images are
degraded and mis-registered, they are sub-pixel shifted relative to one another; sub-sampling
then yields aliased low resolution images. If the images are sub-pixel shifted, then each low
resolution image contains different information. Therefore, the information contained in an
under-sampled image sequence can be combined to obtain an alias-free high resolution image.
Super resolution image reconstruction from multiple snapshots provides far more detailed
information than any interpolated image from a single snapshot.
WAVELETS
First of all, why do we need a transform, and what is a transform anyway?
Mathematical transformations are applied to signals to obtain further information from
the signal that is not readily available in its raw form. In the following tutorial I will refer to a
time-domain signal as a raw signal, and to a signal that has been "transformed" by any of the
available mathematical transformations as a processed signal.
There are a number of transformations that can be applied, among which the Fourier
transforms are probably by far the most popular.
Most signals in practice are TIME-DOMAIN signals in their raw format. That is,
whatever the signal is measuring is a function of time. In other words, when we plot the signal,
one of the axes is time (the independent variable) and the other (the dependent variable) is
usually the amplitude. When we plot time-domain signals, we obtain a time-amplitude
representation of the signal. This representation is not always the best representation of the
signal for most signal processing applications. In many cases, the most distinguishing
information is hidden in the frequency content of the signal. The frequency SPECTRUM of a
signal is basically the frequency (spectral) components of that signal. The frequency spectrum of
a signal shows what frequencies exist in the signal.
Intuitively, we all know that frequency has something to do with the rate of change of
something. If something (a mathematical or physical variable, to use the technically correct
term) changes rapidly, we say that it is of high frequency, whereas if it does not change rapidly,
i.e., it changes smoothly, we say that it is of low frequency. If the variable does not change at all,
we say it has zero frequency, or no frequency. For example, the publication frequency of a daily
newspaper is higher than that of a monthly magazine (it is published more frequently).
Frequency is measured in cycles per second, or with its more common name, in "Hertz".
For example, the electric power we use in our daily life in the US is 60 Hz (50 Hz elsewhere in
the world). If you plot a 50 Hz electric current, you will see a sine wave passing through the
same point 50 times in one second. Now, look at the following figures. The first one is a sine
wave at 3 Hz, the second one at 10 Hz, and the third one at 50 Hz. Compare them.
So how do we measure frequency, or how do we find the frequency content of a signal?
The answer is FOURIER TRANSFORM (FT). If the FT of a signal in time domain is taken, the
frequency-amplitude representation of that signal is obtained. In other words, we now have a plot
with one axis being the frequency and the other being the amplitude. This plot tells us how much
of each frequency exists in our signal.
The frequency axis starts from zero and goes up to infinity. For every frequency, we
have an amplitude value. For example, if we take the FT of the electric current that we use in our
houses, we will have one spike at 50 Hz and nothing elsewhere, since that signal has only a
50 Hz frequency component. No other signal, however, has an FT this simple. For most
practical purposes, signals contain more than one frequency component. The following shows
the FT of the 50 Hz signal:
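As a small illustration (mine, not part of the original text), the following numpy sketch generates a 50 Hz sine wave and locates its single spectral spike with the FFT; the 1000 Hz sampling rate is an arbitrary choice for the example:

import numpy as np

fs = 1000.0                               # sampling rate in Hz (assumed for this sketch)
t = np.arange(0, 1.0, 1.0 / fs)           # one second of samples
x = np.sin(2 * np.pi * 50.0 * t)          # a pure 50 Hz sine wave

spectrum = np.abs(np.fft.rfft(x))         # one-sided magnitude spectrum
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
print(freqs[np.argmax(spectrum)])         # prints 50.0: the only frequency present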
One word of caution is in order at this point. Note that two plots are given in Figure 1.4.
The bottom one plots only the first half of the top one. Due to reasons that are not crucial to
know at this time, the frequency spectrum of a real valued signal is always symmetric. The top
plot illustrates this point. However, since the symmetric part is exactly a mirror image of the first
part, it provides no additional information, and therefore, this symmetric second part is usually
not shown. In most of the following figures corresponding to FT, I will only show the first half
of this symmetric spectrum.
WHY DO WE NEED THE FREQUENCY INFORMATION?
Oftentimes, information that cannot be readily seen in the time domain can be seen in
the frequency domain.
Let's give an example from biological signals. Suppose we are looking at an ECG signal
(ElectroCardioGraphy, graphical recording of heart's electrical activity). The typical shape of a
healthy ECG signal is well known to cardiologists. Any significant deviation from that shape is
usually considered to be a symptom of a pathological condition.
This pathological condition, however, may not always be quite obvious in the original
time-domain signal. Cardiologists usually use the time-domain ECG signals which are recorded
on strip-charts to analyze ECG signals. Recently, the new computerized ECG
recorders/analyzers also utilize the frequency information to decide whether a pathological
condition exists. A pathological condition can sometimes be diagnosed more easily when the
frequency content of the signal is analyzed.
This, of course, is only one simple example why frequency content might be useful.
Today Fourier transforms are used in many different areas including all branches of engineering.
Although FT is probably the most popular transform being used (especially in electrical
engineering), it is not the only one. There are many other transforms that are used quite often by
engineers and mathematicians. The Hilbert transform, the short-time Fourier transform (more
about this later), Wigner distributions, the Radon transform, and of course our featured
transformation, the wavelet transform, constitute only a small portion of a huge list of
transforms available at the engineer's and the mathematician's disposal. Every transformation
technique has its own area of application, with advantages and disadvantages, and the wavelet
transform (WT) is no exception.
For a better understanding of the need for the WT, let's look at the FT more closely. The
FT (as well as the WT) is a reversible transform; that is, it allows us to go back and forth
between the raw and processed (transformed) signals. However, only one of them is available at
any given time: no frequency information is available in the time-domain signal, and no time
information is available in the Fourier transformed signal. The natural question that comes to
mind is: is it necessary to have both the time and the frequency information at the same time?
As we will see soon, the answer depends on the particular application and the nature of
the signal at hand. Recall that the FT gives the frequency information of the signal, meaning
that it tells us how much of each frequency exists in the signal, but it does not tell us when in
time these frequency components exist. This information is not required when the signal is so-
called stationary.
FOURIER TRANSFORM DRAWBACKS
The FT will not be discussed in detail here, for two reasons:
1. It is too wide of a subject to discuss in this tutorial.
2. It is not our main concern anyway.
However, I would like to mention a couple of important points again, for two reasons:
1. It is a necessary background to understand how the WT works.
2. It has been by far the most important signal processing tool for many (and I mean many,
many) years.
In the 19th century (1822, to be exact, but you do not need to know the exact date; just
trust me that it was long before anything you can remember), the French mathematician J.
Fourier showed that any periodic function can be expressed as an infinite sum of periodic
complex exponential functions. Many years after he discovered this remarkable property of
(periodic) functions, his ideas were generalized first to non-periodic functions, and then to
periodic and non-periodic discrete time signals. It was after this generalization that the FT
became a very suitable tool for computer calculations. In 1965, a new algorithm called the fast
Fourier transform (FFT) was developed, and the FT became even more popular.
THE SHORT TERM FOURIER TRANSFORM
There is only a minor difference between the STFT and the FT. In the STFT, the signal is
divided into segments small enough that each segment (portion) of the signal can be assumed to
be stationary. For this purpose, a window function "w" is chosen. The width of this window
must be equal to the portion of the signal over which stationarity is a valid assumption.
This window function is first located at the very beginning of the signal; that is, the
window function is located at t=0. Let's suppose that the width of the window is "T" seconds. At
this time instant (t=0), the window function will overlap with the first T/2 seconds (I will assume
that all time units are in seconds). The window function and the signal are then multiplied. By
doing this, only the first T/2 seconds of the signal are chosen, with the appropriate weighting of
the window (if the window is a rectangle with amplitude "1", then the product will be equal to
the signal). This product is then treated as just another signal whose FT is to be taken. In
other words, the FT of this product is taken, just as one would take the FT of any signal.
The result of this transformation is the FT of the first T/2 seconds of the signal. If this
portion of the signal is stationary, as it is assumed, then there will be no problem and the
obtained result will be a true frequency representation of the first T/2 seconds of the signal.
The next step would be shifting this window (by some t1 seconds) to a new location,
multiplying it with the signal, and taking the FT of the product. This procedure is followed,
shifting the window in intervals of "t1" seconds, until the end of the signal is reached.
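The procedure just described translates almost directly into code. The following is a rough sketch of my own (a rectangular window, with window length and hop chosen by the caller) rather than a production STFT:

import numpy as np

def stft(x, win_len, hop):
    # Slide a rectangular window of win_len samples across the signal in steps
    # of hop samples, and take the FT of each (assumed stationary) segment.
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        segment = x[start:start + win_len]            # window (amplitude 1) times signal
        frames.append(np.abs(np.fft.rfft(segment)))
    return np.array(frames)   # one row per window location, one column per frequency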
MULTIRESOLUTION ANALYSIS
Although the time and frequency resolution problems are the result of a physical
phenomenon (the Heisenberg uncertainty principle) and exist regardless of the transform used, it
is possible to analyze any signal by using an alternative approach called multiresolution
analysis (MRA). MRA, as implied by its name, analyzes the signal at different frequencies with
different resolutions, rather than resolving every spectral component equally as was the case in
the STFT.
MRA is designed to give good time resolution and poor frequency resolution at high
frequencies and good frequency resolution and poor time resolution at low frequencies. This
approach makes sense especially when the signal at hand has high frequency components for
short durations and low frequency components for long durations. Fortunately, the signals that
are encountered in practical applications are often of this type. For example, the following shows
a signal of this type. It has a relatively low frequency component throughout the entire signal and
relatively high frequency components for a short duration somewhere around the middle.
THE CONTINUOUS WAVELET TRANSFORM
The continuous wavelet transform was developed as an alternative to the short
time Fourier transform to overcome the resolution problem. The wavelet analysis is done in a
way similar to the STFT analysis, in the sense that the signal is multiplied with a function, the
wavelet, similar to the window function in the STFT, and the transform is computed
separately for different segments of the time-domain signal. However, there are two main
differences between the STFT and the CWT:
1. The Fourier transforms of the windowed signals are not taken, and therefore a single peak
will be seen corresponding to a sinusoid, i.e., negative frequencies are not computed.
2. The width of the window is changed as the transform is computed for every single spectral
component, which is probably the most significant characteristic of the wavelet transform.
The continuous wavelet transform is defined as follows:

CWT(tau, s) = (1 / sqrt(|s|)) * integral of x(t) * psi*((t - tau) / s) dt

where psi* denotes the complex conjugate of the mother wavelet psi.
As seen in the above equation, the transformed signal is a function of two variables, tau and s,
the translation and scale parameters, respectively. psi(t) is the transforming function, and it is
called the mother wavelet. The term mother wavelet gets its name from two important
properties of the wavelet analysis, as explained below:
The term wavelet means a small wave. The smallness refers to the condition that this
(window) function is of finite length (compactly supported). The wave refers to the condition
that this function is oscillatory. The term mother implies that the functions with different regions
of support that are used in the transformation process are derived from one main function, the
mother wavelet. In other words, the mother wavelet is a prototype for generating the other
window functions.
The term translation is used in the same sense as in the STFT; it is related to
the location of the window as the window is shifted through the signal. This term obviously
corresponds to the time information in the transform domain. However, we do not have a
frequency parameter as we had for the STFT. Instead, we have a scale parameter, which is
defined as 1/frequency. The term frequency is reserved for the STFT. Scale is described in more
detail in the next section.
COMPUTATION OF THE CWT
Interpretation of the above equation will be explained in this section. Let x(t) be the signal
to be analyzed. The mother wavelet is chosen to serve as a prototype for all windows in the
process. All the windows that are used are dilated (or compressed) and shifted versions of the
mother wavelet. There are a number of functions that are used for this purpose; the Morlet
wavelet and the Mexican hat function are two candidates.
Once the mother wavelet is chosen, the computation starts with s=1 and the continuous wavelet
transform is computed for all values of s, smaller and larger than 1. However, depending on
the signal, a complete transform is usually not necessary. For all practical purposes, the signals
are bandlimited, and therefore computation of the transform for a limited interval of scales is
usually adequate. In this study, some finite interval of values for s was used, as will be described
later in this chapter.
For convenience, the procedure will be started from scale s=1 and will continue for
increasing values of s; i.e., the analysis will start from high frequencies and proceed towards low
frequencies. This first value of s corresponds to the most compressed wavelet. As the value of
s is increased, the wavelet dilates.
The wavelet is placed at the beginning of the signal, at the point which corresponds to
time=0. The wavelet function at scale 1 is multiplied by the signal and then integrated over all
times. The result of the integration is then multiplied by the constant 1/sqrt(s). This
multiplication is for energy normalization purposes, so that the transformed signal has the
same energy at every scale. The final result is the value of the transformation, i.e., the value of
the continuous wavelet transform at time zero and scale s=1. In other words, it is the value that
corresponds to the point tau=0, s=1 in the time-scale plane.
The wavelet at scale s=1 is then shifted towards the right by tau to the location
t=tau, and the above equation is computed to get the transform value at tau, s=1 in the time-
scale plane.
This procedure is repeated until the wavelet reaches the end of the signal. One row of
points on the time-scale plane for the scale s=1 is now completed.
Then, s is increased by a small value. Note that this is a continuous transform, and
therefore both tau and s must, in principle, be incremented continuously. However, if this
transform is to be computed by a computer, then both parameters are increased by a sufficiently
small step size. This corresponds to sampling the time-scale plane.
The above procedure is repeated for every value of s. Every computation for a given value of s
fills the corresponding single row of the time-scale plane. When the process is completed for all
desired values of s, the CWT of the signal has been calculated.
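For concreteness, here is a small numerical sketch of this procedure using the Mexican hat wavelet; it is my own illustration of the steps above, with the shift-multiply-integrate loop over tau expressed as a convolution:

import numpy as np

def mexican_hat(t):
    # Mexican hat mother wavelet (unnormalized)
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def cwt(x, dt, scales):
    rows = []
    for s in scales:
        # sample the dilated wavelet psi(t/s) on the same grid as the signal
        # (assumes the dilated wavelet stays shorter than the signal)
        t = np.arange(-5.0 * s, 5.0 * s + dt, dt)
        psi = mexican_hat(t / s) / np.sqrt(s)        # 1/sqrt(s) energy normalization
        # shifting, multiplying and integrating over every tau is a convolution
        rows.append(np.convolve(x, psi[::-1], mode='same') * dt)
    return np.array(rows)   # one row of the time-scale plane per value of s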
Why is the Discrete Wavelet Transform Needed?
Although the discretized continuous wavelet transform enables the computation of the
continuous wavelet transform by computers, it is not a true discrete transform. As a matter of
fact, the wavelet series is simply a sampled version of the CWT, and the information it provides
is highly redundant as far as the reconstruction of the signal is concerned. This redundancy
costs a significant amount of computation time and resources. The discrete
wavelet transform (DWT), on the other hand, provides sufficient information for both analysis
and synthesis of the original signal, with a significant reduction in computation time.
The DWT is considerably easier to implement when compared to the CWT. The basic concepts
of the DWT will be introduced in this section along with its properties and the algorithms used to
compute it. As in the previous chapters, examples are provided to aid in the interpretation of the
DWT.
THE DISCRETE WAVELET TRANSFORM (DWT)
The foundations of the DWT go back to 1976, when Croisier, Esteban, and Galand devised
a technique to decompose discrete time signals. Crochiere, Weber, and Flanagan did similar
work on coding of speech signals in the same year. They named their analysis scheme
subband coding. In 1983, Burt defined a technique very similar to subband coding and named it
pyramidal coding, which is also known as multiresolution analysis. Later, in 1989, Vetterli and
Le Gall made some improvements to the subband coding scheme, removing the redundancy
existing in the pyramidal coding scheme. Subband coding is explained below. A detailed
coverage of the discrete wavelet transform and theory of multiresolution analysis can be found in
a number of articles and books that are available on this topic, and it is beyond the scope of this
tutorial.
The Subband Coding and The Multiresolution Analysis
The main idea is the same as it is in the CWT. A time-scale representation of a digital
signal is obtained using digital filtering techniques. Recall that the CWT is a correlation between
a wavelet at different scales and the signal with the scale (or the frequency) being used as a
measure of similarity. The continuous wavelet transform was computed by changing the scale of
the analysis window, shifting the window in time, multiplying by the signal, and integrating over
all times. In the discrete case, filters of different cutoff frequencies are used to analyze the signal
at different scales. The signal is passed through a series of high pass filters to analyze the high
frequencies, and it is passed through a series of low pass filters to analyze the low frequencies.
The resolution of the signal, which is a measure of the amount of detail information in the
signal, is changed by the filtering operations, and the scale is changed by upsampling and
downsampling (subsampling) operations. Subsampling a signal corresponds to reducing the
sampling rate, or removing some of the samples of the signal. For example, subsampling by two
refers to dropping every other sample of the signal. Subsampling by a factor n reduces the
number of samples in the signal n times.
Upsampling a signal corresponds to increasing the sampling rate of a signal by adding
new samples to the signal. For example, upsampling by two refers to adding a new sample,
usually a zero or an interpolated value, between every two samples of the signal. Upsampling a
signal by a factor of n increases the number of samples in the signal by a factor of n.
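In code, both operations are one-liners; this small numpy sketch of mine shows subsampling by two and zero-insertion upsampling by two:

import numpy as np

x = np.arange(8)                      # [0 1 2 3 4 5 6 7]
down = x[::2]                         # subsample by 2: keep every other sample -> [0 2 4 6]
up = np.zeros(2 * len(down), dtype=x.dtype)
up[::2] = down                        # upsample by 2: insert zeros -> [0 0 2 0 4 0 6 0]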
Although it is not the only possible choice, DWT coefficients are usually sampled from
the CWT on a dyadic grid, i.e., s0 = 2 and tau0 = 1, yielding s = 2^j and tau = k*2^j, as described
in Part 3. Since the signal is a discrete time function, the terms function and sequence will be
used interchangeably in the following discussion. This sequence will be denoted by x[n], where
n is an integer.
The procedure starts with passing this signal (sequence) through a half band digital
lowpass filter with impulse response h[n]. Filtering a signal corresponds to the mathematical
operation of convolution of the signal with the impulse response of the filter. The convolution
operation in discrete time is defined as follows:

x[n] * h[n] = sum over k of x[k] * h[n - k]
A half band lowpass filter removes all frequencies that are above half of the highest
frequency in the signal. For example, if a signal has a maximum of 1000 Hz component, then
half band lowpass filtering removes all the frequencies above 500 Hz.
The unit of frequency is of particular importance at this point. In discrete signals,
frequency is expressed in terms of radians. Accordingly, the sampling frequency of the signal is
equal to 2π radians in terms of radial frequency. Therefore, the highest frequency component that
exists in a signal will be π radians, if the signal is sampled at the Nyquist rate (which is twice the
maximum frequency that exists in the signal); that is, the Nyquist rate corresponds to π rad/s
in the discrete frequency domain. Therefore, using Hz is not appropriate for discrete signals.
However, Hz is used whenever it is needed to clarify a discussion, since it is very common to
think of frequency in terms of Hz. It should always be remembered that the unit of frequency for
discrete time signals is radians.
After passing the signal through a half band lowpass filter, half of the samples can be
eliminated according to Nyquist's rule, since the signal now has a highest frequency of π/2
radians instead of π radians. Simply discarding every other sample will subsample the signal by
two, and the signal will then have half the number of points. The scale of the signal is now
doubled.
doubled. Note that the lowpass filtering removes the high frequency information, but leaves the
scale unchanged. Only the subsampling process changes the scale. Resolution, on the other hand,
is related to the amount of information in the signal, and therefore, it is affected by the filtering
operations. Half band lowpass filtering removes half of the frequencies, which can be interpreted
as losing half of the information. Therefore, the resolution is halved after the filtering operation.
Note, however, that the subsampling operation after filtering does not affect the resolution, since
removing half of the spectral components from the signal makes half the number of samples
redundant anyway. Half the samples can be discarded without any loss of information. In
summary, the lowpass filtering halves the resolution, but leaves the scale unchanged.
The signal is then subsampled by 2 since half of the number of samples are redundant.
This doubles the scale.
This procedure can mathematically be expressed as

y[n] = sum over k of h[k] * x[2n - k]
Having said that, we now look at how the DWT is actually computed. The DWT analyzes
the signal at different frequency bands with different resolutions by decomposing the signal into
a coarse approximation and detail information. The DWT employs two sets of functions, called
scaling functions and wavelet functions, which are associated with lowpass and highpass filters,
respectively. The decomposition of the signal into different frequency bands is simply obtained
by successive highpass and lowpass filtering of the time domain signal. The original signal x[n]
is first passed through a halfband highpass filter g[n] and a lowpass filter h[n]. After the filtering,
half of the samples can be eliminated according to Nyquist's rule, since the signal now has
a highest frequency of π/2 radians instead of π. The signal can therefore be subsampled by 2,
simply by discarding every other sample. This constitutes one level of decomposition and can
mathematically be expressed as follows:

yhigh[k] = sum over n of x[n] * g[2k - n]
ylow[k] = sum over n of x[n] * h[2k - n]
where yhigh[k] and ylow[k] are the outputs of the highpass and lowpass filters, respectively,
after subsampling by 2.
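As a minimal sketch (mine, not part of the original development) of one such decomposition level, consider the two-tap Haar filters, the shortest halfband pair; filtering with numpy and keeping every other sample gives ylow and yhigh. The input values are arbitrary example data:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 1.0, 0.0, 2.0, 4.0, 4.0])
h = np.array([1.0, 1.0]) / np.sqrt(2)     # halfband lowpass (Haar scaling filter)
g = np.array([1.0, -1.0]) / np.sqrt(2)    # halfband highpass (Haar wavelet filter)

ylow = np.convolve(x, h)[1::2]            # filter, then subsample by 2: pairwise averages
yhigh = np.convolve(x, g)[1::2]           # filter, then subsample by 2: pairwise differences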
This decomposition halves the time resolution since only half the number of samples now
characterizes the entire signal. However, this operation doubles the frequency resolution, since
the frequency band of the signal now spans only half the previous frequency band, effectively
reducing the uncertainty in the frequency by half.
The above procedure, which is also known as the subband coding, can be repeated for
further decomposition. At every level, the filtering and subsampling will result in half the
number of samples (and hence half the time resolution) and half the frequency band spanned
(and hence double the frequency resolution). Figure 4.1 illustrates this procedure, where x[n] is
the original signal to be decomposed, and h[n] and g[n] are lowpass and highpass filters,
respectively. The bandwidth of the signal at every level is marked on the figure as "f".
Figure 4.1. The Subband Coding Algorithm

As an example, suppose that the original signal x[n] has 512 sample points, spanning a
frequency band of zero to π rad/s. At the first decomposition level, the signal is passed through
the highpass and lowpass filters, followed by subsampling by 2.
The output of the highpass filter has 256 points (hence half the time resolution), but it
only spans the frequencies π/2 to π rad/s (hence double the frequency resolution). These 256
samples constitute the first level of DWT coefficients. The output of the lowpass filter also has
256 samples, but it spans the other half of the frequency band, frequencies from 0 to π/2 rad/s.
This signal is then passed through the same lowpass and highpass filters for further
decomposition. The output of the second lowpass filter followed by subsampling has 128
samples spanning a frequency band of 0 to π/4 rad/s, and the output of the second highpass filter
followed by subsampling has 128 samples spanning a frequency band of π/4 to π/2 rad/s. The
second highpass filtered signal constitutes the second level of DWT coefficients. This signal has
half the time resolution, but twice the frequency resolution of the first level signal. In other
words, time resolution has decreased by a factor of 4, and frequency resolution has increased by
a factor of 4 compared to the original signal. The lowpass filter output is then filtered once again
for further decomposition. This process continues until two samples are left. For this specific
example there would be 8 levels of decomposition, each having half the number of samples of
the previous level. The DWT of the original signal is then obtained by concatenating all
coefficients starting from the last level of decomposition (remaining two samples, in this case).
The DWT will then have the same number of coefficients as the original signal.
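The whole procedure can be sketched in a few lines. The following is my own illustration using the Haar filters, carried down to a single smooth sample, with the output concatenated last level first as described above:

import numpy as np

def haar_dwt(x):
    # Full Haar DWT of a length-2^k signal, coefficients ordered last level first.
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        smooth = (x[0::2] + x[1::2]) / np.sqrt(2)   # lowpass + subsample by 2
        detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # highpass + subsample by 2
        details.append(detail)
        x = smooth                                  # decompose the smooth part again
    # final approximation first, then details from the deepest level up to level 1
    return np.concatenate([x] + details[::-1])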
The frequencies that are most prominent in the original signal will appear as high
amplitudes in that region of the DWT signal that includes those particular frequencies. The
difference of this transform from the Fourier transform is that the time localization of these
frequencies will not be lost. However, the time localization will have a resolution that depends
on the level at which they appear. If the main information of the signal lies in the high
frequencies, as happens most often, the time localization of these frequencies will be more
precise, since they are characterized by a greater number of samples. If the main information lies
only at very low frequencies, the time localization will not be very precise, since few samples
are used to express the signal at these frequencies. This procedure in effect offers good time
resolution at high frequencies and good frequency resolution at low frequencies. Most practical
signals encountered are of this type.
We will revisit this example, since it provides important insight to how DWT should be
interpreted. Before that, however, we need to conclude our mathematical analysis of the DWT.
One important property of the discrete wavelet transform is the relationship between the
impulse responses of the highpass and lowpass filters. The highpass and lowpass filters are not
independent of each other; they are related by

g[L - 1 - n] = (-1)^n * h[n]

where g[n] is the highpass filter, h[n] is the lowpass filter, and L is the filter length (in number
of points). Note that the two filters are odd index alternated reversed versions of each other.
Lowpass to highpass conversion is provided by the (-1)^n term. Filters satisfying this condition
are commonly used in signal processing, and they are known as Quadrature Mirror Filters
(QMF). The two filtering and subsampling operations can be expressed by

yhigh[k] = sum over n of x[n] * g[2k - n]
ylow[k] = sum over n of x[n] * h[2k - n]
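The QMF relation makes it easy to derive the highpass filter from the lowpass filter in code; this is a small sketch of mine, shown here with the Haar lowpass pair:

import numpy as np

def qmf_highpass(h):
    # g[n] = (-1)^n * h[L-1-n]: alternate the signs of the reversed lowpass filter
    n = np.arange(len(h))
    return ((-1.0) ** n) * h[::-1]

h = np.array([1.0, 1.0]) / np.sqrt(2)
print(qmf_highpass(h))    # [ 0.7071 -0.7071 ], the Haar highpass filter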
The reconstruction in this case is very easy, since halfband filters form orthonormal bases.
The above procedure is followed in reverse order for the reconstruction. The signals at every
level are upsampled by two, passed through the synthesis filters g'[n] and h'[n] (highpass
and lowpass, respectively), and then added. The interesting point here is that the analysis and
synthesis filters are identical to each other, except for a time reversal. Therefore, the
reconstruction formula becomes (for each layer)

x[n] = sum over k of ( yhigh[k] * g[2k - n] + ylow[k] * h[2k - n] )
However, if the filters are not ideal halfband, then perfect reconstruction cannot be
achieved. Although it is not possible to realize ideal filters, under certain conditions it is possible
to find filters that provide perfect reconstruction. The most famous ones are those developed
by Ingrid Daubechies, and they are known as Daubechies' wavelets.
Note that due to the successive subsampling by 2, the signal length must be a power of 2, or
at least a multiple of a power of 2, in order for this scheme to be efficient. The length of the
signal determines the number of levels that the signal can be decomposed to. For example, if the
signal length is 1024, ten levels of decomposition are possible.
Interpreting the DWT coefficients can sometimes be rather difficult because the way
DWT coefficients are presented is rather peculiar. To make a real long story real short, DWT
coefficients of each level are concatenated, starting with the last level. An example is in order to
make this concept clear:
Suppose we have a 256-sample long signal sampled at 10 MHz and we wish to obtain its
DWT coefficients. Since the signal is sampled at 10 MHz, the highest frequency component that
exists in the signal is 5 MHz. At the first level, the signal is passed through the lowpass filter
h[n], and the highpass filter g[n], the outputs of which are subsampled by two. The highpass
filter output is the first level DWT coefficients. There are 128 of them, and they represent the
signal in the [2.5 5] MHz range. These 128 samples are the last 128 samples plotted. The
lowpass filter output, which also has 128 samples, but spanning the frequency band of [0 2.5]
MHz, are further decomposed by passing them through the same h[n] and g[n]. The output of the
second highpass filter is the level 2 DWT coefficients and these 64 samples precede the 128 level
1 coefficients in the plot. The output of the second lowpass filter is further decomposed, once
again by passing it through the filters h[n] and g[n]. The output of the third highpass filter is the
level 3 DWT coefficients. These 32 samples precede the level 2 DWT coefficients in the plot.
The procedure continues until only 1 DWT coefficient can be computed at level 9. This
one coefficient is the first to be plotted in the DWT plot. This is followed by 2 level 8
coefficients, 4 level 7 coefficients, 8 level 6 coefficients, 16 level 5 coefficients, 32 level 4
coefficients, 64 level 3 coefficients, 128 level 2 coefficients and finally 256 level 1 coefficients.
Note that fewer and fewer samples are used at lower frequencies; therefore, the time
resolution decreases as frequency decreases, but since the frequency interval also decreases at
low frequencies, the frequency resolution increases. Obviously, the first few coefficients would
not carry a whole lot of information, simply due to the greatly reduced time resolution. To illustrate
this richly bizarre DWT representation let us take a look at a real world signal. Our original
signal is a 256-sample long ultrasonic signal, which was sampled at 25 MHz. This signal was
originally generated by using a 2.25 MHz transducer, therefore the main spectral component of
the signal is at 2.25 MHz. The last 128 samples correspond to the [6.25 12.5] MHz range. As seen
from the plot, no information is available here; hence these samples can be discarded without any
loss of information. The preceding 64 samples represent the signal in the [3.12 6.25] MHz range,
which also does not carry any significant information. The little glitches probably correspond to
the high frequency noise in the signal. The preceding 32 samples represent the signal in the [1.56
3.12] MHz range. As you can see, the majority of the signal's energy is focused in these 32
samples, as we expected to see. The previous 16 samples correspond to [0.78 1.56] MHz, and the
peaks that are seen at this level probably represent the lower frequency envelope of the signal.
The previous samples probably do not carry any other significant information. It is safe to say
that we can get by with the 3rd and 4th level coefficients; that is, we can represent this 256-sample
long signal with 16+32=48 samples, a significant data reduction which would make your
computer quite happy.
One area that has benefited the most from this particular property of the wavelet
transforms is image processing. As you may well know, images, particularly high-resolution
images, claim a lot of disk space. As a matter of fact, if this tutorial is taking a long time to
download, that is mostly because of the images. DWT can be used to reduce the image size
without losing much of the resolution.
For a given image, you can compute the DWT of, say, each row, and discard all values in
the DWT that are less than a certain threshold. We then save only those DWT coefficients that
are above the threshold for each row, and when we need to reconstruct the original image, we
simply pad each row with as many zeros as the number of discarded coefficients, and use the
inverse DWT to reconstruct each row of the original image. We can also analyze the image at
different frequency bands, and reconstruct the original image by using only the coefficients that
are of a particular band. I will try to put sample images here soon, to illustrate this point.
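As a rough sketch of this row-by-row idea, the following uses the PyWavelets package (an assumption of mine; any DWT/IDWT pair would do) to threshold the coefficients of one "row" and reconstruct it, with zeros standing in for the discarded values; the test signal and the 0.1 threshold are arbitrary:

import numpy as np
import pywt   # assumes the PyWavelets package is installed

row = np.sin(np.linspace(0, 8 * np.pi, 256))                  # stand-in for one image row
coeffs = pywt.wavedec(row, 'haar')                             # multi-level DWT, last level first
kept = [np.where(np.abs(c) >= 0.1, c, 0.0) for c in coeffs]    # zero the small coefficients
rec = pywt.waverec(kept, 'haar')                               # inverse DWT of the padded row
print(sum(np.count_nonzero(c) for c in kept), "of", len(row), "coefficients kept")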
Another issue that is receiving more and more attention is carrying out the decomposition
(subband coding) not only on the lowpass side but on both sides; in other words, zooming into
both the low and high frequency bands of the signal separately. This can be visualized as having
both sides of the tree structure of Figure 4.1. What results is known as wavelet packets. We will
not discuss wavelet packets here, since it is beyond the scope of this tutorial. Anyone who is
interested in wavelet packets, or in more information on the DWT, can find this information in
any of the numerous texts available in the market.
And this concludes our mini series of wavelet tutorials. If I could be of any assistance to
anyone struggling to understand wavelets, I would consider the time and the effort that went
into this tutorial well spent. I would like to remind you that this tutorial is neither a complete nor
a thorough coverage of the wavelet transforms. It is merely an overview of the concept of
wavelets, and it was intended to serve as a first reference for those who find the available texts
on wavelets rather complicated. There might be many structural and/or technical mistakes, and I
would appreciate it if you could point those out to me. Your feedback is of utmost importance
for the success of this tutorial.
The Wavelet Lifting Scheme
Wavelets and Their Applications
Wavelets have been applied in a wide range of areas. My interest in wavelets came from digital
signal processing of non-stationary time series. Data sets without obviously periodic components
cannot be processed well using Fourier techniques. Wavelets allow complex filters to be
constructed for this kind of data, which can remove or enhance selected parts of the signal. There
is a growing body of literature on wavelet techniques for noise reduction.
Wavelets have been used for data compression. For example, the United States FBI compresses
their fingerprint data base using wavelets. Lifting scheme wavelets also form the basis of the
emerging JPEG 2000 image compression standard.
Wavelet techniques have also been used in a variety of statistical applications, including signal
variance estimation, frequency analysis and kernel regression.
Perfect Reconstruction in a Finite World
One of the features of wavelets that is critical in areas like signal processing and compression is
what is referred to in the wavelet literature as perfect reconstruction. A wavelet algorithm has
perfect reconstruction when the inverse wavelet transform of the result of the wavelet transform
yields exactly the original data set:
IWT( WT( D ) ) = D
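This property is easy to check numerically; here is a tiny sketch assuming the PyWavelets package, verifying the round trip on random data with the Haar wavelet:

import numpy as np
import pywt   # assumes the PyWavelets package is installed

D = np.random.randn(64)
cA, cD = pywt.dwt(D, 'haar')                       # WT(D)
assert np.allclose(pywt.idwt(cA, cD, 'haar'), D)   # IWT(WT(D)) == D, to machine precision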
Limitations of the Haar Wavelet Transform
The Haar wavelet transform has a number of advantages:
It is conceptually simple.
It is fast.
It is memory efficient, since it can be calculated in place without a temporary array.
It is exactly reversible without the edge effects that are a problem with other wavelet
transforms.
The Haar transform also has limitations, which can be a problem for some applications.
In generating each set of averages for the next level and each set of coefficients, the Haar
transform performs an average and difference on a pair of values. Then the algorithm shifts over
by two values and calculates another average and difference on the next pair.
The high frequency coefficient spectrum should reflect all high frequency changes. The Haar
window is only two elements wide. If a big change takes place from an even value to an odd
value, the change will not be reflected in the high frequency coefficients.
If the wavelet literature is any guide, when mathematicians think and write about
wavelet equations, they think about wavelets applied to an infinite sequence of data.
Many wavelet equations that have the property of perfect reconstruction for infinite data
sequences do not have this property for finite data sequences.
Since sound files, financial time series, images, and other data sets to which wavelets are
applied are finite, this can be a problem. There are several methods proposed in the wavelet
literature for dealing with the edges of finite data sets; see, for example, my discussion of the
Daubechies D4 wavelet transform below. The simplest wavelet algorithm that shows perfect
reconstruction is the Haar wavelet algorithm. However, Haar wavelets can miss detail and do not
always represent change at all resolution scales.
The wavelet lifting scheme divides the wavelet transform into a set of steps. One of the
elegant qualities of wavelet algorithms expressed via the lifting scheme is the fact that the
inverse transform is a mirror of the forward transform. The simplest way to start thinking about
the lifting scheme is via a one step wavelet that I refer to as a "predict wavelet".
The wavelet Lifting Scheme is a method for decomposing wavelet transforms into a set of stages.
Lifting scheme algorithms have the advantage that they do not require temporary arrays in the
calculation steps, as is necessary for some versions of the Daubechies D4 wavelet algorithm.
The simplest version of a forward wavelet transform expressed in the Lifting Scheme is shown
below in Figure 1. The predict step is the subject of this web page and will be considered in
isolation. The predict step calculates the wavelet function in the wavelet transform; this is a high
pass filter. The update step calculates the scaling function, which results in a smoother version of
the data. The complete lifting scheme is discussed on the Basic Lifting Scheme web page; this
web page is intended to provide background for that discussion.
Figure 2 shows the ur-Lifting Scheme transform, consisting of two steps:
1. Split step: divide the input data into odd and even elements. In a finite data set the odd
elements are moved to the second half of the array, leaving the even elements in the first
half.
2. Predict step: predict the odd elements from the even elements.
One way to view the predict step is through the lens of data compression. If our objective is to
compress a set of data, and the odd elements can be exactly predicted from the even elements
using the equation
odd = even * 2;
the odd elements can be replaced by zero. If we apply a compression algorithm like run length
encoding the odd elements will be reduced to a count and zero, compressing the original data set
by almost 50%. If the data set consists of points on a line, then it can be reduced to something
close to a single element and the length of the data set. In most cases the data set is more
complex and it cannot be entirely represented by a starting condition, a length and an equation.
However, a more compact representation might be arrived at by approximating the data in a local
region using a function. The predict stage replaces an odd element with the difference between
the odd element and a function calculated from the even elements. The simplest example of such
a predict stage takes a single even element as its argument to calculate the predicted value of the
odd element:
odd_{j+1,i} = odd_{j,i} - P(even_{j,k})

Here the function P() is the predict function. Wavelet algorithms are recursive, so recursive
step j generates the data for the next recursive step j+1. The subscript i indexes the odd part of
the array, and the subscript k indexes the even part of the array.
One of the simplest predict functions simply uses the even element as the prediction:

P(even_{j,k}) = even_{j,k}

If the split step had not divided the odd and even elements, this predict function would predict
that each odd value is equal to its even predecessor:

a_{i+1} = a_i
The predict step replaces the odd elements with the difference between the actual odd value and
the predicted value:

odd_{j+1,i} = odd_{j,i} - even_{j,k}
If the data shows a trend (in the language of statistics the data shows autocorrelation),
then the odd element can be predicted from the even element, to some degree.
As a result, the difference between the odd element and its predictor (the even element)
will be smaller than the odd element itself. Smaller values can be represented in fewer bits, so
some level of compression can be achieved.
The process of "predicting" the odd elements from the even elements is recursive, as long
as the number of data elements is a power of two. After the first pass, the odd (upper) half of the
array will contain the differences between the prediction and the original odd element values.
The next recursive pass divides the lower half of the array into odd and even halves. The
difference between the prediction and the odd element value is stored in the new odd half. The
recursive passes continue until the last step where a single odd element is predicted from a single
even element. This is shown in Figure 3 below.
Figure 3
Another way to view this is as a chain of split/predict steps, where the even elements
from one step become the input for the next step.
The closer the predict function fits the data the smaller the differences will be. A
prediction function that closely fits the data will allow the result of the predict wavelet to be
represented in fewer bits, so there will be a higher level of compression.
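The split and predict steps sketched above might look like the following in Python; this is my own illustration of one forward pass with the "odd equals its even predecessor" predictor, applied recursively to the even half:

import math

def predict_step(a):
    # One split/predict pass, in place: evens to the front half,
    # odd differences (odd minus its even predecessor) to the back half.
    half = len(a) // 2
    evens = a[0::2]
    odds = a[1::2]
    a[:half] = evens
    a[half:] = [o - e for o, e in zip(odds, evens)]

def predict_wavelet(a):
    # Recurse on the even (front) half until a single element remains.
    n = len(a)                 # assumed to be a power of two
    while n > 1:
        front = a[:n]
        predict_step(front)
        a[:n] = front
        n //= 2
    return a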
The Daubechies D4 Wavelet Transform
The Daubechies wavelet transform is named after its inventor (or would it be
discoverer?), the mathematician Ingrid Daubechies. The Daubechies D4 transform has four
wavelet and four scaling function coefficients. The scaling function coefficients are

h0 = (1 + sqrt(3)) / (4 * sqrt(2))
h1 = (3 + sqrt(3)) / (4 * sqrt(2))
h2 = (3 - sqrt(3)) / (4 * sqrt(2))
h3 = (1 - sqrt(3)) / (4 * sqrt(2))
Each step of the wavelet transform applies the scaling function to the data input. If the
original data set has N values, the scaling function will be applied in the wavelet transform step
to calculate N/2 smoothed values. In the ordered wavelet transform the smoothed values are
stored in the lower half of the N element input vector.
The wavelet function coefficient values are:

g0 = h3, g1 = -h2, g2 = h1, g3 = -h0
Each step of the wavelet transform applies the wavelet function to the input data. If the
original data set has N values, the wavelet function will be applied to calculate N/2 differences
(reflecting change in the data). In the ordered wavelet transform the wavelet values are stored in
the upper half of the N element input vector.
The scaling and wavelet functions are calculated by taking the inner product of the
coefficients and four data values. The equations are shown below:

a_i = h0*s_{2i} + h1*s_{2i+1} + h2*s_{2i+2} + h3*s_{2i+3}   (scaling function value)
c_i = g0*s_{2i} + g1*s_{2i+1} + g2*s_{2i+2} + g3*s_{2i+3}   (wavelet function value)
Each iteration in the wavelet transform step calculates a scaling function value and a
wavelet function value. The index i is incremented by two with each iteration, and new scaling
and wavelet function values are calculated. This pattern is discussed on the web page A Linear
Algebra View of the Wavelet Transform.
In the case of the forward transform, with a finite data set (as opposed to the
mathematician's imaginary infinite data set), i will be incremented until it is equal to N-2. In the
last iteration the inner product will be calculated from s[N-2], s[N-1], s[N] and s[N+1]. Since
s[N] and s[N+1] don't exist (they are beyond the end of the array), this presents a problem. This
is shown in the transform matrix below.
Daubechies D4 forward transform matrix for an 8 element signal
Note that this problem does not exist for the Haar wavelet, since it is calculated on only
two elements, s[i] and s[i+1].
A similar problem exists in the case of the inverse transform. Here the inverse transform
coefficients extend beyond the beginning of the data, where the first two inverse values are
calculated from s[-2], s[-1], s[0] and s[1]. This is shown in the inverse transform matrix below.
Daubechies D4 inverse transform matrix for an 8 element transform result
Three methods for handling the edge problem:
1. Treat the data set as if it were periodic. The beginning of the data sequence repeats following
the end of the sequence (in the case of the forward transform), and the end of the data
wraps around to the beginning (in the case of the inverse transform).
2. Treat the data set as if it were mirrored at the ends. This means that the data is reflected from
each end, as if a mirror were held up to each end of the data sequence.
3. Gram-Schmidt orthogonalization. Gram-Schmidt orthogonalization calculates special
scaling and wavelet functions that are applied at the start and end of the data set.
Zeros can also be used to fill in for the missing elements, but this can introduce significant
error.
The Daubechies D4 algorithm published here treats the data as if it were periodic. The code
for one step of the forward transform is shown below. Note that in the calculation of the last two
values, the start of the data wraps around to the end and elements a[0] and a[1] are used in the
inner product.
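A minimal Python sketch of such a step, following the description above (my own reconstruction, not the original listing): one ordered D4 forward step over a data set treated as periodic, so the last two output values wrap around and use s[0] and s[1]:

import math

s3 = math.sqrt(3.0)
d = 4.0 * math.sqrt(2.0)
h = [(1 + s3) / d, (3 + s3) / d, (3 - s3) / d, (1 - s3) / d]   # scaling coefficients
g = [h[3], -h[2], h[1], -h[0]]                                  # wavelet coefficients

def d4_forward_step(s):
    # One ordered D4 step: smoothed values in the lower half, differences in the upper.
    n = len(s)
    a = [0.0] * (n // 2)
    c = [0.0] * (n // 2)
    for i in range(0, n, 2):
        w = [s[(i + k) % n] for k in range(4)]   # wraps to s[0], s[1] at the right edge
        a[i // 2] = sum(hk * wk for hk, wk in zip(h, w))
        c[i // 2] = sum(gk * wk for gk, wk in zip(g, w))
    return a + c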
SPIHT
Set Partitioning in Hierarchical Trees (SPIHT) is a powerful wavelet-based image
compression method. This award-winning method has received worldwide acclaim and
attention since its introduction in 1995. Thousands of people, researchers and consumers alike,
have now tested and used SPIHT. It has become the benchmark state-of-the-art algorithm for
image compression.
The SPIHT method is not a simple extension of traditional methods for image
compression, and represents an important advance in the field. The method deserves special
attention because it provides the following:
Highest Image Quality
Progressive image transmission
Fully embedded coded file
Simple quantization algorithm
Fast coding/decoding
Completely adaptive
Lossless compression
Exact bit rate coding
Error protection
Each of these properties is discussed below. Note that different compression methods
were developed specifically to achieve at least one of those objectives. What makes SPIHT
really outstanding is that it yields all those qualities simultaneously. So, if in the future you find
one method that claims to be superior to SPIHT in one evaluation parameter (like PSNR),
remember to see who wins in the remaining criteria.
Image Quality
Extensive research has shown that images obtained with wavelet-based methods yield
very good visual quality. At first it was shown that even simple coding methods produce good
results when combined with wavelets, and this is the basis for the recent JPEG 2000 standard.
However, SPIHT belongs to the next generation of wavelet encoders, employing more
sophisticated coding. In fact, SPIHT exploits the properties of wavelet-transformed images to
increase its efficiency.
Many researchers now believe that encoders that use wavelets are superior to those that
use DCT or fractals. We will not discuss the matter of taste in the evaluation of low quality
images, but we do want to say that SPIHT wins in the test of finding the minimum rate required
to obtain a reproduction indistinguishable from the original. The SPIHT advantage is even more
pronounced in encoding color images, because the bits are allocated automatically for local
optimality among the color components, unlike other algorithms that encode the color
components separately based on global statistics of the individual components.
If, after what we said, you are still not certain that you should believe us (because in the
past you heard claims like that and were then deeply disappointed), we understand your point of
view.
Progressive Image Transmission
In some systems with progressive image transmission (like WWW browsers) the quality
of the displayed images follows the sequence: (a) weird abstract art; (b) you begin to believe that
it is an image of something; (c) CGA-like quality; (d) lossless recovery. With very fast links the
transition from (a) to (d) can be so fast that you will never notice. With slow links (how "slow"
depends on the image size, colors, etc.) the time from one stage to the next grows exponentially,
and it may take hours to download a large image. Considering that it may be possible to recover
an excellent-quality image using 10-20 times less bits, it is easy to see the inefficiency.
Furthermore, the mentioned systems are not efficient even for lossless transmission.
The problem is that such widely used schemes employ a very primitive progressive
image transmission method. On the other extreme, SPIHT is a state-of-the-art method that was
designed for optimal progressive transmission (and still beats most non-progressive methods!). It
does so by producing a fully embedded coded file (see below), in a manner that at any moment
the quality of the displayed image is the best available for the number of bits received up to that
moment.
So, SPIHT can be very useful for applications where the user can quickly inspect the
image and decide whether it should really be downloaded, is good enough to be saved, or needs
refinement.
Optimized Embedded Coding
A strict definition of the embedded coding scheme is: if two files produced by the
encoder have size M and N bits, with M > N, then the file with size N is identical to the first N
bits of the file with size M.
Let's see how this abstract definition is used in practice. Suppose you need to compress
an image for three remote users. Each one has different image reproduction quality needs, and
you find that those qualities can be obtained with the image compressed to at least 8 Kb, 30 Kb,
and 80 Kb, respectively. If you use a non-embedded encoder (like JPEG), to save on
transmission costs (or time) you must prepare one file for each user. On the other hand, if you
use an embedded encoder (like SPIHT) then you can compress the image to a single 80 Kb file,
and then send the first 8 Kb of the file to the first user, the first 30 Kb to the second user, and the
whole file to the third user.
But what is the price to pay for this "amenity"? Surprisingly, with SPIHT all three users
would get (for the same file size) an image quality comparable or superior to the most
sophisticated non-embedded encoders available today. SPIHT achieves this feat by optimizing
the embedded coding process and always coding the most important information first.
An even more important application is for progressive image transmission, where the
user can decide at which point the image quality satisfies his needs, or abort the transmission
after a quick inspection, etc.
Compression Algorithm
Various papers describing SPIHT are available via anonymous ftp. Here we take the
opportunity to comment on how it is different from other approaches. The following is a
comparison of image quality and artifacts at high compression ratios versus JPEG.
SPIHT represents a small "revolution" in image compression because it broke the trend
toward ever more complex (in both the theoretical and the computational senses) compression
schemes. While researchers had been trying to improve previous schemes for image coding
using very sophisticated vector quantization, SPIHT achieved superior results using the simplest
method: uniform scalar quantization. Thus, it is much easier to design fast SPIHT codecs.
Encoding/Decoding Speed
The SPIHT process represents a very effective form of entropy-coding. This is shown by
the demo programs using two forms of coding: binary-uncoded (extremely simple) and context-
based adaptive arithmetic coded (sophisticated). Surprisingly, the difference in compression is
small, showing that it is not necessary to use slow methods (and also pay royalties for them!). A
fast version using Huffman codes was also successfully tested, but it is not publicly available.
A straightforward consequence of the compression simplicity is the greater
coding/decoding speed. The SPIHT algorithm is nearly symmetric, i.e., the time to encode is
nearly equal to the time to decode. (Complex compression algorithms tend to have encoding
times much larger than the decoding times.)
Some of our demo programs use floating-point operations extensively and can be slower
on some CPUs (floating point is better when people want to test your programs with strange 16
bpp images). However, this problem can be easily solved: try the lossless version to see an
example. Similarly, use for progressive transmission requires a somewhat more complex and
slower algorithm. Some shortcuts can be used if progressive transmission is not necessary.
When measuring speed please remember that these demo programs were written for
academic studies only, and were not fully optimized as are the commercial versions.
Applications
SPIHT exploits properties that are present in a wide variety of images. It has been
successfully tested on natural (portraits, landscapes, weddings, etc.) and medical (X-ray, CT, etc.)
images. Furthermore, its embedded coding process proved to be effective in a broad range of
reconstruction qualities. For instance, it can code fair-quality portraits and high-quality medical
images equally well (as compared with other methods in the same conditions).
SPIHT has also been tested for some less usual purposes, like the compression of
elevation maps, scientific data, and others.
Lossless Compression
SPIHT codes the individual bits of the image wavelet transform coefficients following a
bit-plane sequence. Thus, it is capable of recovering the image perfectly (every single bit of it)
by coding all bits of the transform. However, the wavelet transform yields perfect reconstruction
only if its numbers are stored as infinite-precision numbers. In practice it is frequently possible to
recover the image perfectly using rounding after recovery, but this is not the most efficient
approach.
For lossless compression we proposed an integer multiresolution transformation, similar
to the wavelet transform, which we called S+P transform. It solves the finite-precision problem
by carefully truncating the transform coefficients during the transformation (instead of after).
A codec that uses this transformation to yield efficient progressive transmission up to
lossless recovery is among the SPIHT demo programs. A surprising result obtained with this
codec is that for lossless compression it is as efficient as the most effective lossless encoders
(lossless JPEG is definitely not among them). In other words, the property that SPIHT yields
progressive transmission with practically no penalty in compression efficiency applies to lossless
compression too. Below are examples of Lossless and lossy (200:1) images decoded from the
same file.
Rate or Distortion Specification
Almost all image compression methods developed so far do not have precise rate control.
For some methods you specify a target rate, and the program tries to give something that is not
too far from what you wanted. For others you specify a "quality factor" and wait to see if the size
of the file fits your needs. (If not, just keep trying...)
The embedded coding property of SPIHT allows exact bit rate control, without any
penalty in performance (no bits wasted with padding, or whatever).
The same property also allows exact mean squared-error (MSE) distortion control. Even
though the MSE is not the best measure of image quality, it is far superior to other criteria used
for quality specification.
Error Protection
Errors in the compressed file cause havoc for practically all important image compression
methods. This is not exactly related to variable length entropy-coding, but to the necessity of
using context generation for efficient compression. For instance, Huffman codes have the ability
to quickly recover after an error. However, if they are used to code run-lengths, then that property
is useless because all runs after an error would be shifted.
SPIHT is not an exception to this rule. One difference, however, is that due to SPIHT's
embedded coding property, it is much easier to design efficient error-resilient schemes.
This happens because with embedded coding the information is sorted according to its
importance, and the requirement for powerful error correction codes decreases from the
beginning to the end of the compressed file. If an error is detected, but not corrected, the decoder
can discard the data after that point and still display the image obtained with the bits received
before the error. Also, with bit-plane coding the effects of an error are limited to the bit planes
below those already coded.
Another reason is that SPIHT generates two types of data. The first is sorting
information, which needs error protection as explained above. The second consists of
uncompressed sign and refinement bits, which do not need special protection because they affect
only one pixel.
While SPIHT can yield gains like 3 dB PSNR over methods like JPEG, its use in noisy
channels, combined with error protection as explained above, leads to much larger gains, like 6-
12 dB. (Such high coding gains are frequently viewed with skepticism, but they do make sense
for combined source-channel coding schemes.)
SPIHT Optimized Algorithm
The following are the suite of application specific SPIHT compression products:
D-SPIHT (Dynamic)
The D-SPIHT software is capable of the most efficient compression of monochrome, 1
and 2 byte per pel, and color images. It has the features of specifying bit rate or quality at
encoding time. For bit rate specification, the compressed file is completely and finely rate-
embedded. Rate-embedded means that any lower rate D-SPIHT file is a subset of the full file
and can be decoded to the given smaller rate. The decoding has the special feature of producing
smaller resolution image reconstructions (such as thumbnails) directly from the compressed file
without offline processing.
S-SPIHT (Striped)
The S-SPIHT software shares many features of D-SPIHT, but it works with a small
memory. The small memory leads to a slight degradation in performance compared to D-
SPIHT. Therefore, this software is ideal for compression of very large images, such as those
from GIS and remote sensing systems. It is both efficient and fast. A good strategy for
compression of images of any size is to use a combination of D-SPIHT and S-SPIHT, with S-
SPIHT taking over from D-SPIHT when an image dimension exceeds 1024 or 2048.
T-SPIHT (Tiled)
T-SPIHT is a specially designed form of SPIHT that compresses large images in tiles.
Remote sensing, geographical, and meteorological data are often handled in tiles rather than all
at once. Therefore, systems for processing such data would prefer to use a compression by tiles.
T-SPIHT can efficiently compress, with constant quality and without tile boundary artifacts,
large images or data sets, whether in tile format or not. In fact, the compression is significantly
more efficient than JPEG2000 in its tiled compression mode.
P-SPIHT (Photo-ID)
P-SPIHT is especially designed to efficiently compress small grayscale and color
images. It is more efficient in this application than JPEG2000 or any other competing software.
It is ideal for storage of photo IDs on plastic cards with magnetic stripes.
PROGRES (Progressive Resolution Decompression)
PROGRES is a progressive resolution, extremely fast version of SPIHT that has full
capability of random access decoding. It chooses a quality factor for lossy compression of any
size image and can decompress a given arbitrary region to any resolution very quickly. It comes
either in a command line version for WINDOWS/DOS or UNIX (or Linux) or with additional
region-of-interest (ROI) encoding and decoding in a UNIX (or Linux) GUI version. It is an
excellent choice for remote sensing and GIS applications, where rapid browsing of large images
is necessary.
PROGCODE (Lossless & Progressive)
PROGCODE is our pioneering progressive lossy to purely lossless (reversible) grayscale
compression algorithm. It sets the standard for lossless compression. For larger images, the
compression is still efficient, but the larger memory utilization slows down execution. This
algorithm can efficiently compress medical images either lossless or with any desired rate.
Moreover, the lossless file can act as an archive in a file server, from within which any desired
lossy lower rate file can be extracted and decompressed directly without offline processing. This
software is also ideal for multi-user storage or transmission systems, where users with different
capabilities and requirements can receive the image to their desired accuracy from a single file or
transmission.
SPM (Fastest Lossless!)
SPM software is especially designed for lossless and lossy compression of medical
images. The method is not based on SPIHT, but on our patented quadrisection splitting algorithm
called AGP (Amplitude and Group Partitioning).
SPM calls for a quality factor, with 0 giving perfectly lossless compression. Because the
intended application is medical images, the allowed lossy compression adheres to degrees of
high quality. The outstanding characteristics of SPM, besides efficiency, are very fast
compression and decompression, regardless of image size. It comes either in a command line
version for WINDOWS/DOS or UNIX (or Linux) or in a WINDOWS GUI version. All versions
allow reduced resolution decoding from a single compressed file.
IMAGE FUSION
Multi-sensor image fusion is the process of combining relevant information from two or
more images into a single image. The resulting image will be more informative than any of the
input images.
In remote sensing applications, the increasing availability of space borne sensors gives a
motivation for different image fusion algorithms. Several situations in image processing require
high spatial and high spectral resolution in a single image. Most of the available equipment is not
capable of providing such data convincingly. The image fusion techniques allow the integration
of different information sources. The fused image can have complementary spatial and spectral
resolution characteristics. However, the standard image fusion techniques can distort the spectral
information of the multispectral data while merging.
In satellite imaging, two types of images are available. The panchromatic image acquired
by satellites is transmitted with the maximum resolution available and the multispectral data are
transmitted with coarser resolution. This will usually be two or four times lower. At the receiver
station, the panchromatic image is merged with the multispectral data to convey more
information.
Many methods exist to perform image fusion. The very basic one is the high pass filtering
technique. Later techniques are based on DWT, uniform rational filter bank, and laplacian
pyramid.
Why Image Fusion
Multi-sensor data fusion has become a discipline for which more and more general formal
solutions to a number of application cases are demanded. Several situations in image processing
simultaneously require high spatial and high spectral information in a single image. This is
important in remote sensing. However, the instruments are not capable of providing such
information, either by design or because of observational constraints. One possible solution is
data fusion.
Standard Image Fusion Methods
Image fusion methods can be broadly classified into two - spatial domain fusion and
transform domain fusion.
The fusion methods such as averaging, Brovey method, principal component analysis
(PCA) and IHS based methods fall under spatial domain approaches. Another important spatial
domain fusion method is the high pass filtering based technique. Here the high frequency details
are injected into an up sampled version of the MS images. The disadvantage of spatial domain
approaches is that they produce spatial distortion in the fused image. Spectral distortion becomes
a negative factor when we go for further processing, such as classification. Spatial distortion can
be handled very well by transform domain approaches to image fusion. Multi resolution analysis
has become a very useful tool for analyzing remote sensing images, and the discrete wavelet
transform in particular has become a very useful tool for fusion. Other fusion methods also exist,
such as Laplacian pyramid based and curvelet transform based methods. These methods show a
better performance in the spatial and spectral quality of the fused image compared to other
spatial methods of fusion.
The images used in image fusion should already be registered. Misregistration is a major
source of error in image fusion. Some well-known image fusion methods are:
High pass filtering technique
IHS transform based image fusion
PCA based image fusion
Wavelet transform image fusion
pair-wise spatial frequency matching
Applications
1. Image Classification
2. Aerial and Satellite imaging
3. Medical imaging
4. Robot vision
5. Concealed weapon detection
6. Multi-focus image fusion
7. Digital camera application
8. Battle field monitoring
Satellite Image Fusion
Several methods exist for merging satellite images. In satellite imagery we can have two
types of images:
Panchromatic images - An image collected in the broad visual wavelength range but
rendered in black and white.
Multispectral images - Images optically acquired in more than one spectral or
wavelength interval. Each individual image is usually of the same physical area and scale
but of a different spectral band.
The SPOT PAN satellite provides high resolution (10 m pixel) panchromatic data, while the
LANDSAT TM satellite provides low resolution (30 m pixel) multispectral images. Image fusion
attempts to merge these images and produce a single high resolution multispectral image.
The standard merging methods of image fusion are based on Red-Green-Blue (RGB) to
Intensity-Hue-Saturation (IHS) transformation. The usual steps involved in satellite image fusion
are as follows:
1. Resize the low resolution multispectral images to the same size as the panchromatic
image.
2. Transform the R,G and B bands of the multispectral image into IHS components.
3. Modify the panchromatic image with respect to the multispectral image. This is usually
performed by histogram matching of the panchromatic image, with the intensity component of
the multispectral image as reference.
4. Replace the intensity component by the panchromatic image and perform the inverse
transformation to obtain a high resolution multispectral image (a MATLAB sketch of these
steps follows).
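The four steps above can be sketched in MATLAB roughly as follows; rgb2hsv/hsv2rgb are used
here only as an approximation of an exact IHS transform, imhistmatch is from the Image
Processing Toolbox, and the file names are placeholders.
ms  = im2double(imread('multispectral.tif'));   % placeholder file names
pan = im2double(imread('panchromatic.tif'));
ms  = imresize(ms, size(pan));                  % step 1: resize MS to the PAN size
hsv = rgb2hsv(ms);                              % step 2: RGB to (approximate) IHS
panM = imhistmatch(pan, hsv(:,:,3));            % step 3: match PAN to the intensity band
hsv(:,:,3) = panM;                              % step 4: replace intensity...
fused = hsv2rgb(hsv);                           % ...and invert the transform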
Medical Image Fusion
Image fusion has become a common term used within medical diagnostics and treatment.
The term is used when multiple patient images are registered and overlaid or merged to provide
additional information. Fused images may be created from multiple images from the same
imaging modality, or by combining information from multiple modalities, such as magnetic
resonance image (MRI), computed tomography (CT), positron emission tomography (PET), and
single photon emission computed tomography (SPECT). In radiology and radiation oncology,
these images serve different purposes. For example, CT images are used more often to ascertain
differences in tissue density while MRI images are typically used to diagnose brain tumors.
For accurate diagnoses, radiologists must integrate information from multiple image
formats. Fused, anatomically-consistent images are especially beneficial in diagnosing and
treating cancer. Companies such as Nicesoft, Velocity Medical Solutions, Mirada Medical,
Keosys, MIMvista, IKOE, and BrainLAB have recently created image fusion software for both
improved diagnostic reading, and for use in conjunction with radiation treatment planning
systems. With the advent of these new technologies, radiation oncologists can take full
advantage of intensity modulated radiation therapy (IMRT). Being able to overlay diagnostic
images onto radiation planning images results in more accurate IMRT target tumor volumes.
RESTORATION
Image restoration refers to the recovery of an original signal from degraded
observations. Image restoration is different from image enhancement in that the latter is designed
to emphasize features of the image that make the image more pleasing to the observer, but not
necessarily to produce realistic data from a scientific point of view. Image enhancement
techniques (like contrast stretching or de-blurring by a nearest neighbor procedure) provided by
"Imaging packages" use no a priori model of the process that created the image.
With image enhancement, noise can effectively be removed by sacrificing some resolution, but
this is not acceptable in many applications. In a fluorescence microscope, resolution in the z-
direction is poor as it is. More advanced image processing techniques must be applied to recover
the object.
Deconvolution is an example of an image restoration method. It is capable of:
Increasing resolution especially in the axial direction
Removing noise
Increasing contrast
Since axial imaging performance is the prime reason for researchers to invest in expensive
optical equipment like confocal or two-photon excitation microscopes, the capability of
increasing axial resolution with a 'mere' software technique has considerable value.
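In MATLAB, a minimal sketch of such deblurring can be written with deconvblind, an iterative
blind deconvolution routine from the Image Processing Toolbox; the file name, initial PSF and
iteration count below are assumptions for illustration.
g = im2double(imread('blurred.tif'));             % degraded observation (placeholder)
initPSF = fspecial('gaussian', 7, 2);             % initial guess for the unknown PSF
[f_hat, psf_hat] = deconvblind(g, initPSF, 20);   % 20 iterations of blind deconvolution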
In image restoration the information provided by the microscope is only taken as indirect
evidence about the object. By itself the image need not even be viewable.
A microscopic image contains more information than is readily visible in the image.
Often details are hidden in the noise or masked by other features.
Artifacts may confuse the viewer.
Information may be present in implicit form, so it can only be retrieved with the addition
of a priori knowledge.
INTERPOLATION
In the mathematical field of numerical analysis, interpolation is a method of
constructing new data points within the range of a discrete set of known data points.
In engineering and science, one often has a number of data points, obtained by sampling
or experimentation, which represent the values of a function for a limited number of values of
the independent variable. It is often required to interpolate (i.e. estimate) the value of that
function for an intermediate value of the independent variable. This may be achieved by curve
fitting or regression analysis.
A different problem which is closely related to interpolation is the approximation of a
complicated function by a simple function. Suppose we know the formula for the function but it
is too complex to evaluate efficiently. Then we could pick a few known data points from the
complicated function, creating a lookup table, and try to interpolate those data points by
constructing a simpler function. Of course, when using the simple function to estimate new data
points we usually do not receive the same result as we would if we had used the original
function, but depending on the problem domain and the interpolation method used the gain in
simplicity might offset the error.
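A small MATLAB example of this lookup-table idea: sample a "complicated" function once, then
estimate intermediate values cheaply with interp1 (the function and sample counts are arbitrary
choices for illustration).
x  = linspace(0, 2*pi, 16);         % a few known data points (the lookup table)
y  = besselj(0, x.^2);              % the complicated function, evaluated once
xi = linspace(0, 2*pi, 1000);       % intermediate values of the independent variable
yi = interp1(x, y, xi, 'linear');   % cheap estimates from the simpler piecewise linear function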
There is also another very different kind of interpolation in mathematics, namely the
"interpolation of operators".
A Lifting Scheme Version of the Daubechies (D4) Transform
Lifting scheme wavelet transforms are composed of update and predict steps. In this case a
normalization step has been added as well.
Fig. Two-stage Daubechies D4 forward lifting wavelet transform (LWT)
The split step divides the input data into even elements, which are stored in the first half of an N
element array section (S[0] to S[half-1]), and odd elements, which are stored in the second half
(S[half] to S[N-1]). In the forward transform equations below the expression S[half+n] references
an odd element and S[n] an even element. LL represents the low frequency components and LH,
HL, HH are the high frequency components in the horizontal, vertical and diagonal directions.
The forward step equations are:
Update 1 (U1):
for n = 0 to half-1
S[n] = S[n] + √3·S[half+n]
Predict (P1):
S[half] = S[half] - (√3/4)·S[0] - ((√3-2)/4)·S[half-1]
for n = 1 to half-1
S[half+n] = S[half+n] - (√3/4)·S[n] - ((√3-2)/4)·S[n-1]
Update 2 (U2):
for n = 0 to half-2
S[n] = S[n] - S[half+n+1]
S[half-1] = S[half-1] - S[half]
Normalize (N):
for n = 0 to half-1
S[n] = ((√3-1)/√2)·S[n]
S[half+n] = ((√3+1)/√2)·S[half+n]
The inverse transform is a mirror of the forward transform, in which the addition and subtraction
operations are interchanged and the steps are applied in reverse order.
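The forward steps above translate directly into MATLAB; the following one-stage sketch assumes
the split has already placed the even samples in the first half of S and the odd samples in the
second half (1-based indexing shifts the loop bounds by one).
function S = d4ForwardLift(S)
% One stage of the Daubechies D4 forward lifting transform (sketch).
N = numel(S); half = N/2;
for n = 1:half                             % Update 1
    S(n) = S(n) + sqrt(3)*S(half+n);
end
S(half+1) = S(half+1) - (sqrt(3)/4)*S(1) - ((sqrt(3)-2)/4)*S(half);
for n = 2:half                             % Predict
    S(half+n) = S(half+n) - (sqrt(3)/4)*S(n) - ((sqrt(3)-2)/4)*S(n-1);
end
for n = 1:half-1                           % Update 2
    S(n) = S(n) - S(half+n+1);
end
S(half) = S(half) - S(half+1);
S(1:half)   = ((sqrt(3)-1)/sqrt(2)) * S(1:half);     % Normalize: low pass half
S(half+1:N) = ((sqrt(3)+1)/sqrt(2)) * S(half+1:N);   % Normalize: high pass half
end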
MATHEMATICAL FORMULATION FOR THE SUPER RESOLUTION MODEL
We give the mathematical model for super resolution image reconstruction from a set of
Low Resolution (LR) images. LR means that the pixel density within an image is low, therefore
offering fewer details. The CCD discretizes the images and produces a digitized noisy image.
Imaging systems do not sample the scene according to the Nyquist criterion; as a result the high
frequency contents of the images are destroyed, and the images appear as LR. Let the low
resolution sensor plane be of size M1 × M2. The low resolution intensity values are denoted
{y(i, j)}, where i = 0 ... M1-1 and j = 0 ... M2-1. If the down sampling parameters are q1 and q2
in the horizontal and vertical directions, then the high resolution image will be of size
q1M1 × q2M2. We assume that q1 = q2 = q, and therefore the desired high resolution image Z
will have intensity values {z(k, l)}, where k = 0 ... qM1-1 and l = 0 ... qM2-1. Given {z(k, l)},
the process of obtaining the down sampled LR aliased image {y(i, j)} is

y(i, j) = (1/q²) Σ(k = qi ... qi+q-1) Σ(l = qj ... qj+q-1) z(k, l),

i.e. the low resolution intensity is the average of the high resolution intensities over a
neighborhood of q² pixels. We formally state the problem by casting it in a low resolution
restoration framework. There are P observed images {y_m}, m = 1 ... P, each of size M1 × M2,
which are decimated, blurred and noisy versions of a single high resolution image Z of size
N1 × N2, where N1 = qM1 and N2 = qM2. After incorporating the blur matrix and noise vector,
the image formation model is written as

y_m = D H z + η_m,    m = 1 ... P,

where D is the decimation matrix of size M1M2 × q²M1M2, H is the PSF (blur) matrix of size
q²M1M2 × q²M1M2, η_m is the M1M2 × 1 noise vector, and P is the number of low resolution
observations. Stacking the P vector equations from the different low resolution images gives a
single matrix-vector equation.
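The observation model lends itself to a direct MATLAB simulation; the following sketch
generates P low resolution frames from one HR image (the file name, PSF, inter-frame shift and
noise level are illustrative assumptions).
Z = im2double(imread('hr_image.tif'));        % HR image of size qM1 x qM2 (placeholder)
q = 2;  P = 3;
H = fspecial('gaussian', 5, 1);               % assumed blur PSF
y = cell(1, P);
for m = 1:P
    Zs = circshift(Z, [m-1, m-1]);            % global shift between frames (integer here)
    Zb = imfilter(Zs, H, 'symmetric');        % blur H
    Zd = imresize(Zb, 1/q, 'box');            % decimation D: q x q block averaging
    y{m} = imnoise(Zd, 'gaussian', 0, 1e-4);  % additive noise eta_m
end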
SPIHT ALGORITHM
Set Partitioning in Hierarchical Trees (SPIHT) is an image compression algorithm that
exploits the inherent similarities across the sub bands in a wavelet decomposition of an image.
The algorithm codes the most important wavelet transform coefficients first, and transmits the
bits so that an increasingly refined copy of the original image can be obtained progressively.
SPIHT consists of two passes, the ordering pass and the refinement pass. In the ordering pass
SPIHT attempts to order the coefficients according to their magnitude. In the refinement pass the
quantization of the coefficients is refined. The ordering and refining are made relative to a
threshold, which is appropriately initialized and then continuously made smaller with each round
of the algorithm. SPIHT maintains three lists of coordinates of coefficients in the decomposition:
the List of Insignificant Pixels (LIP), the List of Significant Pixels (LSP) and the List of
Insignificant Sets (LIS).
To decide if a coefficient is significant or not, SPIHT uses the following definition: a
coefficient is deemed significant at a certain threshold if its magnitude is larger than or equal to
the threshold. Using this notion of significance, the LIP, LIS and LSP can be explained. The LIP
contains coordinates of coefficients that are insignificant at the current threshold; the LSP
contains the coordinates of coefficients that are significant at the same threshold; the LIS
contains coordinates of the roots of the spatial parent-children trees.
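The significance test and threshold schedule at the heart of SPIHT are simple to state in
MATLAB; the sketch below illustrates only the test itself, not the list management.
C = randn(8) * 100;                     % stand-in wavelet coefficient matrix
nmax = floor(log2(max(abs(C(:)))));     % most significant bit plane
T = 2^nmax;                             % initial threshold
sig = abs(C) >= T;                      % significance map for this pass
T = T / 2;                              % threshold is halved after each round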
PROPOSED SUPER RESOLUTION RECONSTRUCTION
Images are obtained in areas ranging from everyday photography to astronomy, remote
sensing, medical imaging and microscopy. In each case there is an underlying object or scene we
wish to observe; the original or true image is the ideal representation of the observed scene. Yet
the observation process is never perfect: there is uncertainty in the measurement, occurring as
blur, noise and other degradations in the recorded images. The low resolution observation model
of eq. (2) is considered. In our proposed approach we suggest a general framework for super
resolution reconstruction using lifting wavelet transforms. The proposed algorithm is based on
breaking the problem into three consecutive steps to work on large dimension images:
registration, lifting scheme based fusion, and restoration.
A. Algorithm for Super Resolution Reconstruction using lifting schemes and SPIHT
Step 1: Three input low resolution blurred, noisy, under sampled, rotated, shifted and
compressed images are considered.
Step 2: The images are first preprocessed, i.e. registered using the FFT based algorithm proposed
in [11].
Step 3: The registered low resolution images are decomposed using lifting schemes to a specified
number of levels. At each level we will have one approximation, i.e. the LL sub band, and 3
detail sub bands, i.e. the LH, HL, HH coefficients.
Step 4: Each low resolution image is encoded using SPIHT.
Step 5: The decomposed images are fused using the fusion rule, i.e. maximum frequency fusion:
"fusion by averaging: for each band of decomposition and for each channel the wavelet
coefficients of the three images are averaged". That is, approximation and detail coefficients are
fused separately (see the sketch after these steps).
Step 6: The inverse lifting scheme is applied to obtain the fused image.
Step 7: The fused image is decoded using DSPIHT.
Step 8: Restoration is performed in order to remove the blur and noise present in the image.
Step 9: Most of the additive noise will be eliminated during the fusion process; to achieve further
noise reduction soft thresholding is applied, whereas the image is deblurred using the Iterative
Blind De-convolution (IBD) algorithm.
Step 10: Finally a super resolution image is obtained by wavelet based interpolation.
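Steps 3, 5 and 6 can be sketched with the (older) Wavelet Toolbox lifting functions liftwave,
lwt2 and ilwt2; the wavelet choice and the variables y1, y2, y3 (the registered LR images) are
assumptions made for illustration.
LS = liftwave('db2');                        % a lifting scheme (wavelet choice assumed)
[A1,H1,V1,D1] = lwt2(im2double(y1), LS);     % Step 3: lifting decomposition
[A2,H2,V2,D2] = lwt2(im2double(y2), LS);
[A3,H3,V3,D3] = lwt2(im2double(y3), LS);
A  = (A1+A2+A3)/3;   Hf = (H1+H2+H3)/3;      % Step 5: average approximation and
V  = (V1+V2+V3)/3;   Df = (D1+D2+D3)/3;      % detail sub bands separately
fused = ilwt2(A, Hf, V, Df, LS);             % Step 6: inverse lifting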
SIMULATION RESULTS
A (256 × 256) image is considered to be the original high resolution image, from which
three noisy, blurred, under sampled, compressed and misregistered LR images are created. We
have tested using both motion blur with an angle of 10, 20 and 30 degrees and Gaussian blur of
size 3×3, 5×5 and 7×7. Gaussian white noise with an SNR of 20, 30 and 40 dB is added to the
blurred low resolution images. A high resolution image of size 512 × 512 is constructed from the
LR images using LWT and SPIHT. Table 1 shows the comparison of the lifting scheme based
super resolution reconstruction with DWT based super resolution reconstruction using
Daubechies (D4) wavelets.
SUPER RESOLUTION PERFORMANCE MEASUREMENT
In order to measure the performance of the super resolution algorithm, we have used
objective image quality measures such as MSE, PSNR and ISNR (Improvement in Signal to
Noise Ratio).
1) Improvement in Signal-to-Noise Ratio (ISNR)
For the purpose of objectively testing the performance of the restored image, improvement in
signal to noise ratio (ISNR) is used as the criterion, defined by

ISNR = 10·log10 [ Σi Σj (f(i, j) - y(i, j))² / Σi Σj (f(i, j) - g(i, j))² ],

where i and j run over the pixels in the horizontal and vertical dimensions of the image, and
f(i, j), y(i, j) and g(i, j) are the original, degraded and restored images respectively.
2) The MSE of the reconstructed image is

MSE = (1/N²) Σi Σj [ f(i, j) - F(i, j) ]²,

where f(i, j) is the source image and F(i, j) is the reconstructed image, which contains N × N
pixels.
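For images scaled to [0, 1], the three measures can be computed in a few lines of MATLAB;
f, y and g below are assumed to hold the original, degraded and restored images (same size,
class double).
mseVal  = mean((f(:) - g(:)).^2);                                   % MSE
psnrVal = 10*log10(1 / mseVal);                                     % PSNR in dB
isnrVal = 10*log10(sum((f(:)-y(:)).^2) / sum((f(:)-g(:)).^2));      % ISNR in dB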
CONCLUSION
Super resolution techniques have proved to be useful in many different applications in terms of
quality and computational complexity. A hardware and software implementation of SRR is
possible. SRR can be incorporated as a feature in video editing software, and cellular networks
and video sites such as YouTube could utilize super resolution capabilities to improve the quality
of video clips taken by cell phones. The advantage of the proposed method is that lifting schemes
are used for the reconstruction of a high resolution image. Wavelet lifting schemes are faster and
easier to implement, the inverse transform has exactly the same complexity as the forward
transform, less memory is required, and they can be used on arbitrary geometries. We have tested
on both real time and synthetic images and found that our proposed super resolution
reconstruction has better ISNR and PSNR when compared to DWT based super resolution
reconstruction. In our proposed approach, we use FFT based image registration to register the
rotated and shifted images, the SPIHT algorithm for encoding and the DSPIHT algorithm for
decoding, and finally we perform iterative blind deconvolution to deblur the image. Hence our
proposed super resolution reconstruction outperforms the super resolution reconstruction as
proposed by .
REFERENCES
[1] S. Susan Young, Ronald G. Driggers, Eddie L. Jacobs, "Signal Processing and Performance
Analysis for Imaging Systems", Artech House, 2008.
[2] S. Chaudhuri, Ed., "Super-Resolution Imaging", Norwell, MA: Kluwer, 2001.
[3] S. C. Park, M. K. Park, and M. G. Kang, "Super-resolution image reconstruction: a technical
review," IEEE Signal Processing Magazine, vol. 20, pp. 21-36, May 2003.
[4] D. Sale, R. R. Schultz, and R. J. Szczerba, "Super-resolution enhancement of night vision
image sequences," in Proc. IEEE International Conference on Systems, Man, and Cybernetics,
vol. 3, pp. 1633-1638, Nashville, Tenn., USA, October 2000.
[5] B. K. Gunturk, A. U. Batur, Y. Altunbasak, M. H. Hayes III, and R. M. Mersereau,
"Eigenface-based super-resolution for face recognition," in Proc. IEEE International Conference
on Image Processing (ICIP '02), vol. 2, pp. 845-848, Rochester, NY, USA, September 2002.
[6] Priyam Chatterjee, Sujata Mukherjee, Subhasis Chaudhuri and Guna Seetharaman,
"Application of Papoulis-Gerchberg method in image super resolution and inpainting," The
Computer Journal, 2007.
[7] G. R. Ayers and J. C. Dainty, "Iterative blind deconvolution," Optics Letters, vol. 13, no. 7,
July 1988.
[8] Barbara Zitová and J. Flusser, "Image registration methods: a survey," Image and Vision
Computing, vol. 21, 2003, Elsevier.
[9] Hao Yang, Jianpo Gao and Zhenyang Wu, "Blur identification and image super resolution
reconstruction using an approach similar to variable projection," IEEE Signal Processing Letters,
vol. 15, 2008.
[10] A. Jensen, A. la Cour-Harbo, "Ripples in Mathematics: The Discrete Wavelet Transform",
Springer.
[11] E. De Castro and C. Morandi, "Registration of translated and rotated images using finite
Fourier transforms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9,
no. 5, Sep. 1987.
[12] Wi Zhang, Jinzhong Yang, Xiahong Wang and Qinghu Yang, "Fusion of remote sensing
images based on lifting wavelet transform," Computer and Information Science, vol. 2, no. 1,
Feb. 2009.
[13] N. K. Bose, Mahesh B. Chappali, "A second generation wavelet framework for super
resolution with noise filtering," Wiley Periodicals, 2004.
[14] Wim Sweldens, "The lifting scheme: a construction of second generation wavelets," SIAM
Journal on Mathematical Analysis.
[15] W. Lim, M. Park, and M. G. Kang, "Spatially adaptive regularized iterative high resolution
image reconstruction algorithm," in Proc. VCIP 2001, Photonics West, San Jose, CA, Jan. 2001,
pp. 20-26.
[16] Khalid Sayood, "Introduction to Data Compression", Third Edition.
[17] Ping-Sing Tsai, Tinku Acharya, "Image upsampling using DWT".
Appendix A
MATLAB
A.1 Introduction
MATLAB is a high-performance language for technical computing. It integrates
computation, visualization, and programming in an easy-to-use environment where problems and
solutions are expressed in familiar mathematical notation. MATLAB stands for matrix
laboratory, and was written originally to provide easy access to matrix software developed by the
LINPACK (linear system package) and EISPACK (eigensystem package) projects. MATLAB is
therefore built on a foundation of sophisticated matrix software in which the basic element is an
array that does not require pre-dimensioning, which allows solving many technical computing
problems, especially those with matrix and vector formulations, in a fraction of the time.
MATLAB features a family of application-specific solutions called toolboxes. Very
important to most users of MATLAB, toolboxes allow learning and applying specialized
technology. These are comprehensive collections of MATLAB functions (M-files) that extend
the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are
available include signal processing, control system, neural networks, fuzzy logic, wavelets,
simulation and many others.
Typical uses of MATLAB include: math and computation; algorithm development; data
acquisition; modeling, simulation, and prototyping; data analysis, exploration, and visualization;
scientific and engineering graphics; and application development, including graphical user
interface building.
A.2 Basic Building Blocks of MATLAB
The basic building block of MATLAB is the matrix. The fundamental data type is the
array; vectors, scalars, real matrices and complex matrices are handled as specific classes of this
basic data type. The built-in functions are optimized for vector operations. No dimension
statements are required for vectors or arrays.
A.2.1 MATLAB Window
MATLAB provides several windows: the command window, workspace window,
current directory window, command history window, editor window, graphics window and
online help window.
A.2.1.1 Command Window
The command window is where the user types MATLAB commands and expressions at
the prompt (>>) and where the output of those commands is displayed. It is opened when the
application program is launched. All commands including user-written programs are typed in
this window at MATLAB prompt for execution.
A.2.1.2 Work Space Window
MATLAB defines the workspace as the set of variables that the user creates in a work
session. The workspace browser shows these variables and some information about them.
Double clicking on a variable in the workspace browser launches the Array Editor, which can be
used to obtain information.
A.2.1.3 Current Directory Window
The current Directory tab shows the contents of the current directory, whose path is
shown in the current directory window. For example, in the windows operating system the path
might be as follows: C:\MATLAB\Work, indicating that directory “work” is a subdirectory of
the main directory “MATLAB”, which is installed in drive C. Clicking on the arrow in the
current directory window shows a list of recently used paths. MATLAB uses a search path to
find M-files and other MATLAB related files. Any file run in MATLAB must reside in the
current directory or in a directory that is on search path.
A.2.1.4 Command History Window
The Command History Window contains a record of the commands a user has entered in
the command window, including both current and previous MATLAB sessions. Previously
entered MATLAB commands can be selected and re-executed from the command history
window by right clicking on a command or sequence of commands. This launches a menu
offering various options in addition to executing the commands, a useful feature when
experimenting with various commands in a work session.
A.2.1.5 Editor Window
The MATLAB editor is both a text editor specialized for creating M-files and a graphical
MATLAB debugger. The editor can appear in a window by itself, or it can be a sub window in
the desktop. In this window one can write, edit, create and save programs in files called M-files.
The MATLAB editor window has numerous pull-down menus for tasks such as saving,
viewing, and debugging files. Because it performs some simple checks and also uses color to
differentiate between various elements of code, this text editor is recommended as the tool of
choice for writing and editing M-functions.
A.2.1.6 Graphics or Figure Window
The output of all graphic commands typed in the command window is seen in this
window.
A.2.1.7 Online Help Window
MATLAB provides online help for all its built-in functions and programming language
constructs. The principal way to get help online is to use the MATLAB Help Browser, opened as
a separate window either by clicking on the question mark symbol (?) on the desktop toolbar, or
by typing helpbrowser at the prompt in the command window. The Help Browser is a web
browser integrated into the MATLAB desktop that displays Hypertext Markup Language
(HTML) documents. The Help Browser consists of two panes: the help navigator pane, used to
find information, and the display pane, used to view the information. Self-explanatory tabs other
than the navigator pane are used to perform a search.
A.3 MATLAB Files
MATLAB has two types of files for storing information: M-files and MAT-files.
A.3.1 M-Files
These are standard ASCII text files with a .m extension to the file name; they contain
MATLAB code. The MATLAB editor or another text editor is used to create a file containing the
same statements that would be typed at the MATLAB command line, and the file is saved under
a name that ends in .m. There are two types of
M-files:
1. Script Files
It is an M-file with a set of MATLAB commands in it and is executed by typing the name
of the file on the command line. These files work on global variables currently present in that
environment.
2. Function Files
A function file is also an M-file, except that the variables in a function file are all local.
This type of file begins with a function definition line.
A.3.2 MAT-Files
These are binary data files with a .mat extension that are created by MATLAB when data
is saved. The data is written in a special format that only MATLAB can read, and is loaded back
into MATLAB with the load command.
A.4 The MATLAB System:
The MATLAB system consists of five main parts:
A.4.1 Development Environment:
This is the set of tools and facilities that help you use MATLAB functions and files. Many
of these tools are graphical user interfaces. It includes the MATLAB desktop and Command
Window, a command history, an editor and debugger, and browsers for viewing help, the
workspace, files, and the search path.
A.4.2 The MATLAB Mathematical Function Library:
This is a vast collection of computational algorithms ranging from elementary functions
like sum, sine, cosine, and complex arithmetic, to more sophisticated functions like matrix
inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.
A.4.3 The MATLAB Language:
This is a high-level matrix/array language with control flow statements, functions, data
structures, input/output, and object-oriented programming features. It allows both "programming
in the small" to rapidly create quick and dirty throw-away programs, and "programming in the
large" to create complete large and complex application programs.
A.4.4 Graphics:
MATLAB has extensive facilities for displaying vectors and matrices as graphs, as well as
annotating and printing these graphs. It includes high-level functions for two-dimensional and
three-dimensional data visualization, image processing, animation, and presentation graphics. It
also includes low-level functions that allow you to fully customize the appearance of graphics as
well as to build complete graphical user interfaces on your MATLAB applications.
A.4.5 The MATLAB Application Program Interface (API):
This is a library that allows you to write C and FORTRAN programs that interact with
MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking), calling
MATLAB as a computational engine, and for reading and writing MAT-files.
A.5 SOME BASIC COMMANDS:
pwd prints working directory
demo demonstrates what is possible in MATLAB
who lists all of the variables in your MATLAB workspace
whos lists the variables and describes their matrix size
clear erases variables and functions from memory
clear x erases the matrix 'x' from your workspace
close by itself, closes the current figure window
figure creates an empty figure window
hold on holds the current plot and all axis properties so that subsequent graphing
commands add to the existing graph
hold off sets the next plot property of the current axes to "replace"
find find indices of nonzero elements e.g.:
d = find(x>100) returns the indices of the vector x that are greater than 100
break terminate execution of m-file or WHILE or FOR loop
for repeat statements a specific number of times, the general form of a FOR
statement is:
FOR variable = expr, statement, ..., statement END
for n=1:cc/c;
magn(n,1)=nanmean(a((n-1)*c+1:n*c,1));   % nanmean: mean ignoring NaN entries
end
diff difference and approximate derivative e.g.:
DIFF(X) for a vector X, is [X(2)-X(1) X(3)-X(2) ... X(n)-X(n-1)].
NaN the arithmetic representation for Not-a-Number, a NaN is obtained as a
result of mathematically undefined operations like 0.0/0.0
Inf the arithmetic representation for positive infinity; an infinity is also produced
by operations like dividing by zero, e.g. 1.0/0.0, or from overflow, e.g. exp(1000).
save saves all the matrices defined in the current session into the file,
matlab.mat, located in the current working directory
load loads contents of matlab.mat into current workspace
save filename x y z saves the matrices x, y and z into the file titled filename.mat
save filename x y z -ascii saves the matrices x, y and z into the file titled filename.dat
load filename loads the contents of filename into current workspace; the file can
be a binary (.mat) file
load filename.dat loads the contents of filename.dat into the variable filename
xlabel(' ') : allows you to label the x-axis
ylabel(' ') : allows you to label the y-axis
title(' ') : allows you to give a title to the plot
subplot() : allows you to create multiple plots in the same window
A.6 SOME BASIC PLOT COMMANDS:
Kinds of plots:
plot(x,y) creates a Cartesian plot of the vectors x & y
plot(y) creates a plot of y vs. the numerical values of the elements in the y-vector
semilogx(x,y) plots log(x) vs y
semilogy(x,y) plots x vs log(y)
loglog(x,y) plots log(x) vs log(y)
polar(theta,r) creates a polar plot of the vectors r & theta where theta is in radians
bar(x) creates a bar graph of the vector x. (Note also the command stairs(x))
bar(x, y) creates a bar-graph of the elements of the vector y, locating the bars
according to the vector elements of 'x'
Plot description:
grid creates a grid on the graphics plot
title('text') places a title at top of graphics plot
xlabel('text') writes 'text' beneath the x-axis of a plot
ylabel('text') writes 'text' beside the y-axis of a plot
text(x,y,'text') writes 'text' at the location (x,y)
text(x,y,'text','sc') writes 'text' at point x,y assuming lower left corner is (0,0)
and upper right corner is (1,1)
axis([xmin xmax ymin ymax]) sets scaling for the x- and y-axes on the current plot
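A short script combining several of the commands above (the data is arbitrary):
x = 0:0.1:2*pi;
y = sin(x);
plot(x, y)             % Cartesian plot of x versus y
grid                   % add a grid to the plot
title('Sine wave')
xlabel('x (radians)')
ylabel('sin(x)')
axis([0 2*pi -1.2 1.2])   % fix the axis scaling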
A.7 ALGEBRAIC OPERATIONS IN MATLAB:
Scalar Calculations:
+ Addition
- Subtraction
* Multiplication
/ Right division (a/b means a ÷ b)
\ left division (a\b means b ÷ a)
^ Exponentiation
For example, 3*4 executed in MATLAB gives ans = 12
4/5 gives ans = 0.8000
Array products: Recall that addition and subtraction of matrices involve addition or
subtraction of the individual elements of the matrices. Sometimes it is desired to simply multiply
or divide each element of a matrix by the corresponding element of another matrix; these are
called 'array operations'.
Array or element-by-element operations are executed when the operator is preceded by a '.'
(period):
a .* b multiplies each element of a by the respective element of b
a ./ b divides each element of a by the respective element of b
a .\ b divides each element of b by the respective element of a
a .^ b raises each element of a to the power of the respective element of b
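For example:
a = [1 2; 3 4];
b = [5 6; 7 8];
a .* b        % gives [5 12; 21 32]
a .^ 2        % squares each element of a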
A.8 MATLAB WORKING ENVIRONMENT:
A.8.1 MATLAB DESKTOP
The MATLAB desktop is the main MATLAB application window. The desktop contains five sub
windows: the command window, the workspace browser, the current directory window, the
command history window, and one or more figure windows, which are shown only when the
user displays a graphic.
The command window, workspace browser, current directory window, and command
history window behave as described in Section A.2.1. Double clicking on a variable in the
workspace browser launches the Array Editor, which can be used to obtain information about,
and in some instances edit, properties of the variable. Clicking on the button to the right of the
current directory window allows the user to change the current directory.
MATLAB uses a search path to find M-files and other MATLAB related files, which are
organized in directories in the computer file system. Any file run in MATLAB must reside in the
current directory or in a directory that is on the search path. By default, the files supplied with
MATLAB and MathWorks toolboxes are included in the search path. The easiest way to see
which directories are on the search path, or to add or modify the search path, is to select Set Path
from the File menu on the desktop, and then use the Set Path dialog box. It is good practice to
add any commonly used directories to the search path to avoid repeatedly having to change the
current directory.
Right clicking on a command or sequence of commands in the Command History Window
launches a menu from which the commands can be re-executed, among various other options;
this is a useful feature when experimenting with commands in a work session.
A.8.2 Using the MATLAB Editor to create M-Files:
The MATLAB editor, described in Section A.2.1, is both a text editor specialized for
creating M-files and a graphical MATLAB debugger. M-files are denoted by the extension .m, as
in pixelup.m. Typing edit filename at the prompt opens the M-file filename.m in an editor
window, ready for editing. As noted earlier, the file must be in the current directory, or in a
directory in the search path.
A.8.3 Getting Help:
Getting help works as described in Section A.2.1: the MATLAB Help Browser is opened
either by clicking on the question mark symbol (?) on the desktop toolbar or by typing
helpbrowser at the prompt in the command window.
Appendix B
INTRODUCTION TO DIGITAL IMAGE PROCESSING
What is DIP?
An image may be defined as a two-dimensional function f(x, y), where x and y are
spatial coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity
or gray level of the image at that point. When x, y and the amplitude values of f are all finite
discrete quantities, we call the image a digital image. The field of DIP refers to processing digital
images by means of a digital computer. A digital image is composed of a finite number of
elements, each of which has a particular location and value. The elements are called pixels.
Vision is the most advanced of our senses, so it is not surprising that images play the single
most important role in human perception. However, unlike humans, who are limited to the visual
band of the EM spectrum, imaging machines cover almost the entire EM spectrum, ranging from
gamma rays to radio waves. They can also operate on images generated by sources that humans
are not accustomed to associating with images.
There is no general agreement among authors regarding where image processing stops and
other related areas such as image analysis and computer vision start. Sometimes a distinction is
made by defining image processing as a discipline in which both the input and output of a
process are images. This is a limiting and somewhat artificial boundary. The area of image
analysis (image understanding) is in between image processing and computer vision.
There are no clear-cut boundaries in the continuum from image processing at one end to
complete vision at the other. However, one useful paradigm is to consider three types of
computerized processes in this continuum: low-, mid-, and high-level processes. Low-level
processes involve primitive operations such as image processing to reduce noise, contrast
enhancement and image sharpening. A low-level process is characterized by the fact that both its
inputs and outputs are images. Mid-level processes on images involve tasks such as
segmentation, description of objects to reduce them to a form suitable for computer processing,
and classification of individual objects. A mid-level process is characterized by the fact that its
inputs generally are images but its outputs are attributes extracted from those images. Finally,
higher-level processing involves "making sense" of an ensemble of recognized objects, as in
image analysis, and, at the far end of the continuum, performing the cognitive functions normally
associated with human vision.
Digital image processing, as already defined, is used successfully in a broad range of
areas of exceptional social and economic value.
What is an image?
An image is represented as a two dimensional function f(x, y), where x and y are spatial
coordinates and the amplitude of f at any pair of coordinates (x, y) is called the intensity of the
image at that point.
Gray scale image:
A grayscale image is a function I(x, y) of the two spatial coordinates of the image plane:
I(x, y) is the intensity of the image at the point (x, y) on the image plane.
I(x, y) takes non-negative values; assuming the image is bounded by a rectangle [0, a] × [0, b],
we have I: [0, a] × [0, b] → [0, ∞).
Color image:
It can be represented by three functions, R(x, y) for red, G(x, y) for green and
B(x, y) for blue.
An image may be continuous with respect to the x and y coordinates and also in
amplitude. Converting such an image to digital form requires that the coordinates as well as the
amplitude be digitized. Digitizing the coordinate values is called sampling; digitizing the
amplitude values is called quantization.
Coordinate convention:
The result of sampling and quantization is a matrix of real numbers. We use two principal
ways to represent digital images. Assume that an image f(x, y) is sampled so that the resulting
image has M rows and N columns. We say that the image is of size M x N. The values of the
coordinates (x, y) are discrete quantities. For notational clarity and convenience, we use integer
values for these discrete coordinates. In many image processing books, the image origin is
defined to be at (x, y) = (0, 0).
The next coordinate values along the first row of the image are (x, y) = (0, 1). It is
important to keep in mind that the notation (0, 1) is used to signify the second sample along the
first row; it does not mean that these are the actual values of the physical coordinates when the
image was sampled. In this convention, x ranges from 0 to M-1 and y from 0 to N-1 in integer
increments.
The coordinate convention used in the toolbox to denote arrays is different from the
preceding paragraph in two minor ways. First, instead of using (x, y), the toolbox uses the
notation (r, c) to indicate rows and columns. Note, however, that the order of coordinates is the
same as the order discussed in the previous paragraph, in the sense that the first element of a
coordinate tuple, (a, b), refers to a row and the second to a column. The other difference is that
the origin of the coordinate system is at (r, c) = (1, 1); thus, r ranges from 1 to M and c from 1 to
N in integer increments. IPT documentation refers to these as pixel coordinates. Less frequently
the toolbox also employs another coordinate convention called spatial coordinates, which uses x
to refer to columns and y to refer to rows. This is the opposite of our use of the variables x and y.
Image as Matrices
The preceding discussion leads to the following representation for a digitized image
function:
           f(0, 0)      f(0, 1)     ...   f(0, N-1)
           f(1, 0)      f(1, 1)     ...   f(1, N-1)
f(x, y) =     .            .                  .
              .            .                  .
           f(M-1, 0)    f(M-1, 1)   ...   f(M-1, N-1)
The right side of this equation is a digital image by definition. Each element of this
array is called an image element, picture element, pixel or pel. The terms image and pixel are
used throughout the rest of our discussions to denote a digital image and its elements.
A digital image can be represented naturally as a MATLAB matrix:
        f(1, 1)    f(1, 2)    ...   f(1, N)
        f(2, 1)    f(2, 2)    ...   f(2, N)
f =        .          .                .
           .          .                .
        f(M, 1)    f(M, 2)    ...   f(M, N)
where f(1, 1) = f(0, 0) (note the use of a monospace font to denote MATLAB
quantities). Clearly the two representations are identical, except for the shift in origin. The
notation f(p, q) denotes the element located in row p and column q. For example, f(6, 2) is
the element in the sixth row and second column of the matrix f. Typically we use the letters M
and N respectively to denote the number of rows and columns in a matrix. A 1xN matrix is
called a row vector, whereas an Mx1 matrix is called a column vector. A 1x1 matrix is a scalar.
Matrices in MATLAB are stored in variables with names such as A, a, RGB, real_array and
so on. Variables must begin with a letter and contain only letters, numerals and underscores. As
noted in the previous paragraph, all MATLAB quantities are written using monospace
characters. We use conventional Roman italic notation, such as f(x, y), for mathematical
expressions.
B.1 Reading Images:
Images are read into the MATLAB environment using function imread whose syntax is
imread(‘filename’)
Format name   Description                         Recognized extensions
TIFF          Tagged Image File Format            .tif, .tiff
JPEG          Joint Photographic Experts Group    .jpg, .jpeg
GIF           Graphics Interchange Format         .gif
BMP           Windows Bitmap                      .bmp
PNG           Portable Network Graphics           .png
XWD           X Window Dump                       .xwd
Here filename is a string containing the complete name of the image file (including any
applicable extension). For example, the command line
>> f = imread('chestxray.jpg');
reads the JPEG image chestxray (see the table above) into image array f. Note the use of single
quotes (') to delimit the string filename. The semicolon at the end of a command line is used by
MATLAB for suppressing output; if a semicolon is not included, MATLAB displays the results
of the operation(s) specified in that line. The prompt symbol (>>) designates the beginning of a
command line, as it appears in the MATLAB command window.
When, as in the preceding command line, no path is included in filename, imread reads the
file from the current directory, and if that fails it tries to find the file in the MATLAB search
path. The simplest way to read an image from a specified directory is to include a full or relative
path to that directory in filename.
For example,
>> f = imread('D:\myimages\chestxray.jpg');
reads the image from a folder called myimages on the D: drive, whereas
>> f = imread('.\myimages\chestxray.jpg');
reads the image from the myimages subdirectory of the current working directory. The current
directory window on the MATLAB desktop toolbar displays MATLAB's current working
directory and provides a simple, manual way to change it. The table above lists some of the most
popular image/graphics formats supported by imread and imwrite.
Function size gives the row and column dimensions of an image:
>> size(f)
ans = 1024 1024
This function is particularly useful in programming when used in the following form to
determine automatically the size of an image:
>> [M, N] = size(f);
This syntax returns the number of rows (M) and columns (N) in the image.
The whos function displays additional information about an array. For instance, the
statement
>> whos f
gives
Name    Size         Bytes     Class
f       1024x1024    1048576   uint8 array
Grand total is 1048576 elements using 1048576 bytes
The uint8 entry shown refers to one of several MATLAB data classes. A semicolon at the
end of a whos line has no effect, so normally one is not used.
B.2 Displaying Images:
Images are displayed on the MATLAB desktop using function imshow, which has the basic
syntax:
imshow(f, g)
where f is an image array, and g is the number of intensity levels used to display it.
If g is omitted, it defaults to 256 levels. Using the syntax
imshow(f, [low high])
displays as black all values less than or equal to low, and as white all values greater
than or equal to high. The values in between are displayed as intermediate intensity values using
the default number of levels. Finally, the syntax
imshow(f, [ ])
sets variable low to the minimum value of array f and high to its maximum value.
This form of imshow is useful for displaying images that have a low dynamic range or that have
positive and negative values.
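For instance, assuming f is an 8-bit grayscale image (the thresholds 50 and 200 below are arbitrary illustrative values):
>> imshow(f, [50 200])     % values <= 50 display as black, values >= 200 as white
>> figure, imshow(f, [ ])  % scale the display to the full range of f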
Function pixval is used frequently to display the intensity values of individual pixels
interactively. This function displays a cursor overlaid on an image. As the cursor is moved over
the image with the mouse, the coordinates of the cursor position and the corresponding intensity
values are shown on a display that appears below the figure window. When working with color
images, the coordinates as well as the red, green and blue components are displayed. If the left
button on the mouse is clicked and then held pressed, pixval displays the Euclidean distance
between the initial and current cursor locations.
The syntax form of interest here is pixval, which shows the cursor on the last image
displayed. Clicking the X button on the cursor window turns it off.
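A typical interactive session might look like the following; note that in recent MATLAB releases pixval has been superseded by impixelinfo, which serves the same purpose:
>> imshow(f)
>> pixval      % overlays the interactive intensity cursor on the displayed image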
The following statements read from disk an image called rose_512.tif extract basic
information about the image and display it using imshow :
>> f = imread('rose_512.tif');
>> whos f
  Name      Size           Bytes  Class
  f         512x512       262144  uint8 array
Grand total is 262144 elements using 262144 bytes
>>imshow(f)
A semicolon at the end of an imshow line has no effect, so normally one is not used.
If another image, g, is displayed using imshow, MATLAB replaces the image on the screen with
the new image. To keep the first image and output a second image, we use function figure as
follows:
>> figure, imshow(g)
Using the statement
>> imshow(f), figure, imshow(g)
displays both images. Note that more than one command can be written on a line, as long as the
different commands are properly delimited by commas or semicolons. As mentioned earlier, a
semicolon is used whenever it is desired to suppress screen outputs from a command line.
Suppose that we have just read an image h and find that displaying it with imshow produces
a dim, low-contrast result. It is clear that this image has a low dynamic range, which can be
remedied for display purposes by using the statement
>> imshow(h, [ ])
6.7 Writing Images:
Images are written to disk using function imwrite, which has the following basic syntax:
imwrite(f, 'filename')
With this syntax, the string contained in filename must include a recognized file format
extension. For example, the following command writes f to a TIFF file named patient10_run1:
>> imwrite(f, 'patient10_run1.tif')
Alternatively, the desired format can be specified explicitly with a third input argument:
>> imwrite(f, 'patient10_run1', 'tif')
If filename contains no path information, then imwrite saves the file in the current working
directory.
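As with imread, a full or relative path can be included in filename; the directory in this sketch is hypothetical:
>> imwrite(f, 'D:\myimages\patient10_run1.tif')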
The imwrite function can have other parameters, depending on the file format selected. Most
of the work in the following deals either with JPEG or TIFF images, so we focus attention here
on these two formats.
A more general imwrite syntax, applicable only to JPEG images, is
imwrite(f, 'filename.jpg', 'quality', q)
where q is an integer between 0 and 100 (the lower the number, the higher the degradation due to
JPEG compression).
For example, for q = 25 the applicable syntax is
>> imwrite(f, 'bubbles25.jpg', 'quality', 25)
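To compare several quality settings side by side, the same command can be placed in a loop; the following is a minimal sketch (the particular q values and file names are illustrative):
>> for q = [0 5 15 25 50]
       imwrite(f, sprintf('bubbles%d.jpg', q), 'quality', q);  % one file per quality level
   end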
The image for q = 15 has false contouring that is barely visible, but this effect becomes quite
pronounced for q = 5 and q = 0. Thus, an acceptable solution with some margin for error is to
compress the images with q = 25. In order to get an idea of the compression achieved and to
obtain other image file details, we can use function imfinfo, which has the syntax
imfinfo filename
Here filename is the complete file name of the image stored on disk.
For example,
>> imfinfo bubbles25.jpg
outputs the following information (note that some fields contain no information in this case):
           Filename: 'bubbles25.jpg'
        FileModDate: '04-Jan-2003 12:31:26'
           FileSize: 13849
             Format: 'jpg'
      FormatVersion: ''
              Width: 714
             Height: 682
           BitDepth: 8
          ColorType: 'grayscale'
    FormatSignature: ''
            Comment: { }
where FileSize is in bytes. The number of bytes in the original image is computed simply
by multiplying Width by Height by BitDepth and dividing the result by 8, which gives
486948. Dividing this by the compressed file size gives the compression ratio:
(486948/13849) = 35.16. This compression ratio was achieved while maintaining image quality
consistent with the requirements of the application. In addition to the obvious advantage in
storage space, this reduction allows the transmission of approximately 35 times the amount of
image data per unit time.
The information fields displayed by imfinfo can be captured into a so-called structure
variable that can be used for subsequent computations. Using the preceding image as an example,
and assigning the name K to the structure variable, we use the syntax
>> K = imfinfo('bubbles25.jpg');
to store into variable K all the information generated by command imfinfo. The
information generated by imfinfo is appended to the structure variable by means of fields,
separated from K by a dot. For example, the image height and width are now stored in structure
fields K.Height and K.Width.
As an illustration, consider the following use of structure variable K to compute the
compression ratio for bubbles25.jpg:
>> K = imfinfo('bubbles25.jpg');
>> image_bytes = K.Width * K.Height * K.BitDepth / 8;
>> compressed_bytes = K.FileSize;
>> compression_ratio = image_bytes / compressed_bytes
compression_ratio =
    35.1612
Note that imfinfo was used in two different ways. The first was to type imfinfo bubbles25.jpg
at the prompt, which resulted in the information being displayed on the screen. The second was
to type K = imfinfo('bubbles25.jpg'), which resulted in the information generated by imfinfo
being stored in K. These two different ways of calling imfinfo are an example of command-
function duality, an important concept that is explained in more detail in the MATLAB online
documentation.
A more general imwrite syntax, applicable only to tif images, has the form
imwrite(g, 'filename.tif', 'compression', 'parameter', 'resolution', [colres rowres])
where 'parameter' can have one of the following principal values: 'none' indicates no
compression; 'packbits' indicates packbits compression (the default for nonbinary images);
and 'ccitt' indicates ccitt compression (the default for binary images). The 1x2 array [colres
rowres] contains two integers that give the column resolution and row resolution in dots per unit
(the default values are [72 72]). For example, if the image dimensions are in inches, colres is the
number of dots (pixels) per inch (dpi) in the vertical direction, and similarly for rowres in the
horizontal direction. Specifying the resolution by a single scalar, res, is equivalent to writing
[res res].
>> imwrite(f, 'sf.tif', 'compression', 'none', 'resolution', [300 300])
The values of the vector [colres rowres] were determined by multiplying 200 dpi by the ratio
2.25/1.5, which gives 300 dpi. Rather than do the computation manually, we could write
>> res = round(200*2.25/1.5);
>> imwrite(f, 'sf.tif', 'compression', 'none', 'resolution', res)
where function round rounds its argument to the nearest integer. It is important to note that
the number of pixels was not changed by these commands; only the scale of the image changed.
The original 450x450 image at 200 dpi is of size 2.25x2.25 inches. The new 300-dpi image is
identical, except that its 450x450 pixels are distributed over a 1.5x1.5-inch area. Processes such
as this are useful for controlling the size of an image in a printed document without sacrificing
resolution.
Often it is necessary to export images to disk the way they appear on the MATLAB
desktop. This is especially true with plots. The contents of a figure window can be exported to
disk in two ways. The first is to use the File pull-down menu in the figure window and then
choose Export. With this option the user can select a location, filename, and format. More control
over export parameters is obtained by using the print command:
print -fno -dfileformat -rresno filename
where no refers to the number of the figure window of interest, fileformat refers to one of the
file formats in the table above, resno is the resolution in dpi, and filename is the name we wish
to assign the file.
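For instance, a command along the following lines (the figure number, resolution, and file name here are illustrative) would export figure 1 as a 300-dpi TIFF file named hi_res_plot.tif:
>> print -f1 -dtiff -r300 hi_res_plot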
If we simply type print at the prompt, MATLAB prints (to the default printer) the contents of the
last figure window displayed. It is possible also to specify other options with print, such as a
specific printing device.