vlsi implementation of high throughput mimo …nopr.niscair.res.in/bitstream/123456789/15159/1/ijems...

13
Indian Journal of Engineering & Materials Sciences Vol. 19, October 2012, pp. 307-319 VLSI implementation of high throughput MIMO OFDM transceiver for 4 th generation systems J Raja* & M Kannan Department of Electronics Engineering, MIT, Anna University, Chennai 600 044, India Received 24 February 2012; accepted 13 August 2012 This paper aims to maximize throughput by minimizing power as minimum as possible. Scores of optimization techniques such as FFT, IFFT and memory optimization are available for reducing power of mobile OFDM systems. An approach for achieving reduction in power of MIMO OFDM system by optimizing FFT architecture is addressed in this paper. Memory references in MIMO OFDM transceiver are costly due to their long delay and high power consumption. To implement fast Fourier transform (FFT) algorithms on MIMO OFDM, involves many memory references to access butterfly inputs and twiddle factors. Conventional FFT implementations require unused memory references to load the same twiddle factors for butterflies from different stages in FFT diagrams. To minimize memory references due to twiddle factors for implementing FFT algorithms in MIMO OFDM systems, memory reference reduction method is incorporated here. Twiddle factor is calculated using binary scaling technique. The proposed FFT structure is the combination of memory reference reduction method with binary scaling technique and Radix-4 booth multiplier. Here multipliers in FFT are realized with shifters, adders and subtractors. The proposed structure is evaluated using performance parameters such as BER and SNR. Structural realization and analysis pertaining to timing, power and throughput are implemented in Virtex-4 and analysis is carried out in Altera respectively. Keywords: Multiple input multiple output (MIMO), Orthogonal frequency division multiplexing (OFDM), Fast Fourier transform (FFT), Inverse fast Fourier transform (IFFT), Inter symbol interference (ISI) Multiple input multiple output (MIMO) system consists of multiple antennas at the transmitter and receiver ends to improve link reliability and data rates of the wireless communication system. Orthogonal frequency division multiplexing is efficient in synchronizing the received signal under fading environment and has been used in past times in applications that require a huge data rate. Fast Fourier transform (FFT)/ inverse FFT (IFFT) processors are proposed for multiple-input multiple-output orthogonal frequency division multiplexing based IEEE 802.11n. Here the processor not only supports the operation of FFT/IFFT but also provides sufficient throughput rates but the drawback is hardware complexity is more, compared with conventional approaches 1 . FFT architecture which is optimized for a processor with a memory system containing a cache is discussed elsewhere 2 . The processor uses a cached memory with increased efficiency and performance for MIMO system, but the achieved throughput is very less. The architecture is mainly focused on minimum mean square error detector in decoding section for wireless specification, but the difficulty is hardware complexity 3 . The reported architecture 4 is implemented in matrix inversion circuit in MIMO detection wireless specification, the problem is the increase in hardware complexity and throughput is less compared with conventional approach. Variable length FFT processor design based on Radix 2, Radix 4 and Radix 8 with complex multipliers containing 3 multiplications described earlier 5 , consumes more power. ASIC based MIMO for OFDM provides data rate of 192 Mbps with a 20 MHz bandwidth for IEEE 802.11a standard. Here the paper mainly focuses on silicon complexity of MIMO OFDM system. Throughput of the system is very less compared with other systems 6 . MIMO OFDM Base band transceiver implementation is based on ASIC. Verification is based on testbed and GUI monitor. Here the authors have developed separate testbeds for MIMO OFDM system for verification purpose but haven’t concentrated on throughput and BER 7 . In order to meet IEEE 802.11n requirements 8,9 the processor not only supports the operation of FFT/IFFT in 128 points and 64 points but can also provide different _______________________ *Corresponding author (E-mail: [email protected])

Upload: others

Post on 30-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Indian Journal of Engineering & Materials Sciences

Vol. 19, October 2012, pp. 307-319

VLSI implementation of high throughput MIMO OFDM transceiver for

4th generation systems

J Raja* & M Kannan

Department of Electronics Engineering, MIT, Anna University, Chennai 600 044, India

Received 24 February 2012; accepted 13 August 2012

This paper aims to maximize throughput by minimizing power as minimum as possible. Scores of optimization

techniques such as FFT, IFFT and memory optimization are available for reducing power of mobile OFDM systems. An

approach for achieving reduction in power of MIMO OFDM system by optimizing FFT architecture is addressed in this

paper. Memory references in MIMO OFDM transceiver are costly due to their long delay and high power consumption. To

implement fast Fourier transform (FFT) algorithms on MIMO OFDM, involves many memory references to access butterfly

inputs and twiddle factors. Conventional FFT implementations require unused memory references to load the same twiddle

factors for butterflies from different stages in FFT diagrams. To minimize memory references due to twiddle factors for

implementing FFT algorithms in MIMO OFDM systems, memory reference reduction method is incorporated here. Twiddle

factor is calculated using binary scaling technique. The proposed FFT structure is the combination of memory reference

reduction method with binary scaling technique and Radix-4 booth multiplier. Here multipliers in FFT are realized with

shifters, adders and subtractors. The proposed structure is evaluated using performance parameters such as BER and SNR.

Structural realization and analysis pertaining to timing, power and throughput are implemented in Virtex-4 and analysis is

carried out in Altera respectively.

Keywords: Multiple input multiple output (MIMO), Orthogonal frequency division multiplexing (OFDM),

Fast Fourier transform (FFT), Inverse fast Fourier transform (IFFT), Inter symbol interference (ISI)

Multiple input multiple output (MIMO) system

consists of multiple antennas at the transmitter and

receiver ends to improve link reliability and data rates

of the wireless communication system. Orthogonal

frequency division multiplexing is efficient in

synchronizing the received signal under fading

environment and has been used in past times in

applications that require a huge data rate. Fast Fourier

transform (FFT)/ inverse FFT (IFFT) processors are

proposed for multiple-input multiple-output

orthogonal frequency division multiplexing based

IEEE 802.11n. Here the processor not only supports

the operation of FFT/IFFT but also provides sufficient

throughput rates but the drawback is hardware

complexity is more, compared with conventional

approaches1. FFT architecture which is optimized for

a processor with a memory system containing a cache

is discussed elsewhere2. The processor uses a cached

memory with increased efficiency and performance

for MIMO system, but the achieved throughput is

very less. The architecture is mainly focused on

minimum mean square error detector in decoding

section for wireless specification, but the difficulty is

hardware complexity3. The reported architecture

4 is

implemented in matrix inversion circuit in MIMO

detection wireless specification, the problem is the

increase in hardware complexity and throughput is

less compared with conventional approach. Variable

length FFT processor design based on Radix 2, Radix

4 and Radix 8 with complex multipliers containing 3

multiplications described earlier5, consumes more

power.

ASIC based MIMO for OFDM provides data rate

of 192 Mbps with a 20 MHz bandwidth for IEEE

802.11a standard. Here the paper mainly focuses on

silicon complexity of MIMO OFDM system.

Throughput of the system is very less compared with

other systems6. MIMO OFDM Base band transceiver

implementation is based on ASIC. Verification is

based on testbed and GUI monitor. Here the authors

have developed separate testbeds for MIMO OFDM

system for verification purpose but haven’t

concentrated on throughput and BER7. In order to

meet IEEE 802.11n requirements8,9

the processor not

only supports the operation of FFT/IFFT in 128 points

and 64 points but can also provide different _______________________

*Corresponding author (E-mail: [email protected])

INDIAN J. ENG. MATER. SCI., OCTOBER 2012

308

throughput rates for simultaneous data sequences. The

difficulty is number of point’s increases more and

more. Hence, the cost and complexity of the system is

also increased. IEEE 802.11a established for WLAN

standard, provides 54 Mbps throughput using SISO

OFDM transceiver. Here the throughput of SISO

OFDM is very less; to increase the throughput,

MIMO OFDM is preferred10

. The IEEE 802-11n

provides data rate up to 600 Mbps with transmission

speed of 80 MHz. Latency is incurred by

pre-processing the channel matrices for MIMO

detection11

. Different approaches for design and

implementation of 4×4 MIMO OFDM transceiver

which provides date rate of 1 Gbps is discussed

elsewhere12-15

. Here MMSE MIMO detector is used to

reduce the latency but the disadvantage is that it

requires larger circuit for high speed transmission. In

any n bit 2’s complement multiplier the maximum

number of addition and subtraction operation is n/216

.

However, the above mentioned results are still

insufficient in day to day requirements like

transmission speed and power. In this paper, the VLSI

implementation of MIMO OFDM transceiver suitable

for low power application has been studied. The FFT

optimization helps to reduce the delay time, power

thereby simultaneously increasing speed.

Orthogonal Frequency Division Multiplexing

Orthogonal frequency division multiplexing

(OFDM) is a technique which is a subset of frequency

division multiplexing in which a single channel uses

multiple sub-carriers on neighbouring frequencies.

The sub-carriers of an OFDM system overlap each

other. Normally the neighbouring channels which

overlap may obstruct one another. Since the sub-

carriers in an OFDM system are exactly orthogonal to

one another, they are capable of overlapping without

interfering. As a result, OFDM systems are capable of

maximizing spectral efficiency without causing

neighbouring channel interference. Since multiple

sub-carriers are used to transmit in an individual

channel, an OFDM communication system must

perform several steps, which are described in Fig.1.

In an OFDM system, each channel is divided into a

number of sub-carriers. A serial bit stream is converted

into several parallel bit streams. After the bit stream

division among the individual sub-carriers, each sub-

carrier is modulated. The receiver performs the vice-

versa process, which first divides the incoming signal

into appropriate sub-carriers and then demodulate them

individually before recovering the original bit stream.

The data is modulated into waveform at inverse

fast Fourier transform (IFFT) stage of the transmitter.

The IFFT is used to modulate each sub-channel onto

the appropriate carrier. The receiver performs the

vice-versa process of the transmitter. FFT is used to

demodulate the data.

One of the main drawback in wireless

communication systems is ISI which is caused due to

the multi-path reflections in the channel. In order to

reduce ISI a cyclic prefix is added. The added cyclic

prefix is a part of the first stage of a symbol which is

described in Fig. 2. This addition is necessary because

it enables multi-path representations of the source

signal to fade, so that it won’t interfere with the

subsequent symbol.

After the addition of cyclic prefix to the sub-carrier

channels, it must be transmitted as a single signal.

Thus, the process of parallel to serial conversion stage

is done by adding all sub-carriers and combining them

into a single signal. At the end of this process all

sub-carriers are generated perfectly in a simultaneous

manner.

MIMO OFDM

MIMO OFDM (multiple input multiple output,

orthogonal frequency division multiplexing) is a

technology that combines MIMO and OFDM together

to transmit data in wireless communications, in order

to deal with frequency selective channel effect. The

OFDM signal on each subcarrier can overcome

narrowband fading. Therefore, OFDM can transform

Fig. 1 OFDM structure

Fig. 2 Cyclic prefix insertion format

RAJA & KANNAN: VLSI IMPLEMENTATION OF HIGH THROUGHPUT MIMO OFDM TRANSCEIVER

309

frequency-selective fading channels into parallel flat

ones. Then by combining MIMO and OFDM

technology together, MIMO algorithms can be

applied in broadband transmission

A MIMO OFDM system transmits data modulated

by OFDM from multiple antennas simultaneously. At

the receiver, after OFDM demodulation, the signals

are recovered by decoding each sub-channel from all

transmitting antennas.

In the basic structure of MIMO OFDM shown in

Fig. 3, the signals are modulated by OFDM

modulator, then they are transmitted by MIMO

system, finally, the signals are recovered by the

OFDM demodulator.

Therefore, MIMO OFDM achieves spectral

efficiency, increased throughput and thus the

inter-symbol interference (ISI) can be reduced.

Fast Fourier Transform

Conventional FFT algorithm

The modulation of OFDM can be done using an

IDFT. The fast implementation of IDFT is

accomplished by deploying IFFT, which reduces

processing time and hardware usage. The

demodulation is through an DFT or better by using

FFT, which is an efficient way of implementation.

In order to reduce numerous calculations involved

in calculation of DFT, FFT is used. To achieve the

increased efficiency an additional step is involved

(to reverse order the data). These additional steps

won’t increase the computational complexity of the

FFT calculation. As a result, FFT is an extremely

efficient algorithm that provides a good

implementation in hardware.

If N is very large, the computation efficiency is

increased by dividing DFT successively in the smaller

calculation. The DFT of discrete signal x(n) can be

computed directly using Eq. (1).

1

0

( ) ( )N

nk

N

n

X K x n W−

=

=∑ , k=0,1,2,3............N-1 … (1)

Calculation of FFT is done using decimation in

time (DIT) or decimation in frequency (DIF)

algorithm. If it is DIT, first twiddle factor is

multiplied and later it is summed up. It is computed

with the help of Eq. (2). The butterfly diagram

pertaining to DIT is shown in Fig. 4.

)1,2,1,0( )()(

)()(

)12()2(

)()()()(

21

12

0 2

2

12

0 2

1

12

0

)12(

12

0

2

1

)(0

1

)(0

1

0

−=+=

+=

++=

+==

∑∑

∑∑

∑∑∑

=

=

=

+

=

=

=

=

NkkXWkX

WrxWWrx

WrxWrx

WnxWnxWnxkX

k

N

N

r

rk

N

k

N

N

r

rk

N

N

r

kr

N

N

r

rk

N

N

oddn

nk

N

N

evenn

nk

N

N

n

nk

N

… (2)

If it is DIF, first the input is summed up and then

the twiddle factor is multiplied. It is computed using

the Eq. (3). The butterfly diagram pertaining to DIF is

shown in Fig. 5.

)1,1,0( )2

()1()(

)2

()(

)2

()(

)()()()(

12

0

12

0

2

12

0

)2

(1

2

0

1

2

12

0

1

0

−=

+−+=

++=

++=

+==

∑∑

∑∑∑

=

=

=

+

=

=

=

=

NkWN

nxnx

WN

nxWnx

WN

nxWnx

WnxWnxWnxkX

N

n

nk

N

k

N

n

nk

N

kN

N

N

n

kN

n

N

N

n

nk

N

N

Nn

nk

N

N

n

nk

N

N

n

nk

N

… (3)

The conventional N point FFT, requires (N/2) log2

N multiplication operations and N log2 N addition

operations. If it is N bit multiplier, it will generate 2N

partial products. The entire FFT butterfly module of

Fig. 3 MIMO OFDM structure

Fig. 4 Radix 2 DIT FFT butterfly diagram

INDIAN J. ENG. MATER. SCI., OCTOBER 2012

310

conventional FFT is shown in Fig. 5a. Due to the

partial products the delay is increased and it consumes

more power. Hence, it will affect the throughput of

the mobile system. Thus, there is a need for reducing

the consumed power with decreased delay thereby

dragging down the throughput of the system for

which we propose modified FFT algorithms which

helps to overcome the above mentioned issues.

Modified fast Fourier transform algorithm

During FFT implementation identical twiddle

factors occupies enormous memory. In this paper

identical twiddle factors are grouped together and

computed, before computing the other twiddle factors.

The twiddle factors are fed in to the structure only

once and the amount of memory due to the identical

twiddle factors can be minimized. The twiddle factors

are computed using the binary scaling method. In

binary scaling the twiddle factors are calculated by

simply shifting the bits. The entire multiplication

operation is completely replaced by shifting

operation.

For example, in DIT butterfly diagram shown in

Fig. 3, the butterflies with twiddle factor W160 appears

in Stage 1, Stage 2, Stage 3 and Stage 4. Hence, these

twiddle factors are grouped together and computed in

any order in the 16-pt radix-2 DIT FFT diagram. This

can be achieved using the below identities (5)-(8).

WWWWNm

N

Nm

N

N

N

m

Nj

4/4/4/.

−−

−== m (N/4, N/2) … (5)

= WWWNm

N

Nm

N

N

N

2/2/2/.

−−

−= m (N/2, 3N/4) … (6)

= WWWNm

N

Nm

N

N

Nj

4/34/34/3.

−−

= m (3N/4, N) … (7)

= Wm

N

m (0, N/4) … (8)

The butterflies with twiddle factor W164 which

appears in Stage 4, Stage 2 and Stage 3 are computed.

Hence, twiddle factors are grouped together and

computed in any order in the 16-pt radix-2 DIT FFT

diagram. Similarly, the butterflies with twiddle factors

W162 and W16

6 appears only in Stage3 and Stage4 of the

16-pt radix-2 DIT FFT diagram. These twiddle factors

are grouped together and computed, without affecting

the computations of other butterflies. The butterflies

can be computed together by loading only one twiddle

factor mWm

Nand it is described in Eqs (9)-(11).

=m (n mod N/2i-1

)x2i-1

… (9)

jWWiiiii xNn

N

kxNNn

N −

−−−−+

=+

11111 2))2/mod(2))2/mod())2/((

… (10)

jWWiiii xNn

N

xNn

N −

−−−−

=

1111 2))2/mod(2))2/mod(

… (11)

where i is the number of stages. Finally, twiddle factors W16

1, W16

3, W16

5 and W16

7

appear only in Stage 4. These twiddle factors are

grouped together and computed, without affecting the

computations of other butterflies. Each twiddle factor is

loaded only once and redundant memory references for

identical twiddle factors are removed which is shown in

Fig. 4. Twiddle factor W165 can be replaced by -jW16

1

with a simple derivation as described in Eq. (12).

Fig. 5– Radix 2 DIF FFT butterfly diagram

Fig. 5a Block diagram of conventional FFT butterfly module

Fig. 6 Modified DIT FFT butterfly diagram

RAJA & KANNAN: VLSI IMPLEMENTATION OF HIGH THROUGHPUT MIMO OFDM TRANSCEIVER

311

W16 5

= W16 –j*(2π/16)*5

= (W16 –j*(2π/16)*4

) * (W16 –j*(2π/16)*1

)

= -jW16 1

… (12)

Similarly, twiddle factors W166 and W16

7 can be

replaced by -jW162 and -jW16

3, respectively.

The entire FFT butterfly module of modified FFT

is shown in Fig. 6a. Modified FFT is the combination

of binary scaling, which will convert floating point in

to fixed point and Radix 4 multiplier, which will

generates N/3 partial product.

Here, binary scaling technique is used to calculate

the twiddle factor. Binary scaling is a programming

technique used mainly in DSP to perform a pseudo

floating point using integer arithmetic. It is both faster

and more accurate than directly using floating point.

In binary scaling the floating point is multiplied with

256, which will convert floating point in to fixed

point and it is given in Table 1.

Radix-4 multiplier

The complexity of conventional FFT is dominated

by multiplier. Ordinary multipliers generate 2N partial

products. Due to the partial products, the latencies and

power consumption of the MIMO OFDM transceiver

are increased. The Radix-4 booth multiplier generates

N/3 partial product. It consumes less power and short

delay because of its high speed parallel multiplier. The

proposed FFT structure is the combination of binary

scaling technique and Radix-4 booth multiplier.

In general Radix-4 booth multiplier is preferred rather

than Radix-2 booth multiplier because of the following

disadvantages of Radix-2 booth multiplier: (i) the number

of addition, subtraction and shift operations in Radix-2

booth algorithm are variable and it is very difficult to

incorporate in designing of parallel multipliers, (ii) the

Radix-2 multiplier becomes suitable only when there are

isolated 1’s and (iii) the number of partial product is N.

The above problems are nullified by using

modified Radix-4 booth algorithm. It requires only

N/3 partial product. The procedure of Radix-4 booth

algorithm is described as: (i) if N is even then extend

the sign bit, (ii) append a 0 to the LSB of the

multiplier and (iii) based on the block value, each

partial product will be 0, +X, -X, +2X and -2X. The

negative values of X are obtained by taking the 2’s

complement. Based on the above result the booth

encoded table is developed and tabulated in the Table 2.

The recoding is done by encoding three bits of

multiplier X.

The proposed FFT processor described above has

been implemented using Xilinx Virtex-4 FPGA.

Based on the analysis, device utilization summary is

given in Table 3. From the results, we can observe

that the modified FFT algorithm requires only 10% of

slices, 11% of flip flops and 7.5% of LUTs compared

to the conventional FFT Algorithm.

Fig. 6a Block diagram of modified FFT Butterfly Module

Table1 Binary scaled value

S.No Twiddle factor Binary scaled value

1 1+j0 256+j0

2 0+j1 0+j256

3 0-j1 0-j256

4 0.707+j0.707 181+j181

5 -0.707-j0.707 -181-j181

Table 2 Radix-4 encoded table

Block Partial product

000 0*X

001 1*X

010 1*X

011 2*X

100 -2*X

101 -1*X

110 -1*X

111 0*X

Table 3 Device utilization summary

Modified FFT

algorithm

Conventional FFT

algorithm

Logic

Utilization

Used Available Used Available

No of slices 74 6144 782 6144

No of slices

Flip flop

92 12288 1040 12288

No of 4 input

LUTS

142 12288 1080 12288

No. of

bonded IOBs

187 240 321 240

No of

GCLKS

1 32 1 32

No. of DSP

48s

12 32 32 32

INDIAN J. ENG. MATER. SCI., OCTOBER 2012

312

Throughput of the mobile system entirely depends

on the power consumption and delay. The proposed

system consumes less power and short delay. Hence,

the throughput of the mobile system is increased.

Based on the results, the performance of the MIMO

OFDM is compared and given in Table 4.

Results of Simulation Code development and simulation were carried out

on a system running on Intel P4 configuration having

4 GB RAM working in windows 7 Platform. The

MIMO-OFDM structure for determination of

constellation points and BER were coded and

simulated in MATLAB. Structural realisation of

MIMO OFDM transceiver is implemented in Xilinx.

Analysis pertaining to timing, frequency is done in

Altera. The experimental results and analysis obtained

using Matlab, Xilinx and Altera are presented and

discussed.

MATLAB implementation output

QAM able to carry higher data rates than the

other schemes. QAM sends digital information by

periodically adjusting the phase and amplitude. Four-

QAM uses four combinations of phase and amplitude.

Moreover, each combination is assigned a 2-bit digital

pattern. For example, to generate the bit stream

(1,1,0,0,1,1). Because each symbol has a unique 2-bit

digital pattern, these bits are grouped in two’s so that

they can be mapped to the corresponding symbols. In

our example, the original bit stream (1,1,0,0,1,1)

grouped into three symbols (11,00,11). The

constellation plot in Fig. 7 shows the symbol map of

4-QAM with each possible phase (Θ) and amplitude

(A) of a carrier signal in polar coordinate form.

Bit error rate (BER) is defined as the ratio of

number of error bits and total number of bits. There

are many ways of reducing BER. Here, we focused on

Alamouti code and modulation techniques. In this

study we have taken AWGN is communication

channel where noise gets spread over the entire

spectrum of frequencies. BER has been calculated by

comparing the transmitted signal with the received

signal and computing the error count over the total

number of bits. For any given modulation, the BER is

normally expressed in terms of signal to noise ratio

(SNR). The bit error rate is improved from 10-6

with

the almost need of 10 dB of SNR as shown in Fig. 8.

FPGA implementation output

Table 5 shows the percentage minimization of

power, memory references, space in bytes and clock

cycles, and Table 6 shows the reduction in number of

slices, Flip flops, LUT’s and I/O blocks.

Fig. 7 QPSK signal constellation points

Table 4– Performance Comparisons

S.No Parameter MIMO

OFDM

Mixed Radix

FFT

architecture

based

MIMO

OFDM

Modified

FFT

architecture

based

MIMO

OFDM

1 Power dissipation 597 mW 540 mW 111 mW

2 Number of

memory reference

32 24 15

3 Storage space in

bytes

32 30 12

4 No. of clock cycle 698 600 540

5 Throughput 600 Mbps 700 Mbps 1.4 Gbps

6. Delay 9.43 ns 8.89 ns 4.04 ns

7. Logic elements 9940 9813 5844

Table 5 Percentage reduction of parameters compared with

conventional approaches

Sl. No Parameter A B C

1 % of power

dissipation

90 20 5

2 % of

memory

reference

75 45 11

3 % of space

in bytes

94 38 9

4 % of clock

cycle

86 77 19

A – Mixed Radix FFT + MIMO OFDM Transceiver;

B – Modified FFT + MIMO OFDM Transceiver

C – Modified FFT + SISO OFDM Transceiver

RAJA & KANNAN: VLSI IMPLEMENTATION OF HIGH THROUGHPUT MIMO OFDM TRANSCEIVER

313

Synthesis output SISO OFDM RTL view

The OFDM structure was synthesized and

simulated in Vertex-4 as shown in Fig. 9. The

proposed method can achieve average of 11%

reduction in number of memory references, 9%

saving of memory space due to the twiddle factor and

average of 19% reduction in number of clock cycles.

MIMO OFDM RTL view

MIMO OFDM structure was synthesized and simulated

in Vertex-4 FPGA as shown in Fig. 10. The proposed

structure was constructed using four transmitting and four

receiving antennas. Transmitter outputs and receiver inputs

are combined using spatial multiplexing technique. The

proposed method can achieve average of 20% reduction in

number of memory reference, 38% saving of memory

space due to twiddle factor and average of 77% reduction

in the number of clock cycle as compared with

conventional approaches which is evident in Table 5.

FFT RTL View

To implement the conventional FFT method, it

requires 1040 number of FFs, 321 numbers of I/O

blocks and 1080 number of LUTs. Number of FFs

and number of LUTs decide the complexity and cost

which is shown in Fig. 11.

Table 6 Device utilization expressed in percentage

Sl. No Utilization A B

1 % of Slices 3 13

2 % of Slices flip flop 1 8

3 No. of 4 Input LUTS 1.1 9

4 % of bonded IOBs 78 99.99

A – Modified FFT; B– Conventional FFT

Fig. 8 SNR versus BER

Fig. 9 OFDM transceiver RTL output

INDIAN J. ENG. MATER. SCI., OCTOBER 2012

314

Proposed FFT RTL View

To implement the proposed FFT method, it requires

92 numbers of FFs, 187 numbers of I/O blocks and

142 numbers of LUTs. When compared with conventional

method, the proposed method requires only 1% of FFs, 78%

of IOBs and 1.1% of LUTs. This is depicted in Fig. 12.

Analysis output

Power analysis

Power consumed by the MIMO OFDM

transceiver system is 111 mW is shown in Fig. 13.

Total power is the sum of the static power, dynamic

power and I/O power.

Fig. 10 MIMO OFDM transceiver RTL output

Fig. 11 FFT RTL output

RAJA & KANNAN: VLSI IMPLEMENTATION OF HIGH THROUGHPUT MIMO OFDM TRANSCEIVER

315

Timing analysis

Fmax of MIMO OFDM system is 235 Mhz. Based

on the Fmax Throughput of the system is calculated

as 1.4 Gbps. Fmax of MIMO OFDM transceiver is

shown in the Fig. 14.

Simulation output

FFT Simulation output

Simulation output of FFT is shown in Fig. 15. The

binary inputs in0, in1, in2, in3, in4, in5, in6 and in7

are passed through the butterfly units. Finally, the real

part outputs are taken from outr0,outr1,outr2,outr3,

Fig. 12 Modified FFT with binary scaling technique and radix-4 Booth multiplier RTL output

Fig. 13 Power analysis Output

INDIAN J. ENG. MATER. SCI., OCTOBER 2012

316

outr4,outr5,outr6 and outr7 and imaginary part

outputs are taken from outi0,outi1,outi2,outi3,

outi4,outi5,outi6 and outi7.

Modified FFT simulation output

Simulation output of modified FFT with binary

scaling radix-4 booth multiplier technique is shown in

the Fig. 16. The binary inputs in0, in1, in2, in3, in4,

in5, in6 and in7 are passed through the memory

reference reduction technique and higher radix

multiplier. Finally, the real part outputs are taken

from outr0,outr1, outr2, outr3, outr4, outr5, outr6 and

outr7 and imaginary part outputs are taken from outi0,

outi1, outi2,outi3, outi4, outi5, outi6 and outi7.

Simulation output

Simulation output 4×4 MIMO OFDM transceiver

is shown in Fig. 17. The inputs in, in2, in3 and in4 are

Fig. 14 Timing analysis Output

Fig. 15 FFT simulation output

RAJA & KANNAN: VLSI IMPLEMENTATION OF HIGH THROUGHPUT MIMO OFDM TRANSCEIVER

317

passed through the modified FFT algorithm in the

transmitter section. The transmitted outputs are

spatially multiplexed and the transmitted data are

received by receiver using MRC (maximum ratio

combining) techniques. Finally, the outputs are taken

from out1, out2, out3 and out4.

Performance analysis

Power analysis

Proposed system requires less power which is very

less compared with other systems. Power is inversely

proportional to throughput. Maximum throughput is

achieved by minimizing power as minimum as

possible. The proposed system power is 111 mW

which is minimum. Hence, throughput of the system

is increased as shown in Fig. 18.

Delay analysis

Proposed system generates short delay which is

very less compared with other systems. Delay is

inversely proportional to the throughput. Maximum

throughput is achieved by minimizing delay as

Fig. 16 Modified FFT with binary technique and radix-4 booth multiplier simulation output

Fig. 17 MIMO OFDM transceiver Simulation output

INDIAN J. ENG. MATER. SCI., OCTOBER 2012

318

minimum as possible. The gate delay of proposed

system is 4.04 ns which is very small. Hence, the

throughput of the system can be increased as shown in

Fig. 19.

Throughput analysis

The throughput of the proposed system is 1.4 Gbps,

this can be achieved with the help of modified FFT

architecture. Modified FFT architecture reduces

redundant memory reference, number of partial

products and number of non zero digits as shown in

Fig. 20.

Logic elements analysis

Only 58% of logic elements is required to

implement the proposed system. The number of logic

elements determines the cost and complexity of the

hardware. When compared with other system the

proposed system requires very minimum number of

logic elements, hence the cost and complexity of the

system is reduced as shown in Fig. 21.

Power dissipation versus throughput

Maximum throughput is obtained at minimum

power that is 111 mW power. When the power

increases more and more correspondingly throughput

of the system gradually decreases as shown in Fig. 22.

From the graph we conclude that the proposed system

consumes less power and transmits maximum number

bits per second.

Memory reference versus performance parameters of different

MIMO OFDM systems

The main objective of this paper is to reduce the

power and increase the throughput. Memory reference

Fig. 18 Power versus different MIMO OFDM systems

Fig. 19 Delay versus different MIMO OFDM systems

Fig. 20 Throughput versus different MIMO OFDM systems

Fig. 21 Logic element analysis versus different MIMO OFDM

systems

RAJA & KANNAN: VLSI IMPLEMENTATION OF HIGH THROUGHPUT MIMO OFDM TRANSCEIVER

319

in FFT is costly due to the long delay and high power

consumption. Memory reference reduction method is

used in FFT to eliminate redundant memory

reference. Based on the result the graph is plotted in

Fig. 23. From the graph we conclude that the

proposed method delivers 1.4 Gbps maximum

throughput with minimal number of memory

references of 15 with the corresponding delay of

4.04 ns and the power is 111 mW.

Conclusions

In this work, we have presented a VLSI

implementation of high throughput MIMO OFDM

transceiver for 4G system which achieves 1.4 Gbps

throughput. The circuit was implemented in a 28 nm

(transistor gate size) library with less circuit area and

evaluated in lower power dissipation. This can be

achieved by optimizing FFT architecture (memory

reference reduction and binary scaling methods are

used to minimize redundant memory references) and

Radix-4 booth multiplier algorithm (Radix-4

multiplier technique is used to reduce the power

dissipation) which reduces number of partial products.

References 1 Fu Bo & Ampadu Paul, J Signal Process Syst, 56(1) (2009)

59-68.

2 Chang Y & Park S C, IEICE Trans Fundamentals, E87-A

(11) (2004) 3020-3024.

3 Kim Hun Seok, Zhu Weijun, Bhatia Jatin, Mohammed

Karim, Shah Anish & Daneshrad Babak, EURASIP J Adv

Signal Process, 2008.

4 LaRoche Isabelle & Roy Sebastien, An efficient regular

matrix inversion circuit architecture for MIMO processing,

IEEE Int Symp on Circuits and Systems (ISCAS), May 2006,

pp.4819-4822.

5 Lin Y T, Tsai P Y & Chiueh T D, IEE Proc Comput Digit

Technol, 152(4) (2005) 499-506.

6 Perels D, Haene S, Luethi P, Burg A, Felber N, Fichtner W

& Bolcskei H, IEEE Trans VLSI Syst, 5 (2005) 215-218.

7 Greisen Pierre, Haene Simon & Burg, EURASIP J Embedded

Syst, 2008, Article ID242584.

8 Reisis D & Vlassopoulos N, IEEE Trans Circuits Syst I,

55(11) (2008) 3438-3447.

9 Radhouane R, Liu P & Modlin C, in Proc. IEEE Int Symp

Circuits Syst, 1 (May 2000) 116-119.

10 Yoshizawa Shingo & Miyanaga Yoshikazu, VLSI

implementation of SISO-OFDM and MIMO-OFDM

transceivers, IEEE Int Symp Communtions and Information

Technologies (ISCIT), No. T2D-4, Oct 2006.

11 Yoshizawa Shingo, Yamauchi Yasushi & Miyanaga

Yoshikazu, A complete pipelined MMSE detection

architecture in a 4x4 MIMO-OFDM receiver, IEEE Int

Symp on Circuits and Systems (ISCAS), May 2008, pp.

1248-1251.

12 Yoshizawa Shingo, Yamauchi Yasushi & Miyanaga

Yoshikazu, VLSI Architecture of a 4x4 MIMO-OFDM with

an 80-MHz Channel Bandwidth Transceiver, IEEE Int Symp

on Circuits and Systems (ISCAS), May 2009, pp. 4244-4247.

13 Yoshizawa Shingo, Yamauchi Yasushi & Miyanaga

Yoshikazu, VLSI Implementation of a 4x4 MIMO-OFDM

Transceiver for 1Gbps Data Transmission, IEEE Int Symp

on Circuits and Systems (ISCAS), May 2010, pp. 1743-1746.

14 Lamarca Rey F & Vazquez M G, IEEE Trans Signal

Process, 53 (3) (2009) 1741-1755.

15 Shin M & Lee H, A high-speed four-parallel radix-2 4

FFT/IFFT processor for UWB applications, Proc. IEEE Int.

Symp. Circuits and Systems, May 2008, pp.960-963.

16 Ma G K & Taylor F J, IEEE ASSP Mag, Jan 1990, pp. 6-20.

Fig. 22 Power versus throughput

Fig. 23 Memory Reference versus Performance parameters of

Different MIMO OFDM Systems