1 2004 MAPLD: 140 Jong-Ru Guo
Jong-Ru Guo, C. You, M. Chu, K. Zhou, Jin-Woo Kim, B.S. Goda*, R.P. Kraft, J.F. McDonald
Rensselaer Polytechnic Institute, Troy, NY 12180
* United States Military Academy, West Point, NY 10996
High performance field programmable gate array for gigahertz applications
2 2004 MAPLD: 140 Jong-Ru Guo
Gigahertz era
High speed reconfigurable systems are needed to handle the increasing amount of data. However, CMOS FPGAs operate at only hundreds of MHz, so a GHz reconfigurable system is needed.
3 2004 MAPLD: 140 Jong-Ru Guo
Introduction: Field Programmable Gate Array (FPGA)
[Figure: FPGA block diagram with regions labeled A, B, and C; legend: Logic Cell, I/O Cell, Routing Cell]
FPGA: A reconfigurable chip that can be programmed for a specific function.
Status: There are no FPGAs that operate at GHz microprocessor clock rates, much less at K-band or X-band.
Goal: Change this situation for the better.
K-band: 10.9~36 GHz; X-band: 8~12 GHz
4 2004 MAPLD: 140 Jong-Ru Guo
FPGA Applications
1. Prototyping
2. Digital Networks
   - Mobile Subscriber Equipment
   - High Speed Switching Nodes
3. Real Time Signal/Image Processing
   - Radar
   - Pattern Recognition
4. Digital System Processing
   - Filters
   - Fourier Transform
5. Satellite Systems
6. Wireless
5 2004 MAPLD: 140 Jong-Ru Guo
High speed IBM SiGe HBT Process
[Figure: curves for the 5HP, 7HP, and 8HP processes. Observe the logarithmic Ic axis.]
Approximate cutoff frequency:
IBM 0.5 & 0.25 um generations (5HP) ~ 50 GHz
IBM 0.18 um generation (7HP) ~ 120 GHz
IBM 0.13 um generation (8HP) ~ 180 GHz
Ref. "40-Gb/s Circuits Built From a 120-GHz fT SiGe Technology," IEEE Journal of Solid-State Circuits, vol. 37, no. 9, Sept. 2002.
6 2004 MAPLD: 140 Jong-Ru Guo
SiGe Graded Base Bipolar Transistor
[Figure: Si/SiGe band diagram showing EC and EV through the n+ Si emitter, p-SiGe base, and n- Si collector, with a p-Si base for comparison; Ge concentration plotted versus x; the drift field in the base, electrons (e-), and holes (h+) are marked.]
Eg,Ge(grade) = Eg,Ge(x=Wb) - Eg,Ge(x=0)
Ref. Yuan Taur and Tak H. Ning, "Fundamentals of Modern VLSI Devices," Cambridge University Press, p. 364, 1998.
Ref. Flash Comm
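As a rough illustration of the drift field marked in the graded base, the sketch below assumes a linear Ge grading, so that the field is the bandgap grading Eg,Ge(grade) divided by q times the base width Wb. The function name and the numeric values (0.1 eV across 50 nm) are hypothetical and are not taken from the slides.

```python
def base_drift_field_v_per_cm(eg_grade_ev: float, base_width_nm: float) -> float:
    """Quasi-electric field from a linear bandgap grading across the base.

    field = Eg,Ge(grade) / (q * Wb). With the grading expressed in eV the
    electron charge q cancels, leaving volts per unit length.
    """
    base_width_cm = base_width_nm * 1e-7  # 1 nm = 1e-7 cm
    return eg_grade_ev / base_width_cm


# Hypothetical values for illustration only (not from the slides).
field = base_drift_field_v_per_cm(eg_grade_ev=0.1, base_width_nm=50.0)
print(f"{field / 1e3:.0f} kV/cm")
```

A steeper grading or a thinner base raises the field, which is the mechanism the band diagram attributes to the graded SiGe base.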
7 2004 MAPLD: 140 Jong-Ru Guo
New Structure: Input and Output Block and Function Unit (FU)
[Figure: Schematic of the new function unit, based on the XC6200. Input routing block: three input MUXes, each fed by Ce, Qe, E, E4, Cw, Qw, W, W4, Cs, Qs, S, S4, Cn, Qn, N, N4. Function Unit (FU): new D-FF, with signals CLK, C, and Q. Output routing block: four output MUXes driving To East, To West, To South, and To North, with input groups labeled E,S,N; W,S,N; E,W,S; and E,W,N.]
[Figure: Layout (170 um x 210 um) showing the West, South, North, and East output MUXes, one 16:1 input MUX, two 17:1 input MUXes, the master-slave latch, the memory configuration structure, the 2:1 MUX, and the output drivers.]
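The function unit is built around a 2:1 MUX fed by the input MUXes. As a behavioral sketch of why a 2:1 MUX is enough for two-input combinational logic, the Python below implements AND, OR, and XOR by choosing what drives the select and data inputs. The helper names are hypothetical, and the assumption that constants and a complemented signal are available at the data inputs is not stated on the slide.

```python
def mux2(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def and_gate(a: int, b: int) -> int:
    return mux2(a, 0, b)        # a=0 gives 0, a=1 gives b


def or_gate(a: int, b: int) -> int:
    return mux2(a, b, 1)        # a=0 gives b, a=1 gives 1


def xor_gate(a: int, b: int) -> int:
    return mux2(a, b, 1 - b)    # a=0 gives b, a=1 gives NOT b


# Print the truth table: a, b, AND, OR, XOR
for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_gate(a, b), or_gate(a, b), xor_gate(a, b))
```

In differential current-mode logic the complemented signal typically comes for free from the second wire of each pair, which is one reason a MUX-based cell is a natural fit, though the slides do not spell this out.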
8 2004 MAPLD: 140 Jong-Ru Guo
Area improvement-BC
[Figure: two Basic Cell layouts compared, labeled 210 um x 170 um and 135 um x 130 um]
49% layout area saved.
7HP: 0.18 um process
8HP: 0.13 um process
Smaller layout, better performance, more configurable cells.
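A quick arithmetic check of the labeled dimensions, assuming that 170 um x 210 um and 130 um x 135 um are the footprints of the older and newer Basic Cell layouts (the pairing of the four labels is an assumption). With these numbers the newer layout occupies about 49% of the older area, so roughly half of the area is saved.

```python
old_w_um, old_h_um = 170, 210   # larger layout labeled on the slide
new_w_um, new_h_um = 130, 135   # smaller layout labeled on the slide

old_area_um2 = old_w_um * old_h_um   # 35700 um^2
new_area_um2 = new_w_um * new_h_um   # 17550 um^2

ratio = new_area_um2 / old_area_um2  # new area as a fraction of the old area
saved = 1 - ratio                    # fraction of the old area removed

print(f"new/old = {ratio:.1%}, saved = {saved:.1%}")
```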
9 2004 MAPLD: 140 Jong-Ru Guo
Prediction of the performance improvement by the different generation processes
[Figure: Propagation delay and power consumption comparisons between different processes, plotted against HBT generation (5HP, 7HP, 8T, 9HP?). Delay labels: 130 ps, 100 ps, 42 ps, 30 ps???. Power labels: 71 mW, 52 mW, 13.8 mW, 4.2 mW.]
10 2004 MAPLD: 140 Jong-Ru Guo
Information for the old, new, and future Basic Cells-BC
Cell                                   Process  Vcc, Vee   Current trees  Iref    Power (PON)  Tp
BC-I                                   5HP      0, -3.4 V  30             0.7 mA  71.4 mW      239 ps
Basic Cell-II                          7HP      0, -2.8 V  21             0.8 mA  47.04 mW     100 ps
Basic Cell-II (high performance case)  8HP      0, -2.2 V  21             0.7 mA  32.34 mW     42 ps
Basic Cell-II (power saving case)      8HP      0, -2.2 V  21             0.3 mA  13.86 mW     75 ps
1. The 8HP cases (high performance case and power saving case) are based on simulations.
2. The two 8HP cases differ in how the transistors are set: in the high performance case they are set to their maximum cutoff frequency, and in the power saving case they are set to match the maximum cutoff frequency of the 7HP process.
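The Power (PON) column appears to follow from the other columns as current trees x Iref x |Vee|, with Vcc at 0 V. That relationship is inferred from the numbers rather than stated on the slide; the sketch below reproduces the four table entries under that assumption. The function and dictionary names are hypothetical.

```python
def static_power_mw(current_trees: int, iref_ma: float, vee_v: float) -> float:
    """Fully turned-on power, assuming every current tree draws Iref from |Vee|."""
    return current_trees * iref_ma * abs(vee_v)


# (current trees, Iref in mA, Vee in V), copied from the Basic Cell table
BASIC_CELLS = {
    "BC-I (5HP)": (30, 0.7, -3.4),
    "Basic Cell-II (7HP)": (21, 0.8, -2.8),
    "Basic Cell-II (8HP, high performance)": (21, 0.7, -2.2),
    "Basic Cell-II (8HP, power saving)": (21, 0.3, -2.2),
}

for name, (trees, iref_ma, vee_v) in BASIC_CELLS.items():
    print(f"{name}: {static_power_mw(trees, iref_ma, vee_v):.2f} mW")
```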
11 2004 MAPLD: 140 Jong-Ru Guo
Test circuit: Four stage Basic Cell ring oscillator-BC
[Figure: Measurement result of the 5HP Basic Cell]
[Figure: Measurement result of the 7HP ring oscillator]
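For orientation, a ring oscillator relates stage delay to oscillation frequency through the usual first-order relation f = 1 / (2 x N x Tp). The sketch below applies it to a four stage ring; it assumes each stage contributes one propagation delay Tp and that the ring has a net inversion. The delay values passed in are examples, and the results are estimates, not the measured values shown in the figures.

```python
def ring_oscillator_freq_hz(stages: int, tp_s: float) -> float:
    """First-order estimate: one period is two trips around the ring."""
    return 1.0 / (2 * stages * tp_s)


# Example per-stage delays in picoseconds (illustrative inputs)
for tp_ps in (239, 100, 42):
    f_hz = ring_oscillator_freq_hz(stages=4, tp_s=tp_ps * 1e-12)
    print(f"Tp = {tp_ps} ps: about {f_hz / 1e9:.2f} GHz")
```

Read in the other direction, Tp = 1 / (2 x N x f) is how a per-cell delay is normally extracted from a measured ring oscillator frequency.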
12 2004 MAPLD: 140 Jong-Ru Guo
Power-saving scheme- Basic Cell
Power-saving scheme usage [12]
Design                               Tree #       Usage
BC maximum usage                     21           100%
Case I (comb./sequential logic)      10/12        47.6%/57.1%
Case II (sequential, one redir.)     15           71.4%
Case II (sequential, two redir.)     18           85.7%
Case II (sequential, three redir.)   21           100%
Case III (redirect function only)    3 trees/dir  14.3%/dir
Case I: Only combinational logic or sequential logic is used.
Case II: Sequential logic and the redirection function are used.
Case III: Only the redirection function is used.
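The usage figures are the number of active current trees divided by the 21 trees of a fully used Basic Cell. A minimal sketch of that arithmetic follows; the function name is hypothetical, and 3 trees works out to 14.3% when rounded to one decimal place.

```python
MAX_TREES = 21  # current trees in a fully used Basic Cell


def usage_percent(active_trees: int) -> float:
    """Share of the cell's current trees that remain switched on."""
    return 100.0 * active_trees / MAX_TREES


cases = [
    ("Case I, combinational", 10),
    ("Case I, sequential", 12),
    ("Case II, one redirection", 15),
    ("Case II, two redirections", 18),
    ("Case II, three redirections", 21),
    ("Case III, per direction", 3),
]

for label, trees in cases:
    print(f"{label}: {trees} trees, {usage_percent(trees):.1f}%")
```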
13 2004 MAPLD: 140 Jong-Ru Guo
Summary: Basic Cell
• Layout size has been reduced by 49%. With the latest Basic Cell, there will be a 48x48 Basic Cell array in a 7 mm x 7 mm area.
• Propagation delay has been reduced by 82.5%.
• Power consumption has been reduced by 80.6% (5HP case versus 8HP power saving case) for the fully turned-on case.
• 94% of the power is saved when the power-saving scheme is enabled.
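The delay and power percentages can be reproduced from the Basic Cell figures quoted earlier in the talk. The sketch assumes the delay compares BC-I (239 ps) with the 8HP high performance case (42 ps), which is an inference, and that the power compares BC-I (71.4 mW) with the 8HP power saving case (13.86 mW), as the slide indicates.

```python
def reduction_percent(old: float, new: float) -> float:
    """Relative reduction from old to new, in percent."""
    return 100.0 * (old - new) / old


delay_reduction = reduction_percent(old=239.0, new=42.0)    # ps
power_reduction = reduction_percent(old=71.4, new=13.86)    # mW

print(f"delay reduced by {delay_reduction:.1f}%")
print(f"power reduced by {power_reduction:.1f}%")
```

Under these assumptions the delay reduction comes out at about 82.4%, close to the quoted 82.5%, and the power reduction at about 80.6%.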
14 2004 MAPLD: 140 Jong-Ru Guo
High speed reconfigurable system
[Figure: Block diagram of the high speed reconfigurable system. Blocks: high speed front end, high speed back end, SiGe FPGA, CMOS FPGA, interleaving block, de-interleaving block, and DSP and other applications (such as poly-phase filter, digital filter, etc.). Paths: high speed inputs, high speed outputs, interleaving data path, de-interleaving data path, and a link to processors or other circuits. Frequency ranges marked: 10 GHz ~ 80 GHz, 500 MHz ~ 10 GHz, 100 MHz ~ 700 MHz.]
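The reason for the interleaving and de-interleaving blocks can be put in numbers: a fast serial stream has to be split across enough parallel lanes that each lane fits the slower device. The sketch below computes the minimum number of lanes. Which frequency range belongs to which block is an assumption here (the transcript lists the ranges without labels), and the function name is hypothetical.

```python
import math


def min_interleave_factor(stream_rate_hz: float, lane_rate_hz: float) -> int:
    """Smallest number of parallel lanes whose per-lane rate fits lane_rate_hz."""
    return math.ceil(stream_rate_hz / lane_rate_hz)


# Assumed example: a 10 GHz stream handed to logic limited to 700 MHz
print(min_interleave_factor(10e9, 700e6))

# Assumed example: a 10 GHz stream handed to logic limited to 2.5 GHz
print(min_interleave_factor(10e9, 2.5e9))
```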
15 2004 MAPLD: 140 Jong-Ru Guo
Application: High speed data acquisition system
MUX-DEMUX
• The SiGe FPGA can be configured for DSP and other applications.
• To compare the performance of the SiGe and CMOS FPGAs, the SiGe FPGA is configured as a 4:1 MUX and a 1:4 DEMUX.
• The results can be used to demonstrate its interleaving and de-interleaving functions.
16 2004 MAPLD: 140 Jong-Ru Guo
High speed data acquisition MUX-DEMUX
[Figure: The building blocks of the 1:4 DEMUX: three 1:2 DEMUXes and two 1/2 dividers. Inputs: Data, CLK. Outputs: D1/D1B, D2/D2B, D3/D3B, D4/D4B, and 1/4 CLK out.]
[Figure: The timing diagram of the 4:1 MUX (x represents 1, 2, 3 and 4). Data input: CHx, with interval 4T marked. Output: CH1, CH2, CH3, CH4, with interval T marked.]
[Figure: The timing diagram of the 1:4 DEMUX. 1:4 DEMUX input: CH1, CH3, CH2, CH4, with interval T marked. Outputs: CHx, with interval 4T marked.]
[Figure: The block diagram of the 4:1 MUX: three 2:1 MUXes and one 1/2 divider. Inputs: CH1/CH1B, CH2/CH2B, CH3/CH3B, CH4/CH4B, CLK/CLKB. Divided clock: 1/2 CLK, 1/2 CLKB. Outputs: Output/OutputB.]
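The timing diagrams describe bit interleaving: the 4:1 MUX emits one bit from each channel every interval T, and the 1:4 DEMUX hands each channel back at one quarter of the serial rate. The sketch below is a behavioral model only. It assumes round-robin order CH1, CH2, CH3, CH4 as in the MUX timing diagram, ignores the differential signals and clock dividers, and uses made-up test data; the DEMUX diagram lists CH1, CH3, CH2, CH4, so the hardware ordering may differ.

```python
from typing import List, Sequence


def mux4(channels: Sequence[Sequence[int]]) -> List[int]:
    """Interleave four equal-length channels bit by bit (CH1, CH2, CH3, CH4, ...)."""
    if len(channels) != 4 or len({len(ch) for ch in channels}) != 1:
        raise ValueError("expected four channels of equal length")
    return [bit for group in zip(*channels) for bit in group]


def demux4(stream: Sequence[int]) -> List[List[int]]:
    """Split a serial stream into four channels, each at 1/4 of the serial rate."""
    if len(stream) % 4:
        raise ValueError("stream length must be a multiple of 4")
    return [list(stream[i::4]) for i in range(4)]


channels = [[1, 0, 1], [0, 0, 1], [0, 1, 0], [1, 1, 0]]  # made-up test data
serial = mux4(channels)
print(serial)
print(demux4(serial) == channels)  # round trip recovers the original channels
```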
17 2004 MAPLD: 140 Jong-Ru Guo
Layout of the 4:1 MUX and 1:4 DEMUX implemented by the SiGe Basic Cells
[Figure: Layout of the 4:1 MUX]
[Figure: Layout of the 1:4 DEMUX]
Simulation results show that both the 4:1 MUX and the 1:4 DEMUX can operate at up to 10 GHz.
By comparison, the same circuits on a CMOS FPGA (Xilinx Virtex) run at up to 183 MHz.
18 2004 MAPLD: 140 Jong-Ru Guo
Simulation results (MUX)
[Figure: Simulated eye diagram of the 4:1 MUX programmed on the SiGe FPGA, running at 10 Gbps]
[Figure: Simulation result of the 4:1 MUX]
Inputs: CH_A: 1010011, CH_B: 0010100, CH_C: 0101001 and CH_D: 0001010.
Output: 0010-1000-0011-1100-0001-0110.
19 2004 MAPLD: 140 Jong-Ru Guo
SiGe and CMOS FPGA performance comparisons
Circuit             Tx rate                    Power (mW)  Used CLB
4:1 MUX (SiGe)      10 Gbps                    258.3       7
4:1 MUX (Virtex)    170 MHz                    61          7
1:4 DEMUX (SiGe)    2.5 Gbps (input: 10 Gbps)  194.52      8
1:4 DEMUX (Virtex)  45.5 MHz (input: 182 MHz)  91          8
Virtex results are based on the following environment:
Software: Foundation 2.1
Xilinx power consumption worksheet V1.5
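Two pieces of arithmetic sit behind this table: each 1:4 DEMUX output runs at one quarter of its input rate, and the SiGe and Virtex rates can be compared as a ratio. The sketch below assumes one bit per clock cycle so that Gbps and MHz figures can be compared directly; that equivalence and the function name are assumptions, not statements from the slide.

```python
def demux_output_rate(input_rate: float, ways: int = 4) -> float:
    """Per-channel rate at the outputs of a 1:N DEMUX."""
    return input_rate / ways


sige_out = demux_output_rate(10e9)      # 10 Gbps input
virtex_out = demux_output_rate(182e6)   # 182 MHz input

mux_ratio = 10e9 / 170e6      # SiGe 4:1 MUX rate over the Virtex rate
demux_ratio = 10e9 / 182e6    # SiGe DEMUX input rate over the Virtex input rate

print(f"SiGe DEMUX output: {sige_out / 1e9:.1f} Gbps")
print(f"Virtex DEMUX output: {virtex_out / 1e6:.1f} MHz")
print(f"MUX ratio: {mux_ratio:.1f}x, DEMUX ratio: {demux_ratio:.1f}x")
```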
20 2004 MAPLD: 140 Jong-Ru Guo
Larger scale SiGe FPGA
• A 20x20 Basic Cell array (400 Basic Cells) with dimensions of 7 mm x 7 mm has been fabricated by IBM (7HP).
• A 48x48 Basic Cell array with the high speed ADC integrated is developed (8HP).
[Figure: chip layout with dimensions labeled 5.8 mm and 7 mm]
21 2004 MAPLD: 140 Jong-Ru Guo
Conclusion
• The performance of the SiGe FPGA can reach up to 20 GHz (8HP generation).
• The layout area has been reduced by 49% between the 5HP and 8HP generations.
• Applications have been proposed to run in the GHz range.
• A 4:1 MUX and a 1:4 DEMUX have been configured to compare the performance of the SiGe and CMOS FPGAs.
22 2004 MAPLD: 140 Jong-Ru Guo
Future work
• Test the 20x20 Basic Cell array that has been fabricated by IBM (7HP).
• Develop a high speed data acquisition system.
• Implement DSP applications, such as software radar, poly-phase filtering, etc.
• 10 GHz and 20 GHz SiGe FPGA.
• Integrate with high speed front-end and back-end circuits.
[Figure: Primitive layout of the 48x48 SiGe FPGA]