Post on 09-May-2020
NTU CMOS Emerging Technology Group for
Technology Direction and System Integration
Asst. Prof. Hao Yu (haoyu@ntu.edu.sg)
School of Electrical and Electronic Engineering
Nanyang Technological University, Singapore
http://www.ntucmosetgp.net
NTU CMOS Emerging Technology Group:
Research Summary I: Research Areas
Energy-efficient ICs for data link/analytic: wired/wireless I/Os, accelerators
• 2011 MOE Tier-2 (PI): 800K-S$ on 3D I/O
• 2010 NRF POC (PI): 250K-S$ on CMOS 60GHz Terminal
• 2011 MOE Tier-1 (PI): 200K-S$ on CMOS THz Terminal
• 2012 A*STAR PSF Tier-2 (CO-PI): 800K-S$ on 3D Server
• 2012 NRF CRP (CO-PI): 6M-S$ on NVM server
• 2012 HiSilicon (PI): 102K-S$ on CMOS 60GHz PA
• 2014-2015 Intel (PI): 30K-S$ on high-speed link
• 2014-2015 Huawei (PI): 400K-S$ on big-data server
• 2014-2018 MediaTek JIP: High-speed link and power management
Multi-modal ICs for data collection: optical/IR/THz/chemical biomedical sensors
• 2010 DSO-DIRP (CO-PI): 590K-S$ on 3D MEMS Sensor
• 2012 NRF POC (CO-PI): 250K-S$ on Ion ISFET Sensor
• 2013-2015 TSMC (PI): 50mm^2 65nm CMOS imager/ISFET tapeout
• 2014-2015 JTC (PI): 180K-S$ wireless sensor with distributed machine learning
• 2011-2014 Advantest JIP: THz imaging
Manpower Training: 7 PhDs (4 to graduate in 2014), 3 MS/MEngs, 10 Research Staff
NTU CMOS Emerging Technology Group:
Research Summary II: Research Contributions
first CMOS meta-material 60GHz/140GHz/280GHz transceiver for near-field communication/imaging (2012)
first CMOS dual-mode sensor for DNA sequencing (2013)
first 3D/2.5D multi-core server (2014)
Collaborators: Prof. Dennis Sylvester/Prof. Wei Lu (Univ. of Michigan), Prof. Paul Franzon (NCSU), Prof. Sung Kyu Lim (Georgia Tech.), Prof. Sheldon Tan (UC-Riverside), Prof. Peter So (MIT), Prof. Krish Chakrabarty (Duke University), Prof. Xin Li (CMU), Dr. Tanay Karnik (Intel Lab), Dr. Ron Ho (Oracle Lab), Dr. Jinjun Xiong (IBM TJ-Watson Lab), Dr. Joshua Yang (HP Lab), Dr. Louis Scheffer (Howard Hughes Medical Institute) and Dr. William Yang (Huawei Shannon Lab).
Invited Talks: Intel Lab (Hillsboro), HP Lab (Palo Alto), IBM TJ-Watson Lab (Yorktown Heights), Qualcomm R&D (San Diego), TSMC R&D (Taiwan)
Publication: 90 conferences (VLSI-SYMP/CICC/RFIC/IMS/DAC), 40 SCI journals (IEEE/ACM Trans.), 1 best paper award in ACM Trans., 1 keynote talk, 4 books by Springer/CRC, and 5 patents in application
Energy-Efficient ICs for Data Analytics
• 1 Core = Microprocessor (= 6 Giga Flops @ 1.5GHz)
  • 4 FPUs + RegFiles
• 1 Chip = 742 Cores (= 4.5 Tera Flops/s)
  • 213 MB of L1 I&D + 93 MB of L2
• 1 Node = 1 Chip + 16 DRAMs (16GB)
• 1 Group = 12 Nodes + 12 Routers (= 54 Tera Flops/s)
• 1 Rack = 32 Groups (= 1.7 Peta Flops/s)
  • 384 nodes / rack
• 1 Data Center (= 1 Exa Flops/s)
  • 3.6EB of Disk Storage
  • 3.6PB = 0.0036 bytes/flops
  • 583 Racks
Bandwidth at 100Gbps, Space of 20,000 sq. ft. and Power of 68 MW !!!
Thousand cores in big memory
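The hierarchy above can be checked with a few lines of arithmetic. The sketch below simply multiplies out the per-level figures quoted on the slide. The variable names are illustrative, and treating the 16GB of DRAM per node as the memory behind the 3.6PB figure is an assumption, not something the slide states.

```python
# Roll up the exascale hierarchy quoted on the slide.
# All numeric inputs come from the slide; names are illustrative only.
CORE_GFLOPS = 6.0        # 1 core = 6 GFlops @ 1.5 GHz
CORES_PER_CHIP = 742
NODES_PER_GROUP = 12     # 1 node = 1 chip + 16 DRAMs
GROUPS_PER_RACK = 32
RACKS = 583
NODE_DRAM_GB = 16        # assumption: node DRAM is what adds up to ~3.6PB

chip_tflops = CORE_GFLOPS * CORES_PER_CHIP / 1e3
group_tflops = chip_tflops * NODES_PER_GROUP
rack_pflops = group_tflops * GROUPS_PER_RACK / 1e3
system_eflops = rack_pflops * RACKS / 1e3

nodes_per_rack = NODES_PER_GROUP * GROUPS_PER_RACK
total_nodes = nodes_per_rack * RACKS
memory_pb = total_nodes * NODE_DRAM_GB / 1e6          # GB -> PB
bytes_per_flop = (memory_pb * 1e15) / (system_eflops * 1e18)

print(f"chip   : {chip_tflops:.2f} TFlops")
print(f"group  : {group_tflops:.1f} TFlops")
print(f"rack   : {rack_pflops:.2f} PFlops ({nodes_per_rack} nodes)")
print(f"system : {system_eflops:.3f} EFlops, {memory_pb:.2f} PB DRAM")
print(f"memory : {bytes_per_flop:.4f} bytes/flops")
```

Under these assumptions the products land close to the rounded values on the slide (4.5 Tera Flops per chip, 1.7 Peta Flops per rack, roughly 1 Exa Flops for 583 racks), which suggests the slide rounds at each level.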
Big-data-analytic by Logic-Memory-Integration:
GHz TSV and TSI I/Os
[H. Yu-NTU: ICCAD’06, DAC’13, DATE’13-14, ASPDAC’13-14, ICCAD’14, TVLSI’08, TODAES’09, TCAD’13-14, TC’14]
Through silicon via
Through silicon interposer
8Gbps,0.8mW TSI I/Os for 2.5D Integration
A forward-clock (FC) I/O with CTLE equalization
Lowest JTB variation of 35MHz, data rate of 8Gb/s, energy efficiency of 0.81mW/Gb/s with the full-rate structure, and area of 0.16mm^2
[H. Yu-NTU: 3D-IC’13, DATE’14, ISLPED’14, ICCAD’14]
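As a quick unit check on the quoted figures, 1 mW per Gb/s is the same as 1 pJ per bit. The sketch below converts the efficiency figure and, assuming it holds at the full 8Gb/s rate (an assumption, not a measurement from the slide), estimates the total link power. The function name is hypothetical.

```python
# Convert the quoted I/O figures into energy per bit and link power.
DATA_RATE_GBPS = 8.0             # from the slide
EFFICIENCY_MW_PER_GBPS = 0.81    # from the slide


def link_budget(rate_gbps, mw_per_gbps):
    """Return (energy per bit in pJ, link power in mW).

    1 mW/(Gb/s) = 1e-3 J/s / 1e9 bit/s = 1e-12 J/bit = 1 pJ/bit.
    Assumes the efficiency figure applies at the given data rate.
    """
    energy_pj_per_bit = mw_per_gbps
    power_mw = mw_per_gbps * rate_gbps
    return energy_pj_per_bit, power_mw


energy_pj, power_mw = link_budget(DATA_RATE_GBPS, EFFICIENCY_MW_PER_GBPS)
print(f"{energy_pj:.2f} pJ/bit, {power_mw:.2f} mW at {DATA_RATE_GBPS:g} Gb/s")
```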
Data-analytic Multi-core Microprocessor with 2.5D TSI I/Os
Multiple MIPS cores with H.264 video accelerators
Multiple external memory blocks
TSI I/Os between cores and memory
GF 65nm + IME TSI process
[H. Yu-NTU: 3D-IC’13, DATE’14, ISLPED’14, ICCAD’14]
140G/280G CMOS wired I/Os:
High-loss conventional T-lines and switches can be replaced by a low-loss, low-crosstalk surface-wave plasmonic interconnect (waveguide)
Modulator and coupler sizes can be reduced to save area at 140G, with larger bandwidth and a higher data rate (>10Gbps)
[H. Yu-NTU: APL’15]
CMOS sub-THz Wired I/Os
60G/140G/280G CMOS wireless I/Os:
In-phase sub-THz signal generation, transmission and detection with widest tuning range, highest output power, and best sensitivity
CMOS sub-THz Wireless I/Os
[H. Yu-NTU: IMS’12-14, ASSCC’12-13, RFIC’13, CICC’13-14, ESSCIRC’14, TMTT’13-15]
[Figure: image sensor chip architecture (5mm × 5mm die). 2056 (H) × 1600 (V) pixel array with row decoder/driver; for each of the odd and even column groups, a single-slope column ADC, IDAC, SRAM and column decoder/driver with a 10-bit output; shared ramp generator and SREG. Pixel outputs: PIX_OUT<1,3,5…1027> and PIX_OUT<0,2,4…1026>.]
Data-analytic CMOS Image Sensor with 2.5D I/Os
1.1um 4-way-shared pixel; column-parallel with CDS readout + LVDS I/O; 60fps, 3 Megapixel
TSMC 65nm BSI
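To give a feel for why the sensor needs high-speed I/O, the raw output rate can be estimated from the array size, frame rate and ADC width. This is a rough sketch that assumes the full 2056 × 1600 array is read out at 60fps with 10 bits per pixel and ignores blanking and protocol overhead; none of these assumptions is confirmed by the slide.

```python
# Rough raw data-rate estimate for the image sensor.
COLS, ROWS = 2056, 1600   # pixel array (H x V) from the slide
FPS = 60                  # frame rate from the slide
BITS_PER_PIXEL = 10       # assumption: 10-bit column ADC outputs are sent as-is

pixels = COLS * ROWS
raw_gbps = pixels * FPS * BITS_PER_PIXEL / 1e9

print(f"{pixels / 1e6:.2f} Mpixel, about {raw_gbps:.2f} Gb/s raw")
```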
[Figure: face recognition accelerator architecture (3mm × 3mm die, TSMC 40nm). External memory: 900kB (1280*720). Inside the chip boundary: integrator and squared integrator; integrated image buffer, 21b, 21kB (84x96); squared integrated image buffer, 21b, 21kB (84x96); face detection with feature memory, 26kB (2135x96b); face buffer; principal components analysis with eigen face memory, 97kB (50 faces); extreme learning machine with ELM memory, 151kB; final decision. The blocks are grouped into a face detection stage and a face recognition stage.]
Data-analytic Accelerator for Face Recognition
ISSCC’15: Collaboration with UMICH Dennis Sylvester
10mW, 5fps, 1280x720 CMOS SoC for face detection, recognition and tracking
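The integrated image buffers in the architecture suggest a summed-area table, which lets the sum over any rectangle be read with four lookups. The sketch below is a generic pure-Python version of that technique, not the chip's implementation. It also checks that an 84x96 window of 8-bit pixels (the 8-bit depth is an assumption) needs 21 bits per entry, which matches the 21b, 21kB buffer size in the figure.

```python
# Generic summed-area table (integral image); a sketch, not the chip's design.

def integral_image(img):
    """Return a table ii with a zero border, where ii[y][x] is the sum of
    all pixels in rows < y and columns < x of img."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Buffer sizing for an 84x96 window (dimensions from the figure).
WINDOW_PIXELS = 84 * 96
PIXEL_MAX = 255                                # assumption: 8-bit pixels
entry_bits = (WINDOW_PIXELS * PIXEL_MAX).bit_length()
buffer_bytes = WINDOW_PIXELS * entry_bits // 8

demo = [[1, 2, 3],
        [4, 5, 6]]
demo_ii = integral_image(demo)
print(box_sum(demo_ii, 0, 0, 2, 3), box_sum(demo_ii, 1, 1, 2, 3))
print(entry_bits, "bits per entry,", buffer_bytes, "bytes")
```

A squared integral image follows the same recurrence applied to the squared pixel values, which is commonly used to normalize window variance during detection.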
Big-data-analytic by Logic-in-Memory:
Non-volatile Computing
[H. Yu-NTU: ISLPED’12-13, DATE’14, ASPDAC’14, TNANO’12, TVLSI’14, Springer’14]
[Figure: domain-wall (DW) nanowire logic-in-memory circuits. Panels show: DW nanowire devices with shift (SHF1, SHF2), read (RD) and write (WR1, WR2) ports and Load A / Load B / Output operations; a DW-based full adder with inputs A, B, Cin, enable EN and outputs SUM, Cout (transistors M1 to M4); a DW nanowire memory array with word-line decoder, bit-line decoder, BL/BLB, column mux & sense amplifiers; parallel output by distributing bits (1st bit, 2nd bit, ..., 8th bit) into separate nanowires; and a sigmoid function 1/(1+e^-x) approximated by a DWM-LUT.]
Nonvolatile-in-memory Imaging Accelerator
Machine learning for super-resolution imaging
Comparisons with conventional architecture
1. All operations involved in machine learning on a neural network can be mapped to a logic-in-memory architecture built from non-volatile domain-wall nanowires.
2. I/O traffic in the proposed DW-NN is greatly alleviated, with an energy-efficiency improvement of 92x and a throughput improvement of 11.6x compared to a conventional image processing system on a general-purpose processor.
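The sigmoid-by-lookup-table idea can be illustrated in software. The sketch below is a minimal model of a table-based sigmoid, assuming a 256-entry table over the input range [-8, 8] with 8-bit output codes. These parameters are hypothetical choices for illustration and are not taken from the DWM-LUT design.

```python
import math

# Hypothetical LUT parameters (assumptions for illustration only).
X_MIN, X_MAX = -8.0, 8.0   # input range covered by the table
ENTRIES = 256              # table depth
OUT_LEVELS = 255           # 8-bit output code, 0..255
STEP = (X_MAX - X_MIN) / ENTRIES


def sigmoid(x):
    """Exact reference: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


# Each entry stores the quantized sigmoid value at the centre of its input bin.
LUT = [round(sigmoid(X_MIN + (i + 0.5) * STEP) * OUT_LEVELS)
       for i in range(ENTRIES)]


def sigmoid_lut(x):
    """Approximate sigmoid by table lookup; inputs outside the range saturate."""
    index = int((x - X_MIN) / STEP)
    index = max(0, min(ENTRIES - 1, index))
    return LUT[index] / OUT_LEVELS


# Worst-case error over a dense grid from -10 to 10.
max_err = max(abs(sigmoid_lut(k / 100) - sigmoid(k / 100))
              for k in range(-1000, 1001))
print(f"max abs error: {max_err:.4f}")
```

With these settings the error is bounded by half a bin width times the maximum slope of the sigmoid (0.25) plus half an output code, so a deeper table or wider output word trades storage for accuracy.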
Multi-modal ICs for Data Collection
System miniaturization for point-of-care personal data collection:
microscope, NMR, flow cytometer, PCR, network analyzer
Multi-modal CMOS sensor: electrical, optical and chemical
Microfluidic channel: molecules, tissues, cells, and biofilms
LoC system: high throughput, non-invasive, large array, on-chip processing for portable diagnosis
CMOS based Multi-modal Lab-on-chip
CMOS Sub-THz Imaging System
[H. Yu-NTU: IMS’12-14, ASSCC’12-13, RFIC’13, CICC’13-14, ESSCIRC’14, TMTT’13-15]
CMOS Dual-mode ISFET Sensor for DNA Sequencing and Food Safety
Parameters                     Specifications
Process                        Standard TSMC 0.18μm CIS
Pixel Type                     Dual-Mode (Image and Chemical)
Pixel Size                     10μm×10μm
Pixel Optical Sensing Area     20.1μm^2 (FF=18.1%)
Pixel Chemical Sensing Area    22.3μm^2 (FF=20.1%)
Array Size                     64×64
Die Area                       2.5mm×5mm
ADC ENOB                       11.4 bits
ADC SNDR                       70.35dB
FPN                            0.3%
Frame Rate                     1200fps
Total Power Consumption        32mA @ 3.3V
[H. Yu-NTU: IEEE VLSI-SYMP’14 Highlighted, ISMM’14 Keynote Talk]
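Some of the table entries can be cross-checked against each other. The sketch below applies the standard relation ENOB = (SNDR - 1.76) / 6.02 to the quoted SNDR, converts the supply current and voltage into power, and estimates the raw output rate. The 12-bit word width used for the rate is an assumption.

```python
# Cross-check of the sensor specification table.
SNDR_DB = 70.35           # from the table
SUPPLY_MA, SUPPLY_V = 32.0, 3.3
ROWS, COLS = 64, 64
FPS = 1200
WORD_BITS = 12            # assumption: 12-bit digital output per pixel

# Standard ADC relation between SNDR (dB) and effective number of bits.
enob = (SNDR_DB - 1.76) / 6.02
power_mw = SUPPLY_MA * SUPPLY_V            # mA * V = mW
raw_mbps = ROWS * COLS * FPS * WORD_BITS / 1e6

print(f"ENOB  : {enob:.2f} bits")
print(f"Power : {power_mw:.1f} mW")
print(f"Rate  : {raw_mbps:.2f} Mb/s raw")
```

The computed ENOB rounds to the 11.4 bits listed in the table, so the ENOB and SNDR rows are mutually consistent.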
Measurement: Accurate Large-arrayed Local pH Correlation
[Figure: contact image and pH map. Contact image color scale: digital output (12-bit), ticks at 1600, 2400 and 3200. pH map scale bar: pH 0, 7 and 14.]