table of contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · architectural platform...
TRANSCRIPT
SiPS Symposium Tutorial October, 2018
1
Algorithm/Architecture Co-design for Smart Signals and Systems in Cognitive Cloud/Edge
Chris Gwo Giun Lee, PhDDirector, Bioinfotronics Research Center,
Professor, Department of Electrical Engineering,National Cheng Kung University
Tainan, Taiwan
Table of ContentsIntroduction
The Big DATA Era Analytics Algorithm: analysis of big multimedia & medical data Analytic Architecture: analysis of algorithms for SMART SYSTEM design
Algorithm/Architecture Co-design: Analytic Architecture for SMART SoC/Edge
Architectural Platform before Cloud Algorithm/Architecture Co-Design: Abstraction at the System Level Dataflow Graph: Modeling the Computational Platform Algorithmic Intrinsic Complexity Metrics and Assessment Intelligent Parallel/Reconfigurable Computing
Towards IoT, Cloud, & beyond: Big Data Analytics in SMARTECH/AI with Analytic Architecture
Deep Learning Intelligent Surveillance (MPEG) Medical Imaging
1/30/2019 2Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
2
Introduction
1/30/2019 3Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
The Math Bridge in Cambridge University
1/30/2019 4Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
3
Apollo Navigation Computer
1/30/2019 5Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SMARTECH: The New Paradigm Shift in Technology
• Before Industrial Revolution: Innovations in ENERGY and ELECTRICITYbrought forth automation resulting in revolutionary changes to traditional Artisan craftsmanship from the social, political, and economical perspectives.
Artisan workers took a few decades to be re-educated so as to re-enter the new working environment.
• Today: After INFORMATION explosion resulting in BIG DATA, McKinsey Management Company in US has forecasted on changes brought forth by ARTIFICIAL INTELLIGENCE (AI) to be 10 times faster and 300 times larger in scale as compared to former Industrial Revolution!
In the 21st century, big corporate style on-the-job- training is reducing and the overall ecosystem with more startups and self-employment places more requirements on self-learning and metacognitive skills.
Industry 1.0: Energy Industry 2.0: Electricity
Industry 3.0: Information Industry 4.0: AI
Fast Changing Landscape: Engineers have been the center of ALL these technologies from the very beginning!
SiPS Symposium Tutorial October, 2018
4
Data Gets Ever Larger
Today:
Extending outwards even further with multiple sensors
interconnected . When going deeper inwards into the human
body with huge data from human brain and human genome
Marshall McLuhan (1960’s):
Electronic Media, primarily Television being extension of
human nervous systems
1/30/2019 7Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Reaching Out Even Further via IoT & Going in Ever Deeper into the Human Brain and
Genome
2015/12/16
Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE Media SoC Lab. 8
SiPS Symposium Tutorial October, 2018
5
Ever More Complex Analytics Algorithms Should Run on Analytics Architecture
2018/10/21 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE Media SoC Lab. 9
Speech recognition with feelings
Analytics Algorithm: Analyzes speech & images
Facial emotion detection
Analytics Architecture:Analyzes algorithms & data
Algorithm/Architecture Co-Design: Analytics Architecture for SMART SoC
1/30/2019 10Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
6
New Design Paradigm: Moving from programming to design and beyond…post
Moore’s law
Lee from NCKU (2007):
Design = Algorithm + Architecture
Wirth from ETHZ (1975):
Programming = Algorithm + Data Structure
1/30/2019 11Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Architectural Platforms Beyond Cloud
1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 12
Performance/Area
Flexibility
PowerASIC
Embedded Reconfigurable Logic/ FPGA
Application Specific Instruction Set Processor (ASIP)
Instruction Set DSP
Reconfigurable Processor/FPGA
Embedded general purpose instruction set processor
Embedded multicore processor/GPU
Towards Cloud & Heterogeneous Systems
Neuromorphic Device
SiPS Symposium Tutorial October, 2018
7
Cross Level of Abstraction Design Space Exploration
1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 13
Application/Specification
Algorithm
Architecture
Cycle-accurate model
Synthesizednetlist
Physicaldesign
Devicelevel
Different instances or realizations
Application
SystemArchitecture
Microarchitecture
Circuit
Device
Explore
Algorithm/Architecture Co-exploration
Software/Hardware Co-exploration
Circuit/Device Co-exploration
Neuromorphic
Algorithm/Architecture Co-Design: Abstraction at the System Level
Algorithm Design
Architecture Design
Traditional design flowAlgorithm/Architecture Co-Exploration
Algorithm Design
Architecture Design
Complexity MeasurementNo. of operationsDegree of parallelismData transfer rateData storage requirements
Back AnnotationData Flow
G. G. (Chris) Lee, Y.-K. Chen, M. Mattavelli, and E. S. Jang, “Algorithm/Architecture Co-Exploration of Visual Computing: Overview and Future Perspectives,” IEEE Trans. on Circuits and Systems for Video Technology. Vol. 19, Iss. 11, pp. 1576-1587, Nov. 2009.
1/30/2019 14Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
8
Futuristic Smart System Design Methodology
Dataflow modelingat different abstraction levels
Characterization of algorithmic complexity
Architectural information
Algorithmic complexity
Mapping
Computing
algorithms
Software/Hardware/NeuromorphicPartition
Algorithms
Complexity
Architectural information: Parallel/Reconfigurable
Dataflow model
Algorithmic SpaceExploration
Algorithm/Architecture
Co-exploration (Spectra Graph Theory)
Complexity
Back annotation
Delay model, energy consumption, circuit reliability
Applications IoT/Mobile Comunication/5G Multimedia Biomedical
Software/Hardware
Co-exploration
Circuit/Device
Co-exploration
Programmable Parallel CPU/ GPU
Design
Reconfigurable Logic design
Analog-NVM Neuromorphic
Accelerator
MOS TransistorDesign using
TCAD/Quantum Simulator
Memory Device Design using
TCAD/Quantum Simulator
Analytic algorithm with Deep Learning for different applications
Device capacitances, Memory retention, endurance, read/write latency & current
Customized
LogicBack annotation
MODELING the COMPUTATIONAL PLATFORM
via
Dataflow Graphs (DFG)
1/30/2019 16Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
9
Dataflow…
1/30/2019 17Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Block Diagram
Three-tap FIR filter: y[n] = a0x[n]+a1x[n-1]+a2x[n-2]
Z-1 Z-1
a0 a1 a2
x[n]
y[n]+ +
Z-1 Z-1
a0 a1 a2
y[n]
x[n]
++
Direct form:
Transposed form (Data broadcast form):
1/30/2019 18Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
10
Pipeline View of Dataflow in FIR(Direct Form)
Clock Cycles
Input D0 D1 Output
0 x[0] y[0] = a0x[0]
1 x[1] x[0] y[1] = a0x[1] + a1x[0]
2 x[2] x[1] x[0] y[2] = a0x[2] + a1x[1]+ a2x[0]
3 x[3] x[2] x[1] y[3] = a0x[3] + a1x[2]+ a2x[1]
4 x[4] x[3] x[2] y[4] = a0x[4] + a1x[3]+ a2x[2]
…
127 x[127] x[126] x[125] y[127] = a0x[127] + a1x[126]+ a2x[125]
128 x[127] x[126] y[128] = a1x[127]+ a2x[126]
129 x[127] y[129] = a2x[127]
D0 D1
a0 a1 a2
x[n]
y[n]+ +
1/30/2019 19Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Pipeline View of Dataflow in FIR(Transposed Form)
Clock Cycles
Input D0 D1 Output
0 x[0] y[0] = a0x[0]
1 x[1] a2x[0] a1x[0] y[1] = a0x[1] + a1x[0]
2 x[2] a2x[1] a1x[1] + a2x[0] y[2] = a0x[2] + a1x[1]+ a2x[0]
3 x[3] a2x[2] a1x[2] + a2x[1] y[3] = a0x[3] + a1x[2]+ a2x[1]
4 x[4] a2x[3] a1x[3] + a2x[2] y[4] = a0x[4] + a1x[3]+ a2x[2]
…
127 x[127] a2x[126] a1x[126] + a2x[125] y[127] = a0x[127] + a1x[126]+ a2x[125]
128 a2x[127] a1x[127] + a2x[126] y[128] = a1x[127]+ a2x[126]
129 a2x[127] y[129] = a2x[127]
1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 20
D1 D0
a0 a1 a2
y[n]
x[n]
++
SiPS Symposium Tutorial October, 2018
11
2019/1/30 Prof. Gwo Giun Lee/NCKUEE 21
Synthesis Results for 3Tap FIR Filters Using Direct and Transpose Forms
Direct FormTransposed
Form
Area(gate-count)
1212(1110, 101)
1674(1573, 101)
Timing(ns)
12.7 ns 10.37 ns
Power(mW)
35.03 mW 37.26 mW
architecture
criteria
(1) Gate count is in terms of equivalent 2-input NAND gate(2) Timing is based on static timing analysis(3) Power dissipation is only a very rough estimation
Dataflow Graph Models
They should contain: Algorithmic information or behavior
Architectural information for implementation
Some important dataflow models are: Directed acyclic graph (DAG)
Synchronous dataflow (SDF) graph
Control data flow graph (CDFG)
Kahn process networks (KPN)
Y-chart application programming interface (YAPI)
1/30/2019 22Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
12
Fundamental Dataflow Models
1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 23
Methods Descriptions Features
Directed acyclic graph (DAG)
Vertex: atomic operations
Directed edge: data dependency Dependency
Synchronous dataflow (SDF) graph [1] Extension of DAG by annotating edges with static data rates Data rate
Control data flow graph (CDFG)[2] Encapsulating DAGs as vertexes controlled by control edges Control flow
Kahn process networks (KPN)[3]
Actor: atomic operations or processes
Communication among actors: first-in-first-out (FIFO)
Data-driven
Concurrence
Y-chart application programming interface (YAPI)[4]
Extension of KPN to support dynamic selection of FIFO channels
Controllability
[1] Edward A. Lee and David G. Messerschmitt, “Synchronous Data Flow,” Proceedings of the IEEE, vol. 75, no. 9, p 1235-1245, September, 1987.
[2] L. Thiele, S. Chakraborty, M. Gries, and S. Künzli, “A framework for evaluating design tradeoffs in packet processing architectures,” in Proc. Annu. ACM IEEE Design Autom. Conf., New Orleans, LA,2002, pp. 880–885.
[3] G. Kahn, “The semantics of simple language for parallel programming,” in IFIP Congress ’74, pp. 471–475, 1974.
[4] E. A. de Kock, W. J. M. Smits, P. van der Wolf, J.-Y. Brunel, W. M. Kruijtzer, P. Lieverse, K. A. Vissers,and G. Essink, “Yapi: application modeling for signal processing systems,” in Proc., Design automation conf., pp. 402– 405, 2000.
Directed Acyclic Graph (DAG)
DAG are acyclic graphs composed of nodes and directed edges: nodes indicate operation activities
directed edges between nodes represent the dependency of operations.
Arrows indicate the directions in which the data flows through the operation nodes
A
D E
B C
F
A, B…F: operations of the nodes
1/30/2019 24Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
13
Synchronous Dataflow (SDF) Graph
SDF are graphs extended from DAGs where specified a priori Heads of edges represents fixed input
Tails represent output data-rates of nodes,
If the input/output data rate of the nodes are not specified or known a priori, we then have asynchronous dataflow graph
A, B…F: operations of the nodes
a, b,…o, p: data-rate of the nodes.
A
D E
B C
F(a) (p)
(h)
(f)
(b)
(c)
(d)
(e) (k)
(i) (l)
(m)
(g)
(j)
(n)
(o)
1/30/2019 25Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Control Dataflow Graph (CDFG)
CDFG describe the flow of data computation and control procedures using nodes, control edges and data edges.
Pseudo codes:1 DFG_0
2 if (condition_1)
DFG_1
3 else if (condition_2)
DFG_2
4 DFG_3
Data edge
Control edge
condition_1 condition_2DFG_0
DFG_3
DFG_1 DFG_2
1/30/2019 26Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
14
Kahn Process Networks (KPN)
KPN are directed graphs consisting of nodes and edges. Nodes indicate processes and directed edges represent unbounded and infinite FIFO queues between processes: Blocking read: a process is fired when input data exist.
Non-blocking write: output data are sent out via a unbounded FIFO.
Process1
Process2
Process3
G. Kahn, “The semantics of simple language for parallel programming,” in IFIP Congress ’74, pp. 471–475, 1974.
1/30/2019 27Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Y-chart Application Programming Interface (YAPI)
Extension of KPN. KPN models deterministic (synchronous) dataflow
YAPI is capable of modeling non-deterministic (asynchronous) data flow by allowing dynamic decision on selecting which channel to communicate.
The selection of port/channel depends on the amount of data
Processport2
Mechanism:Let c(pk) be the number of tokens existing in portk
and Nk be the number of tokens required to be consumed from pk at a timeif ( c(pk)>= Nk)
then read Nk tokens from port pk
1.E. A. de Kock, W. J. M. Smits, P. van der Wolf, J.-Y. Brunel, W. M. Kruijtzer, P. Lieverse, K. A. Vissers, and G. Essink, “Yapi: application modeling for signal processing systems,” (Los Angeles, California, United States), pp. 402– 405, ACM, 2000.
1/30/2019 28Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
15
How Big is Big?
Algorithmic Intrinsic Complexity Metric & Assessment:
PLATFORM INDEPENDENCE
1/30/2019 29Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Number of Operations
Estimates the number of each type of operations Addition/Subtraction
Multiplication
Division
Shift
Logic operations
Operations with constant input and variable input should be differentiated to provide high accuracy X + Y vs. X + 5 (X and Y are variables)
X×Y vs. X×5 (X and Y are variables )
In addition, the precision of each operand should be taken into account, since it can significantly influence complexity
1/30/2019 30Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
16
SMART TRANSFORM PAIRvia
Spectral Graph Theoryfor
Intelligent Parallel/Reconfigurable Computing
1/30/2019 31Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Parallel Computing (Forward Transform): Efficient & Flexible Cognitive Cloud
1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 32
In GlobalSIPS 2015, IBM T. J. Watson, Intel, etc. consistently addressed this technology as THE future of cognitive computing• Gwo Giun Lee, He-Yuan Lin, Chun-Fu Chen, Tsung-Yuan Huang, "Quantifying Intrinsic Parallelism Using Linear Algebra for
Algorithm/Architecture Co-Exploration," IEEE Transactions on Parallel and Distributed Systems, vol. 23, iss. 5, pp. 944-957, May 2012• Gwo Giun (Chris) Lee, Chun-Hsi Huang, Chun-Fu (Richard) Chen, Tai-Ping Wang, "Complexity-Aware Gabor Filter Bank Architecture Using
Principal Component Analysis," Journal of Signal Processing Systems for Signal, Image, and Video Technology, pp 1-14, Apr. 2017.• Gwo Giun Lee, Lee Barford, Paul Blinzer, Joe Cavallaro, Hong, Jiang, Nick Moore, Yinglong Xia, “GlobalSIPS 2015 PANEL DISCUSSION:
ALGORITHMS VS. ARCHITECTURES: OPPORTUNITIES AND CHALLENGES IN MULTICORE/GPU DSP”, DEC. 2018, FLORIDA.https://sigport.org/documents/panel‐discussion‐algorithms‐vs‐architectures‐opportunities‐and‐challenges‐multicoregpu‐dsp
SiPS Symposium Tutorial October, 2018
17
Reconfigurable Computing (Inverse Transform): Efficient & Flexible Mobile Edge
1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 33
Technology adopted by ISO/IEC/MPEG’s Reconfigurable Ad Hoc Group & served as Project Lead
• Gwo Giun (Chris) Lee, Chun-Fu (Richard) Chen , Ching-Jui Hsiao, "Reconfigurable Parser Architecture Design with Microprogrammed Controller for Multiple Purposes," Journal of Signal Processing Systems for Signal, Image, and Video Technology, pp 1-15, Mar. 2016
• Gwo Giun (Chris) Lee, Chun-Fu Chen, Shu-Ming Xu, Ching-Jui Hsiao, "High-Throughput Reconfigurable Variable Length Coding Decoder for MPEG-2 and AVC/H.264," Journal of Signal Processing Systems for Signal, Image, and Video Technology, vol. 82, iss. 1, pp 27-40, Jan. 2016
Spectral Graph Theory (SGT): Adjacency & Laplacian Matrix
i j1 if vertex and vertix are adjacent to each other(i,j)
0 otherwise
A
0 1 0
1 0 1
0 1 0
A
Adjacency matrix A
1 2 3Graph
i j
(i,j) if i j
(i,j) -1 if i j and vertex is adjacent to vertix
0 otherwise
D
L
Laplacian matrix L=D-A, where D is a diagonal matrix where the diagonal elements represents the number of edges connected to that node.
1 1 0
1 2 1
0 1 1
L
1/30/2019 34Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
18
Degree of Parallelism: Eigen-Analysis of Dataflow Graphs using SGT
1
0
0
0
1
1
1 0 0 0 0 1
0 1 0 0 0 1
0 0 1 0 1 0
0 0 0 1 1 0
0 0 1 1 2 0
1 1 0 0 0 2
+ ++
A1 B1 C1 D1
O1
+ +
+
A2 B2 C2 D2
O2
O1=A1+B1+C1+D1
O2=A2+B2+C2+D2
1 2
6
43
5
Algorithm Dataflow diagram
Causation graphLaplacian matrix
Spectrum
0, 0, 1, 1, 3, 3Parallelism
2
0
1
1
1
0
0
0
0
1
1
0
0
0
0
0
0
1
1
0
2
1
1
0
0
2
0
0
0
1
1
Eigenvalue:
Eigenvector:
×
(Homogeneous)
1/30/2019 35Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Quantification of parallelization, Instruction Set Architecture (ISA) design
2019/1/30 Prof. Gwo Giun (Chris) Lee/NCKUEE 36
Reconfigurable Architecture: Commonality Extraction from Dataflow Graphs;
• Observe the common parts between each dataflow graph.
AVC/H.264 filter coefficients[1,-5,20,20,-5,1]
MPEG-4 filter coefficients[-8,24,-48,160,160,-48,24,-8]
[-1,3,-6,20,20,-6,3,-1]Divide by 8
MPEG-4 Chroma interpolation, MPEG2,and AVC/H.264 chroma prediction
[1 1]
[1 1 1 1]
+x[0]x[5]
x[1]x[4]
x[2]x[3]
+
+
2 +
2 +
4
- +
+
result
i : left shift by i i : right shift by i + : addition
+x[0]x[7]
x[1]x[6]x[2]x[5]
+
+
1 +
2 +
1
-3+ result
x[3]x[4]
+ 2 +
4
+
+
+x[0]x[1]x[2]x[3]
+
+ resultx[0]x[1]
+ result
SiPS Symposium Tutorial October, 2018
19
2019/1/30 37
Reconfigurable fractional interpolation;
[-8,24,-48,160,160,-48,24,-8]
+ 1 +
+
+ 2 +
4
3Module 4
[1,-5,20,20,-5,1]
+ 2 +
1
- + 2 + -
+ 2 +
4
[1 1 1 1] [1 1]
+
+
+
+
result
x[0]x[7]
+
3
+ 1 +x[1]x[6]
x[2]x[5]
+ 2 +
1
-
x[3]
x[4]+ 2 +
4
+
+
+
result
+ 2 +
1
-
x[5]
x[1]x[4]
x[2]
x[3]
x[0]
+ 2 +
4
+x[1]
x[0]
result+
+
+
x[1]
x[2]
x[3]
x[0]
result+
+
+
result
Module 0
Module 1
Module 2
Module 3
Module 4
x[1]x[6]
x[2]x[5]
x[3]x[4]
x[0]x[7]
x[5]0
x[1]x[4]
x[2]x[3]
x[0]0
x[1]0
x[2]0
x[3]0
x[0]x[1]
Coefficients
Module 3
Module 2
Module 1
Module 0
Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Data Storage: Configuration & Requirement using Graph 1/2
Each initial cell region confined by external nuclei markers is represented by avertex.
Images are divided into tiles of size 64 by 64.
The vertex size is defined as the number of tiles that each region is contained in. The tile size is obtained from the statistical average of all initial cell region areas.
Fixed tile size is introduced for efficient parallel processing of irregular cell regions.
The tile size could also be considered together with the platform architecture forcomputational efficiency.
1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 38
One tile with size 64 by 64
Vertex
11
22 44
33
Size = 4
Size = 4
Size = 2 Size = 6
SiPS Symposium Tutorial October, 2018
20
Data Storage: Configuration & Requirement using Graph 2/2
The weight on each edge represents the proportionate overlapping area betweenthe two cell regions represented by the corresponding vertices.
The weight of each edge is defined as:
This edge weight indicates the tendency of grouping or compiling correspondingtiles of the two vertices into the same computation cluster so as to optimize datastorage, maximize data reuse and minimize data transfer rate.
The tile size could also be considered together with the platform architecture forcomputational efficiency.
1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 39
wi, j
,
11
22 44
33
0.2
0.2
0.1250.33
0.33
Data Transfer Rate & Bandwidth via Graph Theory
1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 40
Data Flow Graph
+
y[n]
x[n]x[n-1]
e1
e2
e3
v1 v2
v3
v4
Edge cut
y[n] = x[n-1] + x[n]
1
2
3
2
0
2
e
e
e
Mx
1 2 3 4
1
2
3
1 0 1 0
0 1 1 0
0 0 1 1
v v v ve
e
e
M
1
2
3
4
1
1
1
1
v
v
v
v
x
Mx(i) is the i-th element in vector Mx.N is total element number of a vector.
Edge cut• A set of edges which partitions a graph into at least two disconnected
components if this set of edges is removed.• Can be used to study the data transaction between vertices.
Indicator vector (x)• A labeled vector, which is composed of 1 and -1, separates vertices
into two groups.
Edge characteristics in the edge cut (Mx)• Elements in Mx present the edge characteristics
• In-edge-cut ( >0 ), out-edge-cut ( <0 ), non-edge-cut ( 0 )• Number of edges in a edge cut could be derived by Mx
• Amount of data transfer on this edge cut: 1
0
12
2
N
i
i
Mx
SiPS Symposium Tutorial October, 2018
21
1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 41
A Very Useful SMART Sensor System
SMARTLET: SMART toiLET
SMART is a BUZZ word that SELLS
IoT: Any systems connected via signals for the exchange of information
1/30/2019 42Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
22
SoC Platforms
System Platforms: From System-on-Chip to IoT/Cloud & Beyond
1/30/2019 Bioinfotronics Research Center 43
IoT and Networking
Platforms
IntelligentSurveillance (IS)
IntelligentHealth Cloud (IHC)
Distributed System The central control system performs pending jobs analysis:
Similarity and loading of pending jobs
Overhead of communication
Workload of processors
Intelligently allocate jobs to processors/clients (located at different physical locations) based on the analyzed parameters
When reconfiguring the internet architecture or platform, we have cognitive internet
Job queue
Client/PE
Center
Client/PEClient/PE
Client/PE
Client/PE
Client/PE
1/30/2019 44Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
23
Cloud Computing/Service According to the priority and difference of requests from clients Cloud server intelligently reschedules and reallocates the resource in
response to the clients.
client
Communication with cloud server (request and response)
Cloud Server
client
client
client
client
client
Intelligent arbitrator
1/30/2019 45Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Storage in Cloud Server
Analyze the similarity of requests and evaluate the workload of processors
Intelligently reallocate and reschedule the storage resource to the available processors
Storage 0 Storage 1Storage
N-1
CPU 0 CPU 1 CPU N-1
Intelligent arbitrator
Client request queue
…
…
1/30/2019 46Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
24
Reconfigurable Computing over Cloud Similarity/Correlation Analysis: Telemedicine
Graph Similarity/Correlation analysis to reconfigure scalable computing clusters cluster
Normal relationship
Close relationship
Dataflow graph
1/30/2019 47Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Deep Learning
1/30/2019 48Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
25
Modeling the Neuron Mathematically
Neural network could define a complex, non-linear form to represent the relationship between input x and output y through a hypothesis hw,b(x) Neuron is the computational unit that represents input data in another
way
hw,b(x) = f(wTx + b), f: RR is activation function. Popular activation functions, sigmoid, hyper tangent, rectified linear function
x1
xN
+1
hw,b(x)
x = [x1, x2, …, xN]T
w = [w1, w2, …, wN]T
…
wN
w1
b
1/30/2019 49Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Its All about Synapses of Neurons in Forming a Network
Multi-Layer Perception (MLP) This example includes one
input layer, one hidden layer and one output layer.
Forward propagation to predict output based on input.
Back propagation to tune neural network based on the prediction error.
It can be configured in several hidden layers, different connectivity between layers.
1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Chris Lee/NCKUEE MSoC Laboratory 50
x1
…
xN
+1 +1
Input layer Hidden layer
Outputlayer
hw,b(x)
bias unitbias unit
Feedforward neural network
SiPS Symposium Tutorial October, 2018
26
Convolutional Neural Network (CNN)
Instead of using fully-connected neural network, using locally-connected neural network to extract local features. Includes convolution layer, pooling layer, fully-connected network
Training method: Unsupervised pre-training (stacked (denoising) autoencoder)
Back propagation through label information
Convolution layer Sub-sampling/Pooling layer Convolution layer Sub-sampling/
Pooling layerFully connected
network
Input layer 3 feature maps 3 sub-sampled feature maps
6 sub-sampled feature maps
6 sub-sub-sampled feature maps Normal layer Output layer
1/30/2019 51Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Deep Belief Nets (DBN)
Deep Belief Nets (DBN): Restricted Boltzman Machine
(RBM) + directed belief nets
Visible layer: input data
Hidden layer: features
Training method: Unsupervised pre-training
(stacked RBMs)
Back propagation for fine-tuning
DBN could also be multi-dimensional structure.
...P P P P P1 2 L 1 1 2 L-2 L-1 L-1 Lv,h ,h , ...h v | h h | h h | h h ,h
Directed belief nets
RBM
Hidden layers
Visible layer
h3
h2
h1
v
1/30/2019 52Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
27
Computer Scientist’s Perspective of the Human Brain
1/30/2019 53Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
Collaboration with IBM TJ Watson Research Center
International Standardizationin ISO/ITU/MPEG
Activities with Academia/Industry Worldwide: MIT, Aachen, EPFL, MERL, Qualcomm, Samsung, LG, Sharp, MediaTek, etc
Encodedbitstream
FU: function units
Network of FUs
FUFU
FU
FU
Syntax Parser
DecoderdescriptionsSyntax
FU networkconfiguration
Decodedvideo
Reconfigurable Video Coding
Encoded 3D Content
3D-HEVC
Decoding
1/30/2019 Bioinfotronics Research Center 54
SiPS Symposium Tutorial October, 2018
28
Biomedical Imaging: Harmonically Generated Microscopy
Analysis of Harmonic GenerationMicroscopy Image from NTU, Taiwan including CNN
Image stack in epidermis with SK
Dermis
Epidermis
Cross section of human skin
Stratum corneum (SC)Stratum granulosum (SG)Stratum spinosum (SS)
Stratum basale (SB)
In (d),(e),(f) illustrate the proliferation of basaloid cell.
SB
SB
SB
Basaloid cells
Seborrheic keratosis (SK) & STEM cell detection• Observe through the image stack in the epidermis,
because of the proliferation of basaloid cell,result in acanthotic epidermis.
• In normal skin, epidermal thickness: 60~80um.• In SK, epidermal thickness: exceed 100um.
1/30/2019 Bioinfotronics Research Center 55
BCC Detection via Deep CNN
Detection of Basal Cell Carcinoma (BCC) via Deep CNN using ~2000 HGM Images with 97.02% accuracy with 95.08% specificity and 99.04% sensitivity
Transfer Medical Doctor’s knowledge to feature layers of Deep Convolution Neural Network (CNN)
Image stack in epidermis
Dermis
Epidermis
Cross section of human skin
Stratum corneum (SC)Stratum granulosum (SG)Stratum spinosum (SS)
Stratum basale (SB)
SB
SB
SB
1/30/2019 Bioinfotronics Research Center 56
𝑁 𝑁 1 𝑁 𝑁 𝑘
𝑁2
𝑁2
2𝑘
·······
𝑁4
𝑁4
4𝑘𝑁8
𝑁8
8𝑘
1 1𝑥
SiPS Symposium Tutorial October, 2018
29
SMART Cloud Computing for Precision Medicine
Patents licensed by Inform Genomics in USA on Analytics Architectural Platform
1/30/2019 Bioinfotronics Research Center 57
Chemotherapy Induced Nausea Vomiting Data Analytics for CIPN NCKU Medical School and Hospital
Center Server
Sub-system
Sub-system
Sub-system Sub-system
Sub-system Sub-system Job Queue
User 0 User 1 User 2 User 3 User N
…
Fault Tolerance
Fault Tolerance
Functional EvaluationRVKDE Prediction
Gene SelectionDeep Learning
1/30/2019 Bioinfotronics Research Center 58
C1
Slide 58
C1 Sir would like to edit descriptions.Chen, 3/27/2016
SiPS Symposium Tutorial October, 2018
30
Intelligent Health Cloud Database Management System
Collaboration with BannerAlzheimer’s Institute, AZ, USA targeted for
Telemedicine
1/30/2019 Bioinfotronics Research Center 59
Conclusion
We need to have cross level of abstraction system design methodology to solve cross disciplinary
problems in SMART manners
CLOUD/EDGE/Neuromorphic Computing
Intelligent Surveillance for Autonomous Cars
Precision Medicine
Brain Initiatives……
Nanoscale Histopathology
1/30/2019 60Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.
SiPS Symposium Tutorial October, 2018
31