table of contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · architectural platform...

32
SiPS Symposium Tutorial October, 2018 1 Algorithm/Architecture Co-design for Smart Signals and Systems in Cognitive Cloud/Edge Chris Gwo Giun Lee, PhD Director, Bioinfotronics Research Center, Professor, Department of Electrical Engineering, National Cheng Kung University Tainan, Taiwan Table of Contents Introduction The Big DATA Era Analytics Algorithm: analysis of big multimedia & medical data Analytic Architecture: analysis of algorithms for SMART SYSTEM design Algorithm/Architecture Co-design: Analytic Architecture for SMART SoC/Edge Architectural Platform before Cloud Algorithm/Architecture Co-Design: Abstraction at the System Level Dataflow Graph: Modeling the Computational Platform Algorithmic Intrinsic Complexity Metrics and Assessment Intelligent Parallel/Reconfigurable Computing Towards IoT, Cloud, & beyond: Big Data Analytics in SMARTECH/AI with Analytic Architecture Deep Learning Intelligent Surveillance (MPEG) Medical Imaging 1/30/2019 2 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Upload: others

Post on 23-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

1

Algorithm/Architecture Co-design for Smart Signals and Systems in Cognitive Cloud/Edge

Chris Gwo Giun Lee, PhDDirector, Bioinfotronics Research Center,

Professor, Department of Electrical Engineering,National Cheng Kung University

Tainan, Taiwan

Table of ContentsIntroduction

The Big DATA Era Analytics Algorithm: analysis of big multimedia & medical data Analytic Architecture: analysis of algorithms for SMART SYSTEM design

Algorithm/Architecture Co-design: Analytic Architecture for SMART SoC/Edge

Architectural Platform before Cloud Algorithm/Architecture Co-Design: Abstraction at the System Level Dataflow Graph: Modeling the Computational Platform Algorithmic Intrinsic Complexity Metrics and Assessment Intelligent Parallel/Reconfigurable Computing

Towards IoT, Cloud, & beyond: Big Data Analytics in SMARTECH/AI with Analytic Architecture

Deep Learning Intelligent Surveillance (MPEG) Medical Imaging

1/30/2019 2Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 2: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

2

Introduction

1/30/2019 3Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

The Math Bridge in Cambridge University

1/30/2019 4Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 3: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

3

Apollo Navigation Computer

1/30/2019 5Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

SMARTECH: The New Paradigm Shift in Technology

• Before Industrial Revolution: Innovations in ENERGY and ELECTRICITYbrought forth automation resulting in revolutionary changes to traditional Artisan craftsmanship from the social, political, and economical perspectives.

Artisan workers took a few decades to be re-educated so as to re-enter the new working environment.

• Today: After INFORMATION explosion resulting in BIG DATA, McKinsey Management Company in US has forecasted on changes brought forth by ARTIFICIAL INTELLIGENCE (AI) to be 10 times faster and 300 times larger in scale as compared to former Industrial Revolution!

In the 21st century, big corporate style on-the-job- training is reducing and the overall ecosystem with more startups and self-employment places more requirements on self-learning and metacognitive skills.

Industry 1.0: Energy Industry 2.0: Electricity

Industry 3.0: Information Industry 4.0: AI

Fast Changing Landscape: Engineers have been the center of ALL these technologies from the very beginning!

Page 4: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

4

Data Gets Ever Larger

Today:

Extending outwards even further with multiple sensors

interconnected . When going deeper inwards into the human

body with huge data from human brain and human genome

Marshall McLuhan (1960’s):

Electronic Media, primarily Television being extension of

human nervous systems

1/30/2019 7Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Reaching Out Even Further via IoT & Going in Ever Deeper into the Human Brain and

Genome

2015/12/16

Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE Media SoC Lab. 8

Page 5: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

5

Ever More Complex Analytics Algorithms Should Run on Analytics Architecture

2018/10/21 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE Media SoC Lab. 9

Speech recognition with feelings

Analytics Algorithm: Analyzes speech & images

Facial emotion detection

Analytics Architecture:Analyzes algorithms & data

Algorithm/Architecture Co-Design: Analytics Architecture for SMART SoC

1/30/2019 10Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 6: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

6

New Design Paradigm: Moving from programming to design and beyond…post

Moore’s law

Lee from NCKU (2007):

Design = Algorithm + Architecture

Wirth from ETHZ (1975):

Programming = Algorithm + Data Structure

1/30/2019 11Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Architectural Platforms Beyond Cloud

1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 12

Performance/Area

Flexibility

PowerASIC

Embedded Reconfigurable Logic/ FPGA

Application Specific Instruction Set Processor (ASIP)

Instruction Set DSP

Reconfigurable Processor/FPGA

Embedded general purpose instruction set processor

Embedded multicore processor/GPU

Towards Cloud & Heterogeneous Systems

Neuromorphic Device

Page 7: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

7

Cross Level of Abstraction Design Space Exploration

1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 13

Application/Specification

Algorithm

Architecture

Cycle-accurate model

Synthesizednetlist

Physicaldesign

Devicelevel

Different instances or realizations

Application

SystemArchitecture

Microarchitecture

Circuit

Device

Explore

Algorithm/Architecture Co-exploration

Software/Hardware Co-exploration

Circuit/Device Co-exploration

Neuromorphic

Algorithm/Architecture Co-Design: Abstraction at the System Level

Algorithm Design

Architecture Design

Traditional design flowAlgorithm/Architecture Co-Exploration

Algorithm Design

Architecture Design

Complexity MeasurementNo. of operationsDegree of parallelismData transfer rateData storage requirements

Back AnnotationData Flow

G. G. (Chris) Lee, Y.-K. Chen, M. Mattavelli, and E. S. Jang, “Algorithm/Architecture Co-Exploration of Visual Computing: Overview and Future Perspectives,” IEEE Trans. on Circuits and Systems for Video Technology. Vol. 19, Iss. 11, pp. 1576-1587, Nov. 2009.

1/30/2019 14Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 8: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

8

Futuristic Smart System Design Methodology

Dataflow modelingat different abstraction levels

Characterization of algorithmic complexity

Architectural information

Algorithmic complexity

Mapping

Computing

algorithms

Software/Hardware/NeuromorphicPartition

Algorithms

Complexity

Architectural information: Parallel/Reconfigurable

Dataflow model

Algorithmic SpaceExploration

Algorithm/Architecture

Co-exploration (Spectra Graph Theory)

Complexity

Back annotation

Delay model, energy consumption, circuit reliability

Applications IoT/Mobile Comunication/5G Multimedia Biomedical

Software/Hardware

Co-exploration

Circuit/Device

Co-exploration

Programmable Parallel CPU/ GPU

Design

Reconfigurable Logic design

Analog-NVM Neuromorphic

Accelerator

MOS TransistorDesign using

TCAD/Quantum Simulator

Memory Device Design using

TCAD/Quantum Simulator

Analytic algorithm with Deep Learning for different applications

Device capacitances, Memory retention, endurance, read/write latency & current

Customized

LogicBack annotation

MODELING the COMPUTATIONAL PLATFORM

via

Dataflow Graphs (DFG)

1/30/2019 16Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 9: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

9

Dataflow…

1/30/2019 17Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Block Diagram

Three-tap FIR filter: y[n] = a0x[n]+a1x[n-1]+a2x[n-2]

Z-1 Z-1

a0 a1 a2

x[n]

y[n]+ +

Z-1 Z-1

a0 a1 a2

y[n]

x[n]

++

Direct form:

Transposed form (Data broadcast form):

1/30/2019 18Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 10: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

10

Pipeline View of Dataflow in FIR(Direct Form)

Clock Cycles

Input D0 D1 Output

0 x[0] y[0] = a0x[0]

1 x[1] x[0] y[1] = a0x[1] + a1x[0]

2 x[2] x[1] x[0] y[2] = a0x[2] + a1x[1]+ a2x[0]

3 x[3] x[2] x[1] y[3] = a0x[3] + a1x[2]+ a2x[1]

4 x[4] x[3] x[2] y[4] = a0x[4] + a1x[3]+ a2x[2]

127 x[127] x[126] x[125] y[127] = a0x[127] + a1x[126]+ a2x[125]

128 x[127] x[126] y[128] = a1x[127]+ a2x[126]

129 x[127] y[129] = a2x[127]

D0 D1

a0 a1 a2

x[n]

y[n]+ +

1/30/2019 19Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Pipeline View of Dataflow in FIR(Transposed Form)

Clock Cycles

Input D0 D1 Output

0 x[0] y[0] = a0x[0]

1 x[1] a2x[0] a1x[0] y[1] = a0x[1] + a1x[0]

2 x[2] a2x[1] a1x[1] + a2x[0] y[2] = a0x[2] + a1x[1]+ a2x[0]

3 x[3] a2x[2] a1x[2] + a2x[1] y[3] = a0x[3] + a1x[2]+ a2x[1]

4 x[4] a2x[3] a1x[3] + a2x[2] y[4] = a0x[4] + a1x[3]+ a2x[2]

127 x[127] a2x[126] a1x[126] + a2x[125] y[127] = a0x[127] + a1x[126]+ a2x[125]

128 a2x[127] a1x[127] + a2x[126] y[128] = a1x[127]+ a2x[126]

129 a2x[127] y[129] = a2x[127]

1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 20

D1 D0

a0 a1 a2

y[n]

x[n]

++

Page 11: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

11

2019/1/30 Prof. Gwo Giun Lee/NCKUEE 21

Synthesis Results for 3Tap FIR Filters Using Direct and Transpose Forms

Direct FormTransposed

Form

Area(gate-count)

1212(1110, 101)

1674(1573, 101)

Timing(ns)

12.7 ns 10.37 ns

Power(mW)

35.03 mW 37.26 mW

architecture

criteria

(1) Gate count is in terms of equivalent 2-input NAND gate(2) Timing is based on static timing analysis(3) Power dissipation is only a very rough estimation

Dataflow Graph Models

They should contain: Algorithmic information or behavior

Architectural information for implementation

Some important dataflow models are: Directed acyclic graph (DAG)

Synchronous dataflow (SDF) graph

Control data flow graph (CDFG)

Kahn process networks (KPN)

Y-chart application programming interface (YAPI)

1/30/2019 22Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 12: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

12

Fundamental Dataflow Models

1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 23

Methods Descriptions Features

Directed acyclic graph (DAG)

Vertex: atomic operations

Directed edge: data dependency Dependency

Synchronous dataflow (SDF) graph [1] Extension of DAG by annotating edges with static data rates Data rate

Control data flow graph (CDFG)[2] Encapsulating DAGs as vertexes controlled by control edges Control flow

Kahn process networks (KPN)[3]

Actor: atomic operations or processes

Communication among actors: first-in-first-out (FIFO)

Data-driven

Concurrence

Y-chart application programming interface (YAPI)[4]

Extension of KPN to support dynamic selection of FIFO channels

Controllability

[1] Edward A. Lee and David G. Messerschmitt, “Synchronous Data Flow,” Proceedings of the IEEE, vol. 75, no. 9, p 1235-1245, September, 1987.

[2] L. Thiele, S. Chakraborty, M. Gries, and S. Künzli, “A framework for evaluating design tradeoffs in packet processing architectures,” in Proc. Annu. ACM IEEE Design Autom. Conf., New Orleans, LA,2002, pp. 880–885.

[3] G. Kahn, “The semantics of simple language for parallel programming,” in IFIP Congress ’74, pp. 471–475, 1974.

[4] E. A. de Kock, W. J. M. Smits, P. van der Wolf, J.-Y. Brunel, W. M. Kruijtzer, P. Lieverse, K. A. Vissers,and G. Essink, “Yapi: application modeling for signal processing systems,” in Proc., Design automation conf., pp. 402– 405, 2000.

Directed Acyclic Graph (DAG)

DAG are acyclic graphs composed of nodes and directed edges: nodes indicate operation activities

directed edges between nodes represent the dependency of operations.

Arrows indicate the directions in which the data flows through the operation nodes

A

D E

B C

F

A, B…F: operations of the nodes

1/30/2019 24Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 13: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

13

Synchronous Dataflow (SDF) Graph

SDF are graphs extended from DAGs where specified a priori Heads of edges represents fixed input

Tails represent output data-rates of nodes,

If the input/output data rate of the nodes are not specified or known a priori, we then have asynchronous dataflow graph

A, B…F: operations of the nodes

a, b,…o, p: data-rate of the nodes.

A

D E

B C

F(a) (p)

(h)

(f)

(b)

(c)

(d)

(e) (k)

(i) (l)

(m)

(g)

(j)

(n)

(o)

1/30/2019 25Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Control Dataflow Graph (CDFG)

CDFG describe the flow of data computation and control procedures using nodes, control edges and data edges.

Pseudo codes:1 DFG_0

2 if (condition_1)

DFG_1

3 else if (condition_2)

DFG_2

4 DFG_3

Data edge

Control edge

condition_1 condition_2DFG_0

DFG_3

DFG_1 DFG_2

1/30/2019 26Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 14: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

14

Kahn Process Networks (KPN)

KPN are directed graphs consisting of nodes and edges. Nodes indicate processes and directed edges represent unbounded and infinite FIFO queues between processes: Blocking read: a process is fired when input data exist.

Non-blocking write: output data are sent out via a unbounded FIFO.

Process1

Process2

Process3

G. Kahn, “The semantics of simple language for parallel programming,” in IFIP Congress ’74, pp. 471–475, 1974.

1/30/2019 27Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Y-chart Application Programming Interface (YAPI)

Extension of KPN. KPN models deterministic (synchronous) dataflow

YAPI is capable of modeling non-deterministic (asynchronous) data flow by allowing dynamic decision on selecting which channel to communicate.

The selection of port/channel depends on the amount of data

Processport2

Mechanism:Let c(pk) be the number of tokens existing in portk

and Nk be the number of tokens required to be consumed from pk at a timeif ( c(pk)>= Nk)

then read Nk tokens from port pk

1.E. A. de Kock, W. J. M. Smits, P. van der Wolf, J.-Y. Brunel, W. M. Kruijtzer, P. Lieverse, K. A. Vissers, and G. Essink, “Yapi: application modeling for signal processing systems,” (Los Angeles, California, United States), pp. 402– 405, ACM, 2000.

1/30/2019 28Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 15: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

15

How Big is Big?

Algorithmic Intrinsic Complexity Metric & Assessment:

PLATFORM INDEPENDENCE

1/30/2019 29Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Number of Operations

Estimates the number of each type of operations Addition/Subtraction

Multiplication

Division

Shift

Logic operations

Operations with constant input and variable input should be differentiated to provide high accuracy X + Y vs. X + 5 (X and Y are variables)

X×Y vs. X×5 (X and Y are variables )

In addition, the precision of each operand should be taken into account, since it can significantly influence complexity

1/30/2019 30Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 16: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

16

SMART TRANSFORM PAIRvia

Spectral Graph Theoryfor

Intelligent Parallel/Reconfigurable Computing

1/30/2019 31Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Parallel Computing (Forward Transform): Efficient & Flexible Cognitive Cloud

1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 32

In GlobalSIPS 2015, IBM T. J. Watson, Intel, etc. consistently addressed this technology as THE future of cognitive computing• Gwo Giun Lee, He-Yuan Lin, Chun-Fu Chen, Tsung-Yuan Huang, "Quantifying Intrinsic Parallelism Using Linear Algebra for

Algorithm/Architecture Co-Exploration," IEEE Transactions on Parallel and Distributed Systems, vol. 23, iss. 5, pp. 944-957, May 2012• Gwo Giun (Chris) Lee, Chun-Hsi Huang, Chun-Fu (Richard) Chen, Tai-Ping Wang, "Complexity-Aware Gabor Filter Bank Architecture Using

Principal Component Analysis," Journal of Signal Processing Systems for Signal, Image, and Video Technology, pp 1-14, Apr. 2017.• Gwo Giun Lee, Lee Barford, Paul Blinzer, Joe Cavallaro, Hong, Jiang, Nick Moore, Yinglong Xia, “GlobalSIPS 2015 PANEL DISCUSSION:

ALGORITHMS VS. ARCHITECTURES: OPPORTUNITIES AND CHALLENGES IN MULTICORE/GPU DSP”, DEC. 2018, FLORIDA.https://sigport.org/documents/panel‐discussion‐algorithms‐vs‐architectures‐opportunities‐and‐challenges‐multicoregpu‐dsp

Page 17: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

17

Reconfigurable Computing (Inverse Transform): Efficient & Flexible Mobile Edge

1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 33

Technology  adopted by ISO/IEC/MPEG’s Reconfigurable Ad Hoc Group & served as Project Lead

• Gwo Giun (Chris) Lee, Chun-Fu (Richard) Chen , Ching-Jui Hsiao, "Reconfigurable Parser Architecture Design with Microprogrammed Controller for Multiple Purposes," Journal of Signal Processing Systems for Signal, Image, and Video Technology, pp 1-15, Mar. 2016

• Gwo Giun (Chris) Lee, Chun-Fu Chen, Shu-Ming Xu, Ching-Jui Hsiao, "High-Throughput Reconfigurable Variable Length Coding Decoder for MPEG-2 and AVC/H.264," Journal of Signal Processing Systems for Signal, Image, and Video Technology, vol. 82, iss. 1, pp 27-40, Jan. 2016

Spectral Graph Theory (SGT): Adjacency & Laplacian Matrix

i j1 if vertex and vertix are adjacent to each other(i,j)

0 otherwise

A

0 1 0

1 0 1

0 1 0

A

Adjacency matrix A

1 2 3Graph

i j

(i,j) if i j

(i,j) -1 if i j and vertex is adjacent to vertix

0 otherwise

D

L

Laplacian matrix L=D-A, where D is a diagonal matrix where the diagonal elements represents the number of edges connected to that node.

1 1 0

1 2 1

0 1 1

L

1/30/2019 34Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 18: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

18

Degree of Parallelism: Eigen-Analysis of Dataflow Graphs using SGT

1

0

0

0

1

1

1 0 0 0 0 1

0 1 0 0 0 1

0 0 1 0 1 0

0 0 0 1 1 0

0 0 1 1 2 0

1 1 0 0 0 2

+ ++

A1 B1 C1 D1

O1

+ +

+

A2 B2 C2 D2

O2

O1=A1+B1+C1+D1

O2=A2+B2+C2+D2

1 2

6

43

5

Algorithm Dataflow diagram

Causation graphLaplacian matrix

Spectrum

0, 0, 1, 1, 3, 3Parallelism

2

0

1

1

1

0

0

0

0

1

1

0

0

0

0

0

0

1

1

0

2

1

1

0

0

2

0

0

0

1

1

Eigenvalue:

Eigenvector:

×

(Homogeneous)

1/30/2019 35Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Quantification of parallelization, Instruction Set Architecture (ISA) design

2019/1/30 Prof. Gwo Giun (Chris) Lee/NCKUEE 36

Reconfigurable Architecture: Commonality Extraction from Dataflow Graphs;

• Observe the common parts between each dataflow graph.

AVC/H.264 filter coefficients[1,-5,20,20,-5,1]

MPEG-4 filter coefficients[-8,24,-48,160,160,-48,24,-8]

[-1,3,-6,20,20,-6,3,-1]Divide by 8

MPEG-4 Chroma interpolation, MPEG2,and AVC/H.264 chroma prediction

[1 1]

[1 1 1 1]

+x[0]x[5]

x[1]x[4]

x[2]x[3]

+

+

2 +

2 +

4

- +

+

result

i : left shift by i i : right shift by i + : addition

+x[0]x[7]

x[1]x[6]x[2]x[5]

+

+

1 +

2 +

1

-3+ result

x[3]x[4]

+ 2 +

4

+

+

+x[0]x[1]x[2]x[3]

+

+ resultx[0]x[1]

+ result

Page 19: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

19

2019/1/30 37

Reconfigurable fractional interpolation;

[-8,24,-48,160,160,-48,24,-8]

+ 1 +

+

+ 2 +

4

3Module 4

[1,-5,20,20,-5,1]

+ 2 +

1

- + 2 + -

+ 2 +

4

[1 1 1 1] [1 1]

+

+

+

+

result

x[0]x[7]

+

3

+ 1 +x[1]x[6]

x[2]x[5]

+ 2 +

1

-

x[3]

x[4]+ 2 +

4

+

+

+

result

+ 2 +

1

-

x[5]

x[1]x[4]

x[2]

x[3]

x[0]

+ 2 +

4

+x[1]

x[0]

result+

+

+

x[1]

x[2]

x[3]

x[0]

result+

+

+

result

Module 0

Module 1

Module 2

Module 3

Module 4

x[1]x[6]

x[2]x[5]

x[3]x[4]

x[0]x[7]

x[5]0

x[1]x[4]

x[2]x[3]

x[0]0

x[1]0

x[2]0

x[3]0

x[0]x[1]

Coefficients

Module 3

Module 2

Module 1

Module 0

Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Data Storage: Configuration & Requirement using Graph 1/2

Each initial cell region confined by external nuclei markers is represented by avertex.

Images are divided into tiles of size 64 by 64.

The vertex size is defined as the number of tiles that each region is contained in. The tile size is obtained from the statistical average of all initial cell region areas.

Fixed tile size is introduced for efficient parallel processing of irregular cell regions.

The tile size could also be considered together with the platform architecture forcomputational efficiency.

1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 38

One tile with size 64 by 64

Vertex

11

22 44

33

Size = 4

Size = 4

Size = 2 Size = 6

Page 20: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

20

Data Storage: Configuration & Requirement using Graph 2/2

The weight on each edge represents the proportionate overlapping area betweenthe two cell regions represented by the corresponding vertices.

The weight of each edge is defined as:

This edge weight indicates the tendency of grouping or compiling correspondingtiles of the two vertices into the same computation cluster so as to optimize datastorage, maximize data reuse and minimize data transfer rate.

The tile size could also be considered together with the platform architecture forcomputational efficiency.

1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 39

wi, j

,

11

22 44

33

0.2

0.2

0.1250.33

0.33

Data Transfer Rate & Bandwidth via Graph Theory

1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 40

Data Flow Graph

+

y[n]

x[n]x[n-1]

e1

e2

e3

v1 v2

v3

v4

Edge cut

y[n] = x[n-1] + x[n]

1

2

3

2

0

2

e

e

e

Mx

1 2 3 4

1

2

3

1 0 1 0

0 1 1 0

0 0 1 1

v v v ve

e

e

M

1

2

3

4

1

1

1

1

v

v

v

v

x

Mx(i) is the i-th element in vector Mx.N is total element number of a vector.

Edge cut• A set of edges which partitions a graph into at least two disconnected

components if this set of edges is removed.• Can be used to study the data transaction between vertices.

Indicator vector (x)• A labeled vector, which is composed of 1 and -1, separates vertices

into two groups.

Edge characteristics in the edge cut (Mx)• Elements in Mx present the edge characteristics

• In-edge-cut ( >0 ), out-edge-cut ( <0 ), non-edge-cut ( 0 )• Number of edges in a edge cut could be derived by Mx

• Amount of data transfer on this edge cut: 1

0

12

2

N

i

i

Mx

Page 21: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

21

1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab. 41

A Very Useful SMART Sensor System

SMARTLET: SMART toiLET

SMART is a BUZZ word that SELLS

IoT: Any systems connected via signals for the exchange of information

1/30/2019 42Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 22: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

22

SoC Platforms

System Platforms: From System-on-Chip to IoT/Cloud & Beyond

1/30/2019 Bioinfotronics Research Center 43

IoT and Networking

Platforms

IntelligentSurveillance (IS)

IntelligentHealth Cloud (IHC)

Distributed System The central control system performs pending jobs analysis:

Similarity and loading of pending jobs

Overhead of communication

Workload of processors

Intelligently allocate jobs to processors/clients (located at different physical locations) based on the analyzed parameters

When reconfiguring the internet architecture or platform, we have cognitive internet

Job queue

Client/PE

Center

Client/PEClient/PE

Client/PE

Client/PE

Client/PE

1/30/2019 44Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 23: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

23

Cloud Computing/Service According to the priority and difference of requests from clients Cloud server intelligently reschedules and reallocates the resource in

response to the clients.

client

Communication with cloud server (request and response)

Cloud Server

client

client

client

client

client

Intelligent arbitrator

1/30/2019 45Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Storage in Cloud Server

Analyze the similarity of requests and evaluate the workload of processors

Intelligently reallocate and reschedule the storage resource to the available processors

Storage 0 Storage 1Storage

N-1

CPU 0 CPU 1 CPU N-1

Intelligent arbitrator

Client request queue

1/30/2019 46Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 24: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

24

Reconfigurable Computing over Cloud Similarity/Correlation Analysis: Telemedicine

Graph Similarity/Correlation analysis to reconfigure scalable computing clusters cluster

Normal relationship

Close relationship

Dataflow graph

1/30/2019 47Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Deep Learning

1/30/2019 48Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 25: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

25

Modeling the Neuron Mathematically

Neural network could define a complex, non-linear form to represent the relationship between input x and output y through a hypothesis hw,b(x) Neuron is the computational unit that represents input data in another

way

hw,b(x) = f(wTx + b), f: RR is activation function. Popular activation functions, sigmoid, hyper tangent, rectified linear function

x1

xN

+1

hw,b(x)

x = [x1, x2, …, xN]T

w = [w1, w2, …, wN]T

wN

w1

b

1/30/2019 49Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Its All about Synapses of Neurons in Forming a Network

Multi-Layer Perception (MLP) This example includes one

input layer, one hidden layer and one output layer.

Forward propagation to predict output based on input.

Back propagation to tune neural network based on the prediction error.

It can be configured in several hidden layers, different connectivity between layers.

1/30/2019 Bioinfotronics Research Center Prof. Gwo Giun Chris Lee/NCKUEE MSoC Laboratory 50

x1

xN

+1 +1

Input layer Hidden layer

Outputlayer

hw,b(x)

bias unitbias unit

Feedforward neural network

Page 26: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

26

Convolutional Neural Network (CNN)

Instead of using fully-connected neural network, using locally-connected neural network to extract local features. Includes convolution layer, pooling layer, fully-connected network

Training method: Unsupervised pre-training (stacked (denoising) autoencoder)

Back propagation through label information

Convolution layer Sub-sampling/Pooling layer Convolution layer Sub-sampling/

Pooling layerFully connected

network

Input layer 3 feature maps 3 sub-sampled feature maps

6 sub-sampled feature maps

6 sub-sub-sampled feature maps Normal layer Output layer

1/30/2019 51Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Deep Belief Nets (DBN)

Deep Belief Nets (DBN): Restricted Boltzman Machine

(RBM) + directed belief nets

Visible layer: input data

Hidden layer: features

Training method: Unsupervised pre-training

(stacked RBMs)

Back propagation for fine-tuning

DBN could also be multi-dimensional structure.

...P P P P P1 2 L 1 1 2 L-2 L-1 L-1 Lv,h ,h , ...h v | h h | h h | h h ,h

Directed belief nets

RBM

Hidden layers

Visible layer

h3

h2

h1

v

1/30/2019 52Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 27: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

27

Computer Scientist’s Perspective of the Human Brain

1/30/2019 53Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Collaboration with IBM TJ Watson Research Center

International Standardizationin ISO/ITU/MPEG

Activities with Academia/Industry Worldwide: MIT, Aachen, EPFL, MERL, Qualcomm, Samsung, LG, Sharp, MediaTek, etc

Encodedbitstream

FU: function units

Network of FUs

FUFU

FU

FU

Syntax Parser

DecoderdescriptionsSyntax

FU networkconfiguration

Decodedvideo

Reconfigurable Video Coding

Encoded 3D Content

3D-HEVC

Decoding

1/30/2019 Bioinfotronics Research Center 54

Page 28: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

28

Biomedical Imaging: Harmonically Generated Microscopy

Analysis of Harmonic GenerationMicroscopy Image from NTU, Taiwan including CNN

Image stack in epidermis with SK

Dermis

Epidermis

Cross section of human skin

Stratum corneum (SC)Stratum granulosum (SG)Stratum spinosum (SS)

Stratum basale (SB)

In (d),(e),(f) illustrate the proliferation of basaloid cell.

SB

SB

SB

Basaloid cells

Seborrheic keratosis (SK) & STEM cell detection• Observe through the image stack in the epidermis,

because of the proliferation of basaloid cell,result in acanthotic epidermis.

• In normal skin, epidermal thickness: 60~80um.• In SK, epidermal thickness: exceed 100um.

1/30/2019 Bioinfotronics Research Center 55

BCC Detection via Deep CNN

Detection of Basal Cell Carcinoma (BCC) via Deep CNN using ~2000 HGM Images with 97.02% accuracy with 95.08% specificity and 99.04% sensitivity

Transfer Medical Doctor’s knowledge to feature layers of Deep Convolution Neural Network (CNN)

Image stack in epidermis

Dermis

Epidermis

Cross section of human skin

Stratum corneum (SC)Stratum granulosum (SG)Stratum spinosum (SS)

Stratum basale (SB)

SB

SB

SB

1/30/2019 Bioinfotronics Research Center 56

𝑁 𝑁 1 𝑁 𝑁 𝑘

𝑁2

𝑁2

2𝑘

·······

𝑁4

𝑁4

4𝑘𝑁8

𝑁8

8𝑘

1 1𝑥

Page 29: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

29

SMART Cloud Computing for Precision Medicine

Patents licensed by Inform Genomics in USA on Analytics Architectural Platform

1/30/2019 Bioinfotronics Research Center 57

Chemotherapy Induced Nausea Vomiting Data Analytics for CIPN NCKU Medical School and Hospital

Center Server

Sub-system

Sub-system

Sub-system Sub-system

Sub-system Sub-system Job Queue

User 0 User 1 User 2 User 3 User N

Fault Tolerance

Fault Tolerance

Functional EvaluationRVKDE Prediction

Gene SelectionDeep Learning

1/30/2019 Bioinfotronics Research Center 58

C1

Page 30: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

Slide 58

C1 Sir would like to edit descriptions.Chen, 3/27/2016

Page 31: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

30

Intelligent Health Cloud Database Management System

Collaboration with BannerAlzheimer’s Institute, AZ, USA targeted for

Telemedicine

1/30/2019 Bioinfotronics Research Center 59

Conclusion

We need to have cross level of abstraction system design methodology to solve cross disciplinary

problems in SMART manners

CLOUD/EDGE/Neuromorphic Computing

Intelligent Surveillance for Autonomous Cars

Precision Medicine

Brain Initiatives……

Nanoscale Histopathology

1/30/2019 60Bioinfotronics Research Center Prof. Gwo Giun Lee/NCKUEE MSoC Lab.

Page 32: Table of Contentssite.ieee.org/sips2018/files/2019/01/tutorial4.pdf · Architectural Platform before Cloud ... The Math Bridge in Cambridge University 1/30/2019Bioinfotronics Research

SiPS Symposium Tutorial October, 2018

31