Frank Vahid, UCR 1
Building Fake Body Parts: Digital Mockups
Frank Vahid
Univ. of California, Riverside
Support provided by NSF, SRC, and CareFusion
Building fake body parts
• How do we test medical equipment software?
http://www.nhlbi.nih.gov/
Simulation: Slow/Inaccurate
Accurate simulation is slow
2-3 minutes to simulate one breath accurately
Must decrease accuracy to run in real time
Weibel lung complexity
• 4 gen: 32 ODEs
• 6 gen: 128 ODEs
• 8 gen: 512 ODEs
• 10 gen: 2048 ODEs
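The four data points above fit a simple doubling pattern: 2^(g+1) ODEs for g generations. The C sketch below encodes only that fitted pattern. The function name `weibel_ode_count` is hypothetical, and the closed form is inferred from the listed numbers rather than stated on the slide.

```c
#include <assert.h>

/* ODE count for a Weibel lung model with `generations` airway generations.
 * Assumption: the closed form 2^(g+1) is fitted to the four points on the
 * slide (4 -> 32, 6 -> 128, 8 -> 512, 10 -> 2048); it is not given there. */
unsigned long weibel_ode_count(unsigned generations)
{
    assert(generations < 31);          /* keep the shift within 32 bits */
    return 1UL << (generations + 1);
}
```

Each extra generation doubles the work per time step, which is why accurate models quickly outgrow a sequential simulator.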
Mockups
[Figure: two block diagrams.
Physical mockup: Device (processing core, transducers) coupled to physical phenomena.
Digital mockup: Device (processing core, transducers) with physical phenomena disconnected; transducer models and an environment model; intercepted transducer packets; digital communication.]
How to run in real time?
http://www.youtube.com/watch?feature=player_embedded&v=rb0ik1HopBk
Physical models are inherently parallel
[Figure: ODE dependency graph; node labels include V[1],F[1], V[2],F[2], and V[7],F[7]]
GPUs
• Tried, failed; a GPU research group also tried and failed (results later)
FPGAs: Sw circuits (parallel)
C Code for FIR Filter
for (i = 0; i < 128; i++) y += c[i] * x[i];
......
Processor:
• 1000's of instructions
• Several thousand cycles
Circuit for FIR Filter
[Figure: a row of multipliers (*) feeding a tree of adders (+)]
Processor + FPGA:
• ~7 cycles (though slower clock)
• Speedup > 10x-100x
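One plausible reading of the "~7 cycles" figure above is the depth of the adder tree: 128 products reduce to a single sum in log2(128) = 7 levels of pairwise additions. The C sketch below evaluates the FIR sum in that order, sequentially, to make the levels visible. The function name `fir_adder_tree` and the `levels` out-parameter are hypothetical; on an FPGA each level's operations would run concurrently.

```c
#include <assert.h>

#define TAPS 128

/* Dot product evaluated in circuit order: all products first, then
 * pairwise sums level by level. Returns the sum and, if `levels` is
 * non-null, stores the number of adder levels that were needed. */
long fir_adder_tree(const int c[TAPS], const int x[TAPS], int *levels)
{
    long partial[TAPS];
    int depth = 0;

    assert(c && x);

    /* Multiplier row: in hardware these 128 products happen in parallel. */
    for (int i = 0; i < TAPS; i++)
        partial[i] = (long)c[i] * x[i];

    /* Adder tree: each pass halves the number of partial sums. */
    for (int width = TAPS; width > 1; width /= 2) {
        for (int i = 0; i < width / 2; i++)
            partial[i] = partial[2 * i] + partial[2 * i + 1];
        depth++;
    }

    if (levels)
        *levels = depth;
    return partial[0];
}
```

For 128 taps the loop over `width` runs seven times, matching the seven adder levels a fully parallel circuit would need.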
FPGAs “101” (A Quick Intro)
[Figure: an FPGA built from lookup tables (LUTs) and switch matrices (SM). Each LUT is a small memory (e.g., a 4x2 memory addressed by inputs a, b with data outputs d1, d0 producing F, G); each 2x2 switch matrix routes inputs w, x to outputs y, z under control of configuration bits. Example: configuration bits loaded so that the FPGA maps inputs a, b, c to outputs D, E.]
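As a rough illustration of the two building blocks in the figure, the C sketch below models a LUT as a 4x2 memory read at address (a, b) and a switch-matrix output as a configurable 2-to-1 selection. The stored contents (F = a AND b, G = a XOR b) and the names `lut_contents`, `lut_eval`, and `switch_select` are assumptions chosen for illustration; they are not the configuration drawn on the slide.

```c
#include <assert.h>

/* A 2-input, 2-output LUT modelled as a 4x2 memory: inputs (a, b) form
 * the address and each stored word holds the two output bits (F, G).
 * Assumption: these contents are illustrative only. */
static const unsigned char lut_contents[4] = {
    /* ab = 00 */ 0x0,   /* F=0, G=0 */
    /* ab = 01 */ 0x1,   /* F=0, G=1 */
    /* ab = 10 */ 0x1,   /* F=0, G=1 */
    /* ab = 11 */ 0x2    /* F=1, G=0 */
};

void lut_eval(unsigned a, unsigned b, unsigned *f, unsigned *g)
{
    assert(a <= 1 && b <= 1);
    unsigned word = lut_contents[(a << 1) | b];  /* memory read at address ab */
    *f = (word >> 1) & 1u;
    *g = word & 1u;
}

/* One output of a 2x2 switch matrix: a configuration bit selects which
 * of the two inputs is routed to this output. */
unsigned switch_select(unsigned in0, unsigned in1, unsigned config_bit)
{
    return config_bit ? in1 : in0;
}
```

Reprogramming the FPGA amounts to loading different words into `lut_contents` and different configuration bits into the switch matrices.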
Differential Equation Processing Element
• General PE
• Differential equations can't be solved exactly
• Use iterative approximation (Euler, RK4)
• Computes equation solutions at each timestep (e.g., 0.1 ms timesteps)
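The two iterative methods named above can be sketched in C as single-step functions. This is a minimal illustration, not the DEPE's implementation: the test equation y' = -y, the helper `integrate`, and all function names are assumptions introduced here.

```c
#include <assert.h>

typedef double (*deriv_fn)(double t, double y);

/* One forward-Euler step: first-order accurate. */
double euler_step(deriv_fn f, double t, double y, double h)
{
    assert(h > 0.0);
    return y + h * f(t, y);
}

/* One classical fourth-order Runge-Kutta (RK4) step. */
double rk4_step(deriv_fn f, double t, double y, double h)
{
    assert(h > 0.0);
    double k1 = f(t, y);
    double k2 = f(t + h / 2.0, y + h * k1 / 2.0);
    double k3 = f(t + h / 2.0, y + h * k2 / 2.0);
    double k4 = f(t + h, y + h * k3);
    return y + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0;
}

/* Hypothetical test equation y' = -y, exact solution y(t) = exp(-t). */
double decay(double t, double y)
{
    (void)t;
    return -y;
}

/* Advance y from t = 0 over `steps` steps of size h with the given stepper. */
double integrate(double (*step)(deriv_fn, double, double, double),
                 deriv_fn f, double y0, double h, int steps)
{
    double y = y0;
    for (int i = 0; i < steps; i++)
        y = step(f, i * h, y, h);
    return y;
}
```

Euler needs one derivative evaluation per step and RK4 needs four, so the choice trades per-step work against the step size required for a given accuracy.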
Huang, Vahid, Givargis. A Custom FPGA Processor for Physical Model Ordinary Differential Equation Solving. Embedded Systems Letters, Dec, 2011.
[Figure: FPGA digital mockup containing a DEPE, connected to the device under test]
Single DEPE
[Chart: ODEs/second (log scale, 1 to 100000) for CPU(1), CPU(4), and DEPE. Models: Weibel (10 ODE), Lutchen (10 ODE), Wave (30 ODE), Atrial (31 ODE), Lutchen (50 ODE)]
• CPU(1), (4): Pentium IV, 3.0 GHz
• DEPE: Xilinx Virtex6-240T
Microblaze: 2000-4000 LUTs.
Homogeneous network of general PEs
• Map ODEs to homogeneous PE network
• ODE dependency graph
• Scheduling
[Figure: a synthesis tool maps the ODE dependency graph (node labels include V[1],F[1], V[2],F[2], and V[7],F[7]) onto PEs (PE1, PE2, PE3) in an FPGA digital mockup with 100s of PEs]
Huang, Vahid, Givargis. 2012. Synthesis of networks of custom processing elements for real-time physical system emulation. Transactions on Design Automation of Electronic Systems (TODAES). *To Appear (Dec-2012)
Homogeneous network of general PEs
[Figure: FPGA digital mockup]
Homogeneous network of general PEs
• ODE mapping via simulated annealing
10K iterations
150K iterations
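A simulated-annealing mapper of the kind named above might look like the C sketch below. The slide gives only the technique and the iteration counts, so everything else is an assumption made for illustration: the cost function (cross-PE dependency edges plus the largest per-PE load), the single-ODE move, the starting temperature, the geometric cooling schedule, and all names. The `exp_neg` helper approximates exp(-x) so the sketch does not depend on the math library.

```c
#include <assert.h>

/* Small deterministic pseudo-random generator (an LCG) so runs repeat. */
static unsigned long rng_state = 12345UL;

static unsigned rng_next(void)
{
    rng_state = rng_state * 1103515245UL + 12345UL;
    return (unsigned)((rng_state >> 16) & 0x7fffUL);   /* 0..32767 */
}

/* exp(-x) for x >= 0, approximated as (1 + x/1024)^(-1024). */
static double exp_neg(double x)
{
    double r = 1.0 / (1.0 + x / 1024.0);
    for (int k = 0; k < 10; k++)
        r *= r;                       /* ten squarings give the 1024th power */
    return r;
}

/* Hypothetical cost: dependency edges that cross PEs, plus the largest
 * number of ODEs on any one PE. Limited to 64 PEs in this sketch. */
int mapping_cost(const int *pe_of, int n_odes, int n_pes,
                 const int *src, const int *dst, int n_edges)
{
    int load[64] = {0};
    int crossing = 0, max_load = 0;

    assert(n_pes > 0 && n_pes <= 64);
    for (int e = 0; e < n_edges; e++)
        if (pe_of[src[e]] != pe_of[dst[e]])
            crossing++;
    for (int i = 0; i < n_odes; i++)
        if (++load[pe_of[i]] > max_load)
            max_load = load[pe_of[i]];
    return crossing + max_load;
}

/* Anneals pe_of[] in place and leaves the best mapping seen in it.
 * Returns that mapping's cost. Limited to 256 ODEs in this sketch. */
int anneal_mapping(int *pe_of, int n_odes, int n_pes,
                   const int *src, const int *dst, int n_edges,
                   int iterations)
{
    int best[256];
    double temp = 2.0;                 /* assumed starting temperature */

    assert(n_odes > 0 && n_odes <= 256);
    int cur = mapping_cost(pe_of, n_odes, n_pes, src, dst, n_edges);
    int best_cost = cur;
    for (int i = 0; i < n_odes; i++)
        best[i] = pe_of[i];

    for (int it = 0; it < iterations; it++) {
        /* Move: rebind one randomly chosen ODE to a random PE. */
        int ode = (int)(rng_next() % (unsigned)n_odes);
        int old_pe = pe_of[ode];
        pe_of[ode] = (int)(rng_next() % (unsigned)n_pes);

        int next = mapping_cost(pe_of, n_odes, n_pes, src, dst, n_edges);
        int delta = next - cur;
        double u = rng_next() / 32768.0;           /* uniform in [0, 1) */

        if (delta <= 0 || u < exp_neg(delta / temp)) {
            cur = next;                            /* accept the move */
            if (cur < best_cost) {
                best_cost = cur;
                for (int i = 0; i < n_odes; i++)
                    best[i] = pe_of[i];
            }
        } else {
            pe_of[ode] = old_pe;                   /* reject: undo */
        }
        if (temp > 0.01)
            temp *= 0.999;                         /* geometric cooling */
    }

    for (int i = 0; i < n_odes; i++)
        pe_of[i] = best[i];
    return best_cost;
}
```

Because the best mapping seen is kept separately from the current one, running more iterations can only hold or improve the returned cost, which is consistent with the slide comparing 10K and 150K iterations.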
Homogeneous network of general PEs
[Chart: ODEs/second (log scale, 1 to 100000) for CPU(1), CPU(4), and General PE network. Models: Weibel, Lutchen, Wave, Atrial, Neuron]
Homogeneous network of general PEs – FPGA Usage
[Chart: FPGA usage in kLUTs (axis 65 to 95) for General PEs. Models: Weibel11, Neuron, Weibel11+Gas, Hemo, Hemo+Weibel10]
• 150K LUTs available on Virtex6-240T
Demo: http://www.youtube.com/watch?v=ThUKVhqoA3Q
Custom Processing Element
• Custom PE
• Custom datapath to solve specific type of equation
Huang, Vahid, Givargis. 2012. Synthesis of networks of custom processing elements for real-time physical system emulation. Transactions on Design Automation of Electronic Systems (TODAES). *To Appear (Dec-2012)
[Figure: custom PE inside the FPGA digital mockup. Labelled blocks: Controller, Const ROM (Address), Data RAM (Address, We), Input_sel, Inputs, Output, SUB, MUL]
V’ = F1 – F2
F’ = P1 – P2 – (F*CR)*CL
Custom PE for each ODE type
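The two ODE types above need only subtractions and multiplications, which is what a SUB/MUL datapath provides. The C sketch below writes each derivative as a function and pairs it with a forward-Euler update. The function names, and treating CR and CL as plain constants passed in by the caller, are assumptions made for illustration.

```c
#include <assert.h>

/* Derivative for the first ODE type on the slide: V' = F1 - F2. */
double dV(double F1, double F2)
{
    return F1 - F2;
}

/* Derivative for the second ODE type: F' = P1 - P2 - (F*CR)*CL. */
double dF(double P1, double P2, double F, double CR, double CL)
{
    return P1 - P2 - (F * CR) * CL;
}

/* Forward-Euler update shared by both types: y_next = y + h * y'. */
double euler_update(double y, double dydt, double h)
{
    assert(h > 0.0);
    return y + h * dydt;
}
```

A general PE would execute these expressions as an instruction sequence; a custom PE hard-wires one of them, which is the speed-for-flexibility trade the slide describes.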
Custom Processing Element
[Chart: ODEs/second (log scale, 1 to 1000000) for CPU(1), CPU(4), General PE network, and Custom PE network. Models: Weibel, Lutchen, Wave, Atrial, Neuron]
Custom Processing Element – FPGA Usage
[Chart: FPGA usage in kLUTs (axis 0 to 100) for General PEs and Custom PEs. Models: Weibel11, Neuron, Weibel11+Gas, Hemo, Hemo+Weibel10]
Networks of Heterogeneous Processing Elements
Huang, Miller, Vahid, Givargis. Synthesis of Heterogeneous Processing Elements for Physical System Emulation. CODES+ISSS 2012, Oct, 2012.
• General PE: slow, flexible (can solve any type of ODE)
• Custom PE: fast, inflexible (solves only one type of ODE)
• Multi-type PE: combines multiple types of ODEs into a single custom PE
[Figure: FPGA digital mockup]
Huge solution space:
• How to choose types of PEs?
• How many PEs to allocate?
• How to bind ODEs to PEs?
Automatic allocation and binding
[Flowchart (simulated annealing): initial random allocation; PE allocator; ODE-to-PE mapper; cycles of each PE; better solution? (Y/N); new PE allocation; best solution]
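The "cycles of each PE" box suggests that each candidate allocation is scored by how long its PEs take per time step. A minimal C sketch of such a score follows, assuming that PEs run in parallel so the busiest PE sets the step time. The names `bottleneck_cycles` and `is_better`, and the per-ODE cycle counts, are hypothetical.

```c
#include <assert.h>

/* Score of one candidate binding: the busiest PE's cycle count per time
 * step. pe_of[i] is the PE that ODE i is bound to; cycles[i] is how long
 * that ODE takes on that PE. Limited to 64 PEs in this sketch. */
long bottleneck_cycles(const int *pe_of, const long *cycles,
                       int n_odes, int n_pes)
{
    long per_pe[64] = {0};
    long worst = 0;

    assert(n_pes > 0 && n_pes <= 64);
    for (int i = 0; i < n_odes; i++) {
        assert(pe_of[i] >= 0 && pe_of[i] < n_pes);
        per_pe[pe_of[i]] += cycles[i];
    }
    for (int p = 0; p < n_pes; p++)
        if (per_pe[p] > worst)
            worst = per_pe[p];
    return worst;
}

/* Comparison used to decide whether a candidate replaces the best
 * solution recorded so far. */
int is_better(long candidate_cycles, long best_cycles)
{
    return candidate_cycles < best_cycles;
}
```

Under this scoring, moving an ODE off the busiest PE, or swapping that PE for a faster custom type, is what lowers the result.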
Networks of Heterogeneous Processing Elements
[Chart: ODEs/second (log scale, 1 to 1000000) for CPU(1), CPU(4), General PE network, Custom PE network, and Heterogeneous PE networks. Models: Weibel, Lutchen, Wave, Atrial, Neuron]
Heterogeneous Networks – FPGA Usage
[Chart: FPGA usage in kLUTs (axis 0 to 100) for General PEs, Custom PEs, and Heterogeneous PEs. Models: Weibel11, Neuron, Weibel11+Gas, Hemo, Hemo+Weibel10]
Network of PEs VS GPU and PC
[Chart: Performance (ms), axis 0 to 1000, for PC(1), PC(4), GPU, HLS, General PEs, Custom PEs, and Heterogeneous PEs; value labels on the chart: 1430, 1490, 1522, 1184]
Speedup vs real-time
• PC(1): 0.76x
• PC(4): 3.07x
• GPU: 1.63x
• HLS: 3.23x
• General PE: 4.94x
• Custom PE: 6.1x
• Hetero PE: 34.5x
Network of general/custom/heterogeneous PEs VS HLS (regularity extraction)
[Chart: Performance (ms), axis 0 to 350, against Size (equivalent LUTs), axis 0 to 300,000, for HLS, General PE, Custom PE, and Hetero PE]
Performance (ms): time to emulate 1000 ms, using Euler with 0.01 ms step.
Heterogeneous PE, as (Speed, Size) vs.:
• (10x, 1.1x) HLS
• (7x, 0.85x) general PE
• (6x, 1.35x) custom PE
Speedup / dollar
• CPU (I7-950 + Intel X58 board): $480
• GPU (GTX460 + I3-540 + H55 board): $380
• FPGA (Xilinx Virtex6 240T-2 board): $1800
[Chart: Normalized speedup (speedup / dollar), axis 0 to 0.03, for PC(1), PC(4), GPU, and Heterogeneous PEs. Models: Weibel, Neuron, Weibel + gas, Hemodynamic, Weibel + hemo]
Heterogeneous PEs:
• 3x better than PC(4)
• 4.5x better than GPU
FPGA: Easier to build custom interfaces
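The normalization on this slide is a single division. The C sketch below applies it and compares two platforms; the function names are hypothetical. As a sanity check under one assumption, namely that the "Speedup vs real-time" figures quoted earlier in the deck (34.5x, 3.07x, 1.63x) are the ones being normalized, the listed prices give ratios of about 3.0 against PC(4) and about 4.5 against the GPU.

```c
#include <assert.h>

/* Speedup per dollar, the normalization used on this slide. */
double speedup_per_dollar(double speedup, double price_usd)
{
    assert(price_usd > 0.0);
    return speedup / price_usd;
}

/* How many times better platform A is than platform B per dollar. */
double per_dollar_advantage(double speedup_a, double price_a,
                            double speedup_b, double price_b)
{
    return speedup_per_dollar(speedup_a, price_a) /
           speedup_per_dollar(speedup_b, price_b);
}
```

For example, `per_dollar_advantage(34.5, 1800.0, 3.07, 480.0)` compares the FPGA board with the quad-core PC using the prices listed above.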
Current: Embedding-based placement of networks
• Most physical models have a regular structure
• Meshes, trees, grids, etc.
• We can apply theoretical graph embedding techniques to embed models into FPGA
• Minimal network dilation
[Figure: heart cells, lungs, and a neuron mesh, each embedded into an FPGA]
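In graph-embedding terms, dilation is the largest distance in the host graph between the images of two nodes that are adjacent in the guest graph. The C sketch below computes it for a model graph placed on a 2D grid of PEs. The grid host, the Manhattan distance, and the name `embedding_dilation` are assumptions made for illustration; the slide does not specify the FPGA's PE topology.

```c
#include <assert.h>
#include <stdlib.h>

/* Dilation of an embedding: the largest host distance between the
 * placements of two adjacent model nodes. Edge e joins nodes src[e] and
 * dst[e]; node n sits at grid slot (row[n], col[n]).
 * Assumption: the host is a 2D grid, so distance is Manhattan distance. */
int embedding_dilation(const int *src, const int *dst, int n_edges,
                       const int *row, const int *col)
{
    int worst = 0;

    assert(n_edges >= 0);
    for (int e = 0; e < n_edges; e++) {
        int d = abs(row[src[e]] - row[dst[e]]) +
                abs(col[src[e]] - col[dst[e]]);
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

A dilation of 1 means every dependency connects neighbouring PEs, so no signal has to cross intermediate PEs; larger values mean longer wires for at least one dependency.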
Embedding-based placement of networks
• Map equations to virtual PEs
• Map virtual PEs to physical PEs via embedding
[Figure: physical model equations (EqP1/EqV1 through EqP7/EqV7), the structured virtual PE graph, and the physical placement]
[Figure: placements compared: no placement strategy; simulated annealing placement; embedding placement]
Embedding-based placement of networks
Work submitted to FPGA'13 (Miller/Vahid/Givargis)
[Chart: Circuit frequency (MHz), axis 0 to 400, by model and #PEs (weibel9gen_256pe, weibel11gen_500pe, nueron1d_256pe, nueron1d_500pe, nueron2d_256pe, nueron2d_500pe) for No placement strategy, Simulated Annealing, and Embedding; the chart carries a "Not routable" annotation]
Other projects
• Assistive monitoring
– www.cs.ucr.edu/~vahid/assistivemonitoring/
– http://www.youtube.com/watch?feature=player_embedded&v=Sf8tU-78lXs
– ..\Desktop\Fall montage.mp4
• Web-based learning
– "Textbook is dead"
– pcpp.zyante.com (C++)
• Embedded systems education
– New programming model, virtual lab
– Also riosscheduler.org
• Drunk driving (DUI)
– ..\Desktop\dui.MOV
– duicam.org
https://docs.google.com/file/d/0B7I3PmI9QsJTM2MzY2QyYWQtZjk4Mi00YWE0LTk1NzQtZTUwMTM5ZDA5ZDc5/edit
Contributors
• Chen Huang (UC Riverside, now Amazon)
• Bailey Miller (UC Riverside)
• Prof. Tony Givargis (UC Irvine)
• Ting-Shuo Chou (UC Irvine)
• Others...
..\Desktop\Meti ER 2.mov
• Fastest cost-effective execution of physical models
• Real-time (or faster) cyber-physical system testing
• Scientific research
• More apps