[Figure: control system architecture of the NPMM-200. Labels: interfaces to measured-value logging, workstation, sensor system and actuators; integrated sequence control system; preprocessing of measured data; control; sensor data compaction; internal communication & synchronisation; management of disturbance variables; dynamic management; mechatronic system of the NPMM-200; trajectory planning; state controller; target trajectory; feedforward control; tracking controller; management of reference variables; kinematic boundary conditions; state reconstruction; disturbance observer]
Ilmenau University of Technology
Faculty of Computer Science and Automation
Institute for Computer Engineering
Computer Architecture and Embedded Systems Group
www.tu-ilmenau.de/ra
Dr.-Ing. Alexander Pacholik, Dipl.-Inf. Johannes Klöckner, Dipl.-Inf. Marcus Müller, MSc. Dipl.-Ing. Irina Gushchina,
Prof. Dr.-Ing. habil. Wolfgang Fengler
{alexander.pacholik, johannes.kloeckner, marcus.mueller, irina.guschtschina, wolfgang.fengler}@tu-ilmenau.de
Ilmenau University of Technology, Germany
www.tu-ilmenau.de
LISARD: LABVIEW-INTEGRATED SOFTCORE ARCHITECTURE
FOR RECONFIGURABLE DEVICES
Abstract - The development of industrial control and measurement systems is often based on modular commercial off-the-shelf hardware. Recently, reconfigurable I/O modules with field-programmable gate arrays (FPGAs) have gained significance for these platforms, since they allow data processing functionality to be implemented very close to the data acquisition interfaces. However, algorithm complexity and floating-point support are limited by FPGA resources and design methods. This contribution presents an application-specifically configurable DSP softcore architecture built around a scalable double-precision floating-point arithmetic/logic unit. The core can be seamlessly used as a functional component in LabVIEW-based FPGA designs. A small case study compares the performance of an example application implemented on the presented core with that of other embedded architectures.
Motivation: The Collaborative Research Centre 622 “Nano-positioning and Nano-measuring Machines”
In the context of this research and development project, high-performance data acquisition, processing and control algorithms have to be implemented optimally to satisfy challenging requirements for process precision under strict real-time conditions. For example, the control system of a nanometer-scale positioning and measuring machine incorporates multiple connected recursive filters with a target loop frequency of 100 kHz, which leaves a processing time of 10 µs per cycle. To achieve the desired process quality, double-precision floating-point computations are required.
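To make the timing requirement concrete: at 100 kHz, one loop period is 1/(100 kHz) = 10 µs, and the whole chain of connected recursive filters must be evaluated within it. The following Python sketch is purely illustrative and not the machine's controller: it assumes first-order recursive low-pass sections with a hypothetical coefficient `alpha`, and relies on Python's `float`, which is an IEEE 754 double.

```python
LOOP_FREQUENCY_HZ = 100e3                  # target loop frequency from the text
CYCLE_BUDGET_S = 1.0 / LOOP_FREQUENCY_HZ   # 10 µs available per control cycle


class FirstOrderSection:
    """Hypothetical recursive section: y[n] = alpha * x[n] + (1 - alpha) * y[n-1]."""

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha
        self.state = 0.0  # y[n-1]; Python floats are IEEE 754 doubles

    def step(self, x: float) -> float:
        self.state = self.alpha * x + (1.0 - self.alpha) * self.state
        return self.state


def run_cascade(sections: list, x: float) -> float:
    """Pass one sample through all connected sections, each output feeding the next."""
    for section in sections:
        x = section.step(x)
    return x


if __name__ == "__main__":
    chain = [FirstOrderSection(0.5) for _ in range(3)]
    print(run_cascade(chain, 1.0))  # first output of the step response
    print(f"budget per cycle: {CYCLE_BUDGET_S * 1e6:.1f} us")
```

Every sample has to pass through all sections in sequence, so the per-cycle cost grows with the number of connected filters while the 10 µs budget stays fixed.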
[Figure: PXI embedded controller with LabVIEW RT running the supervisory application, and PXI FPGA card with a soft-CPU executing the floating-point algorithm and handling data acquisition and output (ADC, DAC, DIO), connected via the PXI backplane]
LiSARD Architecture: a floating-point microprocessor architecture, integrated in LabVIEW for implementation on PXI FPGA modules
Conceptual Considerations
In the following, the application developer, who uses the softcore component within a control application, has to be distinguished from the function developer, who tailors the application-specific softcore component. The application developer is provided with an HDL component that computes a fixed, complex floating-point arithmetic function. The function developer creates this application-specific component from a flexible base architecture by designing the algorithmic function and customizing the core structure to fit the algorithmic requirements.
[Figure: LiSARD functional component with program memory and program memory interface, data memory and data memory interface, input registers, output registers, the core pipeline with its arithmetic operators, and synchronization logic at the input and output]
Softcore in a functional component for on-chip design
The interface of the LiSARD component embedded in LabVIEW
Processor Pipeline Architecture
LiSARD Core Pipeline
The core pipeline is the heart of the LiSARD component. Its main task is to manage pipelined arithmetic operators with different pipeline delays by scheduling their input and output operands correctly.
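The scheduling problem arises because operators with different pipeline delays finish in a different order than they were issued, and two results may reach the data memory in the same cycle. The following sketch uses hypothetical latencies (not those of the actual FPLibrary or Xilinx operator cores) and assumes a single write port; it computes each result's write-back cycle and reports the cycles in which results would collide.

```python
# Hypothetical operator latencies in clock cycles (assumed for illustration)
LATENCY = {"add": 3, "mul": 4, "div": 12}


def writeback_cycles(issue_plan):
    """issue_plan: list of (issue_cycle, operator). Returns the write-back cycle of each result."""
    return [cycle + LATENCY[op] for cycle, op in issue_plan]


def write_port_conflicts(issue_plan):
    """Cycles in which more than one result would need the (assumed single) write port."""
    seen, conflicts = set(), set()
    for wb in writeback_cycles(issue_plan):
        if wb in seen:
            conflicts.add(wb)
        seen.add(wb)
    return sorted(conflicts)
```

For example, issuing a multiplication in cycle 0 and an addition in cycle 1 makes both results arrive in cycle 4, so one of them must be delayed; a division issued before an addition finishes well after it.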
Because it requires little hardware and makes parallelism easy to exploit, the core architecture is based on the very long instruction word (VLIW) approach.
The main component of the architecture is the arithmetic logic unit (ALU), which contains the required floating-point operators with one write and two read operands. The overall processing of the ALU is controlled by the instruction words stored in the program memory. Each word is divided into two parts: the execution part (the data memory addresses of the source operands and the operation mode) and the write-back part (the address of the destination operand and its source within the ALU).
In the first stage, an instruction word is fetched from program memory; it is decoded in the second stage. The third stage fetches the source operands from the data memory. The execution stage passes the source operands and the instruction code to the ALU. In the last stage, the result is collected from the respective operator and written back to the data memory.
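The split of an instruction word into an execution part and a write-back part can be illustrated by packing the fields into one integer. The layout below is hypothetical: it assumes 10-bit data memory addresses (a depth of 2^10, as in the standard configuration of the case study) and 4-bit fields for the operation mode and the write-back source, which happens to add up to the 38-bit word length quoted for that configuration. The actual LiSARD encoding may differ.

```python
from dataclasses import dataclass

# Hypothetical field widths in bits; together they give a 38-bit word
ADDR_BITS = 10   # 2**10-entry data memory
MODE_BITS = 4    # ALU operation mode
WBSRC_BITS = 4   # ALU operator that delivers the write-back value


@dataclass(frozen=True)
class InstructionWord:
    src1: int    # execution part: address of source operand 1
    src2: int    # execution part: address of source operand 2
    mode: int    # execution part: operation mode
    dest: int    # write-back part: destination address
    wb_src: int  # write-back part: source operator within the ALU


# Most significant field first
FIELDS = [("src1", ADDR_BITS), ("src2", ADDR_BITS), ("mode", MODE_BITS),
          ("dest", ADDR_BITS), ("wb_src", WBSRC_BITS)]
WORD_BITS = sum(width for _, width in FIELDS)


def encode(word: InstructionWord) -> int:
    """Pack all fields into one integer, checking that each value fits its width."""
    value = 0
    for name, width in FIELDS:
        field = getattr(word, name)
        if not 0 <= field < (1 << width):
            raise ValueError(f"{name}={field} does not fit in {width} bits")
        value = (value << width) | field
    return value


def decode(value: int) -> InstructionWord:
    """Unpack the fields, starting with the least significant one."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = value & ((1 << width) - 1)
        value >>= width
    return InstructionWord(**fields)
```

Because both parts sit in the same word, one fetch supplies the operands for a new operation and, at the same time, the write-back target of a result issued earlier.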
Microcode Development
Assembler code provides a low-level design entry for quickly implementing small algorithms. This assembler (ASM) code can easily be transformed into VLIW binary code by instruction scheduling.
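Instruction scheduling can be sketched as a greedy as-soon-as-possible scheduler: each instruction is placed in the earliest cycle in which its source operands are available and an ALU issue slot is free. The three-address instruction format, the latencies and the assumption that every ALU accepts one new operation per cycle are illustrative only; this is not the LiSARD tool chain.

```python
from typing import Dict, List, Tuple

# Hypothetical latencies in clock cycles
LATENCY = {"add": 3, "mul": 4}

# Assembler instruction: (destination, operation, source1, source2).
# Destinations are assumed to be unique (each value is written once).
Instr = Tuple[str, str, str, str]


def schedule(program: List[Instr], num_alus: int = 1) -> Dict[str, int]:
    """Return the issue cycle of every instruction, keyed by its destination."""
    ready_at: Dict[str, int] = {}    # cycle in which a computed value becomes available
    slots_used: Dict[int, int] = {}  # operations already issued in each cycle
    issue: Dict[str, int] = {}
    for dest, op, src1, src2 in program:
        # Inputs not computed by the program are available from cycle 0
        cycle = max(ready_at.get(src1, 0), ready_at.get(src2, 0))
        while slots_used.get(cycle, 0) >= num_alus:
            cycle += 1  # all ALUs busy in this cycle: try the next one
        slots_used[cycle] = slots_used.get(cycle, 0) + 1
        issue[dest] = cycle
        ready_at[dest] = cycle + LATENCY[op]
    return issue


def schedule_length(program: List[Instr], num_alus: int = 1) -> int:
    """Cycle in which the last result is available."""
    issue = schedule(program, num_alus)
    return max(issue[dest] + LATENCY[op] for dest, op, _, _ in program)
```

For `t1 = a*b; t2 = c*d; t3 = t1+t2`, one ALU issues the two multiplications in consecutive cycles, while two ALUs can issue them together, shortening the sequence by one cycle.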
For high-level design entry, data-flow graphs are considered. Control-flow-oriented programs in languages like C and assembler require a lot of effort to exploit instruction-level parallelism (ILP). In contrast, data-flow programs such as LabVIEW VIs represent so-called superblocks of instructions that inherently express global data and instruction dependencies. Data-flow optimization therefore makes it possible to determine a globally optimal order of instructions with respect to runtime and resource utilization for a given algorithm.
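Because a data-flow graph states all dependencies explicitly, global properties such as its critical path can be computed directly. The sketch below (node names and latencies are hypothetical) evaluates the graph in topological order using Python's standard `graphlib` module (Python 3.9 or later) and returns the length of the longest latency-weighted path. No schedule can be shorter than this, however many ALUs are available, which is one plausible reason for the diminishing returns reported in the scalability study.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable


def critical_path_length(preds: Dict[str, Iterable[str]], latency: Dict[str, int]) -> int:
    """Earliest completion time of a data-flow graph with unlimited ALUs.

    preds maps each operation to the operations whose results it consumes;
    latency gives the pipeline delay of each operation in clock cycles.
    """
    finish: Dict[str, int] = {}
    # static_order() yields every node after all of its predecessors
    for node in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds.get(node, ())), default=0)
        finish[node] = start + latency[node]
    return max(finish.values(), default=0)
```

For two independent multiplications feeding an addition whose result is multiplied again, the critical path is 4 + 3 + 4 = 11 cycles with the assumed latencies.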
[Figure: design entry via dataflow graph or assembler source; transformation via dataflow optimization and instruction scheduling into binary code]
The synchronization implements a two-wire handshake for input and output and triggers the computation.
Software development tool chain
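The two-wire handshake mentioned in the synchronization note can be modelled as a four-phase request/acknowledge protocol. The following cycle-level sketch assumes that behaviour for the input side; the signal names `req` and `ack` and the exact phase sequence are assumptions, not the documented LiSARD interface. A new transfer (req high while ack is low) latches the inputs, raises ack and triggers one computation; ack is released only after req has been withdrawn.

```python
class HandshakeReceiver:
    """Hypothetical four-phase req/ack receiver for the input synchronization."""

    def __init__(self) -> None:
        self.ack = False
        self.latched = None    # input data captured at the start of a transfer
        self.computations = 0  # number of computations triggered so far

    def clock(self, req: bool, data=None) -> None:
        """Evaluate one clock cycle with the current level of the req wire."""
        if req and not self.ack:
            # New request: latch inputs, acknowledge and start the computation
            self.latched = data
            self.ack = True
            self.computations += 1
        elif not req and self.ack:
            # Producer has withdrawn req: withdraw ack, ready for the next transfer
            self.ack = False
        # Otherwise (req held high, or both low) the state is unchanged


def transfer(rx: HandshakeReceiver, data) -> None:
    """Producer side of one complete four-phase transfer."""
    rx.clock(req=True, data=data)
    if not rx.ack:
        raise RuntimeError("receiver did not acknowledge")
    rx.clock(req=False)
    if rx.ack:
        raise RuntimeError("receiver did not release ack")
```

Holding req high for several cycles does not retrigger the computation, because a new transfer requires ack to have returned to low first.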
Kalman Filter execution performance on different target architectures
1. The execution performance comprises the execution time per Kalman filter iteration and the overall data path latency including data transfer (controller and LiSARD implementations).
2. The runtime on the PXI controller was determined for a native LabVIEW implementation (generic and minimized). The jitter results from task scheduling and compulsory data cache misses in the LabVIEW RT runtime environment. A comparison with the pure algorithm written in data-flow-optimized C code (called as a DLL from LabVIEW) shows the significant overhead of the RT execution environment.
LiSARD configuration                               Slices        FF            LUT           DSP
Standard (generic and minimized implementation)    18672 (36 %)  14508 (28 %)  13180 (26 %)  13 (27 %)
Minimal (data-flow optimized implementation)       10818 (21 %)   8218 (16 %)   8043 (26 %)  13 (27 %)
LiSARD softcore resource utilization
Scalability
The micro-architecture allows extension to multiple homogeneous or heterogeneous ALUs. The table lists the performance gain for the Kalman filter example running on multiple homogeneous ALUs of the configuration described above.
# ALUs       1        2        3        4
Ticks        215      143      118      116
Runtime      1.8 µs   1.2 µs   1.0 µs   1.0 µs
Speedup      1.0      1.5      1.8      1.9
Efficiency   1        0.75     0.60     0.46
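The speedup and efficiency rows follow from the tick counts as S(n) = T(1)/T(n) and E(n) = S(n)/n. The sketch below recomputes them from the table; the results agree with the listed values to within rounding. The runtimes are consistent with a core clock of about 120 MHz (215 ticks / 120 MHz ≈ 1.8 µs), but that clock frequency is inferred here from the table, not stated in the text.

```python
TICKS = {1: 215, 2: 143, 3: 118, 4: 116}  # clock ticks per Kalman iteration, from the table


def speedup(n: int) -> float:
    """S(n) = T(1) / T(n)."""
    return TICKS[1] / TICKS[n]


def efficiency(n: int) -> float:
    """E(n) = S(n) / n."""
    return speedup(n) / n


def runtime_us(n: int, clock_mhz: float) -> float:
    """Ticks divided by ticks per microsecond; the clock value is an assumption."""
    return TICKS[n] / clock_mhz
```

Going from three to four ALUs saves only two ticks, so efficiency drops sharply once the available instruction-level parallelism of the filter is exhausted.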
A common development platform in the measurement and control domain is the PXI hardware together with the graphical framework LabVIEW. In a typical application, the programmable hardware connects to the specific protocols of sensors and actuators and provides adequate timing of data acquisition and actuation, while the controller CPU performs the computations. However, the communication latency from the sensors to the controller CPU and back to the actuators over the PXI backplane is approximately 30 µs. It is therefore proposed to relocate the floating-point algorithm implementation to the FPGA in order to reduce the transmission delays and thus decrease the closed-loop period of control applications. Because of the limited support for floating-point hardware synthesis, implementing floating-point control algorithms efficiently on the FPGA using hardware description languages (HDL) is time-consuming and expensive compared to CPU or DSP programming.
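A back-of-the-envelope check with the figures from the text shows the problem: if every control cycle must include the roughly 30 µs round trip over the backplane, the loop period cannot fall below 30 µs, so the loop frequency is bounded by about 33 kHz before any computation time is added, well short of the 100 kHz target. The helper below is an illustrative sketch of this bound and assumes that communication and computation run strictly one after the other.

```python
BACKPLANE_LATENCY_S = 30e-6  # sensors -> controller CPU -> actuators (from the text)
TARGET_FREQUENCY_HZ = 100e3  # target loop frequency (from the text)


def max_loop_frequency_hz(latency_s: float, compute_s: float = 0.0) -> float:
    """Upper bound on the closed-loop frequency when every cycle has to cover
    the communication latency plus the computation time (assumed sequential)."""
    return 1.0 / (latency_s + compute_s)


def meets_target(latency_s: float, compute_s: float = 0.0) -> bool:
    """True if the resulting loop frequency reaches the 100 kHz target."""
    return max_loop_frequency_hz(latency_s, compute_s) >= TARGET_FREQUENCY_HZ
```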
PXI System Setup with FPGA Module
• The arithmetic operators are taken from the FPLibrary or from the Xilinx ISE 10.x core generator.
• The LiSARD core features separate memories for program and data.
• The memories can be accessed from outside the core during system runtime. This enables debugging and allows the reprogramming of the core without recompiling the FPGA design.
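A software model makes the benefit of runtime-accessible memories explicit: switching to a different algorithm only means writing a different list of instruction words into program memory, and debugging means reading operands back from data memory. The class and method names below are hypothetical; they mirror only the structure described here (separate program and data memories with external access), not an actual host API.

```python
class CoreMemories:
    """Hypothetical model of separate, externally accessible program and data memories."""

    def __init__(self, program_depth: int, data_depth: int) -> None:
        self.program = [0] * program_depth  # instruction words
        self.data = [0.0] * data_depth      # double-precision operands

    def load_program(self, words) -> None:
        """Host-side reprogramming at runtime; no FPGA recompilation involved."""
        words = list(words)
        if len(words) > len(self.program):
            raise ValueError("program does not fit into program memory")
        # Write the new program and clear the unused remainder
        self.program = words + [0] * (len(self.program) - len(words))

    def peek_data(self, address: int) -> float:
        """Debug access: read one operand from data memory."""
        return self.data[address]
```

For the minimal configuration, such a model would be created as `CoreMemories(program_depth=256, data_depth=2**8)`.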
The Kalman filter mainly consists of matrix operations on relatively small matrices with double-precision floating-point operands:
• the generic version requires one division, 280 multiplications and 220 additions
• the structurally minimized version requires one division, 80 multiplications and 85 additions
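To illustrate the kind of computation involved, the sketch below implements one iteration of a small Kalman filter with a two-element state (position and velocity) and a single scalar measurement. Because the innovation covariance is then a scalar, the update needs exactly one division, as in both versions listed above. The constant-velocity model, the simplified process noise `q` added to both state variances, the noise values and the use of plain Python lists are assumptions; this does not reproduce the NPMM controller.

```python
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Mat) -> Mat:
    return [list(row) for row in zip(*a)]


def kalman_step(x: Vec, P: Mat, z: float, dt: float, q: float, r: float) -> Tuple[Vec, Mat]:
    """One predict/update iteration; the position is measured (H = [1, 0])."""
    F = [[1.0, dt], [0.0, 1.0]]
    # Predict: x = F x, P = F P F^T + Q with Q = diag(q, q) (simplified, assumed)
    x_pred = [x[0] + dt * x[1], x[1]]
    P_pred = matmul(matmul(F, P), transpose(F))
    P_pred[0][0] += q
    P_pred[1][1] += q
    # Update: the innovation covariance S is a scalar, so one division suffices
    S = P_pred[0][0] + r
    inv_S = 1.0 / S
    K = [P_pred[0][0] * inv_S, P_pred[1][0] * inv_S]
    y = z - x_pred[0]
    x_new = [x_pred[0] + K[0] * y, x_pred[1] + K[1] * y]
    # P = (I - K H) P_pred, written out element by element
    P_new = [[P_pred[0][0] - K[0] * P_pred[0][0], P_pred[0][1] - K[0] * P_pred[0][1]],
             [P_pred[1][0] - K[1] * P_pred[0][0], P_pred[1][1] - K[1] * P_pred[0][1]]]
    return x_new, P_new
```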
The implementation of the Kalman filter on the LiSARD core is compared to equivalent implementations targeting an Intel Core 2 Duo T9400 in the PXI real-time controller NI PXI-8108 and to a direct hardware implementation for the Xilinx Virtex-5 LX85 on the NI 7853R PXI FPGA module.
Standard: 32 inputs, 16 outputs, a data memory depth of 2^10, providing a 38-bit instruction word length, and a program memory depth of 2048 entries.
Minimal: 4 inputs, 4 outputs, a data memory depth of 2^8 and a program memory depth of 256 entries.
Resource utilization of LabVIEW FPGA implementations
FPGA implementation               Slices        FF            LUT           DSP
FPGA generic                      42008 (81 %)  37472 (72 %)  38372 (75 %)  40 (83 %)
FPGA optimized area (120 MHz)     23856 (46 %)  26022 (50 %)  30618 (60 %)  34 (71 %)
FPGA optimized speed (80 MHz)     41490 (80 %)  34870 (67 %)  30618 (60 %)  40 (83 %)
System overview
Case Study and Performance Evaluation: The implementation of a Kalman filter
Conclusion - The softcore architecture for use as a functional component in LabVIEW is highly configurable in terms of VLIW organization, memory organization, operator selection and I/O connectivity. The architecture is designed to process complex floating-point algorithms for real-time control applications on distributed FPGA platforms. The LiSARD core is not intended to replace general-purpose softcore CPUs, but to be used as a DSP core in a data-flow design or to supplement a CPU in an SoC.
[Figure: processor pipeline with the stages instruction fetch (program memory read interface), instruction decode (instruction decoder), operand fetch (data memory read interface, Source1/Source2), execute (ALU, execute mode) and write back (MUX, write-back operation target, data memory write interface); input and output registers attached to the core]