Copyright (c) 2004 Cmpware
Configurable Multiprocessing: An FIR Filter Example
Cmpware, Inc.
Introduction
● Multiple processors on a device are common
● Thousands of 32-bit RISC CPUs are possible
● Advantages in:
    ● Performance
    ● Power consumption
    ● Programmability
Q: How best to program such devices?
Configurable Multiprocessing
● Multiple standard processor cores
● Fast point-to-point interconnection
● Balanced communication / computation ratio (1:1 or better)
● Use standard compilers / tools
● Use Cmpware to design and program the multiprocessor
The Cmpware Software
● Uses standard compilers / assemblers / etc.
● Memory-mapped I/O for communication
    ● Does not break compilers and other tools
    ● Does not break processor IP
    ● Provides fast, high-bandwidth communication
    ● Provides a simple programming model
● Eclipse-based IDE
● Multiprocessor simulation
● Multiprocessor state and performance display
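The memory-mapped I/O point can be illustrated with a minimal sketch. The slides do not give port addresses, so the two ports below are hypothetical stand-ins backed by ordinary variables (`west_port`, `east_port`), and `relay_doubled()` is an invented helper. On real hardware the pointers would presumably hold fixed, device-specific addresses.

```c
/* Hypothetical host-side stand-ins for two memory-mapped ports.
   Assumption: on real hardware these pointers would hold fixed,
   device-specific addresses; here they point at ordinary variables
   so the sketch compiles and runs with any standard C compiler. */
static volatile int west_port;
static volatile int east_port;
static volatile int *const west = &west_port;
static volatile int *const east = &east_port;

/* Read one word from the west port, double it, and send it east.
   The compiler sees only a plain load and a plain store. */
int relay_doubled(void) {
    int v = *west;   /* a load from the port acts as a receive */
    *east = 2 * v;   /* a store to the port acts as a send */
    return 2 * v;
}
```

Because communication is expressed as ordinary loads and stores, no compiler extension or special instruction is needed, which is the sense in which the approach "does not break compilers and other tools".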
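Example: FIR Filter
● Finite Impulse Response (FIR) filter
● Digital Signal Processing (DSP) filter
● Often implemented in hardware and software
● Various methods of parallelizing
[Figure: FIR filter block diagram. The input x(n) feeds a chain of multipliers with coefficients h(0), h(1), h(2), ..., h(n); adders combine the products into the output y(n).]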
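For reference, the standard textbook definition of an FIR filter can be written as below. The symbol N for the number of taps is an assumption introduced here to avoid clashing with the time index n; the slide's diagram labels the last coefficient h(n).

```latex
y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k)
```

Each output sample is a weighted sum of the N most recent input samples, which is why the diagram shows one multiplier and one adder per tap.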
The FIR Filter Implementation
● Parameterized on taps and processors
● FIR() and shift() subroutines are standard serial "C" code
● Parallel code constructs at highest level, controlling serial subroutines
● Data read from west and sent out to east
● Similar structurally to FIR diagrams
Example: FIR Filter
void _start(void) {
   int node;
   int ntaps;

   /* Get the parameters */
   node  = *west;
   ntaps = *west;

   /* Send the parameters to the next node */
   *east = (node-1);
   *east = ntaps;

   for (;;) {
      *east = FIR(ntaps, *west);
      *east = shift(ntaps, *west);
      }  /* end for() */

   }  /* end _start() */
Example: FIR Filter (cont.)
/*
** This method computes the FIR.
**
** @param ntaps  The number of taps in the
**               FIR filter.
**
** @param sum  The partial sum into the FIR.
**
** @return  This method returns the FIR result.
*/
int FIR(int ntaps, int sum) {
   int i;

   for (i=0; i<ntaps; i++)
      sum += h[i] * z[i];

   return (sum);
   }  /* end FIR() */
Example: FIR Filter (cont.)
int shift(int ntaps, int zIn) {
   int i, zOut;

   /* Save the last value (being shifted out) */
   zOut = z[ntaps-1];

   /* Shift the delay line */
   for (i=ntaps-2; i>=0; i--)
      z[i+1] = z[i];

   /* Add in the new shifted in value */
   z[0] = zIn;

   return (zOut);
   }  /* end shift() */
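To see how FIR() and shift() behave together, here is a minimal host-side sketch that runs them serially on one node. The slides do not show how `h` and `z` are declared, so the declarations, the 4-tap size, the coefficient values, and the `fir_step()` wrapper are all assumptions made for illustration.

```c
#define NTAPS 4

/* Assumed declarations: the slides use h[] and z[] without showing them. */
static int h[NTAPS] = {1, 2, 3, 4};  /* illustrative coefficients */
static int z[NTAPS];                 /* delay line, initially all zero */

/* Accumulate this node's taps onto the incoming partial sum. */
static int FIR(int ntaps, int sum) {
    int i;
    for (i = 0; i < ntaps; i++)
        sum += h[i] * z[i];
    return sum;
}

/* Shift a new sample into the delay line; return the one pushed out. */
static int shift(int ntaps, int zIn) {
    int i, zOut;
    zOut = z[ntaps - 1];
    for (i = ntaps - 2; i >= 0; i--)
        z[i + 1] = z[i];
    z[0] = zIn;
    return zOut;
}

/* Hypothetical wrapper: one filter step in the same order as the
   node loop, FIR() first and shift() second. */
int fir_step(int x) {
    int y = FIR(NTAPS, 0);   /* a single node starts from a zero sum */
    (void)shift(NTAPS, x);   /* nothing downstream to receive zOut */
    return y;
}
```

Feeding an impulse (1 followed by zeros) through `fir_step()` returns 0, 1, 2, 3, 4, 0: the coefficients appear one sample late because FIR() runs before the new sample is shifted in.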
FIR Filter Implementation
● Three nodes: data source, processor, consumer
● Single 8-tap FIR
[Figure: three CMPUs in a chain. Node 2 (Data Source) feeds Node 1, which runs FIR(); shift(); and feeds Node 0 (Results). The links are labeled (8,1) and (8,0), and each carries sum and z[n].]
FIR Filter Implementation (cont.)
● Four nodes: data source, processor 1, processor 2, consumer
● 4x2 8-tap FIR
[Figure: four CMPUs in a chain. Node 3 (Data Source) feeds Node 2 and then Node 1, each running FIR(); shift();, which feed Node 0 (Results). The links are labeled (4,2), (4,1), and (4,0), and each carries sum and z[n].]
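The claim that a chain of smaller filters reproduces the single 8-tap filter can be checked with a serial host-side sketch. Everything here is an assumption made for illustration: the `Node` struct, the helper names, the coefficient values, and the way coefficients are sliced across nodes. Real nodes would run concurrently and exchange values over their ports; this loop models only the order in which data flows.

```c
#define TOTAL_TAPS 8
#define MAX_NODES  8

/* Illustrative coefficients; not taken from the slides. */
static const int coeff[TOTAL_TAPS] = {1, 2, 3, 4, 5, 6, 7, 8};

/* One processor node with its own coefficient slice and delay line. */
typedef struct {
    int ntaps;
    int h[TOTAL_TAPS];
    int z[TOTAL_TAPS];
} Node;

/* Same arithmetic as FIR(), on per-node state. */
static int node_fir(const Node *n, int sum) {
    for (int i = 0; i < n->ntaps; i++)
        sum += n->h[i] * n->z[i];
    return sum;
}

/* Same arithmetic as shift(), on per-node state. */
static int node_shift(Node *n, int zIn) {
    int zOut = n->z[n->ntaps - 1];
    for (int i = n->ntaps - 2; i >= 0; i--)
        n->z[i + 1] = n->z[i];
    n->z[0] = zIn;
    return zOut;
}

/* Split the 8 taps evenly over nprocs nodes (nprocs must divide 8). */
static void chain_init(Node chain[], int nprocs) {
    int ntaps = TOTAL_TAPS / nprocs;
    for (int p = 0; p < nprocs; p++) {
        chain[p].ntaps = ntaps;
        for (int i = 0; i < ntaps; i++) {
            chain[p].h[i] = coeff[p * ntaps + i];
            chain[p].z[i] = 0;
        }
    }
}

/* Push one sample through the chain: each node adds its partial sum,
   then hands the sample shifted out of its delay line to the next node. */
static int chain_step(Node chain[], int nprocs, int x) {
    int sum = 0;
    for (int p = 0; p < nprocs; p++) {
        sum = node_fir(&chain[p], sum);
        x = node_shift(&chain[p], x);
    }
    return sum;
}

/* Returns 1 if an nprocs-node chain matches a single 8-tap node
   on every sample of the input, 0 otherwise. */
int chain_matches_single(int nprocs, const int *input, int len) {
    Node single[1], chain[MAX_NODES];
    chain_init(single, 1);
    chain_init(chain, nprocs);
    for (int t = 0; t < len; t++)
        if (chain_step(single, 1, input[t]) !=
            chain_step(chain, nprocs, input[t]))
            return 0;
    return 1;
}
```

The outputs agree because the value shifted out of one node's delay line is exactly the value the next node needs to shift in, so the delay lines of the chain, laid end to end, always hold the same samples as the single 8-entry delay line.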
FIR Filter Implementation (cont.)
● Ten nodes: data source, 8 processors, consumer
● 1x8 8-tap FIR
[Figure: a chain of CMPUs from the Data Source, through processor nodes that each run FIR(); shift();, to Results. The links are labeled (1,8), (1,7), ..., (1,1), (1,0), and each carries sum and z[n].]
The Cmpware Software
● All standard C
● Same code used for various processor configurations
● Can select number of processors dynamically
● Communication synchronizes computation
● Computations only start when inputs are available (pseudo-dataflow)
● Power / performance tradeoffs unavailable in other solutions
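The pseudo-dataflow idea, that a computation starts only when its inputs are available, can be sketched with a small software model. The `Chan` FIFO, its capacity, and `node_try_fire()` are hypothetical names invented for this sketch; the slides do not describe how the hardware links buffer or stall.

```c
#define CHAN_CAP 4

/* Hypothetical FIFO channel standing in for a point-to-point link. */
typedef struct {
    int buf[CHAN_CAP];
    int head, count;
} Chan;

/* Send: returns 0 if the channel is full and the sender must wait. */
static int chan_put(Chan *c, int v) {
    if (c->count == CHAN_CAP) return 0;
    c->buf[(c->head + c->count) % CHAN_CAP] = v;
    c->count++;
    return 1;
}

/* Receive: returns 0 if the channel is empty and the receiver must wait. */
static int chan_get(Chan *c, int *v) {
    if (c->count == 0) return 0;
    *v = c->buf[c->head];
    c->head = (c->head + 1) % CHAN_CAP;
    c->count--;
    return 1;
}

/* Fire a node once. It runs only if an input is waiting on the west
   channel and there is room on the east channel.
   Returns 1 if it fired, 0 if it stalled. */
int node_try_fire(Chan *west, Chan *east, int (*compute)(int)) {
    int v;
    if (west->count == 0 || east->count == CHAN_CAP) return 0;
    chan_get(west, &v);
    chan_put(east, compute(v));
    return 1;
}

/* Trivial stand-in for a node's computation. */
static int add_one(int v) { return v + 1; }
```

A node asked to fire on an empty input channel simply stalls; once a value arrives it fires and its result becomes available downstream. No explicit locks or barriers appear in the node code, which is the sense in which communication synchronizes computation.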
Synchronization
● FIR filter not simple to parallelize
● CMP communication synchronizes computation
● Hardware solution is to:
    ● Use slow N-input adder
    ● Use pipelined adder tree
    ● Add pipeline delay stages
➢ All of these HW solutions are very inflexible
FIR Filter Speedup
[Charts: "8 Tap FIR Filter" and "8 Tap FIR Filter Speedup", plotted against Processors (2, 4, 8); one axis is labeled Cycles.]
Using CMP Parallelism
● Simplest parallelism is at the 'task' (subroutine) level
● 1:1 communication / computation permits speedups for very fine-grained parallelism
● Parallelism can be exploited at a very low level (if required)
● Parallelism at several levels easily exploited
● Excellent control over performance via level of parallelization
The Cmpware Tools
● Eclipse IDE based
● Fast, simple access to multiprocessor data
    ● Source-level trace
    ● Memory
    ● Registers
    ● Performance / profiling, and more ...
● Pluggable CPU simulator / model generator
● Fully extensible API
The Cmpware Tools
Configurable Multiprocessing from Cmpware, Inc.