MACHINE VISION GROUP, JANI BOUTELLIER, 06.10.2009
Architectural Support for the Orchestration of Fine-Grained Multiprocessing for Portable Streaming Applications
Jani Boutellier¹, Alessandro Cevrero², Philip Brisk², Paolo Ienne²
¹University of Oulu (FI), ²EPFL, Lausanne (CH)
The Context of this Work
• The context of this work is multiprocessor embedded systems
• The systems’ processing elements (PEs) are application specific and heterogeneous
• We propose a circuit for low-overhead hardware-assisted scheduling and dispatching of PEs
• The solution is suited to data-dominated signal processing applications
Fine-Grained Acceleration
• Accelerators can be made fine-grained
• Improves accelerator utilization
• Allows hardware to be shared across applications
Discussed in Silvén et al. (2005)
Fine-Grained Acceleration
• Static accelerator invocation schedules work only when the applications use accelerators in a regular pattern
• Unfortunately, modern signal processing uses adaptive coding, so the pattern of accelerator use changes with the input data
(Figure: coded bitstream → Parser → Intra block or Inter block → Screen)
Fine-Grained Acceleration
• For each iteration, a different set of functions can be used
• End-to-end time per iteration < 10 μs, i.e. over 100k iterations / s
(Figure: accelerators Acc. 1 to Acc. 6)
Fine-Grained Acceleration
• The time available for switching the accelerator invocation schedule is very short
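The timing constraint can be made concrete as a per-iteration cycle budget. The Python sketch below takes the 100k iterations/s rate and the 3-cycle append latency from the talk; the 100 MHz clock (CLOCK_HZ) is an assumption for illustration only, since the slides do not state a clock frequency, and the helper names are hypothetical.

```python
# Back-of-the-envelope scheduling budget per iteration.
# ASSUMPTION: CLOCK_HZ is a hypothetical 100 MHz clock; the slides give no frequency.

ITERATIONS_PER_SECOND = 100_000  # from the slide: over 100k iterations / s
CLOCK_HZ = 100_000_000           # hypothetical clock frequency
APPEND_CYCLES = 3                # from the slides: a schedule part is appended in 3 cycles


def cycles_per_iteration(iterations_per_second: int, clock_hz: int) -> int:
    """Clock cycles available for one complete (end-to-end) iteration."""
    return clock_hz // iterations_per_second


def overhead_fraction(scheduler_cycles: int, budget_cycles: int) -> float:
    """Share of the per-iteration budget consumed by scheduling work."""
    return scheduler_cycles / budget_cycles


budget = cycles_per_iteration(ITERATIONS_PER_SECOND, CLOCK_HZ)
print(f"time per iteration: {1e6 / ITERATIONS_PER_SECOND:.1f} us")
print(f"cycle budget per iteration: {budget}")
print(f"hardware append overhead: {overhead_fraction(APPEND_CYCLES, budget):.1%}")
```

Under this assumed clock, an iteration has 1000 cycles and a 3-cycle append uses 0.3% of it; a measured software scheduler cost can be passed to `overhead_fraction` in the same way to compare the two.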
Quasi-static scheduling
• Quasi-static scheduling is a middle ground between dynamic and static scheduling
• Applicable when the application consists of a sequence of static parts
• Minimizes run-time scheduling computations
(Figure: static, dynamic and quasi-static scheduling compared in terms of flexibility and run-time overhead)
Quasi-static scheduling
(Figure, built up over several slides: an application with actors A, B, C and D; a schedule part repository holding alternative parts for B, C and D; and a time axis for proc. 1, proc. 2 and proc. 3 onto which schedule parts A1 to A3, then B1 to B3, then D1 to D3 are placed in turn)
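A minimal Python sketch of the quasi-static idea, assuming that each schedule part is stored as a list of (processor, task) pairs and that the run-time decision is the intra/inter choice from the decoder example. The repository contents, task names and function names are hypothetical, not taken from the talk.

```python
from collections import deque

# Hypothetical schedule part repository. Each part is a static, precomputed
# sequence of (processor, task) pairs; names are illustrative only.
REPOSITORY: dict[str, list[tuple[int, str]]] = {
    "intra": [(1, "dequant"), (2, "idct"), (3, "display")],
    "inter": [(1, "dequant"), (2, "idct"), (3, "motion_comp"), (3, "display")],
}
NUM_PROCESSORS = 3


def append_part(queues: list[deque], part: list[tuple[int, str]]) -> None:
    """Append one precomputed part to the per-processor task queues.

    The task order inside a part was fixed offline (the static step);
    only the choice of part is made at run time (the quasi-static step).
    """
    for proc, task in part:
        queues[proc - 1].append(task)


def schedule(block_types: list[str]) -> list[deque]:
    """Build per-processor queues from a stream of run-time decisions."""
    queues = [deque() for _ in range(NUM_PROCESSORS)]
    for block_type in block_types:  # one cheap lookup per decoded block
        append_part(queues, REPOSITORY[block_type])
    return queues


if __name__ == "__main__":
    for i, q in enumerate(schedule(["intra", "inter", "inter"]), start=1):
        print(f"proc. {i}: {list(q)}")
```

For the input `["intra", "inter", "inter"]`, proc. 3 receives `['display', 'motion_comp', 'display', 'motion_comp', 'display']`: the run-time cost is one lookup and append per decision, while all ordering inside a part was decided offline.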
Proposed solution
• In this work we propose a dedicated circuit for quasi-static scheduling
• Motivation: quasi-static scheduling of fine-grained accelerators is not feasible with a software scheduler*
* Boutellier et al. (2009), Journal of Signal Processing Systems
Proposed solution
• In this work we propose a dedicated circuit for quasi-static scheduling
• Appends a new schedule part in 3 clock cycles
• Performs dispatching independently
• Area is 3300 gates when
  – supporting 4 accelerators
  – with 13 alternative schedule parts
• Schedule parts are stored in the circuit's own memory
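To make the listed properties concrete, here is a hypothetical cycle-level behavioural model in Python. Only the parameters (4 accelerators, 13 schedule parts, 3-cycle append) come from the slide. The internal structure (per-accelerator FIFOs, parts as lists of (accelerator_id, busy_cycles) pairs, no data dependencies) is an assumption and is not the published circuit design.

```python
from collections import deque

NUM_ACCELERATORS = 4  # configuration quoted on the slide
MAX_PARTS = 13        # alternative schedule parts quoted on the slide
APPEND_CYCLES = 3     # append latency quoted on the slide


class SchedulerModel:
    """Hypothetical behavioural model of a hardware quasi-static scheduler.

    Assumptions (not from the talk): a schedule part is a list of
    (accelerator_id, busy_cycles) invocations, every accelerator has its
    own FIFO, an idle accelerator starts its next queued invocation on the
    following clock cycle, and data dependencies are ignored.
    """

    def __init__(self, parts: list[list[tuple[int, int]]]):
        if len(parts) > MAX_PARTS:
            raise ValueError("too many schedule parts for the circuit's memory")
        self.parts = parts  # held in the circuit's own memory
        self.queues = [deque() for _ in range(NUM_ACCELERATORS)]
        self.remaining = [0] * NUM_ACCELERATORS  # busy cycles left per accelerator
        self.cycle = 0

    def append(self, part_id: int) -> int:
        """CPU request: enqueue a stored part; returns the modelled latency."""
        for acc, busy_cycles in self.parts[part_id]:
            self.queues[acc].append(busy_cycles)
        return APPEND_CYCLES

    def tick(self) -> None:
        """Advance one clock cycle; dispatching needs no CPU involvement."""
        for acc in range(NUM_ACCELERATORS):
            if self.remaining[acc] == 0 and self.queues[acc]:
                self.remaining[acc] = self.queues[acc].popleft()
            if self.remaining[acc] > 0:
                self.remaining[acc] -= 1
        self.cycle += 1

    def idle(self) -> bool:
        return not any(self.remaining) and not any(self.queues)


if __name__ == "__main__":
    model = SchedulerModel(parts=[[(0, 5), (1, 3)], [(2, 4), (3, 2), (0, 2)]])
    cpu_cycles = model.append(0) + model.append(1)
    while not model.idle():
        model.tick()
    print(cpu_cycles, model.cycle)  # 6 7
```

In this model the CPU spends only the append latency (6 cycles for two parts), while the accelerators are kept busy for 7 cycles without further CPU involvement, mirroring the "performs dispatching independently" point above.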
Experiments
1. MPEG-4 Simple Profile (SP) video decoding
Experiments
2. Fine-grained accelerator scheduling
Experiments
• Both experiments were performed on an Altera Cyclone III FPGA
• The CPU and the accelerators were all Nios II soft processors
• Experiment 1 performed decoding of real video
• In Experiment 2, the accelerators only moved data around
Results
                Static schedule   Quasi-static schedule
Experiment 1    141 Mcycles       47 Mcycles
Experiment 2    1.13 Mcycles      0.78 Mcycles
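The cycle counts in the table translate into speedups with simple arithmetic. This short Python helper uses only the numbers from the table; the function names are illustrative.

```python
def speedup(static_cycles: float, quasi_static_cycles: float) -> float:
    """How many times faster the quasi-static schedule ran."""
    return static_cycles / quasi_static_cycles


def cycle_reduction(static_cycles: float, quasi_static_cycles: float) -> float:
    """Fraction of cycles saved relative to the static schedule."""
    return 1 - quasi_static_cycles / static_cycles


# Cycle counts from the results table, in Mcycles: (static, quasi-static).
RESULTS = {
    "Experiment 1": (141.0, 47.0),
    "Experiment 2": (1.13, 0.78),
}

for name, (static, quasi) in RESULTS.items():
    print(f"{name}: {speedup(static, quasi):.2f}x faster, "
          f"{cycle_reduction(static, quasi):.0%} fewer cycles")
```

This gives a 3.00x speedup (67% fewer cycles) for Experiment 1 and about 1.45x (31% fewer cycles) for Experiment 2.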
Our circuit enables quasi-static multiprocessor scheduling with negligible overhead, which is not feasible in software
Thank you for your attention. Questions?