ENG3050 Embedded Reconfigurable Computing Systems: Application Specific Instruction Processors “ASIPs”, “Reconfigurable Processors”

Upload: phoebe-greer

Post on 05-Jan-2016


Page 1: ENG3050 Embedded Reconfigurable Computing Systems, Application Specific Instruction Processors “ASIPs”

ENG3050 Embedded Reconfigurable Computing Systems

Application Specific Instruction Processors “ASIPs”

“Reconfigurable Processors”

Page 2:

Topics

• ASIPs: Definition
• Motivation
• How to customize ASIPs
• Tools for ASIPs
• Approaches
• Conclusions

Page 3:

References

1. “Engineering the Complex SOC: Fast, Flexible Design with Configurable Processors”, Chris Rowen, 2004.

2. “Xtensa Architecture and Performance”, Tensilica Inc., Sep 2002.

3. “Configurable Processors: What, Why, How?”, Tensilica Inc., June 2007.

Page 4:

Microprocessors and ASICs

For the ultimate in flexibility, programmers map the application onto a general-purpose microprocessor.

For the ultimate in performance, logic designers map the application into a custom circuit.

[Figure: an Application mapped by Programmers onto a Microprocessor and by Logic designers onto an ASIC or FPGA]

Page 5:

Classic Options for Systems-on-Chip

Design Gap!

Page 6:

General Purpose Processors

Page 7:

A Case for Customization

General Purpose Processors: flexible, but the application tends to be customized to fit the architecture!

ASICs: high performance, but expensive, and the architecture is customized to fit the application!

We need to find a technology that can customize the architecture to the application and at the same time be flexible and cheap!

Page 8:

Processor Specialization: Get the Best of Both Options

Gains!

Page 9:

Motivations: reduce size

• A Pentium 4 die can fit about 50 ARM9 processors at 0.13 µm, and 80 at 0.10 µm.
• At 0.13 µm and a 250 MHz clock, an ARM9 dissipates 0.1 W, so 50 ARM9s = 5 W.

[Figure: Pentium 4 at 0.13 µm = 144 mm² (12 mm × 12 mm); ARM9 at 0.13 µm = 3 mm²]

• Cost, power, and size are important for embedded applications!
• Processing vs. dedicated hardware (ASIC)?
• System-on-a-Chip concept
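As a quick check of the figures on this slide, the arithmetic can be written out. This is only a sketch: it uses the slide's own numbers and assumes the whole Pentium 4 die area could be tiled with ARM9 cores.

```latex
\[
\frac{A_{\text{Pentium 4}}}{A_{\text{ARM9}}}
  = \frac{12\,\text{mm} \times 12\,\text{mm}}{3\,\text{mm}^2}
  = \frac{144\,\text{mm}^2}{3\,\text{mm}^2}
  = 48 \approx 50,
\qquad
P_{50\ \text{ARM9s}} = 50 \times 0.1\,\text{W} = 5\,\text{W}.
\]
```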

Page 10:

Programmable Processors

Past:
• Microprocessor
• Microcontroller
• DSP
• Graphics Processor

Now / Future:
• Network Processor
• Sensor Processor
• Crypto Processor
• Game Processor
• Wearable Processor
• Mobile Processor

Page 11:

A Case for Customization

General-purpose processors handle many applications fairly well, but…
• Each application has different requirements
• The instruction set is fixed!
• Data path width may not suit your application!
• Cache size/configuration may not be optimal
• Register file is either too small or …
• Functional units might be missing or …
• Internal busses are slow or too narrow …

Page 12:

Processor Customizations

• Specialized instructions: optimization, searching, classification, …
• Specialized functional units: MAC units, special comparators, sorting units
• Parameterized busses and datapaths: 8-bit, 16-bit, synch/async busses
• Parameterized register files
• Parameterized caches: cache size, replacement strategy, …

[Figure: processor block diagram with RegFile, D/I-Caches, and functional units FU1, FU2, FU3]

Page 13:

Application-specific instruction processors

An ASIP is a stored-program CPU whose architecture is tailored for a particular set of applications.
• The instruction set is tailored to specific applications or application domains.
• Customized functional units within the data path provide high performance.
• Programmability allows changes to the implementation, so the processor can be used in several different products.

An application-specific architecture provides smaller silicon area, higher speed, and lower power consumption.

Page 14:

Recall: Different levels of coupling

[Figure: a Workstation with CPU, Memory Caches, and I/O Interface; coupling options range from Tightly Coupled to Loosely Coupled: FU, Coprocessor, Attached Processing Unit, Standalone Processing Unit]

Page 15:

[Figure: FPGA, ASIC, and µP each rated on Design cost, Time-to-market, Flexibility, Determinism, Power, and Performance, with different criteria highlighted for each technology]

Application Specific Instruction Processors

Page 16:

[Figure: FPGA, ASIC, µP, and ASCP rated on Design cost, Time-to-market, Flexibility, Determinism, Power, and Performance; all six criteria are highlighted for the ASCP]

ASCP: Application-Specific Customizable Embedded Processor
– Helps preserve the benefits of generality
– Alleviates the drawbacks of general-purpose processors

Embedded Applications Requirements

Page 17:

Performance vs. Flexibility

[Figure: Flexibility plotted against Performance for ASIC, GPP, DSP, RCS, and ASIPs]

Page 18:

ASIPs: Advantages

Tailor the processor for specific applications by:
• Customizing the instruction set.
• Adding customized execution units that efficiently perform task-specific algorithms.
• Adding special registers sized to the natural data types of the tasks to be performed.

Instructions will often execute in one or two clock cycles, which keeps clock rates low and thus energy consumption low as well.

You can further customize the processor as your application evolves with time.

Page 19:

ASIP Design Methodology

[Figure: an Application mapped onto a design-time configurable microprocessor]

• Profile the application.
• Create custom hardware and instructions to accelerate critical application sections.
• Most of the application runs as execution of general-purpose instructions.

Page 20:

ASIP based approach

[Figure: Compiler Structure for Reconfigurable Instruction Set Processors. C Code passes through C Parsing, Optimizations, Inst. Identification, Inst. Selection, Config. Scheduling, and Code Generation to produce Assembly Code; Hardware Generation produces Configuration bits; a Hardware Estimator is also shown]

Page 21:

Instruction Set Extension

Idea: provide a way to augment the processor’s instruction set with operations needed by a particular application.

Page 22:

Determinants of CPU Performance

CPU time = Instruction_count x CPI x clock_cycle

                         Instruction_count   CPI   clock_cycle
Algorithm                        X            X
Programming language             X            X
Compiler                         X            X
ISA                              X            X         X
Processor organization                        X         X
Technology                                              X
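To make the CPU time equation concrete, here is a worked example. The numbers are purely hypothetical and do not come from the slides: a kernel that executes $10^9$ instructions at a CPI of 1.2 on a 4 ns clock, and a customized version in which new instructions cut the instruction count to $4 \times 10^8$ while the CPI rises to 1.5 on the same clock.

```latex
\[
T_{\text{base}} = 10^{9} \times 1.2 \times 4\,\text{ns} = 4.8\,\text{s},
\qquad
T_{\text{custom}} = 4 \times 10^{8} \times 1.5 \times 4\,\text{ns} = 2.4\,\text{s},
\qquad
\text{speedup} = \frac{T_{\text{base}}}{T_{\text{custom}}} = 2.
\]
```

Under these assumed values the lower instruction count more than pays for the higher CPI, which is the trade that instruction set customization aims for.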

Page 23:

Instruction Specialization

The instruction set determines the functions directly implemented in hardware and the operations which can be performed in parallel.

How to improve the instruction set?
• Operations which can frequently be scheduled concurrently should be coded in the same instruction.
• Operations which can often be chained should be coded in the same way:
  – Multiply-accumulation
  – Vector operations
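Multiply-accumulation is a typical chained pair: the product feeds the addition directly. The sketch below is a plain C software model, not code for any particular processor. The names `mac16` and `dot16` are hypothetical; `mac16` stands in for what a single fused instruction would do on a customized core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a hypothetical fused multiply-accumulate operation.
   On a customized processor this could be one instruction; here it is
   ordinary C so the example runs anywhere. */
static inline int32_t mac16(int32_t acc, int16_t a, int16_t b)
{
    return acc + (int32_t)a * (int32_t)b;
}

/* Dot product written as a chain of MAC operations.
   The caller must ensure the running sum fits in 32 bits. */
int32_t dot16(const int16_t *x, const int16_t *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));

    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc = mac16(acc, x[i], y[i]);
    }
    return acc;
}
```

Calling `dot16` on two 4-element arrays performs four `mac16` steps; without a fused operation, each step would be a separate multiply followed by a separate add.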

Page 24:

Instruction Set Customization

• Computationally demanding parts of applications run on special hardware.
• New instructions use the special hardware.

[Figure: a sequence of standard operations (XOR, MPY, LD, SHR, MOV, AND) in which a cluster of operations is replaced by a single CUSTOM instruction]

Page 25:

Automatically Collapsing Clusters of Instructions into New Ones

If the ad-hoc functional unit completes the job faster: GAIN

One ad-hoc complex operation instead of a long sequence of standard ones

Page 26:

Function Unit and Data Path Specialization

To reduce power consumption and increase performance:
• Word length adaptation
• Implementation of application-specific HW functions:
  – String manipulation
  – String matching
  – Pixel operation
  – Multiplication-accumulation

Special consideration: clock frequency
• It may be better to use a slower clock in embedded systems.

Page 27:

Customized Function Units

• Goal: support important computation subgraphs
• Add specialized units within the data path of the processor
• Exploits subgraph parallelism
• Allows natural data propagation

[Figure: a processor pipeline (Fetch, Issue, ALUs, WB) with a CCA placed alongside the ALUs; the CCA is drawn as rows of FUs fed by inputs IN 1 and IN 2]

Page 28:

Interconnect Specialization

Specialization can be done with respect to:
• Interconnect of functional modules
  – Reduced bus instead of a standard system bus, to save cost or power consumption
  – Dedicated connection between registers (accumulator) and memories, to increase parallelism
• Protocol used for the communication between components
  – Synchronous
  – Asynchronous
  – Semi-synchronous

Page 29:

Optimizing Power in ASIPs

Configurable processors have a deep influence on low-power design in two ways:
• Compared to hardwired logic, software-based design allows for more sophisticated algorithms and control of operating modes. In many applications, the software can be much smarter than custom RTL about when to run and how fast.
• ASIPs pack the same work into far fewer cycles than GPPs, allowing the SOC to run at a lower clock frequency. (How?)

Page 30:

Optimizing Power in ASIPs

$E = \alpha C V^2 n$

• $E$ is the energy use due to active switching in CMOS logic
• $C$ is the total capacitance of all the switched nodes in the circuit
• $V$ is the voltage
• $\alpha$ is the average fraction of circuit nodes switching between one and zero each cycle
• $n$ is the number of cycles required to execute the function

Page 31:

Optimizing Power (insight)

The impact of a good processor configuration is to sharply reduce ‘n’, while increasing ‘C’ only slightly relative to a baseline processor.

ASIPs can be quite smart about activating execution units only when necessary. The processor generator can determine the combinations of logic blocks that must be active at each stage of the pipeline and create logic for fine-granularity clock gating, thereby reducing ‘alpha’.
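The insight can be put into numbers with the switching-energy relation $E = \alpha C V^2 n$. The values below are hypothetical and chosen only for illustration: the extended processor needs one tenth of the cycles ($n' = n/10$), its switched capacitance grows by 20% ($C' = 1.2\,C$), and $\alpha$ and $V$ are unchanged.

```latex
\[
\frac{E'}{E}
  = \frac{\alpha\, C'\, V^{2}\, n'}{\alpha\, C\, V^{2}\, n}
  = \frac{C'}{C} \cdot \frac{n'}{n}
  = 1.2 \times 0.1
  = 0.12 .
\]
```

Under these assumptions the function costs about 12% of the baseline energy, and clock gating that lowers $\alpha$ would reduce it further.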

Page 32:

Tools?

Page 33:

Tensilica

Tensilica has two main product lines of 32-bit processor cores for SOC design (IP):
1. Diamond Standard processors (non-modifiable)
2. Xtensa processors (can be modified)

Tensilica also has several CAD tool flows to extend the instruction set:
• TIE Language
• XPRES Compiler

Page 34:

1. Tensilica Diamond Processor

Diamond Standard processors are a set of off-the-shelf synthesizable cores (fixed and not configurable), directly available from Tensilica and foundry partners, that range from area-efficient, low-power controllers to an audio processor, a high-performance DSP, and a video processor.

Diamond Standard processors come with a comprehensive software tool set:
• Compilers
• Assemblers
• Debuggers, …

Page 35:

2. Tensilica Xtensa Processor

Tensilica’s Xtensa processors are synthesizable processors that are configurable and extensible!

Page 36:

Xtensa Processors Architecture

The Xtensa Instruction Set Architecture (ISA) is a 32-bit RISC architecture featuring a compact instruction set optimized for embedded designs.

RISC?

• A small number of memory addressing modes
• Large uniform register files for computation operations
• Fixed-size instruction words
• Optimized pipelined architecture
• Simple and fixed instruction-field encoding
• Memory access via loads and stores of registers

Page 37:

Xtensa Processors Architecture

The architecture has:
• a 32-bit ALU;
• 16, 32 or 64 general-purpose physical registers;
• six special-purpose registers;
• Cache:
  – up to 32 KB and 1-, 2-, 3-, or 4-way set associative cache?
  – Replacement policy?
  – Write back vs. write through?

Page 38:

Xtensa Processors Architecture

The architecture has:
• a 32-bit ALU;
• 16, 32 or 64 general-purpose physical registers;
• six special-purpose registers;
• 5- or 7-stage pipelines:
  – 5-stage: power usage 47 µW/MHz @ 350 MHz
  – 7-stage: power usage 57 µW/MHz @ 400 MHz
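The per-MHz figures can be converted to absolute power at the quoted clock rates. This is a simple multiplication, and it assumes the µW/MHz figure applies at exactly the stated frequency:

```latex
\[
P_{5\text{-stage}} = 47\,\frac{\mu\text{W}}{\text{MHz}} \times 350\,\text{MHz} = 16\,450\,\mu\text{W} = 16.45\,\text{mW},
\qquad
P_{7\text{-stage}} = 57\,\frac{\mu\text{W}}{\text{MHz}} \times 400\,\text{MHz} = 22\,800\,\mu\text{W} = 22.8\,\text{mW}.
\]
```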

Page 39:

Tensilica Xtensa Architecture

Page 40:

Xtensa Processor Generator

The designer can select from a broad selection of predefined standard RISC microprocessor options and can add instructions and register extensions to the tailored processor.

Or the designer can use Tensilica's XPRES Compiler to automatically tailor the processor to optimize existing C/C++ code.

The Xtensa Processor Generator then creates the complete processor solution set: a pre-verified processor hardware description in source RTL (Verilog or VHDL), plus supporting hardware implementation methodology scripts.

This complete package includes software development tools, including commercial RTOS support, and comprehensive system modeling and co-verification support.

Page 41:

XPRES Compiler

Page 42:

XPRES Compiler

Page 43:

XPRES Compiler

Page 44:

Tensilica Instruction Extension (TIE)

TIE is a Verilog-like language used to describe desired custom instructions.

You can express the desired functionality in the Tensilica Instruction Extension (TIE) language.

TIE helps you get orders-of-magnitude performance increases out of your processor design.

1. Fusion,

2. SIMD (Single Instruction Multiple Data),

3. FLIX (Flexible Length Instruction Xtensions)

Page 45:

TIE Extensions

Page 46:

(I) Fusion

Page 47:

Effect of TIE Instructions

Page 48:

TIE Flow

Page 49:

Fusion Example

Page 50:

Exploiting Parallelism

Page 51:

Creating SIMD TIE Execution Units

Page 52:

FLIX Acceleration

Page 53:

Creating FLIX (VLIW) Acceleration

An Xtensa processor can become a multi-issue VLIW processor.

The Xtensa C/C++ compiler is capable of aggressively extracting instruction-level parallelism from the code. The compiler can schedule multiple operations in a single VLIW instruction.

By allowing two or three instructions to execute simultaneously, FLIX allows the processor to act as a 2- or 3-issue VLIW CPU, accelerating general-purpose code by 40-60%.

Page 54:

FLIX

Page 55:

Estimation (energy)

Page 56:

Example: MPEG Acceleration

One of the most difficult parts of encoding MPEG-4 video streams is motion estimation, which searches adjacent video frames for similar pixel blocks as part of the MPEG-4 compression algorithm.

The search algorithm’s inner loop contains a SAD (sum of absolute differences) algorithm consisting of:
• Subtraction
• Absolute value operation
• Addition of the resulting value to previously computed values

For a QCIF (quarter common intermediate format) frame, a 15-Hz frame rate and an exhaustive search motion estimation scheme, SAD operations require slightly more than 641 million operations/sec.

Page 57:

MPEG Acceleration

Combining all three SAD component operations (subtraction, absolute value, addition) into one operation that executes in one clock cycle, and executing 16 single-pixel SAD operations in one SIMD SAD instruction during the same clock cycle, reduces the count from 641 million instructions/sec to 14 million instructions/sec, a 98% reduction.
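The quoted reduction can be checked against the slide's own numbers. The ideal factor shown last is an assumption for comparison only: it supposes that every one of the original operations is one of the three SAD components and that all 16 SIMD lanes are always full.

```latex
\[
1 - \frac{14}{641} \approx 0.978 \approx 98\%,
\qquad
\frac{641}{14} \approx 45.8,
\qquad
\underbrace{3}_{\text{fused ops}} \times \underbrace{16}_{\text{SIMD lanes}} = 48 .
\]
```

The observed factor of roughly 46 sits just under the ideal 48, which is consistent with the fusion-plus-SIMD explanation given on the slide.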

Page 58:

MPEG Acceleration

The full MPEG-4 decoder adds approximately 100,000 gates to the base processor and implements a 2-way (coder and decoder) QCIF video codec that operates at 15 frames/sec.

When instructions are added to accelerate all of these MPEG-4 decoding tasks, creating an MPEG-4 SIMD engine within the tailored processor, the results can be quite surprising.

The resulting SIMD engine drops the number of cycles required to decode the MPEG-4 video clips from billions to millions, and the required processor operating frequency by roughly 30x to around 10 MHz (power dissipation!!)

Page 59:

How Xtensa Compares

Page 60:

Reconfigurable Instruction Set Processors

Page 61:

Two roads to customization

Augment GPPs with programmable logic:
• Couple a standard processor (ARM, MIPS) with an FPGA fabric
• Fixed processor instruction set
• FPGA implements custom instructions

Implement them in FPGAs:
• Customize instructions at compile time or at run time

Page 62:

Reconfigurable Instruction Set Processors

• Duplicated instruction decode logic (2 symmetrical data channels)
• Duplicated commonly used function units (ALU and shifter)
• All other function units are shared (DSP operations, memory handler)
• A tightly coupled pipelined configurable gate array

Page 63:

Dynamic Instruction Set Extension (1)

Original code:

    for (i = 0; i < 16; i++) {
        temp = abs(v1[i] - v2[i]);
        out = out + temp;
    }

Optimized XiRisc code:

    for (i = 0; i < 16; i++) {
        pgaop(out, v1[i], v2[i]);
    }

[Figure: XiRisc datapath with Register File, ALUs & Multiplier, Memory Unit, and PiCoGA; the PiCoGA is configured with A-B and B-A subtractors, a MUX, and an Accumulator]
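The two loops on this slide can be compared in plain C. The sketch below is a software model only. `pgaop_model` is a hypothetical stand-in for the `pgaop` custom operation, with semantics inferred from the A-B / B-A / MUX / Accumulator figure; the function names, the pointer argument, and the choice of 8-bit inputs are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Original loop from the slide: subtract, take abs, accumulate. */
int32_t sad16_original(const uint8_t v1[16], const uint8_t v2[16])
{
    int32_t out = 0;
    for (int i = 0; i < 16; i++) {
        int32_t temp = abs((int)v1[i] - (int)v2[i]);
        out = out + temp;
    }
    return out;
}

/* Software stand-in for the custom operation (hypothetical semantics):
   compute both differences, select the non-negative one, accumulate. */
static inline void pgaop_model(int32_t *out, uint8_t a, uint8_t b)
{
    assert(out != NULL);
    int32_t a_minus_b = (int32_t)a - (int32_t)b;                  /* A-B */
    int32_t b_minus_a = (int32_t)b - (int32_t)a;                  /* B-A */
    int32_t abs_diff = (a_minus_b >= 0) ? a_minus_b : b_minus_a;  /* MUX */
    *out += abs_diff;                                             /* Accumulator */
}

/* Optimized loop from the slide, using the model in place of pgaop. */
int32_t sad16_pgaop(const uint8_t v1[16], const uint8_t v2[16])
{
    int32_t out = 0;
    for (int i = 0; i < 16; i++) {
        pgaop_model(&out, v1[i], v2[i]);
    }
    return out;
}
```

Both functions return the same sum for any pair of 16-element inputs. The difference on the real hardware would be where the work happens: the three-step body of the original loop becomes a single operation per iteration.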

Page 64:

Summary

Configurable and extensible (tailorable) processor cores are a combination of hardware and software IP that give system developers the ability to tailor processors for better performance in specific applications.

The main difference between GPPs and ASIPs is specialization. It is important to note that specialization must not compromise flexibility!

Advantages:
• Faster, more power efficient, less silicon area
• No other company will have your version of that task-specific processor.
• No one will have the matching compiler and software tool chain.

Page 65:

Conclusion

ASIPs are related to the hardware/software co-design methodology, since a GPP is involved along with hardware accelerators in the form of specialized functional units.

Tensilica provides all the necessary tools to automatically create Application Specific Instruction Set Processors in minimum time.

The designer can rely on the TIE language to manually extend the instruction set of the newly created processor.

Another option is to rely on the Tensilica XPRES compiler to automatically create the processor and all the necessary software development tools such as compilers, debuggers, …

The designer can extend the capabilities of the processor by changing the cache, ports, queues, register files, functional units, …

It is worth using the Tensilica tools to perform some type of design exploration for your application before you attempt to custom build hardware accelerators.
