signal processing catches the multi-core wave

Freescale Semiconductor Confidential and Proprietary Information. Freescale™ and the Freescale logo are trademarks of Freescale Semiconductor, Inc. All other product or service names are the property of their respective owners. © Freescale Semiconductor, Inc. 2005. Dan Bouvier, Director, Advanced Processor Architecture, Freescale Semiconductor. Signal Processing Catches the Multi-core Wave: Which Architecture is Right for Your Application? GSPx 2005 Conference, 25 October 2005


Page 1: Signal Processing Catches the Multi-core Wave

Dan Bouvier
Director, Advanced Processor Architecture
Freescale Semiconductor

Signal Processing Catches the Multi-core Wave: Which Architecture is Right for Your Application?

GSPx 2005 Conference

25 October 2005

Page 2: Signal Processing Catches the Multi-core Wave

Prologue

• Decisions taken for scheduling and mapping at a high level of abstraction have a major impact on the global design flow

“Only positive consequences encourage good future performances” John Harvey-Jones

• Completing tasks in the least time possible is highly desirable

• Finally – it is well known that nine women can deliver a child in one month…

Page 3: Signal Processing Catches the Multi-core Wave

Evolution of this presentation

• The underlying motivations to move to multi-core

• A mathematician’s view of signal processing and applicability to parallelism

• Some attributes of signal processing applications

• Applying applications to signal processors

• Futures: longer-term processor evolution

Page 4: Signal Processing Catches the Multi-core Wave

An appetite for more

• By the year 2010, the average person will encounter more than 300 embedded processors every day. -- Semico Research

• Applications of all forms continue to demand more computational performance

• Traditional means for scaling performance have run their course

• We are on the cusp of an exciting transition in how we meet the performance demands

Page 5: Signal Processing Catches the Multi-core Wave

Is the system now the chip?

“It may prove to be more economical to build large systems out of smaller functions, which are separately packaged and interconnected.”

“The availability of large functions, combined with functional design and construction, should allow the manufacturer of large systems to design and construct a considerable variety of equipment both rapidly and economically.”

April 19, 1965, Gordon Moore, 35th Anniversary Issue of Electronics magazine

“Clearly, we will be able to build such component-crammed equipment.”

Page 6: Signal Processing Catches the Multi-core Wave

Performance scaling and technology challenges

• Clock rate improvements slowing: from 40%/year to 12%/year

! Pipelining has increased by a factor of 4 in the last decade – not possible in the next decade

[Figure: clock-rate scaling chart. Labels: Semiconductor Technology, Pipelining, Gap, 8-10 FO4 Pipeline, Historical, Microarchitecture, Technology. Source: UT Dept. Computer Science]

Page 7: Signal Processing Catches the Multi-core Wave

How did we get here? – the physics

[Figure: ITRS Historical Technology Trends. Logarithmic vertical axis from 0.1 to 100; horizontal axis: Technology Node (250, 180, 130, 90, 65, 45, 32). Series: Transistor Gate Delay]

Page 8: Signal Processing Catches the Multi-core Wave

How did we get here? – the physics

[Figure: ITRS Historical Technology Trends. Logarithmic vertical axis from 0.1 to 100; horizontal axis: Technology Node (250, 180, 130, 90, 65, 45, 32). Series: Transistor Gate Delay, Local Interconnect, Global Interconnect with Repeaters, Global Interconnect w/o Repeaters]

Page 9: Signal Processing Catches the Multi-core Wave

How much logic can we touch in 1 clock cycle?

• Transistors getting faster

• But wire delays begin to dominate

• Historical Solution: Divide and conquer with longer pipeline (less work per clock)

[Figure: two panels labeled "At 1GHz" and "At 6GHz"]

Page 10: Signal Processing Catches the Multi-core Wave

Time for a new approach

• The increase in performance is roughly proportional to the square root of the increase in complexity.

! Doubling the logic of a processor core delivers approximately 40 percent more performance

“There ain’t no such thing as a free lunch.” R.A. Heinlein, The Moon is a Harsh Mistress

[Figure labels: Speculation, Multi-Issue, VLIW, Branch Prediction, Pipelining, Out of Order Execution, Pollack’s Rule]
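The 40 percent figure follows directly from the square-root relationship. A short worked form, where P(C) is a notation introduced here for the performance of a core with complexity C (it is not taken from the slide):

```latex
\[
P(C) \propto \sqrt{C}
\quad\Longrightarrow\quad
\frac{P(2C)}{P(C)} = \sqrt{2} \approx 1.41
\]
```

For comparison, spending the same doubled logic budget on a second identical core gives at most 2 P(C), and only if the workload splits perfectly across both cores; that upper bound is an idealizing assumption, not a claim from the slide.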

Page 11: Signal Processing Catches the Multi-core Wave

Power Trends with the PowerPC® Processor Family

Hitting the application power envelope wall

[Figure: Frequency (MHz) and Frequency/w for the PowerPC processors 603, 750, 7410, 7455, 7457, 7447A, and 7448]

Page 12: Signal Processing Catches the Multi-core Wave

Are multi-core processors the answer?

• Multi-core has better performance per watt

• Options for higher performance
! Double the core speed
! Double the cores

• Both can have performance close to that of 2x a single core at F1

[Figure: power and performance bars for three configurations: Single Core at 2x F1, Single Core at F1, and Dual Core at F1; performance over time for 2x core and 2x frequency]
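One way to see why the two options differ in performance per watt is the usual CMOS approximation that dynamic power scales with the number of cores, the square of the supply voltage, and the clock frequency. The sketch below is illustrative only: the function names are hypothetical, and the assumption that doubling the frequency needs a higher supply voltage (1.3x in the usage note) is an invented example value, not a figure from the slides.

```c
/* Relative dynamic power under the approximation P ~ cores * V^2 * f.
 * All inputs are relative to one core at frequency F1 and nominal
 * voltage, so the baseline (1, 1.0, 1.0) returns 1.0. */
double relative_dynamic_power(int cores, double rel_voltage, double rel_freq)
{
    return cores * rel_voltage * rel_voltage * rel_freq;
}

/* Performance per watt from relative performance and relative power. */
double perf_per_watt(double rel_perf, double rel_power)
{
    return rel_perf / rel_power;
}
```

With a hypothetical 1.3x voltage for the 2x-frequency part, `relative_dynamic_power(1, 1.3, 2.0)` exceeds `relative_dynamic_power(2, 1.0, 1.0)`, so for the same 2x performance target the dual-core option comes out ahead on performance per watt under these assumptions.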

Page 13: Signal Processing Catches the Multi-core Wave

What forms might these processors take?

• General-purpose processors with Vector SIMD engines

• Multi-core Digital Signal Processors

• Hybrid processors – GPP + DSP

Page 14: Signal Processing Catches the Multi-core Wave

Mathematician’s View of Signal Processing and Applicability to Parallelism

Page 15: Signal Processing Catches the Multi-core Wave

Parallelism

• Sequential algorithms historically preceded parallel algorithms

• In simple terms, parallelism can be viewed as…
! Algorithms that can be represented as a directed graph
> nodes representing operations
> edges representing data flow
! If the graph contains layers of parallel nodes
> it can be executed in parallel
> it can be mapped to a parallel platform

• Graph representation can help to determine the optimal depth of the algorithm
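The graph view can be made concrete with a pairwise (tree) summation, sketched below. Each pass of the outer loop corresponds to one layer of the graph, and the additions inside a layer do not depend on each other. The function name `tree_sum` is hypothetical and the code is a minimal illustration, not something taken from the presentation.

```c
#include <stddef.h>

/* Sums n values by pairwise (tree) reduction, in place.
 * Each pass of the outer loop is one layer of the graph: the additions
 * inside a layer are independent and could run on separate PEs.
 * Returns the number of layers (the depth of the graph) and stores the
 * total in *sum. The input array is overwritten with partial sums. */
unsigned tree_sum(int *v, size_t n, int *sum)
{
    unsigned depth = 0;
    for (size_t stride = 1; stride < n; stride *= 2) {
        for (size_t i = 0; i + stride < n; i += 2 * stride) {
            v[i] += v[i + stride];   /* independent within this layer */
        }
        depth++;
    }
    *sum = (n > 0) ? v[0] : 0;
    return depth;
}
```

For eight inputs the function reports a depth of three layers, compared with seven dependent additions in a purely sequential chain.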

Page 16: Signal Processing Catches the Multi-core Wave

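Parallel code example: inner loop of the checksum

[Figure: two checksum data flows over operands A to H. Scalar: a chain of seven + operations. Vector: A to H held as a vector, four + operations in parallel producing a Partial CheckSum, with the final combination done Once per Function. + is Add with Carry]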
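A portable sketch of the idea behind the checksum figure follows. It assumes a 16-bit ones'-complement checksum (the Internet-checksum style of "add with carry"), which the slide does not specify, and it models the vector lanes with a plain array rather than real SIMD instructions. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Add with carry: 16-bit ones'-complement addition (end-around carry). */
static uint16_t add_with_carry(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xFFFFu) + (s >> 16));
}

/* Scalar form: one long dependency chain of adds. */
uint16_t checksum_scalar(const uint16_t *w, size_t n)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = add_with_carry(sum, w[i]);
    return sum;
}

/* Vector-style form: four independent partial checksums in the inner
 * loop, combined once per function call. n is assumed to be a multiple
 * of 4; any trailing words are ignored in this sketch. */
uint16_t checksum_4lane(const uint16_t *w, size_t n)
{
    uint16_t lane[4] = {0, 0, 0, 0};
    for (size_t i = 0; i + 3 < n; i += 4)
        for (int k = 0; k < 4; k++)          /* would map to one SIMD add */
            lane[k] = add_with_carry(lane[k], w[i + k]);
    return add_with_carry(add_with_carry(lane[0], lane[1]),
                          add_with_carry(lane[2], lane[3]));
}
```

Because ones'-complement addition is associative and commutative, both forms produce the same checksum; the four-lane form simply shortens the dependency chain in the inner loop.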

Page 17: Signal Processing Catches the Multi-core Wave

Math background

• Many signal transforms can be represented by a transformation matrix $A$ which is multiplied by an input data vector $X$ of degree $n$ to produce the desired output vector $Y = AX$, or formally:

$$y_i = \sum_{j=1}^{n} a_{ij} x_j$$

• If we assume that we also have $n^2$ processing elements (PEs)

• Then at best we would need $O(\log_2^2 n)$ steps on possible $O(n^4)$ PEs

• Many transforms also use complex math
! which could be translated to use real numbers, by replacing every complex number $a + bj$ by the matrix $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$
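The complex-to-real translation can be checked with a few lines of C. The sketch assumes the standard 2x2 real representation of a + bj, with rows (a, -b) and (b, a); the type and function names are hypothetical.

```c
/* A complex number a + bj represented as the real 2x2 matrix
 *   | a  -b |
 *   | b   a |
 * stored row-major. */
typedef struct { double m[2][2]; } cmat;

cmat cmat_from(double a, double b)
{
    cmat r = {{{a, -b}, {b, a}}};
    return r;
}

/* Plain real 2x2 matrix product; for cmat values built by cmat_from it
 * reproduces complex multiplication. */
cmat cmat_mul(cmat x, cmat y)
{
    cmat r;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            r.m[i][j] = x.m[i][0] * y.m[0][j] + x.m[i][1] * y.m[1][j];
    return r;
}
```

Multiplying the matrices for 1 + 2j and 3 + 4j yields the matrix for -5 + 10j, the same result as the complex product, so a complex transform becomes a real matrix operation of twice the dimension.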

Page 18: Signal Processing Catches the Multi-core Wave

Math background: matrix multiply example

Generic matrix multiply

for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        for (k = 0; k < N; k++) {
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }
    }
}

Page 19: Signal Processing Catches the Multi-core Wave

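Math background: matrix multiply example

$$
\begin{pmatrix}
C_{00} & C_{01} & C_{02} & C_{03} \\
C_{10} & C_{11} & C_{12} & C_{13} \\
C_{20} & C_{21} & C_{22} & C_{23} \\
C_{30} & C_{31} & C_{32} & C_{33}
\end{pmatrix}
=
\begin{pmatrix}
A_{00} & A_{01} & A_{02} & A_{03} \\
A_{10} & A_{11} & A_{12} & A_{13} \\
A_{20} & A_{21} & A_{22} & A_{23} \\
A_{30} & A_{31} & A_{32} & A_{33}
\end{pmatrix}
\begin{pmatrix}
B_{00} & B_{01} & B_{02} & B_{03} \\
B_{10} & B_{11} & B_{12} & B_{13} \\
B_{20} & B_{21} & B_{22} & B_{23} \\
B_{30} & B_{31} & B_{32} & B_{33}
\end{pmatrix}
$$

$$
\begin{aligned}
C_{00} &= A_{00}B_{00} + A_{01}B_{10} + A_{02}B_{20} + A_{03}B_{30} \\
C_{01} &= A_{00}B_{01} + A_{01}B_{11} + A_{02}B_{21} + A_{03}B_{31} \\
C_{02} &= A_{00}B_{02} + A_{01}B_{12} + A_{02}B_{22} + A_{03}B_{32} \\
C_{03} &= A_{00}B_{03} + A_{01}B_{13} + A_{02}B_{23} + A_{03}B_{33} \\
&\;\;\vdots \\
C_{33} &= A_{30}B_{03} + A_{31}B_{13} + A_{32}B_{23} + A_{33}B_{33}
\end{aligned}
$$

Note: All source elements are available for computation at start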

Page 20: Signal Processing Catches the Multi-core Wave

Math background: matrix multiply example

• Same algorithm could be presented in several formats

• The unrolled version in this case is directly “mappable” to SIMD (AltiVec™ Multiply-Sum instruction)

vc[0] = va[0]*vb[0] + va[1]*vb[4] + va[2]*vb[8] + va[3]*vb[12];
vc[1] = va[0]*vb[1] + va[1]*vb[5] + va[2]*vb[9] + va[3]*vb[13];
vc[2] = va[0]*vb[2] + va[1]*vb[6] + va[2]*vb[10] + va[3]*vb[14];
...
vc[14] = va[12]*vb[2] + va[13]*vb[6] + va[14]*vb[10] + va[15]*vb[14];
vc[15] = va[12]*vb[3] + va[13]*vb[7] + va[14]*vb[11] + va[15]*vb[15];

for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        for (k = 0; k < N; k++) {
            vc[i][j] = vc[i][j] + va[i][k] * vb[k][j];
        }
    }
}

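$$
\begin{aligned}
C_{00} &= A_{00}B_{00} + A_{01}B_{10} + A_{02}B_{20} + A_{03}B_{30} \\
C_{01} &= A_{00}B_{01} + A_{01}B_{11} + A_{02}B_{21} + A_{03}B_{31} \\
C_{02} &= A_{00}B_{02} + A_{01}B_{12} + A_{02}B_{22} + A_{03}B_{32} \\
C_{03} &= A_{00}B_{03} + A_{01}B_{13} + A_{02}B_{23} + A_{03}B_{33} \\
&\;\;\vdots \\
C_{33} &= A_{30}B_{03} + A_{31}B_{13} + A_{32}B_{23} + A_{33}B_{33}
\end{aligned}
$$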
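To confirm that the loop form and the unrolled form describe the same computation, both can be written against flat, row-major 4x4 arrays, which is the indexing the unrolled `vc[...]` lines use. This is a minimal sketch with hypothetical function names and `int` elements chosen for simplicity.

```c
#define N 4

/* Loop form on flat, row-major 4x4 arrays: c = a * b. */
void matmul_loop(const int *a, const int *b, int *c)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int acc = 0;
            for (int k = 0; k < N; k++)
                acc += a[i * N + k] * b[k * N + j];
            c[i * N + j] = acc;
        }
}

/* Same computation with the k loop unrolled; each statement is a sum
 * of four products, the shape a multiply-sum instruction consumes. */
void matmul_unrolled(const int *a, const int *b, int *c)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c[i * N + j] = a[i * N + 0] * b[0 * N + j]
                         + a[i * N + 1] * b[1 * N + j]
                         + a[i * N + 2] * b[2 * N + j]
                         + a[i * N + 3] * b[3 * N + j];
}
```

With i = 3 and j = 2 the unrolled statement reads `a[12]*b[2] + a[13]*b[6] + a[14]*b[10] + a[15]*b[14]`, which matches the slide's expression for `vc[14]`.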

Page 21: Signal Processing Catches the Multi-core Wave

Math background: matrix multiply example

• As a result, for a 4x4 matrix of bytes the speedup can reach 400%

vresult0 = vec_msum(va_temp, vb_temp, vzero);

The four word elements of vresult0:

va[0]*vb[0] + va[1]*vb[4] + va[2]*vb[8]  + va[3]*vb[12]
va[0]*vb[1] + va[1]*vb[5] + va[2]*vb[9]  + va[3]*vb[13]
va[0]*vb[2] + va[1]*vb[6] + va[2]*vb[10] + va[3]*vb[14]
va[0]*vb[3] + va[1]*vb[7] + va[2]*vb[11] + va[3]*vb[15]

[Figure: vec_msum data flow across byte lanes 0 to 15, with rows labeled A, B, Prod, C, and D]
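For readers without an AltiVec toolchain, the behavior of a multiply-sum on unsigned bytes can be emulated in portable C: each 32-bit result lane receives the sum of four byte products plus the matching accumulator lane. The sketch below is a scalar emulation, not AltiVec code, and the way `va_temp` and `vb_temp` are filled here (row of A replicated, B read column by column) is an assumption about how the slide prepares its operands. The function names are hypothetical.

```c
#include <stdint.h>

/* Scalar emulation of a multiply-sum on unsigned bytes: out[lane] =
 * acc[lane] + sum of the four byte products belonging to that lane. */
void msum_u8_emulated(const uint8_t a[16], const uint8_t b[16],
                      const uint32_t acc[4], uint32_t out[4])
{
    for (int lane = 0; lane < 4; lane++) {
        uint32_t sum = acc[lane];
        for (int k = 0; k < 4; k++)
            sum += (uint32_t)a[4 * lane + k] * b[4 * lane + k];
        out[lane] = sum;
    }
}

/* One row of c = a * b for row-major 4x4 byte matrices, computed with a
 * single multiply-sum. The row of a is replicated into every lane and
 * b is gathered column by column (assumed operand layout). */
void matmul_row_msum(const uint8_t a[16], const uint8_t b[16],
                     int row, uint32_t out[4])
{
    uint8_t va_temp[16], vb_temp[16];
    const uint32_t vzero[4] = {0, 0, 0, 0};
    for (int j = 0; j < 4; j++)
        for (int k = 0; k < 4; k++) {
            va_temp[4 * j + k] = a[4 * row + k];
            vb_temp[4 * j + k] = b[4 * k + j];
        }
    msum_u8_emulated(va_temp, vb_temp, vzero, out);
}
```

One call produces a full row of four results, which is where the four-fold reduction in multiply-accumulate statements comes from; the cost of arranging the operands is not counted in this sketch.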

Page 22: Signal Processing Catches the Multi-core Wave

Some Attributes of Signal Processing Applications

Page 23: Signal Processing Catches the Multi-core Wave

Math background

• If serial form of algorithm was computationally “stable,”

it will remain as such in parallel form

! “internal” parallelism

• Majority of traditional DSP algorithms fall under this category

! Fast Fourier transform (FFT)

! Discrete Fourier transform (DFT)

! Discrete cosine transforms (DCTs)

! Walsh-Hadamard transform (WHT)

! Various filters

! ECC codes (Viterbi, convolutional, CRC)

! and many others

Page 24: Signal Processing Catches the Multi-core Wave

Signal processing code types

• Two major classes
! Signal processing
> filtering
> transformation (such as FFT, DCT, etc.)
> convolution of signals
> correlation
! Baseband processing
> channel coding
– convolutional, turbo, Reed-Solomon, LDPC (Low-Density Parity-Check code)
> decoders of the above codes, such as a Viterbi decoder for convolutional codes
> Cyclic Redundancy Code (CRC)
> source coding, such as voice compression and image (still or video) compression

• In addition, there are many other signal processing types
! Example: modulation

Page 25: Signal Processing Catches the Multi-core Wave

Applying Applications to Signal Processors

Page 26: Signal Processing Catches the Multi-core Wave

Multicore DSPs in the converged network

[Figure: converged network diagram marking where infrastructure DSPs are used. Network domains: Enterprise Network, SOHO/Home Network, Access Network, ISP Network, Broadband Network, 3G Network, PSTN Switches, Internet. Elements shown: Packet Trunk Bypass, Enterprise IP-PABX, IP-Phone, TDM-to-IP/ATM Gateway, Trunking Gateways, MSC/RNC, Fax, Server, Workstation, Printer, ISP-RAC, ISP WEB Servers, DSLAM, DSL Router, IP-TV, ATM Switch, CATV CMTS, Cable Modem, Media GW, TRAU Gateway, Video Transcoding Gateway, Node-B, 802.11 AP, 802.11 Notebook, Video-Phone, Content Server, Packet Video Streaming Server]

Page 27: Signal Processing Catches the Multi-core Wave

The ways…

• Many tradeoffs in choosing processors for the application
! Performance target
! Workload mix
! Power consumption
! Code development and portability
! Memory access profile
! Application stability

[Figure labels: DSP, GPP]

Page 28: Signal Processing Catches the Multi-core Wave

Multi-Core DSPs are here now - MSC8122

• Four 500MHz SC140 DSP cores

• 1.4MByte internal SRAM

• 10/100BT Ethernet interface support

• 4 TDM interfaces

• DSI port (32/64)

• 16-channel DMA engine

• In production

“Large System built from smaller functions”

Page 29: Signal Processing Catches the Multi-core Wave

Multi-core programming considerations

• How should memory be partitioned?

! Multi-level memories

! Do I want to use the instruction cache?

• What do I need to do to allow for multi-threading?

! Re-entrancy

• How can device resources be safely shared by the cores?

• How can the cores and other hosts communicate efficiently with each other?

• How do you partition an application?
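The re-entrancy consideration can be illustrated with a pair of small routines. This is a generic C sketch with hypothetical names, not code for any particular Freescale device: the first routine keeps its state in static storage and is therefore unsafe to share between cores or tasks, while the second takes all of its state from the caller.

```c
/* Not re-entrant: the running total lives in static storage, so two
 * cores or tasks calling this concurrently would interfere with each
 * other's state. */
int accumulate_shared(int sample)
{
    static int total = 0;
    total += sample;
    return total;
}

/* Re-entrant: all state is supplied by the caller, so each core or
 * task can keep its own context (for example in its private memory). */
typedef struct { int total; } acc_ctx;

int accumulate_reentrant(acc_ctx *ctx, int sample)
{
    ctx->total += sample;
    return ctx->total;
}
```

A shared subroutine written in the second style can be placed once in shared memory and called from every core, with each core passing a pointer to its own `acc_ctx`.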

Page 30: Signal Processing Catches the Multi-core Wave

Basic elements for multi-core DSP usage

[Figure: elements necessary for multi-core DSP. Legend: realized by hardware, realized by software]

• Core-to-core Interrupt (VIRQ)
• DMA
• M2 Shared Memory
• Binary Semaphore
• Shared Interrupt Source
• Multi-core Link managed by one project
• Instruction Cache (ICache)
• Write Buffer (WB)
• Shared Subroutine
• RTOS
• Multi-task programming
• C Runtime Library
• Memory Utilization

Inter-Core Synchronization and Communication: Three major elements are Binary Semaphore, Core-to-core Interrupt and DMA.

Page 31: Signal Processing Catches the Multi-core Wave

Inter-Core Synchronization and Communication

[Figure: the Page 30 diagram of elements necessary for multi-core DSP, repeated under this heading. Elements: Shared Interrupt Source, Core-to-core Interrupt (VIRQ), DMA, Binary Semaphore, M2 Shared Memory, Multi-core Link managed by one project, Instruction Cache (ICache), Write Buffer (WB), Shared Subroutine, RTOS, Multi-task programming, C Runtime Library, Memory Utilization. Callout: Three major elements are Binary Semaphore, Core-to-core Interrupt and DMA. Legend: realized by hardware, realized by software]

Page 32: Signal Processing Catches the Multi-core Wave

Hardware Support for Efficient Application

[Figure: the Page 30 diagram of elements necessary for multi-core DSP, repeated under this heading. Elements: Core-to-core Interrupt (VIRQ), DMA, Binary Semaphore, Shared Interrupt Source, M2 Shared Memory, Multi-core Link managed by one project, Instruction Cache (ICache), Write Buffer (WB), Shared Subroutine, RTOS, Multi-task programming, C Runtime Library, Memory Utilization. Callout: Three major elements are Binary Semaphore, Core-to-core Interrupt and DMA. Legend: realized by hardware, realized by software]

Page 33: Signal Processing Catches the Multi-core Wave

Software Programming for Multi-core DSP

[Figure: the Page 30 diagram of elements necessary for multi-core DSP, repeated under this heading. Elements: Shared Interrupt Source, Instruction Cache (ICache), Write Buffer (WB), Core-to-core Interrupt (VIRQ), DMA, Binary Semaphore, M2 Shared Memory, Multi-core Link managed by one project, Shared Subroutine, RTOS, Multi-task programming, C Runtime Library, Memory Utilization. Callout: Three major elements are Binary Semaphore, Core-to-core Interrupt and DMA. Legend: realized by hardware, realized by software]

Page 34: Signal Processing Catches the Multi-core Wave

Parallel computation

• Parallel computation is a very vague and general topic…

• Flynn’s classification

! SISD, SIMD, MISD, MIMD

• SIMD and MIMD are both in essence parallel platforms, but…

• SIMD - Single Instruction Multiple Data

! Implemented as one logic control unit, but multiple PEs operating on multiple data streams

• MIMD could vary greatly by degree of integration:

! Multi-node MIMD system

! Multi-processor system

! Multi-core device

Page 35: Signal Processing Catches the Multi-core Wave

Parallel computation

• A MIMD system needs a model for partitioning workloads onto it

• The Parallel Random Access Machine (PRAM) model is widely used

• The operation of a synchronous PRAM can result in simultaneous access to the same location in shared memory

• Synchronization is usually achieved through a system of locks or semaphores
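As a software illustration of the lock-or-semaphore approach, a binary semaphore can be built on the C11 atomic test-and-set flag. This is a generic sketch with hypothetical names; it does not model the hardware semaphores of any specific multi-core DSP, and it busy-waits rather than blocking.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A binary semaphore built on a C11 atomic test-and-set flag. */
typedef struct { atomic_flag taken; } binary_sem;

#define BINARY_SEM_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns true if the caller now owns the semaphore. */
bool binary_sem_try_acquire(binary_sem *s)
{
    return !atomic_flag_test_and_set_explicit(&s->taken, memory_order_acquire);
}

/* Spin until the semaphore is obtained. */
void binary_sem_acquire(binary_sem *s)
{
    while (!binary_sem_try_acquire(s)) {
        /* busy-wait; a real system might yield or sleep here */
    }
}

/* Release the semaphore so another core or task can take it. */
void binary_sem_release(binary_sem *s)
{
    atomic_flag_clear_explicit(&s->taken, memory_order_release);
}
```

A core would wrap each access to the contended shared-memory location between `binary_sem_acquire` and `binary_sem_release`, so that simultaneous accesses are serialized.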

Page 36: Signal Processing Catches the Multi-core Wave

Strategies for multithreading synchronization

• Binary semaphore - mutual exclusion for shared resources
! Guaranteed exclusive access to avoid corruption by a preemptive task

• Ideal mapping via lock-free and wait-free algorithms
! Allows multiple threads to concurrently read and write shared data
! Every step taken brings progress to the system
! No synchronization primitives, such as mutexes or semaphores, can be involved
! "Wait-free"
> A thread can complete any operation in a finite number of steps
> Regardless of the actions of other threads
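A small C11 example may help separate the two terms. The compare-and-exchange loop below is lock-free: a failed exchange means some other thread succeeded, so the system as a whole advances, yet a single thread could in principle retry indefinitely, so the loop is not wait-free. The function names are hypothetical, and whether the single fetch-and-add call is wait-free depends on how the target hardware implements it.

```c
#include <stdatomic.h>

/* Lock-free "store maximum": many threads may call this concurrently
 * without a mutex or semaphore. */
void atomic_store_max(atomic_int *target, int value)
{
    int seen = atomic_load(target);
    while (seen < value &&
           !atomic_compare_exchange_weak(target, &seen, value)) {
        /* seen was refreshed by the failed exchange; retry */
    }
}

/* Event counter using one atomic read-modify-write; returns the new
 * count. No retry loop appears in the source code, but the hardware
 * may still implement the operation with one internally. */
int atomic_count_event(atomic_int *counter)
{
    return atomic_fetch_add(counter, 1) + 1;
}
```

Both routines let several threads update shared data without ever holding a lock, which is the property the slide describes as every step bringing progress to the system.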

Page 37: Signal Processing Catches the Multi-core Wave

Processing convergence

• General-purpose processor

• Exploit data-level parallelism

• Add a vector SIMD unit

• Leverage a unified programming model

[Figure: an instruction stream is dispatched to the IU (32 bits), FPU (64 bits), and Vector Unit (128 bits), all above Cache / Memory; execution flow shown alongside. Vector unit detail: Vector Register File (VR0 to VR31) feeding the Vector ALU and Vector Permute over 128-bit paths]

Page 38: Signal Processing Catches the Multi-core Wave

Increased single thread performance with SIMD MPC7447A

[Figure: MPC7447A AltiVec™ Performance Improvement. Horizontal bar chart with a logarithmic X Factor axis from 1 to 1000; callouts 12.09X and 2.89X. Benchmarks:

OSPF/Dijkstra
Route Lookup/Patricia
Packet Flow - 512 kbytes
Packet Flow - 1 Mbyte
Packet Flow - 2 Mbytes
Autocorrelation - Data1 (pulse)
Autocorrelation - Data2 (sine)
Autocorrelation - Data3 (speech)
Convolutional Encoder - Data1 (xk5r2dt)
Convolutional Encoder - Data2 (xk4r2dt)
Convolutional Encoder - Data3 (xk3r2dt)
Fixed-point Bit Allocation - Data2 (typ)
Fixed-point Bit Allocation - Data3 (step)
Fixed-point Bit Allocation - Data6 (pent)
Fixed-point Complex FFT - Data1 (pulse)
Fixed-point Complex FFT - Data2 (spn)
Fixed-point Complex FFT - Data3 (sine)
Viterbi GSM Decoder - Data1 (get)
Viterbi GSM Decoder - Data2 (toggle)
Viterbi GSM Decoder - Data3 (ones)
Viterbi GSM Decoder - Data4 (zeros)]

Page 39: Signal Processing Catches the Multi-core Wave

Big performance gain with minimal power impact

[Figure: MPC7447AL AltiVec Power Measurements @ 1.4GHz/1.3V. Average Measured Typical Power in watts (axis 0 to 22) for benchmarks 1 to 20, without AltiVec versus with AltiVec]

Page 40: Signal Processing Catches the Multi-core Wave

Higher numeric performance power density achieved

Dual-Core PowerPC® Processor – MPC8641D

• Multi-core performance scaling
! Dual 1.67GHz e600 PowerPC® processor cores

• Data-level parallelism
! Two 128b AltiVec™ SIMD Engines

• System-level parallelism
! Serial RapidIO 1x/4x

Page 41: Signal Processing Catches the Multi-core Wave

Challenges

Inertia:

More than 3 decades of exploiting Instruction Level Parallelism (ILP)

! Sequential thinking

! Programming style

• Need development environments to ease transition

! New compilation strategies, libraries, language support

! Perhaps a portable SIMD API – for common library development

! New analysis tools

! Methods to make code portable

! Benchmark retooling

Page 42: Signal Processing Catches the Multi-core Wave

Futures: How Will Processors Evolve Longer Term? The Challenges in Long-Term Processor Evolution

Page 43: Signal Processing Catches the Multi-core Wave

Digital signal processor lineage

• 1st Generation: Specialized Hardware for Accelerating Multiplications

• Increased Parallel Operations

• Single-Issue Complex-Instruction

• Multiple Instructions in every cycle (VLIW, Superscalar)

• Single Instruction Multiple Data (SIMD)

• Parallel Execution MIMD (Multi-core)

[Figure: timeline axis labeled 1980s, 1990s, 2000s, 2010s]

Where might the next step take us?

Page 44: Signal Processing Catches the Multi-core Wave

The solution

• Long-term performance scaling will come from parallelism
! Must have 100s of simultaneous instructions in flight

• Processor core architecture must remain simple
! Frequency and power still important

• Simple replication of cores not the end game
! Only so much unassisted thread level parallelism to exploit

• Parallel computing not a new topic

• Next-generation architecture must
! easily adapt to either thread- or instruction-level parallelism
! have tighter marriage between hardware and software
! Examples:
> Heterogeneous processors
> DARPA – Polymorphous Computing Architecture: RAW, Smart Memories, TRIPS

Page 45: Signal Processing Catches the Multi-core Wave

Summary

• REALITY:

Continued appetite for more processing performance

• DISCOVERY:

Physics driving us to new approaches

• MAPPING:

Signal processing applications can leverage parallelism

• ENABLEMENT:

Multi-core signal processing is here now

Page 46: Signal Processing Catches the Multi-core Wave

We Deliver the Processing Intelligence and Connectivity Solutions Behind the World’s Networks
