CSE 237A: Design with Microprocessors

Tajana Simunic Rosing, Department of Computer Science and Engineering, University of California, San Diego


Page 1: CSE 237A Design with Microprocessors

CSE 237A Design with Microprocessors

Tajana Simunic Rosing
Department of Computer Science and Engineering
University of California, San Diego

Page 2: CSE 237A Design with Microprocessors

Hardware platform architecture

Page 3: CSE 237A Design with Microprocessors

CPUs

CPU performance
  Cycle time.
  CPU pipeline.
    Latency & throughput
  Memory system.
    Indeterminacy in execution
    Cache miss: compulsory, conflict, capacity
CPU power consumption.
Compare
  ARM7, TI C54x, TI 60x DSPs, TriMedia, Pentium MMX, Xilinx Virtex II, single purpose controllers

Page 4: CSE 237A Design with Microprocessors

Selecting a Microprocessor

Issues
  Technical: speed, power, size, cost
  Other: development environment, prior expertise, licensing, etc.
Speed: how to evaluate a processor's speed?
  Clock speed, but instructions per cycle may differ
  Instructions per second, but work per instr. may differ
  Dhrystone: synthetic benchmark, developed in 1984. Dhrystones/sec.
    MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital's VAX 11/780). A.k.a. Dhrystone MIPS. Commonly used today.
    So, 750 MIPS = 750 * 1757 = 1,317,750 Dhrystones per second
  SPEC: set of more realistic benchmarks, but oriented to desktops
  EEMBC: EDN Embedded Benchmark Consortium, www.eembc.org
    Suites of benchmarks: automotive, consumer electronics, networking, office automation, telecommunications
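The Dhrystone MIPS conversion on this slide can be sketched in a few lines of C. This is only an illustration of the arithmetic; the function and macro names are hypothetical and not part of any benchmark suite.

```c
#include <stdint.h>

/* From the slide: 1 Dhrystone MIPS = 1757 Dhrystones per second
   (the VAX 11/780 baseline). The macro name is illustrative. */
#define DHRYSTONES_PER_SEC_PER_MIPS 1757u

/* Convert a Dhrystone MIPS rating to raw Dhrystones per second. */
uint64_t dmips_to_dhrystones(uint64_t dmips) {
    return dmips * DHRYSTONES_PER_SEC_PER_MIPS;
}

/* Convert a measured Dhrystones-per-second score to whole Dhrystone MIPS
   (integer division, so any fractional part is dropped). */
uint64_t dhrystones_to_dmips(uint64_t dhrystones_per_sec) {
    return dhrystones_per_sec / DHRYSTONES_PER_SEC_PER_MIPS;
}
```

With the slide's numbers, a 750 MIPS rating maps to 1,317,750 Dhrystones per second and back again.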

Page 5: CSE 237A Design with Microprocessors

General Purpose Processors

Processor | Clock speed | Periph. | Bus Width | MIPS | Power | Trans. | Price

General Purpose Processors
Intel Xeon (65nm tech.) | 3.5 GHz | L1 16KB data, L2 8MB, L3 16MB | 64 | ~900 | 150 W | ~1.3B | ~$2000
IBM PowerPC 750X | 550 MHz | 2x32K L1, 256K L2 | 32/64 | ~1300 | 5 W | ~7M | $900
MIPS R5000 | 250 MHz | 2x32K, 2-way set assoc. | 32/64 | NA | NA | 3.6M | NA
StrongARM SA-110 | 233 MHz | None | 32 | 268 | 1 W | 2.1M | NA

Microcontroller
Intel 8051 | 12 MHz | 4K ROM, 128 RAM, 32 I/O, Timer, UART | 8 | ~1 | ~0.2 W | ~10K | $7
Motorola 68HC811 | 3 MHz | 4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI | 8 | ~.5 | ~0.1 W | ~10K | $5

Digital Signal Processors
TI C5416 | 160 MHz | 128K SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC | 16/32 | ~600 | 0.22 W | NA | $34
TMS320 | 80 MHz | 32KB on chip | 32 | 80 | 0.01 W | NA | $8

Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming

Page 6: CSE 237A Design with Microprocessors

RISC vs. CISC

Complex instruction set computer (CISC):
  many addressing modes;
  many operations.
Reduced instruction set computer (RISC):
  load/store;
  pipelinable instructions.

Page 7: CSE 237A Design with Microprocessors

Parallelism in programs

Parallelism exists in several levels of granularity:
  Task.
  Data.
  Instruction.
Instruction dependency
  Data and resource
  Check at compile &/or run time

Figure labels: processes P1, P2, P3; instruction sequence
  Ld r1,r2
  Add r3,r4
  Sub r5,r6

Page 8: CSE 237A Design with Microprocessors

Parallelism extraction

Static:
  Use compiler to analyze program.
  Simpler CPU control.
  Can make use of high-level language constructs.
  Can't depend on data values.
Dynamic:
  Use hardware to identify opportunities.
  More complex CPU.
  Can make use of data values.

Page 9: CSE 237A Design with Microprocessors

Superscalar

RISC: 1 inst/cycle
Superscalar: n inst/cycle
n^2 HW for n-instr parallel

Figure labels: two execution units and a register file, connected by n-wide paths

Page 10: CSE 237A Design with Microprocessors

Simple VLIW architecture

Compile time assignment of instructions to FUs
Large register file feeds multiple function units.

Figure labels: E box; register file; function units ALU, ALU, Load/store, Load/store, FU
Example instruction word:
  Add r1,r2,r3; Sub r4,r5,r6; Ld r7,foo; St r8,baz; NOP

Page 11: CSE 237A Design with Microprocessors

Clustered VLIW architecture

Register file, function units divided into clusters.

Figure labels: two clusters, each with execution units and a register file, joined by a cluster bus

Page 12: CSE 237A Design with Microprocessors

Types of CPUs used in ES

RISC CPUs
  ARM 7
CISC CPUs
  TI C54x
VLIW
  TI C6x
  TriMedia
FPGA: Programmable CPUs
  Virtex II
Single purpose processors

Page 13: CSE 237A Design with Microprocessors

ARM7 design

ARM assembly language - RISCy
ARM programming model
  Audio players, pagers etc.; 130 MIPS
ARM memory organization.
ARM data operations (32 bit)
ARM flow of control.

Page 14: CSE 237A Design with Microprocessors

ARM programming model

Registers (bits 31..0):
  r0  r1  r2  r3  r4  r5  r6  r7
  r8  r9  r10 r11 r12 r13 r14 r15 (PC)
CPSR (bits 31..0): condition flags N Z C V

Page 15: CSE 237A Design with Microprocessors

ARM status bits

Arithmetic, logical, and shifting operations can set the CPSR bits (data-processing instructions update them only when the S suffix is used; compare instructions always do):
  N (negative), Z (zero), C (carry), V (overflow).
Examples:
  -1 + 1 = 0: NZCV = 0110.
  2^31 - 1 + 1 = -2^31: NZCV = 1001.
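The two flag examples on this slide can be checked with a small C model of a 32-bit addition. This is a sketch rather than ARM's definition: the function name and the packing of NZCV into four bits (N in bit 3, V in bit 0) are choices made for illustration.

```c
#include <stdint.h>

/* Model the flags produced by a 32-bit add. Returns NZCV packed as a
   4-bit value: N = bit 3, Z = bit 2, C = bit 1, V = bit 0.
   The name add_nzcv is hypothetical. */
unsigned add_nzcv(uint32_t a, uint32_t b) {
    uint32_t r = a + b;                  /* wraps modulo 2^32 */
    unsigned n = (unsigned)(r >> 31);    /* sign bit of the result */
    unsigned z = (r == 0);               /* result is zero */
    unsigned c = (r < a);                /* unsigned carry out of bit 31 */
    /* Signed overflow: operands share a sign and the result's sign differs. */
    unsigned v = (unsigned)((~(a ^ b) & (a ^ r)) >> 31) & 1u;
    return (n << 3) | (z << 2) | (c << 1) | v;
}
```

Calling `add_nzcv(0xFFFFFFFF, 1)` models -1 + 1 and yields 0110; `add_nzcv(0x7FFFFFFF, 1)` models 2^31 - 1 + 1 and yields 1001.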

Page 16: CSE 237A Design with Microprocessors

ARM pipeline execution

  add r0,r1,#5    fetch   decode  execute
  sub r2,r3,r6            fetch   decode  execute
  cmp r2,#3                       fetch   decode  execute
  time            1       2       3

Page 17: CSE 237A Design with Microprocessors

ARM data instructions

ADD, ADC: add (w. carry)
SUB, SBC: subtract (w. carry)
MUL, MLA: multiply (and accumulate)
AND, ORR, EOR
BIC: bit clear
LSL, LSR: logical shift left/right
ASL, ASR: arithmetic shift left/right
ROR: rotate right
RRX: rotate right extended with C

Page 18: CSE 237A Design with Microprocessors

ARM flow of control

All operations can be performed conditionally, testing CPSR:
  EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE
Branch operation:
  B #100
  Can be performed conditionally.
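How each condition code tests the CPSR can be sketched as a C predicate over the four flags. The enum, the function name, and the 4-bit NZCV packing (N in bit 3, V in bit 0) are assumptions made for this illustration; the flag combinations are the standard ARM meanings of the mnemonics listed on the slide.

```c
#include <stdbool.h>

/* Condition mnemonics from the slide, in the order listed there. */
enum cond { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE };

/* Returns true when condition c holds for the given flags.
   nzcv packs N = bit 3, Z = bit 2, C = bit 1, V = bit 0 (illustrative layout). */
bool cond_passes(enum cond c, unsigned nzcv) {
    bool n  = (nzcv >> 3) & 1u;
    bool z  = (nzcv >> 2) & 1u;
    bool cy = (nzcv >> 1) & 1u;
    bool v  = nzcv & 1u;
    switch (c) {
    case EQ: return z;              /* equal */
    case NE: return !z;             /* not equal */
    case CS: return cy;             /* carry set (unsigned >=) */
    case CC: return !cy;            /* carry clear (unsigned <) */
    case MI: return n;              /* negative */
    case PL: return !n;             /* positive or zero */
    case VS: return v;              /* overflow set */
    case VC: return !v;             /* overflow clear */
    case HI: return cy && !z;       /* unsigned > */
    case LS: return !cy || z;       /* unsigned <= */
    case GE: return n == v;         /* signed >= */
    case LT: return n != v;         /* signed < */
    case GT: return !z && n == v;   /* signed > */
    case LE: return z || n != v;    /* signed <= */
    }
    return false;
}
```

A conditional instruction such as a branch with the EQ suffix executes only when the predicate for its suffix returns true; otherwise it behaves as a no-op.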

Page 19: CSE 237A Design with Microprocessors

ARM comparison instructions

CMP: compare
CMN: negated compare
TST: bit-wise AND
TEQ: bit-wise XOR
These instructions set only the NZCV bits of CPSR.

Page 20: CSE 237A Design with Microprocessors

ARM load/store/move instructions

LDR, LDRH, LDRB: load (half-word, byte)
STR, STRH, STRB: store (half-word, byte)
Addressing modes:
  register indirect: LDR r0,[r1]
  with second register: LDR r0,[r1,-r2]
  with constant: LDR r0,[r1,#4]
MOV, MVN: move (negated)
  MOV r0, r1 ; sets r0 to r1

Page 21: CSE 237A Design with Microprocessors

Addressing modes

Base-plus-offset addressing:
  LDR r0,[r1,#16]
  Loads from location r1+16
Auto-indexing increments base register:
  LDR r0,[r1,#16]!
Post-indexing fetches, then does offset:
  LDR r0,[r1],#16
  Loads r0 from r1, then adds 16 to r1.
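The difference between the three addressing modes is easiest to see in a toy register/memory model. The sketch below is an assumption-laden illustration, not an ARM emulator: the `struct cpu` layout, the 64-word memory, and the function names are all hypothetical, and addresses are treated as word-aligned byte addresses.

```c
#include <stdint.h>

#define MEM_WORDS 64

/* Toy machine state: 16 registers and a small word-addressable memory. */
struct cpu {
    uint32_t r[16];
    uint32_t mem[MEM_WORDS];
};

/* Read the word at a (word-aligned) byte address, wrapping inside the toy memory. */
static uint32_t load_word(const struct cpu *c, uint32_t addr) {
    return c->mem[(addr / 4u) % MEM_WORDS];
}

/* LDR rd,[rn,#off]  : base-plus-offset, base register unchanged. */
void ldr_offset(struct cpu *c, int rd, int rn, int32_t off) {
    c->r[rd] = load_word(c, c->r[rn] + (uint32_t)off);
}

/* LDR rd,[rn,#off]! : auto-indexing, base updated first, then used. */
void ldr_preindex(struct cpu *c, int rd, int rn, int32_t off) {
    c->r[rn] += (uint32_t)off;
    c->r[rd] = load_word(c, c->r[rn]);
}

/* LDR rd,[rn],#off  : post-indexing, load from the old base, then update it. */
void ldr_postindex(struct cpu *c, int rd, int rn, int32_t off) {
    c->r[rd] = load_word(c, c->r[rn]);
    c->r[rn] += (uint32_t)off;
}
```

Starting from r1 = 0, the offset and auto-indexed forms both read the word at address 16, but only the auto-indexed form leaves r1 = 16; the post-indexed form reads address 0 and then sets r1 = 16.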

Page 22: CSE 237A Design with Microprocessors

ARM subroutine linkage

Branch and link instruction:
  BL foo
  Copies the return address (the address of the instruction after the BL) to r14.
To return from subroutine:
  MOV r15,r14

Page 23: CSE 237A Design with Microprocessors

ARM Summary

Load/store architecture
Most instructions are RISCy, operate in single cycle.
  Some multi-register operations take longer.
All instructions can be executed conditionally.

Page 24: CSE 237A Design with Microprocessors

Multimedia CPUs

Many registers, adders etc. are wide (32/64 bit)
Multimedia data types are narrow
  • e.g. 8 bit per color, 16 bit per audio sample per channel
2-8 values can be stored per register and added

Figure: two packed registers added lane by lane; 4 additions per instruction; carry disabled at word boundaries.
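The idea of four additions per instruction with the carry cut at lane boundaries can be imitated in plain C on a 32-bit word holding four 8-bit values. This is a software sketch of the concept only; real multimedia hardware does this in the adder itself, and the function name is hypothetical.

```c
#include <stdint.h>

/* Add four packed 8-bit lanes held in 32-bit words, with wrap-around in
   each lane and no carry propagating from one lane into the next. */
uint32_t packed_add8(uint32_t a, uint32_t b) {
    /* Add the low 7 bits of every lane; a carry can only reach bit 7
       of its own lane, never the neighbouring lane. */
    uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Fold in the top bit of each lane with XOR, which discards the
       carry that would otherwise cross the lane boundary. */
    return low ^ ((a ^ b) & 0x80808080u);
}
```

For example, adding the lanes (01, FF, 02, 80) and (01, 01, 01, 80) gives (02, 00, 03, 00): the second and fourth lanes wrap around without disturbing their neighbours.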

Page 25: CSE 237A Design with Microprocessors

Pentium MMX

64-bit vectors representing 8 byte encoded, 4 word encoded or 2 double word encoded numbers.
Wrap around/saturating options.
Multimedia registers mm0 - mm7, consistent with floating-point registers (OS unchanged).

Instruction | Options | Comments
Padd[b/w/d], Psub[b/w/d] | wrap around, saturating | addition/subtraction of bytes, words, double words
Pcmpeq[b/w/d], Pcmpgt[b/w/d] | | Result = "11..11" if true, "00..00" otherwise
Pmullw | | multiplication, 4*16 bits, least significant word
Pmulhw | | multiplication, 4*16 bits, most significant word
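The wrap-around and saturating options in the table differ only in what happens when a lane overflows. The sketch below models a byte-wise add on a 64-bit vector in portable C, treating lanes as unsigned; it is not the MMX instruction itself, and the function name and the `saturate` flag are assumptions for illustration.

```c
#include <stdint.h>

/* Add eight unsigned 8-bit lanes packed in 64-bit values.
   saturate == 0: each lane wraps around modulo 256.
   saturate != 0: each lane clamps at 0xFF instead of wrapping. */
uint64_t padd_bytes(uint64_t a, uint64_t b, int saturate) {
    uint64_t out = 0;
    for (int lane = 0; lane < 8; lane++) {
        unsigned x = (unsigned)((a >> (8 * lane)) & 0xFFu);
        unsigned y = (unsigned)((b >> (8 * lane)) & 0xFFu);
        unsigned sum = x + y;
        if (sum > 0xFFu) {
            sum = saturate ? 0xFFu : (sum & 0xFFu);
        }
        out |= (uint64_t)sum << (8 * lane);
    }
    return out;
}
```

Saturation is usually preferred for pixel and audio data, where wrapping a bright pixel around to black is more objectionable than clipping it.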

Page 26: CSE 237A Design with Microprocessors

Pentium MMX

Instruction | Options | Comments
Psra[w/d], Psll[w/d/q], Psrl[w/d/q] | No. of positions in register or instruction | Parallel shift of words, double words or 64 bit quad words
Punpckl[bw/wd/dq], Punpckh[bw/wd/dq] | | Parallel unpack
Packss[wb/dw] | saturating | Parallel pack
Pand, Pandn, Por, Pxor | | Logical operations on 64 bit words
Mov[d/q] | | Move instruction

Page 27: CSE 237A Design with Microprocessors

DSPs

TI C5x DSP:
  Basic features.
  C54x architecture and programming.
  C55x architecture and programming.
  C55x co-processor.
C60x

Page 28: CSE 237A Design with Microprocessors

C5x family

Fixed-point DSP.
Modified Harvard architecture:
  1 program memory bus.
  3 data memory busses.
40-bit ALU.
Multiple implementations:
  1, 2 instructions/cycle.

Page 29: CSE 237A Design with Microprocessors

TI C54x architectural features

40-bit ALU + barrel shifter.
Multiple internal busses: 1 instruction, 3 data, 4 address.
17 x 17 multiplier.
Single-cycle exponent encoder.
Two address generators with dedicated registers.

Page 30: CSE 237A Design with Microprocessors

TI C54x instruction set features

Specialized instructions for Viterbi.
Repeat and block repeat instructions.
Instructions that read 2, 3 operands simultaneously.
Conditional store.
Fast return from interrupt.

Page 31: CSE 237A Design with Microprocessors

C54x CPU

40-bit ALU.
Two 40-bit accumulators.
Barrel shifter.
17 x 17 multiplier/adder.
Compare/select/store (CSSU) unit.

Page 32: CSE 237A Design with Microprocessors

C54x architectural elements

ALU:
  40-bit arithmetic, Boolean operations.
  Two 16-bit operations when status register 1 C16 bit is set.
Accumulators:
  Low-order (0-15), high-order (16-31), guard (32-39).
Barrel shifter:
  Input from accumulator or data memory.
  Output to ALU.
Multiplier:
  17 x 17 multiply with 40-bit accumulate.
CSSU unit:
  Compares high and low accumulator words.
  Accelerates Viterbi operations.
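The role of the 40-bit accumulator and its guard bits (32-39) can be sketched with a multiply-accumulate model in C. This is a behavioural illustration under stated assumptions: 16-bit signed operands, a 64-bit host integer holding the accumulator, and wrap-around at 40 bits; the function names are hypothetical and this is not TI's instruction definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce a value to a signed 40-bit range, the way a 40-bit register
   would hold it (two's complement, wrap on overflow). */
static int64_t wrap40(int64_t x) {
    uint64_t u = (uint64_t)x & 0xFFFFFFFFFFULL;   /* keep bits 0..39 */
    int64_t r = (int64_t)u;
    if (u & 0x8000000000ULL) {                    /* bit 39 set: negative */
        r -= (int64_t)1 << 40;
    }
    return r;
}

/* One multiply-accumulate step: acc += a * b, held in 40 bits. */
int64_t mac40(int64_t acc, int16_t a, int16_t b) {
    return wrap40(acc + (int64_t)a * (int64_t)b);
}

/* Dot product of two 16-bit vectors using the 40-bit accumulator model. */
int64_t dot40(const int16_t *x, const int16_t *y, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc = mac40(acc, x[i], y[i]);
    }
    return acc;
}
```

The eight guard bits are what let a filter loop add up many 32-bit products before the sum overflows: four products of 32767 * 32767 already exceed the signed 32-bit range but still fit comfortably in 40 bits.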

Page 33: CSE 237A Design with Microprocessors

C54x registers

Status registers ST0, ST1:
  Arithmetic, bit manipulation flags.
  Data page pointer, auxiliary register pointer.
  Processor modes.
Auxiliary registers:
  Used to generate 16-bit data space addresses.
Temporary register:
  Used to hold one multiplicand or dynamic shift count.
Transition register:
  Used for Viterbi operations.
Stack pointer:
  Top of system stack.
Circular buffer size register.
Block-repeat registers.
Interrupt registers.
Processor mode status register.

Page 34: CSE 237A Design with Microprocessors

C54x pipeline

Program prefetch. Send PC address on program address bus.
Fetch. Load instruction from program bus to IR.
Decode.
Access. Put operand addresses on busses.
Read. Get operands from busses.
Execute.

Page 35: CSE 237A Design with Microprocessors

C54x power down modes

Three IDLE instructions:
  IDLE1 shuts down CPU.
  IDLE2 shuts down CPU and on-chip peripherals.
  IDLE3 shuts down chip completely (including PLL).

Page 36: CSE 237A Design with Microprocessors

C54x busses

PB: program read bus.
CB, DB: data read busses.
EB: data write bus.
PAB, CAB, DAB, EAB: address busses.
Can generate two data memory addresses per cycle.
  Stored in auxiliary register address units ARAU0, ARAU1.

Page 37: CSE 237A Design with Microprocessors

Addressing Modes

Addressing mode | Operand field | Register-file contents | Memory contents
Immediate | Data | |
Register-direct | Register address | Data |
Register indirect | Register address | Memory address | Data
Direct | Memory address | | Data
Indirect | Memory address | | Memory address, then Data

Page 38: CSE 237A Design with Microprocessors

Common addressing modes

ARn (*): indirect through auxiliary registers.
DP (@): direct addressing offset from DP register.
K23 (#): absolute addressing using label.
Bit addressing (BIT instruction): modify a single bit of a memory location or MMR register.

Page 39: CSE 237A Design with Microprocessors

C54x instructions

ABDST: absolute distance
ADD
ADDC: add w. carry
ADDM: add immediate to mem
ADDS: add w/o sign extension
DADD: double add
DELAY: memory delay
DSUB: double subtract
EXP: accumulator exponent
LMS: least mean square
MAC: multiply accumulate
MACA: multiply by MACA, add to MACB

Page 40: CSE 237A Design with Microprocessors

C54x instructions, cont'd.

MACP: multiply by program memory, then accumulate
MAS: multiply by T, then subtract
MAX, MIN
MPY: multiply
NEG: negate
NORM: normalize
POLY: evaluate polynomial
RND: round accumulator
SAT: saturate accumulator
SQUR: square
SUB: subtract

Page 41: CSE 237A Design with Microprocessors

C54x instructions, cont'd.

AND
BIT: test bit
BITF: test bit shown by immediate
CMPL: complement accumulator
OR
ROL: rotate accumulator left
SFTA: shift accumulator arithmetically
XOR
MVDD: move within data memory
MVDP: move data to program memory
READA: read data addressed by ACCA
WRITA: write data addressed by ACCA
And many more.

Page 42: CSE 237A Design with Microprocessors

C55x pipeline

Two segments:
  Fetch.
  Execute.

Figure labels: fetch (4), execute (7-8)

Page 43: CSE 237A Design with Microprocessors

C55x fetch segment

Prefetch 1:
  Send address to memory.
Prefetch 2:
  Wait for response.
Fetch:
  Get instruction from memory and put in IBQ.
Predecode:
  Identify where instructions begin and end; identify parallel instructions.

Page 44: CSE 237A Design with Microprocessors

C55x execute segment

Decode:
  Decode an instruction pair or single instruction.
Address:
  Perform address calculations.
Access 1/2:
  Send address to memory; wait.
Read:
  Read data from memory. Evaluate condition registers.
Execute:
  Read/modify registers. Set conditions.
W/W+:
  Write data to MMR-addressed registers, memory; finish.

Page 45: CSE 237A Design with Microprocessors

C55x organization

Figure: block diagram of the C55x units and busses.
Units: instruction unit, program flow unit, address unit, data unit
Busses (width in bits):
  3 data read busses (B, C, D): 16
  3 data read address busses: 24
  program address bus: 24
  program read bus: 32
  2 data write busses: 16
  2 data write address busses: 24
Bus uses: instruction fetch; data read from memory (single operand read, dual operand read, dual-multiply coefficient); writes

Page 46: CSE 237A Design with Microprocessors

Image/video hardware extensions

Available in 5509 and 5510.
  Equivalent C-callable functions for other devices.
Available extensions:
  DCT/IDCT.
  Pixel interpolation.
  Motion estimation.

Page 47: CSE 237A Design with Microprocessors

DCT/IDCT

2-D DCT/IDCT is computed from two 1-D DCT/IDCT.
Put data in different banks to maximize throughput.

Figure labels: block, column DCT, interim, row DCT

Page 48: CSE 237A Design with Microprocessors

C55x motion estimation

Search strategy:
  Full vs. non-full.
Accuracy:
  Full-pixel vs. half-pixel.
Number of returned motion vectors:
  1 (one 16x16) vs. 4 (four 8x8).
Algorithms:
  3-step algorithm (distance 4,2,1).
  4-step algorithm (distance 8,4,2,1).
  4-step with half-pixel refinement.

Page 49: CSE 237A Design with Microprocessors

Types of CPUs used in ES

RISC CPUs
  ARM 7
CISC CPUs
  TI C54x
VLIW
  TI C6x
  TriMedia
FPGA: Programmable CPUs
  Virtex II

Page 50: CSE 237A Design with Microprocessors

VLIW: TI C62/C67

Up to 8 instructions/cycle.
32 32-bit registers.
Function units:
  Two multipliers.
  Six ALUs.
Data operations:
  8/16/32-bit arithmetic.
  40-bit operations.
  Bit manipulation operations.

Page 51: CSE 237A Design with Microprocessors

Partitioned register files

• Many memory ports are required to supply enough operands per cycle.
• Memories with many ports are expensive.
Registers are partitioned into sets, e.g. for TI C60x:

Figure labels: data path A with register file A and units L1, S1, M1, D1; data path B with register file B and units D2, M2, S2, L2; address bus; data bus

Page 52: CSE 237A Design with Microprocessors

C6x data paths

General-purpose register files (A and B, 16 words each).
Eight function units:
  .L1, .L2, .S1, .S2, .M1, .M2, .D1, .D2
Two load units (LD1, LD2).
Two store units (ST1, ST2).
Two register file cross paths (1X and 2X).
Two data address paths (DA1 and DA2).

Page 53: CSE 237A Design with Microprocessors

C6x function units

.L
  32/40-bit arithmetic.
  Leftmost 1 counting.
  Logical ops.
.S
  32-bit arithmetic.
  32/40-bit shift and 32-bit field.
  Branches.
  Constants.
.M
  16 x 16 multiply.
.D
  32-bit add, subtract, circular address.
  Load, store with 5/15-bit constant offset.

Page 54: CSE 237A Design with Microprocessors

C6x system

On-chip RAM.
32-bit external memory: SDRAM, SRAM, etc.
Host port.
Multiple serial ports.
Multichannel DMA.
32-bit timer.

Page 55: CSE 237A Design with Microprocessors

VLIW: Trimedia TM-1

Figure labels: register file; read/write crossbar; function units FU1 ... FU27; instruction slots 1 to 5

Page 56: CSE 237A Design with Microprocessors

TM-1 characteristics

Characteristics
  Floating point support
  Sub-word parallelism support
  If conversion
  Additional custom operations

Page 57: CSE 237A Design with Microprocessors

Philips TriMedia processor

For multimedia applications, up to 5 instructions/cycle.

Page 58: CSE 237A Design with Microprocessors

DSPs

Great for multimedia
CISC
  MMX
  TI C54x, C55x
VLIW
  TI C6x
  TriMedia

Page 59: CSE 237A Design with Microprocessors

VIRTEX II FPGAs

Page 60: CSE 237A Design with Microprocessors

Configurable Logic Block (CLB)

Page 61: CSE 237A Design with Microprocessors

2 carry paths per CLB (Virtex-II Pro)

[© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, www.xilinx.com]

Page 62: CSE 237A Design with Microprocessors

Virtex II Slice

Look-up tables LUT F and G can be used to compute any Boolean function of ≤ 4 variables.

Example:

a b c d | G
0 0 0 0 | 0
0 0 0 1 | 1
0 0 1 0 | 1
0 0 1 1 | 0
0 1 0 0 | 1
0 1 0 1 | 0
0 1 1 0 | 0
0 1 1 1 | 1
1 0 0 0 | 1
1 0 0 1 | 0
1 0 1 0 | 0
1 0 1 1 | 1
1 1 0 0 | 0
1 1 0 1 | 1
1 1 1 0 | 1
1 1 1 1 | 0
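A 4-input look-up table is just 16 stored bits indexed by the inputs, which can be sketched in C as a 16-bit constant. The bit ordering below (bit i holds G for the row whose inputs a b c d read as the binary number i) and the names are assumptions for illustration, not the Xilinx configuration format. Under that ordering, the example truth table on this slide is the constant 0x6996, which is the XOR of the four inputs.

```c
#include <stdint.h>

/* Truth table from the slide's example, one bit per row:
   bit i = G for inputs (a,b,c,d) read as the 4-bit number i, a being the MSB.
   The macro name and bit order are illustrative assumptions. */
#define LUT_EXAMPLE_G 0x6996u

/* Evaluate a 4-input LUT: pick the stored bit selected by the inputs. */
unsigned lut4(uint16_t table, unsigned a, unsigned b, unsigned c, unsigned d) {
    unsigned index = ((a & 1u) << 3) | ((b & 1u) << 2) | ((c & 1u) << 1) | (d & 1u);
    return (table >> index) & 1u;
}
```

Changing the 16-bit constant changes the function without changing any wiring, which is what makes the logic block configurable.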

Page 63: CSE 237A Design with Microprocessors

Virtex II Pro devices include up to 4 PowerPC processor cores

[© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, www.xilinx.com]

Page 64: CSE 237A Design with Microprocessors

Number of resources in Virtex II

[© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, www.xilinx.com]

Page 65: CSE 237A Design with Microprocessors

Single-purpose processors

Performs specific computation task
Custom single-purpose processors
  Designed for a unique task
Standard single-purpose processors
  "Off-the-shelf": pre-designed for a common task
  a.k.a. peripherals
  serial transmission
  analog/digital conversions

Page 66: CSE 237A Design with Microprocessors

Timers, counters

Timer: measures time intervals by counting clock pulses
  To generate timed output events, e.g., hold light for 10 s
  To measure input events, e.g., measure a car's speed
Watchdog timer
  Reset timer every X time units, else it generates a signal
  Uses: detect failure, self-reset, timeout on an ATM machine
Counter: counts pulses on a general input signal
  E.g. count cars passing over a sensor

Figure, basic timer: 16-bit up counter with inputs Clk and Reset, outputs Cnt (16 bits) and Top
Figure, timer/counter: 2x1 mux (select: Mode) choosing Clk or Cnt_in as the input of a 16-bit up counter with Reset, outputs Cnt (16 bits) and Top
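The basic 16-bit up counter can be modelled in C to show how counting clock pulses measures time. This is a behavioural sketch under assumptions: Top is taken to be asserted when the counter wraps from 0xFFFF to 0, and the struct, function names, and the 1 MHz clock used in the usage note are hypothetical.

```c
#include <stdint.h>

/* State of the 16-bit up counter (the Cnt output). */
struct timer16 {
    uint16_t cnt;
};

/* Apply one clock pulse. Returns 1 when the counter wraps from 0xFFFF
   to 0, which this sketch treats as the Top signal; otherwise 0. */
int timer16_tick(struct timer16 *t) {
    t->cnt = (uint16_t)(t->cnt + 1u);
    return t->cnt == 0;
}

/* Reset input: clear the count. */
void timer16_reset(struct timer16 *t) {
    t->cnt = 0;
}

/* Number of clock pulses that make up an interval of interval_ms
   milliseconds at a clock of clk_hz hertz. */
uint64_t ticks_for_interval(uint32_t clk_hz, uint32_t interval_ms) {
    return (uint64_t)clk_hz * interval_ms / 1000u;
}
```

With an assumed 1 MHz clock, holding a light for 10 s corresponds to 10,000,000 pulses, far more than a 16-bit counter holds, so software would count Top events or a slower clock would be used.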

Page 67: CSE 237A Design with Microprocessors

Pulse width modulator

Generates pulses with specific high/low times
Duty cycle: % time high
  Square wave: 50% duty cycle
Common use: control average voltage to an electric device
  Simpler than DC-DC converter or digital-analog converter
  DC motor speed, dimmer lights
Another use: encode commands, receiver uses timer to decode

Figure (clk and pwm_o waveforms):
  25% duty cycle: average pwm_o is 1.25V
  50% duty cycle: average pwm_o is 2.5V
  75% duty cycle: average pwm_o is 3.75V
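The relation between duty cycle and average voltage in the three waveforms can be sketched in C. The figures on the slide imply a 5 V high level (25% gives 1.25 V); that supply value, the 8-bit counter/compare scheme, and the function names are assumptions made for this illustration.

```c
#include <stdint.h>

/* Output level for one clock tick of a counter-based PWM: the output is
   high while the free-running counter is below the compare value. */
int pwm_out(uint8_t counter, uint8_t compare) {
    return counter < compare;
}

/* Average output voltage in millivolts for a duty cycle given in percent
   and a supply (high level) given in millivolts. */
uint32_t pwm_average_mv(uint32_t supply_mv, uint32_t duty_percent) {
    return supply_mv * duty_percent / 100u;
}
```

With an 8-bit counter, a compare value of 64 keeps the output high for 64 of every 256 ticks, a 25% duty cycle, and `pwm_average_mv(5000, 25)` gives the 1.25 V shown in the first waveform.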

Page 68: CSE 237A Design with Microprocessors

LCD controller

Figure: microcontroller connected to the LCD controller over a communications bus with signals E, R/W, RS and DB7-DB0 (8 bits); the LCD controller drives the LCD.

void WriteChar(char c){
    RS = 1;          /* indicate data being sent */
    DATA_BUS = c;    /* send data to LCD */
    EnableLCD(45);   /* toggle the LCD with appropriate delay */
}

RS R/W DB7 DB6 DB5 DB4 DB3 DB2 DB1 DB0 | Description
0 0 0 0 0 0 0 0 0 1 | Clears all display, return cursor home
0 0 0 0 0 0 0 0 1 * | Returns cursor home
0 0 0 0 0 0 0 1 I/D S | Sets cursor move direction and/or specifies not to shift display
0 0 0 0 0 0 1 D C B | ON/OFF of all display (D), cursor ON/OFF (C), and blink position (B)
0 0 0 0 0 1 S/C R/L * * | Move cursor and shifts display
0 0 0 0 1 DL N F * * | Sets interface data length, number of display lines, and character font
1 0 WRITE DATA | Writes Data

CODES
I/D = 1 cursor moves left      DL = 1 8-bit
I/D = 0 cursor moves right     DL = 0 4-bit
S = 1 with display shift       N = 1 2 rows
S/C = 1 display shift          N = 0 1 row
S/C = 0 cursor movement        F = 1 5x10 dots
R/L = 1 shift to right         F = 0 5x7 dots
R/L = 0 shift to left
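The command rows of the table can be turned into small C helpers that assemble the DB7..DB0 byte sent with RS = 0. The helper names are hypothetical, and how the byte is then driven onto the bus is left out; the bit positions follow the table's columns.

```c
#include <stdint.h>

/* Row "0 0 0 0 1 DL N F * *": interface data length, display lines, font.
   DB5 = 1, DB4 = DL, DB3 = N, DB2 = F; the don't-care bits are sent as 0. */
uint8_t lcd_function_set(int dl, int n, int f) {
    return (uint8_t)(0x20 | ((dl & 1) << 4) | ((n & 1) << 3) | ((f & 1) << 2));
}

/* Row "0 0 0 0 0 0 1 D C B": display on/off, cursor on/off, blink.
   DB3 = 1, DB2 = D, DB1 = C, DB0 = B. */
uint8_t lcd_display_control(int d, int c, int b) {
    return (uint8_t)(0x08 | ((d & 1) << 2) | ((c & 1) << 1) | (b & 1));
}

/* Row "0 0 0 0 0 0 0 1 I/D S": cursor move direction and display shift.
   DB2 = 1, DB1 = I/D, DB0 = S. */
uint8_t lcd_entry_mode(int id, int s) {
    return (uint8_t)(0x04 | ((id & 1) << 1) | (s & 1));
}
```

For instance, an 8-bit interface with 2 rows and 5x7 dots (DL = 1, N = 1, F = 0) encodes as 0x38, and display on with cursor on and no blink encodes as 0x0E.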

Page 69: CSE 237A Design with Microprocessors

Keypad controller

Figure: keypad with lines N1-N4 and M1-M4 connected to the keypad controller, which outputs key_code (4 bits) and k_pressed.

N=4, M=4
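What the keypad controller computes for a 4 x 4 keypad can be sketched in C as a scan that reports whether any key is down and which one. The input representation (one byte per row with a bit per column), the key numbering (row * 4 + column), and the names are assumptions for illustration; the slide only specifies the k_pressed and 4-bit key_code outputs.

```c
#include <stdint.h>

/* Outputs of the controller: k_pressed flag and 4-bit key_code. */
struct key_event {
    int k_pressed;
    uint8_t key_code;
};

/* Scan a 4x4 keypad. rows[r] has bit c set when the key at row r,
   column c is pressed. Reports the first pressed key found, scanning
   rows then columns; key_code is r * 4 + c, so it fits in 4 bits. */
struct key_event keypad_scan(const uint8_t rows[4]) {
    struct key_event ev = {0, 0};
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            if (rows[r] & (1u << c)) {
                ev.k_pressed = 1;
                ev.key_code = (uint8_t)(r * 4 + c);
                return ev;
            }
        }
    }
    return ev;
}
```

The microprocessor then only needs to watch k_pressed and read key_code, instead of scanning the key matrix itself.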

Page 70: CSE 237A Design with Microprocessors

Summary

RISC CPUs
  ARM 7
CISC CPUs
  TI C54x
VLIW
  TI C6x
  TriMedia
FPGA: Programmable CPUs
  Virtex II
Single purpose processors

Page 71: CSE 237A Design with Microprocessors

Sources and References

Peter Marwedel, "Embedded Systems Design," 2004.
Frank Vahid, Tony Givargis, "Embedded System Design," Wiley, 2002.
Wayne Wolf, "Computers as Components," Morgan Kaufmann, 2001.