CS 433 Prof. Luddy Harrison Copyright 2005 University of Illinois 1
Analog Devices SHARC
CS 433 Processor Presentation Series
Prof. Luddy Harrison
Note on this presentation series
These slide presentations were prepared by students of CS433 at the University of Illinois at Urbana-Champaign
All the drawings and figures in these slides were drawn by the students. Some drawings are based on figures in the manufacturer’s documentation for the processor, but none are electronic copies of such drawings
You are free to use these slides provided that you leave the credits and copyright notices intact
Overview
Processor History
Physical packaging
Data paths, register files, computational units
Pipelining, timing information
Memory
Instruction Set Architecture (ISA)
Applications targeted
Systems employing the SHARC
SHARC Features
Super Harvard ARChitecture
  Unique CISC architecture allows simultaneous fetch of two operands and an instruction in one cycle
Combines high performance DSP core with integrated, on-chip system features
  Dual-ported (processor and I/O) SRAM
  DMA Controller
Selective Instruction Cache
  Caches only those instructions whose fetches conflict with program memory data accesses
SHARC Processor History
ADSP-2106x (2000)
  Single computational units based on predecessor ADSP-2100 Family
  40 MHz core
ADSP-2116x (2001)
  SIMD (Single-Instruction Multiple-Data) dual computational unit architecture added
  150-200 MHz core, 1-2 MB RAM
ADSP-2126x, ADSP-2136x (2003 – Future)
  Integrated audio-centric peripherals (128-140 dB Sample Rate Conversion) added
  333-400 MHz core, 2-3 MB RAM
ADSP-2106x Overview
[Block diagram: the core processor (timer, instruction cache, program sequencer, DAG1, DAG2, bus connect (PX), 16 x 40-bit data register file, multiplier, barrel shifter, ALU) connects over the PM and DM address and data buses to the dual-ported SRAM (two independent dual-ported blocks, Block 0 and Block 1, each with a processor port and an I/O port), to the external port (address bus mux, data bus mux, multiprocessor interface, host port), and to the I/O processor (IOP registers for control, status, and data buffers; DMA controller; 2 serial ports; 6 link ports).]
ADSP-2106x Core
Computational Units
  ALU, Multiplier, and Shifter can all perform independent operations in a single cycle
Register File
  Two sets (primary and alternate) of 16 registers, each 40 bits wide
Program Sequencer and Data Address Generators
  Allow computational units to operate independently of instruction fetch and program counter increment
ADSP-2106x Packaging
[Pin diagram of the ADSP-2106x and its connections: clock and boot pins (1x clock on CLKIN, EBOOT, LBOOT, BMS); IRQ, FLAG, and TIMEXP; link port pins (LxCLK, LxACK, LxDAT) to link devices; two serial ports (TCLK0/1, RCLK0/1, TFS0/1, RFS0/1, DT0/1, DR0/1), each to a serial device; external bus pins (ADDR31-0, DATA47-0, RD, WR, ACK, MS3-0, PAGE, SBTS, SW, ADRCLK) carrying control, address, and data to the boot EPROM and to memory and peripherals; DMA pins (DMAR1-2, DMAG1-2) to a DMA device; host processor interface pins (CS, HBR, HBG, REDY); multiprocessing pins (BR1-6, CPA, RPBA, ID2-0).]
ADSP-2106x Key Pins
PIN         FUNCTION                  NOTE
ADDR31-0    External Bus Address
DATA47-0    External Bus Data
MS3-0       Memory Select Lines       Asserted (low) as chip selects for the memory banks
PAGE        DRAM Page Boundary        Asserted if a page boundary is crossed
DMAR(1-2)   DMA Request 1 and 2
IRQ2-0      Interrupt Request Lines   Edge-triggered or level-sensitive
ADSP-2106x Registers
Data Registers
  R15 – R0 (fixed-point), F15 – F0 (floating-point)
Program Sequencer
  PC (program counter), PCSTKP (PC stack pointer), FADDR (fetch address), etc.
Data Address Generator
  I7 – I0 (DAG1 index), M7 – M0 (DAG1 modify)
  L7 – L0 (DAG1 length), B7 – B0 (DAG1 base)
Bus Exchange, Timer, and System Registers
ADSP-2106x Buses
Address
  Program Memory Address – 24 bits wide
  Data Memory Address – 32 bits wide
Data
  Program Memory Data – 48 bits wide
    Carries instructions and data for dual-fetches
  Data Memory Data – 40 bits wide
    Carries data operands
  One PM Data bus and/or one DM Data bus register file access per cycle
ADSP-2106x I/O
Serial Ports
  Operate at clock rate of processor
DMA
  Port data can be automatically transferred to and from on-chip memory
ADSP-2106x DMA
I/O port block transfers (link/serial)
External memory block transfers
DMA channel setup by writing memory buffer parameters to DMA parameter registers
  Starting Address for Buffer
  Address Modifier
  Word Count
Interrupt generated when transfer completes (i.e. Word Count = 0)
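The channel setup described above can be sketched in a few lines of Python. This is a toy model, not the SHARC register interface: the class name `DmaChannel`, its attributes, and the `transfer_word` method are hypothetical names chosen for illustration. It shows only the three parameters from the slide (starting address, address modifier, word count) and the interrupt raised when the word count reaches zero.

```python
class DmaChannel:
    """Toy model of one DMA channel (hypothetical names, not the real registers)."""

    def __init__(self, start_address, modifier, word_count):
        self.address = start_address   # next on-chip memory address to write
        self.modifier = modifier       # added to the address after each word
        self.count = word_count        # words still to transfer
        self.interrupt = False         # set when the transfer completes

    def transfer_word(self, memory, word):
        """Move one word from a port into memory; return False once finished."""
        if self.count == 0:
            return False
        memory[self.address] = word
        self.address += self.modifier
        self.count -= 1
        if self.count == 0:
            self.interrupt = True      # Word Count = 0: signal completion
        return True


memory = [0] * 16
channel = DmaChannel(start_address=4, modifier=2, word_count=3)
for sample in (10, 20, 30, 40):        # the fourth word is refused
    channel.transfer_word(memory, sample)
```

With a modifier of 2 the three words land at addresses 4, 6, and 8, which is how a strided buffer would be filled without involving the core.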
ADSP-2106x DMA Registers
[Figure: bit layout (bits 31-0) of a DMA control register with the following fields]
FS        EXT. PORT FIFO
FLSH      FLUSH EXT. PORT FIFO
EXTERN    EXT. DEVICES TO EXT. MEM.
INTIO     SINGLE-WORD INTERRUPTS
HSHAKE    DMA HANDSHAKE
MASTER    DMA MASTER MODE
MSWF      MOST SIGNIFICANT WORD FIRST
DEN       DMA ENABLE
CHEN      DMA CHAINING ENABLE
TRAN      DMA CHANNEL DIRECTION
PS        PACKING STATUS
DTYPE     DATA TYPE
PMODE     PACKING MODE
ADSP-2106x Pipelining
Three phases
  Fetch
    Read from cache or program memory
  Decode
    Generate conditions for instruction
  Execute
    Operations specified by instruction completed
ADSP-2106x Branching and Pipelining
Branches
  Delayed
    Two instructions after branch are executed
  Non-delayed
    Program sequencer suppresses instruction execution for next two instructions

CLOCK CYCLES
Fetch                    n + 2    j        j + 1    j + 2
Decode                   n + 1    n + 2    j        j + 1
Execute (non-delayed)    n        no-op    no-op    j
Execute (delayed)        n        n + 1    n + 2    j
ADSP-2106x Memory
On-Chip SRAM ADSP-21060 ADSP-21062 ADSP-21061
Total Size 500KB 250KB 125KB
On-chip support for:
  48-bit instructions (and 40-bit extended precision floating-point data)
  32-bit floating-point data
  16-bit short word data
Off-chip memory up to 4 GB
ADSP-2106x Memory (2)
[Memory map figure]
0x0000 0000                  IOP REGISTERS
0x0000 0100 – 0x0001 FFFF    RESERVED ADDRESS SPACE
0x0002 0000                  BLOCK 0  (NORMAL WORD ADDRESSING)
0x0003 0000 – 0x0003 FFFF    BLOCK 1  (NORMAL WORD ADDRESSING)
0x0004 0000                  BLOCK 0
0x0006 0000 – 0x0007 FFFF    BLOCK 1
The two BLOCK 0 / BLOCK 1 regions represent the same physical memory
NORMAL WORD ADDRESSING: 128K x 32-bit words, 80K x 40-bit words
ADSP-2106x Memory (3)
Memory divided into blocks
  Dual-ported (PM and DM bus share one port, I/O bus uses the other)
  Allows independent access by processor core and I/O processor
  Each block can be accessed by both in every cycle
Typical DSP applications (digital filters, FFTs, etc.) access two operands at once, such as a filter coefficient and a data sample, so allowing single-cycle execution is a must
ADSP-2106x Shadow Write
Due to the need for high-speed operation, memory writes go to a two-deep FIFO
On a write, data in the FIFO from a previous write is loaded to memory and the new data enters the FIFO
Reads of the last two written locations are intercepted and re-routed to the FIFO
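A minimal Python sketch of the shadow-write idea follows. It assumes the simplest reading of the slide: up to two writes wait in a FIFO, a new write pushes the oldest pending one out to memory, and reads look in the FIFO before memory. The class `ShadowWriteMemory` and its method names are hypothetical and the real hardware timing is not modeled.

```python
from collections import deque


class ShadowWriteMemory:
    """Memory fronted by a two-deep write FIFO (illustrative model only)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.fifo = deque()            # pending (address, value) pairs, oldest first

    def write(self, address, value):
        if len(self.fifo) == 2:        # FIFO full: commit the oldest pending write
            old_address, old_value = self.fifo.popleft()
            self.cells[old_address] = old_value
        self.fifo.append((address, value))

    def read(self, address):
        # Intercept reads of recently written locations, newest write first.
        for pending_address, value in reversed(self.fifo):
            if pending_address == address:
                return value
        return self.cells[address]


mem = ShadowWriteMemory(8)
mem.write(1, 11)
mem.write(2, 22)
value_before_commit = mem.read(1)      # served from the FIFO, not from memory
mem.write(3, 33)                       # pushes the write to address 1 into memory
```

The point of the read interception is that software never observes the delay: `read` returns the newest value whether or not it has reached the memory array yet.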
ADSP-2106x Instruction Cache
Sequencer checks instruction cache on every program memory data access
Allows PM bus to be used for data fetches instead of being tied up with an instruction fetch
When fetch conflict first occurs, instruction is cached to prevent the same delay from happening again
ADSP-2106x Instruction Cache (2)
[Figure: cache organization. There are 16 sets (SET 0 to SET 15), selected by address bits 3-0 (0000 to 1111). Each set holds two entries (ENTRY 0 and ENTRY 1). The columns shown are LRU BIT, VALID, INSTRUCTIONS, and ADDRESSES (BITS 23-4).]
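The organization in the figure (16 sets of two entries, indexed by address bits 3-0 and tagged with bits 23-4, with an LRU bit per set) can be sketched as a small two-way set-associative cache. This is an illustrative model under those assumptions; the class `InstructionCache` and its methods are hypothetical names, and a missing entry stands in for a cleared valid bit.

```python
class InstructionCache:
    """Two-way set-associative cache with one LRU bit per set (sketch)."""

    SETS = 16

    def __init__(self):
        # Each set holds two entries; None plays the role of "valid bit clear".
        self.entries = [[None, None] for _ in range(self.SETS)]
        self.lru = [0] * self.SETS     # index of the entry to replace next

    @staticmethod
    def _split(address):
        index = address & 0xF              # address bits 3-0 select the set
        tag = (address >> 4) & 0xFFFFF     # address bits 23-4 identify the entry
        return index, tag

    def lookup(self, address):
        index, tag = self._split(address)
        for way, entry in enumerate(self.entries[index]):
            if entry is not None and entry[0] == tag:
                self.lru[index] = 1 - way  # the other entry is now the LRU one
                return entry[1]
        return None                        # miss

    def fill(self, address, instruction):
        index, tag = self._split(address)
        way = self.lru[index]              # replace the least recently used entry
        self.entries[index][way] = (tag, instruction)
        self.lru[index] = 1 - way


cache = InstructionCache()
cache.fill(0x123, "A")                 # set 3, entry 0
cache.fill(0x133, "B")                 # set 3, entry 1
hit = cache.lookup(0x123)              # touching A makes B the LRU entry
cache.fill(0x143, "C")                 # evicts B
```

Because only instructions whose fetch conflicted with a PM data access are filled, even 32 entries can cover the inner loop of a typical filter kernel.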
ADSP-2106x ISA Overview
24 operations, although some have more than one syntactical form
Instruction Types
  Compute and Move
    Compute operation in parallel with data moves or index register modify
  Program Flow Control
    Branch, Call, Return, Loop
  Immediate Data Move
    Operand or addressing immediate fields
  Miscellaneous
    Bit Modify and Test, No-op, etc.
ADSP-2106x ISA: Compute and Move
Instructions follow the format
  IF condition op1, op2;
IF and condition are optional
op1 is an optional compute instruction
op2 is an optional data move instruction
ADSP-2106x ISA: Compute Examples
Single function
  F6 = (F2 + F3);
Multi-function
  Distinct parallel operations supported
  Parallel computations and data transfers
    R1 = R2 * R6, M4 = R0;
  Simultaneous multiplier and ALU operations
    R1 = R2 * R6, F6 = F2 + F3;
ADSP-2106x ISA: Single-function Compute
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 | CU | OPCODE | RN | RX | RY
CU specifies
  00 – ALU
  01 – Multiplier
  10 – Shifter
OPCODE indicates operation type (add, subtract, etc.)
RN specifies result register
RX and RY specify operand registers
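To make the field layout concrete, here is a hedged Python sketch that packs and unpacks the 23-bit compute field. The slide does not give field widths, so the positions used here are an assumption: one flag bit (bit 22), a 2-bit CU, an 8-bit OPCODE, and three 4-bit register fields (enough to name 16 registers). The function names and the opcode value `0x01` are placeholders for illustration, not values taken from the manual.

```python
CU_ALU, CU_MULTIPLIER, CU_SHIFTER = 0b00, 0b01, 0b10


def encode_compute(cu, opcode, rn, rx, ry):
    """Pack a single-function compute field (assumed widths: 1/2/8/4/4/4 bits)."""
    if not (0 <= cu < 4 and 0 <= opcode < 256):
        raise ValueError("cu or opcode out of range")
    if not all(0 <= r < 16 for r in (rn, rx, ry)):
        raise ValueError("register number out of range")
    # Bit 22 stays 0, which marks a single-function operation.
    return (cu << 20) | (opcode << 12) | (rn << 8) | (rx << 4) | ry


def decode_compute(word):
    """Unpack the same fields from a 23-bit value."""
    return {
        "multi": (word >> 22) & 0x1,
        "cu": (word >> 20) & 0x3,
        "opcode": (word >> 12) & 0xFF,
        "rn": (word >> 8) & 0xF,
        "rx": (word >> 4) & 0xF,
        "ry": word & 0xF,
    }


# Something shaped like "R6 = R2 + R3" on the ALU; 0x01 is a placeholder opcode.
word = encode_compute(CU_ALU, 0x01, rn=6, rx=2, ry=3)
fields = decode_compute(word)
```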
ADSP-2106x ISA: Multi-function Compute
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 OPCODE RM RA RXM RYM RXA RYA
Parallel ALU and Multiplier operations
Registers restricted to particular sets
  Multiplier X: R3 – R0, Y: R7 – R4
  ALU X: R11 – R8, Y: R15 – R12
OPCODE specifies ALU op, for example:
  000100: Rm = R3-0 * R7-4, Ra = R11-8 + R15-12;
  011111: Rm = R3-0 * R7-4, Ra = MIN(R11-8, R15-12);
ADSP-2106x ISA: Program Flow Control
Instructions follow the format
  IF condition JUMP/CALL, ELSE op2;
IF, condition, ELSE are optional
JUMP/CALL is a JUMP or CALL instruction
op2 is an optional compute instruction
ADSP-2106x ISA: Program Flow Control (2)
Instructions follow the format
  DO <addr24> UNTIL termination;
No optional fields
<addr24> is the address of the last instruction in the loop (the loop end address)
termination is the loop ending condition to check after each iteration
ADSP-2106x ISA: Program Flow Examples
Conditional Execution
  IF GT R1 = R2 * R6;
  IF NE JUMP label2;
Also used for Call/Return
  main: CALL routine;
  routine: ...
    RTS; /*return to main*/
ADSP-2106x ISA: Immediate Data Move
Instructions follow the format
  ureg = <data32>;
  DM(<data32>, Ia) = ureg;
  PM(<data24>, Ia) = ureg;
Ia is an optional indirect addressor
DM is a 32-bit data memory address
PM is a 24-bit program memory address
ADSP-2106x ISA: Addressing Examples
Direct
  JUMP <data24>;
Relative to Program Counter
  JUMP (PC, <data24>);
Register Indirect (using DAG registers)
  Pre-Modify (modification pre-address calculation)
    JUMP (M0, I0);
  Post-Modify (modification post-address calculation)
    JUMP (I0, M0);
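The difference between the two register-indirect forms can be sketched in Python. Each function returns the address used and the index register value afterwards. Two things here are assumptions rather than statements from the slides: that pre-modify leaves the index register unchanged, and that circular buffering wraps the index using the base (B) and length (L) registers as shown. All function names are hypothetical.

```python
def pre_modify(index, modify):
    """Address is index + modify; the index register is assumed unchanged."""
    return index + modify, index


def post_modify(index, modify):
    """Address is the current index; the index is updated afterwards."""
    return index, index + modify


def post_modify_circular(index, modify, base, length):
    """Post-modify with wrap-around inside the buffer [base, base + length)."""
    address, new_index = post_modify(index, modify)
    if new_index >= base + length:
        new_index -= length
    elif new_index < base:
        new_index += length
    return address, new_index


# Walk a 4-word circular buffer at address 100 with a stride of 1.
index = 100
visited = []
for _ in range(6):
    address, index = post_modify_circular(index, 1, base=100, length=4)
    visited.append(address)
```

The wrap-around variant is what lets a filter keep a delay line in place without copying samples on every iteration.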
ADSP-2116x Overview
Extension of 2106x, adding 150 MHz core and SIMD (Single-Instruction Multiple-Data) support via dual hardware
[Figure: the program sequencer sends the same instruction to both processing elements. Each element has its own multiplier, ALU, barrel shifter, and data register file. The PM data bus, DM data bus, and bus connect deliver different data to each element.]
ADSP-2116x SIMD Engine
Dual hardware allows same instruction to be executed across different data
  2 ALUs, multipliers, shifters, register files
  Two data values transferred with each memory or register file access
  Very effective for stereo channel processing
Can effectively double performance over similar algorithms running on ADSP-2106x processors
ADSP-2116x SIMD Engine (2)
Enabled/disabled via MODE1 bit
  When disabled, processor simply acts in SISD mode
Program sequencer must be aware of status flags set by each set of hardware elements
  Conditional compute operations can be specified on both, either, or neither hardware set
  Conditional branches and loops executed by AND’ing the condition tests on both hardware sets
ADSP-2116x SIMD Engine (3)
Instruction   Mode   Transfer 1           Transfer 2
Rx = Ry;      SISD   Rx loaded from Ry    n/a
              SIMD   Rx loaded from Ry    Sx loaded from Sy
Sx = Sy;      SISD   Sx loaded from Sy    n/a
              SIMD   Sx loaded from Sy    Rx loaded from Ry
Rx = Sy;      SISD   Rx loaded from Sy    n/a
              SIMD   Rx loaded from Sy    Sx loaded from Ry
Sx = Ry;      SISD   Sx loaded from Ry    n/a
              SIMD   Sx loaded from Ry    Rx loaded from Sy
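The table follows one rule: in SIMD mode each explicit register move is mirrored by an implicit move between the complementary registers (R and S swapped on both sides). A small Python sketch of that rule is below. The dictionary-based register file and the names `complement` and `register_move` are illustrative assumptions; both sources are read before either destination is written so that the two transfers behave as simultaneous.

```python
def complement(name):
    """Map Rn to Sn and Sn to Rn (the register in the other processing element)."""
    return ("S" if name[0] == "R" else "R") + name[1:]


def register_move(regs, dest, src, simd):
    """Perform 'dest = src'; in SIMD mode also move between the complements."""
    first = regs[src]                                   # Transfer 1 source
    second = regs[complement(src)] if simd else None    # Transfer 2 source
    regs[dest] = first
    if simd:
        regs[complement(dest)] = second
    return regs


regs = {"R1": 1, "R2": 2, "S1": 10, "S2": 20}
register_move(regs, "R1", "S2", simd=True)   # R1 from S2, and S1 from R2
```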
ADSP-2126x Overview
Direct extension of 2116x, instructions are fully backward compatible
Core increased to 150-200 MHz w/ 1MB SRAM
Data buses increased from 32 to 64 bits
Synchronous, independent serial ports increased from 2 to 6
ROM-based security
  Prevents piracy of code and algorithms
  Prevents peripheral devices from reading on-chip memory
ADSP-2136x Overview
[Block diagram: the core processor (DAG1, DAG2, program sequencer, timer, instruction cache, processing elements PEX and PEY, PX register) connects over the PM and DM address and data buses to 4 blocks of on-chip memory (Block 0 to Block 3, comprising two 1M bit SRAMs, two 0.5M bit SRAMs, and two 2M bit ROMs, each block with its own ADDR and DATA connections) and to the I/O processor and peripherals (IOP registers, signal routing unit, SPI, SPORTs, IDP, PCG, timers, SRC, SPDIF).]
ADSP-2136x Overview (2)
Direct extension of 2126x, instructions are fully backward compatible
On-chip memory expanded from 2 to 4 blocks
Digital Audio Interface (DAI) set of audio peripherals
  Interrupt controller, interface data port, signal routing unit, clock generators, and timers
  Different units contain S/PDIF receiver/transmitter, sample rate converters, or DTCP encrypting engine
SHARC Benchmarks
Algorithm benchmarks supplied by manufacturer:

                           2106x        2116x        2126x        2136x
Clock Cycle                66 MHz       100 MHz      200 MHz      333 MHz
Instruction Cycle Time     15 ns        10 ns        6.67 ns      3 ns
MFLOPS Sustained           132 MFLOPS   400 MFLOPS   600 MFLOPS   1332 MFLOPS
MFLOPS Peak                198 MFLOPS   600 MFLOPS   900 MFLOPS   1998 MFLOPS
FIR Filter (per tap)       15 ns        5 ns         2.5 ns       1.5 ns
IIR Filter (per biquad)    61 ns        20 ns        10 ns        6 ns
Divide (y/x)               91 ns        30 ns        20 ns        9 ns
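As a quick sanity check on the table, the cycle time should be roughly the reciprocal of the clock rate. The Python sketch below does that arithmetic on the values as printed; the dictionary layout and function names are illustrative. Note that for the 2126x column the printed 6.67 ns corresponds to about 150 MHz rather than 200 MHz, so one of those two entries may not match the manufacturer's original figures; the sketch only reports what the numbers imply.

```python
# Values copied from the benchmark table above.
table = {
    "2106x": {"clock_mhz": 66, "cycle_ns": 15, "peak_mflops": 198},
    "2116x": {"clock_mhz": 100, "cycle_ns": 10, "peak_mflops": 600},
    "2126x": {"clock_mhz": 200, "cycle_ns": 6.67, "peak_mflops": 900},
    "2136x": {"clock_mhz": 333, "cycle_ns": 3, "peak_mflops": 1998},
}


def cycle_time_ns(clock_mhz):
    """Cycle time in nanoseconds for a clock given in MHz."""
    return 1000.0 / clock_mhz


def implied_clock_mhz(cycle_ns):
    """Clock rate in MHz implied by a cycle time in nanoseconds."""
    return 1000.0 / cycle_ns


for name, row in table.items():
    computed = cycle_time_ns(row["clock_mhz"])
    flops_per_cycle = row["peak_mflops"] / row["clock_mhz"]
    print(f"{name}: listed {row['cycle_ns']} ns, computed {computed:.2f} ns, "
          f"peak FLOPs per cycle {flops_per_cycle:.2f}")
```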
Applications Targeted
SHARC designed to
  Simplify Development
  Speed Time to Market
  Reduce Product Costs
Products targeted
  A/V Receivers
    7.1 Surround Sound Decoding
  Mixing Consoles
  Digital Synthesizers
  Automobiles
Systems Employing the SHARC
SRS Circle Surround II
Melody (w/ Auto Room Tuner)
Metric Halo's Portable Pro Audio Hub
Alacron FT-P5
SHARC in SRS Circle Surround II
Full multi-channel surround sound from simple right/left stereo sound
Encoding can be transmitted over standard stereo medium (broadcast television, radio, etc.) and maintains full backward compatibility
SHARC in SRS Circle Surround II (2)
Output from each source is combined in constant phase filter banks and encoded in quadrature to prevent signal cancellation
“Positional bias generator” analyzes ratios between left and right surround signals which multipliers then apply to the opposing left or right output
Decoder uses this level imbalance to direct the surround information to the correct output
SHARC Melody
“Optimized Surround Sound for the Mass Market”
Core of high-fidelity audio decoders in Denon, Bose, and Kenwood products
Auto Room Tuner (ART) integrated software simplifies setup of complex audio systems
SHARC Melody ART
Automatically measures and corrects multi-channel sound system for room’s acoustics
Corrects system deficiencies
For each speaker, ART calculates:
  Sound pressure level (SPL)
  Distance of each speaker from listener
  Frequency response
SHARC in Metric Halo's Portable Pro Audio Hub
Portable FireWire-based recording device, used in live recording applications by motion pictures and major recording artists like “No Doubt” and “Dave Matthews Band”
Serial ports used to interface to digital and mixed-signal peripheral devices
Initially implemented on SHARC ADSP-2106x, later upgraded to ADSP-2126x
Future hybrid implementation will use an ADSP-2106x for FireWire processing coupled with an ADSP-2126x for audio processing
SHARC in Alacron FT-P5
COTS (Commercial Off-The-Shelf) system for use in “distributed, compute intensive, high data rate applications” in commercial and military industries
Supports 1 to 96 ADSP-2106x processors
Makes extensive use of SHARC’s DMA through external PMC interface, supporting full-duplex communication in excess of 1 GB/sec
  In-cabinet SAN clusters
  Compute nodes in distributed systems
SHARC vs. RISC Processors
RISC is...
  Less costly to design, test, and manufacture, since processors are less specialized
But...
  Parallel (stereo) computation requires two or more interconnected processors accessing shared memory
  Lower performance
Conclusion
SHARC offers a great deal of computational power, with on-chip SRAM and SIMD architecture
A variety of applications (especially audio processing) exploit it
Citations
Processor details taken from product manuals and descriptions at http://www.analog.com
December 8, 2003 Other ISA's 1
Other ISAs
• Next, we discuss some alternative instruction set designs.
  – Different ways of specifying memory addresses
  – Different numbers and types of operands in ALU instructions
  – A couple of advanced instruction sets
    • VLIW (Very Long Instruction Word)
      – Texas Instruments C64
      – Analog Devices TigerSHARC
    • ARM and Thumb
Addressing modes
• The first instruction set design issue we’ll see is addressing modes, which let you specify memory addresses in various ways.
  – Each mode has its own assembly language notation.
  – Different modes may be useful in different situations.
  – The location that is actually used is called the effective address.
• The addressing modes that are available will depend on the datapath.
  – Our simple datapath only supports two forms of addressing.
  – Older processors like the 8086 have zillions of addressing modes.
• We’ll introduce some of the more common ones.
Immediate addressing
• One of the simplest modes is immediate addressing, where the operand itself is accessed.
LD R1, #1999 R1 ← 1999
• This mode is a good way to specify initial values for registers.
• We’ve already used immediate addressing several times.
  – It appears in the string conversion program you just saw.
Direct addressing
• Another possible mode is direct addressing, where the operand is a constant that represents a memory address.
LD R1, 500 R1 ← M[500]
• Here the effective address is 500, the same as the operand.
• This is useful for working with pointers.
  – You can think of the constant as a pointer.
  – The register gets loaded with the data at that address.
Register indirect addressing
• We already saw register indirect mode, where the operand is a register that contains a memory address.
LD R1, (R0) R1 ← M[R0]
• The effective address would be the value in R0.
• This is also useful for working with pointers. In the example above,
  – R0 is a pointer, and R1 is loaded with the data at that address.
  – This is similar to R1 = *R0 in C or C++.
• So what’s the difference between direct mode and this one?
  – In direct mode, the address is a constant that is hard-coded into the program and cannot be changed.
  – Here the contents of R0, and hence the address being accessed, can easily be changed.
Stepping through arrays
• Register indirect mode makes it easy to access contiguous locations in memory, such as elements of an array.
• If R0 is the address of the first element in an array, we can easily access the second element too:
    LD R1, (R0)      // R1 contains the first element
    ADD R0, R0, #1
    LD R2, (R0)      // R2 contains the second element
• This is so common that some instruction sets can automatically increment the register for you:
    LD R1, (R0)+     // R1 contains the first element
    LD R2, (R0)+     // R2 contains the second element
• Such instructions can be used within loops to access an entire array.
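As a rough illustration of using auto-increment loads inside a loop, the Python sketch below models `LD dest, (pointer)+` and sums a three-element array. The dictionary-based memory and register file, the helper name `load_auto_increment`, and the sample values are all assumptions made for the example.

```python
def load_auto_increment(memory, registers, dest, pointer):
    """Model of LD dest, (pointer)+ : load M[pointer], then add 1 to pointer."""
    registers[dest] = memory[registers[pointer]]
    registers[pointer] += 1


memory = {100: 7, 101: 8, 102: 9}          # a 3-element array starting at 100
registers = {"R0": 100, "R1": 0}

total = 0
for _ in range(3):                          # one load per array element
    load_auto_increment(memory, registers, "R1", "R0")
    total += registers["R1"]
```

After the loop R0 points one past the end of the array, which is the usual loop-exit condition for pointer-stepping code.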
Indexed addressing
• Operands with indexed addressing include a constant and a register.
LD R1, 500(R0) R1 ← M[R0 + 500]
• The effective address is the register data plus the constant. For instance, if R0 = 25, the effective address here would be 525.
• We can use this addressing mode to access arrays also.
  – The constant is the array address, while the register contains an index into the array.
  – The example instruction above might be used to load the 25th element of an array that starts at memory location 500.
• It’s possible to use negative constants too, which would let you index arrays backwards.
PC-relative addressing
• We’ve seen PC-relative addressing already. The operand is a constant that is added to the program counter to produce the effective memory address.
200: LD R1, $30 R1 ← M[201 + 30]
• The PC usually points to the address of the next instruction, so the effective address here is 231 (assuming the LD instruction itself uses one word of memory).
• This is similar to indexed addressing, except the PC is used instead of a regular register.
• Relative addressing is often used in jump and branch instructions.
  – For instance, JMP $30 lets you skip the next 30 instructions.
  – A negative constant lets you jump backwards, which is common in writing loops.
Indirect addressing
• The most complicated mode that we’ll look at is indirect addressing.
LD R1, [360] R1 ← M[M[360]]
• The operand is a constant that specifies a memory location which refers to another location, whose contents are then accessed.
• The effective address here is M[360].
• Indirect addressing is useful for working with multi-level pointers, or “handles.”
  – The constant represents a pointer to a pointer.
  – In C, we might write something like R1 = **ptr.
Addressing mode summary
Mode                Notation             Register transfer equivalent
Immediate           LD R1, #CONST        R1 ← CONST
Direct              LD R1, CONST         R1 ← M[CONST]
Register indirect   LD R1, (R0)          R1 ← M[R0]
Indexed             LD R1, CONST(R0)     R1 ← M[R0 + CONST]
Relative            LD R1, $CONST        R1 ← M[PC + CONST]
Indirect            LD R1, [CONST]       R1 ← M[M[CONST]]
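The summary table translates almost line for line into code. The Python sketch below computes the effective address and loaded value for each mode, reusing the numbers from the earlier slides (R0 = 25, the constant 500, the LD at address 200). The function name `load`, the mode strings, and the memory contents are illustrative assumptions.

```python
def load(mode, const, memory, registers, pc=0):
    """Return (effective_address, value) for LD R1 in the given addressing mode."""
    if mode == "immediate":
        return None, const                 # R1 <- CONST, no memory access
    if mode == "direct":
        address = const                    # R1 <- M[CONST]
    elif mode == "register_indirect":
        address = registers["R0"]          # R1 <- M[R0]
    elif mode == "indexed":
        address = registers["R0"] + const  # R1 <- M[R0 + CONST]
    elif mode == "relative":
        address = pc + const               # pc already points at the next instruction
    elif mode == "indirect":
        address = memory[const]            # R1 <- M[M[CONST]]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return address, memory[address]


# Sample memory contents chosen for the example.
memory = {25: 22, 231: 44, 360: 500, 500: 11, 525: 33}
registers = {"R0": 25}

results = {
    "immediate": load("immediate", 1999, memory, registers),
    "direct": load("direct", 500, memory, registers),
    "register_indirect": load("register_indirect", 0, memory, registers),
    "indexed": load("indexed", 500, memory, registers),
    "relative": load("relative", 30, memory, registers, pc=201),
    "indirect": load("indirect", 360, memory, registers),
}
```

Only the indirect mode touches memory twice, which is one reason it is the slowest of the six.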
Number of operands
• Another way to classify instruction sets is according to the number of operands that each data manipulation instruction can have.
• Our example instruction set had three-address instructions, because each one had up to three operands—two sources and one destination.
• This provides the most flexibility, but it’s also possible to have fewer than three operands.
    ADD R0, R1, R2     (operation: ADD; operands: destination R0, sources R1 and R2)
    Register transfer instruction: R0 ← R1 + R2
Two-address instructions
• In a two-address instruction, the first operand serves as both the destination and one of the source registers.
    ADD R0, R1     (operation: ADD; operands: destination and source 1 is R0, source 2 is R1)
    Register transfer instruction: R0 ← R0 + R1
• Some other examples and the corresponding C code:
    ADD R3, #1     R3 ← R3 + 1     R3++;
    MUL R1, #5     R1 ← R1 * 5     R1 *= 5;
    NOT R1         R1 ← R1’        R1 = ~R1;
One-address instructions
• Some computers, like the old Apple II, have one-address instructions.
• The CPU has a special register called an accumulator, which implicitly serves as the destination and one of the sources.
    ADD R0     (operation: ADD; source: R0)
    Register transfer instruction: ACC ← ACC + R0
• Here is an example sequence which increments M[R0]:
    LD (R0)     ACC ← M[R0]
    ADD #1      ACC ← ACC + 1
    ST (R0)     M[R0] ← ACC
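The increment sequence can be traced with a tiny accumulator-machine interpreter in Python. This is a sketch under simplifying assumptions: `LD` and `ST` always take a register holding an address, `ADD` always takes an immediate constant, and the function name `run_accumulator` and the sample memory are made up for the example.

```python
def run_accumulator(program, memory, registers):
    """Execute one-address instructions; every result lands in the accumulator."""
    acc = 0
    for op, operand in program:
        if op == "LD":                 # ACC <- M[register]
            acc = memory[registers[operand]]
        elif op == "ADD":              # ACC <- ACC + constant
            acc += operand
        elif op == "ST":               # M[register] <- ACC
            memory[registers[operand]] = acc
        else:
            raise ValueError(f"unknown operation: {op}")
    return acc


memory = {40: 9}
registers = {"R0": 40}
increment = [("LD", "R0"), ("ADD", 1), ("ST", "R0")]
final_acc = run_accumulator(increment, memory, registers)
```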
The ultimate: zero addresses
• If the destination and sources are all implicit, then you don’t have to specify any operands at all!
• This is possible with processors that use a stack architecture.
  – HP calculators and their “reverse Polish notation” use a stack.
  – The Java Virtual Machine is also stack-based.
• How can you do calculations with a stack?
  – Operands are pushed onto a stack. The most recently pushed element is at the “top” of the stack (TOS).
  – Operations use the topmost stack elements as their operands. Those values are then replaced with the operation’s result.
Stack architecture example
• From left to right, here are three stack instructions, and what the stack looks like after each example instruction is executed.

                PUSH R1           PUSH R2           ADD
    (Top)       R1                R2                R1 + R2
                … stuff 1 …       R1                … stuff 1 …
                … stuff 2 …       … stuff 1 …       … stuff 2 …
    (Bottom)                      … stuff 2 …

• This sequence of stack operations corresponds to one register transfer instruction:
    TOS ← R1 + R2
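A zero-address machine is easy to sketch in Python with a list as the stack. The interpreter below supports `PUSH`, `ADD`, and `MUL`; the function name `run_stack`, the tuple encoding of instructions, and the register values are assumptions for illustration. `MUL` is included so the same sketch can evaluate a product of two sums without any explicit operands on the arithmetic instructions.

```python
def run_stack(program, registers):
    """Run a zero-address program; arithmetic ops take operands from the stack."""
    stack = []
    for instruction in program:
        op = instruction[0]
        if op == "PUSH":
            stack.append(registers[instruction[1]])
        elif op in ("ADD", "MUL"):
            right = stack.pop()            # topmost element
            left = stack.pop()             # element below it
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown operation: {op}")
    return stack


registers = {"R1": 3, "R2": 4, "R3": 5, "R4": 6}

# The slide's example: TOS <- R1 + R2
simple = run_stack([("PUSH", "R1"), ("PUSH", "R2"), ("ADD",)], registers)

# (R1 + R2) * (R3 + R4) with no explicit operands on ADD or MUL
product = run_stack(
    [("PUSH", "R1"), ("PUSH", "R2"), ("ADD",),
     ("PUSH", "R3"), ("PUSH", "R4"), ("ADD",),
     ("MUL",)],
    registers,
)
```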
Data movement instructions
• Finally, the types of operands allowed in data manipulation instructions is another way of characterizing instruction sets.
  – So far, we’ve assumed that ALU operations can have only register and constant operands.
  – Many real instruction sets allow memory-based operands as well.
• We’ll use the book’s example and illustrate how the following operation can be translated into some different assembly languages.
X = (A + B)(C + D)
• Assume that A, B, C, D and X are really memory addresses.
Register-to-register architectures
• Our programs so far assume a register-to-register, or load/store, architecture, which matches our datapath from last week nicely.
  – Operands in data manipulation instructions must be registers.
  – Other instructions are needed to move data between memory and the register file.
• With a register-to-register, three-address instruction set, we might translate X = (A + B)(C + D) into:

    LD R1, A          R1 ← M[A]         // Use direct addressing
    LD R2, B          R2 ← M[B]
    ADD R3, R1, R2    R3 ← R1 + R2      // R3 = M[A] + M[B]

    LD R1, C          R1 ← M[C]
    LD R2, D          R2 ← M[D]
    ADD R1, R1, R2    R1 ← R1 + R2      // R1 = M[C] + M[D]

    MUL R1, R1, R3    R1 ← R1 * R3      // R1 has the result
    ST X, R1          M[X] ← R1         // Store that into M[X]
Memory-to-memory architectures
• In memory-to-memory architectures, all data manipulation instructions use memory addresses as operands.
• With a memory-to-memory, three-address instruction set, we might translate X = (A + B)(C + D) into simply:

    ADD X, A, B    M[X] ← M[A] + M[B]
    ADD T, C, D    M[T] ← M[C] + M[D]    // T is temporary storage
    MUL X, X, T    M[X] ← M[X] * M[T]

• How about with a two-address instruction set?

    MOVE X, A      M[X] ← M[A]           // Copy M[A] to M[X] first
    ADD X, B       M[X] ← M[X] + M[B]    // Add M[B]
    MOVE T, C      M[T] ← M[C]           // Copy M[C] to M[T]
    ADD T, D       M[T] ← M[T] + M[D]    // Add M[D]
    MUL X, T       M[X] ← M[X] * M[T]    // Multiply
Register-to-memory architectures
• Finally, register-to-memory architectures let the data manipulation instructions access both registers and memory.
• With two-address instructions, we might do the following:
    LD R1, A       R1 ← M[A]          // Load M[A] into R1 first
    ADD R1, B      R1 ← R1 + M[B]     // Add M[B]
    LD R2, C       R2 ← M[C]          // Load M[C] into R2
    ADD R2, D      R2 ← R2 + M[D]     // Add M[D]
    MUL R1, R2     R1 ← R1 * R2       // Multiply
    ST X, R1       M[X] ← R1          // Store
Size and speed
• There are lots of tradeoffs in deciding how many and what kind of operands and addressing modes to support in a processor.
• These decisions can affect the size of machine language programs.
  – Memory addresses are long compared to register file addresses, so instructions with memory-based operands are typically longer than those with register operands.
  – Permitting more operands also leads to longer instructions.
• There is also an impact on the speed of the program.
  – Memory accesses are much slower than register accesses.
  – Longer programs require more memory accesses, just for loading the instructions!
• Most newer processors use register-to-register designs.
  – Reading from registers is faster than reading from RAM.
  – Using register operands also leads to shorter instructions.
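To put rough numbers on the tradeoff, the Python sketch below counts instructions and data-memory accesses for the four translations of X = (A + B)(C + D) shown on the preceding slides. The counting rules are simplifying assumptions: every memory operand that is read or written costs one access, instruction fetches are ignored, and in a two-address ALU instruction the destination is also read. The names `MEMORY_NAMES` and `data_accesses` are hypothetical.

```python
MEMORY_NAMES = {"A", "B", "C", "D", "X", "T"}    # operands that live in memory


def data_accesses(program):
    """Count data-memory reads and writes made by a list of instructions."""
    total = 0
    for mnemonic, *operands in program:
        destination = operands[0]
        if mnemonic in ("LD", "ST", "MOVE") or len(operands) == 3:
            sources = operands[1:]
        else:                                    # two-address ALU operation:
            sources = operands                   # the destination is also a source
        total += sum(name in MEMORY_NAMES for name in sources)   # reads
        total += destination in MEMORY_NAMES                     # write
    return total


register_to_register = [
    ("LD", "R1", "A"), ("LD", "R2", "B"), ("ADD", "R3", "R1", "R2"),
    ("LD", "R1", "C"), ("LD", "R2", "D"), ("ADD", "R1", "R1", "R2"),
    ("MUL", "R1", "R1", "R3"), ("ST", "X", "R1"),
]
memory_three_address = [
    ("ADD", "X", "A", "B"), ("ADD", "T", "C", "D"), ("MUL", "X", "X", "T"),
]
memory_two_address = [
    ("MOVE", "X", "A"), ("ADD", "X", "B"), ("MOVE", "T", "C"),
    ("ADD", "T", "D"), ("MUL", "X", "T"),
]
register_to_memory = [
    ("LD", "R1", "A"), ("ADD", "R1", "B"), ("LD", "R2", "C"),
    ("ADD", "R2", "D"), ("MUL", "R1", "R2"), ("ST", "X", "R1"),
]

for name, program in [
    ("register-to-register", register_to_register),
    ("memory-to-memory (3-address)", memory_three_address),
    ("memory-to-memory (2-address)", memory_two_address),
    ("register-to-memory", register_to_memory),
]:
    print(f"{name}: {len(program)} instructions, {data_accesses(program)} data accesses")
```

Under these rules the shortest program (three memory-to-memory instructions) makes nearly twice as many data accesses as the eight-instruction load/store version, which is the size-versus-speed tension the slide describes.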
Texas Instruments C64
VLIW signal processor
TI C64: Architecture
[Block diagram of the TMS320C64x CPU: program cache/program memory (32-bit addresses, 256-bit data) feeds program fetch, instruction dispatch, and instruction decode; register file A serves units .L1, .S1, .M1, .D1 and register file B serves units .D2, .M2, .S2, .L2; data cache/data memory uses 32-bit addresses and 8-, 16-, 32-, 64-bit data.]
Functional units:
  6 ALUs (L1, L2, S1, S2, D1, D2)
  2 multipliers (M1, M2)
TMS320C64x Data Paths
The data path of C64x has the following components:
  Two load-from-memory data paths;
  Two store-to-memory data paths;
  Two data address paths;
  Two register file data cross paths;
[Figure: data path A (register file A, A0-A31, with units .L1, .S1, .M1, .D1) and data path B (register file B, B0-B31, with units .L2, .S2, .M2, .D2), with load paths (LD1a, LD1b), store paths (ST1a, ST1b, ST2a, ST2b), and data address paths (DA1, DA2) as labeled in the figure.]
TI C64: Functional Units (Structure)
[Figure: ports of the .L1, .S1, .M1, and .D1 functional units, labeled src1, src2, dst, long dst, and long src.]
Each functional unit has its own 32-bit write port into a general-purpose register file;
Each functional unit reads directly from the register file in its own data path;
All units ending in 1 write to register file A, and all units ending in 2 write to register file B;
Each functional unit has two 32-bit read ports for source operands src1 and src2;
L and S units have an extra 8-bit-wide port for 40-bit long writes, as well as an 8-bit input for 40-bit long reads;
Each C64x multiplier can return up to a 64-bit result;
TI C64: .L (.L1 and .L2) Unit Operations Performed
• 32/40-bit arithmetic and compare operations
• 32-bit logical operations
• Leftmost 1 or 0 counting for 32 bits
• Normalization count for 32 and 40 bits
• Byte shifts
• Data packing/unpacking
• 5-bit constant generation
• Vector Operations:
  – Dual 16-bit arithmetic operations
  – Quad 8-bit arithmetic operations
  – Dual 16-bit min/max operations
  – Quad 8-bit min/max operations
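A "dual 16-bit arithmetic operation" treats one 32-bit register as two independent 16-bit lanes. The sketch below models that idea in plain Python; the function name `dual_add16` is hypothetical, and wraparound (non-saturating) behavior in each lane is an assumption of the sketch.

```python
# Sketch of a dual 16-bit add: two independent lanes packed in a 32-bit word.
# Each lane wraps around on overflow; no carry crosses from the low lane
# into the high lane.

def dual_add16(a, b):
    lo = ((a & 0xFFFF) + (b & 0xFFFF)) & 0xFFFF
    hi = (((a >> 16) & 0xFFFF) + ((b >> 16) & 0xFFFF)) & 0xFFFF
    return (hi << 16) | lo

packed = dual_add16(0x0001FFFF, 0x00010001)
print(hex(packed))                                  # 0x20000
print(hex((0x0001FFFF + 0x00010001) & 0xFFFFFFFF))  # 0x30000 for a plain 32-bit add
```

The two results differ because the plain 32-bit add lets the carry out of the low half leak into the high half, which a lane-wise operation must prevent.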
.S (.S1 and .S2) Unit Operations Performed
• 32-bit arithmetic operations
• 32/40-bit shifts and 32-bit bit-field operations
• 32-bit logical operations
• Branches
• Constant generation
• Register transfers to/from control register file (.S2 only)
• Byte shifts
• Data packing/unpacking
• Vector Operations
  – Dual 16-bit compare operations
  – Quad 8-bit compare operations
  – Dual 16-bit shift operations
  – Dual 16-bit saturated arithmetic operations
  – Quad 8-bit saturated arithmetic operations
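Saturated arithmetic clamps a result to the representable range instead of wrapping around. The following sketch shows a dual 16-bit signed saturating add; the function names are hypothetical and the code models the general idea rather than the exact behavior of any specific C64x instruction.

```python
# Sketch of dual 16-bit signed saturating addition on a packed 32-bit word.

def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def saturate16(x):
    """Clamp x to the signed 16-bit range."""
    return max(-32768, min(32767, x))

def dual_saturating_add16(a, b):
    lanes = []
    for shift in (0, 16):  # low lane first, then high lane
        total = to_signed16(a >> shift) + to_signed16(b >> shift)
        lanes.append(saturate16(total) & 0xFFFF)
    return (lanes[1] << 16) | lanes[0]

# High lane: 32767 + 1 clamps to 32767 (0x7FFF).
# Low lane: -32768 + (-1) clamps to -32768 (0x8000).
print(hex(dual_saturating_add16(0x7FFF8000, 0x0001FFFF)))  # 0x7fff8000
```

Clamping is usually preferred in signal processing because a wrapped overflow flips the sign of a sample, which is far more audible or visible than a clipped one.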
.M (.M1 and .M2) Unit Operations Performed
• 16 x 16 multiply operations
• 16 x 32 multiply operations
• Vector Operations
  – Quad 8 x 8 multiply operations
  – Dual 16 x 16 multiply operations
  – Dual 16 x 16 multiply with add/subtract operations
  – Quad 8 x 8 multiply with add operation
• Bit expansion
• Bit interleaving/de-interleaving
• Variable shift operations
• Rotation
• Galois Field Multiply
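Galois field multiplication differs from ordinary multiplication in that partial products are combined with XOR and the result is reduced by a field polynomial. The sketch below multiplies two elements of GF(2^8); the default polynomial 0x11B is an assumption picked because it has a well-known worked example, and it is not claimed to be the polynomial the C64x uses.

```python
# Sketch of multiplication in GF(2^8) by shift-and-XOR with polynomial reduction.
# poly=0x11B is an illustrative choice, not a statement about the C64x hardware.

def gf256_mul(a, b, poly=0x11B):
    result = 0
    while b:
        if b & 1:          # add (XOR) the current multiple of a
            result ^= a
        a <<= 1            # multiply a by x
        if a & 0x100:      # reduce when the degree reaches 8
            a ^= poly
        b >>= 1
    return result

print(hex(gf256_mul(0x57, 0x83)))  # 0xc1
print(hex(gf256_mul(0x57, 0x13)))  # 0xfe
```

Operations of this kind appear in error-correcting codes such as Reed-Solomon, where a dedicated instruction replaces the loop above.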
.D (.D1 and .D2) Unit Operations Performed
• 32-bit add, subtract, linear and circular address calculation (for circular arrays)
• Loads and stores with 5-bit constant offset
• Loads and stores with 15-bit constant offset (.D2 only)
• Load and store double words with 5-bit constant
• Load and store non-aligned words and double words
• 5-bit constant generation
• 32-bit logical operations
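Circular address calculation keeps a pointer inside a fixed-size buffer by wrapping it at the buffer boundary. The sketch below assumes a power-of-two block size aligned to its own size, so that only the low address bits change; the function name and the sample addresses are hypothetical.

```python
# Sketch of circular address arithmetic for a power-of-two, size-aligned block.
# Only the low bits of the address are updated; the upper bits stay fixed.

def circular_add(addr, offset, block_size):
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a positive power of two")
    mask = block_size - 1
    return (addr & ~mask) | ((addr + offset) & mask)

print(hex(circular_add(0x100C, 8, 16)))   # 0x1004: wraps past the end of the block
print(hex(circular_add(0x1000, -4, 16)))  # 0x100c: wraps below the start
print(hex(circular_add(0x1004, 4, 16)))   # 0x1008: no wrap needed
```

Linear addressing would simply return `addr + offset`; the circular form removes the explicit bounds check from filter and delay-line loops.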
Instruction to Functional Unit Mapping
.D Unit: ADD, ADDAB, ADDAH, ADDAW, LDB, LDBU, LDH, LDHU, LDW, LDB (15-bit offset)‡, LDBU (15-bit offset)‡, LDH (15-bit offset)‡, LDHU (15-bit offset)‡, LDW (15-bit offset)‡, MV, STB, STH, STW, STB (15-bit offset)‡, STH (15-bit offset)‡, STW (15-bit offset)‡, SUB, SUBAB, SUBAH, SUBAW, ZERO

.S Unit: ADD, ADDK, ADD2, AND, B disp, B IRP†, B NRP†, B reg, CLR, EXT, EXTU, MV, MVC†, MVK, MVKH, MVKLH, NEG, NOT, OR, SET, SHL, SHR, SHRU, SSHL, SUB, SUBU, SUB2, XOR, ZERO

.M Unit: MPY, MPYU, MPYUS, MPYSU, MPYH, MPYHU, MPYHUS, MPYHSU, MPYHL, MPYHLU, MPYHULS, MPYHSLU, MPYLH, MPYLHU, MPYLUHS, MPYLSHU, SMPY, SMPYHL, SMPYLH, SMPYH

.L Unit: ABS, ADD, ADDU, AND, CMPEQ, CMPGT, CMPGTU, CMPLT, CMPLTU, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBU, SUBC, XOR, ZERO
Instruction Packets
• Instructions are always fetched 8 at a time (256 bits). This group is called a fetch packet.
• If the p-bit of instruction i is set, then instructions i and i+1 are executed in parallel in the same cycle.
• 1 to 8 instructions can be executed in parallel. Such a group is called an execute packet.
• In the C62x, packets could not cross the 8-word boundary, so the 8th p-bit was always 0 and padding with NOPs was needed. The C64x did away with that restriction, and execute packets may now span multiple fetch packets.
Fetch Packet Example
Cycle/Execute Packet    Instructions
1                       A B C
2                       D
3                       E F
4                       G H
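The p-bit rule can be sketched as a small grouping function. The p-bit values below are not given on the slide; they are inferred as the pattern that would produce the four execute packets shown in the example, and the function name is hypothetical.

```python
# Sketch: split one fetch packet into execute packets using the p-bits.
# A set p-bit means "the next instruction runs in the same cycle as this one".

def execute_packets(instructions, p_bits):
    packets, current = [], []
    for instruction, p in zip(instructions, p_bits):
        current.append(instruction)
        if not p:              # p-bit clear: this instruction ends the packet
            packets.append(current)
            current = []
    if current:
        # The last p-bit was set, so the packet is still open at the end of
        # the fetch packet; this sketch simply returns it as-is.
        packets.append(current)
    return packets

# Assumed p-bits for instructions A..H that reproduce the example above.
P_BITS = [1, 1, 0, 0, 1, 0, 1, 0]
for cycle, packet in enumerate(execute_packets("ABCDEFGH", P_BITS), start=1):
    print(cycle, " ".join(packet))
```

Each printed line corresponds to one cycle, so the eight fetched instructions take four cycles to issue in this example.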
C64x Opcode Map

Operations on the .L unit:
  Bits:   31-29  28  27-23  22-18  17-13     12  11-5  4-2    1  0
  Field:  creg   z   dst    src2   src1/cst  x   op    1 1 0  s  p

Operations on the .M unit:
  Bits:   31-29  28  27-23  22-18  17-13     12  11-7  6-2        1  0
  Field:  creg   z   dst    src2   src1/cst  x   op    0 0 0 0 0  s  p

Operations on the .D unit:
  Bits:   31-29  28  27-23  22-18  17-13     12-7  6-2        1  0
  Field:  creg   z   dst    src2   src1/cst  op    1 0 0 0 0  s  p
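The bit positions in the .L unit layout can be exercised with a small field extractor. The helper names and the sample word are hypothetical; in particular the `op` value used here is an arbitrary placeholder, not the encoding of any real instruction.

```python
# Sketch: pull the fields of the .L unit format out of a 32-bit word.

def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of word as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_l_unit(word):
    return {
        "creg": bits(word, 31, 29),
        "z": bits(word, 28, 28),
        "dst": bits(word, 27, 23),
        "src2": bits(word, 22, 18),
        "src1": bits(word, 17, 13),
        "x": bits(word, 12, 12),
        "op": bits(word, 11, 5),
        "fixed": bits(word, 4, 2),   # 0b110 identifies the .L format
        "s": bits(word, 1, 1),
        "p": bits(word, 0, 0),
    }

# Hand-assembled placeholder word: dst=3, src2=2, src1=1, op=0x15 (arbitrary),
# fixed bits 110, s=0, p=1.
WORD = (3 << 23) | (2 << 18) | (1 << 13) | (0x15 << 5) | (0b110 << 2) | 1
fields = decode_l_unit(WORD)
print(fields["dst"], fields["src2"], fields["src1"], fields["p"])  # 3 2 1 1
```

The same `bits` helper applies to the other formats; only the field boundaries and the fixed low bits change.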
C64x Opcode Map

Load/store with 15-bit offset on the .D unit:
  Bits:   31-29  28  27-23    22-8    7  6-4    3-2  1  0
  Field:  creg   z   dst/src  ucst15  y  ld/st  1 1  s  p

Load/store with baseR + offset/cst on the .D unit:
  Bits:   31-29  28  27-23    22-18  17-13         12-9  8  7  6-4    3-2  1  0
  Field:  creg   z   dst/src  baseR  offset/ucst5  mode  r  y  ld/st  1 1  s  p

Operations on the .S unit:
  Bits:   31-29  28  27-23  22-18  17-13     12  11-6  5-2      1  0
  Field:  creg   z   dst    src2   src1/cst  x   op    1 0 0 0  s  p

ADDK on the .S unit:
  Bits:   31-29  28  27-23  22-7  6-2        1  0
  Field:  creg   z   dst    cst   1 0 1 0 0  s  p
Analog Devices TigerSHARC
VLIW Vector Signal Processor
ADI TigerSHARC: Core Block Diagram
ADI TigerSHARC: Computation Block Block Diagram
Register Data Formats
Instruction Line Organization
Instruction Encoding
Compute Block
IALU
Load and Store
Sequencer
ARM and Thumb
Low Power General Purpose Microprocessors
ARM Family Overview
• Architecture Versions
  – ARM V3, V4, V5, V6
  – Called “architecture” in their literature, this is the programmer’s view of the machine
    • The externally visible architecture
    • It is primarily a matter of Instruction Set Architecture
• Implementations
  – ARM7, ARM9, ARM10, ARM11
    • With letter extensions – to be explained shortly
  – Called “cores” in their literature
ARM Evolution
28 Jan 2005 Copyright ARM Ltd. 2002
ARM11 MicroArchitecture
ARMv5T (ARM)
ARMv5T (Thumb)
Summary
• Instruction sets can be classified along several lines.
  – Addressing modes let instructions access memory in various ways.
  – Data manipulation instructions can have from 0 to 3 operands.
  – Those operands may be registers, memory addresses, or both.
• Instruction set design is intimately tied to processor datapath design.
• VLIW and compact, low-power instruction sets represent endpoints on a continuum.
  – The VLIW uses enormous instruction fetch bandwidth to keep lots of functional units busy.
  – Thumb mode attempts to pack irregular control code into as few bits as possible to save instruction fetch bandwidth (power).