powerpc 601 - nxp semiconductorscache.freescale.com/files/32bit/doc/user_guide/mpc601um.pdfxlii...

MPR601UM-01 MPC601UM/AD

PowerPC

™

601

RISC Microprocessor User's Manual

About This Book

xli

About This Book

The primary objective of this user’s manual is to define the functionality of the PowerPC™601 microprocessor for use by software and hardware developers. The 601 processor is thefirst in the family of PowerPC microprocessors, and can provide a reliable foundation fordeveloping products compatible with subsequent processors in the PowerPC family. The601 provides a bridge between the POWER architecture and the PowerPC architecture, andas a result differs from the PowerPC architecture in some respects. Therefore, a secondaryobjective of this manual is to describe these differences.

The PowerPC architecture is comprised of the following components:

• PowerPC user instruction set architecture—This includes the base user-level instruction set (excluding a few user-level cache-control instructions), user-level registers, programming model, data types, and addressing modes.

• PowerPC virtual environment architecture—This describes the semantics of the memory model that can be assumed by software processes and includes descriptions of the cache model, cache-control instructions, address aliasing, and other related issues. Implementations that conform to the PowerPC virtual environment architecture also adhere to the PowerPC user instruction set architecture, but may not necessarily adhere to the PowerPC operating environment architecture.

• PowerPC operating environment architecture—This includes the structure of the memory management model, supervisor-level registers, and the exception model. Implementations that conform to the PowerPC operating environment architecture also adhere to the PowerPC user instruction set architecture and the PowerPC virtual environment architecture.

It is beyond the scope of the manual to provide a thorough description of the PowerPCarchitecture. It must be kept in mind that each PowerPC processor is a unique PowerPCimplementation.

For readers of this manual who are concerned about compatibility issues regardingsubsequent PowerPC processors, it is critical to read Chapter 1, “Overview,” and inparticular Appendix H, “Implementation Summary for Programmers,” which outlines in avery general manner the components of the PowerPC architecture, and indicates where andhow the 601 diverges from the PowerPC definition. Instances where the 601 differs fromthe PowerPC architecture are noted throughout the manual.

xlii

PowerPC 601 RISC Microprocessor User's Manual

Audience

This manual is intended for system software and hardware developers and applicationsprogrammers who want to develop products for the 601 microprocessor and PowerPCprocessors in general. It is assumed that the reader understands operating systems,microprocessor system design, and the basic principles of RISC processing.

Organization

Following is a summary and a brief description of the major sections of this manual:

• Chapter 1, “Overview,” is useful for readers who want a general understanding of the features and functions of the PowerPC architecture and the 601 processor. This chapter also provides a general description of how the 601 differs from the PowerPC architecture.

• Chapter 2, “Registers and Data Types,” is useful for software engineers who need to understand the PowerPC programming model and the functionality of the registers implemented in the 601. This chapter also describes PowerPC conventions for storing data in memory.

• Chapter 3, “Addressing Modes and Instruction Set Summary,” provides an overview of the PowerPC addressing modes and a description of the instructions implemented by the 601, including the portion of the PowerPC instruction set and the additional instructions implemented by the 601.

Specific differences between the 601 implementation and the PowerPC implementation of individual instructions are noted.

• Chapter 4, “Cache and Memory Unit Operation,” provides a discussion of cache timing, look-up process, MESI protocol, and interaction with other units. This chapter contains information that pertains both to the PowerPC virtual environment architecture and to the specific implementation in the 601.

• Chapter 5, “Exceptions,” describes the exception model defined in the PowerPC operating environment architecture and the specific exception model implemented in the 601.

• Chapter 6, “Memory Management Unit,” provides descriptions of the MMU, interaction with other units, and address translation. Although this chapter does not provide an in-depth description of both the 64-bit and 32-bit memory management model defined by the PowerPC operating environment architecture, it does note differences between the defined 32-bit PowerPC definition and the 601 memory management implementation.

• Chapter 7, “Instruction Timing,” provides information about latencies, interlocks, special situations, and various conditions to help make programming more efficient. This chapter is of special interest to software engineers and system designers. Because each PowerPC implementation is unique with respect to instruction timing, this chapter primarily contains information specific to the 601.

About This Book

xliii

• Chapter 8, “Signal Descriptions,” provides descriptions of individual signals of the 601.

• Chapter 9, “System Interface Operation,” describes signal timings for various operations. It also provides information for interfacing to the 601.

• Chapter 10, “Instruction Set,” functions as a handbook of the PowerPC instruction set. It provides opcodes, sorted by mnemonic, as well as a more detailed description of each instruction. Instruction descriptions indicate whether an instruction is part of the PowerPC architecture or if it is specific to the 601. Each description indicates any differences in how the 601 implementation differs from the PowerPC definition. The descriptions also indicate the privilege level of each instruction and which execution unit or units executes the instruction.

• Appendix A, “Instruction Set Listings,” lists the superset of PowerPC and 601 processor instructions.

• Appendix B, “POWER Architecture Cross Reference,” describes the relationship between the 601 and the POWER architecture.

• Appendix C, “PowerPC Instructions Not Implemented,” describes the set of PowerPC instructions not implemented in the 601 processor.

• Appendix D, “Classes of Instructions,” describes how instructions are classified from the perspective of the PowerPC architecture.

• Appendix E, “Multiple-Precision Shifts,” describes how multiple-precision shift operations can be programmed.

• Appendix F, “Floating-Point Models,” gives examples of how the floating-point conversion instructions can be used to perform various conversions.

• Appendix G, “Synchronization Programming Examples,” gives examples showing how synchronization instructions can be used to emulate various synchronization primitives and how to provide more complex forms of synchronization.

• Appendix H, “Implementation Summary for Programmers,” is a compilation of the differences between the 601 processor and the PowerPC architecture.

• Appendix I, “Instruction Timing Examples,” shows instruction timings for code sequences, emphasizing situations where stalls may be encountered and showing methods of avoiding stalls where possible.

• This manual also includes a glossary and an index.

In this document, the terms “PowerPC 601 microprocessor” and “601” are used to denotethe first microprocessor from the PowerPC architecture family. The PowerPC 601microprocessors are available from IBM as PPC601 and from Motorola as MPC601.

xliv


Additional Reading

Following is a list of additional reading that provides background for the information inthis manual:

• John L. Hennessy and David A. Patterson,

Computer Architecture: A Quantitative Approach

, Morgan Kaufmann Publishers, Inc., San Mateo, CA

•

PowerPC 601 RISC Microprocessor Hardware Specifications

, MPC601EC/D (Motorola order number) and MPR601HSU-01 (IBM order number)

•

PowerPC 601 RISC Microprocessor Technical Summary

, MPC601/D (Motorola order number) and MPR601TSU-01 (IBM order number)

•

PowerPC Architecture

, published by International Business Machines Corporation, 52G7487 (order number)

Conventions

This document uses the following notational conventions:

ACTIVE_HIGH Names for signals that are active high are shown in uppercase text without an overbar.

ACTIVE_LOW A bar over a signal name indicates that the signal is active low—for example, ARTRY (address retry) and TS (transfer start). Active-low signals are referred to as asserted (active) when they are low and negated when they are high. Signals that are not active-low, such as AP0–AP3 (address bus parity signals) and TT0–TT4 (transfer type signals) are referred to as asserted when they are high and negated when they are low.

mnemonics

Instruction mnemonics are shown in lowercase bold.

italics

Italics indicate variable command parameters, for example,

bcctr

x

x'0F' Hexadecimal numbers

b'0011' Binary numbers

r

A|0 The contents of a specified GPR or the value 0.

REG[FIELD] Abbreviations or acronyms for registers are shown in uppercase text. Specific bit fields or ranges are shown in brackets.

x In certain contexts, such as a signal encoding, this indicates a don’t care. For example, if TT0–TT3 are binary encoded b'x001', the state of TT0 is a don’t care.

Acronyms and Abbreviations

Table i contains acronyms and abbreviations that are used in this document.

About This Book

xlv

Table i. Acronyms and Abbreviated Terms

Term Meaning

ALU Arithmetic logic unit

ATE Automatic test equipment

ASR Address space register

BAT Block address translation

BIST Built-in self test

BPU Branch processing unit

BUC Bus unit controller

BUID Bus unit ID

CAR Cache address register

CMOS Complementary metal-oxide semiconductor

COP Common on-chip processor

CR Condition register

CRTRY Cache retry queue

CTR Count register

DABR Data address breakpoint register

DAE Data access exception

DAR Data address register

DBAT Data BAT

DEC Decrementer register

DSISR DAE/source instruction service register

EA Effective address

EAR External access register

ECC Error checking and correction

FPECR Floating-point exception cause register

FPR Floating-point register

FPSCR Floating-point status and control register

FPU Floating-point unit

GPR General-purpose register

IABR Instruction address breakpoint register

IBAT Instruction BAT

IEEE Institute for Electrical and Electronics Engineers

IQ Instruction queue

xlvi


ITLB Instruction translation lookaside buffer

IU Integer unit

L2 Secondary cache

LIFO Last-in-first-out

LR Link register

LRU Least recently used

LSB Least-significant byte

lsb Least-significant bit

MESI Modified/exclusive/shared/invalid—cache coherency protocol

MMU Memory management unit

MQ MQ register

MSB Most-significant byte

msb Most-significant bit

MSR Machine state register

NaN Not a number

No-Op No operation

PID Processor identification tag

PIR Processor identification register

POWER Performance Optimized with Enhanced RISC architecture

PR Privilege-level bit

PTE Page table entry

PTEG Page table entry group

PVR Processor version register

RAW Read-after-write

RISC Reduced instruction set computer

RTC Real-time clock

RTCL Real-time clock lower register

RTCU Real-time clock upper register

RTL Register transfer language

RWITM Read with intent to modify

SDR1 Table search description register 1

SLB Segment lookaside buffer

Table i. Acronyms and Abbreviated Terms (Continued)

Term Meaning

About This Book

xlvii

Terminology Conventions

Table ii describes terminology conventions used in this manual.

SPR Special-purpose register

SPRG

n

General SPR

SR Segment register

SRR0 Machine status save/restore register 0

SRR1 Machine status save/restore register 1

TAP Test access port

TB Time base register

TLB Translation lookaside buffer

TTL Transistor-to-transistor logic

UTLB Unified translation lookaside buffer

UUT Unit under test

WAR Write-after-read

WAW Write-after-write

WIM Write-through/cache-inhibited/memory-coherency enforced bits

XATC Extended address transfer code

XER Integer exception register

Table ii. Terminology Conventions

IBM This Manual

Data storage interrupt (DSI) Data access exception (DAE)

Direct store segment I/O controller interface segment

Effective address Effective or logical address (logical is used in the context of address translation)

Effective segment ID (ESID) (64-bit implementations—not on the 601)

Logical segment ID (LSID) (64-bit implementations—not on the 601)

Extended mnemonics Simplified mnemonics

Extended Opcode Secondary opcode

Fixed-point unit (FXU) Integer unit (IU)

Instruction storage interrupt (ISI) Instruction access exception (IAE)

Interrupt Exception

Problem mode (or problem state) User-level privilege

Table i. Acronyms and Abbreviated Terms (Continued)

Term Meaning

xlviii


Table iii describes register and bit naming conventions used in this manual.

Programmable I/O (PIO) I/O controller interface operation

Real address Physical address

Real mode address translation Direct address translation

Relocation Translation

Special direct store segment Memory-forced I/O controller interface segment

Storage (noun) Memory (noun)

Storage (verb) Access (verb)

Store in Write back

Store through Write through

Table iii. Register and Bit Name Convention

IBM This Manual

Problem mode bit (MSR[PR]) Privilege level bit (MSR[PR])

Instruction relocate bit (MSR[IR]) Instruction address translation bit (MSR[IT])

Data relocate bit (MSR[DR]) Data address translation bit (MSR[DT])

Interrupt prefix bit (MSR[IP]) Exception prefix bit (MSR[EP])

Recoverable interrupt bit (MSR[RI])(not on the 601)

Recoverable exception bit (MSR[RE])(not on the 601)

Problem state protection key (SR[Kp]) User-state protection key (SR[Ku])

DSISR DSISR acronym redefined as “DAE/Source Instruction Service Register”

SDR1 SDR1 acronym redefined as “Table Search Description Register 1”

Block effective page index (BATx[BEPI]) Block logical page index (BATx[BLPI])

Block real page number (BATx[BRPN]) Physical block number (BATx[PBN])

Block length (BATx[BL]) Block size mask (BATx[BSM])

Real page number (PTE[RPN]) Physical page number (PTE[PPN])

Table ii. Terminology Conventions (Continued)

IBM This Manual

Chapter 1. Overview

1-1

Chapter 1 Overview

1010

This chapter provides an overview of PowerPC™ 601 microprocessor features, includinga block diagram showing the major functional components. It also provides an overview ofthe PowerPC architecture, and information about how the 601 implementation differs fromthe architectural definitions.

1.1 PowerPC 601 Microprocessor Overview

This section describes the features of the 601, provides a block diagram showing the majorfunctional units, and gives an overview of how the 601 operates.

The 601 is the first implementation of the PowerPC family of reduced instruction setcomputer (RISC) microprocessors. The 601 implements the 32-bit portion of the PowerPCarchitecture, which provides 32-bit effective (logical) addresses, integer data types of 8, 16,and 32 bits, and floating-point data types of 32 and 64 bits. For 64-bit PowerPCimplementations, the PowerPC architecture provides 64-bit integer data types, 64-bitaddressing, and other features required to complete the 64-bit architecture.

The 601 is a superscalar processor capable of issuing and retiring three instructions perclock, one to each of three execution units. Instructions can complete out of order forincreased performance; however, the 601 makes execution appear sequential.

The 601 integrates three execution units—an integer unit (IU), a branch processing unit(BPU), and a floating-point unit (FPU). The ability to execute three instructions in paralleland the use of simple instructions with rapid execution times yield high efficiency andthroughput for 601-based systems. Most integer instructions execute in one clock cycle.The FPU is pipelined so a single-precision multiply-add instruction can be issued everyclock cycle.

The 601 includes an on-chip, 32-Kbyte, eight-way set-associative, physically addressed,unified instruction and data cache and an on-chip memory management unit (MMU). TheMMU contains a 256-entry, two-way set-associative, unified translation lookaside buffer(UTLB) and provides support for demand paged virtual memory address translation andvariable-sized block translation. Both the UTLB and the cache use least recently used(LRU) replacement algorithms.

1-2


The 601 has a 64-bit data bus and a 32-bit address bus. The 601 interface protocol allowsmultiple masters to compete for system resources through a central external arbiter.Additionally, on-chip snooping logic maintains cache coherency in multiprocessorapplications. The 601 supports single-beat and burst data transfers for memory accesses; italso supports both memory-mapped I/O and I/O controller interface addressing.

The 601 uses an advanced, 3.6-V CMOS process technology and maintains full interfacecompatibility with TTL devices.

1.1.1 601 Features

This section describes details of the 601’s implementation of the PowerPC architecture.Major features of the 601 are as follows:

• High-performance, superscalar microprocessor

— As many as three instructions in execution per clock (one to each of the three execution units)

— Single clock cycle execution for most instructions

— Pipelined FPU for all single-precision and most double-precision operations

• Three independent execution units and two register files

— BPU featuring static branch prediction

— A 32-bit IU

— Fully IEEE 754-compliant FPU for both single- and double-precision operations

— Thirty-two GPRs for integer operands

— Thirty-two FPRs for single- or double-precision operands

• High instruction and data throughput

— Zero-cycle branch capability

— Programmable static branch prediction on unresolved conditional branches

— Instruction unit capable of fetching eight instructions per clock from the cache

— An eight-entry instruction queue that provides look-ahead capability

— Interlocked pipelines with feed-forwarding that control data dependencies in hardware

— Unified 32-Kbyte cache—eight-way set-associative, physically addressed; LRU replacement algorithm

— Cache write-back or write-through operation programmable on a per page or per block basis

— Memory unit with a two-element read queue and a three-element write queue

— Run-time reordering of loads and stores

— BPU that performs condition register (CR) look-ahead operations

Chapter 1. Overview

1-3

— Address translation facilities for 4-Kbyte page size, variable block size, and 256-Mbyte segment size

— A 256-entry, two-way set-associative UTLB

— Four-entry BAT array providing 128-Kbyte to 8-Mbyte blocks

— Four-entry, first-level ITLB

— Hardware table search (caused by UTLB misses) through hashed page tables

— 52-bit virtual address; 32-bit physical address

• Facilities for enhanced system performance

— Bus speed defined as selectable division of operating frequency

— A 64-bit split-transaction external data bus with burst transfers

— Support for address pipelining and limited out-of-order bus transactions

— Snooped copyback queues for cache block (sector) copyback operations

— Bus extensions for I/O controller interface operations

— Multiprocessing support features that include the following:

– Hardware enforced, four-state cache coherency protocol (MESI)

– Separate port into cache tags for bus snooping

• In-system testability and debugging features through boundary-scan capability

1.1.2 Block Diagram

Figure 1-1 provides a block diagram of the 601 that illustrates how the execution units—IU,FPU, and BPU—operate independently and in parallel.

The 601's 32-Kbyte, unified cache tag directory has a port dedicated to snooping bustransactions, preventing interference with processor access to the cache. The 601 alsoprovides address translation and protection facilities, including a UTLB and a BAT array,and a four-entry ITLB that contains the four most recently used instruction addresstranslations for fast access by the instruction unit.

Instruction fetching and issuing is handled in the instruction unit. Translation of addressesfor cache or external memory accesses are handled by the memory management unit. Bothunits are discussed in more detail in Sections 1.1.3, “Instruction Unit,” and 1.1.5, “MemoryManagement Unit (MMU).”

1-4


Figure 1-1. PowerPC 601 Microprocessor Block Diagram

(INSTRUCTION FETCH)

RTC

IU BPU FPU

MMU

SYSTEM INTERFACE

RTCU

RTCL

INSTRUCTIONQUEUE

+

+ * / +

INSTRUCTIONINSTRUCTION

8 WORDS

XERGPRFILE

CTRCRLR

2 WORDS1 WORDDATA

ADDRESS

UTLB

BATARRAY

ITLB TAGS32-KBYTE

CACHE(INSTRUC-

TION AND DA-

PHYSICAL ADDRESS

ADDRESS

DATA

SNOOPADDRESS

ADDRESS

DATA

64-BIT DATA BUS (2 WORDS)

32-BIT ADDRESS BUS (1 WORD)

MEMORY UNIT

SNOOPREAD

QUEUEWRITE QUEUE

+ * /

FPRFILE FPSCR

INSTRUCTION UNIT

ISSUE LOGIC

2 WORDS

4 WORDSDATA

8 WORDS

Chapter 1. Overview

1-5

1.1.3 Instruction Unit

As shown in Figure 1-1, the 601 instruction unit, which contains an instruction queue andthe BPU, provides centralized control of instruction flow to the execution units. Theinstruction unit determines the address of the next instruction to be fetched based oninformation from a sequential fetcher and the BPU. The IU also enforces pipeline interlocksand controls feed-forwarding.

The sequential fetcher contains a dedicated adder that computes the address of the nextsequential instruction based on the address of the last fetch and the number of wordsaccepted into the queue. The BPU searches the bottom half of the instruction queue for abranch instruction and uses static branch prediction on unresolved conditional branches toallow the instruction fetch unit to fetch instructions from a predicted target instructionstream while a conditional branch is evaluated. The BPU also folds out branch instructionsfor unconditional branches.

Instructions issued beyond a predicted branch do not complete execution until the branchis resolved, preserving the programming model of sequential execution. If any of theseinstructions are to be executed in the BPU, they are decoded but not issued. FPU and IUinstructions are issued and allowed to complete up to the register write-back stage.Write-back is performed when a correctly predicted branch is resolved, and instructionexecution continues without interruption along the predicted path.

If branch prediction is incorrect, the instruction fetcher flushes all predicted pathinstructions and instructions are issued from the correct path.

1.1.3.1 Instruction Queue

The instruction queue, shown in Figure 1-1, holds as many as eight instructions (a cacheblock) and can be filled from the cache during a single cycle. The instruction fetch canaccess only one cache sector at a time and will load as many instruction as space in the IQallows.

The upper half of the instruction queue (Q4–Q7) provides buffering to reduce the frequencyof cache accesses. Integer and branch instructions are dispatched to their respectiveexecution units from Q0 through Q3. Q0 functions as the initial decode stage for the IU.

For a more detailed overview of instruction dispatch, see Section 1.3.7, “601 InstructionTiming.”

1.1.4 Independent Execution Units

The PowerPC architecture’s support for independent floating-point, integer, and branchprocessing execution units allows implementation of processors with out-of-orderinstruction issue. For example, because branch instructions do not depend on GPRs orFPRs, branches can often be resolved early, eliminating stalls caused by taken branches.

The following sections describe the 601’s three execution units—the BPU, IU, and FPU.

1-6


1.1.4.1 Branch Processing Unit (BPU)

The BPU performs condition register (CR) look-ahead operations on conditional branches.The BPU looks through the bottom half of the instruction queue for a conditional branchinstruction and attempts to resolve it early, achieving the effect of a zero-cycle branch inmany cases.

The BPU uses a bit in the instruction encoding to predict the direction of the conditionalbranch. Therefore, when an unresolved conditional branch instruction is encountered, the601 fetches instructions from the predicted target stream until the conditional branch isresolved.

The BPU contains an adder to compute branch target addresses and three special-purpose,user-control registers—the link register (LR), the count register (CTR), and the CR. TheBPU calculates the return pointer for subroutine calls and saves it into the LR for certaintypes of branch instructions. The LR also contains the branch target address for the BranchConditional to Link Register (

bclr

x

) instruction. The CTR contains the branch targetaddress for the Branch Conditional to Count Register (

bcctr

x

) instruction. The contents ofthe LR and CTR can be copied to or from any GPR. Because the BPU uses dedicatedregisters rather than general-purpose or floating-point registers, execution of branchinstructions is largely independent from execution of integer and floating-pointinstructions.

1.1.4.2 Integer Unit (IU)

The IU executes all integer instructions and executes floating-point memory accesses inconcert with the FPU. The IU executes one integer instruction at a time, performingcomputations with its arithmetic logic unit (ALU), multiplier, divider, integer exceptionregister (XER), and the general-purpose register file. Most integer instructions aresingle-cycle instructions.

The IU interfaces with the cache and MMU for all instructions that access memory.Addresses are formed by adding the source 1 register operand specified by the instruction(or zero) to either a source 2 register operand or to a 16-bit, immediate value embedded inthe instruction.

Load and store instructions are issued and translated in program order; however, theaccesses can occur out of order. Synchronizing instructions are provided to enforce strictordering.

Load and store instructions are considered to have completed execution with respect toprecise exceptions after the address is translated. If the address for a load or storeinstruction hits in the UTLB or BAT array and it is aligned, the instruction execution (thatis, calculation of the address) takes one clock cycle, allowing back-to-back issue of loadand store instructions. The time required to perform the actual load or store operation variesdepending on whether the operation involves the cache, system memory, or an I/O device.

Chapter 1. Overview

1-7

1.1.4.3 Floating-Point Unit (FPU)

The FPU contains a single-precision multiply-add array, the floating-point status andcontrol register (FPSCR), and thirty-two 64-bit FPRs. The multiply-add array allows the601 to efficiently implement floating-point operations such as multiply, add, divide, andmultiply-add. The FPU is pipelined so that most single-precision instructions and manydouble-precision instructions can be issued back-to-back. The FPU contains two additionalinstruction queues. These queues allow floating-point instructions to be issued from theinstruction queue even if the FPU is busy, making instructions available for issue to theother execution units.

Like the BPU, the FPU can access instructions from the bottom half of the instruction queue(Q3–Q0), which permits floating-point instructions that do not depend on unexecutedinstructions to be issued early to the FPU.

The 601 supports all IEEE 754 floating-point data types (normalized, denormalized, NaN,zero, and infinity) in hardware, eliminating the latency incurred by software exceptionroutines.

1.1.5 Memory Management Unit (MMU)

The 601’s MMU supports up to 4 Petabytes (2

52

) of virtual memory and 4 Gigabytes (2

32

)of physical memory. The MMU also controls access privileges for these spaces on blockand page granularities. Referenced and changed status are maintained by the processor foreach page to assist implementation of a demand-paged virtual memory system.

The instruction unit generates all instruction addresses; these addresses are both forsequential instruction fetches and addresses that correspond to a change of program flow.The integer unit generates addresses for data accesses (both for memory and the I/Ocontroller interface).

After an address is generated, the upper order bits of the logical (effective) address aretranslated by the MMU into physical address bits. Simultaneously, the lower order addressbits (that are untranslated and therefore considered both logical and physical), are directedto the on-chip cache where they form the index into the eight-way set-associative tag array.After translating the address, the MMU passes the higher-order bits of the physical addressto the cache, and the cache lookup completes. For cache-inhibited accesses or accesses thatmiss in the cache, the untranslated lower order address bits are concatenated with thetranslated higher-order address bits; the resulting 32-bit physical address is then used by thememory unit and the system interface, which accesses external memory.

The MMU also directs the address translation and enforces the protection hierarchyprogrammed by the operating system in relation to the supervisor/user privilege level of theaccess and in relation to whether the access is a load or store.

For instruction accesses, the MMU first performs a lookup in the four entries of the ITLBfor both block- and page-based physical address translation. Instruction accesses that missin the ITLB and all data accesses cause a lookup in the UTLB and BAT array for the

1-8


physical address translation. In most cases, the physical address translation resides in oneof the TLBs and the physical address bits are readily available to the on-chip cache. In thecase where the physical address translation misses in the TLBs, the 601 automaticallyperforms a search of the translation tables in memory using the information in the tablesearch description register 1 (SDR1) and the corresponding segment register.

Memory management in the 601 is described in more detail in Section 1.3.6.2, “601Memory Management.”

1.1.6 Cache Unit

The PowerPC 601 microprocessor contains a 32-Kbyte, eight-way set associative, unified(instruction and data) cache. The cache line size is 64 bytes, divided into two eight-wordsectors, each of which can be snooped, loaded, cast-out, or invalidated independently. Thecache is designed to adhere to a write-back policy, but the 601 allows control ofcacheability, write policy, and memory coherency at the page and block level. The cacheuses a least recently used (LRU) replacement policy.

As shown in Figure 1-1, the cache provides an eight-word interface to the instructionfetcher and load/store unit. The surrounding logic selects, organizes, and forwards therequested information to the requesting unit. Write operations to the cache can beperformed on a byte basis, and a complete read-modify-write operation to the cache canoccur in each cycle.

The instruction unit provides the cache with the address of the next instruction to befetched. In the case of a cache hit, the cache returns the instruction and as many of theinstructions following it as can be placed in the eight-word instruction queue up to the cachesector boundary. If the queue is empty, as many as eight words (an entire sector) can beloaded into the queue in parallel.

The cache tag directory has one address port dedicated to instruction fetch and load/storeaccesses and one dedicated to snooping transactions on the system interface. Therefore,snooping does not require additional clock cycles unless a snoop hit that requires a cachestatus update occurs.

1.1.7 Memory Unit

The 601’s memory unit contains read and write queues that buffer operations between theexternal interface and the cache. These operations are comprised of operations resultingfrom load and store instructions that are cache misses and read and write operationsrequired to maintain cache coherency, table search, and other operations. The memory unitalso handles address-only operations and cache-inhibited loads and stores. As shown inFigure 1-2, the read queue contains two elements and the write queue contains threeelements. Each element of the write queue can contain as many as eight words (one sector)of data. One element of the write queue, marked snoop in Figure 1-2, is dedicated to writingcache sectors to system memory after a modified sector is hit by a snoop from anotherprocessor or snooping device on the system bus. The use of the write queue guarantees a

Chapter 1. Overview

1-9

high priority operation that ensures a deterministic response behavior when snooping hitsa modified sector.

Figure 1-2. Memory Unit

The other two elements in the write queue are used for store operations and writing backmodified sectors that have been deallocated by updating the queue; that is, when a cachelocation is full, the least-recently used cache sector is deallocated by first being copied intothe write queue and from there to system memory. Note that snooping can occur after asector has been pushed out into the write queue and before the data has been written tosystem memory. Therefore, to maintain a coherent memory, the write queue elements arecompared to snooped addresses in the same way as the cache tags. If a snoop hits a writequeue element, the data is first stored in system memory before it can be loaded into thecache of the snooping bus master. Coherency checking between the cache and the writequeue prevents dependency conflicts. Single-beat writes in the write queue are not snooped;coherency is ensured through the use of special cache operations that accompany thesingle-beat write operation on the bus.

Execution of a load or store instruction is considered complete when the associated addresstranslation completes, guaranteeing that the instruction has completed to the point where itis known that it will not generate an internal exception. However, after address translationis complete, a read or write operation can still generate an external exception.

Load and store instructions are always issued and translated in program order with respectto other load and store instructions. However, a load or store operation that hits in the cachecan complete ahead of those that miss in the cache; additionally, loads and stores that missthe cache can be reordered as they arbitrate for the system bus.

If a load or store misses in the cache, the operation is managed by the memory unit whichprioritizes accesses to the system bus. Read requests, such as loads, RWITMs, andinstruction fetches have priority over single-beat write operations. The priorities foraccessing the system bus are listed in Section 4.10.2, “Memory Unit Queuing Priorities.”

READ QUEUE

WRITE QUEUE

SNOOP

ADDRESS DATA

ADDRESS(from cache)

DATA(from cache)

SYSTEM INTERFACE

(to cache)

DATA QUEUE

(four word)

1-10


The 601 ensures memory consistency by comparing target addresses and prohibitinginstructions from completing out of order if an address matches. Load and store operationscan be forced to execute in strict program order.

1.1.8 System Interface

Because the cache on the 601 is an on-chip, write-back primary cache, the predominanttype of transaction for most applications is burst-read memory operations, followed byburst-write memory operations, I/O controller interface operations, and single-beat(noncacheable or write-through) memory read and write operations. Additionally, there canbe address-only operations, variants of the burst and single-beat operations (global memoryoperations that are snooped, and atomic memory operations, for example), and addressretry activity (for example, when a snooped read access hits a modified line in the cache).

Memory accesses can occur in single-beat (1–8 bytes) and four-beat burst (32 bytes) datatransfers. The address and data buses are independent for memory accesses to supportpipelining and split transactions. The 601 can pipeline as many as two transactions and haslimited support for out-of-order split-bus transactions.

Access to the system interface is granted through an external arbitration mechanism thatallows devices to compete for bus mastership. This arbitration mechanism is flexible,allowing the 601 to be integrated into systems that implement various fairness and busparking procedures to avoid arbitration overhead. Additional multiprocessor support isprovided through coherency mechanisms that provide snooping, external control of theon-chip cache and TLB, and support for a secondary cache. Multiprocessor softwaresupport is provided through the use of atomic memory operations.

Typically, memory accesses are weakly ordered—sequences of operations, includingload/store string and multiple instructions, do not necessarily complete in the order theybegin—maximizing the efficiency of the bus without sacrificing coherency of the data. The601 allows read operations to precede store operations (except when a dependency exists,of course). In addition, the 601 can be configured to reorder high priority write operationsahead of lower priority store operations. Because the processor can dynamically optimizerun-time ordering of load/store traffic, overall performance is improved.

1.2 Levels of the PowerPC Architecture

The PowerPC architecture consists of the following layers, and adherence to the PowerPCarchitecture can be measured in terms of which of the following levels of the architectureis implemented:

• PowerPC user instruction set architecture—Defines the base user-level instruction set, user-level registers, data types, floating-point exception model, memory models for a uniprocessor environment, and programming model for uniprocessor environment.

Chapter 1. Overview

1-11

• PowerPC virtual environment architecture—Describes the memory model for a multiprocessor environment, defines cache control instructions, and describes other aspects of virtual environments. Implementations that conform to the PowerPC virtual environment architecture also adhere to the PowerPC user instruction set architecture, but may not necessarily adhere to the PowerPC operating environment architecture.

• PowerPC operating environment architecture—Defines the memory management model, supervisor-level registers, synchronization requirements, and the exception model. Implementations that conform to the PowerPC operating environment architecture also adhere to the PowerPC user instruction set architecture and the PowerPC virtual environment architecture definition.

Note that while the 601 is said to adhere to the PowerPC architecture at all three levels, itdiverges in aspects of its implementation to a greater extent than should be expected ofsubsequent PowerPC processors. Many of the differences result from the fact that the 601design provides compatibility with an existing architecture standard (POWER), whileproviding a reliable platform for hardware and software development compatible withsubsequent PowerPC processors.

Note that except for the POWER instructions and the RTC implementation, the differencesbetween the 601 and the PowerPC architecture are primarily differences in the operatingenvironment architecture.

The PowerPC architecture allows a wide range of designs for such features as cache andsystem interface implementations.

1.3 The 601 as a PowerPC Implementation

The PowerPC architecture is derived from the IBM Performance Optimized with EnhancedRISC (POWER) architecture. The PowerPC architecture shares the benefits of the POWERarchitecture optimized for single-chip implementations. The architecture design facilitatesparallel instruction execution and is scalable to take advantage of future technologicalgains. For compatibility, the 601 also implements instructions from the POWER userprogramming model that are not part of the PowerPC definition.

1-12


This section describes the PowerPC architecture in general, noting where the 601 differs.The organization of this section follows the sequence of the chapters in this manual asfollows:

• Features—Section 1.3.1, “Features,” describes general features that the 601 shares with the PowerPC family of microprocessors. It does not list PowerPC features not implemented in the 601.

• Registers and programming model—Section 1.3.2, “Registers and Programming Model,” describes the registers for the operating environment architecture common among PowerPC processors and describes the programming model. It also describes differences in how the registers are used in the 601 and describes the additional registers that are unique to the 601.

• Instruction set and addressing modes—Section 1.3.3, “Instruction Set and Addressing Modes,” describes the PowerPC instruction set and addressing modes for the PowerPC operating environment architecture. It defines the PowerPC instructions implemented in the 601 as well as additional instructions implemented in the 601 but not defined in the PowerPC architecture.

• Cache implementation—Section 1.3.4, “Cache Implementation,” describes the cache model that is defined generally for PowerPC processors by the virtual environment architecture. It also provides specific details about the 601 cache implementation.

• Exception model—Section 1.3.5, “Exception Model,” describes the exception model of the PowerPC operating environment architecture and the differences in the 601 exception model.

• Memory management—Section 1.3.6, “Memory Management,” describes generally the conventions for memory management among the PowerPC processors. This section also describes the general differences between the 601 and the 32-bit PowerPC memory management specification.

• Instruction timing—Section 1.3.7, “601 Instruction Timing,” provides a general description of the instruction timing provided by the superscalar, parallel execution supported by the PowerPC architecture.

• System interface—Section 1.3.8, “System Interface,” describes the signals implemented on the 601.

1.3.1 Features

The 601 is a high-performance, superscalar PowerPC implementation. The PowerPCarchitecture allows optimizing compilers to schedule instructions to maximize performancethrough efficient use of the PowerPC instruction set and register model. The multiple,independent execution units allow compilers to maximize parallelism and instruction

Chapter 1. Overview

1-13

throughput. Compilers that take advantage of the flexibility of the PowerPC architecturecan additionally optimize system performance of the PowerPC processors.

The 601 implements the PowerPC architecture, with the extensions and variances listed inAppendix H, “Implementation Summary for Programmers.”

Specific features of the 601 are listed in Section 1.1.1, “601 Features.”

1.3.2 Registers and Programming Model

The following subsections describe the general features of the PowerPC registers andprogramming model and of the specific 601 implementation, respectively.

1.3.2.1 PowerPC Registers and Programming Model

The PowerPC architecture defines register-to-register operations for most computationalinstructions. Source operands for these instructions are accessed from the registers or areprovided as immediate values embedded in the instruction opcode. The three-registerinstruction format allows specification of a target register distinct from the two sourceoperands. Load and store instructions transfer data between registers and memory.

PowerPC processors have two levels of privilege—supervisor mode of operation (typicallyused by the operating environment) and one that corresponds to the user mode of operation(used by the application software). The programming models incorporate 32 GPRs, 32FPRs, special-purpose registers (SPRs), and several miscellaneous registers. Note that thereare several registers that are part of the PowerPC architecture that are not implemented inthe 601; for example, the time base registers are not implemented in the 601. Likewise, eachPowerPC implementation has its own unique set of hardware implementation (HID)registers, which are implementation-specific.

This division allows the operating system to control the application environment (providingvirtual memory and protecting operating-system and critical machine resources).Instructions that control the state of the processor, the address translation mechanism, andsupervisor registers can be executed only when the processor is operating in supervisormode.

The following sections summarize the PowerPC registers that are implemented in the 601processor. Chapter 2, “Register Models and Data Types,” provides more detailedinformation about the registers implemented in the 601.

1.3.2.1.1 General-Purpose Registers (GPRs)

The PowerPC architecture defines 32 user-level, general-purpose registers (GPRs). Theseregisters are either 32 bits wide in 32-bit PowerPC implementations and 64 bits wide in64-bit PowerPC implementations. The GPRs serve as the data source or destination for allinteger instructions.

1-14


1.3.2.1.2 Floating-Point Registers (FPRs)The PowerPC architecture also defines 32 user-level 64-bit floating-point registers (FPRs).The FPRs serve as the data source or destination for floating-point instructions. Theseregisters can contain data objects of either single- or double-precision floating-pointformats.

1.3.2.1.3 Condition Register (CR)The CR is a 32-bit user-level register that consists of eight four-bit fields that reflect theresults of certain operations, such as move, integer and floating-point compare, arithmetic,and logical instructions, and provide a mechanism for testing and branching.

1.3.2.1.4 Floating-Point Status and Control Register (FPSCR)The floating-point status and control register (FPSCR) is a user-level register that containsall exception signal bits, exception summary bits, exception enable bits, and roundingcontrol bits needed for compliance with the IEEE 754 standard.

1.3.2.1.5 Machine State Register (MSR)The machine state register (MSR) is a supervisor-level register that defines the state of theprocessor. The contents of this register is saved when an exception is taken and restoredwhen the exception handling completes. The 601 implements the MSR as a 32-bit register;64-bit PowerPC processors implement a 64-bit MSR.

1.3.2.1.6 Segment Registers (SRs)For memory management, 32-bit PowerPC implementations implement sixteen 32-bitsegment registers (SRs). Figure 2-12 shows the format of a segment register when the T bitis cleared and Figure 2-13 shows the layout when the T bit (SR[0]) is set. The fields in thesegment register are interpreted differently depending on the value of bit 0.

1.3.2.1.7 Special-Purpose Registers (SPRs)The PowerPC operating environment architecture defines numerous special-purposeregisters that serve a variety of functions, such as providing controls, indicating status,configuring the processor, and performing special operations. Some SPRs are accessedimplicitly as part of executing certain instructions. All SPRs can be accessed by using theMove to/from Special Purpose Register instructions, mtspr and mfspr.

In the 601, all SPRs are 32 bits wide.

1.3.2.1.8 User-Level SPRsThe following 601 SPRs are accessible by user-level software:

• Link register (LR)—The link register can be used to provide the branch target address and to hold the return address after branch and link instructions. The LR is 32 bits wide in 32-bit implementations.

• Count register (CTR)—The CTR is decremented and tested automatically as a result of branch-and-count instructions. The CTR is 32 bits wide in 32-bit implementations.

Chapter 1. Overview 1-15

• Integer exception register (XER)—The 32-bit XER contains the integer carry and overflow bits and two fields for the Load String and Compare Byte Indexed (lscbx) instruction (a POWER instruction implemented in the 601 but not defined by the PowerPC architecture).

1.3.2.1.9 Supervisor-Level SPRsThe 601 also contains SPRs that can be accessed only by supervisor-level software. Theseregisters consist of the following:

• The 32-bit data access exception (DAE)/source instruction service register (DSISR) defines the cause of data access and alignment exceptions.

• The data address register (DAR) is a 32-bit register that holds the address of an access after an alignment or data access exception.

• Decrementer register (DEC) is a 32-bit decrementing counter that provides a mechanism for causing a decrementer exception after a programmable delay. PowerPC architecture defines that the DEC frequency be provided as a subdivision of the processor clock frequency; however, the 601 implements a separate clock input that serves both the DEC and the RTC facilities.

• The 32-bit table search description register 1(SDR1) specifies the page table format used in logical-to-physical address translation for pages.

• The machine status save/restore register 0 (SRR0) is a 32-bit register that is used by the 601 for saving the address of the instruction that caused the exception, and the address to return to when a Return from Interrupt (rfi) instruction is executed.

• The machine status save/restore register 1 (SRR1) is a 32-bit register used to save machine status on exceptions and to restore machine status when an rfi instruction is executed.

• General SPRs, SPRG0–SPRG3, are 32-bit registers provided for operating system use.

• The external access register (EAR) is a 32-bit register that controls access to the external control facility through the External Control Input Word Indexed (eciwx) and External Control Output Word Indexed (ecowx) instructions.

• The processor version register (PVR) is a 32-bit, read-only register that identifies the version (model) and revision level of the PowerPC processor.

• Block address translation (BAT) registers—The PowerPC architecture defines 16 BAT registers, divided into four pairs of data BATs (DBATs) and four pairs of instruction BATs (IBATs). The 601 includes four pairs of unified BATs (BAT0U–BAT3U and BAT0L–BAT3L). See Figure 1-3 for a list of the SPR numbers for the BAT registers. Figure 2-23 and Figure 2-24 show the format of the upper and lower BAT registers. Note that the format for the 601’s implementation of the BAT registers differs from the PowerPC architecture definition.

1-16 PowerPC 601 RISC Microprocessor User's Manual

1.3.2.2 Additional Registers in the 601During normal execution, a program can access the registers, shown in Figure 1-3,depending on the program’s access privilege (supervisor or user, determined by theprivilege-level (PR) bit in the machine state register (MSR)). Note that registers such as thegeneral-purpose registers (GPRs) and floating-point registers (FPRs) are accessed throughoperands that are part of the instructions. Access to registers can be explicit (that is, throughthe use of specific instructions for that purpose such as Move to Special-Purpose Register(mtspr) and Move from Special-Purpose Register (mfspr) instructions) or implicit as thepart of the execution of an instruction. Some registers are accessed both explicitly andimplicitly.

The numbers to the left of the SPRs indicate the number that is used in the syntax of theinstruction operands to access the register.

Figure 1-3 shows all the 601 registers and includes the following registers that are not partof the PowerPC architecture:

• Real-time clock (RTC) registers—RTCU and RTCL (RTC upper and RTC lower). The registers can be read from by user-level software, but can be written to only by supervisor-level software. As shown in Figure 1-3, the SPR numbers for the RTC registers depend on the type of access used. For more information, see Section 2.2.5.3, “Real-Time Clock (RTC) Registers (User-Level).”

• MQ register (MQ). The MQ register is a 601-specific, 32-bit register used as a register extension to accommodate the product for the multiply instructions and the dividend for the divide instructions. It is also used as an operand of long rotate and shift instructions. This register, and the instructions that require it, is provided for compatibility with POWER architecture, and is not part of the PowerPC architecture. For more information, see Section 2.2.5.1, “MQ Register (MQ).” The MQ register is typically accessed implicitly as part of executing a computational instruction.

• Block-address translation (BAT) registers. The 601 includes eight block-address translation registers (BATs), consisting of four pairs of BATs (IBAT0U–IBAT3U and IBAT0L–IBAT3L). See Figure 1-3 for a list of the SPR numbers for the BAT registers. Figure 2-23 and Figure 2-24 show the formats of the upper and lower BAT registers. Note that the PowerPC architecture has twice as many BAT registers as the 601.

• Hardware implementation registers (HID0–HID2, HID5, and HID15). These registers are provided primarily for debugging. For more information, see Section 2.3.3.13.1, “Checkstop Sources and Enables Register—HID0” through Section 2.3.3.13.5, “Processor Identification Register (PIR)—HID15.” HID15 holds the four-bit processor identification tag (PID) that is useful for differentiating processors in multiprocessor system designs. For more information, see Section 2.3.3.13.5, “Processor Identification Register (PIR)—HID15.” Note that while it is not guaranteed that the implementation of HID registers is consistent among PowerPC


processors, other processors may be designed with similar or identical HID registers.

Figure 1-3. PowerPC 601 Microprocessor Programming Model—Registers

GPR0

GPR1

GPR31

0 31

User-Level SPRs

SPR0 MQ Register 1

SPR1 XER—Integer Exception Register

SPR4 RTCU—RTC Upper Register (For reading only)1,3

SPR5 RTCL—RTC Lower Register (For reading only)1,3

SPR8 LR—Link Register

SPR9 CTR—Count Register

USER PROGRAMMINGMODEL

Supervisor-Level SPRs

SPR18 DSISR—DAE/ Source Instruction Service Register

SPR19 DAR—Data Address Register

SPR20 RTCU—RTC Upper Register (For writing only)1,3

SPR21 RTCL—RTC Lower Register (For writing only)1,3

SPR22 DEC—Decrementer Register4

SPR25 SDR1—Table Search Description Register 1

SPR26 SRR0—Save and Restore Register 0


SPR272 SPRG0—SPR General 0




SPR282 EAR—External Access Register

SPR287 PVR—Processor Version Register

SPR528 IBAT0U—BAT 0 Upper 2

SPR529 IBAT0L—BAT 0 Lower 2







SPR1008 HID0 1

SPR1009 HID1 1

SPR1010 HID2 (IABR) 1

SPR1013 HID5 (DABR) 1

0 63

SUPERVISOR PROGRAMMING MODEL

Machine State Register 2

1 601-only registers. These registers are not necessarily supported by other PowerPC processors.2 These registers may be implemented differently on other PowerPC processors. The PowerPC architecture defines two sets of

BAT registers—eight IBATs and eight DBATs. The 601 implements the IBATs and treats them as unified BATs.3 RTCU and RTCL registers can be written only in supervisor mode, in which case different SPR numbers are used.4 DEC register can be read by user programs by specifying SPR6 in the mfspr instruction (for POWER compatibility).

FPR0

FPR1

FPR31

MSR

0 31

0 31

0 31

Segment Registers

SR0

SR1

SR15

0 31

Floating Point Status and

Control Register

CR

0 31

FPSCR

0 31

Condition Register


1.3.3 Instruction Set and Addressing ModesThe following subsections describe the PowerPC instruction set and addressing modes ingeneral. Differences in the 601’s instruction set are described in Section 1.3.3.2, “601Instruction Set.”

1.3.3.1 PowerPC Instruction Set and Addressing ModesAll PowerPC instructions are encoded as single-word (32-bit) opcodes. Instruction formatsare consistent among all instruction types, permitting efficient decoding to occur in parallelwith operand accesses. This fixed instruction length and consistent format greatly simplifiesinstruction pipelining.

1.3.3.1.1 PowerPC Instruction SetThe PowerPC instructions are divided into the following categories:

• Integer instructions—These include computational and logical instructions.

— Integer arithmetic instructions— Integer compare instructions— Integer logical instructions— Integer rotate and shift instructions

• Floating-point instructions—These include floating-point computational instructions, as well as instructions that affect the floating-point status and control register (FPSCR).

— Floating-point arithmetic instructions— Floating-point multiply/add instructions— Floating-point rounding and conversion instructions— Floating-point compare instructions— Floating-point status and control instructions

• Load/store instructions—These include integer and floating-point load and store instructions.

— Integer load and store instructions— Integer load and store multiple instructions— Floating-point load and store— Floating-point move instructions

— Primitives used to construct atomic memory operations (lwarx and stwcx. instructions)

• Flow control instructions—These include branching instructions, condition register logical instructions, trap instructions, and other instructions that affect the instruction flow.

— Branch and trap instructions— Condition register logical instructions


• Processor control instructions—These instructions are used for synchronizing memory accesses and management of caches, UTLBs, and the segment registers.

— Move to/from special purpose register instructions

— Move to/from MSR

— Synchronize

— Instruction synchronize

— TLB invalidate

• Memory control instructions—These instructions provide control of caches, TLBs, and segment registers.

— Supervisor-level cache management instructions— User-level cache instructions— Segment register manipulation instructions— Translation lookaside buffer management instructions

Note that this grouping of the instructions does not indicate which execution unit executesa particular instruction or group of instructions. This information, which is useful in takingfull advantage of superscalar parallel instruction execution, is provided in Chapter 7,“Instruction Timing,” and Chapter 10, “Instruction Set.”

Integer instructions operate on byte, half-word, and word operands. Floating-pointinstructions operate on single-precision (one word) and double-precision (one doubleword) floating-point operands. The PowerPC architecture uses instructions that are fourbytes long and word-aligned. It provides for byte, half-word, and word operand loads andstores between memory and a set of 32 general-purpose registers (GPRs). It also providesfor word and double-word operand loads and stores between memory and a set of 32floating-point registers (FPRs).

Computational instructions do not modify memory. To use a memory operand in acomputation and then modify the same or another memory location, the memory contentsmust be loaded into a register, modified, and then written back to the target location withdistinct instructions.

PowerPC processors follow the program flow when they are in the normal execution state.However, the flow of instructions can be interrupted directly by the execution of aninstruction or by an asynchronous event. Either kind of exception may cause one of severalcomponents of the system software to be invoked.

1.3.3.1.2 Calculating Effective Addresses The effective address (EA) is the 32-bit address computed by the processor when executinga memory access or branch instruction or when fetching the next sequential instruction.


The PowerPC architecture supports two simple memory addressing modes:

• EA = (rA|0) + offset (including offset = 0) (register indirect with immediate index)• EA = (rA|0) + rB (register indirect with index)

These simple addressing modes allow efficient address generation for memory accesses.Calculation of the effective address for aligned transfers occurs in a single clock cycle.

For a memory access instruction, if the sum of the effective address and the operand lengthexceeds the maximum effective address, the storage operand is considered to wrap aroundfrom the maximum effective address to effective address 0.

Effective address computations for both data and instruction accesses use 32-bit unsignedbinary arithmetic. A carry from bit 0 is ignored in 32-bit implementations.

1.3.3.2 601 Instruction SetThe 601 instruction set is defined as follows:

• The 601 implements the 32-bit PowerPC architecture instructions except as indicated in Appendix C, “PowerPC Instructions Not Implemented.” Otherwise, all instructions not implemented in the 601 are defined as optional in the PowerPC architecture.

• The 601 supports a number of POWER instructions that are otherwise not implemented in the PowerPC architecture. These are listed in Appendix B, “POWER Architecture Cross Reference.” Individual instructions are described in Chapter 10, “Instruction Set.”

• The 601 implements the External Control Input Word Indexed (eciwx) and External Control Output Word Indexed (ecowx) instructions, which are optional in the PowerPC architecture definition.

• Several of the instructions implemented in the 601 function somewhat differently than they are defined in the PowerPC architecture. These differences typically stem from design differences; for instance, the PowerPC architecture defines several cache control instructions specific to separate instruction and data cache designs.

When executed on the 601, such instructions may provide a subset of the functions of the instruction or they may be no-ops.

For a list of all PowerPC instructions and all 601-specific instructions, see Appendix A,“Instruction Set Listings.” Chapter 10, “Instruction Set,” describes each instruction,indicating whether an instruction is 601-specific and describing any differences in theimplementation on the 601.

1.3.4 Cache ImplementationThe following subsections describe the PowerPC architecture’s treatment of cache ingeneral, and the 601-specific implementation, respectively.


1.3.4.1 PowerPC Cache Characteristics The PowerPC architecture does not define hardware aspects of cache implementations. Forexample, some PowerPC processors may have separate instruction and data caches(Harvard architecture), while others, such as the 601, implement a unified cache.

PowerPC implementations can control the following memory access modes on a page orblock basis:

• Write-back/write-through mode• Cache-inhibited mode • Memory coherency

Note that in the 601 processor, a block is defined as an eight-word sector. The PowerPCvirtual environment architecture defines cache management instructions that provide ameans by which the application programmer can affect the cache contents.

1.3.4.2 601 Cache ImplementationThe 601 has a 32-Kbyte, eight-way set-associative unified (instruction and data) cache. Thecache is physically addressed and can operate in either write-back or write-through modeas specified by the PowerPC architecture.

The cache is configured as eight sets of 64 lines. Each line consists of two sectors, four statebits (two per sector), several replacement control bits, and an address tag. The two state bitsimplement the four-state MESI (modified-exclusive-shared-invalid) protocol. Each sectorcontains eight 32-bit words. Note that the PowerPC architecture defines the term block asthe cacheable unit. For the 601 processor, the block is a sector. A block diagram of the cacheorganization is shown in Figure 1-4.

Each cache line contains 16 contiguous words from memory that are loaded from a 16-wordboundary (that is, bits A26–A31 of the logical addresses are zero); thus, a cache line nevercrosses a page boundary. Misaligned accesses across a page boundary can incur aperformance penalty.

Cache reload operations are always performed on a sector basis (that is, the cache issnooped and updated and coherency is maintained on a per-sector basis). However, if theother sector in the line is marked invalid, an optional, low-priority update of that sector isattempted after the sector that contained the critical word is filled. The ability to attempt theother sector update can be disabled by the system software.

External bus transactions that load instructions or data into the cache always transfer themissed quad word first, regardless of its location in a cache sector; then the rest of the cachesector is filled. As the missed quad word is loaded into the cache, it is simultaneouslyforwarded to the appropriate execution unit so instruction execution resumes as quickly aspossible.


To ensure coherency among caches in a multiprocessor (or multiple caching-device)implementation, the 601 implements the MESI protocol. MESI stands formodified/exclusive/shared/invalid. These four states indicate the state of the cache block asfollows:

• Modified—The cache block is modified with respect to system memory; that is, data for this address is valid only in the cache and not in system memory.

• Exclusive—This cache block holds valid data that is identical to the data at this address in system memory. No other cache has this data.

• Shared—This cache block holds valid data that is identical to this address in system memory and at least one other caching device.

• Invalid—This cache block does not hold valid data.

Cache coherency is enforced by on-chip hardware bus snooping logic. Since the cache tagdirectory has a separate port dedicated to snooping bus transactions, bus snooping trafficdoes not interfere with processor access to the cache unless a snoop hit occurs.

Figure 1-4. Cache Unit Organization

1.3.5 Exception ModelThe following subsections describe the PowerPC exception model and the 601implementation, respectively.

LINE 63

LINE 0 ADDRESS TAG

ADDRESS TAG

SECTOR 0 SECTOR 1

8 WORDS 8 WORDS

16 WORDS

8 SETS


1.3.5.1 PowerPC Exception ModelThe PowerPC exception mechanism allows the processor to change to supervisor state as aresult of external signals, errors, or unusual conditions arising in the execution ofinstructions. When exceptions occur, information about the state of the processor is savedto certain registers and the processor begins execution at an address (exception vector)predetermined for each exception. The exception handler at the specified vector is thenprocessed with the processor in supervisor mode. The PowerPC exception model isdescribed in detail in Chapter 5, “Exceptions.”

Although multiple exception conditions can map to a single exception vector, a morespecific condition may be determined by examining a register associated with theexception—for example, the DAE/source instruction service register (DSISR) and thefloating-point status and control register (FPSCR). Additionally, some exception conditionscan be explicitly enabled or disabled by software.

The PowerPC architecture requires that exceptions be handled in program order; therefore,although a particular implementation may recognize exception conditions out of order, theyare presented strictly in order. When an instruction-caused exception is recognized, anyunexecuted instructions that appear earlier in the instruction stream, including any that havenot yet entered the execute state, are required to complete before the exception is taken. Anyexceptions caused by those instructions are handled first. Likewise, exceptions that areasynchronous and precise are recognized when they occur, but are not handled until allinstructions currently in the execute stage successfully complete execution and report theirresults.

Unless a catastrophic condition causes a system reset or machine check exception, only oneexception is handled at a time. If, for example, a single instruction encounters multipleexception conditions, those conditions are encountered sequentially. After the exceptionhandler handles an exception, the instruction execution continues until the next exceptioncondition is encountered. However, in many cases there is no attempt to reexecute theinstruction. This method of recognizing and handling exception conditions sequentiallyguarantees that exceptions are recoverable.

Exception handlers should save the information stored in SRR0 and SRR1 early to preventthe program state from being lost due to a system reset and machine check exception or toan instruction-caused exception in the exception handler, and before enabling externalinterrupts.

The PowerPC architecture supports four types of exceptions:

• Synchronous, precise—These are caused by instructions. All instruction-caused exceptions are handled precisely; that is, the machine state at the time the exception occurs is known and can be completely restored. This means that (excluding the trap and system call exceptions) the address of the faulting instruction is provided to the exception handler and that neither the faulting instruction nor subsequent instructions in the code stream will complete execution. The instructions that invoke


trap and system call exceptions complete execution before the exception is taken. When exception processing completes, execution resumes at the address of the next instruction.

• Synchronous, imprecise—The PowerPC architecture defines two imprecise floating-point exception modes, recoverable and nonrecoverable. Even though the 601 provides a means to enable the imprecise modes, it implements these modes identically to the precise mode (i.e., all enabled floating-point enabled exceptions are always precise on the 601).

• Asynchronous, precise—The external interrupt and decrementer exceptions are maskable asynchronous exceptions that are handled precisely. When these exceptions occur, their handling is postponed until all instructions, and any exceptions associated with those instructions, complete execution.

• Asynchronous, imprecise—There are two non-maskable asynchronous exceptions that are imprecise: system reset and machine check exceptions. These exceptions may not be recoverable, or may provide a limited degree of recoverability for diagnostic purpose.

The PowerPC architecture defines several of the exceptions differently than the 601implementation. For example, the PowerPC exception model provides a unique vector forthe trace exception; the 601 vectors trace exceptions to the run-mode exception handler.Other differences are noted in the following section, Section 1.3.5.2, “The 601 ExceptionModel.”

1.3.5.2 The 601 Exception ModelAs specified by the PowerPC architecture, all 601 exceptions can be described as eitherprecise or imprecise and either synchronous or asynchronous. Asynchronous exceptions arecaused by events external to the processor’s execution; synchronous exceptions, which areall handled precisely by the 601, are caused by instructions.

The 601 exception classes are shown in Table 1-1.

Although exceptions have other characteristics as well, such as whether they are maskableor nonmaskable, the distinctions shown in Table 1-1 define categories of exceptions that the601 handles uniquely. Note that Table 1-1 includes no synchronous imprecise instructions.

Table 1-1. PowerPC 601 Microprocessor Exception Classifications

Synchronous/Asynchronous Precise/Imprecise Exception Type

Asynchronous Imprecise Machine checkSystem reset

Asynchronous Precise External interruptDecrementer

Synchronous Precise Instruction-caused exceptions


While the PowerPC architecture supports imprecise handling of floating-point exceptions,the 601 implements these exception modes as precise exceptions.

The 601’s exceptions, and conditions that cause them, are listed in Table 1-2. Exceptionsthat are specific to the 601 are indicated.

Table 1-2. Exceptions and Conditions

Exception Type

Vector Offset(hex)

Causing Conditions

Reserved 00000 —

System reset 00100 A system reset is caused by the assertion of either SRESET or HRESET.

Machine check 00200 A machine check is caused by the assertion of the TEA signal during a data bus transaction.

Data access 00300 The cause of a data access exception can be determined by the bit settings in the DSISR, listed as follows:1 Set if the translation of an attempted access is not found in the primary

hash table entry group (HTEG), or in the rehashed secondary HTEG, or in the range of a BAT register; otherwise cleared.

4 Set if a memory access is not permitted by the page or BAT protection mechanism described in Chapter 6, “Memory Management Unit”; otherwise cleared.

5 Set if the access was to an I/O segment (SR[T] =1) by an eciwx, ecowx, lwarx, stwcx., or lscbx instruction; otherwise cleared. Set by an eciwx or ecowx instruction if the access is to an address that is marked as write-through.

6 Set for a store operation and cleared for a load operation. 9 Set if an EA matches the address in the DABR while in one of the three

compare modes. 11 Set if eciwx or ecowx is used and EAR[E] is cleared.

Instruction access

00400 An instruction access exception is caused when an instruction fetch cannot be performed for any of the following reasons:• The effective (logical) address cannot be translated. That is, there is a page

fault for this portion of the translation, so an instruction access exception must be taken to retrieve the translation from a storage device such as a hard disk drive.

• The fetch access is to an I/O segment.• The fetch access violates memory protection. If the key bits (Ks and Ku) in

the segment register and the PP bits in the PTE or BAT are set to prohibit read access, instructions cannot be fetched from this location.

External interrupt

00500 An external interrupt occurs when the INT signal is asserted.

Alignment 00600 An alignment exception is caused when the 601 cannot perform a memory access for any of several reasons, such as when the operand of a floating-point load or store operation is in an I/O segment (SR[T] = 1) or when a scalar load/store operand crosses a page boundary. Specific exception sources are described in Section 5.4.6, “Alignment Exception (x'00600').”


Program 00700 A program exception is caused by one of the following exception conditions, which correspond to bit settings in SRR1 and arise during execution of an instruction:• Floating-point enabled exception—A floating-point enabled exception

condition is generated when the following condition is met: (MSR[FE0] | MSR[FE1]) & FPSCR[FEX] is 1.

FPSCR[FEX] is set by the execution of a floating-point instruction that causes an enabled exception or by the execution of a “move to FPSCR” instruction that results in both an exception condition bit and its corresponding enable bit being set in the FPSCR.

• Illegal instruction—An illegal instruction program exception is generated when execution of an instruction is attempted with an illegal opcode or illegal combination of opcode and extended opcode fields (including PowerPC instructions not implemented in the 601), or when execution of an optional instruction not provided in the 601 is attempted (these do not include those optional instructions that are treated as no-ops). The PowerPC instruction set is described in Chapter 3, “Addressing Modes and Instruction Set Summary.”

• Privileged instruction—A privileged instruction type program exception is generated when the execution of a privileged instruction is attempted and the MSR register user privilege bit, MSR[PR], is set. In the 601, this exception is generated for mtspr or mfspr with an invalid SPR field if SPR[0] = 1 and MSR[PR] = 1. This may not be true for all PowerPC processors.

• Trap—A trap type program exception is generated when any of the conditions specified in a trap instruction is met.

Floating-point unavailable

00800 A floating-point unavailable exception is caused by an attempt to execute a floating-point instruction (including floating-point load, store, and move instructions) when the floating-point available bit is disabled, MSR[FP] = 0.

Decrementer 00900 The decrementer exception occurs when the most significant bit of the decrementer (DEC) register transitions from 0 to 1. Must also be enabled with the MSR[EE] bit.

I/O controller interface error

00A00 An I/O controller interface error exception is taken only when an operation to an I/O controller interface segment fails (such a failure is indicated to the 601 by a particular bus reply packet). If an I/O controller interface exception is taken on a memory access directed to an I/O segment, the SRR0 contains the address of the instruction following the offending instruction. Note that this exception is not implemented in other PowerPC processors.

Reserved 00B00 —

System call 00C00 A system call exception occurs when a System Call (sc) instruction is executed.

Reserved 00D00 Other PowerPC processors may use this vector for trace exceptions.

Reserved 00E00 The 601 does not generate an interrupt to this vector. Other PowerPC processors may use this vector for floating-point assist exceptions.

Reserved 00E10–00FFF —

Reserved 01000–01FFF Reserved, implementation-specific

Table 1-2. Exceptions and Conditions (Continued)

Exception Type

Vector Offset(hex)

Causing Conditions


1.3.6 Memory ManagementThe following subsections describe the PowerPC memory management architecture, andthe specific 601 implementation, respectively.

1.3.6.1 PowerPC Memory ManagementThe primary functions of the MMU are to translate logical (effective) addresses to physicaladdresses for memory accesses, I/O accesses (most I/O accesses are assumed to bememory-mapped), and I/O controller interface accesses, and to provide access protectionon blocks and pages of memory.

There are three types of accesses generated by the 601 that require address translation:instruction accesses, data accesses to memory generated by load and store instructions, andI/O controller interface accesses generated by load and store instructions.

The PowerPC MMU and exception model support demand-paged virtual memory. Virtualmemory management permits execution of programs larger than the size of physicalmemory; demand-paged implies that individual pages are loaded into physical memoryfrom system memory only when they are first accessed by an executing program.

The hashed page table is a variable-sized data structure that defines the mapping betweenvirtual page numbers and physical page numbers. The page table size is a power of 2, andits starting address is a multiple of its size.

The page table contains a number of page table entry groups (PTEGs). A PTEG containseight page table entries (PTEs) of eight bytes each; therefore, each PTEG is 64 bytes long.

Run mode/ trace exception

02000 The run mode exception is taken depending on the settings of the HID1 register and the MSR[SE] bit. The following modes correspond with bit settings in the HID1 register:• Normal run mode—No address breakpoints are specified, and the 601

executes from zero to three instructions per cycle• Single instruction step mode—One instruction is processed at a time. The

appropriate break action is taken after an instruction is executed and the processor quiesces.

• Limited instruction address compare—The 601 runs at full speed (in parallel) until the EA of the instruction being decoded matches the EA contained in HID2. Addresses for branch instructions and floating-point instructions may never be detected.

• Full instruction address compare mode—Processing proceeds out of IQ0. When the EA in HID2 matches the EA of the instruction in IQ0, the appropriate break action is performed. Unlike the limited instruction address compare mode, all instructions pass through the IQ0 in this mode. That is, instructions cannot be folded out of the instruction stream.

The following mode is taken when the MSR[SE] bit is set.• MSR[SE] trace mode—Note that in other PowerPC implementations, the

trace exception is a separate exception with its own vector x'00D00'.

Table 1-2. Exceptions and Conditions (Continued)

Exception Type

Vector Offset(hex)

Causing Conditions


PTEG addresses are entry points for table search operations. Figure 6-16 shows two PTEGaddresses (PTEGaddr1 and PTEGaddr2) where a given PTE may reside.

Address translations are enabled by setting bits in the MSR—MSR[IT] enables instructiontranslations and MSR[DT] enables data translations.

1.3.6.2 601 Memory ManagementThe 601 MMU provides 4 Gbytes of logical address space accessible to supervisor and userprograms with a 4-Kbyte page size and 256-Mbyte segment size. Block sizes range from128 Kbyte to 8 Mbyte and are software selectable. In addition, the 601 uses an interim52-bit virtual address and hashed page tables in the generation of 32-bit physical addresses.

A UTLB provides address translation in parallel with the on-chip cache access, incurringno additional time penalty in the event of a UTLB hit. The UTLB is a cache of the mostrecently used page table entries. Software is responsible for maintaining the consistency ofthe UTLB with memory. The 601’s UTLB is a 256-entry, two-way set-associative cachethat contains instruction and data address translations. The 601 provides hardware tablesearch capability through the hashed page table on UTLB misses. Supervisor software caninvalidate UTLB entries selectively. In addition, UTLB control instructions can optionallybe broadcast on the external interface for remote invalidations.

The 601 also provides a four-entry BAT array that maintains address translations for blocksof memory. These entries define blocks that can vary from 128 Kbytes to 8 Mbytes. TheBAT array is maintained by system software.

To accelerate the instruction unit operation, the 601 uses a four-entry ITLB. The ITLBcontains up to four copies of the most recently used instruction address translations (pageor block) providing the instruction unit access to the most recently used translations withoutrequiring the UTLB or BAT array. The processor ensures that the ITLB is consistent withthe UTLB, and uses an LRU replacement algorithm when a miss is encountered.

The 601 MMU relies on the exception processing mechanism for the implementation of thepaged virtual memory environment and for enforcing protection of designated memoryareas. Exception processing is described in Chapter 5, “Exceptions.” Section 2.3.1,“Machine State Register (MSR),” describes the MSR of the 601, which controls some ofthe critical functionality of the MMU.

As specified by the PowerPC architecture, the hashed page table is a variable-sized datastructure that defines the mapping between virtual page numbers and physical pagenumbers. The page table size is a power of 2, and its starting address is a multiple of its size.

Also as specified by the PowerPC architecture, the page table contains a number of PTEGs.A PTEG contains eight page table entries (PTEs) of eight bytes each; therefore each PTEGis 64 bytes long. PTEG addresses are entry points for table search operations. Figure 6-16shows two PTEG addresses (PTEGaddr1 and PTEGaddr2) where a given PTE may reside.


1.3.7 601 Instruction TimingThe 601 is a pipelined superscalar processor. A pipelined processor is one in which theprocessing of an instruction is broken down into discrete stages, such as decode, execute,and writeback. Because the tasks required to process an instruction are broken into a seriesof tasks, an instruction does not require the entire resources of an execution unit. Forexample, after an instruction completes the decode stage, it can pass on to the next stage,while the subsequent instruction can advance into the decode stage. This improves thethroughput of the instruction flow. For example, it may take three cycles for an integerinstruction to complete, but if there are no stalls in the integer pipeline, a series of integerinstructions can have a throughput of one instruction per cycle.

A superscalar processor is one in which multiple pipelines are provided to allowinstructions to execute in parallel. The 601 has three execution units, one each for integerinstructions, floating-point instructions, and branch instructions. The IU and the FPU eachhave dedicated register files for maintaining operands (GPRs and FPRs, respectively),allowing integer calculations and floating-point calculations to occur simultaneouslywithout interference.

The 601 pipeline description can be broken into two parts, the processor core, whereinstruction execution takes place, and the memory subsystem, the interface between theprocessor core and system memory. The system memory includes a unified 32-Kbyte cacheand the bus interface unit.

Figure 1-5 shows the 601’s instruction queue and the IU, FPU, and BPU pipelines.

Each of the stages shown in Figure 1-5 is described in Section 7.2, “Pipeline Description.”

As shown in Figure 1-5, integer instructions are dispatched only from IQ0 (where they arealso usually decoded); whereas branch and floating-point instructions can be dispatchedfrom any of the bottom four elements in the instruction queue (IQ0–IQ3). The dispatch ofinteger instructions is restricted in this manner to provide an ordered flow of instructionsthrough the integer pipeline, which in turn provides a mechanism that ensures that allinstructions appear to complete in order. As branch and floating-point instructions aredispatched their position in the instruction stream is recorded by means of tags thataccompany the previous integer instruction through the integer pipeline. Note that when afloating-point or branch instruction cannot be tagged to an integer instruction, it is taggedto a no-op, or bubble, in the integer pipeline.

Logic associated with the integer completion (IC) stage reconstructs the program order,checks for data dependencies, and schedules the write-back stages of the three pipelines.Note that it is not necessary that the write-back stages need only be serialized if there aredata dependencies. For example, instructions that update the condition register (CR) mustperform write-back in strict order.


.

Figure 1-5. Pipeline Diagram of the Processor Core

CARB

IQ7

IQ6

IQ1

IQ5

IQ4

IQ3

IQ2

IC

FD

FPA

FWA

FPM

F1BE

MR

Dispatch Unit

Floating-Point

Integer Unit (IU)Branch Processing

BW

CACC

Cache (memory subsystem)

IWA IWL

FWL

Data Access Queueing Unit

Fetch Arbitration

Unit (BPU) = Unit Boundary

= Cycle Boundary

Unit (FPU)

FA

ISB

FPSB

ID

IE

IQ0

1 An integer instruction can be passed to the ID stage in the same cycle in which it enters IQ0.

1

(Instructions in the IQare said to be in the dispatch stage (DS))


The tagging mechanism is described in Section 7.3.1.4.4, “Synchronization Tags for thePrecise Exception Model.”

To minimize latencies due to data dependencies, the IU provides feed-forwarding. Forexample, if an integer instruction requires data that is the result of the execution of theprevious instruction, that data is made available to the IU at the same time that the previousinstruction’s write-back stage updates the GPR. This eliminates an additional clock cyclethat would have been necessary if the IU had to access the GPR. Feed-forwarding isavailable between IU execute and decode stage and IU write-back and decode stage.Feed-forwarding is described in Section 7.2.1.2, “Integer Unit (IU).”

Most integer instructions require one clock cycle per stage. Because results for most integerinstructions are available at the end of the execute stage, a series of single-cycle integerinstructions allow a throughput of one instruction per clock cycle. Other instructions, suchas the integer multiply, require more than one clock cycle to complete execution. Theseinstructions reduce the throughput accordingly.

The floating-point pipeline has more stages than the IU pipeline, as shown in Figure 1-5.The 601 supports both single- and double-precision floating-point operations, butdouble-precision instructions generally take longer to execute, typically by requiring twocycles in the FD, FPM, and FPA stages. However, many of these instructions, such as thedouble-precision floating-point multiply (fmul) and double-precision floating-pointaccumulate instructions (fmadd, fmsub, fnmadd, and fnmsub), allow stages to overlap.For example, when the second cycle of the FD stage begins, the first stage of FPM begins.Similarly the FPM stage overlaps with the FPA stage, allowing these instructions tocomplete these stages in four clock cycles instead of six. The timings for these instructionsare shown in Section 7.3.4.5.2, “Double-Precision Instruction Timing.”

Because the PowerPC architecture can be applied to such a wide variety ofimplementations, instruction timing among various PowerPC processors variesaccordingly.

1.3.8 System Interface The system interface is specific for each PowerPC processor implementation.

The 601 provides a versatile system interface that allows for a wide range ofimplementations. The interface includes a 32-bit address bus, a 64-bit data bus, and 52control and information signals (see Figure 1-6). The system interface allows foraddress-only transactions as well as address and data transactions. The 601 control andinformation signals include the address arbitration, address start, address transfer, transferattribute, address termination, data arbitration, data transfer, data termination, andprocessor state signals. Test and control signals provide diagnostics for selected internalcircuitry.


Figure 1-6. System Interface

The system interface supports bus pipelining, which allows the address tenure of onetransaction to overlap the data tenure of another. The extent of the pipelining depends onexternal arbitration and control circuitry. Similarly, the 601 supports split-bus transactionsfor systems with multiple potential bus masters—one device can have mastership of theaddress bus while another has mastership of the data bus. Allowing multiple bustransactions to occur simultaneously increases the available bus bandwidth for otheractivity and as a result, improves performance.

The 601 supports multiple masters through a bus arbitration scheme that allows variousdevices to compete for the shared bus resource. The arbitration logic can implement priorityprotocols, such as fairness, and can park masters to avoid arbitration overhead. The MESIprotocol ensures coherency among multiple devices and system memory. Also, the 601'son-chip cache and UTLB and optional second-level caches can be controlled externally.

The 601 clocking structure allows the bus to operate at integer multiples of the processorcycle time.

The following sections describe the 601 bus support for memory and I/O controllerinterface operations. Note that some signals perform different functions depending uponthe addressing protocol used.

1.3.8.1 Memory AccessesMemory accesses allow transfer sizes of 8, 16, 24, 32, 40, 48, 56, or 64 bits in one bus clockcycle. Data transfers occur in either single-beat transactions or four-beat burst transactions.A single-beat transaction transfers as much as 64 bits. Single-beat transactions are causedby non-cached accesses that access memory directly (that is, reads and writes when cachingis disabled, cache-inhibited accesses, and stores in write-through mode). Burst transactions,which always transfer an entire cache sector (32 bytes), are initiated when a sector in thecache is read from or written to memory. Additionally, the 601 supports address-onlytransactions used to invalidate entries in other processors’ TLBs and caches.

601Processor

ADDRESS

ADDRESS ARBITRATION

ADDRESS START

ADDRESS TRANSFER

TRANSFER ATTRIBUTE

ADDRESS TERMINATION

CLOCKS

DATA

DATA ARBITRATION

DATA TRANSFER

DATA TERMINATION

PROCESSOR STATE

TEST AND CONTROL

+3.6 V


1.3.8.2 I/O Controller Interface OperationsBoth memory and I/O accesses can use the same bus transfer protocols. The 601 also hasthe ability to define memory areas as I/O controller interface areas. Accesses to the I/Ocontroller interface redefine the function of some of the address transfer and transferattribute signals and add control to facilitate transfers between the 601 and specific I/Odevices that respond to this protocol. I/O controller interface transactions provide multipletransaction operations for variably-sized data transfers (1 to 128 bytes) and support a splitrequest/response protocol. The distinction between the two types of transfers is made withseparate signals—TS for memory-mapped accesses and XATS for I/O controller interfaceaccesses. Refer to Chapter 9, “System Interface Operation,” for more information.

1.3.8.3 601 SignalsThe 601 signals are grouped as follows:

• Address arbitration signals—The 601 uses these signals to arbitrate for address bus mastership.

• Address transfer start signals—These signals indicate that a bus master has begun a transaction on the address bus.

• Address transfer signals—These signals, which consist of the address bus, address parity, and address parity error signals, are used to transfer the address and to ensure the integrity of the transfer.

• Transfer attribute signals—These signals provide information about the type of transfer, such as the transfer size and whether the transaction is bursted, write-through, or cache-inhibited.

• Address transfer termination signals—These signals are used to acknowledge the end of the address phase of the transaction. They also indicate whether a condition exists that requires the address phase to be repeated.

• Data arbitration signals—The 601 uses these signals to arbitrate for data bus mastership.

• Data transfer signals—These signals, which consist of the data bus, data parity, and data parity error signals, are used to transfer the data and to ensure the integrity of the transfer.

• Data transfer termination signals—Data termination signals are required after each data beat in a data transfer. In a single-beat transaction, the data termination signals also indicate the end of the tenure, while in burst accesses, the data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat. They also indicate whether a condition exists that requires the data phase to be repeated.

• System status signals—These signals include the interrupt signal, checkstop signals, and both soft- and hard-reset signals. These signals are used to interrupt and, under various conditions, to reset the processor.


• Processor state signals—These two signals are used to set the reservation coherency bit and set the size of the 601’s output buffers.

• Miscellaneous signals—These signals provide information about the state of the reservation coherency bit.

• COP interface signals—The common on-chip processor (COP) unit is the master clock control unit and it provides a serial interface to the system for performing built-in self test (BIST).

• Test interface signals—These signals are used for internal testing.

• Clock signals—These signals determine the system clock frequency. These signals can also be used to synchronize multiprocessor systems.

NOTEA bar over a signal name indicates that the signal is activelow—for example, ARTRY (address retry) and TS (transferstart). Active-low signals are referred to as asserted (active)when they are low and negated when they are high. Signals thatare not active-low, such as AP0–AP3 (address bus paritysignals) and TT0–TT4 (transfer type signals) are referred to asasserted when they are high and negated when they are low.

1.3.8.4 Signal ConfigurationFigure 1-7 illustrates the 601 microprocessor's logical pin configuration, showing how thesignals are grouped.


Figure 1-7. PowerPC 601 Microprocessor Signal Groups

1.3.8.5 Real-Time Clock The real-time clock (RTC) facility, which is specific to the 601, provides a high-resolutionmeasure of real time to provide time of day and date with a calendar range of 136.19 years.The RTC consists of two registers—the RTC upper (RTCU) register and the RTC lower(RTCL) register. The RTCU register maintains the number of seconds from a point in timespecified by software. The RTCL register counts nanoseconds. The contents of eitherregister may be copied to any GPR.

111

11

32

41

1423111131

111

1111

111

6481

111

1111111

7

21111

601

BR

BGABB

TSXATS

A0–A31

AP0–AP3

APE

TT4TT0–TT3

TC0–TC1TSIZ0–TSIZ2

TBSTCIWT

GBLCSE0–CSE2

HP_SNP_REQ

AACKARTRY

SHD

2X_PCLKPCLK_ENBCLK_EN

RTC

DBGDBWODBB

DH0–DH31, DL0–DL31

DP0–DP7

DPE

TADRTRY

TEA

INTCKSTP_IN

CKSTP_OUTHRESETSRESET

RSRVSC_DRIVE

ESP INTERFACE

TEST INTERFACE

SYS_QUIESCRESUME

QUIESC_REQ

ADDRESSARBITRATION

ADDRESSTRANSFER

START

ADDRESSTRANSFER

TRANSFERATTRIBUTE

ADDRESSTERMINATION

CLOCKS

DATAARBITRATION

DATATRANSFER

DATATERMINATION

SYSTEMSTATUS

ESP SCANINTERFACE

TESTSIGNALS

+3.6 V

59 59

Chapter 2. Registers and Data Types 2-1

Chapter 2 Registers and Data Types2020

This chapter describes the PowerPC 601 microprocessor’s register organization, how theseregisters are accessed, and how data is represented in these registers. The 601 alwaysoperates in one of three distinct states which are described as follows:

• Normal instruction execution state—In this state, the 601 executes instructions in either user mode or supervisor mode. User mode can be entered from supervisor mode by executing the appropriate instructions. If an exception is detected while in user mode, the processor enters supervisor mode and begins executing the instructions at a predetermined location associated with the type of exception detected. In supervisor mode, the program has access to memory, registers, instructions, and other resources not available to programs executing in user mode.

• Reset state—In the reset state all processor instruction execution is aborted, registers are initialized appropriately, and external signals are placed in the high-impedance state. For more information about the reset state, see Section 2.7, “Reset.”

• Checkstop state—When a processor is in the checkstop state, instruction processing is suspended and generally cannot be restarted without resetting the processor. The checkstop state is provided to help identify and diagnose problems. The checkstop state is described in Section 5.4.2.2, “Checkstop State (MSR[ME] = 0).”

The PowerPC architecture defines register-to-register operations for all computationalinstructions. Source data for these instructions are accessed from the on-chip registers orare provided as immediate values embedded in the opcode. The three-register instructionformat allows specification of a target register distinct from the two source registers, thuspreserving the original data for use by other instructions and reducing the number ofinstructions required for certain operations. Data is transferred between memory andregisters with explicit load and store instructions only.

2.1 Normal Instruction Execution StateDuring normal execution, a program can access the registers, shown in Figure 2-1,depending on the program’s access privilege (supervisor or user, determined by theprivilege-level (PR) bit in the machine state register (MSR)). The general-purpose registers(GPRs) and floating-point registers (FPRs) are accessed through instruction operands.Access to registers can be explicit (that is, through the use of specific instructions for that


purpose such as Move to Special-Purpose Register (mtspr) and Move from Special-Purpose Register (mfspr) instructions) or implicit as part of the execution of an instruction.Some registers are accessed both explicitly and implicitly.

Figure 2-1. Programming Model—Registers

GPR0

GPR1

GPR31

0 31

User-Level SPRs

SPR0 MQ Register1

SPR1 XER—Integer Exception Register

SPR4 RTCU—RTC Upper Register (For reading only)1,3

SPR5 RTCL—RTC Lower Register (For reading only)1,3

SPR8 LR—Link Register

SPR9 CTR—Count Register

USER PROGRAMMINGMODEL

Supervisor-Level SPRs

SPR18 DSISR—DAE/ Source Instruction Service Register

SPR19 DAR—Data Address Register

SPR20 RTCU—RTC Upper Register (For writing only)1,3

SPR21 RTCL—RTC Lower Register (For writing only)1,3

SPR22 DEC—Decrementer Register4

SPR25 SDR1—Table Search Description Register 1







SPR282 EAR—External Access Register

SPR287 PVR—Processor Version Register









SPR1008 HID0 1

SPR1009 HID1 1

SPR1010 HID2 (IABR) 1

SPR1013 HID5 (DABR) 1

0 63

SUPERVISOR PROGRAMMING MODEL

Machine State Register 2

1 601-only registers. These registers are not necessarily supported by other PowerPC processors.2 These registers may be implemented differently on other PowerPC processors. The PowerPC architecture defines two sets of

BAT registers—eight IBATs and eight DBATs.The 601 implements the IBATs and treats them as unified BATs.3 RTCU and RTCL registers can be written only in supervisor mode, in which case different SPR numbers are used.4 DEC register can be read by user programs by specifying SPR6 in the mfspr instruction (for POWER compatibility).

FPR0

FPR1

FPR31

MSR

0 31

0 31

0 31

Segment Registers

SR0

SR1

SR15

0 31

Floating Point Status and

Control Register

CR

0 31

FPSCR

0 31

Condition Register


The numbers to the left of the SPRs indicate the number that is used in the syntax of theinstruction operands to access the register.

The 601’s user- and supervisor-level registers are described as follows:

• User-level registers—The user-level registers can be accessed by all software with either user or supervisor privileges. These include the following:

— General-purpose registers (GPRs). The 601 general-purpose register file consists of thirty-two 32-bit GPRs designated as GPR0–GPR31. This register file serves as the data source or destination for all integer instructions and provide data for generating addresses. See Section 2.2.1, “General Purpose Registers (GPRs),” for more information.

— Floating-point registers (FPRs). The floating-point register file consists of thirty-two 64-bit FPRs designated as FPR0–FPR31, which serves as the data source or destination for all floating-point instructions. These registers can contain data objects of either single- or double-precision floating-point format. In the 601 the floating-point register file is part of the FPU. For more information, see Section 2.2.2, “Floating-Point Registers (FPRs).”

— Floating-point status and control register (FPSCR). The FPSCR is a user-control register in the FPU. It contains all floating-point exception signal bits, exception summary bits, exception enable bits, and rounding control bits needed for compliance with the IEEE 754 standard. For more information, see Section 2.2.3, “Floating-Point Status and Control Register (FPSCR).”

— Condition register (CR). The condition register is a 32-bit register, divided into eight 4-bit fields, CR0–CR7, that reflects the results of certain arithmetic operations and provides a mechanism for testing and branching. For more information, see Section 2.2.4, “Condition Register (CR).”

The remaining user-level registers are SPRs. Note, however, that the PowerPC architecture provides a separate mechanism for accessing SPRs (the mtspr and mfspr instructions). These instructions are commonly used to access certain registers, while other SPRs may be more typically accessed as the side effect of executing other instructions. The XER and MQ registers are set implicitly by many instructions.

— MQ register (MQ). The MQ register is a 601-specific, 32-bit register used as a register extension to accommodate the product for the multiply instructions and the dividend for the divide instructions. It is also used as an operand of long rotate and shift instructions. This register, and the instructions that require it, is provided for compatibility with POWER architecture, and is not part of the PowerPC architecture. For more information, see Section 2.2.5.1, “MQ Register (MQ).” The MQ register is typically accessed implicitly as part of executing a computational instruction.

— Integer exception register (XER). The XER is a 32-bit register that indicates overflow and carries for integer operations. For more information, see Section 2.2.5.2, “Integer Exception Register (XER).”


— Real-time clock (RTC) registers—RTCU and RTCL (RTC upper and RTC lower). The RTCU register maintains the number of seconds from a time specified by software. The RTCL register maintains a fraction of the current second in nanoseconds. At the user-level, these registers can be read from with the mfspr instruction. As shown in Figure 2-1, the SPR numbers for the RTC registers depends on the type of access used. The contents of either register can be copied to any GPR. These registers are specific to the 601. These registers are not required by the PowerPC architecture, which instead uses the time base facility. For more information, see Section 2.2.5.3, “Real-Time Clock (RTC) Registers (User-Level).”

— Link register (LR). The 32-bit link register provides the branch target address for the Branch Conditional to Link Register (bclrx) instruction, and can optionally be used to hold the logical address of the instruction that follows a branch and link instruction, typically used for linking to subroutines. For more information, see Section 2.2.5.4, “Link Register (LR).”

— Count register (CTR). The count register is a 32-bit register for holding a loop count that can be decremented during execution of appropriately coded branch instructions. The CTR can also provide the branch target address for the Branch Conditional to Count Register (bcctrx) instruction. For more information, see Section 2.2.5.5, “Count Register (CTR).”

• Supervisor-level registers—The 601 incorporates registers that can be accessed only by programs executed with supervisor privileges. These registers consist of the machine state register, segment registers, and supervisor SPRs, described as follows:

— Machine state register (MSR). A 32-bit register that defines the state of the processor; see Figure 2-11. The MSR can be modified by the Move to Machine State Register (mtmsr), System Call (sc), and Return from Exception (rfi) instructions. It can be read by the Move from Machine State Register (mfmsr) instruction. Note that the MSR is a 64-bit register in 64-bit PowerPC implementations and a 32-bit register in 32-bit PowerPC implementations. For more information see Section 2.3.1, “Machine State Register (MSR).”

— Segment registers (SR). The 601 implements sixteen 32-bit segment registers (SR0–SR15). Figure 2-12 and Figure 2-13 show the format of a segment register. The fields in the segment register are interpreted differently depending on the value of bit 0. For more information, see Section 2.3.2, “Segment Registers.”


The remaining supervisor-level registers are SPRs:

— DAE/source instruction service register (DSISR). A 32-bit register that defines the cause of data access and alignment exceptions; see Figure 2-14. For more information, see Section 2.3.3.2, “DAE/Source Instruction Service Register (DSISR).”

— Data address register (DAR). A 32-bit register shown in Figure 2-15. After a data access or an alignment exception, DAR is set to the effective address generated by the faulting instruction. For more information, see Section 2.3.3.3, “Data Address Register (DAR).”

— Real-time clock (RTC) registers—RTCU and RTCL (RTC upper and RTC lower). The registers can be read from by user-level software, but can be written to only by supervisor-level software. As shown in Figure 2-1, the SPR numbers for the RTC registers depend on the type of access used. For more information, see Section 2.2.5.3, “Real-Time Clock (RTC) Registers (User-Level).”

— Decrementer register (DEC). This register is a 32-bit decrementing counter that provides a mechanism for causing a decrementer exception after a programmable delay. In the 601, the RTC provides the frequency for the DEC. In other PowerPC implementations, the frequency is a subdivision of the processor clock. For more information, see Section 2.3.3.5, “Decrementer (DEC) Register.”

— Table search description register 1 (SDR1). This register is a 32-bit register that specifies the page table base address used in virtual-to-physical address translation. For more information, see Section 2.3.3.6, “Table Search Description Register 1 (SDR1).”

— Machine status save/restore register 0 (SRR0). This register is a 32-bit register that is used by the 601 for saving machine status on exceptions and restoring machine status when an rfi instruction is executed. SRR0 is shown in Figure 2-18. For more information, see Section 2.3.3.7, “Machine Status Save/Restore Register 0 (SRR0).”

— Machine status save/restore register 1 (SRR1). This register is a 32-bit register used to save machine status on exceptions and to restore machine status when an rfi instruction is executed. SRR1 is shown in Figure 2-19. For more information, see Section 2.3.3.8, “Machine Status Save/Restore Register 1 (SRR1).”

— General SPRs (SPRG0–SPRG3). These registers are 32-bit registers provided for operating system use. See Figure 2-20. For more information, see Section 2.3.3.9, “General SPRs (SPRG0–SPRG3).”

— External access register (EAR). This register is a 32-bit register used in conjunction with the eciwx and ecowx instructions. Note that the EAR register and the eciwx and ecowx instructions are optional in the PowerPC architecture and may not be supported in other PowerPC processors. For more information about the external control facility, see Section 2.3.3.10, “External Access Register (EAR).”


— Processor version register (PVR). This register is a 32-bit, read-only register that identifies the version (model) and revision level of the PowerPC processor. For more information, see Section 2.3.3.11, “Processor Version Register (PVR).”

— Block-address translation (BAT) registers. The 601 includes eight block-address translation registers (BATs), consisting of four pairs of BATs (IBAT0U–IBAT3U and IBAT0L–IBAT3L). See Figure 2-1 for a list of the SPR numbers for the BAT registers. Figure 2-23 and Figure 2-24 show the formats of the upper and lower BAT registers. Note that the PowerPC architecture has twice as many BAT registers as the 601. For more information, see Section 2.3.3.12, “BAT Registers.”

— Hardware implementation registers (HID0–HID2, HID5, and HID15). These registers are provided primarily for debugging. For more information, see Section 2.3.3.13.1, “Checkstop Sources and Enables Register—HID0” through Section 2.3.3.13.5, “Processor Identification Register (PIR)—HID15.” HID15 holds the four-bit processor identification tag (PID) that is useful for differentiating processors in multiprocessor system designs. For more information, see Section 2.3.3.13.5, “Processor Identification Register (PIR)—HID15.”

Note that there are registers common to other PowerPC processors that are not implementedin the 601. When the 601 detects SPR encodings other than those defined in this document,it either takes a program exception (if bit 0 of the SPR encoding is set) or it treats theinstruction as a no-op (if bit 0 of the SPR encoding is clear).

2.1.1 Changing Privilege LevelsSupervisor-level access is provided through the 601’s exception mechanism. That is, whenan exception is taken, either due to an error or problem that needs to be serviced ordeliberately through the use of a trap or System Call (sc) instruction, the processor beginsoperating in supervisor mode. The level of access is indicated by the privilege-level (PR)bit in the MSR.

In user mode, the processor has access to user-level registers, memory, and instructions. Insupervisor mode, the processor has access to additional registers, instructions, and usuallyhas more authority to access memory. Instructions that can be accessed only fromsupervisor-level are listed in Section 3.2, “Exception Summary.”

2.2 User-Level RegistersThis section describes in detail the registers that can be accessed by user-level software. Alluser-level registers can be accessed by supervisor-level software.

2.2.1 General Purpose Registers (GPRs)Integer data is manipulated in the IU’s thirty-two 32-bit GPRs shown in Figure 2-2. Theseregisters are accessed as source and destination registers in the instruction syntax.


Figure 2-2. General Purpose Registers (GPRs)

All GPRs are cleared by hard reset.

2.2.2 Floating-Point Registers (FPRs)The PowerPC architecture provides thirty-two 64-bit FPRs as shown in Figure 2-3. Theseregisters are accessed as source and destination registers for floating-point instructions.Each FPR supports the double-precision floating-point format. Every instruction thatinterprets the contents of an FPR as a floating-point value uses the double-precisionfloating-point format for this interpretation.

All floating-point arithmetic instructions operate on data located in FPRs and, with theexception of compare instructions, place the result into an FPR. Information about thestatus of floating-point operations is placed into the floating-point status and controlregister (FPSCR) and in some cases, into CR after the completion of instruction execution.For information on how CR is affected for floating-point operations, see Section 2.2.4,“Condition Register (CR).”

Load and store double instructions transfer 64 bits of data between memory and the FPRswith no conversion. Load single instructions are provided to read a single-precisionfloating-point value from memory, convert it to double-precision floating-point format, andplace it in the target floating-point register. Store single instructions are provided to read adouble-precision floating-point value from a floating-point register, convert it to single-precision floating-point format, and place it in the target memory location.

Single- and double-precision arithmetic instructions accept values from the FPRs indouble-precision format. For single-precision arithmetic instructions, all input values mustbe representable in single-precision format; otherwise, the result placed into the target FPRand the setting of status bits in the FPSCR and in the condition register are undefined.

The 601’s floating-point arithmetic instructions produce intermediate results that may beregarded as infinitely precise. After normalization or denormalization, if the precision of theintermediate result cannot be represented in the destination format (single or doubleprecision), it is rounded before being placed in the target FPR. The final result is then placedinto the FPR in the double-precision format.

GPR0

GPR1

. . .

. . .

GPR31

0 31


Figure 2-3. Floating-Point Registers (FPRs)

All FPRs are cleared by hard reset.

2.2.3 Floating-Point Status and Control Register (FPSCR)The FPSCR, shown in Figure 2-4, contains bits to do the following:

• Record exceptions generated by floating-point operations• Record the type of the result produced by a floating-point operation• Control the rounding mode used by floating-point operations• Enable or disable the reporting of exceptions (invoking the exception handler)

Bits 0–23 are status bits. Bits 24–31 are control bits. Bits in the FPSCR are updated at thecompletion of the instruction execution.

Except for the floating-point enabled exception summary (FEX) and floating-point invalidoperation exception summary (VX), the floating-point exception condition bits in theFPSCR are bits 0–12 and 21–23 and are sticky. That is, once set, sticky bits remain set untilthey are cleared by an mcrfs, mtfsfi, mtfsf, or mtfsb0 instruction.

FEX and VX are the logical ORs of other FPSCR bits. Therefore these two bits are notlisted among the FPSCR bits directly affected by the various instructions.

Figure 2-4. Floating-Point Status and Control Register (FPSCR)

A listing of FPSCR bit settings is shown in Table 2-1.

Table 2-2 illustrates the floating-point result flags used by the 601. The result flagscorrespond to FPSCR bits 15–19.

FPR0

FPR1

. . .

. . .

FPR31

0 63

FPSCR

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 19 20 21 22 23 24 25 26 27 28 29 30 31

VXIDI

VXISI

VXSNAN

VXZDZ

VXIMZ

VXVC

VXSOFT

VXSQRT

VXCVI

Reserved

FX FEX VX OX UX ZX XX FR FI FPRF 0 VE OE UE ZE XE 0 RN


Table 2-1. FPSCR Bit Settings

Bit(s) Name Description

0 FX Floating-point exception summary (FX). Every floating-point instruction implicitly sets FPSCR[FX] if that instruction causes any of the floating-point exception bits in the FPSCR to transition from 0 to 1. The mcrfs instruction implicitly clears FPSCR[FX] if the FPSCR field containing FPSCR[FX] is copied. The mtfsf, mtfsfi, mtfsb0, and mtfsb1 instructions can set or clear FPSCR[FX] explicitly. This is a sticky bit.

1 FEX Floating-point enabled exception summary (FEX). This bit signals the occurrence of any of the enabled exception conditions. It is the logical OR of all the floating-point exception bits masked with their respective enable bits. The mcrfs instruction implicitly clears FPSCR[FEX] if the result of the logical OR described above becomes zero. The mtfsf, mtfsfi, mtfsb0, and mtfsb1 instructions cannot set or clear FPSCR[FEX] explicitly. This is not a sticky bit.

2 VX Floating-point invalid operation exception summary (VX). This bit signals the occurrence of any invalid operation exception. It is the logical OR of all of the invalid operation exceptions. The mcrfs instruction implicitly clears FPSCR[VX] if the result of the logical OR described above becomes zero. The mtfsf, mtfsfi, mtfsb0, and mtfsb1 instructions cannot set or clear FPSCR[VX] explicitly. This is not a sticky bit.

3 OX Floating-point overflow exception (OX). This is a sticky bit. See Section 5.4.7.4, “Overflow Exception Condition.”

4 UX Floating-point underflow exception (UX). This is a sticky bit. See Section 5.4.7.5, “Underflow Exception Condition.”

5 ZX Floating-point zero divide exception (ZX). This is a sticky bit. See Section 5.4.7.3, “Zero Divide Exception Condition.”

6 XX Floating-point inexact exception (XX). This is a sticky bit. See Section 5.4.7.6, “Inexact Exception Condition.”

7 VXSNAN Floating-point invalid operation exception for SNaN (VXSNAN). This is a sticky bit. See Section 5.4.7.2, “Invalid Operation Exception Conditions.”

8 VXISI Floating-point invalid operation exception for ∞-∞ (VXISI). This is a sticky bit. See Section 5.4.7.2, “Invalid Operation Exception Conditions.”

9 VXIDI Floating-point invalid operation exception for ∞/∞ (VXIDI). This is a sticky bit. See Section 5.4.7.2, “Invalid Operation Exception Conditions.”

10 VXZDZ Floating-point invalid operation exception for 0/0 (VXZDZ). This is a sticky bit. See Section 5.4.7.2, “Invalid Operation Exception Conditions.”

11 VXIMZ Floating-point invalid operation exception for ∞*0 (VXIMZ). This is a sticky bit. See Section 5.4.7.2, “Invalid Operation Exception Conditions.”

12 VXVC Floating-point invalid operation exception for invalid compare (VXVC). This is a sticky bit. See Section 5.4.7.2, “Invalid Operation Exception Conditions.”

13 FR Floating-point fraction rounded (FR). The last floating-point instruction that potentially rounded the intermediate result incremented the fraction. See Section 2.5.6, “Rounding.” This bit is not sticky.

14 FI Floating-point fraction inexact (FI). The last floating-point instruction that potentially rounded the intermediate result produced an inexact fraction or a disabled overflow exception. See Section 2.5.6, “Rounding.” This bit is not sticky.

The FPSCR is cleared by hard reset.

15–19 FPRF Floating-point result flags (FPRF). This field is based on the value placed into the target register even if that value is undefined. Refer to Table 2-2 for specific bit settings.15 Floating-point result class descriptor (C). Floating-point instructions other than the

compare instructions may set this bit with the FPCC bits, to indicate the class of the result.

16–19 Floating-point condition code (FPCC). Floating-point compare instructions always set one of the FPCC bits to one and the other three FPCC bits to zero. Other floating-point instructions may set the FPCC bits with the C bit, to indicate the class of the result. Note that in this case the high-order three bits of the FPCC retain their relational significance indicating that the value is less than, greater than, or equal to zero.16 Floating-point less than or negative (FL or <)17 Floating-point greater than or positive (FG or >)18 Floating-point equal or zero (FE or =)19 Floating-point unordered or NaN (FU or ?)

20 — Reserved

21 VXSOFT Not implemented in the 601. This is a sticky bit. For more detailed information refer to Table 5-17 and Section 5.4.7.2, “Invalid Operation Exception Conditions.”

22 VXSQRT Not implemented in the 601. For more detailed information refer to Table 5-17 and Section 5.4.7.2, “Invalid Operation Exception Conditions.”

23 VXCVI Floating-point invalid operation exception for invalid integer convert (VXCVI). This is a sticky bit. See Section 5.4.7.2, “Invalid Operation Exception Conditions.”

24 VE Floating-point invalid operation exception enable (VE). See Section 5.4.7.2, “Invalid Operation Exception Conditions.”

25 OE Floating-point overflow exception enable (OE). See Section 5.4.7.4, “Overflow Exception Condition.”

26 UE Floating-point underflow exception enable (UE). This bit should not be used to determine whether denormalization should be performed on floating-point stores. See Section 5.4.7.5, “Underflow Exception Condition.”

27 ZE Floating-point zero divide exception enable (ZE). See Section 5.4.7.3, “Zero Divide Exception Condition.”

28 XE Floating-point inexact exception enable (XE). See Section 5.4.7.6, “Inexact Exception Condition.”

29 — Reserved. This bit may be implemented as the non-IEEE mode bit (NI) in other PowerPC implementations.

30–31 RN Floating-point rounding control (RN). See Section 2.5.6, “Rounding.”00 Round to nearest 01 Round toward zero 10 Round toward +infinity11 Round toward –infinity

Table 2-1. FPSCR Bit Settings (Continued)

2.2.4 Condition Register (CR)The condition register (CR) is a 32-bit register that reflects the result of certain operationsand provides a mechanism for testing and branching. The bits in the CR are grouped intoeight 4-bit fields, CR0–CR7, as shown in Figure 2-5.

Figure 2-5. Condition Register (CR)

The CR fields can be set in one of the following ways:

• Specified fields of the CR can be set by a move instruction (mtcrf, or mcrfs) to the CR from a GPR.

• Specified fields of the CR can be moved from one CRx field to another with the mcrf instruction.

• A specified field of the CR can be set by a move instruction (mcrxr) to the CR from the XER.

• Condition register logical instructions can be used to perform logical operations on specified bits in the condition register.

• CR0 can be the implicit result of an integer operation.

• CR1 can be the implicit result of a floating-point operation.

• A specified CR field can be the explicit result of either an integer or floating-point compare instruction.

Branch instructions are provided to test individual CR bits. The condition register is clearedby hard reset.

Table 2-2. Floating-Point Result Flags in FPSCR

Result Flags (Bits 15–19)

C < > = ?Result value class

1 0 0 0 1 Quiet NaN

0 1 0 0 1 –Infinity

0 1 0 0 0 –Normalized number

1 1 0 0 0 –Denormalized number

1 0 0 1 0 –Zero

0 0 0 1 0 +Zero

1 0 1 0 0 +Denormalized number

0 0 1 0 0 +Normalized number

0 0 1 0 1 +Infinity

CR0 CR1 CR2 CR3 CR4 CR5 CR6 CR7

0 3 4 7 8 11 12 15 16 19 20 23 24 27 28 31


2.2.4.1 Condition Register CR0 Field DefinitionIn most integer instructions, when the Rc bit is set, the first three bits of CR0 are set by analgebraic comparison of the result to zero; the fourth bit of CR0 is copied from XER[SO].The addic., andi., and andis. instructions set these four bits implicitly. These bits areinterpreted as shown in Table 2-3. If any portion of the result (the 32-bit value placed intothe target register) is undefined, the value placed into the first three bits of CR0 is undefined.

2.2.4.2 Condition Register CR1 Field DefinitionIn all floating-point instructions except mcrfs, fcmpu, and fcmpo, when Rc is specified,CR1 is copied from bits 0–3 of the floating-point status and control register (FPSCR). Formore information about the FPSCR, see Section 2.2.3, “Floating-Point Status and ControlRegister (FPSCR).” The bit settings for the CR1 field are shown in Table 2-4.

2.2.4.3 Condition Register CRn Field—Compare InstructionWhen a specified CR field is set by a compare instruction, the bits of the specified field areinterpreted, as shown in Table 2-5. A condition register field can also be accessed by themfcr, mcrf, and mtcrf instructions.

Table 2-3. Bit Settings for CR0 Field of CR

CR0 Bit

Description

0 Negative (LT)—This bit is set when the result is negative.

1 Positive (GT)—This bit is set when the result is positive (and not zero).

2 Zero (EQ)—This bit is set when the result is zero.

3 Summary overflow (SO)—This is a copy of the final state of XER[SO] at the completion of the instruction.

Table 2-4. Bit Settings for CR1 Field of CR

CR1 Bit

Description

4 Floating-point exception (FX)—This is a copy of the final state of FPSCR[FX] at the completion of the instruction.

5 Floating-point enabled exception (FEX)—This is a copy of the final state of FPSCR[FEX] at the completion of the instruction.

6 Floating-point invalid exception (VX)—This is a copy of the final state of FPSCR[VX] at the completion of the instruction.

7 Floating-point overflow exception (OX)—This is a copy of the final state of FPSCR[OX] at the completion of the instruction.

2.2.5 User-Level SPRsUser-level SPRs can be accessed by either user- or supervisor-level instructions. Themechanism referred to for accessing SPRs is the set of Move to Special Purpose Register(mtspr) and Move from Special Purpose Register (mfspr) instructions. These instructionsare commonly used to access certain registers, while other SPRs may be more typicallyaccessed as the side effect of executing other instructions. Some SPRs are implementation-specific; as noted, some SPRs in the 601 may not be implemented in other PowerPCprocessors, or may not be implemented in the same way in other PowerPC processors.

In general for registers with reserved bits, implementations return zeros or return the valuelast written to those bits. The only user-level SPR, in the 601,with reserved bits is the XER,which returns zeros.

The RTCL register is defined as 32 bits, but the lowest-order seven bits are notimplemented. Those bits are reserved, and zeros are loaded into the respective bit positionsof the target register when the RTCL is read.

When the 601 detects SPR encodings other than those defined in this document, it eithertakes a program exception (if bit 0 of the SPR encoding is set) or it treats the instruction asa no-op (if bit 0 of the SPR encoding is clear).

2.2.5.1 MQ Register (MQ)The MQ register (MQ), shown in Figure 2-6, is a 32-bit register used as a register extensionto accommodate the product for the multiply (mulx) instruction and the dividend for thedivide (divx) instruction. It is also used as an operand of long rotate and shift instructions.Note that the mulx, divx, and some of the long rotate and shift instructions are not part of

Table 2-5. CRn Field Bit Settings for Compare Instructions

CRn Bit*

Description

0 Less than, Floating-point less than (LT, FL).For integer compare instructions, (rA) < SIMM, UIMM, or (rB) (algebraic comparison) or (rA) SIMM, UIMM, or (rB) (logical comparison). For floating-point compare instructions, (frA) < (frB).

1 Greater than, floating-point greater than (GT, FG).For integer compare instructions, (rA) > SIMM, UIMM, or (rB) (algebraic comparison) or (rA) SIMM, UIMM, or (rB) (logical comparison). For floating-point compare instructions, (frA) > (frB).

2 Equal, floating-point equal (EQ, FE).For integer compare instructions, (rA) = SIMM, UIMM, or (rB). For floating-point compare instructions, (frA) = (frB).

3 Summary overflow, floating-point unordered (SO, FU).For integer compare instructions, this is a copy of the final state of XER[SO] at the completion of the instruction. For floating-point compare instructions, one or both of (frA) and (frB) is not a number (NaN).

*Here, the bit indicates the bit number in any one of the four-bit subfields, CR0–CR7.


the PowerPC architecture. See Chapter 10, “Instruction Set” for a more detailed account ofinstructions not implemented in the PowerPC architecture.

Figure 2-6. MQ Register (MQ)

The MQ register is not defined in the PowerPC architecture. However, in the 601, it may bemodified as a side effect during the execution of the mulli, mullw, mulhs, mulhu, divw,and divwu instructions, which are PowerPC instructions.

The value written to the MQ register during these operations is operand-dependent andtherefore, the MQ contents become undefined after any of these instructions executes. Inaddition, the MQ is modified by the implementation-specific instructions supported by the601 that are not part of the PowerPC architecture. These are listed in Table 2-6.

Table 2-6. PowerPC 601 Microprocessor-Specific Instructions that Modify the MQ Register

Mnemonic Instruction Name Read/Write

mul Multiply Read/write

div Divide Read/write

divs Divide Short Read/write

sliq Shift Left Immediate with MQ Read/write

slliq Shift Left Long Immediate with MQ Read/write

sle Shift Left Extended Write

sleq Shift Left Extended with MQ Read/write

slliq Shift Left Long Immediate with MQ Read/write

sllq Shift Left Long with MQ Read/write

slq Shift Left with MQ Write

sraiq Shift Right Algebraic Immediate with MQ Write

sraq Shift Right Algebraic with MQ Write

sre Shift Right Extended Write

srea Shift Right Extended Algebraic Write

sreq Shift Right Extended with MQ Read/write

sriq Shift Right Immediate with MQ Write

srliq Shift Right Long Immediate with MQ Read/write

srlq Shift Right Long with MQ Read/write

srq Shift Right with MQ Write

MQ

0 31


The PowerPC instructions listed in Table 2-7 use the MQ register as a buffer to create atemporary 64-bit value. These instructions leave the MQ register in an undefined state.

The Move to Special Purpose Register (mtspr) and Move from Special Purpose Register(mfspr) can access the MQ register. The SPR number for the MQ register is 0.

The MQ register is not part of the PowerPC architecture and will not be supported in otherPowerPC microprocessors.

The MQ register is cleared by hard reset.

2.2.5.2 Integer Exception Register (XER)The integer exception register (XER) is a user-level, 32-bit register as shown in Figure 2-7.

.

Figure 2-7. Integer Exception Register (XER)

XER is designated SPR1. The bit definitions for XER, shown in Table 2-8, are based on theoperation of an instruction considered as a whole, not on intermediate results. For example,the result of the Subtract from Carrying (subfcx) instruction is specified as the sum of threevalues. This instruction sets bits in the XER based on the entire operation, not on anintermediate sum.

Table 2-7. PowerPC Instructions that Use the MQ Register

Mnemonic Instruction Name

mulli Multiply Low Immediate

mullw Multiply Low

mulhw Multiply High Word

mulhwu Multiply High Word Unsigned

divw Divide Word

divwu Divide Word Unsigned

Reserved

SO OV CA 0 0 0 0 0 0 0 0 0 0 0 0 0 Byte compare value 0 Byte count

0 1 2 3 15 16 23 24 25 31


The XER is cleared by hard reset.

2.2.5.3 Real-Time Clock (RTC) Registers (User-Level)The real-time clock (RTC) registers provide a high-resolution measure of real time forindicating the date and time of day. The RTC facility provides a calendar range of roughly135 years. The RTC registers are 601-specific.

The RTC input is sampled using the CPU clock. Therefore, if the CPU clock is less thantwice the RTC frequency, real-time clock (and decrementer) sampling and incrementingerrors will occur. Therefore, in systems that change the CPU clock frequency dynamicallybeyond this limit, a method of saving and restoring the real-time clock register values viaexternal means is recommended for accuracy of the RTC.

The RTC registers, shown in Figure 2-8, consist of the following:

• Real-time clock upper (RTCU)—This register specifies the number of seconds that have elapsed since the time specified in the software.

• Real-time clock lower (RTCL)—This register contains the number of nanoseconds since the beginning of the current second.

Reading any portion of the RTC registers does not affect its contents. The writing of theRTCU and RTCL registers is allowed for supervisor programs only (mtspr is supervisor-

Table 2-8. Integer Exception Register Bit Definitions


0 SO Summary Overflow (SO)—The summary overflow bit (OV) is set whenever an instruction (except mtspr) sets the overflow bit (OV) to indicate overflow and remains set until software clears it (with the mtspr or mcrxr instruction). It is not altered by compare instructions or other instructions that cannot overflow.

1 OV Overflow (OV)—The overflow bit is set to indicate that an overflow has occurred during execution of an instruction. Integer and subtract instructions having OE = 1 set OV if the carry out of bit 0 is not equal to the carry out of bit 1, and clear it otherwise. The OV bit is not altered by compare instructions or other instructions that cannot overflow.

2 CA Carry (CA)—In general, the carry bit is set to indicate that a carry out of bit 0 occurred during execution of an instruction. Add carrying, subtract from carrying, add extended, and subtract from extended instructions set CA to one if there is a carry out of bit 0, and clear it otherwise. The CA bit is not altered by compare instructions, or other instructions that cannot carry, except that shift right algebraic instructions set the CA bit to indicate whether any “1” bits have been shifted out of a negative quantity.

3–15 — Reserved

16–23 This field contains the byte to be compared by a Load String and Compare Byte Indexed (lscbx) instruction. Note that lscbx is not a part of the PowerPC architecture.

24 — Reserved

25–31 This field specifies the number of bytes to be transferred by a Load String Word Indexed (lswx), Store String Word Indexed (stswx) or Load String and Compare Byte Indexed (lscbx) instruction.


only for RTC registers) and as supervisor-level registers, RTCU and RTCL must beaccessed using different SPR numbers, as shown in Figure 2-1.

In user-level, RTCU and RTCL are read-only. The SPR numbers for the RTCU, RTCL, andDEC registers differ depending upon whether the mtspr or mfspr instruction is used. Forthe mtspr instruction, RTCU is SPR20, RTCL is SPR2. For the mfspr instruction, RTCUis SPR4, RTCL is SPR5.

For compatibility with the PowerPC user instruction set architecture, it is recommendedthat the Move from Time Base instruction (mftb) be used instead of the mfspr instruction.This instruction, which is not implemented in the 601, causes an illegal instruction programexception on the 601; the mfspr instruction can be used to perform the operation in the601’s exception handler, and will function as defined in the architecture on other PowerPCprocessors. The mftb instruction, is described in Appendix C, “PowerPC Instructions NotImplemented.”

(1)

(2)

Figure 2-8. Real-Time Clock (RTC) Registers

The RTC runs constantly while power is applied and the external 7.8125 MHz oscillator isconnected. Note that the RTC will not be implemented in other PowerPC processors. Thecondition register is cleared by hard reset. Note that when an external clock is connected tothe RTC, the RTCL and RTCU registers are incremented automatically.

Both registers are cleared by a hard reset.

2.2.5.3.1 Real-Time Clock Lower (RTCL) RegisterThe RTCL functions as a 23-bit counter that provides the lower word of the RTC. As anindicator of the granularity of the RTC, enough bits are implemented to provide a resolutionthat is finer than the time required to execute 10 Add Immediate (addi) instructions. Thefollowing details describe the RTCL:

• Bits 0–1 and bits 25–31 are not implemented. (The number of lower order bits required is determined by the frequency of the oscillator—7.8125 MHz)

• The least significant implemented bit of the RTCL (bit 24) is incremented every 128 nS.

• The period of the RTCL is one billion nanoseconds (one second).

RTCU

0 31

0 1 2 2425 31

0 0 RTCL 0 0 0 0 0 0 0

Reserved


• Unless it is altered by software, the RTCL reaches its terminal count value of 999,999,872 (one billion minus 128) after 999,999,999 nS. The next time RTCL is incremented, it cycles to all zeros and RTCU is incremented.

• Using the mfspr instruction with RTCL does not affect its contents. Unimplemented bits are read as zeros.

• If the mtspr instruction is used to replace the contents of the RTCL with the contents of a GPR, the values of the GPR corresponding to the unimplemented bits in the RTCL are ignored.

2.2.5.3.2 Real-Time Clock Upper (RTCU) RegisterThe RTCU register is a 32-bit binary counter in which the least-significant bit isincremented in synchronization with the transition to zero of the RTCL counter (after one-billion nanoseconds—that is, every second). All 32 bits of the RTCU are implemented.When the RTCU is set to all ones, the next time it is incremented it becomes all zeros.

When the contents of the RTCU or the RTCL are copied to a GPR, bits in the GPRcorresponding to the unimplemented bits in the RTCL are cleared.

2.2.5.3.3 Reading the RTCThe contents of either RTC register can be copied into a GPR by user programs with themfspr instruction. Because the RTCL continues to increment and the RTCU may beincremented while instructions are being executed that read the two RTC registers, whenthe current time is required in a form that includes more than the upper or lower word ofthe RTC, the following procedure should be used:

1. Execute the following instruction sequence:

mfspr rA,r4 |read RTCLmfspr rB,r5 |read RTCUmfspr rC,r4 |read RTCL

2. If (rC) = (rA)

then the correct value has been obtainedelse repeat step 1

Step 2 is required because the RTC continues to increment and the RTCU may incrementwhile the instructions that read the two halves of the RTC are being executed. If the valuesin rC and rA match, the RTCU has not been incremented, and the RTCU value can be usedalong with the value in rB as the current RTC value. However, if the values of rC and rAdiffer, the RTCU has been incremented and it cannot be guaranteed which, if either, RTCUvalue should be associated with the value in rB.

Successive readings of the RTC registers do not necessarily give unique values. If uniquevalues are required, and if updating the RTCL at least once in the time it takes to execute10 addi instructions is insufficient to ensure unique values, a software solution is required.


2.2.5.3.4 RTC Synchronization in a Multiprocessor SystemTypically, RTCs must be synchronized in a multiprocessor system. One way to achievesynchronization is to use a gated RTC clock as the input to all 601s in a system. The gateclock can be enabled and disabled through the use of an I/O access (either I/O controllerinterface store instruction to a selected BUID, or a memory-mapped I/O access). Thisallows the RTC input clock to all processors to be turned on and off at the same time. Eachprocessor’s RTC register can then be loaded to the same value before starting the RTC inputclock.

2.2.5.4 Link Register (LR)The 32-bit link register (LR) supplies the branch target address for the Branch Conditionalto Link Register (bclrx) instruction, and can be used to hold the logical address of theinstruction that follows a branch and link instruction. The format of LR is shown inFigure 2-9.

Figure 2-9. Link Register (LR)

Note that although the two least-significant bits can accept any values written to them, theyare ignored when the LR is used as an address. The link register can be accessed by themtspr and mfspr instructions using SPR number 8. Fetching instructions along the targetpath (loaded by an mtspr instruction) is possible provided the link register is loadedsufficiently ahead of the branch instruction. It is usually possible for the 601 to fetch alonga target path loaded by a branch and link instruction.

Both conditional and unconditional branch instructions include the option of placing theeffective address of the instruction following the branch instruction in the LR.

As a performance optimization, and as an aid for handling the precise exception model, the601 implements a two-entry link register shadow. Shadowing allows the link register to beupdated by branch instructions that are executed out-of-order with respect to integerinstructions without destroying machine state information if any integer instructions takesa precise exception. This is not visible from software. The link register is cleared by hardreset.

Note that although the 601 does not implement a link stack register, one may beimplemented in subsequent PowerPC processors. For compatibility, use of the link registershould be controlled following the description in Section 3.6.1.5, “Branch Conditional toLink Register Address Mode.”

Branch Address

0 31


2.2.5.5 Count Register (CTR)The count register (CTR) is a 32-bit register for holding a loop count that can bedecremented during execution of branch instructions that contain an appropriately codedBO field. If the value in CTR is 0 before being decremented, it is –1 afterward. The countregister can also provide the branch target address for the Branch Conditional to CountRegister (bcctrx) instruction. The CTR is shown in Figure 2-10.

Figure 2-10. Count Register (CTR)

Fetching instructions along the target path is also possible provided the count register isloaded sufficiently ahead of the branch instruction.

The count register can be accessed by the mtspr and mfspr instructions by specifying theSPR number 9. In branch conditional instructions, the BO field specifies the conditionsunder which the branch is taken. The first four bits of the BO field specify how the branchis affected by or affects the condition register and the count register. The encoding for theBO field is shown in Table 3-25. The count register is cleared by hard reset.

2.3 Supervisor-Level RegistersSome 601 registers can be accessed only by supervisor-level software. These include themachine state register (MSR), the segment registers, and several SPRs.

2.3.1 Machine State Register (MSR)The machine state register (MSR), shown in Figure 2-11, is a 32-bit register that defines thestate of the processor. When an exception occurs, MSR bits, as described in Table 2-9, arealtered as determined by the exception. The MSR can also be modified by the mtmsr, sc,and rfi instructions. It can be read by the mfmsr instruction. Note that in 64-bit PowerPCimplementations, the MSR is a 64-bit register.

Figure 2-11. Machine State Register (MSR)

Table 2-9 shows the bit definitions for the MSR.

CTR

0 31

0 15 16 17 18 19 20 2122 23 24 25 26 27 28 29 30 31

Reserved

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EE PR FP ME FE0 SE 0 FE1 0 EP IT DT 0 0 0 0


Table 2-9. Machine State Register Bit Settings


0–15 — Reserved*

16 EE External exception enable 0 While the bit is cleared the processor delays recognition of external interrupts and

decrementer exception conditions. 1 The processor is enabled to take an external interrupt or the decrementer exception.

17 PR Privilege level 0 The processor can execute both user- and supervisor-level instructions.1 The processor can only execute user-level instructions.

18 FP Floating-point available 0 The processor prevents dispatch of floating-point instructions, including floating-point

loads, stores, and moves.1 The processor can execute floating-point instructions, and can take floating-point

enabled exception type program exceptions.

19 ME Machine check enable 0 A checkstop is taken, unless either HID0[CE] or HID0[EM] is cleared (disabled), in which

case the machine check exception is taken.1 Machine check exceptions are enabled.In the 601, this bit is set after a hard reset, although the PowerPC architecture specifies that this bit is cleared.

20 FE0 Floating-point exception mode 0 (See Table 2-10).

21 SE Single-step trace enable0 The processor executes instructions normally. 1 The processor generates a single-step trace exception upon the successful execution of

the next instruction. In the 601 this is implemented as a run-mode exception; the PowerPC architecture defines this as a trace exception. When this bit is set, the processor dispatches instructions in strict program order. Successful execution means the instruction caused no other exception. Single-step tracing may not be present on all implementations.

22 — Reserved * on the 601

23 FE1 Floating-point exception mode 1 (See Table 2-10).

24 — Reserved. This bit corresponds to the AL bit of the POWER architecture.

25 EP Exception prefix. The setting of this bit specifies whether an exception vector offset is prepended with Fs or 0s. In the following description, nnnnn is the offset of the exception. See Table 5-2.0 Exceptions are vectored to the physical address x'000n_nnnn'. 1 Exceptions are vectored to the physical address x'FFFn_nnnn'.

26 IT Instruction address translation 0 Instruction address translation is disabled. 1 Instruction address translation is enabled.For more information see Chapter 6, “Memory Management Unit.”

27 DT Data address translation 0 Data address translation is disabled. 1 Data address translation is enabled.For more information see Chapter 6, “Memory Management Unit.”

28–29 — Reserved


The floating-point exception mode bits are interpreted as shown in Table 2-10. For furtherdetails, see Section 5.4.7.1, “Floating-Point Enabled Program Exceptions.” Note that thesebits are logically ORed, so that if either is set the processor operates in precise mode.

Table 2-11 indicates the state of the MSR after a hard reset.

2.3.2 Segment RegistersThe sixteen 32-bit segment registers are present only in 32-bit PowerPC implementations.Figure 2-12 shows the format of a segment register in the 601. The value of bit 0, the T bit,determines how the remaining register bits are interpreted.

30 — Reserved* on the 601


*These reserved bits may be used by other PowerPC processors. Attempting to change these bits does not affect the operation of the 601. These bit positions always return a zero value when read. Note that bits 15 and 31 (ELE and LE) are defined by the PowerPC architecture to control little- and big-endian mode.

Table 2-10. Floating-Point Exception Mode Bits

FE0 FE1 Mode

0 0 Floating-point exceptions disabled

0 1 Floating-point imprecise nonrecoverable*

1 0 Floating-point imprecise recoverable*

1 1 Floating-point precise mode

*Because FE0 and FE1 are logically ORed on the 601, neither of these modes is available. If either bit is set, the processor operates in precise mode.

Table 2-11. State of MSR at Power Up

Bit Description

0–15 0 (Reserved)

16–18 0

19 1

20–24 0

25 1

26–27 0

28–31 0 (Reserved)

Table 2-9. Machine State Register Bit Settings (Continued)



Figure 2-12. Segment Register Format (T = 0)

Segment registers can be accessed by using the mtsr and mtsrin instructions. Segmentregister bit settings when T = 0 are described in Table 2-12.

Figure 2-13 shows the bit definition when T = 1.

Figure 2-13. Segment Register Format (T = 1)

The bits in the segment register when T = 1 are described in Table 2-13.

Table 2-12. Segment Register Bit Settings (T = 0)

Bits Name Description

0 T T = 0 selects this format

1 Ks Supervisor-state protection key

2 Ku User-state protection key

3–7 — Reserved

8–31 VSID Virtual segment ID

Table 2-13. Segment Register Bit Settings (T = 1)


0 T T = 1 selects this format.



3–11 BUID Bus unit ID. If BUID = x'07F' the transaction is a memory-forced I/O controller interface operation.

12–27 — Device specific data for I/O controller

28–31 Packet 1(0–3) This field contains address bits 0–3 of the packet 1 cycle (address-only).

0 1 2 3 7 8 31

T Ks Ku 0 0 0 0 0 VSID

Reserved

T Ks Ku BUID Controller Specific Information Packet 1(0–3)

0 1 2 3 11 12 27 28 31


If T = 0 in the selected segment register, the effective address is a reference to an ordinarymemory segment. For ordinary memory segments, the segmented address translationmechanism may be superseded by the block address translation (BAT) mechanism. If not,the 52-bit virtual address (VA) is formed by concatenating the following:

• The 24-bit VSID field from the segment register• The 16-bit page index, EA[4–19]• The 12-bit byte offset, EA[20–31]

The VA is then translated to a physical address as described in Section 6.8, “MemorySegment Model.”

If T = 1 in the selected segment register, the effective address is a reference to an I/Ocontroller interface segment. No reference is made to the page tables. For further discussionof address translation see Section 6.10, “I/O Controller Interface Address Translation.”

The 601 defines two types of I/O controller interface segments (segment register T-bit set)based on the value of the bus unit ID (BUID), as follows:

• I/O controller interface (BUID ≠ x'07F')—I/O controller interface accesses include all transactions between the 601 and subsystems (referred to as bus unit controllers (BUCs) mapped through I/O controller interface address space).

• Memory-forced I/O controller interface (BUID = x'07F')—Memory-forced I/O controller interface operations access memory space. They do not use the extensions to the memory protocol described for I/O controller interface accesses, and they bypass the page- and block-translation and protection mechanisms. The physical address is found by concatenating bits 28–31 of the respective segment register with bits 4–31 of the effective address. This address is marked as noncacheable, write-through, and global.

Because memory-forced I/O controller interface accesses address memory space, they are subject to the same coherency control as other memory reference operations. More generally, accesses to memory-forced I/O controller interface segments are considered to be cache-inhibited, write-through and memory-coherent operations with respect to the 601 cache and bus interface.

See Section 9.6.2, “I/O Controller Interface Transaction Protocol Details,” for moreinformation about the BUID.

The segment registers are cleared by hard reset.

2.3.3 Supervisor-Level SPRsMany of the SPRs can be accessed only by supervisor-level instructions; any attempt toaccess these SPRs with user-level instructions will result in a privileged exception. SomeSPRs are implementation-specific; some 601 SPRs may not be implemented in otherPowerPC processors, or may not be implemented in the same way. Table 2-14 summarizeshow the 601 treats the undefined bits in supervisor-level SPRs.


In some cases, not all of a register’s bits are implemented in hardware. For example, theRTCL register is defined to be 32 bits, but in the 601 only the 23 most significant bits existin hardware. Similarly, the DEC register is defined as having 32 bits, but only the 25 mostsignificant bits are implemented in hardware. In both cases, the unimplemented bits arereturned as zeros when they are read by the mfspr instruction.

The RTCU and RTCL register in supervisor mode and the mtspr instruction requires adifferent SPR encoding. For the mtspr instruction, RTCU is SPR20 and RTCL is SPR21.

When the 601 detects SPR encodings other than those defined in this document, it eithertakes a program exception (if bit 0 of the SPR encoding is set) or it treats the instruction asa no-op (if bit 0 of the SPR encoding is clear).

2.3.3.1 Synchronization for Supervisor-Level SPRs and Segment Registers

The processor has synchronization requirements when updating the following MMUregisters when the corresponding address translation is enabled (data accesses withMSR[DT] = 1 or instruction fetches with MSR[IT] = 1):

• SDR1• BATs (if MSR[DT] = 1 or MSR[IT] = 1)• Segment registers

In addition, there are other software requirements that should be observed when modifyingthese MMU registers and the MSR[IT] bit.

2.3.3.1.1 Context SynchronizationThe processor checks for read and write dependencies with respect to segment registers andspecial purpose registers, and executes a series of instructions involving those registers sothat read/write dependencies are not violated. For example, if an mtspr instruction writesa value to a particular SPR and an mfspr instruction later in the instruction stream reads thesame SPR, the mfspr reads the value written by the mtspr.

Table 2-14. Undefined Bits in Supervisor-Level Registers

Register Value Returned for Undefined Bits

FPSCR Zero

SDR1 Zero

All BATs Value last written to that bit position

HID0 Zero

HID1 Value last written to that bit position



HID15 Zero


It is important to note that dependencies caused by side effects of writing to segmentregisters and SPRs are not checked automatically. If an mtspr instruction writes a value toan SPR that changes how address translation is performed, a subsequent load instruction isnot guaranteed to use the new translation until the processor is explicitly synchronized byusing one of the following context-synchronizing operations:

• isync (Instruction Synchronize) instruction• sc (System Call) instruction• rfi (Return from Interrupt) instruction• Any exception other than machine check and system reset

Table 2-15 provides information on data access synchronization requirements.

1 The context-synchronizing event (most likely an isync instruction) prior to the tlbie instruction ensures that all previously issued memory access instructions have completed to a point where they will no longer cause an exception. The context-synchronizing event following the tlbie instruction ensures that subsequent memory access instructions will not use the TLB entry being invalidated. To ensure that all memory accesses previously translated by the TLB entry being invalidated have completed with respect to memory and that reference and change bit updates associated with those memory accesses have completed, a sync instruction rather than a context-synchronizing event is required after the tlbie instruction. Multiprocessor systems have other requirements to synchronize TLB invalidation.

For information on instruction access synchronization requirements see Table 2-16.

Table 2-15. Data Access Synchronization

Instruction/ Event Required Prior Required After

mtmsr (ME) None Context-synchronizing event

mtmsr (DT) None Context-synchronizing event

mtmsr (PR) None Context-synchronizing event

mtsr Context-synchronizing event Context-synchronizing event

mtspr (BAT) Context-synchronizing event Context-synchronizing event

mtspr (SDR1) sync Context-synchronizing event

mtspr (EAR) Context-synchronizing event Context-synchronizing event

tlbie1 Context-synchronizing event Context-synchronizing event

Table 2-16. Instruction Access Synchronization


Exception 1 None None

mtmsr (EP) None None

mtmsr (EE) 2 None None

mtmsr (ME) None Context-synchronizing event

mtmsr (IT) None Context-synchronizing event


1 These events are context-synchronizing.2 The effect of altering the EE bit is immediate as follows:

• If an mtmsr clears the EE bit, neither an external interrupt nor a decrementer exception occurs if the instruction is executed.

• If an mtmsr sets the EE bit, and an external interrupt or decrementer exception was being held off by the EE bit being 0, the exception is taken before the next instruction in the program stream that set the bit to 0 is issued.

3 The mtspr(SDR1) instruction is shown for completeness. Data accesses have stronger requirements that override this specification.

Note that the sync instruction, although not defined as context-synchronizing in thePowerPC architecture, can sometimes be used to provide the required synchronization.When a sync instruction is encountered, the 601 processor synchronizes updates to the CR,CTR, LR, MSR, FPSCR, and XER registers.

In general, context-synchronization is required when writes to registers that affectaddressing are preceded or followed by load or store instructions. Specifically, a context-synchronizing operation or a sync instruction must precede a modification of the BAT orsegment registers when the corresponding address translations are enabled. A syncinstruction must precede the modification of SDR1 when the corresponding (data accesseswith MSR[DT] = 1 or instruction fetches with MSR[IT] = 1) address translations areenabled, guaranteeing that the reference and change bits are updated in the correct context.

If the corresponding address translations are enabled, a context synchronization operationmust follow the modification of any of the above registers.

When several of the registers listed above are modified with no intervening instructions thatare affected by the changes, context synchronization or sync instructions are not requiredbetween the alterations. However, instructions fetched and/or executed after the alterationbut before the context synchronizing operation may be fetched and/or executed in either thecontext that existed before the alteration or the context established by the alteration.

For synchronization within a sequence of instructions, the isync instruction can be used asshown in the first example.

mtmsr (FP) None Context-synchronizing event

mtmsr (FE0,1) None Context-synchronizing event

mtmsr (SE) None Context-synchronizing event

rfi 1 None None

mtsr None Context-synchronizing event

mtspr (BAT) None Context-synchronizing event

mtspr (SDR1) 3 None Context-synchronizing event

tlbie None Context-synchronizing event

Table 2-16. Instruction Access Synchronization (Continued)



Example 1: Using the isync instruction—In this example a single segment register (n)needs to be updated in a context where loads and stores might otherwise execute ahead ofthe mtsr instruction and use the outdated address translation. Data and instruction addresstranslation is enabled (MSR[DT] = 1 and MSR[IT] = 1):

isyncmtsr sr,rnisync

The first isync instruction allows all instructions in the pipeline to complete, allowing themtsr instruction to dispatch and execute by itself.

Example 2: Using the isync instruction with a series of register modifications—Inexample 1, the single mtsr instruction could safely be replaced with a series of mtsrinstructions without each requiring an isync instruction. However, if both mtsr and mfsrinstructions are needed, they should be separated by an isync instruction, as follows:

isyncmtsr sr,r0mtsr sr,r1...mtsr sr,r7isyncmfsr r8,srmfsr r9,sr...mfsr r15,srisync

Example 3: Using the rfi instruction—When several registers are updated with nointervening loads or stores with MSR[DT] = 1 or instruction fetches with MSR[IT] = 1,context-synchronization between updates is unnecessary. When an exception is taken, theprocessor is synchronized automatically. In this example, a list of segment registers isupdated with several mtsr instructions followed by a single context-synchronizingoperation.

Because this example modifies all 16 segment registers (and therefore, affects the segmentregister(s) that control instruction fetching, this particular sequence must be executed indirect address translation mode (MSR[IT] = 0). Therefore, no synchronization is requiredbefore the segment registers are loaded. Even if the segment register(s) that controlinstruction fetching is not to be reloaded, the sequence can be executed with instructionaddress translation enabled (MSR[IT] = 1) and no additional synchronization before thesegment register instructions.

In this example the rfi instruction provides the needed synchronization after all 16 segmentregisters are loaded and before translated loads and stores are executed.

mtsr sr,r0mtsr sr,r1...mtsr sr,r15<load rest of machine state>rfi

2.3.3.1.2 Other Synchronization Requirements by RegisterThis section describes additional synchronization requirements.

SDR1 and MSR—The SDR1 register should be modified only when MSR[IT] = 0. Inaddition, the MSR[IT] bit should be altered only by software that has an address mappingsuch that logical addresses map directly to physical addresses.

Segment Registers—The only fields that should be modified in a segment registercurrently used for instruction fetching are the Ks and Kp bits. Note that any time segmentregisters are updated, the changes are guaranteed to take effect (including changes of theKx bits) only after a context-synchronizing operation has occurred.

BAT Registers—The only fields that should be modified in a BAT register currently usedfor instruction fetching are the Ks, Kp and the V (valid) bits. In the case of modifying theV bit for a BAT register currently used for instruction accesses, the instructionsimmediately following the mtspr for the BAT register must also be mapped by the pageaddress translation mechanism with the same logical to physical address mapping (oralternately, the instructions must be duplicated in the newly mapped space). Note that anytime the BAT registers are updated, the changes are guaranteed to take affect (includingchanges of the Kx bits) only after a context-synchronizing operation has completed.

In order to make a BAT register pair valid such that the BAT array entry then translates thecurrent instruction stream, the following sequence should be used if fields in both the upperand lower BAT registers are to be modified (for instruction address translation):

1. Clear the V bit in the BAT register pair.2. Initialize the other fields in the BAT register pair appropriately.3. Set the V bit in the BAT register pair.4. Perform a context-synchronizing operation.

2.3.3.2 DAE/Source Instruction Service Register (DSISR)The 32-bit DSISR, shown in Figure 2-14, identifies the cause of data access and alignmentexceptions.

Figure 2-14. DAE/Source Instruction Service Register (DSISR)

DSISR

0 31


For information about bit settings, see Section 5.4.3, “Data Access Exception (x'00300'),”and Section 5.4.6, “Alignment Exception (x'00600').”

The DSISR is cleared after a hard reset.

2.3.3.3 Data Address Register (DAR)The DAR is a 32-bit register as shown in Figure 2-15.

Figure 2-15. Data Address Register (DAR)

The effective address generated by a memory access instruction is placed in the DAR if theaccess causes an exception (I/O controller interface error, or alignment exception). Forinformation, see Section 5.4.3, “Data Access Exception (x'00300'),” and Section 5.4.6,“Alignment Exception (x'00600').”

2.3.3.4 Real-Time Clock (RTC) Registers (Supervisor-Level)The RTC registers can be written to only by supervisor-level software. Different SPRnumbers must be used with the mtspr instruction. The SPR number for the RTCU registeris 20; the SPR number for RTCL is 21.

The PowerPC architecture defines the DEC register as supervisor-only access for bothreads and writes. SPR22 is used for both reads and writes. The POWER architectureprovides user-level read access using SPR6. To ensure compatibility with subsequentPowerPC processors, the mfspr instruction should not be used in user-level.

2.3.3.5 Decrementer (DEC) RegisterThe DEC, shown in Figure 2-16, is a 32-bit decrementing counter that provides amechanism for causing a decrementer exception after a programmable delay. On the 601,the DEC is driven by the same frequency as the RTC (7.8125 MHz). On other PowerPCprocessors, the DEC frequency is based on a subdivision of the processor clock. The DECis cleared by hard reset.

Figure 2-16. Decrementer Register (DEC)

DAR

0 31

DEC

0 31


2.3.3.5.1 Decrementer OperationThe DEC counts down, causing an exception (unless masked by MSR[EE]) when it passesthrough zero. The DEC satisfies the following requirements:

• The operation of the RTC and the DEC are coherent (that is, the counters are driven by the same fundamental time base).

• Loading a GPR from the DEC has no effect on the DEC.

• Storing a GPR to the DEC replaces the value in the DEC with the value in the GPR.

• Whenever bit 0 of the DEC changes from 0 to 1, a decrementer exception request is signaled. (The exception breaks the pipeline in such a way that instructions in the execute state (except for instructions that have been dispatched ahead of undispatched integer instructions) complete execution, and instructions in decode stage remain undecoded until the exception handler returns control to the interrupted program). Multiple DEC exception requests may be received before the first exception occurs; however, any additional requests are canceled when the exception occurs for the first request.

• If the DEC is altered by software and the content of bit 0 is changed from 0 to 1, an exception request is signaled.

Note that the seven low-order bits are not implemented. Bit 24 changes every 128 nS. TheRTC input is sampled using the CPU clock. Therefore, if the CPU clock is less than twicethe RTC frequency, real-time clock (and decrementer) sampling and incrementing errorswill occur. Therefore, in systems that change the CPU clock frequency dynamically beyondthis limit, a method of saving and restoring the real-time clock register values via externalmeans is required.

2.3.3.5.2 Writing and Reading the DECThe content of the DEC can be read or written using the mfspr and mtspr instructions, bothof which are supervisor-level when they refer to the DEC. However, the 601 also allows thereading of the DEC in user mode (for POWER compatibility) via the SPR6 register. Notethat this functionality will not be supported in subsequent PowerPC processors. Using asimplified mnemonic for the mtspr instruction, the DEC may be written from GPR rA withthe following:

mtspr(dec) rA

If the execution of this instruction causes bit 0 of the DEC to change from 0 to 1, anexception request is signaled. The DEC may be read into GPR rA with the followingsequence:

mfspr(dec) rA


2.3.3.6 Table Search Description Register 1 (SDR1)The table search description register 1 (SDR1) is shown in Figure 2-17.

Figure 2-17. Table Search Description Register 1 (SDR1)

The bits of the SDR1 are described in Table 2-17.

The HTABORG field in SDR1 contains the high-order 16 bits of the 32-bit physical addressof the page table. Therefore, the page table is constrained to lie on a 216 byte (64 Kbytes)boundary at a minimum. At least 10 bits from the hash function are used to index into thepage table. The page table must consist of at least 64 Kbytes 210 PTEGs of 64 bytes each.

The page table can be any size 2n where 16 ≤ n ≤ 25. As the table size is increased, morebits are used from the hash to index into the table and the value in HTABORG must havemore of its low-order bits equal to 0. The HTABMASK field in SDR1 contains a mask valuethat determines how many bits from the hash are used in the page table index. This maskmust be of the form b'00...011...1'; that is, a string of 0 bits followed by a string of 1bits.The 1 bits determine how many additional bits (at least 10) from the hash are used in theindex; HTABORG must have this same number of low-order bits equal to 0. SeeFigure 6-21.

The number of low-order 0 bits in HTABORG must be at least the number of 1 bits inHTABMASK so that the final 32-bit physical address can be formed by logically ORing thevarious components.

2.3.3.7 Machine Status Save/Restore Register 0 (SRR0)The machine status save/restore register 0 (SRR0) is a 32-bit register the 601 uses to savemachine status on exceptions and restore machine status when an rfi instruction isexecuted. It also holds the EA for the instruction that follows the System Call (sc)instruction. The SRR0 is shown in Figure 2-18.

Table 2-17. Table Search Description Register 1 (SDR1) Bit Settings


0–15 HTABORG The high-order 16 bits of the 32-bit physical address of the page table


23–31 HTABMASK Mask for page table address

HTABORG 0000000 HTABMASK

0 15 16 222331

Reserved


Figure 2-18. Save/Restore Register 0 (SRR0)

When an exception occurs, SRR0 is set to point to an instruction such that all priorinstructions have completed execution and no subsequent instruction has begun execution.The instruction addressed by SRR0 may not have completed execution, depending on theexception type. SRR0 addresses either the instruction causing the exception or theimmediately following instruction. The instruction addressed can be determined from theexception type and status bits.

The SRR0 is cleared by hard reset.

For information on how specific exceptions affect SRR0, refer to the descriptions ofindividual exceptions in Chapter 5, “Exceptions.”

2.3.3.8 Machine Status Save/Restore Register 1 (SRR1)The SRR1 is a 32-bit register used to save machine status on exceptions and to restoremachine status when an rfi instruction is executed. The SRR1 is shown in Figure 2-19.

Figure 2-19. Machine Status Save/Restore Register 1 (SRR1)

In general, when an exception occurs, bits 0–15 of SRR1 are loaded with exception-specificinformation and bits 16–31 of MSR are placed into bits 16–31 of SRR1.

The SRR1 is cleared by hard reset.

For information on how specific exceptions affect SRR1, refer to the individual exceptionsin Chapter 5, “Exceptions.”

2.3.3.9 General SPRs (SPRG0–SPRG3)SPRG0 through SPRG3 are 32-bit registers provided for general operating system use, suchas performing a fast state save or for supporting multiprocessor implementations. SPRG0–SPRG3 are shown in Figure 2-20.

SRR0

031

0 151631

SRR1


Figure 2-20. General SPRs (SPRG0–SPRG3)

2.3.3.10 External Access Register (EAR)The EAR is a 32-bit SPR that controls access to the external control facility and identifiesthe target device for external control operations. The external control facility provides ameans for user-level instructions to communicate with special external devices. The EARis shown in Figure 2-21.

Figure 2-21. External Access Register (EAR)

This register is provided to support the External Control Input Word Indexed (eciwx) andExternal Control Output Word Indexed (ecowx) instructions, which are described inChapter 10, “Instruction Set.” Although access to the EAR is privileged, the operatingsystem can determine which tasks are allowed to issue external access instructions andwhen they are allowed to do so. The bit settings for the EAR are described in Table 2-18.Interpretation of the physical address transmitted by the eciwx and ecowx instructions andthe 32-bit value transmitted by the ecowx instruction is not prescribed by the PowerPCarchitecture but is determined by the target device.

For example, if the external control facility is used to support a graphics adapter, the ecowxinstruction could be used to send the translated physical address of a buffer containinggraphics data to the graphics device. The ecowx instruction could be used to load statusinformation from the graphics adapter.

SPRG0

SPRG1

SPRG2

SPRG3

0 31

0 1 272831

E 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 RID

Reserved


This register can also be accessed by using the mtspr and mfspr instructions using thevalue 282, b'01000 11010'. Synchronization requirements for the EAR are shown inTable 2-15 and Table 2-16.

The EAR is cleared by hard reset.

2.3.3.11 Processor Version Register (PVR)The PVR is a 32-bit, read-only register that identifies the version and revision level of thePowerPC processor (see Figure 2-22). The PVR cannot be modified. The contents of thePVR can be copied to a GPR by the mfspr instruction. Read access to the PVR is availablein supervisor mode only; write access is not provided.

Figure 2-22. Processor Version Register (PVR)

The PVR consists of two 16-bit fields:

• Version (bits 0–15)—A 16-bit number that identifies the version of the processor and of the PowerPC architecture. The processor version number is x'0001' for the 601.

• Revision (bits 16–31)—A 16-bit number that distinguishes between various releases of a particular version, (that is, an engineering change level). The value of the revision portion of the PVR is implementation-specific. The processor revision level is changed for each revision of the device. Contact your support center for specific information about the revision of the processor you are using.

Table 2-18. External Access Register (EAR) Bit Settings

Bit Name Description

0 E Enable bit 1 Enabled0 DisabledIf this bit is set, the eciwx and ecowx instructions can perform the specified external operation. If the bit is cleared, an eciwx or ecowx instruction causes a data access exception.

1–27 — Reserved

28–31 RID Resource ID. The RID is formed by concatenating TBST||TSIZ0–TSIZ2. Note that in other PowerPC implementations, this field may use bits 26–31.

0 151631

Version Revision


2.3.3.12 BAT RegistersThe block address translation mechanism in the 601 is implemented as a software-controlled array (BAT array). The BAT array maintains the address translation informationfor four blocks of memory. The BAT array in the 601 is maintained by the system softwareand is implemented as a set of eight special-purpose registers (SPRs). Each block is definedby a pair of SPRs called upper and lower BAT registers.

The 601 includes eight block-address translation (BAT) registers, grouped into four registerpairs: (IBAT0U–IBAT3U and IBAT0L–IBAT3L). Note that the PowerPC architectureidentifies these SPRs as IBATs, in the 601, they are implemented as unified BATs. SeeFigure 2-1 for a list of the SPR numbers for the BAT registers. Note that other PowerPCimplementations may have two sets of four pairs of BAT registers. The additional eightregisters are data BATs, or DBATs, (DBAT0U–DBAT3U and DBAT0L–DBAT3L). TheseBATs use the eight SPR numbers subsequent to those used by the IBATs (536–543).

Note that the implementation of the bit fields in the BATs are different from the otherPowerPC implementations. Figure 2-23 and Figure 2-24 show the format of the upper andlower BAT registers for the 601.

Figure 2-23. Upper BAT Register

Figure 2-24. Lower BAT Register

Table 2-19 describes the bits in the BAT registers.

BLPI 0 0 0 0 0 0 0 0 0 0 WIM Ks Ku PP

0 14 15 24 25 27 28 29 30 31

Reserved

PBN 0 0 0 0 0 0 0 0 0 0 V BSM

Reserved

0 14 15 24 25 26 31


Table 2-20 lists the BAT area lengths encoded in by BAT[BSM].

Table 2-19. BAT Registers

Register Bits Name Description

Upper BAT Registers

0–14 BLPI Block logical page index. This field is compared with bits 0–14 of the logical address to determine if there is a hit in that BAT array entry.


25–27 WIM Memory/cache access mode bitsW Write-throughI Caching-inhibitedM Memory coherenceFor detailed information about the WIM bits, see Section 6.3, “Memory/Cache Access Modes.”

28 Ks Supervisor mode key. This bit interacts with MSR[PR] and the PP field to determine the protection for the block. For more information, see Section 6.4, “General Memory Protection Mechanism.”

29 Ku User mode key. This bit also interacts with MSR[PR] and the PP field to determine the protection for the block. For more information, see Section 6.4, “General Memory Protection Mechanism.”

30–31 PP Protection bits for block. This field interacts with MSR[PR] and the Ks or Ku to determine the protection for the block as described in Section 6.4, “General Memory Protection Mechanism.”

Lower BAT Registers

0–14 PBN Physical block number. This field is used in conjunction with the BSM field to generate bits 0–14 of the physical address of the block.


25 V BAT register pair (BAT array entry) is valid if V = 1.

26–31 BSM Block size mask (0...5). BSM is a mask that encodes the size of the block. Values for this field are listed in Table 2-20.

Table 2-20. BAT Area Lengths

BAT Area Length

BSM Encoding

128 Kbytes 00 0000

256 Kbytes 00 0001

512 Kbytes 00 0011

1 Mbyte 00 0111

2 Mbytes 00 1111

4 Mbytes 01 1111

8 Mbytes 11 1111


Only the values shown in Table 2-20 are valid for the BSM field. The rightmost bit of BSMis aligned with bit 14 of the logical address. A logical address is determined to be within aBAT area if the logical address matches the value in the BLPI field.

The boundary between the string of zeros and the string of ones in BSM determines the bitsof logical address that participate in the comparison with BLPI. Bits in the logical addresscorresponding to ones in BSM are cleared for this comparison.

Bits in the logical address corresponding to ones in the BSM field, concatenated with the17 bits of the logical address to the right (more significant bits) of BSM, form the offsetwithin the BAT area. This is described in detail in Chapter 6, “Memory Management Unit.”

The value loaded into BSM determines both the length of the BAT area and the alignmentof the area in both logical and physical address space. The values loaded into BLPI andPBN must have at least as many low-order zeros as there are ones in BSM.

The BAT registers are cleared by hard reset. Use of BAT registers is described in Chapter 6,“Memory Management Unit.”

2.3.3.13 601 Implementation-Specific HID RegistersPowerPC processors may have implementation-specific SPRs, referred to as HID registers.Additional SPR encodings allow access to the implementation-dependent registers withinthe 601. The SPR encodings for the 601’s HID registers are described in Table 2-21. Notethat these encodings use split-field notation; that is, the order of two 5-bit components ofthe 10-bit encoding is reversed.

For additional information about the mtspr and mfspr instructions, refer to Chapter 10,“Instruction Set.”

2.3.3.13.1 Checkstop Sources and Enables Register—HID0The checkstop sources and enables register (HID0), shown in Figure 2-25, is a supervisor-level register that defines enable and monitor bits for each of the checkstop sources in the601. The SPR number for HID0 is 1008.

Table 2-21. Additional SPR Encodings

SPR NumberSPR Encoding

SPR(5–9)|SPR(0–4)Register Name Access

1008 11111 10000 Checkstop sources and enables register (HID0) Supervisor

1009 11111 10001 601 debug modes register (HID1) Supervisor

1010 11111 10010 IABR (HID2) Supervisor

1013 11111 10101 DABR (HID5) Supervisor

1023 11111 11111 PIR (HID15) Supervisor


Figure 2-25. Checkstop Sources and Enables Register (HID0)

Table 2-22 defines the bits in HID0. The enable bits (bits 15–31) can be used to maskindividual checkstop sources, although these are provided primarily to mask off any falsereports of such conditions for debugging purposes. Bit 0 (HID0[CE]) is a master checkstopenable; if it is cleared, all checkstop conditions are disabled; if it is set, individualconditions can be enabled separately. HID0[EM] (bit 16) enables and disables machinecheck checkstops; clearing this bit masks machine check checkstop conditions that occurwhen MSR[ME] is cleared. Bits 1–11 are the checkstop source bits, and can be used todetermine the specific cause of a checkstop condition.

Table 2-22. Checkstop Sources and Enables Register (HID0) Definition


0 CE Master checkstop enable. Enabled if set. If this bit is cleared and the TEA signal is asserted, a machine check exception is taken, regardless of the setting of MSR[ME].

1 S Microcode checkstop detected if set.

2 M Double machine check detected if set.

3 TD Multiple TLB hit checkstop if set.

4 CD Multiple cache hit checkstop if set.

5 SH Sequencer time out checkstop if set.

6 DT Dispatch time out checkstop if set.

7 BA Bus address parity error if set.

8 BD Bus data parity error if set.

9 CP Cache parity error if set.

10 IU Invalid microcode instruction if set.

11 PP I/O controller interface access protocol error if set.


15 ES Enable microcode checkstop. Enabled by hard reset. Enabled if set.

Reserved

HID0EDT

ESH

ECD

ETD

EBD

ECP

EIU

EPP

EBA

DRF

DRL

PAR

EMC EHP

CE S M TD CD SH DT BA BD CP IU PP 0 0 0 ES EM LM

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31


All enable bits except 15 and 24 are disabled at start up. The operating system should enablethese checkstop conditions before the power-on reset sequence is complete.

Checkstop enable bits can be set or cleared without restriction. If a checkstop source bit isset, it can be cleared; however, if the corresponding checkstop condition is still present onthe next clock, the bit will be set again. A checkstop source bit can only be set when thecorresponding checkstop condition occurs and the checkstop enable bit is set; it cannot beset via an mtspr instruction. That is, you cannot manually cause a checkstop.

16 EM Enable machine check checkstop. Disabled by hard reset. Enabled if set. If this bit is cleared and the TEA signal is asserted, a machine check exception is taken, regardless of the setting of MSR[ME].

17 ETD Enable TLB checkstop. Disabled by hard reset. Enabled if set.

18 ECD Enable cache checkstop. Disabled by hard reset. Enabled if set.

19 ESH Enable sequencer time out checkstop. Disabled by hard reset. Enabled if set.

20 EDT Enable dispatch time out checkstop. Disabled by hard reset. Enabled if set.

21 EBA Enable bus address parity checkstop. Disabled by hard reset. Enabled if set.

22 EBD Enable bus data parity checkstop. Disabled by hard reset. Enabled if set.

23 ECP Enable cache parity checkstop. Disabled by hard reset. Enabled if set.

24 EIU Enable for invalid ucode instruction checkstop. Enabled by hard reset. Enabled if set.

25 EPP Enable for I/O controller interface access protocol checkstop. Disabled by hard reset. Enabled if set.

26 DRF 0 Optional reload of alternate sector on instruction fetch miss is enabled.1 Optional reload of alternate sector on instruction fetch miss is disabled.

27 DRL 0 Optional reload of alternate sector on load/store miss is enabled.1 Optional reload of alternate sector on load/store miss is disabled.

28 LM 0 Big-endian mode is enabled.1 Little-endian mode is enabled.For more information about byte ordering, see Section 2.4.3, “Byte and Bit Ordering.” Note that in the PowerPC architecture, the selection between big- and little-endian mode is controlled by two bits in the MSR.

29 PAR 0 Precharge of the ARTRY and SHD signals is enabled.1 Precharge of the ARTRY and SHD signals is disabled.

30 EMC 0 No error detected in main cache during array initialization.1 Error detected in main cache during array initialization.

31 EHP 0 The HP_SNP_REQ signal is disabled. Use of the WRS queue position is restricted to a snoop hit that occurs when a read is pending. That is, its address tenure is complete but the data tenure has not begun.

1 The HP_SNP_REQ signal is enabled. Use of the WRS queue position is restricted to a snoop hit on an address tenure that had HP_SNP_REQ asserted.

Table 2-22. Checkstop Sources and Enables Register (HID0) Definition (Continued)



The HID0 register is set to x'80010080' by the hard reset operation. However, the state ofthe EMC bit depends on the results of the power-on diagnostics for the main cache array.This bit is set if the cache fails the built-in self test during the power-on sequence.

2.3.3.13.2 601 Debug Modes Register—HID1The 601 debug modes register (HID1) is a supervisor-level register that defines enable bitsfor the various debug modes supported by the 601; see Figure 2-26. The SPR number forHID1 is 1009.

Figure 2-26. PowerPC 601 Microprocessor Debug Modes Register

Table 2-23 shows bit settings for the HID1 register. Note that if both the single instructionstep option is specified for the M field (b'100') and the trap to run mode exception option isspecified in the RM field (b'10'), the processor iterates in an infinite loop.

Table 2-23. HID1 Register Definition


0 — Reserved

1–3 M 601 run modes000 Normal run mode001 Undefined. Do not use.010 Limited instruction address compare.011 Undefined. Do not use.100 Single instruction step101 Undefined. Do not use.110 Full instruction address compare111 Full branch target address compare

4–7 — Reserved

8–9 RM Response to address compare or single step00 Hard stop (Stop L1 clocks).01 Soft stop (Wait for system activity to quiesce).10 Trap to run mode exception (address vector x'02000'), with the base address

indicated in by the setting of MSR[IP]. This mode is valid for address comparisons and may produce unpredictable results when used with HID single-instruction step mode.

11 Reserved. Do not use.Note that when HID1[8–9] = 10, the trap address of x'2000' has a base address indicated by the setting of MSR[IP]. This mode is valid for address comparisons and may produce unpredictable results when used with HID single-step mode.

10–16 — Reserved. Do not use.

17 TL When set, this bit disables the broadcast of the tlbie instruction.

18–31 — Reserved. Do not use.

HID1

0 1 3 4 7 8 9 10 16 17 18 31

Reserved

0 M 0 0 0 0 RM 0 0 0 0 0 0 0 TL 0 0 0 0 0 0 0 0 0 0 0 0 0 0


Note that when HID1[8–9] = 10, the trap address of x'2000' has a base address indicated bythe setting of MSR[IP]. This mode is valid for address comparisons and may produceunpredictable results when used with the HID single-step mode.

The HID1 register is cleared by a hard reset.

2.3.3.13.3 Instruction Address Breakpoint Register (IABR)—HID2The instruction address breakpoint register (IABR), is also HID2. The IABR, shown inFigure 2-27, is a supervisor-level register defined to hold an effective address that is usedto compare with either the logical address of the instruction in the decode phase of thepipeline or the EA of a branch target depending on the mode specified by the value ofHID1[M]. The results of the comparison are used differently depending on the debug modeused.

Figure 2-27. Instruction Address Breakpoint Register (IABR)—HID2

Table 2-24 lists HID2 register definitions. The HID2 register is cleared by the hard resetoperation.

The SPR number for HID2 is 1010.

2.3.3.13.4 Data Address Breakpoint Register (DABR)—HID5The data address breakpoint register (DABR) (HID5), as shown in Figure 2-28, is designedto hold an effective address that is used to compare with the effective address generated bya load or store operation. The results of the comparison are used to cause a data accessexception when the appropriate 601 debug mode bits are set (as described inSection 2.3.3.13.2, “601 Debug Modes Register—HID1”).

Figure 2-28. Data Address Breakpoint Register (DABR)



0–29 CEA Comparison effective address

30–31 — Reserved. This field should be set to zero.

HID2

0 29 30 31

CEA 0 0

Reserved

HID5

0 28 29 30 31

Reserved

DAB 0 SA


Table 2-25 describes bit settings in HID5. The HID5 register is cleared by the hard resetoperation.

The SPR number for HID5 is 1013.

If the DABR feature is enabled, operations that hit against a properly enabled DABR causea data access exception. For this type of data access exception (DAE), bit 9 of the DSISRis set and the data address register (DAR) contains the EA that caused the DABR match. Ifthe access crossed a double-word boundary, the DAR contains the EA of the access fromthe first double word (even if the DABR match was on the second double word). For moreinformation about data access exceptions, see Section 5.4.3, “Data Access Exception(x'00300').”

Table 2-26 describes how each instruction type interacts with the DABR feature.



0–28 DAB Data address breakpoint (EA). This field is set to the double-word EA to compare with enabled load or store EAs.

29 — Reserved, although on an mfspr (DABR), the value returned is the value last written.

30–31 SA Memory access types:00 Breakpoints disabled01 Breakpoints load accesses only10 Breakpoints store accesses only11 Breakpoints both load and store accesses

Table 2-26. DABR Results

Operation Description

Load instructions

If any part of the load access touches the double word specified in the DABR, and the appropriate enable bit is set, then the DAE occurs. In this case, the memory read operation is inhibited and register rD is not updated. If the operation is a load with update, the update to register rA is also inhibited.

Store instructions

If any part of the store access touches the double word specified in the DABR and the appropriate enable bit is set, the DAE occurs and the memory access is inhibited.If the operation is a store with update, then the update to register rA is also inhibited.

If the operation is a Store Conditional instruction and the reservation bit is not set at the time of the DABR compare (at the end of execution as soon as the EA is calculated), the DAE is not taken.

Load and store string and multiple instructions

These instructions are sequenced one register (one word) at a time through the IU for EA calculation. Each access is checked against the DABR as it is presented to the ATU. If a match occurs, the instruction is aborted and a DAE is taken.If the initial EA for the string or multiple is not word-aligned, some individual accesses may cross a double-word boundary. If either double word hits in the DABR, the access is inhibited and the DAE occurs.


2.3.3.13.5 Processor Identification Register (PIR)—HID15The PIR register, shown in Figure 2-29, is a 32-bit, supervisor-level register that holds the4-bit processor identification tag (PID). This tag is useful for processor differentiation inmultiprocessor system designs. The tag is also used to identify the sender and receiver tagfor I/O controller interface operations. For more information, see Section 9.6, “Memory-vs. I/O-Mapped I/O Operations.” The PIR can be accessed by the mfspr instruction byusing the SPR number 1023, as follows:

syncmfspr rD,1023sync

The PIR is cleared by the hard reset operation.

Figure 2-29. Processor Identification Register (PIR)

2.4 Operand ConventionsThis section describes the conventions used for storing values in registers and memory.

2.4.1 Data Organization in Memory and Data TransfersBytes in memory are numbered consecutively starting with 0. Each number is the addressof the corresponding byte.

Memory operands may be bytes, half words, words, or double words, or, for the load/storemultiple and move assist instructions, a sequence of bytes or words. The address of amemory operand is the address of its first byte (that is, of its lowest-numbered byte).Operand length is implicit for each instruction.

lscbx instruction

This instruction is not supported by the DABR Feature. No DAE occurs, even if the EA matches.

Cache control instructions

These instructions are not supported by the DABR Feature. No DAE occurs even if the EA matches.

Table 2-26. DABR Results (Continued)


PIR

0 27 28 31

Reserved

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PID


2.4.1.1 Alignment and Misaligned AccessesThe operand of a single-register memory access instruction has a natural alignmentboundary equal to the operand length. In other words, the “natural” address of an operandis an integral multiple of the operand length. A memory operand is said to be aligned if itis aligned at its natural boundary; otherwise it is misaligned.

Operands for single-register memory access instructions have the characteristics shown inTable 2-27. (Although not permitted as memory operands, quad words are shown becausequad-word alignment is desirable for certain memory operands).

The concept of alignment is also applied more generally to data in memory. For example,12 bytes of data are said to be word-aligned if its address is a multiple of four.

Some instructions require their memory operands to have certain alignment. In addition,alignment may affect performance. For single-register memory access instructions, the bestperformance is obtained when memory operands are aligned. Additional effects of dataplacement on performance are described in Chapter 7, “Instruction Timing.”

Instructions are four bytes long and word-aligned.

2.4.2 Effect of Operand Placement on PerformanceThe placement (location and alignment) of operands in memory affect the relativeperformance of memory accesses. Best performance is guaranteed if memory operands arealigned on natural boundaries. To obtain the best performance across the widest range ofPowerPC processor implementations, the programmer should assume the performancemodel described in Figure 2-30 with respect to the placement of memory operands.

Table 2-27. Memory Operands

Operand LengthAddr(28–31)

if aligned

Byte 8 bits xxxx

Half word 2 bytes xxx0

Word 4 bytes xx00

Double word 8 bytes x000

Note: An “x” in an address bit position indicates that the bit can be 0 or 1 independent of the state of other bits in the address.

Figure 2-30. Performance Effects of Memory Operand Placement

The performance of accesses varies depending on the following:

• Operand size• Operand alignment• Crossing a cache block (sector) boundary• Crossing a page boundary• Crossing a BAT boundary • Crossing a segment boundary

The load/store multiple instructions are defined by the PowerPC architecture to operateonly on aligned operands, although the 601 supports unaligned operands. The move assistinstructions have no alignment requirements.

2.4.2.1 Instruction RestartIf a memory access crosses a page or segment boundary, a number of conditions could abortthe execution of the instruction after part of the access has been performed. For example,this may occur when a program attempts to access a page it has not previously accessed orwhen the processor must check for a possible change in memory attributes when an accesscrosses a page boundary. When this occurs, the operating system may restart theinstruction. If the instruction is restarted, some bytes at that word address may be loadedfrom or stored to the target location a second time.

Operand Boundary Crossing

Size Byte Alignment None Cache Line Page BAT/Segment

Integer

4 Byte 4<4

OptimalGood

—Good

—Poor

—Poor

2 Byte 2<2

OptimalGood

—Good

—Poor

—Poor

1 Byte 1 Optimal — — —

Imw, stmw 4 Good Good Good Poor

String Good Good Poor Poor

Float

8 Byte 84<4

OptimalGoodPoor

—GoodPoor

—PoorPoor

—PoorPoor

4 Byte 4<4

optimalPoor

—Poor

—Poor

—Poor


The following rules apply to memory accesses with regard to restarting the instruction:

• Aligned accesses—A single-register instruction that accesses an aligned operand is not partially executed.

• Misaligned accesses—A single-register instruction that accesses a misaligned operand may be partially executed if the access crosses a page boundary and a data access exception occurs on the second page.

• Load/store multiple, move assist—These instructions may be partially executed if, in accessing the locations specified by the instruction, a page boundary is crossed and a data access exception occurs on the second page.

2.4.2.2 AtomicityAll aligned accesses are atomic. Instructions causing multiple accesses (for example,load/store multiple and move assist instructions) are not atomic.

2.4.2.3 Access OrderThe ordering of memory accesses is not guaranteed unless the programmer insertsappropriate ordering instructions, even if the accesses are generated by a single instruction.Misaligned accesses, load/store multiple instructions, and move assist instructions have noimplicit ordering characteristics. For example, processor A may store a word operand on anodd half-word boundary. It may appear to processor A that the store completed atomically.Processor or other mechanism B, executing a load from the same location, may get a resultthat is a combination of the value of the first half word that existed prior to the store byprocessor A and the value of the second half word stored by processor A.

2.4.3 Byte and Bit OrderingThe PowerPC architecture supports both big- and little-endian byte ordering. The defaultbyte- and bit ordering is big-endian, as shown in Figure 2-31. Byte ordering can be set tolittle-endian by setting the LM bit in the HID0 register. Note that the mechanism forselecting between byte orderings is different in the 601 than it is in the PowerPCarchitecture. The PowerPC architecture provides two enable bits in the MSR that allowindependent control for user- and supervisor-level software.

Figure 2-31. Big-Endian Byte and Bit Ordering

Byte 0 Byte 1 Byte N (max)

Big-Endian Byte Ordering

0 1 2 n

Big-Endian Bit Ordering

MSB

msb bit n (max)

If individual data items were indivisible, the concept of byte ordering would beunnecessary. Order of bits or groups of bits within the smallest addressable unit of memoryis irrelevant, because nothing can be observed about such order. Order matters only whenscalars, which the processor and programmer regard as indivisible quantities, can be madeup of more than one addressable units of memory.

For a device in which the smallest addressable unit is the 64-bit double word, there is noquestion of the order of bytes within double words. All transfers of individual scalarsbetween registers and memory are of double words. A subset of the 64 bit scalar (forexample, a byte) is not addressable in memory. As a result, to access any subset of the bitsof a scalar, the entire 64-bit scalar must be accessed, and when a memory location is read,the 64-bit value returned is the 64-bit value last written to that location.

For PowerPC processors, the smallest addressable memory unit is the byte (8 bits), andscalars are composed of one or more sequential bytes. When a 32-bit scalar is moved froma register to memory, it occupies four consecutive byte addresses, and a decision must bemade regarding the order of these bytes in these four addresses.

The choice of byte ordering is arbitrary. Although there are 24 ways (4!) to specify theordering of four bytes within a word, illustrated as all the permutations of ordering of fourelements—ABCD, ABDC, ACBD, ACDB…DBCA, DCAB, DCBA—where the bytes areordered lowest address to highest address, only two of these orderings are practical—ABCD (big-endian) and DCBA (little-endian).

The following example shows how the byte ordering is changed from big- to little-endianmode by setting HID0[28] (n refers to the address):

<msr[ee] is off (zero) >n sync |Instructions n+4 sync |accessed in n+8 sync |big-endian moden+c mtspr hid0(28)|n+10 sync |Instructions n+14 sync |accessed inn+18 sync |little-endian mode

The same instruction sequence can be used to go from little- to big-endian mode by clearingHID0[28].

2.4.3.1 Little-Endian Address ManipulationIn little-endian operations, the three least significant bits of an address are manipulated toprovide the appearance of a little-endian memory to the program for aligned loads andstores, as follows:

New_addr(29) <- EA(29) xor (word | half | byte)

New_addr(30) <- EA(30) xor (half | byte)

New_addr(31) <- EA(31) xor (byte)


The physical address used for an access generated by a load or a store to an operand that isless than a doubleword is modified as indicated. Addresses for aligned double-wordaccesses and cache control operations are not modified since the endian mode has no effecton aligned accesses larger than one word.

On a data access exception, the DAR contains the effective address (EA) generated by thememory access instruction (that is, the address before modification) regardless of theendian mode selected. SRR0 contains the EA of an instruction as described in Chapter 5,“Exceptions,” (that is, the address before modification).

If the processor is in little-endian mode, the address is modified; if the processor is in big-endian mode, the address is unmodified.

The T bit does not affect address manipulation or the detection of alignment exceptionconditions. Therefore I/O interface controller operations and BUID x'07F' segments receivethe modified address. The ecowx and eciwx instructions are treated as no-ops if the T bit isset regardless of whether the 601 is in little-endian mode.

Because the 601 defines a cache block as 32 bytes, bits 27–31 of the address are not usedfor snooping. The program address should be specified, when an address is loaded intoHID2 or HID5. That is, if the processor is in little-endian mode, a little-endian addressshould be specified, and if the processor is in big-endian mode, a big-endian address shouldbe specified.

2.4.3.2 Little-Endian Alignment ExceptionsAdditional alignment exception conditions can occur when the processor is in little-endianmode.

Load/store multiple operands (regardless of EA)

• lmw stmw

• lscbxx stswi

• lswi stswx

• lswx

The new alignment exception conditions are prioritized with other alignment exceptionsahead of data access exceptions. See Section 2.4.5.2, “Misaligned Scalars” for moreinformation.

2.4.3.3 Little-Endian Instruction FetchingIn little-endian mode, instructions are fetched in big-endian order; however, the instructionsare swapped within a double word before being passed to the instruction queue, thus puttingthe instructions in little-endian order for execution. On exceptions, the 601 reports thecorrect effective address (as defined by the programming model or computed by a storageaccess instruction) regardless of the endian mode selected.


2.4.3.4 Big-Endian Byte OrderingBig-endian ordering (ABCD) assigns the lowest address to the highest-order eight bits ofthe scalar. This is called big-endian because the big end of the scalar, considered as a binarynumber, comes first in memory.

2.4.3.5 Little-Endian Byte OrderingLittle-endian byte ordering (DCBA) assigns the lowest address to the lowest-order(rightmost) 8 bits of the scalar. The little end of the scalar, considered as a binary number,comes first in memory.

2.4.4 Structure Mapping ExamplesThe following C programming example contains an assortment of scalars and one characterstring. The value presumed to be in each structure element is shown in hexadecimal in thecomments and are used below to show how the bytes that comprise each structure elementare mapped into memory.

struct int a; /* x'11121314' word */double b; /* x'2122232425262728' doubleword */char * c; /* x'31323334' word */char d[7]; /* 'A','‘B','C','D','E','F','G' array of bytes */short e; /* x'5152' halfword */int f; /* x'61626364' word */

s;

2.4.4.1 Big-Endian MappingThe big-endian mapping of a structure S is shown in Figure 2-32. Addresses are shown inhexadecimal at the left of each double word and in small figures below each byte. Thecontent of each byte, as shown in the preceding C programming example, is shown inhexadecimal as characters for the elements of the string.

Figure 2-32. Big-Endian Mapping of Structure S

001100

1201

1302

1403

(*)04

(*)05

(*)06

(*)07

082108

2209

230A

240B

250C

260D

270E

280F

103110

3211

3312

3413

‘A’14

‘B’15

‘C’16

‘D’17

18‘E’18

‘F’19

‘G’1A

(*)1B

511C

521D

(*)1E

(*)1F

206120

6221

6322

6423


Note that the C structure mapping introduces padding (skipped bytes indicated by asterisks“(*)” in Figure 2-32) in the map in order to align the scalars on their proper boundaries—4 bytes between a and b, one byte between d and e, and two bytes between e and f. Bothbig- and little-endian mappings use the same amount of padding.

2.4.4.2 Little-Endian MappingFigure 2-33 shows the structure, S, using little-endian mapping. Double words are laid outfrom right to left.

Figure 2-33. Little-Endian Mapping of Structure S

2.4.5 PowerPC Byte OrderingThe default mapping for PowerPC processors is big-endian. In the 601, little-endian modecan be selected after a hard reset by setting the LM bit in the HID0 register in the 601through the use of the mtspr instruction. Note that the PowerPC architecture defines twobits in the MSR for specifying byte ordering—LE (little-endian mode) and ELE (exceptionlittle-endian mode). These bits are not implemented in the 601.

The 601 big- and little-endian mode operation differs from the PowerPC architecture in thefollowing ways:

• Choice of big- or little-endian modes is provided through HID0[LM]—bit 28 of HID0. The PowerPC architecture defines two bits in the MSR for this purpose.

• The basic mode switching sequence requires three sync instructions followed by the mtspr access to HID0[28], followed by three more sync instructions. This sequence should be used whenever the state of this bit is changed.

• External and decrementer exceptions should be disabled before executing the sequence.

• The starting address of the sequence does not matter; however, the sequence cannot cross a protection boundary.

• In some cases the mtspr access to HID0[LM] can occur twice depending on the alignment of the instruction.

(*)07

(*)06

(*)05

(*)04

1103

1202

1301

1400

00

210F

220E

230D

240C

250B

260A

2709

2808

08

‘D’17

‘C’16

‘B’15

‘A’14

3113

3212

3311

3410

10

(*)1F

(*)1E

511D

521C

(*)1B

‘G’1A

‘F’19

‘E’18

18

6123

6222

6321

6420

20


• In some cases not all of the sync instructions will actually be executed, depending on the starting address of the sequence.

• Although HID0[LM] can be switched dynamically, there are certain constraints (such as turning off translation and emptying the memory queues) that must be considered before the bit can be switched. Note that, when switching modes between tasks, this code sequence may not allow the 601 to operate at an optimal performance level.

2.4.5.1 Aligned ScalarsFor the load and store instructions, the effective address is computed as specified in theinstruction descriptions in Chapter 3, “Addressing Modes and Instruction Set Summary.”

Table 2-28 shows how the physical address is modified.

The modified physical address is passed to the data cache or the main memory and thespecified width of the data is transferred between a GPR or FPR and the (as modified)addressed memory locations. Although the data is stored using big-endian byte ordering(but not in the same bytes within double words as with LM = 0), the modification of the EAmakes it appear to the processor that it is stored in little-endian mode.

The structure S would be placed in memory as shown in Figure 2-34.

Figure 2-34. PowerPC Little-Endian Structure S in Memory or Cache

Table 2-28. EA Modifications

Data Width (Bytes) EA Modification

8 No change

4 XOR with b'100'

2 XOR with b'110'

1 XOR with b'111'

0000 01 02 03

1104

1205

1306

1407

082108

2209

230A

240B

250C

260D

270E

280F

10‘D’10

‘C’11

‘B’12

‘A’13

3114

3215

3316

3417

18(*)18

(*)19

511A

521B

(*)1C

‘G’1D

‘F’1E

‘E’1F

20(*)20

(*)21

(*)22

(*)23

6124

6225

6326

6427


Because of the modifications on the EA, the same structure S appears to the processor to bemapped into memory this way when LM = 1 (little-endian enabled). This is shown inFigure 2-35.

Figure 2-35. PowerPC Little-Endian Structure S as Seen by Processor

Note that as seen by the program executing in the processor, the mapping for the structureS is identical to the little-endian mapping shown in Figure 2-33. From outside of theprocessor, the addresses of the bytes making up the structure S are as shown in Figure 2-34.These addresses match neither the big-endian mapping of Figure 2-32 or the little-endianmapping of Figure 2-33. This must be taken into account when performing I/O operationsin little-endian mode; this is discussed in Section 2.4.7, “PowerPC Input/Output in Little-Endian Mode.”

2.4.5.2 Misaligned ScalarsPerforming an XOR operation on the low-order bits of the address of a scalar requires thescalar to be aligned on a boundary equal to a multiple of its length. When executing in little-endian mode (LM = 1), the 601 takes an alignment exception whenever a load or storeinstruction is issued with a misaligned EA, regardless of whether such an access could behandled without causing an exception in big-endian mode (LM = 0).

The PowerPC architecture states that half words, words, and double words be placed inmemory such that the little-endian address of the lowest-order byte is the EA computed bythe load or store instruction; the little-endian address of the next-lowest-order byte is onegreater, and so on. Figure 2-36 shows a four-byte word stored at little-endian address 5. Theword is presumed to contain the binary representation of x'11121314'.

Figure 2-36. PowerPC Little-Endian Mode, Word Stored at Address 5

07 06 05 041103

1202

1301

1400

00

210F

220E

230D

240C

250B

260A

2709

2808

08

‘D’17

‘C’16

‘B’15

‘A’14

3113

3212

3311

3410

10

(*)1F

(*)1E

511D

521C

(*)1B

‘G’1A

‘F’19

‘E’18

18

6123

6222

6321

6420

20

1207

1306

1405

(*)04

(*)03

(*)02

(*)01

(*)00

00

(*)0F

(*)0E

(*)0D

(*)0C

(*)0B

(*)0A

(*)09

1108

08


Figure 2-37 shows the same word stored by a little-endian program, as seen by the memorysystem (assuming big-endian mode).

Figure 2-37. Word Stored at Little-Endian Address 5 as Seen by Big-Endian Addressing

Note that the misaligned word in this example spans two double words. The two parts ofthe misaligned word are not contiguous in the big-endian addressing space.

2.4.5.3 Non-ScalarsThe PowerPC architecture has two types of instructions that handle non-scalars (multipleinstances of scalars). Neither type can deal with the modified EAs required in little-endianmode and both types cause alignment exceptions.

2.4.5.3.1 String OperationsThe load and store string instructions, listed in Table 2-29, cause alignment exceptionswhen they are executed in little-endian mode (HID0[LM] = 1).

String accesses are inherently byte-based operations, which, for improved performance, the601 handles as a series of word-aligned accesses.

Note that the system software must determine whether to emulate the excepting instructionor treat it as an illegal operation. Because little-endian mode programs are new with respectto the PowerPC architecture—that is, they are not POWER binaries—having the compilergenerate these instructions in little-endian mode would be slower than processing the stringin-line or by using a subroutine call.

Table 2-29. Load/Store String Instructions that Take Alignment Exceptions if LM = 1

Mnemonic Description

lswi Load String Word Immediate

lswx Load String Word Indexed

stswi Store String Word Immediate

stswx Store String Word Indexed

lscbx Load String and Compare Byte Indexed


2.4.5.3.2 Load and Store Multiple InstructionsThe instructions in Table 2-30 cause alignment exceptions when executed in little-endianmode (HID0[LM] = 1).

Although the words addressed by these instructions are on word boundaries, each word isin the half of its containing double word opposite from where it would be in big-endianmode.

Note that the system software must determine whether to emulate the excepting instructionor treat it as an illegal operation. Because little-endian mode programs are new with respectto the PowerPC architecture—that is, they are not POWER binaries—having the compilergenerate these instructions in little-endian mode would be slower than processing the stringin-line or by using a subroutine call.

2.4.6 PowerPC Instruction Memory Addressing in Little-Endian Mode

Each PowerPC instruction occupies 32 bits (one word) of memory. PowerPC processorsfetch and execute instructions as if the current instruction address had been advanced oneword for each sequential instruction. When operating with LM = 1, the address is modifiedaccording to the little-endian rule for fetching word-length scalars; that is, it is XORed withb'100'. A program is thus an array of little-endian words with each word fetched andexecuted in order (not including branches).

Consider the following example:

loop:cmplwi r5,0beq donelwzux r4, r5, r6add r7, r7, r4subi r5, 1b loop

done:stw r7, total

Assuming the program starts at address 0, these instructions are mapped into memory forbig-endian execution as shown in Figure 2-38.

Table 2-30. Load/Store Multiple Instructions that Take Alignment Exceptions if LM = 1

Mnemonic Instruction

lmw Load Multiple Word

stmw Store Multiple Word


Figure 2-38. PowerPC Big-Endian, Instruction Sequence as Seen by Processor

If this same program is assembled for and executed in little-endian mode, the mapping seenby the processor appears as shown in Figure 2-39.

Each machine instruction appears in memory as a 32-bit integer containing the valuedescribed in the instruction description, regardless of whether LM is set. This is becausescalars are always mapped in memory in big-endian byte order.

Figure 2-39. PowerPC Little-Endian, Instruction Sequence as Seen by Processor

When little-endian mapping is used, all references to the instruction stream must followlittle-endian addressing conventions, including addresses saved in system registers whenthe exception is taken, return addresses saved in the link register, and branch displacementsand addresses.

• An instruction address placed in the link register by branch and link, or an instruction address saved in an SPR when an exception is taken is the address that a program executing in little-endian mode would use to access the instruction as a word of data using a load instruction.

• An offset in a relative branch instruction reflects the difference between the addresses of the instructions, where the addresses used are those that a program executing in little-endian mode would use to access the instructions as data words using a load instruction.

00 loop: cmplwi r5, 8 beq done

00 01 02 03 04 05 06 07

08 lwzux r4, r5, r6 add r7, r7, r4

08 09 0A 0B 0C 0D 0E 0F

10 subi r5, 1 b loop

10 11 12 13 14 15 16 17

18 done: stw r7, total

18 19 1A 1B 1C 1D 1E 1F

beq done loop: cmplwi 00

07 06 05 04 03 02 01 00

add r7, r7, r4 lwzux r4, r5, r6 08

0F 0E 0D 0C 0B 0A 09 08

b loop subi r5, 1 10

17 16 15 14 13 12 11 10

done: stw r7, total 18

1F 1E 1D 1C 1B 1A 19 18


• A target address in an absolute branch instruction is the address that a program executing in little-endian mode would use to access the target instruction as a word of data using a load instruction.

2.4.7 PowerPC Input/Output in Little-Endian ModeInput/output operations, such as writing the contents of a memory page to disk, transfers abyte stream on both big- and little-endian systems. For the disk transfer, byte 0 of the pageis written to the first byte of a disk record and so on.

For a PowerPC system running in big-endian mode, both the processor and the memorysubsystem recognize the same byte as byte 0. However, this is not true for a PowerPCsystem running in little-endian mode because of the modification of the three low-order bitswhen the processor accesses memory.

In order for I/O transfers in little-endian mode to appear to transfer bytes properly, theymust be performed as if the bytes transferred were accessed one at a time, using the little-endian address modification appropriate for the single-byte transfers (XOR the bits withb'111'). This does not mean that I/O on little-endian PowerPC machines must be done usingonly one-byte-wide transfers. Data transfers can be as wide as desired, but the order of thebytes within double words must be as if they were fetched or stored one at a time.

Note that I/O operations can also be performed with certain devices by merely storing to orloading from addresses that are designated as I/O controller interface addresses (SR[T] isset). Care must be taken with such operations when defining the addresses to be usedbecause these addresses are subjected to the EA modifications described in Table 2-28. Aload or store that maps to a control register on a device may require the bytes of the valuetransferred to be reversed. If this reversal is required, the loads and stores with byte reversalinstructions may be used.

2.5 Floating-Point Execution ModelsThe IEEE-754 standard includes 32-bit and 64-bit arithmetic. The standard requires thatsingle-precision arithmetic be provided for single-precision operands. The standard permitsdouble-precision arithmetic instructions to have either (or both) single-precision or double-precision operands, but states that single-precision arithmetic instructions should not acceptdouble-precision operands.

The PowerPC architecture follows these guidelines:

• Double-precision arithmetic instructions can have operands of either or both precisions

• Single-precision arithmetic instructions require all operands to be single-precision• Double-precision arithmetic instructions produce double-precision values• Single-precision arithmetic instructions produce single-precision values


For arithmetic instructions, conversions from double- to single-precision must be doneexplicitly by software, while conversions from single- to double-precision are doneimplicitly.

All PowerPC implementations provide the equivalent of the following execution models toensure that identical results are obtained. Definition of the arithmetic instructions forinfinities, denormalized numbers, and NaNs follow conventions described in the followingsections.

Although the double-precision format specifies an 11-bit exponent, exponent arithmeticuses two additional bit positions to avoid potential transient overflow conditions. An extrabit is required when denormalized double-precision numbers are prenormalized. A secondbit is required to permit computation of the adjusted exponent value in the followingexamples when the corresponding exception enable bit is one:

• Underflow during multiplication using a denormalized factor.• Overflow during division using a denormalized divisor.

2.5.1 Execution Model for IEEE OperationsThe following description uses 64-bit arithmetic as an example; 32-bit arithmetic is similarexcept that the fraction field is a 23-bit field and the single-precision guard, round, andsticky bits (described in this section) are logically adjacent to the 23-bit FRACTION field.

The bits and fields for the IEEE 64-bit execution model are defined as follows:

• The S bit is the sign bit.

• The C bit is the carry bit that captures the carry out of the significand.

• The L bit is the leading unit bit of the significand which receives the implicit bit from the operands.

• The FRACTION is a 52-bit field, which accepts the fraction of the operands.

• The guard (G), round (R), and sticky (X) bits are extensions to the low-order bits of the accumulator. The G and R bits are required for post normalization of the result. The G, R, and X bits are required during rounding to determine if the intermediate result is equally near the two nearest representable values. The X bit serves as an extension to the G and R bits by representing the logical OR of all bits that may appear to the low-order side of the R bit, either due to shifting the accumulator right or other generation of low-order result bits. The G and R bits participate in the left shifts with zeros being shifted into the R bit. Table 2-31 shows the significance of the G, R, and X bits with respect to the intermediate result (IR), the next lower in magnitude representable number (NL), and the next higher in magnitude representable number (NH).


The significand of the intermediate result is made up of the L bit, the FRACTION, and theG, R, and X bits.

The infinitely precise intermediate result of an operation is the result normalized in bits L,FRACTION, G, R, and X of the floating-point accumulator.

Before results are stored into an FPR, the significand is rounded if necessary, using therounding mode specified by FPSCR[RN]. If rounding causes a carry into C, the significandis shifted right one position and the exponent is incremented by one. This may yield aninexact result and possibly exponent overflow. Fraction bits to the left of the bit positionused for rounding are stored into the FPR, and low-order bit positions, if any, are set to zero.

Four rounding modes are provided which are user-selectable through FPSCR[RN] asdescribed in Section 2.5.6, “Rounding.” For rounding, the conceptual guard, round, andsticky bits are defined in terms of accumulator bits.

Table 2-32 shows the positions of the guard, round, and sticky bits for double-precision andsingle-precision floating-point numbers.

Rounding can be treated as though the significand were shifted right, if required, until theleast significant bit to be retained is in the low-order bit position of the FRACTION. If anyof the guard, round, or sticky bits are nonzero, the result is inexact.

Table 2-31. Interpretation of G, R, and X Bits

G R X Interpretation

0 0 0 IR is exact

0 0 1

IR closer to NL0 1 0

0 1 1

1 0 0 IR midway between NL & NH

1 0 1

IR closer to NH1 1 0

1 1 1

Table 2-32. Location of the Guard, Round and Sticky Bits

Format Guard Round Sticky

Double G bit R bit X bit

Single 24 25 26–52 G,R,X


Z1 and Z2, defined in Section 2.5.6, “Rounding,” can be used to approximate the result inthe target format when one of the following rules is used:

• Round to nearest

— Guard bit = 0: The result is truncated. (Result exact (GRX = 000) or closest to next lower value in magnitude (GRX = 001, 010, or 011)

— Guard bit = 1: Depends on round and sticky bits:

Case a: If the round or sticky bit is one (inclusive), the result is incremented. (result closest to next higher value in magnitude (GRX = 101, 110, or 111))

Case b: If the round and sticky bits are zero (result midway between closest representable values) then if the low-order bit of the result is one, the result is incremented. Otherwise (the low-order bit of the result is zero) the result is truncated (this is the case of a tie rounded to even).

• If during the round to nearest process, truncation of the unrounded number produces the maximum magnitude for the specified precision, the following action is taken:

— Guard bit = 1: Store infinity with the sign of the unrounded result.

— Guard bit = 0: Store the truncated (maximum magnitude) value.

• Round toward zero—Choose the smaller in magnitude of Z1 or Z2. If the guard, round, or sticky bit is nonzero, the result is inexact.

• Round toward +infinityChoose Z1.

• Round toward –infinityChoose Z2.

Where the result is to have fewer than 53 bits of precision because the instruction is a floating round to single-precision or single-precision arithmetic instruction, the intermediate result either is normalized or is placed in correct denormalized form before the result is potentially rounded.

2.5.1.1 Execution Model for Multiply-Add Type InstructionsThe PowerPC architecture makes use of a special instruction form that performs up to threeoperations in one instruction (a multiply, an add, and a negate). With this added capabilityis the ability to produce a more exact intermediate result as an input to the rounder. The 32-bit arithmetic is similar except that the fraction field is smaller. Note that the roundingoccurs only after add; therefore, the computation of the sum and product together areinfinitely precise before the final result is rounded to a representable format.

The first part of the operation is a multiply. The multiply has two 53-bit significands asinputs, which are assumed to be prenormalized, and produces a result conforming to theabove model. If there is a carry out of the significand (into the C bit), the significand isshifted right one position, placing the L bit into the most significant bit of the FRACTIONand placing the C bit into the L bit. All 106 bits (L bit plus the fraction) of the product takepart in the add operation. If the exponents of the two inputs to the adder are not equal, the


significand of the operand with the smaller exponent is aligned (shifted) to the right by anamount added to that exponent to make it equal to the other input’s exponent. Zeros areshifted into the left of the significand as it is aligned and bits shifted out of bit 105 of thesignificand are ORed into the X' bit. The add operation also produces a result conformingto the above model with the X' bit taking part in the add operation.

The result of the add is then normalized, with all bits of the add result, except the X' bit,participating in the shift. The normalized result provides an intermediate result as input tothe rounder that conforms to the model described in Section 2.5.1, “Execution Model forIEEE Operations,” where:

• The guard bit is bit 53 of the intermediate result.• The round bit is bit 54 of the intermediate result.• The sticky bit is the OR of all remaining bits to the right of bit 55, inclusive.

If the instruction is floating negative multiply-add or floating negative multiply-subtract,the final result is negated.

Status bits are set to reflect the result of the entire operation: for example, no status isrecorded for the result of the multiplication part of the operation.

2.5.2 Floating-Point Data FormatThe PowerPC architecture defines the representation of a floating-point value in twodifferent binary, fixed-length formats. The format may be a 32-bit format for a single-precision floating-point value or a 64-bit format for a double-precision floating-point value.The single-precision format may be used for data in memory. The double-precision formatcan be used for data in memory or in floating-point registers.

The length of the exponent and the fraction fields differ between these two precisionformats. The structure of the single-precision format is shown in Figure 2-40; the structureof the double-precision format is shown in Figure 2-41.

Figure 2-40. Floating-Point Single-Precision Format

Figure 2-41. Floating-Point Double-Precision Format

S EXP FRACTION

0 1 8 9 31

S EXP FRACTION

0 1 11 12 63


Values in floating-point format consist of three fields:

• S (sign bit)• EXP (exponent + bias)• FRACTION (fraction)

If only a portion of a floating-point data item in memory is accessed, as with a load or storeinstruction for a byte or half word (or word in the case of floating-point double-precisionformat), the value affected depends on whether the PowerPC system is using big- or little-endian byte ordering, which is described in Section 2.4.3, “Byte and Bit Ordering.” Big-endian mode is the default.

The significand consists of a leading implied bit concatenated on the right with theFRACTION. This leading implied bit is a 1 for normalized numbers and a 0 fordenormalized numbers in the unit bit position (that is, the first bit to the left of the binarypoint). Values representable within the two floating-point formats can be specified by theparameters listed in Table 2-33.

The exponent is expressed as an 8-bit value for single-precision numbers or an 11-bit valuefor double-precision numbers. These bits hold the biased exponent; the true value of theexponent can be determined by subtracting 127 for single-precision numbers and 1023 fordouble-precision values. This is shown in Figure 2-42. Note that using a bias eliminates theneed for a sign bit. The highest-order bit is used both to generate the number, and is animplicit sign bit. Note also that two values are reserved—all bits set indicates that thenumber is an infinity or NaN and all bits cleared indicates that the number is either zero ordenormalized.

Table 2-33. IEEE Floating-Point Fields

Parameter Single-Precision Double-Precision

Exponent bias +127 +1023

Maximum exponent (unbiased)

+127 +1023

Minimum exponent –126 –1022

Format width 32 bits 64 bits

Sign width 1 bit 1 bit

Exponent width 8 bits 11 bits

Fraction width 23 bits 52 bits

Significand width 24 bits 53 bits


.

Figure 2-42. Biased Exponent Format

2.5.2.1 Value RepresentationThe PowerPC architecture defines numerical and non-numerical values representablewithin single- and double-precision formats. The numerical values are approximations tothe real numbers and include the normalized numbers, denormalized numbers, and zerovalues. The non-numerical values representable are the positive and negative infinities, andthe NaNs. The positive and negative infinities are adjoined to the real numbers but are notnumbers themselves, and the standard rules of arithmetic do not hold when they appear inan operation. They are related to the real numbers by “order” alone. It is possible, however,to define restricted operations among numbers and infinities as defined below. The relativelocation on the real number line for each of the defined entities is shown in Figure 2-43.

Figure 2-43. Approximation to Real Numbers

Biased Exponent(binary)

Single-Precision(unbiased)

Double-Precision(unbiased)

11. . . . .11 Reserved for Infinities and NaNs

11. . . . .10 +127 +1023

11. . . . .01 +126 +1022

. . .

. . .

. . .

10. . . . .00 1 1

01. . . . .11 0 0

01. . . . .10 –1 –1

. . .

. . .

. . .

00. . . . .01 –126 –1022

00. . . . .00 Reserved for Zeros and Denormalized Numbers

Positive

Negative

Zero

–∞ –NORM –DENORM –0 +0 +DENORM +NORM +∞

Unrepresentable, small numbers

Tiny Tiny

The positive and negative NaNs are not related to the numbers or ±∞ by order or value, butthey are encodings that convey diagnostic information such as the representation ofuninitialized variables. Table 2-34 describes each of the floating-point formats.

The following sections describe floating-point values defined in the architecture:

2.5.2.2 Binary Floating-Point NumbersBinary floating-point numbers are machine-representable values used to approximate realnumbers. Three categories of numbers are supported: normalized numbers, denormalizednumbers, and zero values.

2.5.2.3 Normalized Numbers (±NORM)The values for normalized numbers have a biased exponent value in the range:

• 1–254 in single-precision format• 1–2046 in double-precision format

The implied unit bit is one. Normalized numbers are interpreted as follows:

NORM = (–1)s x 2E x (1.fraction)

where (s) is the sign, (E) is the unbiased exponent and (1.fraction) is the significandcomposed of a leading unit bit (implied bit) and a fractional part. The format for normalizednumbers is shown in Figure 2-44.

Table 2-34. Recognized Floating-Point Numbers

Sign Bit Biased Exponent Leading Bit Fraction Value

0 Maximum x Nonzero +NaN

0 Maximum x Zero +Infinity

0 0 < Exponent < Maximum 1 Nonzero +Normalized

0 0 0 Nonzero +Denormalized

0 0 0 Zero +0

1 0 0 Zero –0

1 0 0 Nonzero –Denormalized

1 0 < Exponent < Maximum 1 Nonzero –Normalized

1 Maximum x Zero –Infinity

1 Maximum x Nonzero –NaN

Figure 2-44. Format for Normalized Numbers

The ranges covered by the magnitude (M) of a normalized floating-point number areapproximately equal to the following:

Single-precision format:

1.2x10–38 ≤ M ≤ 3.4x1038

Double-precision format:

2.2x10–308 ≤ M ≤ 1.8x10308

2.5.2.4 Zero Values (±0)Zero values have a biased exponent value of zero and a mantissa (leading bit = 0) value ofzero. This is shown in Figure 2-45. Zeros can have a positive or negative sign. The sign ofzero is ignored by comparison operations (that is, comparison regards +0 as equal to –0).

Figure 2-45. Format for Zero Numbers

2.5.2.5 Denormalized Numbers (±DENORM)Denormalized numbers have a biased exponent value of zero and a nonzero fraction field value. The format for denormalized numbers is shown in Figure 2-46.

Figure 2-46. Format for Denormalized Numbers

MIN<EXPONENT<MAX(BIASED) FRACTION = ANY BIT PATTERN

SIGN OF MANTISSA, 0 OR 1

FRACTION = 0


EXPONENT = 0(BIASED)


EXPONENT = 0(BIASED)

FRACTION = ANY NONZEROBIT PATTERN

Denormalized numbers are nonzero numbers smaller in magnitude than the representablenormalized numbers. They are values in which the implied unit bit is zero. Denormalizednumbers are interpreted as follows:

DENORM = (–1)s x 2Emin x (0.fraction)

where (Emin) is the minimum representable exponent value (–126 for single-precision,–1022 for double-precision).

2.5.2.6 Infinities (±∞)Positive and negative infinities have the maximum biased exponent value:

• 255 in the single-precision format• 2047 in the double-precision format

The format for infinities is shown in Figure 2-47.

Figure 2-47. Format for Positive and Negative Infinities

The fraction value is zero. Infinities are used to approximate values greater in magnitudethan the maximum normalized value. Infinity arithmetic is defined as the limiting case ofreal arithmetic, with restricted operations defined between numbers and infinities. Infinitiesand the reals can be related by ordering in the affine sense:

–∞ < every finite number < +∞

Arithmetic using infinite numbers is always exact and does not signal any exception, exceptwhen an exception occurs due to the invalid operations as described in Section 5.4.7.2,“Invalid Operation Exception Conditions.”

2.5.2.7 Not a Numbers (NaNs)NaNs have the maximum biased exponent value and a nonzero fraction field value. Theformat for NaNs is shown in Figure 2-48. The sign bit of NaNs is ignored (that is, NaNs areneither positive nor negative). If the high-order bit of the fraction field is a zero, the NaN isa signaling NaN; otherwise it is a quiet NaN (QNaN).

Figure 2-48. Format for NaNs

Signaling NaNs signal exceptions when they are specified as arithmetic operands.


EXPONENT = MAXIMUM(BIASED) FRACTION = 0

SIGN OF MANTISSA (0 for +NaN, 1 for –NaN)

EXPONENT = MAXIMUM(BIASED)

FRACTION = ANY NONZEROBIT PATTERN


Quiet NaNs represent the results of certain invalid operations, such as invalid arithmeticoperations on infinities or on NaNs, when the invalid operation exception is disabled(FPSCR[VE] = 0). QNaNs are generated under the following conditions:

• An invalid operation occurs and FPSCR[VE] = 0

• An mffs instruction is executed and the upper 32 bits are undefined (this case is 601-specific).

• On Floating Convert to Integer with Round (fctir) and Floating Convert to Integer with Round toward Zero (fctirz) the PowerPC architecture defines bits 0–31 of the target floating point register as undefined. In the 601, these bits take on the value x'FFF8 0000' (which is the representation for a QNaN).

Quiet NaNs propagate through all operations, except ordered comparison and conversionto integer operations without signaling exceptions. Specific encodings in QNaNs can thusbe preserved through a sequence of operations and used to convey diagnostic informationto help identify results from invalid operations.

When a QNaN results from an operation because an operand is a NaN or because a QNaNis generated due to a disabled invalid operation exception, the following rule is applied todetermine the QNaN with the high-order fraction bit set to one that is to be stored as theresult:

If (frA) is a NaNThen frD ← (frA)

Else if (frB) is a NaNThen frD ← (frB)Else if (frC) is a NaN

Then frD ← (frC)Else if generated QNaN

Then frD ← generated QNaN

If the operand specified by frA is a NaN, that NaN is stored as the result. Otherwise, if theoperand specified by frB is a NaN (if the instruction specifies an frB operand), that NaN isstored as the result. Otherwise, if the operand specified by frC is a NaN (if the instructionspecifies an frC operand), that NaN is stored as the result. Otherwise, if a QNaN isgenerated by a disabled invalid operation exception, that QNaN is stored as the result. If aQNaN is to be generated as a result, the QNaN generated has a sign bit of zero, an exponentfield of all ones, and a high-order fraction bit of one with all other fraction bits zero. Aninstruction that generates a QNaN as the result of a disabled invalid operation generates thisQNaN. This is shown in Figure 2-49.

Figure 2-49. Representation of Generated QNaN

SIGN OF MANTISSA, NaN OR 1

111...1 1000....00


2.5.3 Sign of ResultThe following rules govern the sign of the result of an arithmetic operation, when theoperation does not yield an exception. These rules apply even when the operands or resultsare ±0 or ±∞:

• The sign of the result of an addition operation is the sign of the source operand having the larger absolute value. If both operands have the same sign, the sign of the result of an addition operation is the same as the sign of the operands. The sign of the result of the subtraction operation, x – y, is the same as the sign of the result of the addition operation, x + (–y).

• When the sum of two operands with opposite sign, or the difference of two operands with the same sign, is exactly zero, the sign of the result is positive in all rounding modes except round toward negative infinity(–∞), in which case the sign is negative.

• The sign of the result of a multiplication or division operation is the exclusive OR of the signs of the source operands.

• The sign of the result of a round to single-precision or convert to/from integer operation is the sign of the source operand.

For multiply-add instructions, these rules are applied first to the multiplication operationand then to the addition or subtraction operation (one of the source operands to the additionor subtraction operation is the result of the multiplication operation).

2.5.4 Normalization and DenormalizationWhen an arithmetic operation produces an intermediate result, consisting of a sign bit, anexponent, and a nonzero significand with a zero leading bit, the result is not a normalizednumber and must be normalized before it is stored.

A number is normalized by shifting its significand left while decrementing its exponent byone for each bit shifted, until the leading significand bit becomes one. The guard bit and theround bit participate in the shift with zeros shifted into the round bit; see Section 2.5.1,“Execution Model for IEEE Operations.” During normalization, the exponent is regardedas if its range were unlimited. If the resulting exponent value is less than the minimum valuethat can be represented in the format specified for the result, the intermediate result is saidto be “tiny” and the stored result is determined by the rules described in Section 5.4.7.5,“Underflow Exception Condition.” The sign of the number does not change.

When an arithmetic operation produces a nonzero intermediate result whose exponent isless than the minimum value that can be represented in the format specified, the storedresult may need to be denormalized. The result is determined by the rules described inSection 5.4.7.5, “Underflow Exception Condition.”

A number is denormalized by shifting its significand to the right while incrementing itsexponent by one for each bit shifted until the exponent equals the format’s minimum value.If any significant bits are lost in this shifting process then “Loss of Accuracy” has occurredand an underflow exception is signaled. The sign of the number does not change.


When denormalized numbers are operands of multiply and divide operations, operands areprenormalized internally before performing the operations.

2.5.5 Data Handling and PrecisionThere are specific instructions for moving floating-point data between the FPRs andmemory. For double-precision format data, the data is not altered during the move. Forsingle-precision data, the format is converted to double-precision format when data isloaded from memory into an FPR. A format conversion from double- to single-precision isperformed when data from an FPR is stored. Floating-point exceptions cannot occur duringthese operations.

All arithmetic operations use floating-point double-precision format.

Floating-point single-precision formats are used by the following four types of instructions:

• Load Floating-Point Single-Precision (lfs)—This instruction accesses a single-precision operand in single-precision format in memory, converts it to double-precision, and loads it into an FPR. Exceptions are not detected during the load operation.

• Floating-point Round to Single-Precision (frspx)—If the operand is not already in single-precision range, the floating round to single-precision instruction rounds a double-precision operand to single-precision, checking the exponent for single-precision range and handling any exceptions according to respective enable bits in the FPSCR. The instruction places that operand into an FPR as a double-precision operand. For results produced by single-precision arithmetic instructions and by single-precision loads, this operation does not alter the value.

• Single-precision arithmetic instructions—These instructions take operands from the FPRs in double-precision format, performs the operation as if it produced an intermediate result correct to infinite precision and with unbounded range, and then forces this intermediate result to fit in single-precision format. Status bits in the FPSCR and in the condition register are set to reflect the single-precision result. The result is then converted to double-precision format and placed into an FPR. The result falls within the range supported by the single format.

• For single-precision operations, source operands must be representable in single-precision format. If they are not, the result placed into the target FPR, and the setting of status bits in the FPSCR and in the condition register, are undefined.

• Store Floating-Point Single-Precision (stfs)—This form of instruction converts a double-precision operand to single-precision format and stores that operand into memory. If the operand requires denormalization in order to fit in single-precision format, it is automatically denormalized prior to being stored. No exceptions are detected on the store operation (the value being stored is effectively assumed to be the result of an instruction of one of the preceding three types).


When the result of a Load Floating-Point Single-Precision (lfs), Floating-Point Round toSingle-Precision (frspx), or single-precision arithmetic instruction is stored in an FPR, thelow-order 29 fraction bits are zero. This is shown in Figure 2-50.

Figure 2-50. Single-Precision Representation in an FPR

The frspx instruction allows conversion from double- to single-precision with appropriateexception checking and rounding. This instruction should be used to convert double-precision floating-point values (produced by double-precision load and arithmeticinstructions) to single-precision values before storing them into single-format memoryelements or using them as operands for single-precision arithmetic instructions. Valuesproduced by single-precision load and arithmetic instructions can be stored directly, or useddirectly as operands for single-precision arithmetic instructions, without preceding thestore, or the arithmetic instruction, by frspx.

A single-precision value can be used in double-precision arithmetic operations. The reverseis true only if the double-precision value can be represented in single-precision format.Some implementations may execute single-precision arithmetic instructions faster thandouble-precision arithmetic instructions. Therefore, if double-precision accuracy is notrequired, using single-precision data and instructions can speed operations.

2.5.6 RoundingAll arithmetic instructions defined by the PowerPC architecture produce an intermediateresult considered infinitely precise. This result must then be written with a precision offinite length into an FPR. After normalization or denormalization, if the infinitely preciseintermediate result cannot be represented in the precision required by the instruction, it isrounded before being placed into the target FPR.

The instructions that potentially round their result are the arithmetic, multiply-add, androunding and conversion instructions. As shown in Figure 2-51, whether rounding occursdepends on the source values.

S EXP x x x x x x x x x xx x x x x x x x x x x x x 00000000000000000000000000000

0 1 11 12 63

Bit 35


Figure 2-51. Rounding Flow Diagram

Each of these instructions sets FPSCR bits FR and FI, according to whether roundingoccurs (FI) and whether the fraction was incremented (FR). If rounding occurs, FI is set toone and FR may be either zero or one. If rounding does not occur, both FR and FI arecleared. Other floating-point instructions do not alter FR and FI. Four modes of roundingare provided that are user-selectable through the floating-point rounding control field in theFPSCR. See Section 2.2.3, “Floating-Point Status and Control Register (FPSCR).” Theseare encoded as follows in Table 2-35.

Let Z be the infinitely precise intermediate arithmetic result or the operand of a conversionoperation. If Z can be represented exactly in the target format, no rounding occurs and theresult in all rounding modes is equivalent to truncation of Z. If Z cannot be representedexactly in the target format, let Z1 and Z2 be the next larger and next smaller numbersrepresentable in the target format that bound Z; then Z1 or Z2 can be used to approximatethe result in the target format.

Table 2-35. FPSCR Bit Settings—RN Field

RN Rounding Mode

00 Round to nearest

01 Round toward zero

10 Round toward +infinity

11 Round toward –infinity

Rounding

Yes

NoFI = 0FR = 0

No

FI = 1

FR = 0

Yes

FI = 1

Fraction Incremented


Figure 2-52 shows a graphical representation of Z, Z1, and Z2 in this case and Figure 2-53shows the selection of Z1 and Z2 for the four rounding settings.

Figure 2-52. Relation of Z1 and Z2

Rounding follows the four following rules:

• Round to nearest—Choose the best approximation (Z1 or Z2. In case of a tie, choose the one which is even (least significant bit 0)).

• Round toward zero—Choose the smaller in magnitude (Z1 or Z2).

• Round toward +infinity—Choose Z1.

• Round toward –infinity—Choose Z2.

See Section 2.5.1, “Execution Model for IEEE Operations,” for a detailed explanation ofrounding. If Z is to be rounded up and Z1 does not exist (that is, if there is no number largerthan Z that is representable in the target format), then an overflow exception occurs if Z ispositive and an underflow exception occurs if Z is negative. Similarly, if Z is to be roundeddown and Z2 does not exist, then an overflow exception occurs if Z is negative and anunderflow exception occurs if Z is positive. The results in these cases are defined in Section5.4.7.1, “Floating-Point Enabled Program Exceptions.”

By incrementing LSB of Z

Infinitely precise value

By truncating after LSB

Z2 Z1 0 Z2 Z1

Z ZNegative values Positive values


Figure 2-53. Selection of Z1 and Z2

2.6 PowerPC Registers Unimplemented in the 601The following PowerPC registers are not implemented in the 601:

• The time base SPRs are used in the PowerPC architecture instead of the RTC registers. The architected time base facility operates as a subdivision of the frequency provided by the processor clock.

• Floating-point exception cause register (FPECR)—This is a supervisor-level SPR (1023) that is used by some implementations to determine the cause of a floating-point error.

Yes

No

Z is infinitely precise result

or operand

Does Z fit target format?

Rounding = Truncation

No

Round toward –∞?

Choose Z2

Z1 ≤ Z ≤ Z2

Yes

NoYes

Round toward +∞?

No

Choose Z1

YesRound

toward 0?

No

Choose Z1

Round to nearest

Choose best approxi-mation (Z1 or Z2)

if tie

Choose even value (Z1 or Z2 whose lsb is 0)


• Address space register (ASR)—The ASR is a 64-bit SPR used in 64-bit implementations to perform address translations.

• Each PowerPC processor implements a unique set of HID registers. Note that some of these registers may be implemented the same way in more than one PowerPC processor design.

An mtspr or mfspr instruction that specifies an unimplemented register is treated as a no-op. If a privilege violation is indicated, the program exception has priority over the no-op.This can occur if a user-mode program tries to access a register with bit 0 of the SPRencoding field (in the instruction format) set. However, in this case the program exceptionis taken regardless of whether the SPR encoding specified an implemented register.

2.7 ResetThe following sections describe hard reset and soft reset in the 601 processor. For moreinformation about the reset exception see Section 5.4.1, “Reset Exceptions (x'00100').”

2.7.1 Hard ResetThe hard reset sequence begins when the hard reset signal HRESET is negated after beingdriven as described in Section 8.2.9.4.1, “Hard reset (HRESET)—Input.” Note that a hardreset operation is required on power-on in order to properly reset the 601.

Table 2-36 shows the state of the registers after a hard reset and before it fetches the firstinstruction from address x'FFF0 0100' in the system reset exception vector.

Table 2-36. Settings after Hard Reset (Used at Power-On)

Register Setting Register Setting

GPRs All 0s SRR1 00000000

FPRs All 0s SPRG0 00000000

FPSCR 00000000 SPRG1 00000000

Condition register All 0s SPRG2 00000000

Segment registers All 0s SPRG3 00000000

MSR 00001040 (ME and EP set)

EAR 00000000

MQ 00000000 PVR 000100011

XER 00000000 BAT registers All 0s

RTCU3 00000000 HID0 800100802

RTCL3 000000003 HID1 00000000

Link register 00000000 HID2 00000000

CTR 00000000 HID5 00000000

DSISR 00000000 HID15 00000000


Notes: 1 In the earliest release of the 601 (DD1), this is 00010000. Later versions of the hardware may be different.

2 Master checkstop enable on, sequencer GPR self-test checkstop invalid microcode instruction checkstop on.

3 Note that if external clock is connected to RTC for the 601, then the RTCL, RTCU, and DEC can change from their initial value of 0s without receiving instructions to load those registers.

The following is also true after a hard reset operation:

• External checkstops are enabled.

• The on-chip COP has given control of the PIs/POs to the rest of the chip for functional use.

• Since the reset exception has data and instruction translation disabled (MSR[DT] and MSR[IT] both cleared), the chip operates in direct address translation mode. This implies that instruction fetches as well as loads and stores are cacheable. (Operations that correspond to direct address translations are implicitly cacheable, not write-through mode, and require coherency checking on the bus).

• All internal arrays and registers are cleared during the hard reset process.

• Reinitializes big-endian mode.

2.7.2 Soft ResetRegisters are not re-initialized when a soft reset occurs (SRESET is asserted as describedin Section 8.2.9.4.2, “Soft reset (SRESET)—Input”). The SRR0 and SRR1 registers areupdated with instruction and MSR data, and the MSR values are reset according toprocedures described in Section 5.4.1, “Reset Exceptions (x'00100').”

DAR 00000000 TLBs All 0s

DEC3 00000000 Cache All 0s

SDR1 00000000 Tag directory All 0s. (However, the LRU bits are initialized such that each side of the cache has a unique LRU value).

SRR0 00000000

Table 2-36. Settings after Hard Reset (Used at Power-On) (Continued)

Register Setting Register Setting

Chapter 3. Addressing Modes and Instruction Set Summary 3-1

Chapter 3 Addressing Modes and Instruction Set Summary3030

This chapter describes instructions and address modes supported by the PowerPC 601microprocessor. These instructions are divided into the following categories:

• Integer instructions—These include arithmetic and logical instructions.

• Floating-point instructions—These include floating-point arithmetic instructions, as well as instructions that affect the floating-point status and control register.

• Load/store instructions—These include integer and floating-point load and store instructions.

• Flow control instructions—These include branching instructions, condition register logical instructions, trap instructions, and other instructions that affect the instruction flow.

• Processor control instructions—These instructions are used for synchronizing memory accesses and management of caches, TLBs, and the segment registers.

This grouping of the instructions does not necessarily indicate the execution unit thatprocesses a particular instruction or group of instructions. This information, which is usefulin taking full advantage of the 601’s superscalar parallel instruction execution, is providedin Chapter 10, “Instruction Set.”

Integer instructions operate on byte, half-word, and word operands. Floating-pointinstructions operate on single-precision and double-precision floating-point operands. ThePowerPC architecture uses instructions that are four bytes long and word-aligned. Itprovides for byte, half-word, and word operand fetches and stores between memory and aset of 32 general-purpose registers (GPRs). It also provides for word and double-wordoperand fetches and stores between memory and a set of 32 floating-point registers (FPRs).

Arithmetic and logical instructions do not read or modify memory. To use the contents of amemory location in a computation and then modify the same or another memory location,the memory contents must be loaded into a register, modified, and then written back to thetarget location using load or store instructions.


The 601 appears to execute instructions sequentially and in program order, but theexecution of a sequence of instructions may be interrupted as a result of an exceptioncaused by one of the instructions in the sequence, or by some asynchronous event.

3.1 Memory AddressingA program references memory using the effective (logical) address computed by theprocessor when it executes a memory access or branch instruction, or when the nextsequential instruction is fetched.

3.1.1 Effective Address CalculationAn effective address is the 32-bit sum computed by the processor when executing amemory access or branch instruction or when fetching the next sequential instruction. Fora memory access instruction, if the sum of the effective address and the operand lengthexceeds the maximum effective address, the memory operand is considered to wrap aroundfrom the maximum effective address through effective address 0, as described in thefollowing paragraphs.

Effective address computations for both data and instruction accesses use 32-bit unsignedbinary arithmetic. A carry from bit 0 is ignored.

Load and store operations have three categories of effective address generation:

• Register indirect with immediate index mode. The d operand is added to the contents of the GPR specified by the rA operand to generate the effective address.

• Register indirect with index mode. The contents of the GPR specified by rB operand are added to the contents of the GPR specified by the rA operand to generate the effective address.

• Register indirect mode. The contents of the GPR specified by the rA operand are used as the effective address.

Branch instructions have three categories of effective address generation:

• Immediate addressing. The BD or LI operands are sign extended, and are appended with b'00' in the two low-order bit positions (bits 30 and 31) to generate the branch effective address. If the AA bit (bit 30) is cleared, the BD or LI operands are treated as displacements; if the AA bit is set, the BD or LI operands are treated as absolute addresses.

• Link register indirect. The contents of the link register with the two low-order bits cleared to zero are used as the branch effective address.

• Count register indirect. The contents of the count register with the two low-order bits cleared to zero are used as the branch effective address.

Branch instructions can optionally load the link register with the next sequential instructionaddress (current instruction address + 4).


3.1.2 Context Synchronization The System Call (sc), Return from Interrupt (rfi), and Instruction Synchronize(isync)instructions perform context synchronization by allowing previously issued instructions tocomplete before performing a context switch. Execution of one of these instructionsensures the following:

• No higher priority exception exists.

• All previous instructions have completed to a point where they can no longer cause an exception. If a prior memory access instruction causes direct-store error exceptions, the results must be determined before this instruction is executed.

• Previous instructions complete execution in the context (privilege, protection, and address translation) under which they were issued.

• The instructions following the sc, rfi, or isync instruction execute in the context established by these instructions.

The Move to Machine State Register instruction (mtmsr) is execution synchronizing. Itensures that all preceding instructions have completed execution and will not cause anexception before the instruction executes, but does not ensure subsequent instructionsexecute in the newly established environment. For example, if the mtmsr sets the MSR(PR)bit to 1, unless an isync immediately follows the mtmsr, a privileged instruction could beexecuted or privileged access could be performed without causing an exception eventhough the MSR(PR) bit indicates user mode.

3.2 Exception SummaryThere are two kinds of exceptions in the 601—those caused directly by the execution of aninstruction and those caused by an asynchronous event. Either may cause components ofthe system software to be invoked.

Exceptions can be caused directly by the execution of an instruction as follows:

• An attempt to execute an illegal instruction or an attempt by a user-level program to execute the supervisor-level instructions listed below cause the illegal instruction or supervisor-level instruction handler to be invoked. The 601provides the following supervisor-level instructions: dcbi, mfmsr, mfspr, mfsr, mfsrin, mtmsr, mtspr, mtsr, mtsrin, rfi, and tlbie. Note that the mfspr and mtspr instructions are executable at both the user- and supervisor-level, depending on the SPR encoding.

• An attempt to access memory in a manner that violates memory protection, or an attempt to access memory that is not available (page fault), causes the data access exception handler or instruction access exception handler to be invoked.

• An attempt to access memory with an effective address alignment that is invalid for the instruction causes the alignment exception handler to be invoked.

• The execution of an sc instruction causes the system service program to be invoked.


• The execution of a trap instruction that traps causes the program exception trap handler to be invoked.

• The execution of a floating-point instruction when floating-point instructions are disabled causes the floating-point unavailable handler to be invoked.

• The execution of an instruction that causes a floating-point exception while floating-point exceptions are enabled causes the floating-point enabled exception handler to be invoked.

Exceptions caused by asynchronous events are described in Chapter 5, “Exceptions.”

3.3 Integer InstructionsThis section describes the integer instructions. These consist of the following:

• Integer arithmetic instructions• Integer compare instructions• Integer rotate and shift instructions• Integer logical instructions.

Integer instructions use the content of the GPRs as source operands and place results intoGPRs, into the integer exception register (XER), and into condition register fields. Trapinstructions compare the contents of one GPR with a second GPR or with immediate dataand, if the conditions are met, invoke the program exception trap handler.

These instructions treat the source operands as signed integers unless the instruction isexplicitly identified as an unsigned operation.

The integer instructions that are coded to update the condition register and the integerlogical and arithmetic instructions (addic., andi., and andis.) set condition register fieldCR0 (bits 0–3) to characterize the result of the operation. The condition register field CR0is set as if the result were compared algebraically to zero.

The integer arithmetic instructions (addic, addic., subfic, addc, subfc, adde, subfe,addme, subfme, addze, and subfze) always set integer exception register bit, CA, to reflectthe carry out of bit 0. Integer arithmetic instructions with the overflow enable (OE) bit setwill cause the XER bits SO and OV to be set to reflect overflow of the 32-bit result.

Unless otherwise noted, when condition register field CR0 and the XER are affected theyreflect the value placed in the target register.

3.3.1 Integer Arithmetic InstructionsIn the 601, instructions that select the overflow option (enable XER(OV)) or that set theinteger exception register carry bit (CA) may delay the execution of subsequentinstructions.


The 601 integer unit contains the user accessible MQ register and supports the multiply(mul), divide (div), shift, and rotate instructions that use this register. Neither the registernor the associated instructions are present in other PowerPC processors nor are they definedin the PowerPC architecture. The execution of any PowerPC multiply or divide instructioncauses the content of the MQ to be undefined.

Table 3-1 lists the integer arithmetic instructions for the 601. Note that some of theinstructions are specific to the 601 implementation.

Table 3-1. Integer Arithmetic Instructions

Name MnemonicOperand Syntax

Operation

Add Immediate

addi rD,rA,SIMM The sum (rA|0) + SIMM is placed into register rD.

Add Immediate Shifted

addis rD,rA,SIMM The sum (rA|0) + (SIMM || x '0000') is placed into register rD.

Add addadd.addoaddo.

rD,rA,rB The sum (rA) + (rB) is placed into register rD.

add Addadd. Add with CR Update. The dot suffix enables the update of

the condition register.addo Add with Overflow Enabled. The o suffix enables the

overflow bit (OV) in the XER. addo. Add with Overflow and CR Update. The o. suffix enables

the update of the condition register and enables the overflow bit (OV) in the XER.

Subtract from

subfsubf.subfosubfo.

rD,rA,rB The sum ¬ (rA) + (rB) +1 is placed into rD.

subf Subtract fromsubf. Subtract from with CR Update. The dot suffix enables the

update of the condition register.subfo Subtract from with Overflow Enabled. The o suffix enables

the overflow. The o suffix enables the overflow bit (OV) in the XER.

subfo. Subtract from with Overflow and CR Update. The o. suffix enables the update of the condition register and enables the overflow bit (OV) in the XER.

Add Immediate Carrying

addic rD,rA,SIMM The sum (rA) + SIMM is placed into register rD.

Add Immediate Carrying and Record

addic. rD,rA,SIMM The sum (rA) + SIMM is placed into rD. The condition register is updated.

Subtract from Immediate Carrying

subfic rD,rA,SIMM The sum ¬ (rA) + SIMM + 1 is placed into register rD.


Add Carrying

addcaddc.addcoaddco.

rD,rA,rB The sum (rA) + (rB) is placed into register rD.

addc Add Carryingaddc. Add Carrying with CR Update. The dot suffix enables the

update of the condition register.addco Add Carrying with Overflow Enabled. The o suffix enables

the overflow bit (OV) in the XER. addco. Add Carrying with Overflow and CR Update. The o. suffix

enables the update of the condition register and enables the overflow bit (OV) in the XER.

Subtract from Carrying

subfcsubfc.subfcosubfco.

rD,rA,rB The sum ¬ (rA) + (rB) + 1 is placed into register rD.

subfc Subtract from Carryingsubfc. Subtract from Carrying with CR Update. The dot suffix

enables the update of the condition register.subfco Subtract from Carrying with Overflow. The o suffix enables

the overflow bit (OV) in the XER. subfco. Subtract from Carrying with Overflow and CR Update.

The o. suffix enables the update of the condition register and enables the overflow bit (OV) in the XER.

Add Extended

addeadde.addeoaddeo.

rD,rA,rB The sum (rA) + (rB) + XER[CA] is placed into register rD.

adde Add Extendedadde. Add Extended with CR Update. The dot suffix enables the

update of the condition register. addeo Add Extended with Overflow. The o suffix enables the

overflow bit (OV) in the XER. addeo. Add Extended with Overflow and CR Update. The o. suffix


Subtract from Extended

subfesubfe.subfeosubfeo.

rD,rA,rB The sum ¬ (rA) + (rB) + XER[CA] is placed into register rD.

subfe Subtract from Extendedsubfe. Subtract from Extended with CR Update. The dot suffix

enables the update of the condition register.subfeo Subtract from Extended with Overflow. The o suffix

enables the overflow bit (OV) in the XER. subfeo. Subtract from Extended with Overflow and CR Update.

The o. suffix enables the update of the condition register and enables the overflow (OV) bit in the XER.

Add to Minus One Extended

addmeaddme.addmeoaddmeo.

rD,rA The sum (rA) + XER[CA] + x'FFFFFFFF' is placed into register rD.

addme Add to Minus One Extendedaddme. Add to Minus One Extended with CR Update. The dot

suffix enables the update of the condition register.addmeo Add to Minus One Extended with Overflow. The o suffix

enables the overflow bit (OV) in the XER. addmeo. Add to Minus One Extended with Overflow and CR

Update. The o. suffix enables the update of the condition register and enables the overflow (OV) bit in the XER.

Table 3-1. Integer Arithmetic Instructions (Continued)


Operation


Subtract from Minus One Extended

subfmesubfme.subfmeosubfmeo.

rD,rA The sum ¬ (rA) + XER(CA) + x'FFFFFFFF' is placed into register rD.

subfme Subtract from Minus One Extendedsubfme. Subtract from Minus One Extended with CR Update. The

dot suffix enables the update of the condition register.subfmeo Subtract from Minus One Extended with Overflow. The o

suffix enables the overflow bit (OV) in the XER. subfmeo. Subtract from Minus One Extended with Overflow and CR

Update. The o. suffix enables the update of the condition register and enables the overflow bit (OV) in the XER.

Add to Zero Extended

addzeaddze.addzeoaddzeo.

rD,rA The sum (rA) + XER[CA] is placed into register rD.

addze Add to Zero Extendedaddze. Add to Zero Extended with CR Update. The dot suffix

enables the update of the condition register.addzeo Add to Zero Extended with Overflow. The o suffix enables

the overflow bit (OV) in the XER. addzeo. Add to Zero Extended with Overflow and CR Update. The

o. suffix enables the update of the condition register and enables the overflow bit (OV) in the XER.

Subtract from Zero Extended

subfzesubfze.subfzeosubfzeo.

rD,rA The sum ¬ (rA) + XER[CA] is placed into register rD.

subfze Subtract from Zero Extendedsubfze. Subtract from Zero Extended with CR Update. The dot

suffix enables the update of the condition register.subfzeo Subtract from Zero Extended with Overflow. The o suffix

enables the overflow bit (OV) in the XER. subfzeo. Subtract from Zero Extended with Overflow and CR

Update. The o. suffix enables the update of the condition register and enables the overflow bit (OV) in the XER.

Negate negneg.negonego.

rD,rA The sum ¬ (rA) + 1 is placed into register rD.

neg Negateneg. Negate with CR Update. The dot suffix enables the update

of the condition register.nego Negate with Overflow. The o suffix enables the overflow bit

(OV) in the XER. nego. Negate with Overflow and CR Update. The o. suffix


Multiply Low Immediate

mulli rD,rA,SIMM The low-order 32 bits of the 48-bit product (rA)∗ SIMM are placed into register rD. The low-order 32 bits of the product are the correct 32-bit product. The low-order bits are independent of whether the operands are treated as signed or unsigned integers. However, XER[OV] is set based on the result interpreted as a signed integer. The high-order bits are lost. This instruction can be used with mulhwx to calculate a full 64-bit product.



Operation


Multiply Low

mullwmullw.mullwomullwo.

rD,rA,rB The low-order 32 bits of the 64-bit product (rA)∗ (rB) are placed into register rD. The low-order 32 bits of the product are the correct 32-bit product. The low-order bits are independent of whether the operands are treated as signed or unsigned integers. However, XER[OV] is set based on the result interpreted as a signed integer.

The high-order bits are lost. This instruction can be used with mulhwx to calculate a full 64-bit product.This instruction may execute faster if rB contains the operand having the smaller absolute value.

mullw Multiply Low mullw. Multiply Low with CR Update. The dot suffix enables the

update of the condition register.mullwo Multiply Low with Overflow. The o suffix enables the

overflow bit (OV) in the XER.mullwo. Multiply Low with Overflow and CR Update. The o. suffix


Multiply High Word

mulhwmulhw.

rD,rA,rB The contents of rA and rB are interpreted as 32-bit signed integers. The 64-bit product is formed. The high-order 32 bits of the 64-bit product are placed into rD.

Both operands and the product are interpreted as signed integers.

This instruction may execute faster if rB contains the operand having the smaller absolute value.

mulhw Multiply High Wordmulhw. Multiply High Word with CR Update. The dot suffix enables

the update of the condition register.

Multiply High Word Unsigned

mulhwumulhwu.

rD,rA,rB The contents of rA and of rB are extracted and interpreted as 32-bit unsigned integers. The 64-bit product is formed. The high-order 32 bits of the 64-bit product are placed into rD.

Both operands and the product are interpreted as unsigned integers.

This instruction may execute faster if rB contains the operand having the smaller absolute value.

mulhwu Multiply High Word Unsignedmulhwu. Multiply High Word Unsigned with CR Update. The dot

suffix enables the update of the condition register.



Operation

Divide Word divwdivw.divwodivwo.

rD,rA,rB The dividend is the signed value of (rA). The divisor is the signed value of (rB). The quotient is placed into rD. The remainder is not supplied as a result.Both operands are interpreted as signed integers. The quotient is the unique signed integer that satisfies the following:

dividend = (quotient * divisor) + rwhere 0 ≤ r < |divisor| if the dividend is non-negative, and –|divisor| < r ≤ 0 if the dividend is negative.If an attempt is made to perform any of the divisions

x'8000_0000' /–1or <anything> / 0

the contents of register rD are undefined, as are the contents of the LT, GT, and EQ bits of the condition register field CR0 if the instruction has condition register updating enabled. In these cases, if instruction overflow is enabled, then XER[OV] is set.The 32-bit signed remainder of dividing (rA) by (rB) can be computed as follows, except in the case that (rA) = –231 and (rB) = –1:

divw rD,rA,rB rD = quotientmullw rD,rD,rB rD = quotient ∗ divisorsubf rD,rD,rA rD = remainder

divw Divide Worddivw. Divide Word with CR Update. The dot suffix enables the

update of the condition register.divwo Divide Word with Overflow. The o suffix enables the overflow

bit (OV) in the XER.divwo. Divide Word with Overflow and CR Update. The o. suffix




Operation

Divide Word Unsigned

divwudivwu.divwuodivwuo.

rD,rA,rB The dividend is the value of (rA). The divisor is the value of (rB). The 32-bit quotient is placed into rD. The remainder is not supplied as a result.Both operands are interpreted as unsigned integers. The quotient is the unique unsigned integer that satisfies the following:

dividend = (quotient * divisor) + r where 0 ≤ r < divisor.If an attempt is made to perform the division

<anything> / 0 the contents of register rD are undefined, as are the contents of the LT, GT, and EQ bits of the condition register field CR0 if the instruction has the condition register updating enabled. In these cases, if instruction overflow is enabled, then XER[OV] is set.

The 32-bit unsigned remainder of dividing (rA) by (rB) can be computed as follows:

divwu rD,rA,rB rD = quotientmullw rD,rD,rB rD = quotient * divisorsubf rD,rD,rA rD = remainder

divwu Divide Word Unsigneddivwu. Divide Word Unsigned with CR Update. The dot suffix

enables the update of the condition register.divwuo Divide Word Unsigned with Overflow. The o suffix enables

the overflow bit (OV) in the XER.divwuo. Divide Word Unsigned with Overflow and CR Update. The

o. suffix enables the update of the condition register and enables the overflow bit (OV) in the XER.

Difference or Zero Immediate

dozi rD,rA,SIMM This is a POWER instruction, and is not part of the PowerPC architecture. This instruction will not be supported by other PowerPC implementations.

The sum ¬ (rA) + SIMM + 1 is placed into register rD if greater than 0; if the sum is less than or equal to 0, register rD is cleared to 0.

This instruction is specific to the 601.



Operation


Difference or Zero

dozdoz.dozodozo.

rD,rA,rB This is a POWER instruction, and is not part of the PowerPC architecture. This instruction will not be supported by other PowerPC implementations.

The sum ¬ (rA) + (rB) + 1 is placed into register rD. If the value in register rA is algebraically greater than the value in register rB, register rD is cleared.If the instruction has condition register updating enabled, condition register field CR0 is set to reflect the result placed in register rD (i.e., if register rD is set to zero, EQ is set to 1).

If the instruction has overflow enabled, XER[OV] is only set on positive overflows.

doz Difference or Zerodoz. Difference or Zero with CR Update. The dot suffix enables

the update of the condition register.dozo Difference or Zero with Overflow. The o suffix enables the

overflow bit (OV) in the XER.dozo. Difference or Zero with Overflow and CR Update. The o.

suffix enables the update of the condition register and enables the overflow bit (OV) in the XER.


Absolute absabs.absoabso.

rD,rA This is a POWER instruction, and is not part of the PowerPC architecture. This instruction will not be supported by other PowerPC implementations.

The absolute value |(rA)| is placed into register rD. If register rA contains the most negative number (i.e., x ‘80000000'), the result of the instruction is the most negative number and sets the XER[OV] bit if enabled.

abs Absoluteabs. Absolute with CR Update. The dot suffix enables the

update of the condition register.abso Absolute with Overflow. The o suffix enables the overflow

bit (OV) in the XERabso. Absolute with Overflow and CR Update. The o. suffix





Operation


Negative Absolute

nabsnabs.nabsonabso.

rD,rA This is a POWER instruction, and is not part of the PowerPC architecture. This instruction will not be supported by other PowerPC implementations.

The negative absolute value –|(rA)| is placed into register rD.

Note: nabs never overflows. If the instruction is overflow enabled, then XER[OV] is cleared to zero and XER[SO] is not changed.

nabs Negative Absolutenabs. Negative Absolute with CR Update. The dot suffix enables

the update of the condition register.nabso Negative Absolute with Overflow. The o suffix enables the

overflow bit (OV) in the XERnabso. Negative Absolute with Overflow and CR Update. The o.

suffix enables the update of the condition register and enables the overflow bit (OV) in the XER.


Multiply mulmul.mulomulo.


Bits 0–31 of the product (rA)∗ (rB) are placed into register rD. Bits 32–63 of the product (rA)∗ (rB) are placed into the MQ register.

If the condition register updating is enabled, then LT, GT, and EQ reflect the result in the low-order 32 bits (contents of MQ register). If the instruction is overflow enabled, then the XER[SO] and XER[OV] bits are set to one if the product cannot be represented in 32 bits.

mul Multiplymul. Multiply with CR Update. The dot suffix enables the update

of the condition register.mulo Multiply with Overflow. The o suffix enables the overflow

bit (OV) in the XER.mulo. Multiply with Overflow and CR Update. The o. suffix





Operation


Divide divdiv.divodivo.


The quotient [(rA) || (MQ)]/(rB) is placed into register rD. The remainder is placed in the MQ register. The remainder has the same sign as the dividend, except that a zero quotient or a zero remainder is always positive. The results obey the equation:

dividend = (divisor ∗ quotient) + remainder

where dividend is the original (rA) || (MQ), divisor is the original (rB), quotient is the final (rD), and remainder is the final (MQ).

If the condition register updating is enabled, condition register field CR0 bits LT, GT, and EQ reflect the remainder. If the instruction is overflow enabled, then the XER[SO] and XER[OV] bits are set to one if the quotient cannot be represented in 32 bits.

For the case of –231/–1, the MQ register is cleared to zero and –231 is placed in register rD. For all other overflows, (MQ), (rD), and condition register field CR0 (if condition register updating is enabled) are undefined.

div Dividediv. Divide with CR Update. The dot suffix enables the update

of the condition register.divo Divide with Overflow. The o suffix enables the overflow bit

(OV) in the XER.divo. Divide with Overflow and CR Update. The o. suffix enables

the update of the condition register and enables the overflow bit (OV) in the XER.




Operation


In addition to supporting all of the PowerPC integer arithmetic instructions, the 601supports the POWER arithmetic instructions summarized in Table 3-1 and Table 3-2 anddescribed in detail in Chapter 10, “Instruction Set.” Note that in order to achieve fullcompatibility with future PowerPC implementations, it is up to software to either emulatethese operations in the program exception handler, or to completely avoid their use.

Divide Short divsdivs.divsodivso.


The quotient (rA)/(rB) is placed into register rD. The remainder is placed in MQ. The remainder has the same sign as the dividend, except that a zero quotient or a zero remainder is always positive. The results obey the equation:

dividend = (divisor ∗ quotient) + remainder

where the dividend is the original (rA), divisor is the original (rB), quotient is the final (rD), and remainder is the final (MQ).

If the condition register updating is enabled, then the condition register field CR0 bits LT, EQ, and GT reflect the remainder. If the instruction is overflow enabled, then the XER[SO] and XER[OV] bits are set to one if the quotient cannot be represented in 32 bits (e.g., as is the case when the divisor is zero, or the dividend is –231 and the divisor is –1). For the case of –231/–1, the MQ register is cleared and –231 is placed in register rD. For all other overflows, (MQ), (rD), and condition register field CR0 (if condition register updating is enabled) are undefined.

divs Divide Shortdivs. Divide Short with CR Update. The dot suffix enables the

update of the condition register.divso Divide Short with Overflow. The o suffix enables the

overflow bit (OV) in the XER.divso. Divide Short with Overflow and CR Update. The o. suffix





Operation


3.3.2 Integer Compare InstructionsThe integer compare instructions algebraically or logically compare the contents of registerrA with either the UIMM operand, the SIMM operand, or the contents of register rB.Algebraic comparison compares two signed integers. Logical comparison compares twounsigned numbers. Table 3-3 summarizes the integer compare instructions provided by the601processor.

While the PowerPC architecture specifies that the value in the L field determines whetherthe operands are treated as 32- or 64-bit values, the 601 ignores the value in the L field andtreats the operands as 32-bit values. The simplified mnemonics for integer compareinstructions as shown in Table 3-4 correctly clear the L value in the instruction rather thanrequiring it to be coded as a numeric operand.

Table 3-2. PowerPC 601 Microprocessor-Specific Integer Arithmetic Instruction Summary


dozi Difference or Zero Immediate

dozx Difference or Zero

absx Absolute

nabsx Negative Absolute

mulx Multiply

divx Divide

divsx Divide Short

Table 3-3. Integer Compare Instructions


Operation

Compare Immediate

cmpi crfD,L,rA,SIMM The contents of register rA is compared with the sign-extended value of the SIMM operand, treating the operands as signed integers. The result of the comparison is placed into the CR field specified by operand crfD.

Compare cmp crfD,L,rA,rB The contents of register rA is compared with register rB, treating the operands as signed integers. The result of the comparison is placed into the CR field specified by operand crfD.

Compare Logical Immediate

cmpli crfD,L,rA,UIMM The contents of register rA is compared with x'0000' || UIMM, treating the operands as unsigned integers. The result of the comparison is placed into the CR field specified by operand crfD.

Compare Logical

cmpl crfD,L,rA,rB The contents of register rA is compared with register rB, treating the operands as unsigned integers. The result of the comparison is placed into the CR field specified by operand crfD.


The crfD field can be omitted if the result of the comparison is to be placed in CR0.Otherwise the target CR field must be specified in the instruction crfD field, using one ofthe CR field symbols (CR0–CR7) or an explicit field number.

The instructions listed in Table 3-4 are simplified mnemonics supported in all PowerPCimplementations that provide compare word capability for 32-bit operands.

The following examples demonstrate the use of the simplified word compare mnemonics:

• Compare 32 bits in register rA with immediate value 100 and place result in condition register field CR0.

cmpwi rA,100 (equivalent to cmpi 0,0,rA,100)

• Same as (1), but place results in condition register field CR4.

cmpwi cr4,rA,100 (equivalent to cmpi 4,0,rA,100)

• Compare registers rA and rB as logical 32-bit quantities and place result in condition register field CR0.

cmplw rA,rB (equivalent to cmpl 0,0,rA,rB)

3.3.3 Integer Logical InstructionsThe logical instructions shown in Table 3-5 perform bit-parallel operations. Logicalinstructions with the condition register update enabled and instructions andi. and andis. setcondition register field CR0 to characterize the result of the logical operation. These fieldsare set as if the sign-extended low-order 32 bits of the result were algebraically comparedto zero. Logical instructions without condition register update and the remaining logicalinstructions do not modify the condition register. Logical instructions do not change theXER[SO], XER[OV], and XER[CA] bits.

Table 3-4. Word Compare Simplified Mnemonics

Operation Simplified Mnemonic Equivalent to:

Compare Word Immediate cmpwi crfD,rA,SIMM cmpi crfD,0,rA,SIMM

Compare Word cmpw crfD,rA,rB cmp crfD,0,rA,rB

Compare Logical Word Immediate

cmplwi crfD,rA,UIMM cmpli crfD,0,rA,UIMM

Compare Logical Word cmplw crfD,rA,rB cmpl crfD,0,rA,rB


Table 3-5. Integer Logical Instructions


Operation

AND Immediate

andi. rA,rS,UIMM The contents of rS is ANDed with x'0000' || UIMM and the result is placed into rA.

AND ImmediateShifted

andis. rA,rS,UIMM The contents of rS is ANDed with UIMM || x'0000' and the result is placed into rA.

OR Immediate

ori rA,rS,UIMM The contents of rS is ORed with x'0000' || UIMM and the result is placed into rA.

The preferred no-op is ori 0,0,0

OR ImmediateShifted

oris rA,rS,UIMM The contents of rS is ORed with UIMM ||x'0000' and the result is placed into rA.

XOR Immediate

xori rA,rS,UIMM The contents of rS is XORed with x'0000' || UIMM and the result is placed into rA.

XOR ImmediateShifted

xoris rA,rS,UIMM The contents of rS is XORed with UIMM ||x'0000' and the result is placed into rA.

AND andand.

rA,rS,rB The contents of rS is ANDed with the contents of register rB and the result is placed into rA.

and ANDand. AND with CR Update. The dot suffix enables the update of

the condition register.

OR oror.

rA,rS,rB The contents of rS is ORed with the contents of rB and the result is placed into rA.

or ORor. OR with CR Update. The dot suffix enables the update of the

condition register.

XOR xorxor.

rA,rS,rB The contents of rS is XORed with the contents of rB and the result is placed into register rA.

xor XORxor. XOR with CR Update. The dot suffix enables the update of

the condition register.

NAND nandnand.

rA,rS,rB The contents of rS is ANDed with the contents of rB and the one’s complement of the result is placed into register rA.

nand NANDnand. NAND with CR Update. The dot suffix enables the update of

the condition register.NAND with rA = rB can be used to obtain the one's complement.

NOR nornor.

rA,rS,rB The contents of rS is ORed with the contents of rB and the one’s complement of the result is placed into register rA.

nor NORnor. NOR with CR Update. The dot suffix enables the update of

the condition register.NOR with rA = rB can be used to obtain the one's complement.


3.3.4 Integer Rotate and Shift InstructionsRotate and shift instructions provide powerful and general ways to manipulate registercontents. Table 3-6 shows the types of rotate and shift operations provided by the 601.

Equivalent eqveqv.

rA,rS,rB The contents of rS is XORed with the contents of rB and the complemented result is placed into register rA.

eqv Equivalenteqv. Equivalent with CR Update. The dot suffix enables the

update of the condition register.

AND with Complement

andcandc.

rA,rS,rB The contents of rS is ANDed with the complement of the contents of rB and the result is placed into rA.

andc AND with Complementandc. AND with Complement with CR Update. The dot suffix

enables the update of the condition register.

OR with Complement

orc orc.

rA,rS,rB The contents of rS is ORed with the complement of the contents of rB and the result is placed into rA.

orc OR with Complementorc. OR with Complement with CR Update. The dot suffix


Extend Sign Byte

extsbextsb.

rA,rS Register r S[24–31] are placed into rA[24–31]. Bit 24 of rS is placed into rA[0–23].

extsb Extend Sign Byteextsb. Extend Sign Byte with CR Update. The dot suffix enables the


Extend Sign Half Word

extshextsh.

rA,rS Register r S[16–31] are placed into rA[16–31]. Bit 16 of rS is placed into rA[0–15].

extsh Extend Sign Half Wordextsh. Extend Sign Half Word with CR Update. The dot suffix


Count Leading Zeros Word

cntlzwcntlzw.

rA,rS A count of the number of consecutive zero bits of rS is placed into rA. This number ranges from 0 to 32, inclusive.

cntlzw Count Leading Zeros Wordcntlzw. Count Leading Zeros Word with CR Update. The dot suffix

enables the update of the condition register.When the Count Leading Zeros Word instruction has condition register updating enabled, the LT field is cleared to zero in CR0.

Table 3-5. Integer Logical Instructions (Continued)


Operation


The IU performs rotation operations on data from a GPR and returns the result, or a portionof the result, to a GPR. Rotation operations rotate a 32-bit quantity left by a specifiednumber of bit positions. Bits that exit from position 0 enter at position 31. A rotate rightoperation can be accomplished by specifying a rotation of 32-n bits, where n is the rightrotation amount.

Rotate and shift instructions employ a mask generator. The mask is 32 bits long and consistsof “1” bits from a start bit, MB, through and including a stop bit, ME, and “0” bitselsewhere. The values of MB and ME range from 0 to 31. If MB > ME, the “1” bits wraparound from position 31 to position 0. Thus the mask is formed as follows:

if MB ≤ ME then

mask[mstart–mstop] = onesmask[all other bits] = zeros

elsemask[mstart–31] = onesmask[0–mstop] = onesmask[all other bits] = zeros

It is not possible to specify an all-zero mask. The use of the mask is described in thefollowing sections.

If condition register updating is enabled, rotate and shift instructions set condition registerfield CR0 according to the contents of rA at the completion of the instruction. Rotate andshift instructions do not change the values of XER[OV] and XER[SO] bits. Rotate and shiftinstructions, except algebraic right shifts, do not change the XER[CA] bit.

Table 3-6. Rotate and Shift Operations


Extract Select a field of n bits starting at bit position b in the source register, right or left justify this field in the target register, and clear all other bits of the target register to zero.

Insert Select a field of n bits in the source register, insert this field starting at bit position b of the target register, and leave other bits of the target register unchanged. (No simplified mnemonic is provided for insertion of a field when operating on double words; such an insertion requires more than one instruction.)

Rotate Rotate the contents of a register right or left n bits without masking.

Shift Shift the contents of a register right or left n bits, clearing vacated bits to 0 (logical shift).

Clear Clear the leftmost or rightmost n bits of a register to 0.

Clear left and shift left

Clear the leftmost b bits of a register, then shift the register left by n bits. This operation can be used to scale a known non-negative array index by the width of an element.


Simplified mnemonics allow simpler coding of often-used functions such as clearing theleftmost or rightmost bits of a register, left justifying or right justifying an arbitrary field,and performing simple rotates and shifts. Some of these are shown as examples with therotate instructions.

POWER Compatibility Note: In addition to supporting the PowerPC integer rotate andshift instructions, the 601 also supports all POWER rotate and shift instructions. Note thatin order to achieve full compatibility with all POWER applications on future PowerPCimplementations, it is left up to software to either emulate these operations in theinstruction exception handler, or to completely avoid their use. These 601-specific rotateand shift instructions are summarized in Table 3-7.

Table 3-7. PowerPC 601 Microprocessor-Specific Rotate and Shift Instructions


rlmix Rotate Left then Mask Insert

rribx Rotate Right and Insert Bit

maskgx Mask Generate

maskirx Mask Insert from Register

slqx Shift Left with MQ

srqx Shift Right with MQ

sliqx Shift Left Immediate with MQ

slliqx Shift Left Long Immediate with MQ

sriqx Shift Right Immediate with MQ

srliqx Shift Right Long Immediate with MQ

sllqx Shift Left Long with MQ

srlqx Shift Right Long with MQ

slex Shift Left Extended

sleqx Shift Left Extended with MQ

srex Shift Right Extended

sreqx Shift Right Extended with MQ

sraiqx Shift Right Algebraic Immediate with MQ

sraqx Shift Right Algebraic with MQ

sreax Shift Right Extended Algebraic


3.3.4.1 Integer Rotate InstructionsInteger rotate instructions rotate the contents of a register. The result of the rotation isinserted into the target register under control of a mask (if a mask bit is 1 the associated bitof the rotated data is placed into the target register, and if the mask bit is 0 the associatedbit in the target register is unchanged), or ANDed with a mask before being placed into thetarget register.

Rotate left instructions allow right-rotation of the contents of a register to be performed bya left-rotation of 32 – n, where n is the number of bits by which to rotate right.

The integer rotate instructions are summarized in Table 3-8.

Table 3-8. Integer Rotate Instructions

Name Mnemonic Operand Syntax Operation

Rotate Left Word Immediate then AND with Mask

rlwinmrlwinm.

rA,rS,SH,MB,ME The contents of register rS are rotated left by the number of bits specified by operand SH. A mask is generated having “1” bits from the bit specified by operand MB through the bit specified by operand ME and “0” bits elsewhere. The rotated data is ANDed with the generated mask and the result is placed into register rA.

rlwinm Rotate Left Word Immediate then AND with Maskrlwinm. Rotate Left Word Immediate then AND with Mask with

CR Update. The dot suffix enables the update of the condition register.

Simplified mnemonics:extlwi rA,rS,n,b rlwinm rA,rS,b,0,n-1srwi rA,rS,n rlwinm rA,rS,32-n,n,31 clrrwi rA,rS,n rlwinm rA,rS,0,0,31-nNote: The rlwinm instruction can be used for extracting, clearing and shifting bit fields using the methods shown below:

To extract an n-bit field that starts at bit position b in register rS, right-justified into rA (clearing the remaining 32-n bits of rA), set SH=b +n, MB=32-n, and ME=31.

To extract an n-bit field that starts at bit position b in rS, left-justified into rA, set SH=b, MB=0, and ME=n–1.

To rotate the contents of a register left (right) by n bits, set SH=n (32–n), MB=0, and ME=31.

To shift the contents of a register right by n bits, set SH=32-n, MB=n, and ME=31.

To clear the high-order b bits of a register and then shift the result left by n bits, set SH=n, MB=b–n and ME=31-n.

To clear the low-order n bits of a register, set SH=0, MB=0, and ME=31–n.


Rotate Left Word then AND with Mask

rlwnmrlwnm.

rA,rS,rB,MB,ME The contents of rS are rotated left by the number of bits specified by rB[27–31]. A mask is generated having “1” bits from the bit specified by operand MB through the bit specified by operand ME and “0” bits elsewhere. The rotated data is ANDed with the generated mask and the result is placed into rA.

rlwinm Rotate Left Word then AND with Maskrlwinm. Rotate Left Word then AND with Mask with CR Update.

The dot suffix enables the update of the condition register.

Simplified mnemonics:

rotlw rA,rS,rB rlwnm rA,rS,rB,0,31

Note: The rlwinm instruction can be used to extract and rotate bit fields using the methods shown below:

To extract an n-bit field that starts at the variable bit position b in the register specified by operand rS, right-justified into rA (clearing the remaining 32-n bits of rA), set r B[27–31]=b+n, MB=32-n, and ME=31. To extract an n-bit field that starts at variable bit position b in the register specified by operand rS, left-justified into rA (clearing the remaining 32-n bits of rA), set rB[27–31]=b, MB=0, and ME=n-1.

To rotate the contents of the low-order 32 bits of a register left (right) by variable n bits, set rB[27–31]=n (32-n), MB=0, and ME=31.

Rotate Left Word Immediate then Mask Insert

rlwimirlwimi.

rA,rS,SH,MB,ME The contents of rS are rotated left by the number of bits specified by operand SH. A mask is generated having “1” bits from the bit specified by MB through the bit specified by ME and “0” bits elsewhere. The rotated data is inserted into rA under control of the generated mask.

rlwimi Rotate Left Word Immediate then Maskrlwimi. Rotate Left Word Immediate then Mask Insert with CR

Update. The dot suffix enables the update of the condition register.

Simplified mnemonic:

inslwi rA,rS,n,b rlwimi rA,rS,32-b,b,b+n-1

Note: The opcode rlwimi can be used to insert a bit field into the contents of register specified by operand rA using the methods shown below:To insert an n-bit field that is left-justified in rS into rA starting at bit position b, set SH=32-b, MB=b, and ME=(b+n)-1.

To insert an n-bit field that is right-justified in rS into rA starting at bit position b, set SH=32-(b+n), MB=b, and ME=(b+n)-1.Simplified mnemonics are provided for both of these methods.

Table 3-8. Integer Rotate Instructions (Continued)


Rotate Left then Mask Insert

rlmirlmi.

rA,rS,rB,MB,ME This is a POWER instruction, and is not part of the PowerPC architecture. This instruction will not be supported by other PowerPC implementations.

The contents of rS is rotated left the number of positions specified by bits 27–31 of rB. The rotated data is inserted into rA under control of the generated mask.

rlmi Rotate Left then Mask Insertrlmi. Rotate Left then Mask Insert with CR Update. The dot

suffix enables the update of the condition register.This instruction is specific to the 601.

Rotate Right and Insert Bit

rribrrib.

rA,rS,rB This is a POWER instruction, and is not part of the PowerPC architecture. This instruction will not be supported by other PowerPC implementations.

Bit 0 of rS is rotated right the amount specified by bits 27–31 of rB. The bit is then inserted into rA.

rrib Rotate Right and Insert Bitrrib. Rotate Right and Insert Bit with CR Update. The dot


Mask Generate

maskgmaskg.


Let mstart = rS[27–31], specifying the starting point of a mask of ones. Let mstop = rB[27–31], specifying the end point of the mask of ones.

If mstart < mstop+1 thenMASK(mstart…mstop) = onesMASK(all other bits) = zeros

If mstart = mstop+1 thenMASK(0-31) = ones

If mstart > mstop+1 then MASK(mstop+1…mstart-1) = zerosMASK(all other bits) = ones

MASK is then placed in rA.maskg Mask Generatemaskg. Mask Generate with CR Update. The dot suffix enables

the update of the condition register.This instruction is specific to the 601.

Mask Insert from Register

maskirmaskir.


Register rS is inserted into rA under control of the mask in rB.

maskir Mask Insert from Registermaskir. Mask Insert from Register with CR Update. The dot


Table 3-8. Integer Rotate Instructions (Continued)


3.3.4.2 Integer Shift InstructionsThe instructions in this section perform left and right shifts. Immediate-form logical(unsigned) shift operations are obtained by specifying masks and shift values for certainrotate instructions. Simplified mnemonics are provided to make coding of such shiftssimpler and easier to understand.

Any shift right algebraic instruction, followed by addze, can be used to divide quickly by2n.

Multiple-precision shifts can be programmed as shown in Appendix E, “Multiple-PrecisionShifts.”

The integer shift instructions are summarized in Table 3-9.

Table 3-9. Integer Shift Instructions


Operation

Shift Left Word

slwslw.

rA,rS,rB The contents of rS are shifted left the number of bits specified by rB[27–31]. Bits shifted out of position 0 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into rA.

If rB[26] = 1, then rA is filled with zeros.

slw Shift Left Word slw. Shift Left Word with CR Update. The dot suffix enables the


Shift Right Word

srwsrw.

rA,rS,rB The contents of rS are shifted right the number of bits specified by rB[27–31]. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into rA.

If rB[26] = 1, then rA is filled with zeros.

srw Shift Right Word srw. Shift Right Word with CR Update. The dot suffix enables


Shift Right Algebraic Word Immediate

srawisrawi.

rA,rS,SH The contents of rS are shifted right the number of bits specified by operand SH. Bits shifted out of position 31 are lost. The 32-bit result is sign extended and placed into rA. XER[CA] is set if r S contains a negative number and any “1” bits are shifted out of position 31; otherwise XER[CA] is cleared. An operand SH of zero causes rA to be loaded with the contents of rS and XER[CA] to be cleared to 0.

srawi Shift Right Algebraic Word Immediatesrawi. Shift Right Algebraic Word Immediate with CR Update.



Shift Right Algebraic Word

srawsraw.

rA,rS,rB The contents of rS are shifted right the number of bits specified by rB[27–31]. If rB[26] = 1, then rA is filled with 32 sign bits (bit 0) from rS. If rB[26] = 0, then rA is filled from the left with sign bits. XER[CA] is set to 1 if rS contains a negative number and any “1” bits are shifted out of position 31; otherwise XER[CA] is cleared to 0. An operand (rB) of zero causes rA to be loaded with the contents of rS, and XER[CA] to be cleared to 0. Condition register field CR0 is set based on the value written into rA.

sraw Shift Right Algebraic Word sraw. Shift Right Algebraic Word with CR Update. The dot suffix


Shift Left with MQ

slqslq.


Register rS is rotated left n bits where n is the shift amount specified in bits 27–31 of register rB. The rotated word is placed in the MQ register.

When bit 26 of register rB is a zero, a mask of 32 – n ones followed by n zeros is generated.

When bit 26 of register rB is a one, a mask of all zeros is generated. The logical AND of the rotated word and the generated mask is placed into register rA.

slq Shift Left with MQslq. Shift Left with MQ with CR Update. The dot suffix enables

the update of the condition register.This instruction is specific to the 601.

Shift Right with MQ

srqsrq.


Register rS is rotated left 32 – n bits where n is the shift amount specified in bits 27–31 of register rB. The rotated word is placed into the MQ register. When bit 26 of register rB is a zero, a mask of n zeros followed by 32-n ones is generated.

When bit 26 of register rB is a one, a mask of all zeros is generated. The logical AND of the rotated word and the generated mask is placed in rA.

srq Shift Right with MQsrq. Shift Right with MQ with CR Update. The dot suffix

enables the update of the condition register.This instruction is specific to the 601.

Table 3-9. Integer Shift Instructions (Continued)


Operation


Shift Left Immediate with MQ

sliqsliq.

rA,rS,SH This is a POWER instruction, and is not part of the PowerPC architecture. This instruction will not be supported by other PowerPC implementations.

Register rS is rotated left n bits where n is the shift amount specified by operand SH. The rotated word is placed in the MQ register. A mask of 32 – n ones followed by n zeros is generated. The logical AND of the rotated word and the generated mask is placed into register rA.

sliq Shift Left Immediate with MQsliq. Shift Left Immediate with MQ with CR Update. The dot


Shift Right Immediate with MQ

sriqsriq.


Register rS is rotated left 32 – n bits where n is the shift amount specified by operand SH. The rotated word is placed into the MQ register. A mask of n zeros followed by 32 – n ones is generated. The logical AND of the rotated word and the generated mask is placed in register rA.

sriq Shift Right Immediate with MQsriq. Shift Right Immediate with MQ with CR Update. The dot


Shift Left Long Immediate with MQ

slliqslliq.


Register rS is rotated left n bits where n is the shift amount specified by SH. A mask of 32 – n ones followed by n zeros is generated. The rotated word is then merged with the contents of MQ, under control of the generated mask. The merged word is placed into rA. The rotated word is placed into the MQ register.

slliq Shift Left Long Immediate with MQslliq. Shift Left Long Immediate with MQ with CR Update. The

dot suffix enables the update of the condition register.This instruction is specific to the 601.



Operation


Shift Right Long Immediate with MQ

srliqsrliq.


Register rS is rotated left 32 – n bits where n is the shift amount specified by operand SH. A mask of n zeros followed by 32 – n ones is generated. The rotated word is then merged with the contents of the MQ register, under control of the generated mask. The merged word is placed in register rA. The rotated word is placed into the MQ register.

srliq Shift Right Long Immediate with MQsrliq. Shift Right Long Immediate with MQ with CR Update. The

dot suffix enables the update of the condition register.This instruction is specific to the 601.

Shift Left Long with MQ

sllqsllq.


Register rS is rotated left n bits where n is the shift amount specified in bits 27–31 of register rB.

When bit 26 of register rB is a zero, a mask of 32 – n ones followed by n zeros is generated. The rotated word is then merged with the contents of the MQ register, under control of the generated mask.

When bit 26 of register rB is a one, a mask of 32 – n zeros followed by n ones is generated. A word of zeros is then merged with the contents of the MQ register, under control of the generated mask.

The merged word is placed in register rA. The MQ register is not altered.

sllq Shift Left Long with MQsllq. Shift Left Long with MQ with CR Update. The dot suffix


Shift Right Long with MQ

srlqsrlq.


Register rS is rotated left 32 – n bits where n is the shift amount specified in bits 27–31 of register rB.

When bit 26 of register rB is a zero, a mask of n zeros followed by 32 – n ones is generated. The rotated word is then merged with the contents of the MQ register, under control of the generated mask.

When bit 26 of register rB is a one, a mask of n ones followed by 32 – n zeros is generated. A word of zeros is then merged with the contents of the MQ register, under control of the generated mask.

The merged word is placed in register rA. The MQ register is not altered.

srlq Shift Right Long with MQsrlq. Shift Right Long with MQ with CR Update. The dot suffix




Operation


Shift Left Extended

slesle.


Register rS is rotated left n bits where n is the shift amount specified in bits 27–31 of register rB. The rotated word is placed in the MQ register. A mask of 32 – n ones followed by n zeros is generated.

The logical AND of the rotated word and the generated mask is placed in register rA.

sle Shift Left Extendedsle. Shift Left Extended with CR Update. The dot suffix


Shift Right Extended

sresre.


Register rS is rotated left 32 – n bits where n is the shift amount specified in bits 27–31 of register rB. The rotated word is placed into the MQ register. A mask of n zeros followed by 32 – n ones is generated.

The logical AND of the rotated word and the generated mask is placed in register rA.

sre Shift Right Extendedsre. Shift Right Extended with CR Update. The dot suffix


Shift Left Extended with MQ

sleqsleq.


Register rS is rotated left n bits where n is the shift amount specified in bits 27–31 of register rB. A mask of 32 – n ones followed by n zeros is generated. The rotated word is then merged with the contents of the MQ register, under control of the generated mask. The merged word is placed in register rA. The rotated word is placed in the MQ register.

sleq Shift Left Extended with MQsleq. Shift Left Extended with MQ with CR Update. The dot




Operation


Shift Right Extended with MQ

sreqsreq.


Register rS is rotated left 32 – n bits where n is the shift amount specified in bits 27–31 of register rB. A mask of n zeros followed by 32 – n ones is generated. The rotated word is then merged with the contents of the MQ register, under control of the generated mask. The merged word is placed in register rA. The rotated word is placed into the MQ register.

sreq Shift Right Extended with MQsreq. Shift Right Extended with MQ with CR Update. The dot


Shift Right Algebraic Immediate with MQ

sraiqsraiq.


Register rS is rotated left 32 – n bits where n is the shift amount specified by the operand SH. A mask of n zeros followed by 32 – n ones is generated. The rotated word is placed in the MQ register.

The rotated word is then merged with a word of 32 sign bits from register rS, under control of the generated mask. The merged word is placed in register rA. The rotated word is ANDed with the complement of the generated mask. This 32-bit result is ORed together and then ANDed with bit 0 of register rS to produce XER[CA].

Shift Right Algebraic instructions can be used for a fast divide by 2n if followed with addze.

sraiq Shift Right Algebraic Immediate with MQsraiq. Shift Right Algebraic Immediate with MQ with CR Update.

The dot suffix enables the update of the condition register.This instruction is specific to the 601.



Operation


Shift Right Algebraic with MQ

sraqsraq.


Register rS is rotated left 32 – n bits where n is the shift amount specified in bits 27–31 of register rB. When bit 26 of register rB is a zero, a mask of n zeros followed by 32 – n ones is generated. When bit 26 of register rB is a one, a mask of all zeros is generated. The rotated word is placed in the MQ register. The rotated word is then merged with a word of 32 sign bits from register rS, under control of the generated mask.

The merged word is placed in register rA.

The rotated word is ANDed with the complement of the generated mask. This 32-bit result is ORed together and then ANDed with bit 0 of register rS to produce XER[CA].

Shift Right Algebraic instructions can be used for a fast divide by 2n if followed with addze.

sraq Shift Right Algebraic with MQsraq. Shift Right Algebraic with MQ with CR Update. The dot


Shift Right Extended Algebraic

sreasrea.


Register rS is rotated left 32 – n bits where n is the shift amount specified in bits 27–31 of register rB. A mask of n zeros followed by 32 – n ones is generated. The rotated word is placed in the MQ register.

The rotated word is then merged with a word of 32 sign bits from register rS, under control of the generated mask.

The merged word is placed in register rA.

The rotated word is ANDed with the complement of the generated mask. This 32-bit result is ORed together and then ANDed with bit 0 of register rS to produce XER[CA].

srea Shift Right Extended Algebraicsrea. Shift Right Extended Algebraic with CR Update. The dot




Operation


3.4 Floating-Point InstructionsThis section describes the floating-point instructions, which include the following:

• Floating-point arithmetic instructions• Floating-point multiply-add instructions• Floating-point rounding and conversion instructions• Floating-point compare instructions• Floating-point status and control register instructions

Floating-point loads and stores are discussed in Section 3.5, “Load and Store Instructions.”

3.4.1 Floating-Point Arithmetic InstructionsSingle-precision instructions execute faster than their double-precision equivalents in the601. For additional details on floating-point performance, refer to Chapter 7, “InstructionTiming.”

The floating-point arithmetic instructions are summarized in Table 3-10.

Table 3-10. Floating-Point Arithmetic Instructions


Operation

Floating- Point Add

faddfadd.

frD,frA,frB The floating-point operand in register frA is added to the floating-point operand in register frB. If the most significant bit of the resultant significand is not a one the result is normalized. The result is rounded to the target precision under control of the floating-point rounding control field RN of the FPSCR and placed into register frD.

Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added algebraically to form an intermediate sum. All 53 bits in the significand as well as all three guard bits (G, R, and X) enter into the computation.

If a carry occurs, the sum's significand is shifted right one bit position and the exponent is increased by one.

FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation exceptions when FPSCR[VE] = 1.

fadd Floating-Point Add fadd. Floating-Point Add with CR Update. The dot suffix enables



Floating- Point Add Single-Precision

faddsfadds.

frD,frA,frB The floating-point operand in register frA is added to the floating-point operand in register frB. If the most significant bit of the resultant significand is not a one, the result is normalized. The result is rounded to the target precision under control of the floating-point rounding control field RN of the FPSCR and placed into register frD.

Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added algebraically to form an intermediate sum. All 53 bits in the significand as well as all three guard bits (G, R, and X) enter into the computation.

If a carry occurs, the sum's significand is shifted right one bit position and the exponent is increased by one.


fadds Floating-Point Single-Precision fadds. Floating-Point Single-Precision with CR Update. The dot


Floating- Point Subtract

fsubfsub.

frD,frA,frB The floating-point operand in register frB is subtracted from the floating-point operand in register frA. If the most significant bit of the resultant significand is not a 1, the result is normalized. The result is rounded to the target precision under control of the floating-point rounding control field RN of the FPSCR and placed into register frD.

The execution of the Floating-Point Subtract instruction is identical to that of Floating-Point Add, except that the contents of register frB participates in the operation with its sign bit (bit 0) inverted.


fsub Floating-Point Subtract fsub. Floating-Point Subtract with CR Update. The dot suffix


Floating- Point Subtract Single-Precision

fsubsfsubs.

frD,frA,frB The floating-point operand in register frB is subtracted from the floating-point operand in register frA. If the most significant bit of the resultant significand is not a 1, the result is normalized. The result is rounded to the target precision under control of the floating-point rounding control field RN of the FPSCR and placed into register frD.

The execution of the Floating-Point Subtract instruction is identical to that of Floating-Point Add, except that the contents of register frB participates in the operation with its sign bit (bit 0) inverted.


fsubs Floating-Point Subtract Single-Precision fsubs. Floating-Point Subtract Single-Precision with CR Update.


Table 3-10. Floating-Point Arithmetic Instructions (Continued)


Operation


Floating-Point Multiply

fmulfmul.

frD,frA,frC The floating-point operand in register frA is multiplied by the floating-point operand in register frC.

If the most significant bit of the resultant significand is not a 1, the result is normalized. The result is rounded to the target precision under control of the floating-point rounding control field RN of the FPSCR and placed into register frD.

Floating-point multiplication is based on exponent addition and multiplication of the significands.


fmul Floating-Point Multiply fmul. Floating-Point Multiply with CR Update. The dot suffix


Floating- Point Multiply Single- Precision

fmulsfmuls.

frD,frA,frC The floating-point operand in register frA is multiplied by the floating-point operand in register frC.


Floating-point multiplication is based on exponent addition and multiplication of the significands.


fmuls Floating-Point Multiply Single-Precision fmuls. Floating-Point Multiply Single-Precision with CR Update.


Floating- Point Divide

fdivfdiv.

frD,frA,frB The floating-point operand in register frA is divided by the floating-point operand in register frB. No remainder is preserved.


Floating-point division is based on exponent subtraction and division of the significands.

FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation exceptions when FPSCR[VE] = 1 and zero divide exceptions when FPSCR[ZE] = 1.

fdiv Floating-Point Divide fdiv. Floating-Point Divide with CR Update. The dot suffix




Operation


3.4.2 Floating-Point Multiply-Add InstructionsThese instructions combine multiply and add operations without an intermediate roundingoperation. The fractional part of the intermediate product is 106 bits wide, and all 106 bitstake part in the add/subtract portion of the instruction.

The floating-point multiply-add instructions are summarized in Table 3-11.

Floating- Point Divide Single- Precision

fdivsfdivs.

frD,frA,frB The floating-point operand in register frA is divided by the floating-point operand in register frB. No remainder is preserved.



FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation exceptions when FPSCR[VE] = 1 and zero divide exceptions when FPSCR[ZE] = 1.

fdivs Floating-Point Divide Single-Precisionfdivs. Floating-Point Divide Single-Precision with CR Update.


Table 3-11. Floating-Point Multiply-Add Instructions


Operation

Floating- Point Multiply- Add

fmaddfmadd.

frD,frA,frC,frB The floating-point operand in register frA is multiplied by the floating-point operand in register frC. The floating-point operand in register frB is added to this intermediate result.

If the most significant bit of the resultant significand is not a one the result is normalized. The result is rounded to the target precision under control of the floating-point rounding control field RN of the FPSCR and placed into register frD.


fmadd Floating-Point Multiply-Addfmadd. Floating-Point Multiply-Add with CR Update. The dot suffix




Operation


Floating- Point Multiply- Add Single- Precision

fmaddsfmadds.




fmadds Floating-Point Multiply-Add Single-Precision fmadds. Floating-Point Multiply-Add Single-Precision with CR


Floating- Point Multiply- Subtract

fmsubfmsub.

frD,frA,frC,frB The floating-point operand in register frA is multiplied by the floating-point operand in register frC. The floating-point operand in register frB is subtracted from this intermediate result.



fmsub Floating-Point Multiply-Subtract fmsub. Floating-Point Multiply-Subtract with CR Update. The dot


Floating- Point Multiply- Subtract Single- Precision

fmsubsfmsubs.




fmsubs Floating-Point Multiply-Subtract Single-Precisionfmsubs. Floating-Point Multiply-Subtract Single-Precision with CR


Table 3-11. Floating-Point Multiply-Add Instructions (Continued)


Operation


Floating- Point Negative Multiply- Add

fnmaddfnmadd.


If the most significant bit of the resultant significand is not a one the result is normalized. The result is rounded to the target precision under control of the floating-point rounding control field RN of the FPSCR, then negated and placed into register frD.

This instruction produces the same result as would be obtained by using the floating-point multiply-add instruction and then negating the result, with the following exceptions:

• QNaNs propagate with no effect on their sign bit.• QNaNs that are generated as the result of a disabled invalid

operation exception have a "sign" bit of zero.• SNaNs that are converted to QNaNs as the result of a disabled

invalid operation exception retain the "sign" bit of the SNaN.FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation exceptions when FPSCR[VE] = 1.

fnmadd Floating-Point Negative Multiply-Add fnmadd. Floating-Point Negative Multiply-Add with CR Update. The

dot suffix enables the update of the condition register.

Floating- Point Negative Multiply- Add Single- Precision

fnmaddsfnmadds.



This instruction produces the same result as would be obtained by using the floating-point multiply-add instruction and then negating the result, with the following exceptions:


operation exception have a “sign” bit of zero.• SNaNs that are converted to QNaNs as the result of a disabled

invalid operation exception retain the “sign” bit of the SNaN.FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation exceptions when FPSCR[VE] = 1.

fnmadds Floating-Point Negative Multiply-Add Single-Precision fnmadds. Floating-Point Negative Multiply-Add Single-Precision with

CR Update. The dot suffix enables the update of the condition register.



Operation


Floating- Point Negative Multiply- Subtract

fnmsubfnmsub.



This instruction produces the same result as would be obtained by using the floating-point multiply-subtract instruction and then negating the result, with the following exceptions:


operation exception have a sign bit of zero.• SNaNs that are converted to QNaNs as the result of a disabled

invalid operation exception retain the sign bit of the SNaN.FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation exceptions when FPSCR[VE] = 1.

fnmsub Floating-Point Negative Multiply-Subtract fnmsub. Floating-Point Negative Multiply-Subtract with CR Update.


Floating- Point Negative Multiply- Subtract Single- Precision

fnmsubsfnmsubs.



This instruction produces the same result as would be obtained by using the floating-point multiply-subtract instruction and then negating the result, with the following exceptions:

• QNaNs propagate with no effect on their "sign" bit.• QNaNs that are generated as the result of a disabled invalid

operation exception have a "sign" bit of zero.• SNaNs that are converted to QNaNs as the result of a disabled

invalid operation exception retain the "sign" bit of the SNaN.FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation exceptions when FPSCR[VE] = 1.

fnmsubs Floating-Point Negative Multiply-Subtract Single-Precisionfnmsubs. Floating-Point Negative Multiply-Subtract

Single-Precision with CR Update. The dot suffix enables the update of the condition register.



Operation


3.4.3 Floating-Point Rounding and Conversion InstructionsThe floating-point rounding instruction is used to truncate a 64-bit double-precisionnumber to a 32-bit single-precision floating-point number. The floating-point convertinstructions converts a 64-bit double-precision floating point number to a 32-bit signedinteger number.

The PowerPC architecture defines bits 0–31 of floating-point register frD as undefinedwhen executing the Floating-Point Convert to Integer Word (fctiw) and Floating-PointConvert to Integer Word with Round toward Zero (fctiwz) instructions. In the 601, thesebits take on the value x'FFF8 0000' (which is the representation for a QNaN). This valuemay differ in future PowerPC processors, and software should avoid dependence on this601 feature.

The floating-point rounding and conversion instructions are shown in Table 3-12.

Examples of uses of these instructions to perform various conversions can be found inAppendix F, “Floating-Point Models.”

Table 3-12. Floating-Point Rounding and Conversion Instructions


Operation

Floating- Point Round to Single- Precision

frspfrsp.

frD,frB If it is already in single-precision range, the floating-point operand in register frB is placed into register frD. Otherwise the floating-point operand in register frB is rounded to single-precision using the rounding mode specified by FPSCR[RN] and placed into register frD.

The rounding is described fully in Appendix F, “Floating-Point Models.”


frsp Floating-Point Round to Single-Precision frsp. Floating-Point Round to Single-Precision with CR Update.


Floating- Point Convert to Integer Word

fctiwfctiw.

frD,frB The floating-point operand in register frB is converted to a 32-bit signed integer, using the rounding mode specified by FPSCR[RN], and placed in bits 32–63 of register frD. Bits 0–31 of register frD are undefined.

If the operand in register frB is greater than 231 – 1, bits 32–63 of register frD are set to x'7FFF_FFFF'.

If the operand in register frB is less than –231, bits 32–63 of register frD are set to x '8000_0000'.

The conversion is described fully in Appendix F, “Floating-Point Models.”

Except for trap-enabled invalid operation exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

fctiw Floating-Point Convert to Integer Word fctiw. Floating-Point Convert to Integer Word with CR Update.


3.4.4 Floating-Point Compare InstructionsFloating-point compare instructions compare the contents of two floating-point registersand the comparison ignores the sign of zero (that is, +0 = –0). The comparison can beordered or unordered. The comparison sets one bit in the designated CR field and clears theother three bits. The FPCC (floating-point condition code; bits 16–19 in the floating-pointstatus and control register) is set in the same way.

The CR field and the FPCC are interpreted as shown in Table 3-13.

The PowerPC architecture defines CR1 and the CR field specified by operand crfD asundefined when executing the Floating-Point Compare Unordered (fcmpu) andFloating-Point Compare Ordered (fcmpo) instructions with condition register updatingenabled.

Floating- Point Convert to Integer Word with Round

fctiwzfctiwz.

frD,frB The floating-point operand in register frB is converted to a 32-bit signed integer, using the rounding mode Round toward Zero, and placed in bits 32–63 of register frD. Bits 0–31 of register frD are undefined.

If the operand in frB is greater than 231 – 1, bits 32–63 of frD are set to x'7FFF_FFFF'.

If the operand in register frB is less than –231, bits 32–63 of register frD are set to x '8000_0000'.

The conversion is described fully in Appendix F, “Floating-Point Models.”

Except for trap-enabled invalid operation exceptions, FPSCR[FPRF] is undefined. FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result is inexact.

fctiwz Floating-Point Convert to Integer Word with Round Toward Zero

fctiwz. Floating-Point Convert to Integer Word with Round Toward Zero with CR Update. The dot suffix enables the update of the condition register.

Table 3-13. CR Bit Settings


0 FL (frA) < (frB)

1 FG (frA) > (frB)

2 FE (frA) = (frB)

3 FU (frA) ? (frB) (unordered)

Table 3-12. Floating-Point Rounding and Conversion Instructions (Continued)


Operation


The floating-point compare instructions are summarized in Table 3-14.

3.4.5 Floating-Point Status and Control Register InstructionsEvery FPSCR instruction appears to synchronize the effects of all floating-pointinstructions executed by a given processor. Executing an FPSCR instruction ensures that allfloating-point instructions previously initiated by the given processor appear to havecompleted before the FPSCR instruction is initiated and that no subsequent floating-pointinstructions appear to be initiated by the given processor until the FPSCR instruction hascompleted. In particular:

• All exceptions caused by the previously initiated instructions are recorded in the FPSCR before the FPSCR instruction is initiated.

• All invocations of the floating-point exception handler caused by the previously initiated instructions have occurred before the FPSCR instruction is initiated.

• No subsequent floating-point instruction that depends on or alters the settings of any FPSCR bits appears to be initiated until the FPSCR instruction has completed.

Floating-point memory access instructions are not affected by the execution of the FPSCRinstructions.

The floating-point status and control register instructions are summarized in Table 3-15.

Table 3-14. Floating-Point Compare Instructions


Operation

Floating- Point Compare Unordered

fcmpu crfD,frA,frB The floating-point operand in register frA is compared to the floating-point operand in register frB. The result of the compare is placed into CR field crfD and the FPCC.

If an operand is a NaN, either quiet or signaling, CR field crfD and the FPCC are set to reflect unordered. If an operand is a Signaling NaN, VXSNAN is set.

Floating- Point Compare Ordered

fcmpo crfD,frA,frB The floating-point operand in register frA is compared to the floating-point operand in register frB. The result of the compare is placed into CR field crfD and the FPCC.

If an operand is a NaN, either quiet or signalling, CR field crfD and the FPCC are set to reflect unordered. If an operand is a Signalling NaN, VXSNAN is set, and if invalid operation is disabled (VE = 0) then VXVC is set. Otherwise, if an operand is a Quiet NaN, VXVC is set.


Table 3-15. Floating-Point Status and Control Register Instructions


Operation

Move from FPSCR

mffsmffs.

frD The contents of the FPSCR are placed into bits 32–63 of register frD. In the 601, bits 0–31 of floating-point register frD are set to the value x'FFFF_FFFF'.

mffs Move from FPSCRmffs. Move from FPSCR with CR Update. The dot suffix enables


Move to Condition Register from FPSCR

mcrfs crfD,crfS The contents of FPSCR field specified by operand crfS are copied to the CR field specified by operand crfD. All exception bits copied are cleared to zero in the FPSCR.

Move to FPSCR Field Immediate

mtfsfimtfsfi.

crfD,IMM The value of the IMM field is placed into FPSCR field crfD. All other FPSCR fields are unchanged.

mtfsfi Move to FPSCR Field Immediatemtfsfi. Move to FPSCR Field Immediate with CR Update. The dot

suffix enables the update of the condition register.When FPSCR[0–3] is specified, bits 0 (FX) and 3 (OX) are set to the values of IMM[0] and IMM[3] (that is, even if this instruction causes OX to change from 0 to 1, FX is set from IMM[0] and not by the usual rule that FX is set to 1 when an exception bit changes from 0 to 1). Bits 1 and 2 (FEX and VX) are set according to the usual rule described in Section 2.2.3, “Floating-Point Status and Control Register (FPSCR),” and not from IMM[1–2].

Move to FPSCR Fields

mtfsfmtfsf.

FM,frB Bits 32–63 of register frB are placed into the FPSCR under control of the field mask specified by FM. The field mask identifies the 4-bit fields affected. Let i be an integer in the range 0–7. If FM = 1 then FPSCR field i (FPSCR bits 4∗ i through 4∗ i+3) is set to the contents of the corresponding field of the low-order 32 bits of register frB.

mtfsf Move to FPSCR Fields mtfsf. Move to FPSCR Fields with CR Update. The dot suffix

enables the update of the condition register.In other PowerPC implementations, the mtfsf instruction may perform more slowly when only a portion of the fields are updated. This is not the case in the 601.

When FPSCR[0–3] is specified, bits 0 (FX) and 3 (OX) are set to the values of frB[32] and frB[35] (that is, even if this instruction causes OX to change from 0 to 1, FX is set from frB[32] and not by the usual rule that FX is set to 1 when an exception bit changes from 0 to 1). Bits 1 and 2 (FEX and VX) are set according to the usual rule described in Section 2.2.3, “Floating-Point Status and Control Register (FPSCR),” and not from frB[33–34].

Move to FPSCR Bit 0

mtfsb0mtfsb0.

crbD The bit of the FPSCR specified by operand crbD is cleared to 0.

Bits 1 and 2 (FEX and VX) cannot be explicitly reset.

mtfsb0 Move to FPSCR Bit 0mtfsb0. Move to FPSCR Bit 0 with CR Update. The dot suffix



3.5 Load and Store InstructionsThis section describes the load and store instructions of the 601, which consist of thefollowing:

• Integer load instructions• Integer store instructions• Integer load and store with byte reversal instructions• Integer load and store multiple instructions• Floating-point load instructions• Floating-point store instructions• Floating-point move instructions• Memory synchronization instructions

3.5.1 Integer Load and Store Address GenerationInteger load and store operations generate effective addresses using register indirect withimmediate index mode, register indirect with index mode, or register indirect mode. Notethat the 601 is optimized for load and store operations that are aligned on naturalboundaries, and operations that are not naturally aligned may suffer performancedegradation. Refer to section 5.4.6.1, “Integer Alignment Exceptions” for additionalinformation about load and store address alignment exceptions.

3.5.1.1 Register Indirect with Immediate Index Addressing Instructions using this addressing mode contain a signed 16-bit immediate index(d operand) which is sign extended to 32 bits, and added to the contents of a generalpurpose register specified in the instruction (rA operand) to generate the effective address.If the rA field of the instruction specifies r0, a value of zero will be added to the immediateindex (d operand) in place of the contents of r0. The option to specify rA or 0 is shown inthe instruction descriptions as (rA|0).

Figure 3-1 shows how an effective address is generated when using register indirect withimmediate index addressing.

Move to FPSCR Bit 1

mtfsb1mtfsb1.

crbD The bit of the FPSCR specified by operand crbD is set to 1.

Bits 1 and 2 (FEX and VX) cannot be reset explicitly.

mtfsb1 Move to FPSCR Bit 1mtfsb1. Move to FPSCR Bit 1 with CR Update. The dot suffix


Table 3-15. Floating-Point Status and Control Register Instructions (Continued)


Operation


.

Figure 3-1. Register Indirect with Immediate Index Addressing

3.5.1.2 Register Indirect with Index AddressingInstructions using this addressing mode cause the contents of two general purpose registers(specified as operands rA and rB) to be added in the generation of the effective address. Azero in place of the rA operand causes a zero to be added to the contents of the generalpurpose register specified in operand rB. The option to specify rA or 0 is shown in theinstruction descriptions as (rA|0).

Figure 3-2 shows how an effective address is generated when using register indirect withindex addressing.

Figure 3-2. Register Indirect with Index Addressing

No

0 16 17 31

Sign Extension d

0 31

GPR (rA)

0

0 31

GPR (rD/rS)StoreLoad

Yes

Instruction Encoding:0 6 7 1112 16 17 31

Opcode rD/rS rA d

+0 31

Effective Address

rA=0?

Memory Interface

No

0 31

GPR (rA)

0

+

0 31

GPR (rD/rS)Memory Interface

StoreLoad

Yes

0 31

GPR (rB)

Instruction Encoding:

rA=0?

0 31

Effective Address

0 6 7 1112 16 17 21 22 30 31

Opcode rD/rS rA rB Subopcode 0Reserved


3.5.1.3 Register Indirect Addressing Instructions using this addressing mode use the contents of the general purpose registerspecified by the rA operand as the effective address. A zero in the rA operand causes aneffective address of zero to be generated. The option to specify rA or 0 is shown in theinstruction descriptions as (rA|0).

Figure 3-3 shows how an effective address is generated when using register indirectaddressing.

Figure 3-3. Register Indirect Addressing

3.5.2 Integer Load InstructionsFor load instructions, the byte, half word, word, or double word addressed by EA is loadedinto rD. Many integer load instructions have an update form, in which rA is updated withthe generated effective address. For these forms, if rA ≠ 0 and rA ≠ rD, the effectiveaddress is placed into rA and the memory element (byte, half word, or word) addressed byEA is loaded into rD.

Note that non-601 implementations of the architecture may run the load half algebraicinstructions (lha, lhax) and the load with update (lbzu, lbzux, lhzu, lhzux, lhau, lhaux)instructions with greater latency than other types of load instructions. In the 601, theseinstructions operate with the same latency as other load instructions. For details oninstruction timing, see Chapter 7, “Instruction Timing.”

No

StoreLoad

Yes0 31

0 0 0 0 0•••••••••••••••••••••••••••••••••••••••••••••••••••• 0 0 0 0 0

Instruction Encoding:0 6 7 11 12 16 17 21 22 30 31

rA=0?

0 31

GPR (rA)

0 31

Effective Address

Opcode rD/rS rA NB Subopcode 0

0 31

GPR (rD/rS)Memory Interface

Reserved


The PowerPC architecture defines load with update instructions with rA = 0 or rA = rD asan invalid form. In the POWER architecture, these forms are not considered invalid andspecifications exist for these cases. To maintain compatibility with the POWERarchitecture, for the case where rA = 0, the 601 does not update r0. In cases where rA = rD,the load data is loaded into rD and the register rA update is suppressed. In addition, thePowerPC architecture defines integer load instructions with the condition register updateoption enabled to be an invalid form and the POWER architecture does not. Forcompatibility, the 601 executes the instruction in a manner consistent with the PowerPCarchitecture and it causes an undefined value to be placed into the condition register CR0field.

Table 3-16 summarizes the load instructions available for the 601.

Table 3-16. Integer Load Instructions


Operation

Load Byte and Zero

lbz rD,d(rA) The effective address is the sum (rA|0)+d. The byte in memory addressed by the EA is loaded into register rD[24–31]. The remaining bits in register rD are cleared to 0.

Load Byte and Zero Indexed

lbzx rD,rA,rB The effective address is the sum (rA|0)+(rB). The byte in memory addressed by the EA is loaded into register rD[24–31]. The remaining bits in register rD are cleared to 0.

Load Byte and Zero with Update

lbzu rD,d(rA) The effective address (EA) is the sum (rA|0)+d. The byte in memory addressed by the EA is loaded into register rD[24–31]. The remaining bits in register rD are cleared to 0. The EA is placed into register rA. If operand rA = 0 the 601 does not update r0, or if rA = rD the load data is loaded into register rD and the register update is suppressed. Although the PowerPC architecture defines load with update instructions with operand rA = 0 or rA = rD as invalid forms, the 601 allows these cases.

Load Byte and Zero with Update Indexed

lbzux rD,rA,rB The effective address (EA)is the sum (rA|0)+(rB). The byte addressed by the EA is loaded into register rD[24–31]. The remaining bits in register rD are cleared to 0. The EA is placed into register rA. If operand rA = 0 the 601 does not update register r0, or if rA = rD the load data is loaded into register rD and the register update is suppressed. Although the PowerPC architecture defines load with update instructions with operand rA = 0 or rA = rD as invalid forms, the 601 allows these cases.

Load Half Word and Zero

lhz rD,d(rA) The effective address is the sum (rA|0)+d. The half word in memory addressed by the EA is loaded into register rD[16–31]. The remaining bits in rD are cleared to 0.

LoadHalf Word and Zero Indexed

lhzx rD,rA,rB The effective address is the sum (rA|0)+(rB). The half word in memory addressed by the EA is loaded into register rD[16–31]. The remaining bits in register rD are cleared.


LoadHalf Word and Zero with Update

lhzu rD,d(rA) The effective address is the sum (rA|0)+d. The half word in memory addressed by the EA is loaded into register rD[16–31]. The remaining bits in register rD are cleared.

The EA is placed into register rA.

If operand rA = 0 the 601 does not update register r0, or if rA = rD the load data is loaded into register rD and the register update is suppressed. Although the PowerPC architecture defines load with update instructions with operand rA = 0 or rA = rD as invalid forms, the 601 allows these cases.

Load Half Word and Zero with Update Indexed

lhzux rD,rA,rB The effective address is the sum (rA|0)+(rB). The half word in memory addressed by the EA is loaded into register rD[16–31]. The remaining bits in register rD are cleared. The EA is placed into register rA. Although the PowerPC architecture defines load with update instructions with operand rA = 0 or rA = rD as invalid forms, the 601 allows these cases.

Load Half Word Algebraic

lha rD,d(rA) The effective address is the sum (rA|0)+d. The half word in memory addressed by the EA is loaded into register rD[16–31]. The remaining bits in register rD are filled with a copy of the most significant bit of the loaded half word.

Load Half Word Algebraic Indexed

lhax rD,rA,rB The effective address is the sum (rA|0)+(rB). The half word in memory addressed by the EA is loaded into register rD[16–31]. The remaining bits in register rD are filled with a copy of the most significant bit of the loaded half word.

Load Half Word Algebraic with Update

lhau rD,d(rA) The effective address is the sum (rA|0)+d. The half word in memory addressed by the EA is loaded into register rD[16–31]. The remaining bits in register rD are filled with a copy of the most significant bit of the loaded half word. The EA is placed into register rA. If operand rA = 0 the 601 does not update register r0, or if rA = rD the load data is loaded into register rD and the register update is suppressed. Although the PowerPC architecture defines load with update instructions with operand rA = 0 or rA = rD as invalid forms, the 601 allows these cases.

Load Half Word Algebraic with Update Indexed

lhaux rD,rA,rB The effective address is the sum (rA|0)+(rB). The half word in memory addressed by the EA is loaded into register rD[16–31]. The remaining bits in register rD are filled with a copy of the most significant bit of the loaded half-word. The EA is placed into register rA. If operand rA=0 the 601 does not update r0, or if rA = rD the load data is loaded into register rD and the register update is suppressed. Although the PowerPC architecture defines load with update instructions with operand rA = 0 or rA = rD as invalid forms, the 601 allows these cases.

Load Word and Zero

lwz rD,d(rA) The effective address is the sum (rA|0)+d. The word in memory addressed by the EA is loaded into register rD[0–31].

Load Word and Zero Indexed

lwzx rD,rA,rB The effective address is the sum (rA|0)+(rB). The word in memory addressed by the EA is loaded into register rD[0–31].

Table 3-16. Integer Load Instructions (Continued)


Operation


3.5.3 Integer Store Instructions For integer store instructions, the contents of register rS are stored into the byte, half word,word or double word in memory addressed by EA. Many store instructions have an updateform, in which register rA is updated with the effective address. For these forms, thefollowing rules apply:

• If rA ≠ 0, the effective address is placed into register rA.

• If rS = rA, the contents of register rS are copied to the target memory element, then the generated EA is placed into rA.

The PowerPC architecture defines store with update instructions with rA = 0 as an invalidform. In the POWER architecture, this form is not considered invalid and in this case rA isnot updated. To maintain compatibility with POWER in this case, the 601 does not updateregister r0. In addition, the PowerPC architecture defines integer store instructions with thecondition register update option enabled to be an invalid form and the POWER architecturedoes not. To maintain compatibility in these cases, the 601 executes the instruction asdescribed in the PowerPC architecture, and it loads an undefined value into CR0 field of thecondition register.

Table 3-17 provides a summary of the integer store instructions for the 601.

Load Word and Zero with Update

lwzu rD,d(rA) The effective address is the sum (rA|0)+d. The word in memory addressed by the EA is loaded into register rD[0–31]. The EA is placed into register rA. If operand rA = 0 the 601 does not update register r0, or if rA = rD the load data is loaded into register rD and the register update is suppressed. Although the PowerPC architecture defines load with update instructions with operand rA = 0 or rA = rD as invalid forms, the 601 allows these cases.

Load Word and Zero with Update Indexed

lwzux rD,rA,rB The effective address is the sum (rA|0)+(rB). The word in memory addressed by the EA is loaded into register rD[0–31]. The EA is placed into register rA. If operand rA = 0 the 601 does not update register r0, or if rA = rD the load data is loaded into register rD and the register update is suppressed. Although the PowerPC architecture defines load with update instructions with operand rA = 0 or rA = rD as invalid forms, the 601 allows these cases.

Table 3-16. Integer Load Instructions (Continued)


Operation


Table 3-17. Integer Store Instructions


Operation

Store Byte stb rS,d(rA) The effective address is the sum (rA|0) + d. Register rS[24–31] is stored into the byte in memory addressed by the EA.

Store Byte Indexed

stbx rS,rA,rB The effective address is the sum (rA|0) + (rB). rS[24–31] is stored into the byte in memory addressed by the EA.

Store Byte with Update

stbu rS,d(rA) The effective address is the sum (rA|0) + d. rS[24–31] is stored into the byte in memory addressed by the EA. The EA is placed into register rA.

Store Byte with Update Indexed

stbux rS,rA,rB The effective address is the sum (rA|0) + (rB). rS[24–31] is stored into the byte in memory addressed by the EA. The EA is placed into register rA.

Store Half Word

sth rS,d(rA) The effective address is the sum (rA|0) + d. rS[16–31] is stored into the half word in memory addressed by the EA.

Store half Word Indexed

sthx rS,rA,rB The effective address (EA) is the sum (rA|0) + (rB). rS[16–31] is stored into the half word in memory addressed by the EA.

Store Half Word with Update

sthu rS,d(rA) The effective address is the sum (rA|0) + d. rS[16–31] is stored into the half word in memory addressed by the EA. The EA is placed into register rA.

Store Half Word with Update Indexed

sthux rS,rA,rB The effective address is the sum (rA|0) + (rB). rS[16–31] is stored into the half word in memory addressed by the EA. The EA is placed into register rA.

Store Word stw rS,d(rA) The effective address is the sum (rA|0) + d. Register rS is stored into the word in memory addressed by the EA.

Store Word Indexed

stwx rS,rA,rB The effective address is the sum (rA|0) + (rB). rS is stored into the word in memory addressed by the EA.

Store Word with Update

stwu rS,d(rA) The effective address is the sum (rA|0) + d. Register rS is stored into the word in memory addressed by the EA. The EA is placed into register rA.

Store Word with Update Indexed

stwux rS,rA,rB The effective address is the sum (rA|0) + (rB). Register rS is stored into the word in memory addressed by the EA. The EA is placed into register rA.


3.5.4 Integer Load and Store with Byte Reversal InstructionsTable 3-18 describes integer load and store with byte reversal instructions. Note that inother PowerPC implementations, load byte-reverse instructions may have greater latencythan other load instructions.

This is not the case in the 601. These instructions operate with the same latency as otherload instructions.

3.5.5 Integer Load and Store Multiple InstructionsThe load/store multiple instructions are used to move blocks of data to and from the GPRs.The load multiple and store multiple instructions may have operands that require memoryaccesses crossing a 4-Kbyte page boundary. As a result, these instructions may beinterrupted by a data access exception associated with the address translation of the secondpage. In this case, the 601 performs all of the memory references from the first page, andnone of the memory references from the second page before taking the exception. Foradditional information, refer to Section 5.4.3, “Data Access Exception (x'00300').”

In future implementations, these instructions are likely to have greater latency and takelonger to execute, perhaps much longer, than a sequence of individual load or storeinstructions that produce the same results.

Table 3-18. Integer Load and Store with Byte Reversal Instructions


Operation

Load Half Word Byte-Reverse Indexed

lhbrx rD,rA,rB The effective address is the sum (rA|0) + (rB). Bits 0–7 of the half word in memory addressed by the EA are loaded into rD[24–31]. Bits 8–15 of the half word in memory addressed by the EA are loaded into rD[16–23]. The rest of the bits in rD are cleared to 0.

Load Word Byte- Reverse Indexed

lwbrx rD,rA,rB The effective address is the sum (rA|0) + (rB). Bits 0–7 of the word in memory addressed by the EA are loaded into rD[24–31]. Bits 8–15 of the word in memory addressed by the EA are loaded into rD[16–23]. Bits 16–23 of the word in memory addressed by the EA are loaded into rD[8–15]. Bits 24–31 of the word in memory addressed by the EA are loaded into rD[0–7].

Store Half Word Byte- Reverse Indexed

sthbrx rS,rA,rB The effective address is the sum (rA|0) + (rB). rS[24–31] are stored into bits 0–7 of the half word in memory addressed by the EA. rS[16–23] are stored into bits 8–15 of the half word in memory addressed by the EA.

Store Word Byte- Reverse Indexed

stwbrx rS,rA,rB The effective address is the sum (rA|0) + (rB). rS[24–31] are stored into bits 0–7 of the word in memory addressed by EA. rS[16–23] are stored into bits 8–15 of the word in memory addressed by the EA. rS[8–15] are stored into bits 16–23 of the word in memory addressed by the EA. rS[0–7] are stored into bits 24–31 of the word in memory addressed by the EA.


The PowerPC architecture defines the load multiple word (lmw) instruction with rA in therange of registers to be loaded as an invalid form. In the POWER architecture, this form isnot considered invalid. To maintain compatibility with the POWER architecture in thiscase, the 601 will execute the instruction normally, except that the loading of register rA isskipped. If rA = 0, the register is not considered to be actually used for addressing, and theupdate of r0 (if it is in the range of registers to be loaded) is loaded. In addition, thePowerPC architecture defines the load multiple and store multiple instructions withmisaligned operands (that is, the EA is not a multiple of 4) to be an invalid form and thePOWER architecture does not. To maintain compatibility with the POWER architecture,the 601 executes these instructions subject to the performance degradation as described in5.4.6.1, “Integer Alignment Exceptions.” Note that on other PowerPC implementations,load and store multiple instructions that are not on a word boundary either take analignment exception or generate results that are boundedly undefined.

Table 3-19 summarizes the integer load and store multiple instructions for the 601.

3.5.6 Integer Move String InstructionsThe integer move string instructions allow movement of data from memory to registers orfrom registers to memory without concern for alignment. These instructions can be used fora short move between arbitrary memory locations or to initiate a long move betweenmisaligned memory fields. However, in future implementations, these instructions arelikely to have greater latency and take longer to execute, perhaps much longer, than asequence of individual load or store instructions that produce the same results.

Table 3-19. Integer Load and Store Multiple Instructions

Name Mnemoni

cOperand Syntax

Operation

Load Multiple Word

lmw rD,d(rA) The effective address is the sum (rA|0) + d.

n = 32 – rD.

n consecutive words starting at EA are loaded into the GPR specified by rD through GPR 31.

If the EA is not a multiple of four, the alignment exception handler may be invoked if a page boundary is crossed.

Store Multiple Word

stmw rS,d(rA) The effective address is the sum (rA|0) + d.

n = (32 – rS).

n consecutive words starting at the EA are stored from the GPR specified by rS through GPR 31.

If the EA is not a multiple of four, the alignment exception handler may be invoked if a page boundary is crossed.


Load/store string indexed instructions of zero length have no effect. Table 3-20 summarizesthe integer move string instructions available for the 601.

Table 3-20. Integer Move String Instructions


Operation

Load String Word Immediate

lswi rD,rA,NB The EA is (rA|0).

Let n = NB if NB ≠ 0, n = 32 if NB = 0; n is the number of bytes to load. Let nr = (n/4); nr is the number of registers to receive data.

n consecutive bytes starting at the EA are loaded into GPRs rD through rD+nr-1. Bytes are loaded left to right in each register. The sequence of registers wraps around to r0 if required. If the four bytes of register rD+nr-1 are only partially filled, the unfilled low-order byte(s) of that register are cleared to 0.

If rA is in the range of registers specified to be loaded, it will be skipped in the load process. If operand rA = 0, the register is not considered as used for addressing, and will be loaded.

Load String Word Indexed

lswx rD,rA,rB The EA is the sum (rA|0)+(rB).

Let n = XER[25–31]; n is the number of bytes to load.

Let nr = CEIL(n/4); nr is the number of registers to receive data.

If n>0, n consecutive bytes starting at the EA are loaded into registers rD through rD+nr-1.

Bytes are loaded left to right in each register. The sequence of registers wraps around to r0 if required. If the four bytes of register rD+nr-1 are only partially filled, the unfilled low-order byte(s) of that register are cleared to 0.

If n=0, the contents of register rD is undefined.

If rA is in the range of registers specified to be loaded, it will be skipped in the load process. If operand rA=0, the register is not considered as used for addressing, and will be loaded.


Load string and store string instructions may involve operands that are not word-aligned.As described in Section 5.4.6, “Alignment Exception (x'00600'),” a misaligned stringoperation suffers a performance penalty compared to an aligned operation of the same type.A non-word-aligned string operation that crosses a 4-Kbyte boundary, or a word-alignedstring operation that crosses a 256-Mbyte boundary always causes an alignment exception.

Load String and Compare Byte Indexed

lscbxlscbx.


The EA is the sum (rA|0)+(rB). XER[25–31] contains the byte count. Register rD is the starting register. n = XER[25–31], which is the number of bytes to be loaded. nr = CEIL(n/4), which is the number of registers to receive data. Starting with the leftmost byte in rD, consecutive bytes in memory addressed by the EA are loaded into rD through rD+nr-1, wrapping around back through GPR 0 if required, until either a byte match is found with XER[16–23] or n bytes have been loaded. If a byte match is found, that byte is also loaded.

Bytes are always loaded left to right in the register. In the case when a match was found before n bytes were loaded, the contents of the rightmost byte(s) not loaded of that register and the contents of all succeeding registers up to and including rD+nr-1 are undefined. Also, no reference is made to memory after the matched byte is found. In the case when a match was not found, the contents of the rightmost byte(s) not loaded of rD+nr-1 is undefined.

When XER[25–31]=0, the content of rD is unchanged. The count of the number of bytes loaded up to and including the matched byte, if a match was found, is placed in XER[25–31].

lscbx Load String and Compare Byte Indexedlscbx. Load String and Compare Byte Indexed with CR Update.

The dot suffix enables the update of the condition register.This instruction is specific to the 601.

Store String Word Immediate

stswi rS,rA,NB The EA is (rA|0).

Let n = NB if NB ≠ 0, n = 32 if NB = 0; n is the number of bytes to store.

Let nr = CEIL(n/4); nr is the number of registers to supply data.

n consecutive bytes starting at the EA are stored from register rS through rS+nr-1.

Bytes are stored left to right from each register. The sequence of registers wraps around through r0 if required.

Store String Word Indexed

stswx rS,rA,rB The effective address is the sum (rA|0)+(rB).

Let n = XER[25–31]; n is the number of bytes to store.

Let nr = CEIL(n/4); nr is the number of registers to supply data.

n consecutive bytes starting at the EA are stored from register rS through rS+nr-1.

Bytes are stored left to right from each register. The sequence of registers wraps around through r0 if required.

Table 3-20. Integer Move String Instructions (Continued)


Operation


A non-word-aligned string operation that crosses a double-word boundary is also slowerthan a word-aligned string operation.

Although a word-aligned string operation that crosses a 4-Kbyte boundary operates at the601's fastest rate, the instruction may be interrupted by a data access exception associatedwith the address translation of the second page. In this case, the 601 performs all memoryreferences from the first page and none from the second before taking the exception. Formore information, refer to Section 5.4.3, “Data Access Exception (x'00300').”

The Load String and Compare Byte Indexed (lscbx) instruction can lead to severalarchitecturally undefined results. When the last register loaded is only partially filled, theremaining bytes are considered to be undefined. If loading is terminated due to a bytematch, all succeeding bytes are considered to be undefined. In addition, if the conditionregister update option is enabled, and XER[25–31] = 0, condition register field CR0 isundefined. In all of these cases, the 601 does not guarantee particular results for theseundefined fields. The values should simply be treated as undefined.

If the EA associated with an lscbx instruction is directed to a memory-forced I/O controllerinterface segment (that is, the segment register T bit is set and the BUID field equals x'07F'),the address is translated appropriately and the operation proceeds. On the other hand, if theEA associated with an lscbx instruction is directed to an I/O segment (that is, the segmentregister T bit is set but the BUID does not equal x'07F'), the 601 takes a data accessexception and sets bit 5 of the DSISR.

If rA is in the range of registers to be loaded for a Load String Word Immediate (lswi)instruction or if either rA or rB are in the range of registers to be loaded for a Load StringWord Indexed (lswx) or lscbx instruction, then the PowerPC architecture considers theinstruction to be of an invalid form. In the POWER architecture, this form is not consideredinvalid and specifications exist for these cases. To maintain compatibility with the POWERarchitecture in this case, the 601 executes the instruction normally, but loading of theseregisters is inhibited. In addition, the lswx, lscbx and stswx instructions that specify a stringlength of zero are considered an invalid form in the PowerPC architecture, but not in thePOWER architecture. For compatibility with the POWER architecture, the 601 executesthese instructions, but does not alter register rD or cause a memory access.

3.5.7 Memory Synchronization InstructionsMemory synchronization instructions control the order in which memory operations arecompleted with respect to asynchronous events, and the order in which memory operationsare seen by other processors or memory access mechanisms. Additional information aboutthese instructions and about related aspects of memory management can be found inChapter 6, “Memory Management Unit.”

Internally, the 601 handles the synchronize (sync) and the Enforce In-Order Execution ofI/O (eieio) instructions identically. These instructions delay execution of subsequentinstructions until previous instructions have completed to the point that they can no longer


cause an exception, until all previous memory accesses are performed globally, and thesync or eieio operation is broadcast onto the 601 bus interface.

System designs that use a second-level cache should take special care to accept thebroadcast sync operation and perform the appropriate actions to guarantee that memoryreferences that may be queued internally to the second-level cache have been performedglobally.

The number of cycles required to complete a sync or eieio instruction depends on systemparameters and on the processor's state when the instruction is issued. As a result, frequentuse of these instructions may degrade performance slightly.

The PowerPC architecture defines the sync instruction with condition register updateenabled to be an invalid form, whereas the POWER architecture does not. Forcompatibility, the 601 executes the instruction consistently with the PowerPC architecture,and loads an undefined value into condition register field CR0.

The Instruction Synchronize (isync) instruction causes the 601 to purge its instructionbuffers, wait for any preceding sync instructions to complete, and then branch to the nextsequential instruction (which has the effect of clearing the pipeline behind the isyncinstruction).

The proper paired use of the Load Word and Reserve Indexed (lwarx) and Store WordConditional Indexed (stwcx.) instructions allows programmers to emulate commonsemaphore operations such as “test and set”, “compare and swap”, “exchange memory”,and “fetch and add”. Examples of these semaphore operations can be found in Appendix G,“Synchronization Programming Examples.” The lwarx instruction must be paired with anstwcx. instruction with the same effective address used for both instructions of the pair, andthe reservation granularity is 32 bytes.

The concept behind the use of the lwarx and stwcx. instructions is that a processor mayload a semaphore from memory, compute a result based on the value of the semaphore, andconditionally store it back to the same location. The conditional store is performed basedupon the existence of a reservation established by the preceding lwarx. If the reservationexists when the store is executed, the store is performed and a bit is set to one in theCondition Register. If the reservation does not exist when the store is executed, the targetmemory location is not modified and a bit is set to zero in the condition register.

The lwarx and stwcx. primitives allow software to read a semaphore, compute a resultbased on the value of the semaphore, store the new value back into the semaphore locationonly if that location has not been modified since it was first read, and determine if the storewas successful. If the store was successful, the sequence of instructions from the read of thesemaphore to the store that updated the semaphore appear to have been executed atomically(that is, no other processor or mechanism modified the semaphore location between theread and the update), thus providing the equivalent of a real atomic operation. However,other processors may have read from the location during this operation.


The lwarx and stwcx. instructions require the EA to be aligned. Exception handlingsoftware should not attempt to emulate a misaligned lwarx or stwcx. instruction, becausethere is no correct way to define the address associated with the reservation.

In general, the lwarx and stwcx. instructions should be used only in system programs,which can be invoked by application programs as needed.

At most one reservation exists simultaneously on any processor. The address associatedwith the reservation can be changed by a subsequent lwarx instruction. The conditionalstore is performed based upon the existence of a reservation established by the precedinglwarx regardless of whether the address generated by the lwarx matches that generated bythe stwcx. A reservation held by the processor is cleared by any of the following:

• executing an stwcx. instruction to any address, • execution of an sc instruction, • execution of an instruction that causes an exception, • occurrence of an asynchronous exception, • attempt by some other device to modify a location in the reservation granularity (32

bytes).

The memory synchronization instructions available for the 601 are summarized inTable 3-21.

Table 3-21. Memory Synchronization Instructions

Name Mnemonic Operand

SyntaxOperation

Enforce In-Order Execution of I/O

eieio The eieio instruction provides an ordering function for the effects of load and store instructions executed by a given processor. Executing an eieio instruction ensures that all memory accesses previously initiated by the given processor are complete with respect to main memory before allowing any memory accesses subsequently initiated by the given processor to access main memory.

The eieio instruction orders load and store operations to cache- inhibited memory, and store operations to write-through cache memory.

The eieio instruction performs the same function as a sync instruction when executed by the 601.

Instruction Synchronize

isync This instruction waits for all previous instructions to complete, and then discards any fetched instructions, causing subsequent instructions to be fetched (or refetched) from memory and to execute in the context established by the previous instructions. This instruction has no effect on other processors or on their caches.


3.5.8 Floating-Point Load and Store Address GenerationFloating point load and store operations generate effective addresses using the registerindirect with immediate index mode and register indirect with index mode, the details ofwhich are described below. Floating-point loads and stores are not supported for I/Oaccesses when the SR[BUID] is not equal to x'07F'. The use of floating-point loads andstores for I/O access will result in an alignment exception.

Load Word and Reserve Indexed

lwarx rD,rA,rB The effective address is the sum (rA|0)+(rB). The word in memory addressed by the EA is loaded into register rD.

This instruction creates a reservation for use by an stwcx. instruction. An address computed from the EA is associated with the reservation, and replaces any address previously associated with the reservation.

The EA must be a multiple of 4. If it is not, the alignment exception handler will be invoked if the word loaded crosses a page boundary, or the results may be undefined.

Store Word Conditional Indexed

stwcx. rS,rA,rB The effective address is the sum (rA|0)+(rB).

If a reservation exists, register rS is stored into the word in memory addressed by the EA and the reservation is cleared.

If a reservation does not exist, the instruction completes without altering memory or the contents of the cache.

The EQ bit in the condition register field CR0 is modified to reflect whether the store operation was performed (i.e., whether a reservation existed when the stwcx. instruction began execution). If the store was completed successfully, the EQ bit is set to one.

The EA must be a multiple of 4; otherwise, the alignment exception handler will be invoked if the word stored crosses a page boundary, or the results may be undefined.

Synchronize sync Executing a sync instruction ensures that all instructions previously initiated by the given processor appear to have completed before any subsequent instructions are initiated by the given processor. When the sync instruction completes, all memory accesses initiated by the given processor prior to the sync will have been performed with respect to all other mechanisms that access memory. The sync instruction can be used to ensure that the results of all stores into a data structure, performed in a “critical section” of a program, are seen by other processors before the data structure is seen as unlocked.

The eieio instruction may be more appropriate than sync for cases in which the only requirement is to control the order in which memory references are seen by I/O devices.

Table 3-21. Memory Synchronization Instructions (Continued)

Name Mnemonic Operand

SyntaxOperation


3.5.8.1 Register Indirect with Immediate Index Addressing Instructions using this addressing mode contain a signed 16-bit immediate index (doperand) which is sign extended to 32 bits, and added to the contents of a general purposeregister specified in the instruction (rA operand) to generate the effective address. A zeroin the rA operand causes a zero to be added to the immediate index (d operand). This isshown in the instruction descriptions as (rA|0).

Figure 3-4 shows how an effective address is generated when using register indirect withimmediate index addressing.

Figure 3-4. Register Indirect with Immediate Index Addressing

3.5.8.2 Register Indirect with Index AddressingInstructions using this addressing mode add the contents of two general purpose registers(specified in operands rA and rB) to generate the effective address. A zero in the rAoperand causes a zero to be added to the contents of general purpose register specified inoperand rB. This is shown in the instruction descriptions as (rA|0).

Figure 3-5 shows how an effective address is generated when using register indirect withindex addressing.

No

0 16 17 31

Sign Extension d

0

+

StoreLoad

Yes

Instruction Encoding:0 6 7 11 12 16 17 31

Opcode frD/frS rA d

0 31

Effective Address

rA=0

Memory Access

0 63

FPR (frD/frS)

0 31

GPR (rA)


Figure 3-5. Register Indirect with Index Addressing

The PowerPC architecture defines floating-point load and store with update instructions(lfsu, lfsux, lfdu, lfdux, stfsu, stfsux, stfdu, stfdux) with operand rA = 0 as invalid formsof the instructions, but the POWER architecture does not. To maintain compatibility withthe POWER architecture, the 601 accesses memory for these cases but inhibits the updateof the integer register r0.

In addition, the PowerPC architecture defines floating-point load and store instructions withthe condition register update option enabled to be an invalid form. For compatibility withthe POWER architecture, the 601 executes the instruction normally, but also writes anundefined value into the condition register field CR1.

The PowerPC architecture defines that the FPSCR[UE] bit should not be used to determinewhether denormalization should be performed on floating-point stores. The 601 complieswith this definition, although this is different from some POWER architectureimplementations.

3.5.9 Floating-Point Load InstructionsThere are two forms of the floating-point load instruction—single-precision anddouble-precision formats. Because the FPRs support only floating-point, double-precisionformat, single-precision floating-point load instructions convert single-precision data todouble-precision format before loading the operands into the target FPR. This conversionis described in Section 3.5.9.1, “Double-Precision Conversion for Floating-Point LoadInstructions.” Table 3-22 provides a summary of the floating-point load instructions.

No

0 31

GPR (rA)

0

+

0 63

FPR (frD/frS)Memory Access

StoreLoad

Yes

0 31

GPR (rB)

0 31

Effective Address

Instruction Encoding:0 6 7 1112 1617 21 22 30 31

rA=0?

Opcode frD/frS rA rB Subopcode 0Reserved


Table 3-22. Floating-Point Load Instructions


Operation

Load Floating-Point Single- Precision

lfs frD,d(rA) The effective address is the sum (rA|0)+d.

The word in memory addressed by the EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double-precision format and placed into register frD.

Load Floating- Point Single- Precision Indexed

lfsx frD,rA,rB The effective address is the sum (rA|0)+(r B).

The word in memory addressed by the EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double-precision and placed into register frD.

Load Floating-Point Single- Precision with Update

lfsu frD,d(rA) The effective address is the sum (rA|0)+d.

The word in memory addressed by the EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double-precision (see Section 3.5.9.1, “Double-Precision Conversion for Floating-Point Load Instructions,”) and placed into register frD.

The EA is placed into the register specified by rA.

Load Floating-Point Single- Precision with Update Indexed

lfsux frD,rA,rB The effective address is the sum (rA|0)+(r B).

The word in memory addressed by the EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double-precision (see Section 3.5.9.1, “Double-Precision Conversion for Floating-Point Load Instructions,”) and placed into register frD.


Load Floating-Point Double- Precision

lfd frD,d(rA) The effective address is the sum (rA|0)+d.

The double word in memory addressed by the EA is placed into register frD.

Load Floating-Point Double-Precision Indexed

lfdx frD,rA,rB The effective address is the sum (rA|0)+(r B).


Load Floating-Point Double-Precision with Update

lfdu frD,d(rA) The effective address is the sum (rA|0)+d.



Load Floating-Point Double-Precision with Update Indexed

lfdux frD,rA,rB The effective address is the sum (rA|0)+(r B).



3.5.9.1 Double-Precision Conversion for Floating-Point Load Instructions

The steps for converting a floating-point value from a single-precision memory format tothe double-precision register format are as follows:

WORD[0–31] is the floating-point, single-precision operand accessed from memory.

Normalized Operand

If WORD[1–8] > 0 and WORD[1–8] < 255frD[0–1] < WORD[0–1] frD[2] < ¬WORD[1] frD[3] < ¬WORD[1] frD[4] < ¬WORD[1] frD[5–63] < WORD[2–31] || 29b'0'

Denormalized Operand

If WORD[1–8] = 0 and WORD[9–31] ≠ 0sign < WORD[0] exp < –126frac[0–52] < b'0'‘|| WORD[9–31] || 29b'0'normalize the operand

Do while frac 0 = 0frac < frac[1–52] || b'0'exp < exp – 1

EndfrD[0] < signfrD[1–11] < exp + 1023frD[12–63] < frac[1–52]

Infinity / QNaN / SNaN / Zero

If WORD[1–8] = 255 or WORD[1–31] = 0frD[0–1] < WORD[0–1] frD[2] < WORD[1] frD[3] < WORD[1] frD[4] < WORD[1] frD[5–63] < WORD[2–31] || 29b'0'

For double-precision floating-point load instructions, no conversion is required as the datafrom memory is copied directly into the FPRs.

Many floating-point load instructions have an update form in which register rA is updatedwith the EA. For these forms, if operand rA ≠ 0, the effective address is placed into registerrA and the memory element (word or double word) addressed by the EA is loaded into thefloating-point register specified by operand frD.


3.5.10 Floating-Point Store InstructionsThis section describes floating-point store instructions. There are two basic forms of thestore instruction—single- and double-precision. Because the FPRs support onlyfloating-point, double-precision format, single-precision floating-point store instructionsconvert double-precision data to single-precision format before storing the operands. Theconversion steps are described in Section 3.5.10.1, “Double-Precision Conversion forFloating-Point Store Instructions.” Table 3-23 is a summary of the floating point storeinstructions provided by the 601.

Table 3-23. Floating-Point Store Instructions


Operation

Store Floating-Point Single-Precision

stfs frS,d(rA) The EA is the sum (rA|0)+d.The contents of register frS is converted to single-precision and stored into the word in memory addressed by the EA.

Store Floating-Point Single-Precision Indexed

stfsx frS,rA,rB The EA is the sum (rA|0)+(rB).The contents of register frS is converted to single-precision and stored into the word in memory addressed by the EA.

Store Floating-Point Single-Precision with Update

stfsu frS,d(rA) The EA is the sum (rA|0)+d.The contents of register frS is converted to single-precision and stored into the word in memory addressed by the EA.

The EA is placed into the register specified by operand rA.

Store Floating-Point Single-Precision with Update Indexed

stfsux frS,rA,rB The EA is the sum (rA|0)+(rB).

The contents of register frS is converted to single-precision and stored into the word in memory addressed by the EA.

The EA is placed into the register specified by operand rA.

Store Floating-Point Double-Precision

stfd frS,d(rA) The effective address is the sum (rA|0)+d.

The contents of register frS is stored into the double word in memory addressed by the EA.

Store Floating-Point Double-Precision Indexed

stfdx frS,rA,rB The EA is the sum (rA|0)+(rB).


Store Floating-Point Double-Precision with Update

stfdu frS,d(rA) The effective address is the sum (rA|0)+d.



Store Floating-Point Double-Precision with Update Indexed

stfdux frS,rA,rB The EA is the sum (rA|0)+(rB).

The contents of register frS is stored into the double word in memory addressed by EA.


3.5.10.1 Double-Precision Conversion for Floating-Point Store Instructions

The steps for converting a floating-point value from the double-precision register format tosingle-precision memory format are as follows:

Let WORD[0–31] be the word in memory written to.

No Denormalization Required

If frS[1–11] > 896 or frS[1–63] = 0WORD[0–1] < frS[0–1]WORD[2–31]< frS[5–34]

Denormalization Required

If 874 ≤ frS[1–11] ≤ 896sign < frS[0]exp < frS[1–11] – 1023frac < b'1' || frS[12–63]Denormalize operand

Do while exp < –126frac < b'0' || frac[0–62]exp < exp + 1

EndWORD[0] < signWORD[1–8] < x'00'WORD[9–31] < frac[1–23]

For double-precision floating-point store instructions, no conversion is required as the datafrom the FPRs is copied directly into memory. Many floating-point store instructions havean update form, in which register rA is updated with the effective address. For these forms,if operand rA ≠ 0, the effective address is placed into register rA.

Floating-point store instructions are listed in Table 3-23. Recall that rA, rB, and rD denoteGPRs, while frA, frB, frC, frS, and frD denote FPRs.

3.5.11 Floating-Point Move InstructionsFloating-point move instructions copy data from one floating-point register to another withdata modifications as described for each instruction. These instructions do not modify theFPSCR. The condition register update option in these instructions controls the placing ofresult status into condition register field CR1. If the condition register update option isenabled, then CR1 is set, otherwise CR1 is unchanged. Floating-point move instructions arelisted in Table 3-24.


3.6 Branch and Flow Control InstructionsBranch instructions are executed by the BPU. Some of these instructions can redirectinstruction execution conditionally based on the value of bits in the condition register.When the branch processor encounters one of these instructions, it scans the executionpipelines to determine whether an instruction in progress may affect the particularcondition register bit. If no interlock is found, the branch can be resolved immediately bychecking the bit in the condition register and taking the action defined for the branchinstruction.

If an interlock is detected, the branch is considered unresolved and the direction of thebranch is predicted using the “y” bit as described in Table 3-25. The interlock is monitoredwhile instructions are fetched for the predicted branch. When the interlock is cleared, thebranch processor determines whether the prediction was correct based on the value of thecondition register bit. If the prediction is correct, the branch is considered completed andinstruction fetching continues. If the prediction is incorrect, the fetched instructions arepurged, and instruction fetching continues along the alternate path.

3.6.1 Branch instruction Address CalculationBranch instructions can alter the sequence of instruction execution. Instruction addressesare always assumed to be word aligned with the 601; the processor ignores the twolow-order bits of the generated branch target address.

Table 3-24. Floating-Point Move Instructions


Operation

Floating- Point Move Register

fmrfmr.

frD,frB The contents of register frB is placed into frD.

fmr Floating-Point Move Registerfmr. Floating-Point Move Register with CR Update. The dot


Floating- Point Negate

fnegfneg.

frD,frB The contents of register frB with bit 0 inverted is placed into register frD.

fneg Floating-Point Negatefneg. Floating-Point Negate with CR Update. The dot suffix


Floating- Point Absolute Value

fabsfabs.

frD,frB The contents of frB with bit 0 cleared to 0 is placed into frD.

fabs Floating-Point Absolute Value fabs. Floating-Point Absolute Value with CR Update. The dot


Floating- Point Negative Absolute Value

fnabsfnabs.

frD,frB The contents of frB with bit 0 set to one is placed into frD.

fnabs Floating-Point Negative Absolute Value fnabs. Floating-Point Negative Absolute Value with CR Update.



Branch instructions compute the effective address (EA) of the next instruction addressusing the following addressing modes:

• Branch relative • Branch conditional to relative address• Branch to absolute address • Branch conditional to absolute address • Branch conditional to link register • Branch conditional to count register

3.6.1.1 Branch Relative Address ModeInstructions that use branch relative addressing generate the next instruction address bysign extending and appending b'00' to the immediate displacement operand LI, and addingthe resultant value to the current instruction address. Branches using this address modehave the absolute addressing option (AA) disabled. If the link register update option (LK)is enabled, the effective address of the instruction following the branch instruction is placedin the link register.

Figure 3-6 shows how the branch target address is generated when using the branch relativeaddressing mode.

Figure 3-6. Branch Relative Addressing

3.6.1.2 Branch Conditional Relative Address ModeIf the branch conditions are met, instructions that use the branch conditional relativeaddress mode generate the next instruction address by sign extending and appending b'00'to the immediate displacement operand (BD) and adding the resultant value to the currentinstruction address. Branches using this address mode have the absolute addressing option(AA) disabled. If the link register update option (LK) is enabled, the effective address ofthe instruction following the branch instruction is placed in the link register.

0 6 7 29 30 31

18 LI AA LK

0 31

Branch Target Address


+0 31

Current Instruction Address

0 6 7 29 30 31

LI 0 0Sign Extension

Reserved


Figure 3-7 shows how the branch target address is generated when using the branchconditional relative addressing mode.

Figure 3-7. Branch Conditional Relative Addressing

3.6.1.3 Branch to Absolute Address ModeInstructions that use branch to absolute address mode generate the next instruction addressby sign extending and appending b'00' to the LI operand. Branches using this address modehave the absolute addressing option (AA) enabled. If the link register update option (LK)is enabled, the effective address of the instruction following the branch instruction is placedin the link register.

Figure 3-8 shows how the branch target address is generated when using the branch toabsolute address mode.

Figure 3-8. Branch to Absolute Addressing

0 6 7 1112 16 17 30 31

16 BO BI BD AA LK

Yes

0 31



No

+0 31

Current Instruction Address

0 31

Next Sequential Instruction Address

0 16 17 29 30 31

Sign Extension BD 0 0

Condition Met?

Reserved

0 6 7 29 30 31

18 LI AA LK

0 6 7 29 30 31

0 29 30 31



LI 1 0Sign Extension

0 0


3.6.1.4 Branch Conditional to Absolute Address ModeIf the branch conditions are met, instructions that use the branch conditional to absoluteaddress mode generate the next instruction address by sign extending and appending b'00'to the BD operand. Branches using this address mode have the absolute addressing option(AA) enabled. If the link register update option (LK) is enabled, the effective address of theinstruction following the branch instruction is placed in the link register.

Figure 3-9 shows how the branch target address is generated when using the branchconditional to absolute address mode.

Figure 3-9. Branch Conditional to Absolute Addressing

3.6.1.5 Branch Conditional to Link Register Address ModeIf the branch conditions are met, the branch conditional to link register instruction generatesthe next instruction address by fetching the contents of the link register and clearing the twolow order bits to zero. If the link register update option (LK) is enabled, the effectiveaddress of the instruction following the branch instruction is placed in the link register.

Figure 3-10 shows how the branch target address is generated when using the branchconditional to link register address mode.

0 6 7 1112 16 17 29 30 31

16 BO BI BD AA LK

0 16 17 29 30 31

0 29 30 31



No0 31


Sign Extension BD 1 0

Condition Met?

Yes

0 0


Figure 3-10. Branch Conditional to Link Register Addressing

3.6.1.6 Branch Conditional to Count RegisterIf the branch conditions are met, the branch conditional to count register instructiongenerates the next instruction address by fetching the contents of the count register andclearing the two low order bits to zero. If the link register update option (LK) is enabled,the effective address of the instruction following the branch instruction is placed in the linkregister.

Figure 3-11 shows how the branch target address is generated when using the branchconditional to count register address mode.

0 6 7 11 12 16 17 21 22 30 31

Condition Met?

0 0

30 31

LR

0 29

0 31



No0 31


Yes

19 BO BI 0 0 0 0 0 16 LK

||

Reserved


Figure 3-11. Branch Conditional to Count Register Addressing

3.6.2 Conditional Branch ControlFor branch conditional instructions, the BO operand specifies the conditions under whichthe branch is taken. The first four bits of the BO operand specify how the branch is affectedby or affects the condition and count registers. The fifth bit, shown in Table 3-25 as havingthe value y, is used by some PowerPC implementations for branch prediction as describedbelow.

The encodings for the BO operands are shown in Table 3-25.

Table 3-25. BO Operand Encodings

BO Description

0000y Decrement the CTR, then branch if the decremented CTR ≠ 0 and the condition is FALSE.

0001y Decrement the CTR, then branch if the decremented CTR = 0 and the condition is FALSE.

001zy Branch if the condition is FALSE.

0100y Decrement the CTR, then branch if the decremented CTR ≠ 0 and the condition is TRUE.

0101y Decrement the CTR, then branch if the decremented CTR = 0 and the condition is TRUE.

011zy Branch if the condition is TRUE.

1z00y Decrement the CTR, then branch if the decremented CTR ≠ 0.

0 0

30 31

CTR

0 29

0 31



Condition Met?

No

Yes

0 31


0 6 7 1112 16 17 21 22 30 31

19 BO BI 00000 528 LK

||

Reserved


The “branch always” encoding of the BO operand does not have a “y” bit.

Setting the “y” bit to 0 indicates a predicted behavior for the branch instruction:

• For bcx with a negative value in the displacement operand, the branch is taken.

• In all other cases (bcx with a non-negative value in the displacement operand, bclrx, or bcctrx), the branch is not taken.

Setting the “y” bit to 1 reverses the preceding indications.

The sign of the displacement operand is used as described above even if the target is anabsolute address. The default value for the “y” bit should be 0, and should only be set to 1if software has determined that the prediction corresponding to “y” = 1 is more likely to becorrect than the prediction corresponding to “y” = 0. Software that does not compute branchpredictions should set the “y” bit to zero.

In most cases, the branch should be predicted to be taken if the value of the followingexpression is 1, and to fall through if the value is 0.

((BO[0] & BO[2]) | S) ⊕ BO[4]

In the expression above, S (bit 16 of the branch conditional instruction coding) is the signbit of the displacement operand if the instruction has a displacement operand and is 0 if theoperand is reserved. BO[4] is the “y” bit, or 0 for the “branch always” encoding of the BOoperand. (Advantage is taken of the fact that, for bclrx and bcctrx, bit 16 of the instructionis part of a reserved operand and therefore must be 0.)

The 5-bit BI operand in branch conditional instructions specifies which of the 32 bits in theCR represents the condition to test.

When the branch instructions contain immediate addressing operands, the target addressescan be computed sufficiently ahead of the branch instruction that instructions can befetched along the target path. If the branch instructions use the link and count registers,instructions along the target path can be fetched if the link or count register is loadedsufficiently ahead of the branch instruction.

1z01y Decrement the CTR, then branch if the decremented CTR = 0.

1z1zz Branch always.

The z indicates a bit that must be zero; otherwise, the instruction form is invalid.

The y bit provides a hint about whether a conditional branch is likely to be taken and is used by the 601 to improve performance. Other implementations may ignore the y bit.

Table 3-25. BO Operand Encodings (Continued)

BO Description


Branching can be conditional or unconditional, and optionally a branch return address iscreated by the storage of the effective address of the instruction following the branchinstruction in the link register after the branch target address has been computed. This isdone regardless of whether the branch is taken. While the 601 does not provide a linkregister stack, future implementations may keep a stack of the link register values mostrecently set by branch and link instructions, with the possible exception of the form shownbelow for obtaining the address of the next instruction. To benefit from this stack, thefollowing programming conventions should be used.

In the examples below, let A, B, and Glue represent subroutine labels.

Obtaining the address of the next instruction– use the following form of branch and link.

bcl 20,31,$+4

Loop counts– keep them in the count register, and use one of the branch conditionalinstructions to decrement the count and to control branching (for example, branching backto the start of a loop if the decremented counter value is nonzero).

Computed GOTOs, case statements, etc.– Use the count register to hold the address tobranch to, and use the bcctr instruction with the link register option disabled (LK = 0) tobranch to the selected address.

Direct subroutine linkage– where A calls B and B returns to A. The two branches shouldbe as follows:

• A calls B—Use a branch instruction that enables the link register (LK = 1).

• B returns to A—Use the bclr instruction with the link register option disabled (LK = 0) (the return address is in, or can be restored to, the link register).

Indirect subroutine linkage– where A calls Glue, Glue calls B, and B returns to A rather thanto Glue. (Such a calling sequence is common in linkage code used when the subroutine thatthe programmer wants to call, here B, is in a different module from the caller: the binderinserts “glue” code to mediate the branch.)

The three branches should be as follows:

• A calls Glue—Use a branch instruction that sets the link register with the link register option enabled (LK = 1).

• Glue calls B—Place the address of B in the count register, and use the bcctr instruction with the link register option disabled (LK = 0).

• B returns to A—Use the bclr instruction with the link register option disabled (LK = 0) (the return address is in, or can be restored to, the link register).

3.6.3 Basic Branch MnemonicsThe mnemonics in Table 3-26 allow all the common BO operand encodings to be specifiedas part of the mnemonic, along with the absolute address (AA) and set link register (LK)bits.


Notice that there are no simplified mnemonics for relative and absolute unconditionalbranches. For these, the basic mnemonics b, ba, bl, and bla are used.

Table 3-26 provides the abbreviated set of simplified mnemonics for the most commonlyperformed conditional branches. Unusual cases of conditional branches can be coded usinga basic branch conditional mnemonic (bc, bclr, bcctr) with the condition to be testedspecified as a numeric first operand.

Instructions using a mnemonic from Table 3-26 that tests a condition specify the conditionas the first operand of the instruction. Table 3-27 summarizes the mnemonic symbols andthe equivalent numeric values used to interpret a condition register CR field during a branchconditional instruction compare operation.

Table 3-26. Simplified Branch Mnemonics

LR bit not set LR bit set

Branch Semanticsbc Relative

bca Absolute

bclr to LR

bcctr to CTR

bcl Relative

bcla Absolute

bclrl to LR

bcctrl to CTR

Branch unconditionally — — blr bctr — — blrl bctrl

Branch if condition true bt bta btlr btctr btl btla btlrl btctrl

Branch if condition false

bf bfa bflr bfctr bfl bfla bflrl bfctrl

Decrement CTR, branch if CTR nonzero

bdnz bdnza bdnzlr — bdnzl bdnzla bdnzlrl —

Decrement CTR, branch if CTR nonzero AND condition true

bdnzt bdnzta bdnztlr — bdnztl bdnztla bdnztlrl —

Decrement CTR, branch if CTR nonzero AND condition false

bdnzf bdnzfa bdnzflr — bdnzfl bdnzfla bdnzflrl —

Decrement CTR, branch if CTR zero

bdz bdza bdzlr — bdzl bdzla bdzlrl —

Decrement CTR, branch if CTR zero AND condition true

bdzt bdzta bdztlr — bdztl bdztla bdztlrl —

Decrement CTR, branch if CTR zero AND condition false

bdzf bdzfa bdzflr — bdzfl bdzfla bdzflrl —


Table 3-28 summarizes the mnemonic symbols and the equivalent numeric values used toidentify the condition register CR field to be evaluated by the compare operation.

The simplified branch mnemonics and the symbols in Table 3-27 and Table 3-28 arecombined in an expression that identifies the bit (0–31) of CR to be tested, as follows:

Examples:

• Decrement CTR and branch if it is still nonzero (closure of a loop controlled by a count loaded into CTR).

bdnz target (equivalent to bc 16,0,target)

• Same as (1) but branch only if CTR is nonzero and condition in CR0 is “equal.”

bdnzt eq,target (equivalent to bc 8,2,target)

• Same as (2), but “equal” condition is in CR5.

bdnzt 4*cr5+eq,target (equivalent to bc 8,22,target)

Table 3-27. Condition Register CR Field Bit Symbols

Symbol Value Meaning

lt 0 Less than

gt 1 Greater than

eq 2 Equal

so 3 Summary overflow

un 3 Unordered (after floating-point comparison)

Table 3-28. Condition Register CR Field Identification Symbols

Symbol Value Meaning

cr0 0 CR0

cr1 4 CR1

cr2 8 CR2

cr3 12 CR3

cr4 16 CR4

cr5 20 CR5

cr6 24 CR6

cr7 28 CR7


• Branch if bit 27 of CR is false.

bf 27,target (equivalent to bc 4,27,target)

• Same as (4), but set the link register. This is a form of conditional “call.”

bfl 27,target (equivalent to bcl 4,27,target)

3.6.4 Branch Mnemonics Incorporating ConditionsThe mnemonics defined in Table 3-30 are variations of the “branch if condition true” and“branch if condition false” BO encodings, with the most common values of the BI operandrepresented in the mnemonic rather than specified as a numeric operand.

The two-letter codes for the most common combinations of branch conditions are shown inTable 3-29.

These codes are reflected in the simplified mnemonics shown in Table 3-30.

Table 3-29. Two-Letter Codes for Branch Comparison Conditions

Code Meaning

lt Less than

le Less than or equal

eq Equal

ge Greater than or equal

gt Greater than

nl Not less than

ne Not equal

ng Not greater than

so Summary overflow

ns Not summary overflow

un Unordered (after floating-point comparison)

nu Not unordered (after floating-point comparison)


Instructions using the mnemonics in Table 3-30 specify the condition register field in anoptional first operand. If the CR field being tested is CR0, this operand need not bespecified. Otherwise, one of the CR field symbols listed in Table 3-28 is coded as the firstoperand.

Examples:

• Branch if CR0 reflects condition “not equal.”

bne target (equivalent to bc 4,2,target)

• Same as (1), but condition is in CR3.

bne cr3,target (equivalent to bc 4,14,target)

• Branch to an absolute target if CR4 specifies “greater than,” setting the link register. This is a form of conditional “call”, as the return address is saved in the link register.

bgtla cr4,target (equivalent to bcla 12,17,target)

• Same as (3), but target address is in the count register.

bgtctrl cr4 (equivalent to bcctrl 12,17)

Table 3-30. Simplified Branch Mnemonics with Comparison Conditions

LR bit not set LR bit set

Branch Semanticsbc Relative

bca Absolute

bclr to LR

bcctr to CTR

bcl Relative

bcla Absolute

bclrl to LR

bcctrl to CTR

Branch if less than blt blta bltlr bltctr bltl bltla bltlrl bltctrl

Branch if less than or equal ble blea blelr blectr blel blela blelrl blectrl

Branch if equal beq beqa beqlr beqctr beql beqla beqlrl beqctrl

Branch if greater than or equal

bge bgea bgelr bgectr bgel bgela bgelrl bgectrl

Branch if greater than bgt bgta bgtlr bgtctr bgtl bgtla bgtlrl bgtctrl

Branch if not less than bnl bnla bnllr bnlctr bnll bnlla bnllrl bnlctrl

Branch if not equal bne bnea bnelr bnectr bnel bnela bnelrl bnectrl

Branch if not greater than bng bnga bnglr bngctr bngl bngla bnglrl bngctrl

Branch if summary overflow

bso bsoa bsolr bsoctr bsol bsola bsolrl bsoctrl

Branch if not summary overflow

bns bnsa bnslr bnsctr bnsl bnsla bnslrl bnsctrl

Branch if unordered bun buna bunlr bunctr bunl bunla bunlrl bunctrl

Branch if not unordered bnu bnua bnulr bnuctr bnul bnula bnulrl bnuctrl


3.6.5 Branch InstructionsTable 3-31 describes the branch instructions provided by the 601.

Table 3-31. Branch Instructions


Operation

Branch bbablbla

imm_addr b Branch. Branch to the address computed as the sum of the immediate address and the address of the current instruction.

ba Branch Absolute. Branch to the absolute address specified.

bl Branch then Link. Branch to the address computed as the sum of the immediate address and the address of the current instruction. The instruction address following this instruction is placed into the link register (LR).

bla Branch Absolute then Link. Branch to the absolute address specified. The instruction address following this instruction is placed into the link register (LR).

Branch Conditional

bcbcabclbcla

BO,BI,target_addr

The BI operand specifies the bit in the condition register (CR) to be used as the condition of the branch. The BO operand is used as described in Table 3-25.

bc Branch Conditional. Branch conditionally to the address computed as the sum of the immediate address and the address of the current instruction.

bca Branch Conditional Absolute. Branch conditionally to the absolute address specified.

bcl Branch Conditional then Link. Branch conditionally to the address computed as the sum of the immediate address and the address of the current instruction. The instruction address following this instruction is placed into the link register.

bcla Branch Conditional Absolute then Link. Branch conditionally to the absolute address specified. The instruction address following this instruction is placed into the link register.

Branch Conditional to Link Register

bclrbclrl

BO,BI The BI operand specifies the bit in the condition register to be used as the condition of the branch. The BO operand is used as described in Table 3-25.

bclr Branch Conditional to Link Register. Branch conditionally to the address in the link register.

bclrl Branch Conditional to Link Register then Link. Branch conditionally to the address specified in the link register. The instruction address following this instruction is then placed into the link register.


3.6.6 Condition Register Logical InstructionsCondition register logical instructions, shown in Table 3-32, and the Move ConditionRegister Field (mcrf) instruction are defined as flow control instructions, although they areexecuted by the IU.

Note that if the link register update option (LR) is enabled for any of these instructions, thePowerPC architecture defines these forms of the instructions as invalid; however, the 601executes these instructions and leaves the link register in an undefined state.

Branch Conditional to Count Register

bcctrbcctrl

BO,BI The BI operand specifies the bit in the condition register to be used as the condition of the branch. The BO operand is used as described in Table 3-25.

bcctr Branch Conditional to Count Register. Branch conditionally to the address specified in the count register.

bcctrl Branch Conditional to Count Register then Link. Branch conditionally to the address specified in the count register. The instruction address following this instruction is placed into the link register.

Note: If the “decrement and test CTR” option is specified (BO[2]=0), the instruction form is invalid. For the 601, the decremented count register is tested for zero and branches based on this test, but instruction fetching is directed to the address specified by the nondecremented version of the count register. Use of this invalid form of this instruction is not recommended.

Table 3-31. Branch Instructions (Continued)


Operation


Table 3-32. Condition Register Logical Instructions


Operation

Condition Register AND

crand crbD,crbA,crbB The bit in the condition register specified by crbA is ANDed with the bit in the condition register specified by crbB. The result is placed into the condition register bit specified by crbD.

Condition Register OR

cror crbD,crbA,crbB The bit in the condition register specified by crbA is ORed with the bit in the condition register specified by crbB. The result is placed into the condition register bit specified by crbD.

Condition Register XOR

crxor crbD,crbA,crbB The bit in the condition register specified by crbA is XORed with the bit in the condition register specified by crbB. The result is placed into the condition register bit specified by crbD.

Condition Register NAND

crnand crbD,crbA,crbB The bit in the condition register specified by crbA is ANDed with the bit in the condition register specified by crbB. The complemented result is placed into the condition register bit specified by crbD.

Condition Register NOR

crnor crbD,crbA,crbB The bit in the condition register specified by crbA is ORed with the bit in the condition register specified by crbB. The complemented result is placed into the condition register bit specified by crbD.

Condition Register Equivalent

creqv crbD,crbA,crbB

The bit in the condition register specified by crbA is XORed with the bit in the condition register specified by crbB. The complemented result is placed into the condition register bit specified by crbD.

Condition Register AND with Complement

crandc crbD,crbA,crbB

The bit in the condition register specified by crbA is ANDed with the complement of the bit in the condition register specified by crbB and the result is placed into the condition register bit specified by crbD.

Condition Register OR with Complement

crorc crbD,crbA,crbB

The bit in the condition register specified by crbA is ORed with the complement of the bit in the condition register specified by crbB and the result is placed into the condition register bit specified by crbD.

Move Condition Register Field

mcrf crfD,crfS The contents of crfS are copied into crfD. No other condition register fields are changed.


3.6.7 System Linkage InstructionsThis section describes the system linkage instructions (see Table 3-33). The System Call(sc) instruction permits a program to call on the system to perform a service.

3.6.8 Simplified Mnemonics for Branch Processor InstructionsTo simplify assembly language programming, a set of simplified mnemonics and symbolsis provided that defines simple shorthand for the most frequently used forms of branchconditional, compare, trap, rotate and shift, and certain other instructions.

Mnemonics are provided so that branch conditional instructions can be coded with thecondition as part of the instruction mnemonic rather than as a numeric operand. Some ofthese are shown as examples with the branch instructions.

The PowerPC architecture-compliant assemblers provide the mnemonics and symbolslisted here and possibly others. Programs written to be portable across various assemblersfor the PowerPC architecture should not assume the existence of mnemonics not definedhere.

Table 3-33. System Linkage Instructions


Operation

System Call sc — When executed, the effective address of the instruction following the sc instruction is placed into SRR0. Bits 16–31 of the MSR are placed into bits 16–31 of SRR1, and bits 0–15 of SRR1 are set to undefined values. Then a system call exception is generated. The exception causes the MSR to be altered as described in Section 5.4, “Exception Definitions.”

The exception causes the next instruction to be fetched from offset x'C00' from the base physical address indicated by the new setting of MSR[IP]. For a discussion of POWER compatibility with respect to instruction bits 16–29, refer to Appendix B, Section B.10, “System Call/Supervisor Call.” To ensure compatibility with future versions of the PowerPC architecture, bits 16–29 should be coded as zero and bit 30 should be coded as a 1. The PowerPC architecture defines bit 31 as reserved, and thereby cleared to 0; in order for the 601 to maintain compatibility with the POWER architecture, the execution of an sc instruction with bit 31 (the LK bit) set to 1 will cause an update of the Link register with the address of the instruction following the sc instruction.

This instruction is context synchronizing.

Return from Interrupt

rfi — Bits 16–31 of SRR1 are placed into bits 16–31 of the MSR, then the next instruction is fetched, under control of the new MSR value, from the address SRR0[0–29] || b'00'.

This instruction is a supervisor-level instruction and is context synchronizing.


3.6.9 Trap Instructions and MnemonicsThe trap instructions shown in Table 3-34 are provided to test for a specified set ofconditions. If any of the conditions tested by a trap instruction are met, the system traphandler is invoked. If the tested conditions are not met, instruction execution continuesnormally.

The trap instructions evaluate a trap condition as follows:

The contents of register rA is compared with either the sign-extended SIMM field or withthe contents of register rB, depending on the trap instruction. The comparison results infive conditions which are ANDed with operand TO. If the result is not 0, the trap exceptionhandler is invoked. These conditions are provided in Table 3-35.

A standard set of codes has been adopted for the most common combinations of trapconditions, as shown in Table 3-36. The mnemonics defined in Table 3-37 are variations ofthe trap instructions, with the most useful values of the trap instruction TO operandrepresented as a mnemonic rather than specified as a numeric operand.

Table 3-34. Trap Instructions


Operand Syntax

Trap Word Immediate

twi TO,rA,SIMM The contents of rA is compared with the sign-extended SIMM operand. If any bit in the TO operand is set to 1 and its corresponding condition is met by the result of the comparison, then the system trap handler is invoked.

Trap Word tw TO,rA,rB The contents of rA is compared with the contents of rB. If any bit in the TO operand is set to 1 and its corresponding condition is met by the result of the comparison, then the system trap handler is invoked.

Table 3-35. TO Operand Bit Encoding

TO Bit ANDed with Condition

0 Less than

1 Greater than

2 Equal

3 Logically less than

4 Logically greater than

Note: U indicates an unsigned greater than evaluation will be performed.

These codes are reflected in the mnemonics shown in Table 3-37.

Table 3-36. Trap Mnemonics Coding

Code MeaningTO Operand

Encoding< > = U

lt Less than 16 1 0 0 0 0

le Less than or equal 20 1 0 1 0 0

eq Equal 4 0 0 1 0 0

ge Greater than or equal 12 0 1 1 0 0

gt Greater than 8 0 1 0 0 0

nl Not less than 12 0 1 1 0 0

ne Not equal 24 1 1 0 0 0

ng Not greater than 20 1 0 1 0 0

llt Logically less than 2 0 0 0 1 0

lle Logically less than or equal 6 0 0 1 1 0

lge Logically greater than or equal

5 0 0 1 0 1

lgt Logically greater than 1 0 0 0 0 1

lnl Logically not less than 5 0 0 1 0 1

lng Logically not greater than 6 0 0 1 1 0

(none) Unconditional 31 1 1 1 1 1

Table 3-37. Trap Mnemonics

32-Bit Comparison

Trap Semantics twi Immediate tw Register

Trap unconditionally — trap

Trap if less than twlti twlt

Trap if less than or equal twlei twle

Trap if equal tweqi tweq

Trap if greater than or equal twgei twge

Trap if greater than twgti twgt

Trap if not less than twnli twnl

Trap if not equal twnei twne

Trap if logically less than twllti twllt


Examples:

• Trap if Rx, considered as a 32-bit quantity, is logically greater than x'7FF'.

twlg rA, x'7FF' (equivalent to twi 1,rA, x'7FF')

• Trap unconditionally

trap (equivalent to tw 31,0,0)

3.7 Processor Control InstructionsProcessor control instructions are used to read from and write to the machine state register(MSR), condition register (CR), and special purpose registers (SPRs).

3.7.1 Move to/from Machine State Register and Condition Register Instructions

Table 3-38 summarizes the instructions provided by the 601 for reading from or writing tothe machine state register and the condition register.

Trap if not greater than twngi twng

Trap if logically less than twllti twllt

Trap if logically less than or equal twllei twlle

Trap if logically greater than or equal twlgei twlge

Trap if logically greater than twlgti twlgt

Trap if logically not less than twlnli twlnl

Trap if logically not greater than twlngi twlng

Table 3-37. Trap Mnemonics (Continued)

32-Bit Comparison

Trap Semantics twi Immediate tw Register


3.7.2 Move to/from Special-Purpose Register InstructionsThe 601 defines an additional register (MQ register) to the user register set andprogramming model. As a result, the mtspr and mfspr instructions have been extended toaccommodate access to the MQ register for the 601. The SPR field encoding for the MQregister is b'00000 00000'.

For compatibility with the POWER architecture, the 601 also allows user-level read accessto the decrementer (DEC) register. Note that the PowerPC architecture does not allowuser-level access to the DEC register. The SPR encoding for DEC is b'00110 00000' and isvalid only for the mfspr instruction. For more information about the mtspr and mfsprinstructions, refer to Chapter 10, “Instruction Set.”

Simplified mnemonics are provided for the mtspr and mfspr instructions so they can becoded with the SPR name as part of the mnemonic rather than as a numeric operand. Someof these are shown as examples with the two instructions (see Table 3-39).

Table 3-38. Move to/from Machine State Register/Condition Register Instructions


Operation

Move to Condition Register Fields

mtcrf CRM,rS The contents of rS are placed into the condition register under control of the field mask specified by operand CRM. The field mask identifies the 4-bit fields affected. Let i be an integer in the range 0–7. If CRM(i) = 1, then CR field i (CR bits 4*i through 4*i+3) is set to the contents of the corresponding field of r S.

In some PowerPC implementations, this instruction may perform more slowly when only a portion of the fields are updated as opposed to all of the fields. This is not true for the 601.

Move to Condition Register from XER

mcrxr crfD The contents of XER[0–3] are copied into the condition register field designated by crfD. All other fields of the condition register remain unchanged. XER[0–3] is cleared to 0.

Move from Condition Register

mfcr rD The contents of the condition register are placed into rD.

Move to Machine State Register

mtmsr rS The contents of rS are placed into the MSR.This instruction is a supervisor-level instruction and is context synchronizing.

Move from Machine State Register

mfmsr rD The contents of the MSR are placed into rD. This is a supervisor-level instruction.


For mtspr and mfspr instructions, the SPR number coded in assembly language does notappear directly as a 10-bit binary number in the instruction. The number coded is split intotwo 5-bit halves that are reversed in the instruction, with the high-order 5 bits appearing inbits 16–20 of the instruction and the low-order 5 bits in bits 11–15.

Table 3-40 summarizes the SPR encodings to which the 601 permits user-level access.

Table 3-39. Move to/from Special Purpose Register Instructions


Operation

Move to Special Purpose Register

mtspr SPR,rS The SPR field denotes a special purpose register, encoded as shown in Table 3-40. The contents of rS are placed into the designated SPR.

Simplified mnemonic examples:mtxer rA mtspr 1,rAmtlr rA mtspr 8,rAmtctr rA mtspr 9,rA

Move from Special Purpose Register

mfspr rD,SPR The SPR field denotes a special purpose register, encoded as shown in Table 3-40. The contents of the designated SPR are placed into rD.

Simplified mnemonic examples:mfxer rA mfspr rA,1mflr rA mfspr rA,8mfctr rA mfspr rA,9

Table 3-40. User-Level SPR Encodings

Decimal Value in rD SPR[0–4] SPR[5–9]

Register Name

Description

0 b'00000 00000' MQ MQ register

1 b'00001 00000' XER Integer exception register

4 b'00100 00000' RTCU Real- time clock upper register1

5 b'00101 00000' RTCL Real- time clock lower register1

6 b'00110 00000' DEC Decrementer register 2

8 b'01000 00000' LR Link register

9 b'01001 00000' CTR Count register

1 Read-only when accessed at user-level.2 Access to the DEC register is restricted to read-only while the processor is in the user

mode. User-level decrementer access is provided for POWER compatibility, and is specific to the 601.


Table 3-41 summarizes SPR encodings that the 601 permits at the supervisor level.

Table 3-41. Supervisor-Level SPR Encodings


Register Name

Description

18 b'10010 00000' DSISR DAE/source instruction service register

19 b'10011 00000' DAR Data address register

20 b'10100 00000' RTCU Real- time clock upper register

21 b'10101 00000' RTCL Real- time clock lower register

22 b'10110 00000' DEC Decrementer register

25 b'11001 00000' SDR1 Table search description register 1

26 b'11010 00000' SRR0 Save and restore register 0

27 b'11011 00000' SRR1 Save and restore register 1

272 b'10000 01000' SPRG0 SPR general 0




282 b'11010 01000' EAR External access register

287 b'11111 01000' PVR Processor version register

528 b'10000 10000' IBAT0U Instruction BAT 0 upper

529 b'10001 10000' IBAT0L Instruction BAT 0 lower







1008 b'10000 11111' Checkstop (HID0)

Checkstop sources and enables register

1009 b'10001 11111' Debug(HID1)

Debug modes register

1010 b'10010 11111' IABR(HID2)

Instruction address breakpoint register

1013 b'10101 11111' DABR(HID5)

Data address breakpoint register


SPR encodings shown in Table 3-40 can also be used while at the supervisor level.

The mtspr and mfspr instructions specify a special purpose register (SPR) as a numericoperand. Simplified mnemonics are provided that represent the SPR in the mnemonic ratherthan requiring it to be coded as an operand. Table 3-42 below specifies the simplifiedmnemonics provided on the 601 for SPR operations.

1023 b'11111 11111' PIR(HID15)

Processor identification register

If the SPR field contains any value other than one of the values shown in Table 3-40, the instruction form is invalid. For an invalid instruction form in which SPR[0]=1, the system supervisor-level instruction error handler will be invoked if the instruction is executed by a user-level program. If the instruction is executed by a supervisor-level program, the result is a no-op.

SPR[0]=1 if and only if writing the register is supervisor-level. Execution of this instruction specifying a defined and supervisor-level register when MSR[PR]=1 results in a privilege violation type program exception.

SPR encodings for the DEC, MQ, RTCL, and RTCU registers are not part of the PowerPC architecture. For forward compatability with other members of the PowerPC microprocessor family the mftb instruction should be used to obtain the contents of the RTCL and RTCU registers. The mftb instruction is a PowerPC instruction unimplemented by the 601, and will be trapped by the illegal instruction exception handler, which can then issue the appropriate mfspr instructions for reading the RTCL and RTCU registers

The PVR (processor version register) is a read-only register.

Table 3-42. SPR Simplified Mnemonics

Special Purpose Register

Move to SPR Simplified Mnemonic

Move to SPR Instruction

Move from SPR Simplified Mnemonic

Move from SPR Instruction

Integer unit exception register

mtxer rS mtspr 1,rS mfxer rD mfspr rD,1

Link register mtlr rS mtspr 8,rS mflr rD mfspr rD,8

Count register mtctr rS mtspr 9,rS mfctr rD mfspr rD,9

DAE/source instruction service register

mtdsisr rS mtspr 18,rS mfdsisr rD mfspr rD,18

Data address register mtdar rS mtspr 19,rS mfdar rD mfspr rD,19

Decrementer mtdec rS mtspr 22,rS mfdec rD mfspr rD,22

Table search description register 1

mtsdr1 rS mtspr 25,rS mfsdr1 rD mfspr rD,25

Status save/restore register 0

mtsrr0 rS mtspr 26,rS mfsrr0 rD mfspr rD,26

Table 3-41. Supervisor-Level SPR Encodings (Continued)


Register Name

Description


3.8 Memory Control InstructionsThis section describes memory control instructions, which include the following:

• Cache management instructions• Segment register manipulation instructions• Translation lookaside buffer management instructions

3.8.1 Supervisor-Level Cache Management InstructionTable 3-43 summarizes the operation of the only supervisor-level cache managementinstruction implemented on the 601.

Status save/restore register 1

mtsrr1 rS mtspr 27,rS mfsrr1 rD mfspr rD,27

General SPRs (SPRG0—SPRG3)

mtsprg n, rS mtspr 272+n,rS mfsprg rD, n mfspr rD,272+n

External access register mtear rS mtspr 282,rS mfear rD mfspr rD,282

Processor version register

_ _ mfpvr rD mfspr rD,287

BAT register, upper mtibatu n, rS mtspr 528+(2*n),rS mfibatu rD, n mfspr rD,528+(2*n)

Bat register, lower mtibatl n, rS mtspr 529+ (2*n),rS mfibatl rD, n, mfspr rD,529+(2*n)

Table 3-42. SPR Simplified Mnemonics (Continued)

Special Purpose Register

Move to SPR Simplified Mnemonic

Move to SPR Instruction

Move from SPR Simplified Mnemonic

Move from SPR Instruction


3.8.2 User-Level Cache InstructionsThe instructions summarized in this section provide user-level programs the ability tomanage the 601’s unified cache. The term block in the context of the cache refers to a sectorwithin the cache (and not a block defined by the block address translation (BAT)mechanism).

As with other memory-related instructions, the effect of the cache instructions on memoryare weakly consistent. If the programmer needs to ensure that cache or other instructionshave been performed with respect to all other processors and mechanisms, a syncinstruction must be placed in the program following those instructions.

When data address translation is disabled (MSR[DT] = 0), the Data Cache Block Set toZero (dcbz) instruction allocates a line in the cache and may not verify that the physicaladdress is valid. If a line is created for an invalid physical address, a machine check

Table 3-43. Cache Management Supervisor-Level Instruction


Operation

Data Cache Block Invalidate

dcbi rA,rB The effective address is the sum (rA|0)+(rB).

The action taken depends on the memory mode associated with the target, and the state (modified, unmodified) of the block. The following list describes the action to take if the block containing the byte addressed by the EA is or is not in the cache.

• Coherency required (WIM = xx1)— Unmodified block—Invalidates copies of the block in the

caches of all processors.— Modified block—Invalidates copies of the block in the caches

of all processors. (Discards the modified contents.)— Absent block—If copies are in the caches of any other

processor, causes the copies to be invalidated. (Discards any modified contents.)

• Coherency not required (WIM = xx0)— Unmodified block—Invalidates the block in the local cache.— Modified block—Invalidates the block in the local cache.

(Discards the modified contents.)— Absent block—No action is taken.

When data address translation is enabled, MSR[DT]=1, and the logical (effective) address has no translation, a data access exception occurs. See Section 5.4.3, “Data Access Exception (x'00300').”

The function of this instruction is independent of the write-through and cache-inhibited/allowed modes determined by the WIM bit settings of the block containing the byte addressed by the EA.

This instruction is treated as a store to the addressed byte with respect to address translation and protection. The reference and change bits are modified appropriately.

If the EA specifies a memory address for which T = 1 in the corresponding segment register, the instruction is treated as a no-op.

This is a supervisor-level instruction.


condition may result when an attempt is made to write that line back to memory. The linecould be written back as the result of the execution of an instruction that causes a cache missand the invalid addressed line is the target for replacement or a Data Cache Block Store(dcbst) instruction.

Any cache control instruction that generates an effective address that corresponds to an I/Ocontroller interface segment (SR[T] = 1) that has the SR[BUID] field equal to x'07F'translates the address appropriately and performs the cache operation based on that address.A cache control instruction that generates an effective address that corresponds to an I/Ocontroller interface segment (SR[T] = 1), but with the SR[BUID] not equal to x'07F' istreated as a no-op.

Since the 601 is implemented with a unified (combined instruction and data) cache, theInstruction Cache Block Invalidate (icbi) instruction is treated as a no-op by the 601processor. Table 3-44 summarizes the cache instructions that are accessible to user-levelprograms.

Table 3-44. User-Level Cache Instructions


Operation

Data Cache Block Touch

dcbt rA,rB The EA is the sum (rA|0)+(rB).

This instruction provides a method for improving performance through the use of software-initiated fetch hints. The 601 performs the fetch for the cases when the address hits in the UTLB or the BTLB, and when it is permitted load access from the addressed page. The operation is treated similarly to a byte load operation with respect to memory protection.

If the address translation does not hit in the UTLB or BTLB, or if it does not have load access permission, the instruction is treated as a no-op.

If the access is directed to a cache-inhibited page, or to an I/O controller interface segment, then the bus operation occurs, but the cache is not updated.

This instruction never affects the reference or change bits in the hashed page table.

While the 601 maintains a cache line size of 64 bytes, the dcbt instruction may only result in the fetch of a 32-byte sector (the one directly addressed by the EA). The other 32-byte sector in the cache line may or may not be fetched, depending on activity in the dynamic memory queue.

A successful dcbt instruction will affect the state of the TLB and cache LRU bits as defined by the LRU algorithm.

Data Cache Block Touch for Store

dcbtst rA,rB The EA is the sum (rA|0)+(rB).

The dcbtst instruction operates exactly like the dcbt instruction as implemented on the 601.


Cache Line Compute Size

clcs rD,rA This is a POWER instruction, and is not part of the PowerPC architecture. This instruction will not be supported by other PowerPC implementations.

This instruction places the cache line size specified by operand rA into register rD. The rA operand is encoded as follows:

01100 Instruction cache line size (returns value of 64)01101 Data cache line size (returns value of 64)01110 Minimum line size (returns value of 64)01111 Maximum line size (returns value of 64)

All other encodings of the rA operand return undefined values.This instruction is specific to the 601.

Data Cache Block Set to Zero

dcbz rA,rB The EA is the sum (rA|0)+(rB).

If the block (the cache sector consisting of 32 bytes) containing the byte addressed by the EA is in the data cache, all bytes are cleared to 0.

If the block containing the byte addressed by the EA is not in the data cache and the corresponding page is caching-allowed, the block is established in the data cache without fetching the block from main memory, and all bytes of the block are cleared to 0.

If the page containing the byte addressed by the EA is caching-inhibited or write-through, then the system alignment exception handler is invoked.

If the block containing the byte addressed by the EA is in coherence required mode, and the block exists in the data cache(s) of any other processor(s), it is kept coherent in those caches.

The dcbz instruction is treated as a store to the addressed byte with respect to address translation and protection.

If the EA corresponds to an I/O controller interface segment (SR[T] = 1), the dcbz instruction is treated as a no-op.

Data Cache Block Store

dcbst rA,rB The EA is the sum(rA|0)+(rB).

If the block (the cache sector consisting of 32 bytes) containing the byte addressed by the EA is in coherence required mode, and a block containing the byte addressed by the EA is in the data cache of any processor and has been modified, the writing of it to main memory is initiated.

The function of this instruction is independent of the write-through and cache-inhibited/allowed modes of the block containing the byte addressed by the EA.

This instruction is treated as a load from the addressed byte with respect to address translation and protection.

If the EA corresponds to an I/O controller interface segment (SR[T] = 1), the dcbst instruction is treated as a no-op.

Table 3-44. User-Level Cache Instructions (Continued)


Operation


3.8.3 Segment Register Manipulation InstructionsThe instructions listed in Table 3-45 provide access to the segment registers of the 601.These instructions operate completely independently of the MSR[IT] and MSR[DT] bitsettings. Note that the rA operand is not defined for the mtsrin and mfsrin instructions inthe 601. Refer to Section 2.3.3.1, “Synchronization for Supervisor-Level SPRs andSegment Registers,” for serialization requirements and other recommended precautions toobserve when manipulating the segment registers.

Data Cache Block Flush

dcbf rA,rB The EA is the sum (rA|0) + (rB).The action taken depends on the memory mode associated with the target, and on the state of the block. The following list describes the action taken for the various cases, regardless of whether the page or block containing the addressed byte is designated as write-through or if it is in the caching-inhibited or caching-allowed mode.• Coherency required (WIM = xx1)

— Unmodified block—Invalidates copies of the block in the caches of all processors.

— Modified block—Copies the block to memory. Invalidates copies of the block in the caches of all processors.

— Absent block—If modified copies of the block are in the caches of other processors, causes them to be copied to memory and invalidated. If unmodified copies are in the caches of other processors, causes those copies to be invalidated.

• Coherency not required (WIM = xx0)— Unmodified block—Invalidates the block in the processor’s

cache.— Modified block—Copies the block to memory. Invalidates the

block in the processor’s cache.— Absent block—Does nothing.

Table 3-44. User-Level Cache Instructions (Continued)


Operation


3.8.4 Translation Lookaside Buffer Management InstructionThe 601 implements a TLB that caches portions of the page table. As changes are made tothe address translation tables, the TLB must be updated. This is done by explicitlyinvalidating TLB entries (both in the set) with the Translation Lookaside Buffer InvalidateEntry (tlbie) instruction. Refer to Chapter 6, “Memory Management Unit” for additionalinformation about TLB operation. Table 3-46 summarizes the operation of the tlbieinstruction.

Table 3-45. Segment Register Manipulation Instructions


Operation

Move to Segment Register

mtsr SR,rS The contents of rS is placed into segment register specified by operand SR.


Move to Segment Register Indirect

mtsrin rS,rB The contents of rS are copied to the segment register selected by bits 0–3 of rB.


Move from Segment Register

mfsr rD,SR The contents of the segment register specified by operand SR are placed into rD.


Move from Segment Register Indirect

mfsrin rD,rB The contents of the segment register selected by bits 0–3 of rB are copied into rD.



Because the presence, absence, and exact semantics of the translation lookaside buffermanagement instruction is implementation dependent, system software should encapsulateuses of the instruction into subroutines to minimize the impact of migrating from oneimplementation to another.

3.9 External Control InstructionsThe external control instructions provide a means for a user-level program to communicatewith a special-purpose device. Two instructions are provided and are summarized inTable 3-47.

Table 3-46. Translation Lookaside Buffer Management Instruction


Operation

Translation Lookaside Buffer Invalidate Entry

tlbie rB The effective address is the contents of rB. If the TLB contains an entry corresponding to the EA, that entry is removed from the TLB. The TLB search is done regardless of the settings of MSR[IT] and MSR[DT]. Also, a TLB invalidate operation is broadcast on the system bus unless disabled by setting bit 17 in HID1.

Block address translation for the EA, if any, is ignored.

Because the 601 supports broadcast of TLB entry invalidate operations, the following must be observed:

• The tlbie instruction must be contained in a critical section of memory controlled by software locking, so that the tlbie is issued on only one processor at a time.

• A sync instruction must be issued after every tlbie and at the end of the critical section. This causes hardware to wait for the effects of the preceding tlbie instructions(s) to propagate to all processors.

A processor detecting a TLB invalidate broadcast does the following:

1. Prevents execution of any new load, store, cache control or tlbie instructions and prevents any new reference or change bit updates

2. Waits for completion of any outstanding memory operations (including updates to the reference and change bits associated with the entry to be invalidated)

3. Invalidates the two entries (both associativity classes) in the UTLB indexed by the matching address

4. Resumes normal executionThis is a supervisor-level instruction.

Nothing is guaranteed about instruction fetching in other processors if tlbie deletes the page in which another processor is executing.


3.10 Miscellaneous Simplified MnemonicsIn order to make assembly language programs simpler to write and easier to understand, aset of simplified mnemonics is provided that define a shorthand for some of the mostfrequently used instructions. PowerPC compliant assemblers provide the simplifiedmnemonics listed here, and in the sections describing the branch, arithmetic, compare, trap,rotate and shift, and move to/from special purpose register instructions. Programs writtento be portable across the various assemblers for the PowerPC architecture should notassume the existence of mnemonics not defined in this user’s manual.

Table 3-47. External Control Instructions


Operation

External Control Input Word Indexed

eciwx rD,rA,rB The EA is the sum (rA|0) + (rB).

If the external access register (EAR) E-bit (bit 0) is set to 1, a load request for the physical address corresponding to the EA is sent to the device identified by the EAR Resource ID bits (bits 28–31), bypassing the cache. The word returned by the device is placed in rD. The EA sent to the device must be word aligned.

If the EAR[E] = 0, a data access exception is invoked, with bit 11 of DSISR set to 1, and bit 6 cleared to 0 to indicate that the exception occurred during a load operation.

The eciwx instruction is supported for EAs that reference ordinary memory segments (SR[T] = 0), for EAs mapped by BAT registers, and for EAs generated when MSR[DT] = 0.The instruction is treated as a no-op for EAs in I/O controller interface segments (SR[T] = 1).

The access caused by this instruction is treated as a load from the location addressed by the EA with respect to protection and reference and change recording.

External Control Output Word Indexed

ecowx rS,rA,rB The EA is the sum (rA|0) + (rB).

If the External Access Register (EAR) E-bit (bit 0) is set to 1, a store request for the physical address corresponding to the EA and the contents of rS are sent to the device identified by EAR[RID] (resource ID) (bits 28–31), bypassing the cache. The EA sent to the device must be word aligned.

If the EAR[E] = 0, a data access exception is invoked, with bit 11 of DSISR set to 1, and bit 6 set to 1 to indicate that the exception occurred during a store operation.

The ecowx instruction is supported for EAs that reference ordinary memory segments (SR[T] = 0), for EAs mapped by BAT registers, and for EAs generated when MSR[DT] = 0.The instruction is treated as a no-op for EAs in I/O controller interface segments (SR[T] = 1).

The access caused by this instruction is treated as a store to the location addressed by the EA with respect to protection and reference and change recording


3.10.1 No-Op Many PowerPC instructions can be coded in a way that, effectively, no operation isperformed. An additional mnemonic is provided for the preferred form of no-op. If animplementation performs any type of run-time optimization related to no-ops, the preferredform is the no-op that will trigger this.

no-op (equivalent to ori 0,0,0)

3.10.2 Load ImmediateThe addi and addis instructions can be used to load an immediate value into a register.Additional mnemonics are provided to convey the idea that no addition is being performedbut that data is being moved from the immediate operand of the instruction to a register.

Load a 16-bit signed immediate value into rA:li rD,value (equivalent to addi rA,0,value)

Load a 16-bit signed immediate value, shifted left by 16 bits, into rA: lis rD,value (equivalent to addis rA,0,value)

3.10.3 Load AddressThis mnemonic permits computing the value of a base-displacement operand, using theaddi instruction which normally requires a separate register and immediate operands.

la rD,SIMM(rA) (equivalent to addi rD,rA,SIMM)

The la mnemonic is useful for obtaining the address of a variable specified by name,allowing the assembler to supply the base register number and compute the displacement.If the variable v is located at offset SIMMv bytes from the address in register rv, and theassembler has been told to use register rv as a base for references to the data structurecontaining v, then the following line causes the address of v to be loaded into register rD.

la rD,v (equivalent to addi rD,rA,SIMMv

3.10.4 Move RegisterSeveral PowerPC instructions can be coded to simply copy the contents of one register toanother. An extended mnemonic is provided to move data from one register to another withno computational activity.

The following instruction copies the contents of register rS into register rA. Thismnemonic can be coded with a “.” to cause the condition register update option to bespecified in the underlying instruction.

mr rA,rS (equivalent to or rA,rS,rS)


3.10.5 Complement RegisterSeveral PowerPC instructions can be coded to complement the contents of one register andplace the result in another register. A simplified mnemonic is provided that complementsthe contents of rS and places the results into register rA. This mnemonic can be coded witha “.” to cause the condition register update option to be specified in the underlyinginstruction.

not rA,rS (equivalent to nor rA,rS,rS)

Chapter 4. Cache and Memory Unit Operation 4-1

Chapter 4 Cache and Memory Unit Operation4040

The PowerPC 601 microprocessor contains a 32-Kbyte, eight-way set associative, unified(instruction and data) cache. The cache line size is 64 bytes, divided into two eight-wordsectors, each of which can be snooped, loaded, cast-out, or invalidated independently. Thecache is designed to adhere to a write-back policy, but the 601 allows control ofcacheability, write policy, and memory coherency at the page and block level. The cacheuses a least recently used (LRU) replacement policy.

The 601’s on-chip cache is nonblocking. Burst operations to the cache are the result of acache sector reload caused by a cache miss, and are buffered such that the cache update isreduced to two single-cycle operations of four words. That is, the results of the first two andthe last two beats are buffered and written to the cache in single cycles apiece. This freesthe cache to perform lower priority operations in the meantime.

System operations, including cache operations, connect to the system interface through thememory unit, which includes a two-element read queue and a three-element write queue.

As shown in Figure 1-1, the cache provides an eight-word interface to the instructionfetcher and load/store unit. The surrounding logic selects, organizes, and forwards therequested information to the requesting unit. Write operations to the cache can beperformed on a byte basis, and a complete read-modify-write operation to the cache canoccur in each cycle.

The cache unit and the memory unit coordinate cache reload and cast-out operations so thata cache miss does not block the use of the cache for other operations during the next cycle.Cache reload operations always occur on a sector basis, with the option of reloading theadditional sector as a low-priority operation. On load operations and fetch operations, thecritical data is forwarded to the requesting unit without waiting for the entire cache line tobe loaded.

The 601 maintains cache coherency in hardware by coordinating activity between thecache, the memory unit, and the bus interface logic. As bus operations are performed on thebus by other processors, the 601 bus snooping logic monitors the addresses that arereferenced. These addresses are compared with the addresses resident in the cache. Thecache unit uses a second port into its tag directory to check for a matching entry and the


memory queue unit does the same. If there is a snoop hit, the 601’s bus snooping logicresponds to the bus interface with the appropriate snoop status. An additional snoop actionmay be forwarded to the cache or to the memory unit as a result of a snoop hit.

Note that in this chapter the term multiprocessor is used in the context of maintaining cachecoherency, although the system could include other devices that can access system memory,maintain their own caches, and function as bus masters requiring cache coherency.

This chapter describes the organization of the 601’s on-chip cache, the MESI cachecoherency protocol, special concerns for cache coherency in single- and multiple-processorsystems, cache control instructions, various cache operations, and the interaction betweenthe cache and the memory unit.

4.1 Cache OrganizationThe cache is configured as eight sets of 64 lines. Each line consists of two sectors, four statebits (two per sector), an address tag, and several bits to maintain the LRU function. The twostate bits implement the four-state MESI (modified-exclusive-shared-invalid) protocol.Each sector contains eight 32-bit words. Note that PowerPC architecture defines thecacheable unit as a block, which is a sector in the 601.

The instruction unit accesses the cache frequently in order to maintain the flow ofinstructions through the instruction queue. The queue is eight words (one sector) long, soan entire sector can be loaded into the instruction unit on a single clock cycle.

The cache organization is shown in Figure 4-1. Note that the replacement algorithm isstrictly an LRU algorithm; that is, the least recently used sector is used, which may meanthat a modified sector will be replaced on a miss if it is the least recently used, even if invalidsectors are available. However, for performance reasons, certain conditions (for example,the execution of some cache instructions) generate accesses to the cache without modifyingthe bits that perform the LRU function.

Each cache line contains 16 contiguous words from memory that are loaded from a 16-wordboundary (that is, bits A26–A31 of the logical (effective) addresses are zero); as a result,cache lines are aligned with page boundaries.

Note that address bits A20–A25 provide an index to select a line. Bits A26–A31 select abyte within a line. The tags consists of bits PA0–PA19. Address translation occurs inparallel, such that higher-order bits (the tag bits in the cache) are physical.


Figure 4-1. Cache Organization

4.2 Cache ArbitrationThe instruction unit and the integer unit both access the cache; however, the cache unithandles only one access per cycle. Furthermore, since the cache is nonblocking, a precedingcache operation may generate a cache reload operation which must also compete for cacheaccess. The bus snooping logic may create additional snoop actions that use the cache. The601 efficiently handles simultaneous requests to access the on-chip cache.

The 601 implements cache arbitration logic to prioritize the various cache requests that canoccur on each cycle. The cache unit provides a cache retry queue (CRTRY) if a cachingoperation cannot be completed. There are three entries in this queue, providing a buffer forone outstanding floating-point store, a buffer for an integer load/store or floating-point load,and a buffer for an instruction fetch. Priority is given first to floating-point stores, then tointeger stores, and finally to instruction fetches.

A similar situation arises with respect to the bus. Internal bus arbitration logic chooses thehighest priority operation from the memory queue for presentation onto the bus. Thesepriorities are listed in Section 4.10.2, “Memory Unit Queuing Priorities.”

LINE 63

LINE 0 ADDRESS TAG

ADDRESS TAG

SECTOR 0 SECTOR 1

8 WORDS 8 WORDS

16 WORDS

8 SETS


The 601 supports a fully-coherent 4-Gbyte physical memory address space. Bus snoopingis used to drive a MESI four-state cache-coherency protocol that ensures the coherency ofall processor and DMA transactions to and from global memory with respect to theprocessor’s cache. The MESI protocol is described in Section 4.7.2, “MESI Protocol.” Allpotential bus masters must employ similar snooping and coherency-control mechanisms.

4.3 Cache Access PrioritiesThe 601 prioritizes pending cache operations as follows:

1. Cache reloads. Note that the cache is nonblocking. Four-beat burst reloads on the system bus are buffered into two, single-cycle transactions of four words each, freeing the cache to perform lower priority operations in the meantime.

2. Second-cycle cast-out operations when the additional sector is modified

3. Snoop requests that hit in the tag directory. These may generate a cache sector push operation or cache state change.

4. Floating-point store operations.

5. Integer operation retries. If a higher priority operation occurs when an integer operation is ready to cache its results, the results are held in a buffer until the higher priority operation completes, then it is retried on the next clock cycle. This prevents the integer unit from stalling when this situation occurs.

6. Integer unit requests

7. Instruction fetches

4.4 Basic Cache OperationsThis section describes operations that can occur to the cache, and how these operations areimplemented in the 601.

4.4.1 Cache ReloadsA cache sector is reloaded after a read miss occurs in the cache. The cache sector thatcontains the address is updated by a burst transfer of the data from system memory. Notethat if a read miss occurs in a multiprocessor system, and the data is modified in anothercache, the modified data is first written to external memory before the cache reload occurs.

An instruction fetch that is generated to fill the instruction queue (not explicitly required bythe program flow) does not generate a reload operation in the case of a cache miss.

4.4.2 Cache Cast-Out OperationThe 601 uses an LRU replacement algorithm to determine which of the eight possible cachelocations should be used for a cache update. Adding a new sector to the cache causes anymodified data associated with the least recently used element to be written back, or cast out,


to system memory. This may be both sectors of the line, depending on which sectors aremodified, even though only one sector may be reloaded. Casting out of the adjacent sectoris referred to as a second-cycle cast-out operation.

4.4.3 Cache Sector Push OperationWhen a cache sector in the 601 is snooped and hit by another processor and the data ismodified, the cache sector must be written to memory and made available to the snoopingdevice. The cache sector that is hit is said to be pushed out onto the bus. The 601 supportstwo kinds of push operations—normal push operations and enveloped high-priority pushoperations, which are described in Section 4.7.11, “Enveloped High-Priority Cache SectorPush Operation.”

4.4.4 Optional Cache Sector Line-Fill OperationThe two sectors in a cache line contain contiguous memory addresses; therefore, the twosectors share the same line address tag. Cache coherency, however, is maintained on asector granularity, so there are separate coherency state bits for each sector. If one sector ofthe line is filled from memory, the 601 may attempt to load the other sector as a low-prioritybus operation. If the other sector is not transferred, the cache line in the snooping processorcontains one sector that is in the shared state (the one that was transferred because of thesnoop hit) and one sector that is invalid (if the optional cache line fill is not performed).Correspondingly, the processor issuing the reload request may bring in the second cachesector in a shared or exclusive state.

Note that the optional reload of an adjacent sector on an instruction fetch miss can bedisabled globally by setting bit 26 in the HID0 register, and the optional reload of theadjacent sector on a load/store miss can be disabled by setting bit 27.

4.5 Cache Data TransactionsThe 601 output signal TBST (transfer burst) indicates to the system whether the currenttransaction is a single-beat transaction or four-beat burst transfer. Burst transactions havean assumed address order. For cacheable load operations or cacheable, non-write-throughstore operations that miss the cache, the 601 presents the quad-word aligned addressassociated with the read or store that initiated the transaction.

As shown in Figure 4-2, this quad word contains the address of the load or store that missedthe cache. This minimizes latency by allowing the critical code or data to be forwarded tothe processor before the rest of the sector is filled. For all other burst operations, however,the entire sector is transferred in order (oct-word aligned).


Figure 4-2. Quad-Word Address Ordering

4.6 Access to I/O Controller Interface Segments The 601 supports both memory-mapped and I/O-mapped access to I/O devices. In additionto the high-performance bus protocol for memory-mapped I/O accesses, the 601 providesthe ability to map memory areas to the I/O controller interface (SR[T] = 1) with thefollowing two kinds of operations:

• I/O controller interface operations. These operations are considered to address the noncoherent and noncacheable I/O controller interface; therefore, the 601 does not maintain coherency for these operations, and the cache is bypassed completely.

• Memory-forced I/O controller interface operations. These operations are considered to address memory space and are therefore subject to the same coherency control as memory accesses. These operations are global memory references within the 601 and are considered to be noncacheable.

Cache behavior (write-back, cache-inhibition, and enforcement of MESI coherency) forthese operations is determined by the settings of the WIM bits; see Section 6.3,“Memory/Cache Access Modes.”

4.7 Cache Coherency The primary objective of a coherent memory system is to provide the same image ofmemory to all devices using the system. Coherency allows synchronization, cooperativeuse of shared resources, and task migration among the processors. Otherwise, multiplecopies of a memory location, some containing stale values, could exist in a system resulting

A B C D

601 Cache AddressBits (27..28)

Beat

Beat

0 0 0 1 1 0 1 1

A B C D

0 1 2 3

If address requested is in double word A or B then the address placed on the bus are that ofquad-word A, and the four data beats are ordered in the following manner:

If address requested is in double word C or D then the address placed on the bus will be thatof quad-word C, and the four data beats are ordered in the following manner:

C D A B

0 1 2 3


in errors when the stale values are used. Each potential bus master must follow rules formanaging the state of its cache. For example, a device must broadcast its intention to reada sector that is not currently in the cache. It must also broadcast the intention to write intoa sector that is currently not owned exclusively. Other devices respond to these broadcastsby snooping their caches for the broadcast addresses and reporting status back to theoriginating device. The status returned includes a shared indicator (another device has acopy of the addressed sector) and a retry indicator (another device either has a modifiedcopy of the addressed sector that it needs to push out of the chip, or another device had aproblem that prevented appropriate snooping).

For faster performance, the 601 has a second path into the cache directory so snooping andmainstream instruction processing occur concurrently. Instruction processing is interruptedonly when the snoop control logic detects a state change or that a snoop push of modifieddata is required to maintain memory coherency.

To maintain coherency, secondary caches must forward all relevant system bus traffic ontothe 601 bus, which takes the appropriate actions to maintain the MESI protocol.

Support for lwarx and stwcx. instructions on noncacheable pages may be somewhat morecomplicated for a secondary cache than normal cacheable memory accesses. This isbecause the secondary cache may not normally forward writes to noncacheable pages in theprocessor. However, to maintain the reservation coherency bit, the secondary cache mustforward all writes that hit against the address of a reservation set by a lwarx instructionuntil the reservation is cleared.

4.7.1 Memory Management Access Mode Bits—W, I, and MSome memory characteristics can be set on either a block or page basis by using the WIMbits in the BAT registers or page table entry (PTE) respectively. The WIM bits control thefollowing functionality:

• Write-through (W bit)• Caching-inhibited (I bit)• Memory coherency (M bit)

These bits allow both single- and multiprocessor-system designs to exploit numeroussystem-level performance optimizations. These bits are described in detail in Chapter 2,“Registers and Data Types,” and Chapter 6, “Memory Management Unit.” Using these bitscarelessly can cause coherency problems—the processor must ensure that the coherency ofthe location is maintained (i.e., the processor must manage mismatched W bit handling incases of mixed WIM = b'101' and WIM = b'001'.) The 601 considers either of these casesto be a programming error that may compromise memory coherency. These paradoxes canoccur within a single processor or across several devices, as described in Section 4.7.5.1,“Coherency in Single-Processor Systems,” and Section 4.7.5.2, “Coherency inMultiprocessor Systems.”


4.7.2 MESI ProtocolThe 601 cache characterizes each 32-byte sector it contains as being in one of four MESIstates. Addresses presented to the cache are indexed into the cache directory with bits A20–A25 and the upper-order 20 bits from the physical address translation (PA0–PA19) arecompared against the indexed cache directory tags. If no tags match, the result is a cachemiss. If a tag matches, a cache hit occurred and the directory indicates the state of the sectorthrough two state bits kept with the tag. The four possible states for a sector in the cacheare the invalid state (I), the shared state (S), the exclusive state (E), and the modified state(M). The four MESI states are defined in Table 4-1 and illustrated in Figure 4-3.

Table 4-1. MESI State Definitions

MESI State Definition

Modified (M) The addressed sector is valid in the cache and in only this cache. The sector is modified with respect to system memory—that is, the modified data in the sector has not been written back to memory.

Exclusive (E) The addressed sector is in this cache only. The data in this sector is consistent with system memory.

Shared (S) The addressed sector is valid in the cache and in at least one other cache. This sector is always consistent with system memory. That is, the shared state is shared-unmodified; there is no shared-modified state.

Invalid (I) This state indicates that the addressed sector is not resident in the cache.


Figure 4-3. MESI States

4.7.3 MESI State DiagramThe 601 provides dedicated hardware to provide memory coherency by snooping bustransactions. The address retry capability of the 601 enforces the MESI protocol, as shownin Figure 4-4. Figure 4-4 assumes that the WIM bits are set to 001; that is, write-back,caching-not-inhibited, and memory coherency enforced.

Modified in Cache A

Cache A Cache B

System Memory

Cache A Cache B

System Memory

Cache A Cache B Cache A Cache B

System Memory

Valid DataM Invalid DataI

Shared in Cache A

Valid Data Valid DataS S

Invalid Data Valid Data

Exclusive in Cache A

E Valid Data Invalid DataI

Valid Data

Invalid DataI Don’t CareX

Invalid in Cache A


Table 4-7 gives a detailed list of MESI transitions for various operations and WIM bitsettings.

Figure 4-4. MESI Cache Coherency Protocol—State Diagram (WIM = 001)

4.7.4 MESI Hardware ConsiderationsIn addition to the hardware required to monitor bus traffic for coherency, the 601 has acache port dedicated to snooping so that comparing cache entries to address traffic on thebus does not affect the 601’s on-chip cache.

The global (GBL) signal, asserted as part of the address attribute field, enables the snoopinghardware of the 601. Address bus masters assert GBL to indicate that the current transactionis a global access (that is, an access to memory shared by more than one device). If GBL isnot asserted for the transaction, that transaction is not snooped.

SH

W(s

ecto

r w

rite-

back

)

SHARED

SHR

RH

RH

EXCLUSIVE

SHW

RMS

SH

R

SHWSHR

RME WH

WH

WH

RH

MODIFIED

SH

W

INVALID(On a miss, the

replaced line is first cast out to memory

if modified)

WM

BUS TRANSACTIONS

RH = Read Hit = Snoop PushRMS = Read Miss, SharedRME = Read Miss, Exclusive = Invalidate TransactionWH = Write HitWM = Write Miss = Read-with-Intent-to-Modify

SHR = Snoop Hit on a ReadSHW = Snoop Hit on a Write or = Read

Read-with-Intent-to-Modify


Normally, GBL reflects the M-bit value specified for the memory reference in thecorresponding translation descriptor(s). Care must be taken to minimize the number ofpages marked as global, because the retry protocol enforces coherency and can useconsiderable bus bandwidth if much data is shared. Therefore available bus bandwidth candecrease as more traffic is marked global. Note that in Figure 4-4, write hits to unmodifiedlines of nonglobal pages do not generate invalidate broadcasts.

The 601 snoops a transaction if the transfer start (TS) and GBL inputs are asserted togetherin the same bus clock (this is a qualified snooping condition). No snoop update to the 601cache occurs if the snooped transaction is not marked global. This includes invalidationcycles.

When the 601 detects a qualified snoop condition, the address associated with the TS iscompared with the cache tags through a dedicated cache-tag snoop port. Snooping finishesif no hit is detected. If, however, the address hits in the cache, the 601 reacts according tothe MESI protocol shown in Figure 4-4.

Because they do not require snooping, cache sector cast-outs, and snoop pushes do notassert GBL. The 601 marks these transactions as nonglobal.

To facilitate external monitoring of the internal cache tags, the cache set member signals(CSE0–CSE2) represent in binary the sector of the cache set being replaced on readoperations (including read-with-intent-to-modify operations). This does not apply and isnot necessary for write operations to memory. Note that these signals are valid only for 601burst operations. Table 4-2 shows the (cache set element) CSE encodings.

4.7.5 Coherency PrecautionsCache coherency is greatly affected by whether the 601 is used in a single- or multiple-processor implementation. This section describes precautions for implementing coherentsingle- and multiple-processor systems.

Table 4-2. CSE0–CSE2 Signals

CSE0–CSE2 Cache Set Element

000 Set 0

001 Set 1

010 Set 2

011 Set 3

100 Set 4

101 Set 5

110 Set 6

111 Set 7


4.7.5.1 Coherency in Single-Processor SystemsThe following situations concerning coherency can be encountered within a single-processor implementation:

• Load or store to a cache-inhibited page (WIM = b'X1X') and a cache hit occurs

Caching is inhibited for this page (I = 1). Load or store operations to a cache-inhibited page that hit in the cache cause a paradox. If the addressed sector is not modified, the 601 invalidates the sector and performs the memory access. If the addressed sector in the cache line is modified, the 601 flushes the modified sector before accessing memory.

• Store to a page marked write-through (WIM = b'10X') and a cache hit to a modified sector

This page is marked as write-through (W = 1). The 601 pushes the modified sector to memory and marks the sector exclusive (E). Then the 601writes the data into the cache, marking it exclusive and passing on a write-with-flush operation (to the memory queue).

Note that when WIM bits are changed, it is critical that the cache contents should reflect thenew WIM bit settings. For example, if a block or page that had allowed caching becomescaching-inhibited, software should ensure that the appropriate cache sectors are flushed tomemory and invalidated.

4.7.5.2 Coherency in Multiprocessor SystemsOther situations concerning coherency can occur across multiple processors (or systemsthat employ multiple devices that incorporate caches). Paradoxes in multiprocessor systemsare particularly difficult to handle since some scenarios cause modified data to be purgedand others may lead to bus deadlock scenarios.

Most multiprocessor paradoxes center around the interprocessor coherency of the memorycoherency bit (the M bit). Improper use of the M bit can lead to multiple devices acceptinga cache sector and marking the data as exclusive, leading to the possibility of the samecache line being modified in multiple caches.

Although these coherency paradoxes are considered programming errors, the 601 attemptsto handle the offending conditions and minimize the negative effects on memory coherency.Note that the intent of this effort is to ease the debugging of multiprocessor operatingsystem development.


The following list shows some of the operations provided by the 601:

• Noncacheable write operations appear on the processor bus as write-with-flush operations, which forces other processors with modified copies of the addressed sector to write data back to memory and to mark the sector as invalid in the cache. Devices with an unmodified copy of the sector must mark the sector as invalid in their caches.

• All noncacheable read operations appear on the 601 bus as read (with clean) operations, which forces processors with modified copies of the addressed data to write the data back to memory before the read operation completes.

Note that when WIM bits are changed, it is critical that the cache contents should reflect thenew WIM bit settings. For example, if a block or page that had allowed caching becomescaching-inhibited, the appropriate cache sectors should be updated to leave no indicationthat caching had previously been allowed.

Additional information on bus operations that are generated for specific instructions andstate conditions can be found in Chapter 9, “System Interface Operation.”

4.7.6 Memory Loads and StoresTable 4-3 provides a general overview of memory coherency actions performed by the 601on load operations.

Noncacheable cases are not part of this table. The first three cases also involve selecting areplacement class and casting-out modified data that may have resided in that replacementclass.

Table 4-4 provides an overview of memory coherency actions on store operations. Thistable does not include noncacheable or write-through cases nor does it completely describethe exact mechanisms for the operations described. It describes generally what happenswithin the chip. The read-with-intent-to-modify (RWITM) examples involve selecting areplacement class and casting-out modified data that may have resided in that replacementclass.

Table 4-3. Memory Coherency Actions on Load Operations

Cache StateBus

OperationARTRY SHD Action

I Read Negated Negated Load data and mark E

I Read Negated Asserted Load data and mark S

I Read Asserted Don’t care Retry read operation

S None Don’t care Don’t care Read from cache

E None Don’t care Don’t care Read from cache

M None Don’t care Don’t care Read from cache


4.7.7 Atomic Memory ReferencesThe lwarx/stwcx. instruction combination can be used to emulate atomic memoryreferences. These instructions are described in Chapter 3, “Addressing Modes andInstruction Set Summary,” and Chapter 10, “Instruction Set.”

4.7.8 Snoop Response to Bus OperationsWhen the 601 is not the bus master, it monitors bus traffic and performs cache and memory-queue snooping as appropriate. The snooping operation is triggered by the receipt of aqualified snoop request. A qualified snoop request is generated by the simultaneousassertion of the TS and GBL bus signals.

Instruction processing is interrupted only when a snoop hit occurs and the snoop statemachine determines that an additional cache snoop is required to resolve the coherency ofthe offended sector.

The 601 maintains a write queue of bus operations in progress and/or pending arbitration.This write queue is also be snooped in response to qualified snoop requests. Note thatsector-length (four-beat) write operations, are always snooped in the write queue; however,single-beat writes are not snooped. Coherency for single-beat writes is maintained by theuse of cache operations that are broadcast with the write on the system interface.

The 601 drives two snoop status signals (ARTRY and SHD) in response to a qualified snooprequest that hits. These signals provide information about the state of the addressed sectorfor the current bus operation. For more information about these signals, see Chapter 8,“Signal Descriptions.”

Table 4-4. Memory Coherency Actions on Store Operations

Cache StateBus

OperationARTRY SHD Action

I RWITM Negated Don’t care Load data, modify it, mark M

I RWITM Asserted Don’t care Retry the RWITM

S Kill Negated Don’t care Modify cache, mark M

S Kill Asserted Don’t care Retry the kill operation

E None Don't care Don’t care Modify cache, mark M

M None Don't care Don’t care Modify cache


4.7.9 Cache Reaction to Specific Bus OperationsThere are several bus transaction types defined for the 601 bus. The 601 must snoop thesetransactions and perform the appropriate action to maintain memory coherency; seeTable 4-5. For example, because single-beat write operations are not snooped when they arequeued in the memory unit, additional operations such as flush or kill operations, must bebroadcast when the write is passed to the system interface to ensure coherency.

A processor may assert ARTRY for any bus transaction due to internal conflicts that preventthe appropriate snooping. In general, if ARTRY is not asserted, each snooping processormust take full ownership for the effects of the bus transaction with respect to the state of theprocessor. The processor can assert ARTRY if an internal conflict prevents it from snoopingproperly.

The transactions in Table 4-5 correspond to the transfer type signals TT0–TT4, which aredescribed in Section 8.2.4.1, “Transfer Type (TT0–TT4).”

Table 4-5. Response to Bus Transactions

Transaction Response

Clean block The clean operation is an address-only bus transaction, initiated by executing a dcbst instruction. This operation affects only sectors marked as modified (M). Assuming the GBL signal is asserted, modified sectors are pushed out to memory, changing the state to E.

Flush block The flush operation is an address-only bus transaction initiated by executing a dcbf instruction. Assuming the GBL signal is asserted, the flush block operation results in the following:• If the addressed sector is shared or exclusive, an additional snoop action is

generated internally that invalidates the addressed sector.• If the addressed sector is in the M state, ARTRY is asserted and an additional

internally generated snoop action is initiated that pushes the modified sector out of the cache and invalidates the sector.

• If HID0[31] = 0, and any bus read operation is pending during this snoop operation, the write-back of the modified sector is considered to be a high-priority bus operation that may be enveloped within the pending load operation.

• If HID0[31] = 1, and the snoop flush was presented with HP_SNP_REQ asserted, the write-back of the modified sector is considered to be a high-priority bus operation that may be enveloped within the pending load operation.

• If the addressed sector hits any of the three entries in the write queue, that entry is tagged as a high-priority push, after which it can be loaded from memory.


Write with flushWrite with flush atomic

Write-with-flush and write-with-flush-atomic operations occur after the processor issues a store or stwcx. instruction, respectively. • If the addressed sector is in the shared or exclusive state, the address snoop forces

the state of the addressed sector to invalid. • If the addressed sector is in the modified state, the address snoop causes the ARTRY

to be asserted and initiates a push of the modified sector out of the cache and changes the state of the sector to invalid.

• If HID0[31] = 0, and any bus read operation is pending during this snoop operation, the write-back of the modified sector is considered to be a high-priority bus operation that may be enveloped within the pending load operation.

• If HID0[31] = 1, and the snoop write was presented with HP_SNP_REQ asserted, the write-back of the modified sector is considered to be a high-priority bus operation that may be enveloped within the pending load operation.

• If the addressed sector hits any of the three entries in the write queue, that entry is tagged as a high-priority push operation.

Kill block The kill-block operation is an address-only bus transaction initiated when one of the following occurs:• a dcbi instruction is executed• a dcbz operation to a block marked S or I is executed• a write operation to a block marked S occursIf a snoop hit occurs, an additional snoop is initiated internally and the sector is forced to the I state, effectively killing any modified data that may have been in the sector. The three-entry write queue is also snooped, and if a queue entry hits, it is purged.

Write with kill In a write-with-kill operation, the processor snoops the cache for a copy of the addressed sector. If one is found, an additional snoop action is initiated internally and the sector is forced to the I state, killing modified data that may have been in the sector. In addition to snooping the cache, the three-entry write queue is also snooped. A kill operation that hits an entry in the write queue purges that entry from the queue.

Read Read atomic

The read operation is used by most single-beat and burst read operations on the bus. A read on the bus with the GBL signal asserted causes the following responses:• If the addressed sector is in the cache but is invalid, the 601 takes no action.• If the sector is in the shared state, the 601 asserts the shared snoop status indicator.• If the sector is in the E state, the 601 asserts the shared snoop status indicator and

initiates an additional snoop action to change the state of that sector from E to S.• If the sector is in the cache in the M state, the 601 asserts both the ARTRY and the

SHD snoop status signals. It also initiates an additional snoop action to push the modified sector out of the chip and to mark that cache sector as shared.

Read atomic operations appear on the bus in response to lwarx instructions and generate the same snooping responses as read operations.

Read with intent to modify (RWITM) RWITM atomic

An RWITM operation is issued to acquire exclusive use of a memory location for the purpose of modifying it. • If the addressed sector is in the I state, the 601 takes no action. • If the addressed sector is in the cache and in the S or E state, the 601 initiates an

additional snoop action to change the state of the cache sector to I. • If the addressed sector is in the cache and in the M state, the 601 asserts both the

ARTRY and the SHD snoop status signals. It also initiates an additional snoop action to push the modified sector out of the chip and to change the state of that sector in the cache from M to I.

The RWITM atomic operations appear on the bus in response to stwcx. instructions and are snooped like RWITM instructions.

Table 4-5. Response to Bus Transactions (Continued)



4.7.10 Internal ARTRY ScenariosThe following scenarios, along with others, cause the 601 to assert the ARTRY signal.

• Snoop hits to a sector in the M state (optional on kill requests)

• Snoop hits when a reload dump request is active

• Snoop hits on a valid (that is, not cancelled) operation that is queued internally.

• Snoop hits while a cast-out request is pending during this or the next clock cycle.

4.7.11 Enveloped High-Priority Cache Sector Push OperationIf the 601 has a read operation outstanding on the bus and another pipelined bus operationhits against a modified sector, the 601 provides a high-priority push operation. Thistransaction can be enveloped within the address and data tenures of a read operation. Thisfeature prevents deadlocks in system organizations that support multiple memory-mappedbuses. More specifically, the 601 internally detects the scenario where a load request isoutstanding and the processor has pipelined a write operation on top of the load. Normally,when the data bus is granted to the 601, the resulting data bus tenure is used for the loadoperation. The enveloped high-priority cache sector push feature defines a bus signal, thedata bus write only qualifier (DBWO), which, when asserted with a qualified data-busgrant, indicates that the resulting data tenure should be used for the store operation instead.This signal is described in Section 9.10, “Using DBWO (Data Bus Write Only).” Note thatthe enveloped copy-back operation is an internally pipelined bus operation.

4.8 Cache Control InstructionsSoftware must use the appropriate cache management instructions to ensure that caches arekept consistent when data is modified by the processor or by input data transfer. When aprocessor alters a memory location that may be contained in an instruction cache, softwaremust ensure that updates to memory are visible to the instruction fetching mechanism.

sync The sync instruction causes an address-only bus transaction. The 601 asserts the ARTRY snoop status if there are any TLB-related snoop operations pending in the chip.This transaction is also generated by the eieio instruction on the 601.

TLB invalidate A TLB invalidation operation is caused by executing a tlbie instruction. This instruction transmits the 601’s TLB index (bits 12–19 of the EA) onto the system bus. Other processors on the bus invalidate TLB entries associated with EAs that match those bits.

I/O reply The I/O reply operation is part of the I/O controller interface operation. It serves as the final bus operation in the series of bus operations that service an I/O controller interface operation.

Table 4-5. Response to Bus Transactions (Continued)



Although the instructions to enforce coherency vary among implementations and hencemany operating systems will provide a system service for this function, the followingsequence is typical:

1. dcbst (update memory)2. sync (wait for update)3. icbi (invalidate copy in cache)4. isync (invalidate copy in own instruction buffer)

These operations are necessary because the processor is not required to maintain instructionmemory consistent with data memory. Software is responsible for enforcing consistency ofinstruction and data memory. Since instruction fetching may bypass the data cache,changes made to items in the data cache may not be reflected in memory until after theinstruction fetch completes.

The PowerPC architecture defines instructions for controlling both the instruction and datacaches. Instruction cache control instructions are valid instructions on the 601, but mayfunction differently than they do when used on PowerPC processors that have separateinstruction and data caches.

Note that in the PowerPC architecture, the term cache block, or simply block when used inthe context of cache implementations, refers to the unit of memory at which coherency ismaintained. For the 601 this is the eight-word sector. This value may be different for otherPowerPC implementations. In-depth descriptions of coding these instructions is providedin Chapter 3, “Addressing Modes and Instruction Set Summary,” and Chapter 10,“Instruction Set.”

4.8.1 Cache Line Compute Size Instruction (clcs)The clcs instruction places the cache information specified in the instruction into a targetregister. This instruction is used by the POWER architecture to determine the maximumand minimum line sizes for cache implementations. For a complete description of thisinstruction, refer to Chapter 10, “Instruction Set.”

4.8.2 Data Cache Block Touch Instruction (dcbt)This instruction provides a method for improving performance through the use of software-initiated fetch hints. The 601 performs the fetch for the cases when the address hits in theUTLB or the BTLB, and when it is permitted load access from the addressed page. Theoperation is treated similarly to a byte load operation with respect to memory protection.

If the address translation does not hit in the UTLB or BTLB, or if it does not have loadaccess permission, the instruction is treated as a no-op.

If the access is directed to a cache-inhibited page, or to an I/O controller interface segment,then the bus operation occurs, but the cache is not updated.

This instruction never affects the reference or change bits in the hashed page table.


While the 601 maintains a cache line size of 64 bytes, the dcbt instruction may only resultin fetching a 32-byte sector (the one directly addressed by the EA). The other 32-byte sectorin the cache line may or may not be fetched, depending on activity in the dynamic memoryqueue.

A successful dcbt instruction will affect the state of the UTLB and cache LRU bits asdefined by the LRU algorithm.

4.8.3 Data Cache Block Touch for Store Instruction (dcbtst)The dcbtst instruction behaves exactly like the dcbt instruction as implemented on the 601.

4.8.4 Data Cache Block Set to Zero Instruction (dcbz) If the block (the cache sector consisting of 32 bytes) containing the byte addressed by theEA is in the data cache, all bytes are cleared to 0.

If the block containing the byte addressed by the EA is not in the data cache and thecorresponding page is caching-allowed, the block is established in the data cache withoutfetching the block from main memory, and all bytes of the block are cleared to 0.

If the page containing the byte addressed by the EA is caching-inhibited or write-through,then the system alignment exception handler is invoked.

If the block containing the byte addressed by the EA is in coherence required mode, andthe block exists in the data cache(s) of any other processor(s), it is kept coherent in thosecaches.

The dcbz instruction is treated as a store to the addressed byte with respect to addresstranslation and protection.

If the EA corresponds to an I/O controller interface segment (SR[T] = 1), the dcbzinstruction is treated as a no-op.

See Chapter 5, “Exceptions,” for more information about a possible delayed machine checkexception interrupt that can occur by use of dcbz if the operating system has set up anincorrect memory mapping.

4.8.5 Data Cache Block Store Instruction (dcbst)If the block (the cache sector consisting of 32 bytes) containing the byte addressed by theEA is in coherence required mode, and a block containing the byte addressed by the EA isin the data cache of any processor and has been modified, the writing of it to main memoryis initiated.

The function of this instruction is independent of the write-through and cache-inhibited/allowed modes of the block containing the byte addressed by the EA.


This instruction is treated as a load from the addressed byte with respect to addresstranslation and protection.

If the EA specifies a memory address for an I/O controller interface segment (segmentregister T-bit = 1), the dcbst instruction is treated as a no-op.

4.8.6 Data Cache Block Flush Instruction (dcbf)The action taken depends on the memory mode associated with the target, and on the stateof the sector. The list below describes the action taken for the various cases. The actionsdescribed must be executed regardless of whether the page containing the addressed byteis in caching-inhibited or caching-allowed mode.

• Coherence-required mode

Unmodified sector—Invalidates copies of the sector in the caches of all processors.

Modified sector—Copies the sector to memory. Invalidates copies of the sector in the caches of all processors.

Absent sector—If modified copies of the sector are in the caches of other processors, causes them to be copied to memory and invalidated. If unmodified copies are in the caches of other processors, cause those copies to be invalidated.

• Coherence-not-required mode

Unmodified sector—Invalidates the sector in the processor’s cache.

Modified sector—Copies the sector to memory. Invalidate the sector in the processor’s cache.

Absent sector—Does nothing.

The 601 treats this instruction as a load from the addressed byte with respect to addresstranslation and protection.

4.8.7 Enforce In-Order Execution of I/O Instruction (eieio)The eieio instruction provides an ordering function for the effects of load and storeinstructions executed by a given processor. Executing eieio ensures that all memoryaccesses previously initiated by the given processor are completed with respect to mainmemory before any memory accesses subsequently initiated by the processor access mainmemory.

The eieio instruction orders loads and stores to caching-inhibited memory only.

The eieio instruction is intended for use only in doing memory-mapped I/O. It can bethought of as placing a barrier into the stream of memory accesses issued by a processor,such that any given memory access appears to be on the same side of the barrier to both theprocessor and the I/O device.

The eieio instruction may complete before previously initiated memory accesses have beenperformed with respect to other processors and mechanisms.


Unlike the sync instruction, eieio need not serialize the processor. It requires only that theprocessor execute memory accesses in the order described above, and enforce that order inany queues in the memory subsystem.

4.8.8 Instruction Cache Block Invalidate Instruction (icbi)The icbi instruction is provided in the PowerPC architecture for use in processors withseparate instruction and data caches. The effective (logical) address is computed, translated,and checked for protection violations as defined in the PowerPC architecture; however, theinstruction functions as a no-op on the 601.

In the PowerPC architecture, the icbi instruction performs the following function:

• If the block (sector) containing the byte addressed by EA is in coherency-required mode and a sector containing the byte addressed by EA is in the instruction cache of any processor, the sector is made invalid in all such processors, so that subsequent references cause the sector to be refetched.

• If coherency is not required for the sector containing the byte addressed by EA and a sector containing the byte addressed by EA is in the instruction cache of this processor, the sector is made invalid in this processor so that subsequent references cause the sector to be fetched from main memory (or from a cache).

4.8.9 Instruction Synchronize Instruction (isync)The isync instruction waits for all previous instructions to complete and then discards anypreviously fetched instructions, causing subsequent instructions to be fetched (or refetched)from memory and to execute in the context established by the previous instructions. Thisinstruction has no effect on other processors or on their caches.

4.9 Bus Operations Caused by Cache Control Instructions

Table 4-6 provides an overview of the bus operations initiated by cache control instructions.Note that Table 4-6 assumes that the WIM bits are set to 001; that is, since the cache isoperating in write-back mode, caching is permitted and coherency is enforced.


Table 4-6 does not include noncacheable or write-through cases, nor does it completelydescribe the mechanisms for the operations described. For more information, seeSection 4.11, “MESI State Transactions.”

Chapter 3, “Addressing Modes and Instruction Set Summary,” and Chapter 10, “InstructionSet,” describe the cache control instructions in detail. Several of the cache controlinstructions broadcast onto the 601 interface so that all processors in a multiprocessorsystem can take appropriate actions. The 601 contains snooping logic to monitor the bus forthese commands and the control logic required to keep the cache and the memory queuescoherent. For additional details about the specific bus operations performed by the 601, seeChapter 9, “System Interface Operation.”

4.10 Memory Unit The 601’s memory unit contains read and write queues that buffer operations between theexternal interface and the cache. These operations are comprised of operations resultingfrom load and store instructions that are cache misses, read and write operations requiredto maintain cache coherency, and table search operations. As shown in Figure 4-5, the readqueue contains two elements and the write queue contains three elements. Each element ofthe write queue can contain as many as eight words (one sector) of data. One element of thewrite queue, marked snoop in Figure 4-5, is dedicated to writing cache sectors to systemmemory after a modified sector is hit by a snoop from another processor or snooping deviceon the system bus. The use of this queue guarantees that a high-priority operation receivesa deterministic response time when snooping hits a modified sector.

Table 4-6. Bus Operations Caused by Cache Control Instructions (WIM = 001)

Operation Cache State Next Cache State Bus Operations Comment

sync/eieio Don’t care No change sync First clears memory queue

dcbi Don’t care I Kill —

dcbf I, S, E I Flush —

dcbf M I Write with kill Sector is pushed

dcbst I, S, E No change Clean —

dcbst M E Write with kill Sector is pushed

dcbz I M Kill May also cast out a sector

dcbz S M Kill —

dcbz E, M M None Writes over modified data

dcbt I No change Read State change on reload may cast out sector

dcbt S, E, M No change None —


Figure 4-5. Memory Unit

The other two elements in the write queue are used for store operations and writing backmodified sectors that have been deallocated by updating the queue; that is, when a cachesector is full, the least-recently used cache sector is deallocated by first being copied intothe write queue and from there to system memory if it is modified. Note that snooping canoccur after a sector has been pushed out into the write queue and before the data has beenwritten to system memory. Therefore, to maintain a coherent memory, the write queueelements are compared to snooped addresses in the same way as the cache tags. If a snoophits a write queue element, the data is first stored in system memory before it can be loadedinto the cache of the snooping bus master. Full coherency checking between the cache andthe write queue prevents dependency conflicts.

For a detailed discussion about the retry signals and bus operations pertaining to snooping,see Chapter 9, “System Interface Operation.”

Execution of a load or store instruction is considered complete when the associated addresstranslation completes, guaranteeing that the instruction has completed to the point where itis known that it will not generate an internal exception. However, after address translationis complete, a read or write operation can generate an external exception.

Load and store instructions are always issued and translated in program order with respectto other load and store instructions. However, a load or store operation that hits in the cachecan complete ahead of those that miss in the cache; additionally, loads and stores that missthe cache can be reordered as they arbitrate for the system bus.

If a load or store misses in the cache, the operation is managed by the memory unit whichprioritizes accesses to the system bus. Read requests, such as loads, RWITMs, andinstruction fetches have priority over single-beat write operations The priorities foraccessing the system bus are listed in Section 4.10.2, “Memory Unit Queuing Priorities.”The 601 ensures memory consistency by comparing target addresses and prohibiting

READ QUEUE

WRITE QUEUE

SNOOP

ADDRESS DATA

ADDRESS(from cache)

DATA(from cache)

SYSTEM INTERFACE

(to cache)

DATA QUEUE

(four word)


instructions from completing out of order if an address matches. Load and store operationscan be forced to execute in strict program order by inserting a sync instruction between eachsequential memory access instruction.

4.10.1 Memory Unit Queuing StructureThe memory queue receives requests from the cache unit for arbitration onto the 601 businterface. These requests may either be presented immediately to the bus interface logic orthey may be queued for future arbitration onto the bus. The memory queue consists of atwo-element read queue and a three-element write queue. Each write queue element canhold a sector of data (32 bytes) associated with a single address.

Some operations presented to the memory queue cannot be queued. These operationstypically require synchronization with respect to either the execution units, the cache, or thememory queue itself. In general, when these requests are presented and not arbitrateddirectly onto the bus, they stall above the cache (but do not necessarily prevent use of thecache) and attempt to re-arbitrate on the next cycle. These operations include the following:

• Cache control instructions that are broadcast • Execution of the tlbie instruction• Execution of the sync instruction• Execution of the eieio instruction• Cache requests for exclusive ownership when the sector is resident but not exclusive

in the cache

The memory queue also allows the optional loading of the sector adjacent to the onecontaining the critical data. As the memory read queue receives and processes cache sectorreload requests, it is advantageous to fetch the other sector if it is not already in the cacheunless fetching the other sector delays access to data required for the machine to continueprocessing. The memory unit logic detects whether other operations are pending; if not, itinitiates a fetch for the other sector. Note that this function can be disabled by setting bit 26in HID0 (for instruction fetch misses) and bit 27 in HID0 (for load/store misses).

4.10.2 Memory Unit Queuing PrioritiesThis section describes the priorities for access to the system interface:

1. High-priority cache push-out operations

2. Normal snoop push-out operations

3. I/O controller interface segment accesses that incur no additional delays (that is, they have not been retried because of latency).

4. Cache instruction operations

5. Read requests, such as loads, RWITMs, and instruction fetches

6. Single-beat write operations

7. sync instructions


8. Optional cache-line fill operations

9. Cache sector cast-out operations

10. I/O controller interface segment accesses that incur additional delays (that is, they have been retried because of external device latency)

4.10.3 Bus InterfaceThe bus interface logic sequences operations onto the 601 bus according to definedprotocols. The bus interface logic is also responsible for snooping other bus traffic,presenting these operations to the rest of the device for coherency considerations andreporting the appropriate snoop status onto the bus.

For additional information about the 601 bus interface and the bus protocols, refer toChapter 9, “System Interface Operation.”

4.11 MESI State TransactionsTable 4-7 shows MESI state transitions for various operations. The bus synchronizationcolumn indicates whether exclusivity is required. Bus operations are described inTable 4-5.

Table 4-7. MESI State Transitions

OperationCache Operation

Bus sync

WIMCurrent

StateNext State

Cache ActionsBus

Operation

Load or Fetch(T = 0)

Read No x0x I Same 1 Cast out of modified sector 1 (as required)

Write with kill

2 Pass four-beat read to memory queue

Read

3 Secondary cast out of sector 2 (as required)

Write with kill

Load or Fetch(T = 0)

Read No x0x S,E,M Same Read data from cache —

Load or FetchT = 0 or Load (T = 1, BUID = x'7F')

Read No x1x I Same Pass single-beat read to memory queue

Read


Read No x1x S,E I CRTRY read —


Read No x1x M I CRTRY read (push sector to write queue)

Write with kill


Load(T = 1,BUID ≠ x'7F')

I/O controller load

— x1x — — — I/O load

lwarx Read Acts like other reads but bus operation uses special encoding

Store(T = 0)

Write No 00x I Same 1 Cast out of modified sector (if necessary)

Write with kill

2 Pass RWITM to memory queue

RWITM

3 Cast out of sector 2 (if necessary)

Write with kill

Store(T = 0)

Write Yes 00x S Same 1 CRTRY write —

2 Pass kill Kill

M 3 Write data to cache —

Store(T = 0)

Write No 00x E,M M Write data to cache —

Store ≠ stwcx.(T = 0)

Write No 10x I Same Pass single-beat write to memory queue

Write with flush


Write No 10x S,E Same 1 Write data to cache —

2 Pass single-beat write to memory queue

Write with flush


Write No 10x M E 1 CRTRY write —

2 Push sector to write queue

Write with kill

Store (T = 0)or stwcx. (WIM = 10x)or store (T = 1, BUID = x'7F'

Write No x1x I Same Pass single-beat write to memory queue

Write with flush

Store (T = 0)or stwcx. (WIM = 10x)or store (T = 1, BUID = x'7F')

Write No x1x S,E I CRTRY write —

Store (T = 0)or stwcx. (WIM = 10x)or store (T = 1, BUID = x'7F')

Write No x1x M I 1 CRTRY write —


Write with kill

Store (T = 1, BUID ≠ x'7F')

I/O controller

No — — — — I/O store request

Table 4-7. MESI State Transitions (Continued)


Bus sync

WIMCurrent

StateNext State

Cache ActionsBus

Operation


stwcx. Conditional write

If the reserved bit is set, this operation is like other writes except the bus operation uses a special encoding.

dcbf Data cache block flush

Yes xxx I,S,E Same 1 CRTRY dcbf —

2 Pass flush Flush

Same I 3 State change only —

dcbf Data cache block flush

No xxx M I Push sector to write queue

Write with kill

dcbst Data cache block store

Yes xxx I,S,E Same 1 CRTRY dcbst —

2 Pass clean Clean

Same Same 3 No action —

dcbst Data cache block store

No xxx M E Push sector to write queue

Write with kill

dcbz Data cache block set to zero

No x1x x x Alignment trap —


No 10x x x Alignment trap —


Yes 00x I Same 1 CRTRY dcbz —

2 Cast out of modified sector

Write with kill

3 Pass kill Kill

4 Secondary cast out of sector 2

Write with kill

Same M 5 Clear sector —


Yes 00x S Same 1 CRTRY dcbz —

2 Pass kill Kill

Same M 3 Clear sector —


No 00x E,M M Clear sector —

dcbt Data cache block touch

No x1x I Same Pass single-beat read to memory queue

Read


No x1x S,E I CRTRY read —


No x1x M I 1 CRTRY read —


Write with kill



Bus sync

WIMCurrent

StateNext State

Cache ActionsBus

Operation



No x0x I Same 1 Cast out of modified sector (as required)

Write with kill

2 Pass four-beat read to memory queue

Read

3 Secondary cast out of sector (as required)

Write with kill


No x0x S,E,M Same No action —

Secondary cast out

Secondary cast out

No xxx I Same Cast out Write with kill

Single-beat read

Reload dump 1

No xxx I Same Forward data_in —

Four-beat read (quad-word 1)

Reload dump 1

No xxx I Same 1 Forward data_in —

2 Write data_in to cache

—

Four-beat read (quad-word 2)—S

Reload dump 2

No xxx I S Write data_in to cache —

Four-beat read (quad-word 2)—E

Reload dump 2

No xxx I E Write data_in to cache —

Four-beat write (quad-word 1)

Reload dump1

No xxx I Same 1 Splice and forward data_in

—

2 Write data_in to cache

—

Four-beat write (quad-word 2)

Reload dump 2

No xxx I M Write data_in to cache —

Optional reload of adjacent sector (quad-word 1)

Reload dump 1

No xxx I Same Write data_in to cache —

Optional reload of adjacent sector (quad-word 2)—S

Reload dump 2

No xxx I S Write data_in to cache —

Optional reload of adjacent sector (quad-word 2)—E

Reload dump 2

No xxx I E Write data_in to cache —



Bus sync

WIMCurrent

StateNext State

Cache ActionsBus

Operation


Note that dcbt is presented to the cache as a load operation. The instructions tlbie and sync/eieio cause no state transitions and are not cache operations but are included in the table to show how they are performed by the memory unit queueing mechanism.

Note also that single-beat writes are not snooped in the write queue.

E→S Snoopread

No xxx E S State change only (committed)

—

S→I Snoopwrite or kill

No xxx S I State change only (committed)

—

E→I Snoopwrite or kill

No xxx E I State change only (committed)

—

M→I Snoopkill

No xxx M I State change only (committed)

—

PushM→S

Snoopread

No xxx M S Conditionally push Write with kill

PushM→I

Snoopflush

No xxx M I Conditionally push Write with kill

PushM→E

Snoopclean

No xxx M E Conditionally push Write with kill

tlbie TLB invalidate

Yes xxx x x 1 CRTRY TLB invalidate

—

2 Pass TLB invalidate TLB invalidate

3 No action —

sync/eieio Synchronization

Yes xxx x x 1 CRTRY sync —

2 Pass sync dsync

3 No action —



Bus sync

WIMCurrent

StateNext State

Cache ActionsBus

Operation

Chapter 5. Exceptions 5-1

Chapter 5 Exceptions5050

The PowerPC exception mechanism allows the processor to change to supervisor state as aresult of external signals, errors, or unusual conditions arising in the execution ofinstructions. When exceptions occur, information about the state of the processor is savedto certain registers and the processor begins execution at an address (exception vector)predetermined for each exception. The exception handler at the specified vector is thenprocessed with the processor in supervisor mode.

Although multiple exception conditions can map to a single exception vector, a morespecific condition may be determined by examining a register associated with theexception—for example, the DAE/source instruction service register (DSISR) and thefloating-point status and control register (FPSCR). Additionally, some exception conditionscan be explicitly enabled or disabled by software.

Except for the catastrophic asynchronous exceptions (machine check and system reset) thePowerPC 601 microprocessor exception model is precise, defined as follows:

• The exception handler is given the address of the excepting instruction (or the next instruction to execute in the case of asynchronous, precise exceptions).

• All instructions prior to the excepting instruction in the instruction stream have completed execution and have written back their results.

• No instructions subsequent to the excepting instruction in the instruction stream have been issued.

A detailed description of how the instruction flow is handled in a precise fashion is providedin 7.3.1.4.4, “Synchronization Tags for the Precise Exception Model.”

The PowerPC architecture requires that exceptions be handled in program order; therefore,although a particular implementation may recognize exception conditions out of order, theyare presented strictly in order. When an instruction-caused exception is recognized, anyunexecuted instructions that appear earlier in the instruction stream, including any that havenot yet entered the execute state, are required to complete before the exception is taken. Anyexceptions caused by those instructions are handled first. Likewise, exceptions that areasynchronous and precise are recognized when they occur, but are not handled until allinstructions currently in the execute stage successfully complete execution and report theirresults.


Unless a catastrophic condition causes a system reset or machine check exception, only oneexception is handled at a time. If, for example, a single instruction encounters multipleexception conditions, those conditions are encountered sequentially. After the exceptionhandler handles an exception, the instruction execution continues until the next exceptioncondition is encountered. However, in many cases there is no attempt to reexecute theinstruction. This method of recognizing and handling exception conditions sequentiallyguarantees that exceptions are recoverable.

Exception handlers should save the information stored in SRR0 and SRR1 early to preventthe program state from being lost due to a system reset and machine check exception or toan instruction-caused exception in the exception handler, and before enabling externalinterrupts.

This chapter describes the 601 exception model, it explains each class of instruction, and itdescribes how the program state is saved for individual exceptions.

5.1 Exception ClassesAs specified by the PowerPC architecture, all 601 exceptions can be described as eitherprecise or imprecise and either synchronous or asynchronous. Asynchronous exceptions arecaused by events external to the processor’s execution; synchronous exceptions, which areall handled precisely by the 601, are caused by instructions.

The 601 exceptions are shown in Table 5-1.

Although exceptions have other characteristics as well, such as whether they are maskableor nonmaskable, the distinctions shown in Table 5-1 define categories of exceptions that the601 recognizes. Note that Table 5-1 includes no synchronous imprecise instructions. Whilethe PowerPC architecture supports imprecise floating-point exceptions, they do not occurin the 601.

Exceptions, and conditions that cause them, are listed in Table 5-2.

Table 5-1. PowerPC 601 Microprocessor Exception Classifications

Synchronous/Asynchronous Precise/Imprecise Exception Type

Asynchronous Imprecise Machine checkSystem reset

Asynchronous Precise External interruptDecrementer

Synchronous Precise Instruction-caused exceptions


Table 5-2. Exceptions, Vector Offsets, and Conditions

Exception Type

Vector Offset(hex)

Causing Conditions

Reserved 00000 —

System reset 00100 A system reset is caused by the assertion of either SRESET or HRESET.

Machine check 00200 A machine check is caused by the assertion of the TEA signal during a data bus transaction.

Data access 00300 The cause of a data access exception can be determined by the bit settings in the DSISR, listed as follows:1 Set if the translation of an attempted access is not found in the primary

hash table entry group (HTEG), or in the rehashed secondary HTEG, or in the range of a BAT register; otherwise cleared.

4 Set if a memory access is not permitted by the page or BAT protection mechanism described in Chapter 6, “Memory Management Unit”; otherwise cleared.

5 Set if the access was to an I/O segment (SR[T] =1) by an eciwx, ecowx, lwarx, stwcx., or lscbx instruction; otherwise cleared. Set by an eciwx or ecowx instruction if the access is to an address that is marked as write-through.

6 Set for a store operation and cleared for a load operation. 9 Set if an EA matches the address in the DABR while in one of the three

compare modes. 11 Set if eciwx or ecowx is used and EAR[E] is cleared.

Instruction access

00400 An instruction access exception is caused when an instruction fetch cannot be performed for any of the following reasons:• The effective (logical) address cannot be translated. That is, there is a page

fault for this portion of the translation, so an instruction access exception must be taken to retrieve the translation from a storage device such as a hard disk drive.

• The fetch access is to an I/O segment.• The fetch access violates memory protection. If the key bits (Ks and Ku) in

the segment register and the PP bits in the PTE or BAT are set to prohibit read access, instructions cannot be fetched from this location.

External interrupt

00500 An external interrupt occurs when the INT signal is asserted.

Alignment 00600 An alignment exception is caused when the 601 cannot perform a memory access for any of several reasons, such as when the operand of a floating-point load or store operation is in an I/O segment (SR[T] = 1) or a scalar load/store operand crosses a page boundary. Specific exception sources are described in Section 5.4.6, “Alignment Exception (x'00600').”


Program 00700 A program exception is caused by one of the following exception conditions, which correspond to bit settings in SRR1 and arise during execution of an instruction:• Floating-point enabled exception—A floating-point enabled exception

condition is generated when the following condition is met: (MSR[FE0] | MSR[FE1]) & FPSCR[FEX] is 1.

FPSCR[FEX] is set by the execution of a floating-point instruction that causes an enabled exception or by the execution of a “move to FPSCR” instruction that results in both an exception condition bit and its corresponding enable bit being set in the FPSCR.

• Illegal instruction—An illegal instruction program exception is generated when execution of an instruction is attempted with an illegal opcode or illegal combination of opcode and extended opcode fields (including PowerPC instructions not implemented in the 601), or when execution of an optional instruction not provided in the 601 is attempted (these do not include those optional instructions that are treated as no-ops). The PowerPC instruction set is described in Chapter 3, “Addressing Modes and Instruction Set Summary.”

• Privileged instruction—A privileged instruction type program exception is generated when the execution of a supervisor instruction is attempted and the MSR register user privilege bit, MSR[PR], is set. In the 601, this exception is generated for mtspr or mfspr with an invalid SPR field if SPR[0] = 1 and MSR[PR] = 1. This may not be true for all PowerPC processors.

• Trap—A trap type program exception is generated when any of the conditions specified in a trap instruction is met.


00800 A floating-point unavailable exception is caused by an attempt to execute a floating-point instruction (including floating-point load, store, and move instructions) when the floating-point available bit is disabled (MSR[FP] = 0).

Decrementer 00900 The decrementer exception occurs when the most significant bit of the decrementer (DEC) register transitions from 0 to 1. Must also be enabled with the MSR[EE] bit.

I/O controller interface error

00A00 An I/O controller interface error exception is taken only when an operation to an I/O controller interface segment fails (such a failure is indicated to the 601 by a particular bus reply packet). If an I/O controller interface exception is taken on a memory access directed to an I/O segment, the SRR0 contains the address of the instruction following the offending instruction. Note that this exception is not implemented in other PowerPC processors.

Reserved 00B00 —

System call 00C00 A system call exception occurs when a System Call (sc) instruction is executed.

Reserved 00D00 Other PowerPC processors may use this vector for trace exceptions.

Reserved 00E00 The 601 does not generate an interrupt to this vector. Other PowerPC processors may use this vector for floating-point assist exceptions.

Reserved 00E10–00FFF —

Reserved 01000–01FFF Reserved, implementation-specific

Table 5-2. Exceptions, Vector Offsets, and Conditions (Continued)

Exception Type

Vector Offset(hex)

Causing Conditions


5.1.1 Precise ExceptionsIn the 601, all synchronous exceptions and the asynchronous external interrupt anddecrementer exceptions are handled precisely; that is, all instructions that occur in theinstruction stream before the excepting event appear to complete and subsequentinstructions execute after the exception has been handled. When one of the 601’s preciseexceptions occurs, SRR0 is set to point to an instruction such that all prior instructions inthe instruction stream have completed execution and no subsequent instruction has begunexecution. However, depending on the exception type, the instruction addressed by SRR0may not have completed execution.

When an exception occurs, instruction dispatch (the issuance of instructions by theinstruction fetch unit to any instruction execution unit) is halted and the followingsynchronization is performed:

1. The exception mechanism waits for all previous instructions in the instruction stream to complete to a point where they report all exceptions they will cause.

2. The processor ensures that all previous instructions in the instruction stream complete in the context in which they began execution.

3. The exception mechanism is responsible for saving and restoring the processor state. After control passes back to the program that encountered the exception, there are no instructions in execute stage, and the user program instructions are dispatched and executed in this new context.


02000 The run mode exception is taken depending on the settings of the HID1 register and the MSR[SE] bit. The following modes correspond with bit settings in the HID1 register:• Normal run mode—No address breakpoints are specified, and the 601

executes from zero to three instructions per cycle• Single instruction step mode—One instruction is processed at a time. The

appropriate break action is taken after an instruction is executed and the processor quiesces.

• Limited instruction address compare—The 601 runs at full speed (in parallel) until the EA of the instruction being decoded matches the EA contained in HID2. Addresses for branch instructions and floating-point instructions may never be detected.

• Full instruction address compare mode—Processing proceeds out of IQ0. When the EA in HID2 matches the EA of the instruction in IQ0, the appropriate break action is performed. Unlike the limited instruction address compare mode, all instructions pass through the IQ0 in this mode. That is, instructions cannot be folded out of the instruction stream.

The following mode is taken when the MSR[SE] bit is set.• MSR[SE] trace mode—Note that in other PowerPC implementations, the

trace exception is a separate exception with its own vector x'00D00'.

Reserved 02001–03FFF —

Table 5-2. Exceptions, Vector Offsets, and Conditions (Continued)

Exception Type

Vector Offset(hex)

Causing Conditions


5.1.1.1 Synchronous/Precise ExceptionsIn the 601, all exceptions caused by instructions are precise. When instruction executioncauses a precise exception, the following conditions exist at the exception point:

• Depending on the type of exception, SRR0 addresses either the instruction causing the exception or the immediately following instruction. The instruction addressed can be determined from the exception type and status bits, which are described with the description of each exception.

• All instructions that precede the excepting instruction are allowed to complete before the exception is processed. However, some memory accesses generated by these preceding instructions may not have been performed with respect to all other processors or system devices.

• The instruction causing the exception may not have begun execution, may have partially completed, or may have completed, depending on the exception type.

• No subsequent instructions in the instruction stream complete execution.

Note that other PowerPC microprocessors may support optional imprecise floating-pointexception modes. While parallel processing allows the possibility of two instructionsreporting exceptions during the same cycle, they are handled in program order. If a singleinstruction generates multiple exception conditions, those exceptions are handledsequentially, as described in Section 5.1.3, “Recognition of Exceptions.” Exceptionpriorities are described in Section 5.1.2, “Exception Priorities.”

5.1.1.2 Asynchronous/Precise ExceptionsThe 601 supports two asynchronous, precise exceptions—external interrupt anddecrementer exceptions. For asynchronous exceptions, the following conditions exist at theexception point:

• All instructions issued before the event that caused the exception, and any undispatched instructions that precede those instructions in the instruction stream, appear to have completed before the exception is processed. However, some memory accesses generated by these preceding instructions may not have been performed with respect to all other processors or system devices.

• SRR0 addresses the instruction that would have been executed next had the exception not occurred.

• Architecturally, no subsequent instructions in the instruction stream complete execution.

These two exceptions are maskable. When the machine state register external interruptenable bits are cleared (MSR[EE] = 0), these exception conditions are latched and are notrecognized until the EE bit is set. MSR[EE] is cleared automatically when an exception istaken to delay recognition of conditions causing asynchronous, precise exceptions. No twoprecise exceptions can be recognized simultaneously. Handling of an asynchronous,precise exception does not begin until all currently executing instructions complete and anysynchronous, precise exceptions caused by those instructions have been handled, as


described in Section 5.1.3, “Recognition of Exceptions.” Exception priorities are describedin Section 5.1.2, “Exception Priorities.”

5.1.1.3 Asynchronous, Imprecise ExceptionsThere are two asynchronous, imprecise exceptions—system reset and machine check.These two exceptions have the highest priority and can occur while other exceptions arebeing processed. Note that asynchronous, imprecise exceptions cannot be masked;therefore, if two of these exceptions occur in immediate succession, the state informationsaved by the first exception may be overwritten when the subsequent exception occurs.

These exceptions cannot be masked by using the MSR[EE] bit. A machine check exceptionoccurs if the machine check enable bit, MSR[ME], is set. If MSR[ME] is cleared, theprocessor goes directly into checkstop state (if the machine-check checkstop condition isproperly enabled). When an imprecise exception occurs, the following conditions exist atthe exception point:

• The integer instruction pipeline acts as the synchronizing mechanism for the three pipelines (branch, floating-point, and integer). When an asynchronous interrupt occurs, integer instructions that have entered or passed through the integer execute stage and the instructions tagged to them are allowed to complete, no other instructions are allowed to start execution and synchronization. The value in SRR0 is the address of the instruction, which would execute after the last instruction that was actually completed.

For more information on the tagging mechanism, refer to Section 7.3.1.4.4, “Synchronization Tags for the Precise Exception Model.”

• SRR0 addresses either the instruction that would have completed or some instruction following it that would have completed if the exception had not occurred.

• An exception is generated such that all instructions preceding the instruction addressed by SRR0 appear to have completed with respect to the executing processor.

5.1.2 Exception PrioritiesThis section describes how exceptions are prioritized. Exceptions are roughly prioritized byexception class, as follows:

1. Asynchronous, imprecise exceptions have priority over all other exceptions. These exceptions do not wait for the completion of any precise exception handling.

2. Synchronous, precise exceptions are caused by instructions and are presented in strict program order.

3. Asynchronous, precise exceptions (external interrupt and decrementer exceptions) are delayed until higher priority exceptions are presented.


The exceptions are listed in Table 5-3 in order of highest to lowest priority.

5.1.3 Recognition of Exceptions The order in which exceptions are recognized is determined by program order and whetherthe exception is synchronous or asynchronous, precise or imprecise, and masked ornonmasked.

Synchronous, precise exceptions (that is, exceptions that are caused by instructions) arehandled in strict program order, even though instructions can execute and exceptions canbe detected out of order. Therefore, before the 601 processes an instruction-causedexception, it executes all instructions, and handles any resulting exceptions, that appearearlier in the instruction stream.

A single instruction may generate multiple exception conditions. Of these exceptions, the601 handles the exception it encounters first; software determines if execution ofinstructions is resumed after an exception.

If the exception is asynchronous and precise (namely an external interrupt or decrementerexception), the 601 synchronizes the pipeline by completing the execution of any

Table 5-3. Exception Priorities

Exception Class

Priority Exception

Asynchronous,imprecise

1 System reset—The system reset exception has the highest priority of all exceptions. If this exception exists, the exception mechanism ignores all other exceptions and generates a system reset exception. Instructions issued before the generation of a system reset exception cannot generate a nonmaskable exception.

2 Machine check—The machine check exception is the second-highest priority exception. If this exception occurs, the exception mechanism ignores all other exceptions (except reset) and generates a machine check exception. Instructions issued before the generation of a machine check exception cannot generate a nonmaskable exception.

Synchronous, precise

3 Instruction dependent— When an instruction causes an exception, the exception mechanism waits for any instructions prior to the excepting instruction in the instruction stream to execute. Any exceptions caused by these instructions are handled first. It then generates the appropriate exception if no higher priority exception exists.

Note that a single instruction can cause multiple exceptions. The ordering of such exceptions is described in 5.1.3, “Recognition of Exceptions.”

Asynchronous,precise

4 External interrupt—The external interrupt mechanism waits for instructions currently dispatched to complete execution. After all dispatched instructions are executed, and any exceptions caused by those instructions are handled, the exception mechanism generates this exception if no higher priority exception exists. This exception is delayed while MSR[EE] is cleared.

5 Decrementer—This exception is the lowest priority exception. When this exception is created, the exception mechanism waits for all other possible exceptions to be reported. It then generates this exception if no higher priority exception exists. This exception is delayed while MSR[EE] is cleared.


instruction in the execute stage and any undispatched instructions that appear earlier in theinstruction stream (including any exceptions they generate) before handling the externalinterrupt or decrementer exceptions.

5.1.3.1 Recognition of Asynchronous, Imprecise ExceptionsExceptions that are nonmasked, imprecise, and asynchronous (namely system reset ormachine check exceptions) may occur at any time. That is, these exceptions are not delayedif another exception is being handled. As a result, state information for the interruptedexception may be lost; therefore, these exceptions are typically nonrecoverable.

All other exceptions have lower priority than system reset and machine check exceptions.

5.1.3.2 Recognition of Precise ExceptionsOnly one precise exception can be reported at a time. (Note that PowerPC implementationsthat support imprecise-mode floating-point enabled exceptions allow those to be handled inthe same manner as described in this section.)

Figure 5-1 illustrates the ordering of precise exceptions. Note that this ordering is on a per-instruction basis. If a precise, asynchronous exception condition occurs while instruction-caused exceptions are being processed, its handling is delayed until all instruction-causedexceptions are handled.


Figure 5-1. Recognition of Precise Exception Conditions

5.2 Exception ProcessingWhen an exception is taken, the 601 uses the save/restore registers, SRR0 and SRR1, tosave the contents of the machine state register and to identify where instruction executionshould resume after the exception is handled. The machine status save/restore register 0(SRR0) is a 32-bit register that the 601 uses to save either the address of the instruction thatcauses the exception, the one that follows, or the next instruction that would have executedin the case of an asynchronous, imprecise exception. This address is used typically when aReturn from Interrupt (rfi) instruction is executed. The SRR0 is shown in Figure 5-2.

Instruction Access

Trap System Call

Data Access

FP Unavailable

Alignment

External Interrupt

Decrementer

Floating Point1

1Not all floating-point instructions can cause enabled exceptions.2If the MSR bits FE0 and FE1 are set such that precise mode floating-point enabled exceptions are enabled and the FPSCR[FEX] bit is set, a program exception results.

3Floating-point precise exceptions are taken only when either MSR[FE0] or MSR[FE1] are set.

Program Exception(Illegal/Privileged Instruction)

Integer

Precise Mode FP Enabled2

rfi or mtmsr

I/O Cont I/FError

FP Alignment

Data Access

Privileged Instruction

Run Mode and Trace3

Floating Point1



When an exception occurs, SRR0 is set to point to an instruction such that all priorinstructions have completed execution and no subsequent instruction has begun execution.The instruction addressed by SRR0 may not have completed execution, depending on theexception type. SRR0 addresses either the instruction causing the exception or theimmediately following instruction. The instruction addressed can be determined from theexception type bits.

The SRR1 is a 32-bit register used to save machine status on exceptions and to restoremachine status when rfi is executed. The SRR1 is shown in Figure 5-3.


In general, when an exception occurs, bits 0–15 of SRR1 are loaded with exception-specificinformation and bits 16–31 of the machine state register (MSR) are placed into bits 16–31of SRR1. The machine state register is shown in Figure 5-4.

.

Figure 5-4. Machine State Register (MSR)

Table 5-4 shows the bit definitions for the MSR.

Table 5-4. Machine State Register Bit Settings


0–15 — Reserved

16 EE External interrupt enable 0 The processor delays recognition of external interrupts and decrementer exception

conditions. 1 The processor is enabled to take an external interrupt or the decrementer exception.

17 PR Privilege level 0 The processor can execute both user and supervisor instructions.1 The processor can only execute user-level instructions.

SRR0 (Holds EA for resuming program execution)

0 31

0 15 16 31

Exception-Specific Data MSR[16–31]

0 15 16 17 18 19 20 2122 23 24 25 26 27 28 29 30 31

Reserved

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EE PR FP ME FE0 SE 0 FE1 0 EP IT DT 0 0 0 0


18 FP Floating-point available 0 The processor prevents dispatch of floating-point instructions, including floating-point

loads, stores, and moves. Floating-point enabled program exceptions can still occur and the FPRs can still be accessed.

1 The processor can execute floating-point instructions, and can take floating-point enabled exception type program exceptions.

19 ME Machine check enable 0 Machine check exceptions are disabled. If this bit is cleared, the 601 attempts to go into

checkstop mode. Note however, if is not enabled in HID0, the machine check exception is taken.

1 Machine check exceptions are enabled.

20 FE0 Floating-point exception mode 0 (See Table 5-5.)

21 SE Single-step trace enable0 The processor executes instructions normally. 1 The processor generates a trace exception upon the successful execution of the next

instruction. When this bit is set, the processor dispatches instructions in strict program order. Successful execution means the instruction caused no other exception. Single-step tracing may not be present on all implementations. If the function is not implemented, MSR[SE] should be treated as a reserved MSR bit: mfmsr may return the last value written to the bit, or may return 0 always.


23 FE1 Floating-point exception mode 1 (See Table 5-5.)

24 — Reserved. This bit corresponds to the AL bit of the POWER Architecture.

25 EP Exception prefix. The setting of this bit specifies whether an exception vector offset is prepended with Fs or 0s. In the following description, nnnnn is the offset of the exception. See Table 5-2.0 Exceptions are vectored to the physical address x'000n_nnnn'. 1 Exceptions are vectored to the physical address x'FFFn_nnnn'.

26 IT Instruction address translation 0 Instruction address translation is disabled. When instruction address translation is

disabled, EA is interpreted as described in Chapter 6, “Memory Management Unit.”1 Instruction address translation is enabled.

27 DT Data address translation 0 Data address translation is disabled. When data relocation is off, EA is interpreted as

described in Chapter 6, “Memory Management Unit.”1 Data address translation is enabled.




*These reserved bits may be used by other PowerPC processors. Attempting to change these bits does not affect the operation of the processor. These bit positions always return a zero value when read.

Table 5-4. Machine State Register Bit Settings (Continued)



The floating-point exception mode bits are interpreted as shown in Table 5-5. For furtherdetails see Section 5.4.7.1, “Floating-Point Enabled Program Exceptions.”

MSR bits 16–31 are guaranteed to be written to SRR1 when the first instruction of theexception handler is encountered.

The data address register (DAR) is a 32-bit register used by several exceptions (data access,I/O controller interface error, and alignment) to identify the address of a memory element.

5.2.1 Enabling and Disabling ExceptionsWhen a condition exists that causes an exception to be generated, it must be determinedwhether the exception is enabled for that condition.

• Floating-point enabled exceptions (a type of program exception) can be disabled by clearing both MSR[FE0] and MSR[FE1]. If either or both of these bits are set, all floating-point exceptions are taken and are precise and cause a program exception. Other PowerPC processors may support imprecise floating-point exceptions. Individual conditions that can generate floating-point exceptions can be enabled and disabled with bits in the FPSCR register.

• Asynchronous, precise exceptions are enabled by setting the MSR[EE] bit. When MSR[EE] = 0, recognition of these exception conditions is delayed. MSR[EE] is cleared automatically when an exception is taken to delay recognition of conditions causing those exceptions.

• A machine check exception can only occur if the machine check enable bit, MSR[ME], is set. If MSR[ME] is cleared, the processor goes directly into checkstop state when a machine-check exception condition occurs.

• The run mode exception, which is used to set an instruction breakpoint, can be enabled and disabled using bits 8 and 9 of HID1 (HID1[RM]). Note that this stops the processor and is used for debug purposes only.

• The data address breakpoint can be enabled and disabled using bits 30 and 31 of the DABR (HID5[SA]).

• System reset exceptions cannot be masked.

Table 5-5. Floating-Point Exception Mode Bits

FE0 FE1 Mode

0 0 Floating-point exceptions disabled

0 1 Floating-point imprecise nonrecoverable *

1 0 Floating-point imprecise recoverable*

1 1 Floating-point precise mode

* Not implemented in the 601


5.2.2 Steps for Exception Processing After it is determined that the exception can be taken (by confirming that any instruction-caused exceptions occurring earlier in the instruction stream have been handled, and byconfirming that the exception is enabled for the exception condition), the 601 does thefollowing:

1. The SRR0 is loaded with an instruction address that depends on the type of exception. See the individual exception description for details about how this register is used for specific exceptions.

2. Bits 0–15 of SRR1 are loaded with 16 bits of information specific to the exception type or all zeros.

3. Bits 16–31 of SRR1 are loaded with a copy of bits 16–31 of the MSR.

4. The MSR is set as described in Table 5-4. The new values take effect beginning with the fetching of the first instruction of the exception-handler routine located at the exception vector address.

Note that MSR[IT] and MSR[DT] are cleared for all exception types; therefore, address translation is disabled for both instruction fetches and data accesses beginning with the first instruction of the exception-handler routine.

5. Instruction fetch and execution resumes, using the new MSR value, at a location specific to the exception type. The location is determined by adding the exception's vector (see Table 5-2) to the base address determined by MSR[EP]. If EP is cleared, exceptions are vectored to the physical address x'000n_nnnn'. If EP is set, exceptions are vectored to the physical address x'FFFn_nnnn'. For a machine check exception that occurs when MSR[ME] = 0 (machine check exceptions are disabled), the checkstop state is entered (the machine stops executing instructions). See Section 5.4.2, “Machine Check Exception (x'00200').”

6. The lwarx and stwcx. instructions require special handling if a reservation is still set when an exception occurs. Exceptions clear reservations set with lwarx (or ldarx).

5.2.3 Returning from Supervisor ModeThe Return from Interrupt (rfi) instruction performs context synchronization by allowingpreviously issued instructions to complete before returning to the interrupted process.Execution of the rfi instruction ensures the following:

• All previous instructions have completed to a point where they can no longer cause an exception. If a prior instruction causes an I/O controller interface error exception, the results must be determined before this instruction is executed.

• Previous instructions complete execution in the context (privilege, protection, and address translation) under which they were issued.

• The instructions following this instruction execute in the context established by this instruction.


5.3 Process SwitchingThe operating system should execute the following when processes are switched:

• The sync instruction, to resolve any data dependencies between the processes and to synchronize the use of segment registers and SPRs. For an example showing use of the sync instruction, see Section 2.3.3.1, “Synchronization for Supervisor-Level SPRs and Segment Registers.”

• The isync instruction, to ensure that undispatched instructions not in the new process are not used by the new process

• The stwcx. instruction, to clear any outstanding reservations, which ensures that an lwarx instruction in the old process is not paired with an stwcx. in the new process.

Note that if an exception handler is used to emulate an instruction that is not implementedin the 601, the exception handler must report in SRR0 (and in the data address register[DAR] if applicable) the EA computed by the instruction being emulated and not one usedto emulate the instruction being emulated. For example, the Move to Time Base instruction(mttb) was created for compatibility between the 601 and other PowerPC processors. The601 implements a real-time clock (RTC) rather than the time base (TB), and when thisinstruction is encountered by the 601 an illegal instruction program exception is taken andthe operation is performed by the 601’s mtspr instruction that accesses the appropriateRTC register. When the exception caused by the mttb instruction is taken, the EA reportedshould be that computed by the mttb instruction and not the mtspr instruction used in theexception handler.

5.4 Exception DefinitionsTable 5-6 shows all the types of exceptions that can occur with the 601 and the MSR bitsettings when the processor transitions to supervisor mode. The state of these bits prior tothe exception is typically stored in SRR1.

Table 5-6. MSR Setting Due to Exception

Exception Type

MSR Bit

EE16

PR17

FP18

ME19

FE020

SE21

FE123

EP25

IT26

DT27

Soft reset 0 0 0 — 0 0 0 — 0 0

Machine check 0 0 0 0 0 0 0 — 0 0

Data access 0 0 0 — 0 0 0 — 0 0

Instruction access 0 0 0 — 0 0 0 — 0 0

External 0 0 0 — 0 0 0 — 0 0

Alignment 0 0 0 — 0 0 0 — 0 0

Program 0 0 0 — 0 0 0 — 0 0


The setting of the exception prefix (EP) bit in the MSR determines how exceptions arevectored. If the bit is cleared, exceptions are vectored to the physical address x'000n_nnnn'(where nnnnn is the vector offset); if EP is set, exceptions are vectored to the physicaladdress x'FFFn_nnnn'. Table 5-2 shows the exception vector offset of the first instructionof the exception handler routine for each exception type.

5.4.1 Reset Exceptions (x'00100')The system reset exception is a nonmaskable, asynchronous exception signaled to the 601either through the assertion of either of the reset signals (SRESET or HRESET). Theassertion of the soft reset signal, SRESET, as described in Section 8.2.9.4.2, “Soft Reset(SRESET)—Input” causes the soft reset exception to be taken and the physical baseaddress of the handler is determined by the MSR[EP] bit. The assertion of the hard resetsignal, HRESET, as described in Section 8.2.9.4.1, “Hard Reset (HRESET)—Input” causesthe hard reset exception to be taken and the physical address of the handler is always x'FFF00100'.

5.4.1.1 Soft ResetWhen a soft reset exception occurs, registers are set as shown in Table 5-7.


0 0 0 — 0 0 0 — 0 0

Decrementer 0 0 0 — 0 0 0 — 0 0

System call 0 0 0 — 0 0 0 — 0 0


0 0 0 — 0 0 0 — 0 0

I/O controller interface error exception

0 0 0 — 0 0 0 — 0 0

0 Bit is cleared1 Bit is set— Bit is not alteredReserved bits are read as if written as 0.

Table 5-6. MSR Setting Due to Exception (Continued)

Exception Type

MSR Bit

EE16

PR17

FP18

ME19

FE020

SE21

FE123

EP25

IT26

DT27


When a soft reset exception is taken, instruction execution resumes at offset x'00100' fromthe physical base address indicated by MSR[EP].

Before returning to the main program, the exception handler should do the following:

1. SRR0 and SRR1 should be given the values used by the rfi instruction. 2. Execute rfi.

It is not guaranteed that execution is recoverable. Other registers and the MSR are not resetby hardware.

5.4.1.2 Hard ResetThis section describes the 601’s reset state after performing a hard reset operation (assertingHRESET as described in Section 8.2.9.4.1, “Hard Reset (HRESET)—Input”). Note that ahard reset operation should be performed on power-on to appropriately reset the processor.Table 5-8 shows the state of the machine just before it fetches the first instruction after ahard reset. Because of the setting of the MSR[EP] bit caused by a hard reset, the firstinstruction is fetched from address x'FFF0 0100'.

Table 5-7. Soft Reset Exception—Register Settings

Register Setting Description

SRR0 Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.

SRR1 0–15 Cleared16–31 Loaded from bits 16–31 of the MSR. Note that if the processor state is corrupted to the extent

that execution cannot be reliably restarted, SRR1[30] is cleared.

MSR EE 0 SE 0PR 0 FE1 0FP 0 EP —ME — IT 0FE0 0 DT 0

Table 5-8. Settings Caused by Hard Reset

Register Setting

GPRs All 0s

FPRs All 0s

FPSCR 00000000

CR All 0s

SRs All 0s

MSR 00001040

MQ 00000000

XER 00000000


1 Early releases (DD1) of the 601 hardware set this to x'00010000'. Other versions of silicon may be different (see Section 2.3.3.11, “Processor Version Register (PVR)” for setting information).

2 Master checkstop enabled; internal power-on reset checkstops enabled.

3 Note that if external clock is connected to RTC for the 601, then the RTCL, RTCU, and DEC registers can change from their initial value of 0s without receiving instructions to load those registers.

RTCU 00000000

RTCL 000000003

LR 00000000

CTR 00000000

DSISR 00000000

DAR 00000000

DEC 00000000

SDR1 00000000

SRR0 00000000

SRR1 00000000

SPRGs 00000000

EAR 00000000

PVR 000100011

BATs All 0s

HID0 800100802

HID1 00000000

HID2 00000000

HID5 00000000

HID15 00000000

TLBs All 0s

Cache All 0s

Tag directory All 0s. (However, LRU bits are initialized so each side of the cache has a unique LRU value.)

Table 5-8. Settings Caused by Hard Reset (Continued)

Register Setting


The following is also true after a hard reset operation:

• External checkstops are enabled.

• The on-chip test interface has given control of the I/Os to the rest of the chip for functional use.

• Since the reset exception has data and instruction translation disabled (MSR[DT] and MSR[IT] both cleared), the chip operates in direct address translation mode. This implies that instruction fetches as well as loads and stores are cacheable. (Operations that correspond to direct address translations are implicitly cacheable, not write-through mode, and require coherency checking on the bus.)

• All internal arrays and registers are cleared during the hard reset process.

5.4.2 Machine Check Exception (x'00200')The 601 conditionally initiates a machine-check exception after detecting the assertion ofthe TEA signal on the 601 interface. The assertion of the TEA signal indicates that a buserror occurred and the system terminates the current transaction. One clock cycle after TEAis asserted, the data bus signals go to the high-impedance state; however, data entering theGPR or the cache is not invalidated.

If the MSR[ME] bit is set, the exception is recognized and handled; otherwise, the 601attempts to enter an internal checkstop condition. This may not lead to a checkstopdepending upon the state of the various checkstop enable control bits in the HID0 register.These are shown in Table 5-9. If the ME bit is cleared, and the HID0[CE] and HID0[EM]bits are cleared (that is, when both the master checkstop and the machine check checkstopsare disabled), the machine check exception is taken.

The ability to disable the machine-check checkstop is useful for debugging. The HID0register is described in Section 2.3.3.13.1, “Checkstop Sources and Enables Register—HID0.”

In general, it is expected that the TEA signal would be used by a memory controller toindicate a memory parity error or an uncorrectable memory ECC error. Note that theresulting machine check exception is imprecise and has priority over any exceptions causedby the instruction that generated the bus operation.

Machine check exceptions are enabled when MSR[ME] = 1; this is described inSection 5.4.2.1, “Machine Check Exception Enabled (MSR[ME] = 1).” If MSR[ME] = 0and a machine check occurs, the processor enters the checkstop state. Checkstop state isdescribed in 5.4.2.2, “Checkstop State (MSR[ME] = 0).”


5.4.2.1 Machine Check Exception Enabled (MSR[ME] = 1)When a machine check exception is taken, registers are updated as shown in Table 5-9.

When a machine check exception is taken, instruction execution resumes at offset x'00200'from the physical base address indicated by MSR[EP].

Before returning to the main program, the exception handler should do the following:

1. SRR0 and SRR1 should be given the values to be used by the rfi instruction. 2. Execute rfi.

5.4.2.2 Checkstop State (MSR[ME] = 0)When a processor is in the checkstop state, instruction processing is suspended andgenerally cannot be restarted without resetting the processor. The contents of all latches arefrozen within two cycles upon entering checkstop state so that the state of the processor canbe analyzed as an aid in problem determination.

A machine check exception may result from referring to a nonexistent physical address. Insome implementations, for example, execution of a Data Cache Block Set to Zero (dcbz)instruction that introduces a block into the cache associated with a nonexistent physicaladdress may delay the machine check exception until an attempt is made to store that blockto main memory.

Note that not all PowerPC processors provide the same level of error checking. The reasonsa processor can enter checkstop state is implementation-dependent.

Table 5-9. Machine Check Exception—Register Settings


SRR0 Set to the address of the next instruction that would have been executed in the interrupted instruction stream. Neither this instruction nor any others beyond it will have been executed. All preceding instructions will have been completed.

SRR1 0–15 Cleared16–31 Loaded from MSR[16–31]. Note that if the processor state is corrupted to the extent that

execution cannot be reliably restarted, SRR1[30] is cleared.

MSR EE 0PR 0FP 0ME 0Note that when a machine check exception is taken, the exception handler should set MSR[ME] as soon as it is practical to handle another TEA assertion. Otherwise, subsequent TEA assertions cause the processor to automatically enter the checkstop state. FE0 0SE 0FE1 0EP Value is not alteredIT 0DT 0


5.4.3 Data Access Exception (x'00300')A data access exception occurs when no higher priority exception exists and a data memoryaccess cannot be performed. The condition that caused the data access exception can bedetermined by reading the DAE/source instruction service register (DSISR), a supervisor-level SPR (SPR18) that can be read by using the mfspr instruction. Bit settings areprovided in Table 5-10. Table 5-10 also indicates which memory element is saved to theDAR. Data access exceptions can occur for any of the following reasons:

• The effective address cannot be translated. That is, there is a page fault for this portion of the translation, so a data access exception must be taken to retrieve the translation from a storage device such as a hard disk drive.

• The instruction is not supported for the type of memory addressed. Invalid instructions are described in Section D.1.1.1, “Invalid Instruction Forms.” I/O controller interface segments are described in Section 9.6, “Memory- vs. I/O-Mapped I/O Operations.”

• The access violates memory protection. Access is not permitted by the key (Ks and Ku) and PP bits, which are set in the segment register and PTE for page protection and in the BATs for block protection.

• The execution of an eciwx or ecowx instruction is disallowed because the external access register enable bit (EAR[E]) is cleared.

These scenarios are common among all PowerPC processors. The following additionalscenarios can cause a data access exception in the 601:

• An lwarx, stwcx., or lscbx instruction refers to a non–memory-forced I/O controller interface segment (that is, when SR[T] = 1 and BUID ≠ x'07F').

• An effective address matches the address in the data-address breakpoint register (DABR) while in one of the appropriate compare modes. For additional information on the DABR and the compare modes, refer to Section 2.3.3.13.4, “Data Address Breakpoint Register (DABR)—HID5.”

Data access exceptions can be generated by load/store instructions, and the cache controlinstructions (dcbi, dcbz, dcbst, and dcbf).

Although the 601 does not generally support memory accesses that cross a page boundary,load or store multiple as well as load or store string instructions that are word-aligned andcross a page boundary are handled. In these cases, if the second page has a translation erroror protection violation associated with it, the 601 takes the data access exception in themiddle of the instruction. In this case, the data address register (DAR) always points to thefirst byte address of the offending page.


If an stwcx. instruction has an effective address for which a normal store operation wouldcause a data access exception but the processor does not have the reservation from lwarx,the 601 determines whether a data access exception occurs as follows:

• If the reservation bit is cleared before the stwcx. instruction executes, that instruction cannot generate an exception, regardless of whether the address translation would have failed, page protection would have been violated, or the address matches one in the DABR.

• A data access exception is taken if there is an address translation or page protection error, or if the address hits in the DABR as long as the reservation bit is set when the stwcx. instruction begins execution. In particular, the exception is taken even if the reservation bit is cleared after execution begins.

If the XER indicates that the byte count for an lswi, stswi or lscbx instruction is zero, a dataaccess exception does not occur, regardless of the effective address.

The condition that caused the exception is defined in the DSISR. These conditions also usethe data address register (DAR) as shown in Table 5-10.

Table 5-10. Data Access Exception—Register Settings


SRR0 Set to the effective address of the instruction that caused the exception.

SRR1 0–15 Cleared16–31 Loaded from bits 16–31 of the MSR

MSR EE 0 PR 0FP 0 ME Value is not alteredFE0 0 SE 0FE1 0 EP Value is not alteredIT 0 DT 0

DSISR 0 Reserved on the 601. The PowerPC architecture uses this bit for I/O controller interface error exceptions, which are vectored to x'00A00' on the 601.

1 Set if the translation of an attempted access is not found in the primary hash table entry group (HTEG), or in the rehashed secondary HTEG, or in the range of a BAT register; otherwise cleared.

2–3 Cleared4 Set if a memory access is not permitted by the page or BAT protection mechanism; otherwise

cleared. 5 Set if the eciwx, ecowx, lwarx, stwcx., or lscbx instruction is attempted to I/O controller

interface space, or if the lwarx or stwcx. instruction is used with addresses that are marked as write-through.

6 Set for a store operation and cleared for a load operation. 7–8 Cleared9 Set if an EA matches the address in the DABR while in one of the three compare modes. 10 Cleared.11 Set if the instruction was an eciwx or ecowx and EAR[E] = 0. 12–31 Cleared


When a data access exception is taken, instruction execution resumes at offset x'00300'from the physical base address indicated by MSR[EP].

The architecture permits certain instructions to be partially executed when they cause a dataaccess exception. These are as follows:

• Load multiple or load string instructions—Some registers in the range of registers to be loaded may have been loaded. On the 601, all of the first page is accessed and none of the second page is accessed.

• Store multiple or store string instructions—Some bytes of memory in the range addressed may have been updated. On the 601, all of the first page is accessed and none of the second page is accessed.

In the cases above, the questions of how many registers and how much memory is alteredare instruction- and boundary-dependent. However, memory protection is not violated.Furthermore, if some of the data accessed is in memory-forced I/O controller interfacespace (SR[T] = 1) and BUID = x'7F', and the instruction is not supported for I/O controllerinterface accesses, the locations in I/O controller interface space are not accessed.

For update forms, the update register (rA) is not altered.

5.4.4 Instruction Access Exception (x'00400')An instruction access exception occurs when no higher priority exception exists and anattempt to fetch the next instruction to be executed cannot be performed for any of thefollowing reasons:

• The effective address cannot be translated. That is, there is a page fault for this portion of the translation, so an instruction access exception must be taken to retrieve the translation from a storage device such as a hard disk drive.

• The fetch access is to an I/O controller interface segment that is not memory-forced.

• The fetch access violates memory protection. Access is not permitted by the key bits (Ks and Ku) and PP bits, which are set in the segment register and PTE for page protection and in the BATs for block protection.

An instruction fetch to an I/O controller interface segment while MSR[IT] is set causes aninstruction access exception on the 601. Register settings for instruction access exceptionsare shown in Table 5-11.

DAR Set to the effective address of a memory element as described in the following list:• A byte in the first word accessed in the page that caused the data access exception, for a byte, half

word, or word memory access.• A byte in the first double word accessed in the page that caused the data access exception, for a

double-word memory access.

Table 5-10. Data Access Exception—Register Settings (Continued)



When an instruction access exception is taken, instruction execution resumes at offsetx'00400' from the physical base address indicated by MSR[EP].

5.4.5 External Interrupt (x'00500')An external interrupt is signaled to the 601 by the assertion of the INT signal as describedin Section 8.2.9.1, “Interrupt (INT)—Input.” The interrupt may be delayed by other higherpriority exceptions or if the MSR[EE] bit is cleared when the exception occurs.

After the INT is detected, the 601 stops dispatching instructions and waits for executinginstructions to complete. Therefore, exceptions caused by instructions in progress are takenbefore the external interrupt exception is taken. After all instructions complete, the 601takes the external interrupt exception.

The register settings for the external interrupt exception are shown in Table 5-12.

Table 5-11. Instruction Access Exception—Register Settings

Register Setting

SRR0 Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present (if the exception occurs on attempting to fetch a branch target, SRR0 is set to the branch target address).

SRR1 0 Cleared1 Set if the translation of an attempted access is not found in the primary hash table entry group

(HTEG), or in the rehashed secondary HTEG, or in the range of a BAT register; otherwise cleared.

2 Cleared3 Cleared. Note that the PowerPC architecture defines this as set if the fetch access was to an I/O

controller interface segment (SR[T]=1). Note that this condition causes SRR1[0–15] to be cleared in the 601.

4 Set if a memory access is not permitted by the page or BAT protection mechanism, described in Chapter 6, “Memory Management Unit”; otherwise cleared.

5–9 Cleared 10 Set if the page table search fails to find a translation for the effective address; otherwise cleared. 11–15 Cleared 16–31 Loaded from bits 16–31 of the MSR

MSR EE 0 SE 0PR 0 FE1 0FP 0 EP Value is not alteredME Value is not altered IT 0FE0 0 DT 0


When an external interrupt exception is taken, instruction execution resumes at offsetx'00500' from the physical base address indicated by MSR[EP].

In early versions of the 601 (processor revision level x'0000'), the external interrupt is alevel-sensitive signal and should be held active until reset by the interrupt service routine.Phantom interrupts due to phenomena such as crosstalk and bus noise should be avoided.

The INT signal is used to signal an external interrupt to the 601. The 601 latches theinterrupt condition if the MSR[EE] bit is set; and it ignores the interrupt condition if theMSR[EE] bit is cleared. To guarantee that the external interrupt is taken, the INT signalmust be held active until the 601 takes the interrupt. If the INT signal is negated before theinterrupt is taken, the 601 is not guaranteed to take an external interrupt, depending onwhether the MSR[EE] bit was set while INT signal was held active. To clear the interrupt,the interrupt handler must send a command to the device that signaled the interrupt.

5.4.6 Alignment Exception (x'00600')This section describes conditions that can cause alignment exceptions in the 601. Similarto data access exceptions, alignment exceptions use the SRR0 and SRR1 to save themachine state and the DSISR to determine the source of the exception.

The register settings for alignment exceptions are shown in Table 5-13.

Table 5-12. External Interrupt—Register Settings


SRR0 Set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.



Table 5-13. Alignment Exception—Register Settings



SRR1 0–15 Cleared 16–31 Loaded from bits 16–31 of the MSR



5.4.6.1 Integer Alignment Exceptions The 601 is optimized for load and store operations that are aligned on natural boundaries.Operations that are not naturally aligned may suffer performance degradation, dependingon the type of operation, the boundaries crossed, and the mode that the processor is induring execution. More specifically, these operations may either cause an alignmentexception or they may cause the processor to break the memory access into multiple,smaller accesses with respect to the cache and the memory subsystem.

The 601 can initiate alignment exception for the accesses as shown in Table 5-14. In all ofthese cases, the appropriate range check is performed before the instruction begins

DSISR 0–11 Cleared12–13 Cleared. (Note that these bits can be set by several 64-bit PowerPC

instructions that are not supported in the 601.) 14 Cleared15–16 For instructions that use register indirect with index addressing—set

to bits 29–30 of the instruction.For instructions that use register indirect with immediate index addressing—cleared.

17 For instructions that use register indirect with index addressing—set to bit 25 of the instruction.For instructions that use register indirect with immediate index addressing— Set to bit 5 of the instruction

18–21 For instructions that use register indirect with index addressing—set to bits 21–24 of the instruction.For instructions that use register indirect with immediate index addressing—set to bits 1–4 of the instruction.

22–26 Set to bits 6–10 (source or destination) of the instruction. Undefined for dcbz.

27–31 Set to bits 11–15 of the instruction (rA) Set to either bits 11–15 of the instruction or to any register number not in the range of registers loaded by a valid form instruction, for lmw, lswi, and lswx instructions. Otherwise undefined.

Note that for load or store instructions that use register indirect with index addressing, the DSISR can be set to the same value that would have resulted if the corresponding instruction uses register indirect with immediate index addressing had caused the exception. Similarly, for load or store instructions that use register indirect with immediate index addressing, DSISR can hold a value that would have resulted from an instruction that uses register indirect with index addressing. For example, an unaligned lwax instruction that crosses a protection boundary would normally cause the DSISR to be set to the following binary value:

000000000000 00 0 01 0 0101 ttttt ?????The value ttttt refers to the destination and ????? indicates undefined bits. However, this register may be set as if the instruction were lwa, as follows:

000000000000 10 0 00 0 1101 ttttt ?????If there is no corresponding instruction, no alternative value can be specified.

DAR Set to the EA of the data access as computed by the instruction causing the alignment exception.

Table 5-13. Alignment Exception—Register Settings (Continued)



execution. As a result, if an alignment exception is taken, it is guaranteed that no portion ofthe instruction has been executed.

5.4.6.1.1 Direct-Translation Access A direct-translation access occurs when both MSR[DT] and SR[T] are cleared. If a256-Mbyte boundary is crossed by any portion of the memory being accessed by aninstruction (including string/multiples), an alignment exception is taken.

5.4.6.1.2 I/O Controller Interface Access The 601 supports memory-mapped I/O with the high-performance memory bus protocol.Additionally, the 601 has an I/O controller interface for compatibility with certain externaldevices that implement this protocol. An I/O controller interface access occurs when a dataaccess is initiated, SR[T] is set, and SR[BUID] is not equal to x'07F'. In the 601 (but not forthe general PowerPC processor case), MSR[DT] is a don't care for this case. The followingapply for I/O controller interface accesses:

• If a 256-Mbyte boundary will be crossed by any portion of the I/O controller interface space accessed by an instruction (the entire string for strings/multiples), an alignment exception is taken.

• Floating-point loads and stores to I/O controller interface segments always cause an alignment exception, regardless of operand alignment.

Note that other I/O controller interface errors may generate an I/O controller interface errorexception, as described in Section 5.4.10, “I/O Controller Interface Error Exception(x'00A00').”

5.4.6.1.3 Memory-Forced I/O Controller Interface Access A memory-forced I/O controller interface access occurs when SR[T] is set, and SR[BUID]is x'07F' in the 601 (not defined as part of the PowerPC architecture). MSR[DT] is a don'tcare for this case.

If a 256-Mbyte boundary is crossed by any portion of the memory being accessed by aninstruction (including string/multiples), an alignment exception is taken.

Note that floating-point instructions and lwarx, stwcx., and lscbx instructions are handledas the page- and block-address translation cases for the memory-forced I/O controllerinterface segments.

Table 5-14. Access Types

MSR[DT] SR[T] SR[BUID] Access Type

0 0 x Direct translation access

x 1 Not x'07F' I/O controller interface access

x 1 x'07F' Memory-forced I/O controller interface access

1 0 x Page-address translation access


5.4.6.1.4 Page Address Translation Access A page-address translation access occurs when MSR[DT] is set, SR[T] is cleared, and thereis not a BAT array match. Note the following points:

• The following is true for all loads and stores except strings/multiples:

— An alignment exception is taken if the operand spans a 4-Kbyte boundary.

— Byte operands never cause an alignment exception.

— Half-word operands cause an alignment exception if the EA ends in x'FFF'.

— Word operands cause an alignment exception if the EA ends in x'FFD–FFF'.

— Double-word operands cause an alignment exception if the EA ends in x'FF9–FFF'.

• The lscbx instruction causes an alignment exception if any portion of the entire string crosses into the next 4-Kbyte page of memory. This is taken regardless of the starting address, even if the lscbx operand starts on a word boundary.

• All other string/multiple instructions (except lscbx) take alignment exceptions as follows:

— If the string/multiple starts on a word boundary and a 256-Mbyte boundary is crossed by any portion of the entire string/multiple, an alignment exception is taken. Note that it must be a 256-Mbyte crossing—a simple 4-Kbyte crossing does not cause an exception for a word-aligned string/multiple operation.

— If any portion of the string/multiple will cross into the next 4-Kbyte page of memory, an alignment exception is taken.

Note that on other PowerPC implementations, load and store multiple instructions that are not on a word boundary either take an alignment exception or generate results that are boundedly undefined.

• The dcbz instruction causes an alignment exception if the access is to a page or block with the W (write-through) or I (cache-inhibit) bit set in the UTLB or BAT array entry, respectively.

Note that the above summary indicates that a 256-Mbyte crossing always causes analignment exception. This includes accesses of all four types regardless of alignment. Ofcourse, non-string/multiple load and store operations can only cross this boundary if theyare not aligned.

Misaligned memory accesses that do not cause an alignment exception may not perform aswell as an aligned access of the same type. In general, the IU is designed to efficientlyhandle memory access quantities of eight bytes or fewer that lie within a double-wordboundary. Internally, all integer memory access instructions that involve more than fourbytes of data are broken into multiple access of four bytes or fewer. Floating-point memoryaccess instructions always involve either four or eight bytes of data. Any memory accessthat crosses a double-word boundary is further broken into two smaller accesses that do not


cross the double-word boundary. For multiple-word and string operations, the 601 does notforce alignment to reduce the number of accesses.

The resulting performance degradation due to misaligned accesses depends on how welleach individual access behaves with respect to the memory hierarchy. At a minimum,additional cache access cycles are required that can delay other processor resources fromusing the cache. More dramatically, for an access to a noncacheable page, each discreteaccess involves individual 601 bus operations that reduce the effective bandwidth of thatbus.

Finally, note that when the 601 is in page address translation mode, there is no specialhandling for accesses that fall into BAT regions. If one of the 4-Kbyte crossing conditionsindicated above happens to be completely contained within a BAT register, the 601 stilltakes the alignment exception.

5.4.6.2 Floating-Point Alignment Exceptions An alignment exception occurs when no higher priority exception exists and the 601 cannotperform a memory access for one of the following reasons:

• The operand of a floating-point load or store operation is in a non–memory-forced I/O controller interface segment (SR[T] = 1).

• The operand of a load or store crosses a 4-Kbyte boundary if MSR[DT] is set or if the operand crosses a 256-Mbyte boundary if MSR[DT] is cleared or set.

5.4.6.3 Little-Endian Mode Alignment ExceptionsIn little-endian mode, any operand that is not properly aligned (as described inSection 2.4.3, “Byte and Bit Ordering”), causes an alignment exception. Additionally, anyattempted execution of the string/multiple instructions causes an alignment exception.

5.4.6.4 Interpretation of the DSISR as Set by an Alignment ExceptionFor most alignment exceptions, an exception handler may be designed to emulate theinstruction that causes the exception.

In order for emulation to occur, it needs the following characteristics of the instruction:

• Load or store• Length (half word, word, or double word)• String, multiple, or normal load/store• Integer or floating-point• Whether the instruction performs update • Whether the instruction performs byte reversal• Whether it is a dcbz instruction

The PowerPC architecture provides this information implicitly, by setting opcode bits in theDSISR that identify the excepting instruction type. The exception handler does not need to


load the excepting instruction from memory. The mapping for all exception possibilities isunique except for the few exceptions discussed below.

Table 5-15 shows the inverse mapping—how the DSISR bits identify the instruction thatcaused the exception.

The alignment exception handler cannot distinguish a floating-point load or store thatcauses an exception because it is misaligned, or because it addresses the I/O controllerinterface space. However, this does not matter; in either case it is emulated with integerinstructions.

Table 5-15. DSISR(15–21) Settings to Determine Misaligned Instruction

DSISR[15–21] Instruction

00 0 0000 lwarx, lwz, reserved 1

00 0 0010 stw

00 0 0100 lhz

00 0 0101 lha

00 0 0110 sth

00 0 0111 lmw

00 0 1000 lfs

00 0 1001 lfd

00 0 1010 stfs

00 0 1011 stfd

00 1 0000 lwzu

00 1 0010 stwu

00 1 0100 lhzu

00 1 0101 lhau

00 1 0110 sthu

00 1 0111 stmw

00 1 1000 lfsu

00 1 1001 lfdu

00 1 1010 stfsu

00 1 1011 stfdu

01 0 0101 lwax

01 0 1000 lswx

01 0 1001 lswi

01 0 1010 stswx

01 0 1011 stswi


1 The instructions lwz and lwarx give the same DSISR bits (all zero). But if lwarx causes an alignment exception, it is an invalid form, so it need not be emulated in any precise way. It is adequate for the alignment exception handler to simply emulate the instruction as if it were an lwz. It is important that the emulator use the address in the DAR, rather than computing it from rA/rB/D, because lwz and lwarx use different addressing modes.

01 1 0101 lwaux

10 0 0010 stwcx.

10 0 1000 lwbrx

10 0 1010 stwbrx

10 0 1100 lhbrx

10 0 1110 sthbrx

10 1 1111 dcbz

11 0 0000 lwzx

11 0 0010 stwx

11 0 0100 lhzx

11 0 0101 lhax

11 0 0110 sthx

11 0 1000 lfsx

11 0 1001 lfdx

11 0 1010 stfsx

11 0 1011 stfdx

11 1 0000 lwzux

11 1 0010 stwux

11 1 0100 lhzux

11 1 0101 lhaux

11 1 0110 sthux

11 1 1000 lfsux

11 1 1001 lfdux

11 1 1010 stfsux

11 1 1011 stfdux

Table 5-15. DSISR(15–21) Settings to Determine Misaligned Instruction (Continued)


5.4.7 Program Exception (x'00700')A program exception occurs when no higher priority exception exists and one or more ofthe following exception conditions, which correspond to bit settings in SRR1, occur duringexecution of an instruction:

• System floating-point enabled exception—A system floating-point enabled exception is generated when the following condition is met:

(MSR[FE0] | MSR[FE1]) & FPSCR[FEX] is 1.

FPSCR[FEX] is set by the execution of a floating-point instruction that causes an enabled exception or by the execution of a “move to FPSCR” type instruction that sets an exception bit when its corresponding enable bit is set. In the 601, all floating-point enabled exceptions are handled in a precise manner. As a result, all program exceptions taken on behalf of a floating-point enabled exception clear SRR1[15] to indicate that the address in SRR0 points to the instruction that caused the exception. For more information, refer to Section 5.4.7.1, “Floating-Point Enabled Program Exceptions.”

• Illegal instruction—An illegal instruction program exception is generated when execution of an instruction is attempted with an illegal opcode or illegal combination of opcode and extended opcode fields (these include PowerPC instructions not implemented in the 601), or when execution of an optional instruction not provided in the 601 is attempted.

• Privileged instruction—A privileged instruction type program exception is generated when the execution of a supervisor instruction is attempted and the MSR register user privileged bit, MSR[PR], is set. Some implementations may generate this exception for mtspr or mfspr with an invalid SPR field if SPR[0] = 1 and MSR[PR] = 1.

• Trap—A trap type program exception is generated when any of the conditions specified in a trap instruction is met. Trap instructions are described in Chapter 3, “Addressing Modes and Instruction Set Summary.”

Note that instructions using an invalid instruction form do not take a program exception,but instead cause results that are boundedly undefined.

The register settings are shown in Table 5-16.


When a program exception is taken, instruction execution resumes at offset x'00700' fromthe physical base address indicated by MSR[EP].

5.4.7.1 Floating-Point Enabled Program ExceptionsIn the 601, floating-point exceptions are signaled by condition bits set in the floating-pointstatus and control register (FPSCR). They can cause the system floating-point enabledexception error handler to be invoked. All floating-point exceptions are handled precisely.The FPSCR is shown in Figure 5-5.

Figure 5-5. Floating-Point Status and Control Register (FPSCR)

Table 5-16. Program Exception—Register Settings


SRR0 Contains the effective address of the excepting instruction

SRR1 0–10 Cleared11 Set for a floating-point enabled program exception; otherwise cleared.12 Set for an illegal instruction program exception; otherwise cleared. 13 Set for a privileged instruction program exception; otherwise cleared.14 Set for a trap program exception; otherwise cleared.15 Cleared if SRR0 contains the address of the instruction causing the exception, and set if SRR0

contains the address of a subsequent instruction. 16–31 Loaded from bits 16–31 of the MSR. Note that only one of bits 11–14 can be set.

MSR EE 0 PR 0FP 0 ME Value is not alteredFE0 0 SE 0FE1 0 EP Value is not alteredIT 0 DT 0

FPSCR

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 19 20 21 22 23 24 25 26 27 28 29 30 31

FX FEXVX OX UX ZX XX FR FI FPRF VE OE UE ZE XE RN

VXIDI

VXISI

VXSNAN

VXZDZ

VXIMZ

VXVC

Reserved

VXSOFT

VXSQRT

VXCVI

A listing of FPSCR bit settings is shown in Table 5-17.

Table 5-17. FPSCR Bit Settings

Bit(s) Description

0 Floating-point exception summary (FX). Every floating-point instruction implicitly sets FPSCR[FX] if that instruction causes any of the floating-point exception bits in the FPSCR to transition from 0 to 1. The mcrfs instruction implicitly clears FPSCR[FX] if the FPSCR field containing FPSCR[FX] is copied. The mtfsf, mtfsfi, mtfsb0, and mtfsb1 instructions can set or clear FPSCR[FX] explicitly. This is a sticky bit.

1 Floating-point enabled exception summary (FEX). This bit signals the occurrence of any of the enabled exception conditions. It is the logical OR of all the floating-point exception bits masked with their respective enables. The mcrfs instruction implicitly clears FPSCR[FEX] if the result of the logical OR described above becomes zero. The mtfsf, mtfsfi, mtfsb0, and mtfsb1 instructions cannot set or clear FPSCR[FEX] explicitly. This is not a sticky bit.

2 Floating-point invalid operation exception summary (VX). This bit signals the occurrence of any invalid operation exception. It is the logical OR of all of the invalid operation exceptions. The mcrfs implicitly clears FPSCR[VX] if the result of the logical OR described above becomes zero. The mtfsf, mtfsfi, mtfsb0, and mtfsb1 instructions cannot set or clear FPSCR[VX] explicitly. This is not a sticky bit.

3 Floating-point overflow exception (OX). This is a sticky bit.

4 Floating-point underflow exception (UX). This is a sticky bit.

5 Floating-point zero divide exception (ZX). This is a sticky bit.

6 Floating-point inexact exception (XX). This is a sticky bit.

7 Floating-point invalid operation exception for SNaN (VXSNAN). This is a sticky bit.

8 Floating-point invalid operation exception for ∞-∞ (VXISI). This is a sticky bit.

9 Floating-point invalid operation exception for ∞/∞ (VXIDI). This is a sticky bit.

10 Floating-point invalid operation exception for 0/0 (VXZDZ). This is a sticky bit.

11 Floating-point invalid operation exception for ∞*0 (VXIMZ). This is a sticky bit.

12 Floating-point invalid operation exception for invalid compare (VXVC). This is a sticky bit.

13 Floating-point fraction rounded (FR). The last floating-point instruction that potentially rounded the intermediate result incremented the fraction.

14 Floating-point fraction inexact (FI). The last floating-point instruction that potentially rounded the intermediate result produced an inexact fraction or a disabled exponent overflow.

15–19 Floating-point result flags (FPRF). This field is based on the value placed into the target register even if that value is undefined. Refer to Table 2-2 for specific bit settings.15 Floating-point result class descriptor (C). Floating-point instructions other than the compare

instructions may set this bit with the FPCC bits, to indicate the class of the result.16–19 Floating-point condition code (FPCC). Floating-point compare instructions always set one of

the FPCC bits to one and the other three FPCC bits to zero. Other floating-point instructions may set the FPCC bits with the C bit, to indicate the class of the result. Note that in this case the high-order three bits of the FPCC retain their relational significance indicating that the value is less than, greater than, or equal to zero.16 Floating-point less than or negative (FL or <)17 Floating-point greater than or positive (FG or >)18 Floating-point equal or zero (FE or =)19 Floating-point unordered or NaN (FU or ?)

20 Reserved


The following conditions that can cause program exceptions are detected by the processor.These conditions may occur during execution of floating-point arithmetic instructions. Thecorresponding bits set in the FPSCR are indicated in parentheses.

• Invalid floating-point operation exception condition (VX)

— SNaN condition (VXSNAN)— Infinity–infinity condition (VXISI)— Infinity/infinity condition (VXIDI)— Zero/zero condition (VXZDZ)— Infinity*zero condition (VXIMZ)— Illegal compare condition (VXVC)

These exception conditions are described in Section 5.4.7.2, “Invalid Operation Exception Conditions.”

• Software request condition (VXSOFT). These exception conditions are described in Section 5.4.7.2, “Invalid Operation Exception Conditions.”

• Illegal integer convert condition (VXCVI). These exception conditions are described in Section 5.4.7.2, “Invalid Operation Exception Conditions.”

21 Floating-point invalid operation exception for software request (VXSOFT). This bit can be altered only by the mcrfs, mtfsfi, mtfsf, mtfsb0, or mtfsb1 instructions. The purpose of VXSOFT is to allow software to cause an invalid operation condition for a condition that is not necessarily associated with the execution of a floating-point instruction. For example, it might be set by a program that computes a square root if the source operand is negative. This is a sticky bit.

22 Floating-point invalid operation exception for invalid square root (VXSQRT). This is a sticky bit. This guarantees that software can simulate fsqrt and frsqrte, and to provide a consistent interface to handle exceptions caused by square-root operations.

23 Floating-point invalid operation exception for invalid integer convert (VXCVI). This is a sticky bit. See Section 5.4.7.2, “Invalid Operation Exception Conditions."

24 Floating-point invalid operation exception enable (VE)

25 Floating-point overflow exception enable (OE)

26 Floating-point underflow exception enable (UE). This bit should not be used to determine whether denormalization should be performed on floating-point stores

27 Floating-point zero divide exception enable (ZE)

28 Floating-point inexact exception enable (XE)

29 Reserved. This bit may be implemented as the non-IEEE mode bit (NI) in other PowerPC implementations.

30–31 Floating-point rounding control (RN). 00 Round to nearest 01 Round toward zero 10 Round toward +infinity11 Round toward –infinity

Table 5-17. FPSCR Bit Settings (Continued)

Bit(s) Description


• Zero divide exception condition (ZX). These exception conditions are described in Section 5.4.7.3, “Zero Divide Exception Condition.”

• Overflow Exception Condition (OX). These exception conditions are described in Section 5.4.7.4, “Overflow Exception Condition.”

• Underflow Exception Condition (UX). These exception conditions are described in Section 5.4.7.5, “Underflow Exception Condition.”

• Inexact Exception Condition (XX). These exception conditions are described in Section 5.4.7.6, “Inexact Exception Condition.”

Each floating-point exception condition and each category of illegal floating-pointoperation exception condition, has a corresponding exception bit in the FPSCR. In addition,each floating-point exception has a corresponding enable bit in the FPSCR. The exceptionbit indicates the occurrence of the corresponding condition. If a floating-point exceptionoccurs, the corresponding enable bit governs the result produced by the instruction and, inconjunction with bits FE0 and FE1, whether and how the system floating-point enabledexception error handler is invoked. (The “enabling” specified by the enable bit is ofinvoking the system error handler, not of permitting the exception condition to occur. Theoccurrence of an exception condition depends only on the instruction and its inputs, not onthe setting of any control bits.)

The floating-point exception summary bit (FX) in the FPSCR is set when any of theexception condition bits transitions from a zero to a one or when explicitly set by software.The floating-point enabled exception summary bit (FEX) in the FPSCR is set when any ofthe exception condition bits is set and the exception is enabled (enable bit is one).

A single instruction may set more than one exception condition bit in the following cases:

• The inexact exception condition bit may be set with overflow exception condition.

• The inexact exception condition bit may be set with underflow exception condition.

• The illegal floating-point operation exception condition bit (SNaN) may be set with illegal floating-point operation exception condition (∞*0) for multiply-add instructions.

• The illegal operation exception condition bit (SNaN) may be set with illegal floating-point operation exception condition (illegal compare) for compare ordered instructions.

• The illegal floating-point operation exception condition bit (SNaN) may be set with illegal floating-point operation exception condition (illegal integer convert) for convert to integer instructions.

When an exception occurs, the instruction execution may be suppressed or a result may bedelivered, depending on the exception condition.


Instruction execution is suppressed for the following kinds of exception conditions, so thatthere is no possibility that one of the operands is lost:

• Enabled illegal floating-point operation• Enabled zero divide

For the remaining kinds of exception conditions, a result is generated and written to thedestination specified by the instruction causing the exception. The result may be a differentvalue for the enabled and disabled conditions for some of these exception conditions. Thekinds of exception conditions that deliver a result are the following:

• Disabled illegal floating-point operation• Disabled zero divide• Disabled overflow• Disabled underflow• Disabled inexact• Enabled overflow• Enabled underflow• Enabled inexact

Subsequent sections define each of the floating-point exception conditions and specify theaction taken when they are detected.

The IEEE standard specifies the handling of exception conditions in terms of traps and traphandlers. In the PowerPC architecture, setting an FPSCR exception enable bit causesgeneration of the result value specified in the IEEE standard for the trap enabled case—theexpectation is that the exception is detected by software, which will revise the result.Clearing an FPSCR exception enable bit causes generation of the default result valuespecified for the trap disabled case (or no trap occurs or trap is not implemented)—theexpectation is that the exception will not be detected by software, and the default result isused. The result to be delivered in each case for each exception is described in the followingparagraphs.

The IEEE default behavior when an exception occurs, which is to generate a default valueand not to notify software, is obtained by clearing all FPSCR exception enable bits andusing ignore exceptions mode (see Table 5-18). In this case the system floating-pointenabled exception error handler is not invoked, even if floating-point exceptions occur. Ifnecessary, software can inspect the FPSCR exception bits to determine whether exceptionshave occurred.

If the program exception handler notifies software that a given exception condition hasoccurred, the corresponding FPSCR exception enable bit must be set and a mode other thanignore exceptions mode must be used. In this case the system floating-point enabledexception error handler is invoked if an enabled floating-point exception condition occurs.


Whether and how the system floating-point enabled exception error handler is invoked if anenabled floating-point exception occurs is controlled by MSR bits FE0 and FE1 as shownin Table 5-18. (The system floating-point enabled exception error handler is never invokedif the appropriate floating-point exception is disabled.)

Note that in the 601, FE0 and FE1 are ORed; therefore, unless both FE0 and FE1 arecleared, the 601 operates in precise mode. Whether a floating-point result is stored and whatvalue is stored is determined by the FPSCR exception enable bits, as described insubsequent sections, and are not affected by any MSR bit settings.

Whenever the system floating-point enabled exception error handler is invoked, themicroprocessor ensures that all instructions logically residing before the exceptinginstruction have completed, and no instruction after that instruction has been executed.

If exceptions are ignored, an FPSCR instruction can be used to force any exceptions, dueto instructions initiated before the FPSCR instruction, to be recorded in the FPSCR. A syncinstruction can also be used to force exceptions, but is likely to degrade performance morethan an FPSCR instruction.

Table 5-18. MSR[FE0] and MSR[FE1] Bit Settings

FE0 FE1 Description

0 0 Ignore exceptions mode—Floating-point exceptions do not cause the program exception error handler to be invoked.

0 1 Imprecise nonrecoverable mode—This mode is not applicable to the 601. FE0 and FE1 are ORed, so setting either bit results in running the processor in precise mode. Note that in PowerPC processors that support this mode, the system floating-point enabled exception error handler is invoked at some point at or beyond the instruction that caused the enabled exception. The state of the processor may include conditions and data affected by the exception (that is, hazards are not avoided). It may not be possible to identify the excepting instruction or the data that caused the exception (that is, the data is not recoverable).

1 0 Imprecise recoverable mode—This mode is not applicable to the 601. FE0 and FE1 are ORed, so setting either bit results in running the processor in precise mode. Note that in PowerPC processors that support this mode, the system floating-point enabled exception error handler is invoked at some point at or beyond the instruction that caused the enabled exception. Sufficient information is provided to the system floating-point enabled exception error handler that it can identify the excepting instruction and the operands, and correct the result. All hazards caused by the exception are avoided (for example, use of the data that would have been produced by the excepting instruction).

1 1 Precise mode—The system floating-point enabled exception error handler is invoked precisely at the instruction that caused the enabled exception.


For the best performance across the widest range of implementations, the followingguidelines should be considered:

• If the IEEE default results are acceptable to the application, FE0 and FE1 should be cleared (ignore exceptions mode). All FPSCR exception enable bits should be cleared.

• Ignore exceptions mode should not, in general, be used when any FPSCR exception enable bits are set.

• Precise mode may degrade performance in some implementations, perhaps substantially, and therefore should be used only for debugging and other specialized applications.

5.4.7.2 Invalid Operation Exception ConditionsAn invalid operation exception occurs when an operand is invalid for the specifiedoperation. The invalid operations are as follows:

• Any operation except load, store, move, select, or mtfsf on a signaling NaN (SNaN)

• For add or subtract operations, magnitude subtraction of infinities (∞-∞)

• Division of infinity by infinity (∞/∞)

• Division of zero by zero (0/0)

• Multiplication of infinity by zero (∞*0)

• Ordered comparison involving a NaN (invalid compare)

• Square root or reciprocal square root of a negative, nonzero number (invalid square root)

• Integer convert involving a number that is too large to be represented in the format, an infinity, or a NaN (invalid integer convert)

FPSCR[VXSOFT] allows software to cause an invalid operation exception for a conditionthat is not necessarily associated with the execution of a floating-point instruction. Forexample, it might be set by a program that computes a square root if the source operand isnegative. This facilitates the emulation of PowerPC instructions not implemented in the601.


5.4.7.2.1 Action for Invalid Operation Exception ConditionsThe action to be taken depends on the setting of the invalid operation exception enable bitof the FPSCR. When invalid operation exception is enabled (FPSCR[VE] = 1) and invalidoperation occurs or software explicitly requests the exception, the following actions aretaken:

• One or two invalid operation exceptions is setFPSCR[VXSNAN] (if SNaN)FPSCR[VXISI] (if ∞-∞)FPSCR[VXIDI] (if ∞/∞)FPSCR[VXZDZ](if 0/0)FPSCR[VXIMZ] (if ∞*0)FPSCR[VXVC] (if invalid comparison)FPSCR[VXSOFT] (if software request)FPSCR[VXCVI] (if invalid integer convert)

• If the operation is an arithmetic or convert-to-integer operation,the target FPR is unchangedFPSCR[FR FI] are clearedFPSCR[FPRF] is unchanged

• If the operation is a compare,FPSCR[FR FI C] are unchangedFPSCR[FPCC] is set to reflect unordered

• If software explicitly requests the exception,FPSCR[FR FI FPRF] are as set by the mtfsfi, mtfsf, or mtfsb1 instruction

When invalid operation exception condition is disabled (FPSCRVE = 0) and invalidoperation occurs or software explicitly requests the exception, the following actions aretaken:

• One or two invalid operation exception condition bits is setFPSCR[VXSNAN] (if SNaN)FPSCR[VXISI] (if ∞-∞)FPSCR[VXIDI] (if ∞/∞)FPSCR[VXZDZ] (if 0/0)FPSCR[VXIMZ] (if ∞*0)FPSCR[VXVC] (if invalid comparison)FPSCR[VXSOFT] (if software request)FPSCR[VXCVI] (if invalid integer convert)

• If the operation is an arithmetic operation, the target FPR is set to a quiet NaNFPSCR[FR FI] are clearedFPSCR[FPRF] is set to indicate the class of the result (quiet NaN)


• If the operation is a convert to 32-bit integer operation, the target FPR is set as follows: FRT[0–31] = undefinedFRT[32–63] = most negative 32-bit integer FPSCR[FR FI] are clearedFPSCR[FPRF] is undefined

• If the operation is a convert to 64-bit integer operation, the target FPR is set as follows:FRT[0–63] = most negative 64-bit integerFPSCR[FR FI] are clearedFPSCR[FPRF] is undefined

• If the operation is a compare,FPSCR[FR FI C] are unchangedFPSCR[FPCC] is set to reflect unordered

• If software explicitly requests the exception,FPSCR[FR FI FPRF] are as set by the mtfsfi, mtfsf, or mtfsb1 instruction

5.4.7.3 Zero Divide Exception ConditionA zero divide exception condition occurs when a divide instruction is executed with a zerodivisor value and a finite, nonzero dividend value.

The name is a misnomer used for historical reasons. The proper name for this exceptioncondition should be exact infinite result from finite operands exception conditioncorresponding to a mathematical pole.

5.4.7.3.1 Action for Zero Divide Exception ConditionThe action to be taken depends on the setting of the zero divide exception condition enablebit of the FPSCR. When the zero divide exception condition is enabled (FPSCR[ZE] = 1)and a zero divide condition occurs, the following actions are taken:

• Zero divide exception condition bit is setFPSCR[ZX] = 1

• The target FPR is unchanged

• FPSCR[FR FI] are cleared

• FPSCR[FPRF] is unchanged

When zero divide exception condition is disabled (FPSCR[ZE] = 0) and zero divide occurs,the following actions are taken:

• Zero divide exception condition bit is setFPSCR[ZX] = 1

• The target FPR is set to a ±infinity, where the sign is determined by theXOR of the signs of the operands


• FPSCR[FPRF] is set to indicate the class and sign of the result (±infinity)


5.4.7.4 Overflow Exception ConditionOverflow occurs when the magnitude of what would have been the rounded result, if theexponent range were unbounded, exceeds that of the largest finite number of the specifiedresult precision.

5.4.7.4.1 Action for Overflow Exception ConditionThe action to be taken depends on the setting of the overflow exception condition enablebit of the FPSCR. When the overflow exception condition is enabled (FPSCR[OE] = 1) andan exponent overflow condition occurs, the following actions are taken:

• Overflow exception condition bit is setFPSCR[OX] = 1

• For double-precision arithmetic instructions, the exponent of the normalizedintermediate result is adjusted by subtracting 1536

• For single-precision arithmetic instructions and the floating round to single-precision instruction, the exponent of the normalized intermediate result is adjusted by subtracting 192

• The adjusted rounded result is placed into the target FPR

• FPSCR[FPRF] is set to indicate the class and sign of the result (±normal number)

When the overflow exception condition is disabled (FPSCR[OE] = 0) and an overflowcondition occurs, the following actions are taken:

• Overflow exception condition bit is setFPSCR[OX] = 1

• Inexact exception condition bit is setFPSCR[XX] = 1

• The result is determined by the rounding mode (FPSCR[RN]) and the sign of the intermediate result as follows:

— Round to nearestStore ±infinity, where the sign is the sign of the intermediate result

— Round toward zeroStore the format’s largest finite number with the sign of the intermediate result

— Round toward +infinityFor negative overflows, store the format’s most negative finite number; for positive overflows, store +infinity

— Round toward –infinityFor negative overflows, store –infinity; for positive overflows, store the format's largest finite number


• The result is placed into the target FPR


• FPSCR[FPRF] is set to indicate the class and sign of the result (±infinityor ±normal number)

5.4.7.5 Underflow Exception ConditionThe underflow exception condition is defined separately for the enabled and disabled states:

• Enabled—Underflow occurs when the intermediate result is “Tiny.”

• Disabled—Underflow occurs when the intermediate result is “Tiny” and there is “Loss of Accuracy.”

A “Tiny” result is detected before rounding, when a nonzero result value computed asthough the exponent range were unbounded would be less in magnitude than the smallestnormalized number.

If the intermediate result is “tiny” and the underflow exception condition enable bit iscleared (FPSCR[UE]=0), the intermediate result is denormalized (see Section 2.5.4,“Normalization and Denormalization”) and rounded (see Section 2.5.6, “Rounding”).

“Loss of Accuracy” is detected when the delivered result value differs from what wouldhave been computed were both the exponent range and precision unbounded.

5.4.7.5.1 Action for Underflow Exception ConditionThe action to be taken depends on the setting of the underflow exception condition enablebit of the FPSCR.

When the underflow exception condition is enabled (FPSCR[UE]=1) and an exponentunderflow condition occurs, the following actions are taken:

• Underflow exception condition bit is setFPSCR[UX] = 1

• For double-precision arithmetic and conversion instructions, the exponent of the normalized intermediate result is adjusted by adding 1536.

• For single-precision arithmetic instructions and the Floating-Point Round to Single-Precision (frsp) instruction, the exponent of the normalized intermediate result is adjusted by adding 192.

• The adjusted rounded result is placed into the target FPR.

• FPSCR[FPRF] is set to indicate the class and sign of the result (±normalized number).

The FR and FI bits in the FPSCR allow the system floating-point enabled exception errorhandler, when invoked because of an underflow exception condition, to simulate a trapdisabled environment. That is, the FR and FI bits allow the system floating-point enabledexception error handler to unround the result, thus allowing the result to be denormalized.


When the underflow exception condition is disabled (FPSCR[UE]=0) and an underflowcondition occurs, the following actions are taken:

• Underflow exception condition enable bit is setFPSCR[UX] = 1

• The rounded result is placed into the target FPR

• FPSCR[FPRF] is set to indicate the class and sign of the result(±denormalized number or ±zero)

5.4.7.6 Inexact Exception ConditionThe inexact exception condition occurs when one of two conditions occur during rounding:

• The rounded result differs from the intermediate result assuming the intermediate result exponent range and precision to be unbounded.

• The rounded result overflows and overflow exception condition is disabled.

5.4.7.6.1 Action for Inexact Exception ConditionThe action to be taken does not depend on the setting of the inexact exception conditionenable bit of the FPSCR.

When the inexact exception condition occurs, the following actions are taken:

• Inexact exception condition enable bit in the FPSCR is setFPSCR[XX] = 1

• The rounded or overflowed result is placed into the target FPR

• FPSCR[FPRF] is set to indicate the class and sign of the result

In other PowerPC implementations, enabling inexact exception conditions may havegreater latency than enabling other types of floating-point exception condition.

5.4.8 Floating-Point Unavailable Exception (x'00800')A floating-point unavailable exception occurs when no higher priority exception exists, anattempt is made to execute a floating-point instruction (including floating-point load, store,and move instructions), and the floating-point available bit in the MSR is disabled,(MSR[FP]=0).

The register settings for floating-point unavailable exceptions are shown in Table 5-19.


When a floating-point unavailable exception is taken, instruction execution resumes atoffset x'00800' from the physical base address indicated by MSR[EP].

5.4.9 Decrementer Exception (x'00900')A decrementer exception occurs when no higher priority exception exists, the decrementerregister has completed decrementing, and MSR[EE] = 1. The decrementer exceptionrequest is canceled when the exception is handled. The decrementer register counts down,causing an exception (unless masked) when passing through zero. The decrementerimplementation meets the following requirements:

• The operation of the RTC and the decrementer are coherent; that is, the counters are driven by the same fundamental time base (7.8125 MHz).

• Loading a GPR from the decrementer does not affect the decrementer.

• Storing a GPR value to the decrementer replaces the value in the decrementer with the value in the GPR.

• Whenever bit 0 of the decrementer changes from 0 to 1, an exception request is signaled. If multiple decrementer exception requests are received before the first can be reported, only one exception is reported. The occurrence of a decrementer exception cancels the request.

• If the decrementer is altered by software and if bit 0 is changed from 0 to 1, an interrupt request is signaled.

The register settings for the decrementer exception are shown in Table 5-20.

Table 5-19. Floating-Point Unavailable Exception—Register Settings



SRR1 0–15 Cleared 16–31 Loaded from bits 16–31 of the MSR



When a decrementer exception is taken, instruction execution resumes at offset x'00900'from the physical base address indicated by MSR[EP].

5.4.10 I/O Controller Interface Error Exception (x'00A00')An I/O controller interface error exception occurs when no higher-order priority exists anda load or store corresponding to an I/O controller interface segment generates an error. I/Ocontroller interface operations are described in Section 9.6, “Memory- vs. I/O-Mapped I/OOperations.”

This exception is taken only when an operation to an I/O controller interface segment fails(such a failure is indicated to the 601 by a particular bus reply packet). If an I/O controllerinterface error exception occurs, the SRR0 contains the address of the instruction followingthe excepting instruction.

Note the following:

• Illegal accesses to I/O controller interface space cause alignment or data access exceptions. For information refer to Section 5.4.6.1.2, “I/O Controller Interface Access.” Note that this exception is specific to the 601. The PowerPC architecture treats I/O controller interface exceptions as data access exceptions (x'00300').

• Unlike exceptions that occur with memory accesses, loads and both loads and stores with update cause the target register to be updated, even though an exception is taken.

• The lwarx, stwcx. and lscbx instructions never cause an I/O controller interface error exception; instead they take a data access exception as defined in the PowerPC architecture.

• Floating point loads and stores to I/O controller interface segments are not supported on the 601 even though they are allowed in the PowerPC architecture. For those instructions, the 601 takes an alignment interrupt.

Table 5-20. Decrementer Exception—Register Settings


SRR0 Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.




The register settings for I/O controller interface error exceptions are shown in Table 5-21.

When an I/O controller interface error exception is taken, instruction execution resumes atoffset x'00A00' from the physical base address indicated by MSR[EP].

5.4.11 System Call Exception (x'00C00')A system call exception occurs when a System Call (sc) instruction is executed. Theeffective address of the instruction following the sc instruction is placed into SRR0. Bits16–31 of the MSR are placed into bits 16–31 of SRR1, and bits 0–15 of SRR1 are set toundefined values. Then a system call exception is generated.

The system call exception causes the next instruction to be fetched from offset x'00C00'from the physical base address indicated by the setting of MSR[EP]. This instruction iscontext synchronizing. That is, when a system call exception occurs, instruction dispatch ishalted.

Table 5-21. I/O Controller Interface Error Exception—Register Settings


SRR0 Set to the effective address of the instruction following the instruction that caused the instruction. This and subsequent instructions have not been executed. SRR0 contains the EA of the instruction following the load or store that caused the exception.



DSISR Unchanged.

DAR For scalar (nonmultiple or string) loads and stores, the DAR points to the first byte of the operand, regardless of the alignment. For multiple and string loads and stores, the DAR points to the first byte in the last word.

GPRs • On update form loads/stores, rA contains the updated EA. If rA=0, then R0 is not updated. If rA = rD, the register gets the target data instead of the updated EA.

• On simple loads, the target register rD will have been updated with the word (or bytes) received from the I/O controller interface operation.

• On simple stores the source register rS will have been sent to the I/O controller as store data. Whether the store actually occurred at the I/O device depends on the controller implementation and perhaps the specific type of error detected.

• On load multiples and load strings, all of the target registers will have been updated with data received from the I/O controller. However, the addressing registers are not updated if they are in the range of target registers specified by the instruction.

• On store multiples and store strings, all source registers will have been presented to the I/O controller via normal extended transfer bus protocols.


The following synchronization is performed:

1. The exception mechanism waits for all instructions in execution to complete to a point where they report all exceptions they will cause.

2. The processor ensures that all instructions in execution complete in the context in which they began execution.

3. Instructions dispatched after the exception is processed are fetched and executed in the context established by the exception mechanism.

Register settings are shown in Table 5-22.

When a system call exception is taken, instruction execution resumes at offset x'00C00'from the physical base address indicated by MSR[EP].

5.4.12 Run Mode/Trace Exception (x'02000')Vector x'02000' is used for the implementation-specific run mode exception, and the traceexception, which is defined in the PowerPC architecture, but which uses a different vector.

5.4.12.1 Run Mode ExceptionThe 601 defines an implementation-specific exception called the run mode exception. Thisexception is taken by the 601 under the following circumstances:

• Instruction address compare

• Branch target address compare

• Trace mode (MSR[SE] is set)—When an instruction clears MSR[SE], trace mode ends immediately. Note that other PowerPC processors implement a separate trace exception at vector x'00D00'.

Note that this exception may not be implemented by other PowerPC processors, and thatthis exception can be enabled and disabled using bits 8 and 9 in HID1; the exception isenabled when HID1[8,9] = b'10'. When this exception occurs, the registers are set asindicated in Table 5-23.

Table 5-22. System Call Exception—Register Settings


SRR0 Set to the effective address of the instruction following the sc instruction

SRR1 0–15 Loaded from bits 16–31 of the instruction16–31 Loaded from bits 16–31 of the MSR



The run mode is determined by the settings of HID1[1–3]. These settings are defined inTable 5-24.

Table 5-23. Run Mode Exception—Register Settings

Register Setting

SRR0 Set to the address of the instruction that causes the run mode exception

SRR1 Loaded from bits 0–31 of the MSR.


Table 5-24. Run Mode Exception Actions

HID1(1–3) Setting

Mode Description

000 Normal Run Mode No address break points are specified and the 601 processes zero to three instructions per cycle.

001 — Undefined. Do not use.

010 Limited Instruction Address Compare Mode

The 601 runs at full speed until the EA of the instruction in the lowest position in the instruction queue (IQ0) matches the one specified in HID2. At this point the appropriate break action is performed. This is a limited compare in that branches and floating-point operations and the addresses associated with them may never be detected.


100 Single Instruction Step Mode

If you clear HID1[1:3] and set HID1[8:9] to 10, the processor branches to offset x'02000' and enters an infinite loop, executing the instruction at x'02000'. Unless the user needs this mode specifically, the trace exception should be used.


110 Full Instruction Address Compare Mode

In full instruction address compare mode, processing proceeds out of IQ0. When the EA in HID2 matches the EA of the instruction in IQ0, the appropriate break action is performed. Unlike the limited instruction address compare mode, all instructions pass through the IQ0 in this mode. That is, instructions cannot be folded out of the instruction stream.

111 Full Branch Target Address Compare Mode

This mode is similar to full instruction address compare mode except that the branch target is compared against HID2. When addresses match, the appropriate break action is taken. This allows the programmer to see how a program got to an address. This mode can be used with b, bc, bcr, and bcc instructions.


5.4.12.2 Trace ExceptionWhen the trace exception is enabled, (MSR[SE] is set), a trace interrupt is taken after eachinstruction that completes without causing a exception or context change (such as a sc, rfi,or a load instruction that causes an exception). MSR[SE] is cleared when the traceexception is taken. In the normal use of this function, MSR[SE] is restored when theexception handler returns to the interrupted program using an rfi instruction.

Register settings for the trace mode are described in Table 5-25.

When a run mode or trace exception is taken, instruction execution resumes as offsetx'02000' from the base address indicated by MSR[EP].

Table 5-25. Trace Exception—Register Settings

Register Setting

SRR0 Set to the address of the next instruction to be executed in the program for which the trace exception was generated.



Chapter 6. Memory Management Unit 6-1

Chapter 6 Memory Management Unit6060

This chapter describes the PowerPC 601 microprocessor’s memory management unit(MMU). The primary functions of the MMU are to translate logical (effective) addresses tophysical addresses for memory accesses, I/O accesses (most I/O accesses are assumed tobe memory-mapped), and I/O controller interface accesses, and to provide accessprotection on a block or page basis.

There are three types of accesses generated by the 601 that require address translation:instruction accesses, data accesses to memory generated by load and store instructions, andI/O controller interface accesses generated by load and store instructions.

The 601 MMU provides 4 Gbytes of logical address space accessible to supervisor and userprograms with a 4-Kbyte page size and 256-Mbyte segment size. Block sizes range from128 Kbyte to 8 Mbyte and are software selectable. In addition, the 601 uses an interim 52-bit virtual address and hashed page tables in the generation of 32-bit physical addresses.

The MMU contains three translation lookaside buffers (TLBs). There is a 256-entry, two-way set-associative unified (instruction and data address) TLB (UTLB) for storing recently-used address translations, and a four-entry fully-associative first-level instruction TLB(ITLB) that is used only by instruction accesses for storing recently used instructionaddress translations. Additionally, there is a four-entry block address translation array (BATarray) that stores the available block address translations (for instruction or data addresses).BAT array entries are implemented as the block address translation (BAT) registers that areaccessible as supervisor special-purpose registers (SPRs). UTLB entries are generatedautomatically by the 601 hardware via a search of the page tables in memory. The 601maintains all the segment information on-chip in 16 supervisor-level segment registers.

This chapter describes the MMU address translation mechanisms, the MMU conditionsthat cause 601 exceptions, the instructions used to program the MMU, and thecorresponding registers.

The 601 MMU relies on the exception processing mechanism for the implementation of thepaged virtual memory environment and for enforcing protection of designated memoryareas. Exception processing is described in Chapter 5, “Exceptions.” Section 2.3.1,“Machine State Register (MSR),” describes the MSR of the 601, which controls some ofthe critical functionality of the MMU.


The operation of the 601 MMU conforms to the operating environment defined by thePowerPC architecture for 32-bit implementations in most respects. However, the numberand format of the BAT registers is different, as is the available range of block sizes. Inaddition, the protection mechanism provided by the BAT registers is different from thatdefined for the PowerPC architecture, as is the bit settings in the SRR1 register for one caseof the instruction access exception. Also, the PowerPC architecture defines the concept ofguarded memory that is not implemented in the 601. Finally, some MMU instructions ofthe PowerPC architecture (including tlbsync) are not implemented in the 601.

Note that the memory-forced I/O controller interface functionality described for the 601 isnot defined as part of the PowerPC architecture, and will not be present in other PowerPCprocessors. Also note that the hardware implementation details of the 601 MMU are notcontained in the architectural definition of PowerPC processors and are invisible to theprogramming model.

6.1 MMU OverviewThe 601 MMU and exception model support demand paged virtual memory. Virtualmemory management permits execution of programs larger than the size of physicalmemory; demand paged implies that individual pages are loaded into physical memoryfrom backing storage only as they are accessed by an executing program.

The memory management model of the 601 includes the concept of a virtual address thatis not only larger than that of the maximum physical memory allowed but a virtual addressspace that is also larger than the logical address space. Each logical address generated bythe 601 is 32 bits wide. In the address translation process, a logical address is converted toa 52-bit virtual address (as governed by the operating system) and then translated back to a32-bit physical address.

The operating system is responsible for managing the system’s physical memory resources.Consequently, the operating system programs the MMU registers (segment registers, BATregisters, and table search description register 1 (SDR1)) and sets up the page tables inmemory appropriately. The MMU then assists the operating system by managing pagestatus and maintaining the recently-used address translations on-chip for quick access.

The 601 logical address spaces are divided into 256-Mbyte regions called segments or otherlarge regions called blocks (128 Kbyte–8 Mbyte). Segments that correspond to memory ormemory-mapped devices can be further subdivided into smaller regions called pages (4Kbyte). For each block or page, the operating system creates an address descriptor (pagetable entry (PTE) or BAT array entry) that the MMU uses to generate the physical addressand the protection and other access control information when an address within the blockor page is accessed. Address descriptors for pages reside in tables (as PTEs) in the physicalmemory; for faster accesses, the MMU maintains on-chip copies of recently used PTEs inthe ITLB and UTLB, and keeps the block information on-chip in the BAT array (comprisedof the BAT registers).


This section provides an overview of the high-level organization and operational conceptsof the 601 MMU, and a summary of all MMU control registers. Section 2.3.3.6, “TableSearch Description Register 1 (SDR1),” describes the SDR1 register, Section 2.3.2,“Segment Registers,” describes the segment registers, Section 2.3.1, “Machine StateRegister (MSR),” describes the MSR, and Section 2.3.3.12, “BAT Registers,” describes theBAT registers.

6.1.1 Memory AddressingA program references memory using the effective (logical) address computed by theprocessor when it executes a load, store, branch, or cache instruction, and when it fetchesthe next instruction. The effective (logical) address is translated to a physical addressaccording to the procedures described throughout this chapter. The memory subsystem usesthe physical address for the access.

For a complete discussion of effective address calculation, see Section 3.1.1, “EffectiveAddress Calculation.”

6.1.2 MMU OrganizationFigure 6-1 shows the conceptual organization of the MMU and its relationship to some ofthe other functional units in the 601. The instruction unit generates all instruction addresses;these addresses are both for sequential instruction fetches and addresses that correspond toa change of program flow. The integer unit generates addresses for data accesses (both formemory and the I/O controller interface).

After an address is generated, the upper order bits of the logical address, LA0–LA19 (or asmaller set of address bits, LA0–LAn, in the cases of blocks), are translated by the MMUinto physical address bits PA0–PA19. Simultaneously, the lower order address bits,A20–A31 (that are untranslated and therefore considered both logical and physical), aredirected to the on-chip cache where they form the index into the eight-way set-associativetag array. After translating the address, the MMU passes the higher-order bits of thephysical address to the cache, and the cache lookup completes. For cache-inhibitedaccesses or accesses that miss in the cache, the untranslated lower order address bits areconcatenated with the translated higher-order address bits; the resulting 32-bit physicaladdress is then used by the memory unit and the system interface, which accesses externalmemory.

In addition to the upper-order address bits, the MMU automatically keeps an internalindicator of whether each access was generated as an instruction or data access and asupervisor/user indicator that reflects the state of the PR bit of the MSR when the logicaladdress was generated. In addition, for data accesses, there is an indicator of whether theaccess is for a load or a store operation. This information is then used by the MMU toappropriately direct the address translation and to enforce the protection hierarchyprogrammed by the operating system. See Section 2.3.1, “Machine State Register (MSR),”for more information about the MSR.


Figure 6-1. MMU Block Diagram

Instruction Unit

IntegerUnit

BPU FPU

MMU

A20–A31

X ITLBMiss

LA0–

LA19

LA0–

LA19

ITLBLA

0–LA

19

LA0–LA3

Sel

ect

0

15

Segment Registers...

LA4–LA19

UTLB

0

127

BAT0UBAT0L

•

•

BAT3UBAT3L

0

3

BAT Array

PA0–PA19

Table Search Logic

SDR1 SPR25

PA0–PA31

Cache8

TAGS

•

•

PA0–PA19

8

Compare

CacheHit/Miss

Select

0

63

+

X

ITLB hit accessesarbitrate for cacheindependently

ITLB HitA20–A31ITLB Hit

PA0–PA19

LA0-LA14


For instruction accesses, the MMU first performs a lookup in the four entries of the ITLBfor the physical address translation. Instruction accesses that miss in the ITLB and dataaccesses cause a lookup in the UTLB and BAT array for the physical address translation.In most cases, the physical address translation resides in one of the TLBs and the physicaladdress bits are readily available to the on-chip cache. In the case where the physicaladdress translation misses in the TLBs, the 601 automatically performs a search of thetranslation tables in memory using the information in the SDR1 and the correspondingsegment register.

6.1.3 Address Translation MechanismsThe 601 supports the following four main types of address translation:

• Page address translation—translates the page frame address for a 4-Kbyte page size

• Block address translation—translates the block number for blocks that range in size from 128 Kbyte to 8 Mbyte

• I/O controller interface address translation—used to generate I/O controller interface accesses on the external bus

• Direct address translation—when address translation is disabled, the physical address is identical to the logical address

Figure 6-2 shows the four main address translation mechanisms provided by the 601. Thesegment registers shown in the figure control both the page and I/O controller interfaceaddress translation mechanisms. When an access uses the page or I/O controller interfaceaddress translation, one of the 16 on-chip segment registers is selected by the highest-orderlogical address bits. A control bit in the corresponding segment register then determines ifthe access is to memory (memory-mapped) or to the I/O controller interface space.

For memory accesses selected by the segment register, the segment register information isused to generate the interim 52-bit virtual address. Page address translation corresponds tothe conversion of this virtual address into the 32-bit physical address used by the cache orby external memory. In most cases, the physical address for the page resides in the UTLBand is available for quick access. However, if the page address translation misses in theUTLB, the 601 automatically searches the page tables in memory (using the virtual addressinformation and a hashing function) to locate the required physical address.

Block address translation occurs in parallel with page and I/O controller interface addresstranslation and is similar to page address translation, except that there are fewer upper-orderlogical address bits to be translated into physical address bits (more lower-order addressbits (at least 17) are untranslated to form the offset into a block). Also, instead of segmentregisters and a UTLB, block address translations use the on-chip BAT registers as a fully-associative array. If the logical address of an access matches the corresponding field of aBAT register, the information in the BAT register is used to generate the physical address;in this case, the results of the page translation (occurring in parallel) are ignored.


Figure 6-2. Address Translation Types

I/O controller interface address translation is enabled when the I/O controller interfacetranslation control bit (T-bit) in the selected segment register (segment register selected bythe highest-order address bits) is set. In this case, the remaining information in the segmentregister is interpreted as identifier information that is used with the remaining logicaladdress bits to generate the packets used in an I/O controller interface access on the externalinterface; additionally, no UTLB lookup or page table search is performed and the BATarray lookup results are ignored. For more information about the I/O controller interfaceoperations, see Section 9.6, “Memory- vs. I/O-Mapped I/O Operations.”

A special case of I/O controller interface address translation (not shown in Figure 6-2) issupported that forces an I/O controller interface address translation to be interpreted as amemory access (that is, it uses the usual memory access protocol rather than the I/O

0 31Logical Address

0 51Virtual Address

Segment Register Select

Match with BAT Reg-isters

0 31Physical Address

0 31I/O Cont. I/F Address



Look up in Page Table

Direct Address Translation Logical Address = Physical Address

Address Translation Disabled*

* In the 601, I/O controller interface translation may occur when data address translation is disabled.

Page Address

I/O Controller Interface Trans-

lation

Block Address Translation


controller interface access protocol on the external interface). This occurs when a field inthe selected segment register (with the T-bit set) is encoded as memory-forced I/Ocontroller space. This feature effectively allows the specification of a 256-Mbyte “block”of memory (with a common physical block number) with the use of only one segmentregister, bypassing the page and block address translation and protection mechanismsdescribed above.

Direct address translation occurs when address translation is disabled; in this case thephysical address generated is identical to the logical address. The translation of addressesfor instruction and data accesses is enabled (and disabled) independently with the MSR[IT]and MSR[DT] bits, respectively. Thus when the instruction unit generates an instructionaccess, and instruction address translation is disabled (MSR[IT] = 0), the resulting physicaladdress is identical to the logical address and all other translation mechanisms are ignored.

When a data access occurs and MSR[DT] = 0, the resulting physical address is identical tothe logical address with one exception—I/O controller interface address translation for dataaccesses is allowed, even when MSR[DT] = 0. In this case, the segment registers are usedin the same way as if translation were enabled. Note that this case of data accesses to theI/O controller interface while MSR[DT] = 0 will not be supported in other PowerPCprocessors.

6.1.4 Memory Protection FacilitiesIn addition to the translation of logical addresses to physical addresses, the MMU providesaccess protection of supervisor areas from user access and can designate areas of memoryas read-only. Table 6-1 shows the four protection options supported by the 601.

Each of these options is enforced at the block or page level. Thus, the supervisor-onlyoption allows only read and write operations generated while the 601 is operating insupervisor mode (corresponding to MSR[PR] = 0) to use the selected address translation(block or page). User accesses that map into these blocks or pages cause an exception to betaken.

As shown in the table, the supervisor-write-only option allows both user and supervisoraccesses to read from the selected area of memory but only supervisor programs can update(write to) that area. There is also an option that allows both supervisor and user programsread and write access (both user/supervisor option), and finally, there is an option to

Table 6-1. Access Protection Options

Option User Read User WriteSupervisor

ReadSupervisor

Write

Supervisor-only Not allowed Not allowed √ √

Supervisor-write-only √ Not allowed √ √

Both user/supervisor √ √ √ √

Both read-only √ Not allowed √ Not allowed


designate an area of memory as read-only, both for user and supervisor programs (bothread-only option).

For I/O controller interface segments, the MMU calculates a “key” bit based on theprotection values programmed in the segment register, and the specific user/supervisor andread/write information for the particular access. However, this bit is merely passed on to thesystem interface to be transmitted in the context of the I/O controller interface protocol asdescribed in Section 9.6, “Memory- vs. I/O-Mapped I/O Operations.” The MMU does notitself enforce any protection or cause any exception based on the state of the key bit forthese accesses. The I/O controller device or other external hardware can optionally use thisbit to enforce any protection required.

6.1.5 Page History InformationThe 601 MMU also maintains reference (R) and change (C) bits in the page addresstranslation mechanism that can be used as history information relevant to the page. Thisinformation can then be used by the operating system to determine which areas of memoryto write back to disk when new pages must be allocated in main memory. While these bitsare initially programmed by the operating system into the page table, the 601 automaticallyupdates these bits when required. Note that the updates to these bits in the page tables areperformed with standard read and write transactions on the bus (not locked read-modify-write operations). However, when multiple 601 devices have shared access to the pagetables, the bit settings are guaranteed to be updated correctly.

6.1.6 General Flow of MMU Address TranslationWhen an instruction or data access is generated and the corresponding instruction or datatranslation is disabled (MSR[IT] = 0 or MSR[DT] = 0), direct translation is used and theaccess continues to the cache. When the selected segment register indicates that the accessis an I/O controller interface access, I/O controller interface address translation occurs. SeeSection 6.5, “Selection of Address Translation Type” for more information regarding theselection of address translation mode used for all cases.

For instruction accesses, if translation is enabled (MSR[IT] = 1), the ITLB is first checkedfor a matching page or block address translation. If there is a miss, then the MMU uses theblock and page address translation mechanisms to find the address translation. Figure 6-3shows the flow used to search for the block or page address translation.

Although the 601 performs the block and page TLB lookups simultaneously, the flowdiagram shows that if a BAT array hit occurs, that particular translation is performedregardless of the results of the UTLB lookup. If the BAT array misses, the results of theUTLB search are considered. If the UTLB hits, the page translation occurs and the physicaladdress bits are forwarded to the cache (if the access is cacheable).


Figure 6-3. MMU Block and Page Address Translation Flow

If the UTLB misses, the 601 automatically searches the page tables in memory. If the pagetable entry (PTE) is successfully read, a new UTLB entry (and an ITLB entry for theinstruction access case) is created and the page translation is once again attempted. Thistime, the UTLB (and ITLB for instruction access case) is guaranteed to hit. If the PTE isnot found by the table search operation, an instruction access or data access exception isgenerated.

BAT arrayMiss

BAT arrayHit

Access Faulted

Generate 52-bitVirtual Address

from Segment Reg-

Compare Virtual Address with UTLB

Entries

Access Protected

Translate Address

Access Permitted

UTLBMiss

UTLBHit

Continue Accessto Cache

Access Faulted

Access Faulted

Logical AddressGenerated

Select Segment Register

Compare Address with BAT array

(BAT Registers)

Translate Address

Continue Accessto Cache

Access Protected

Access Permitted

PTE Found PTE Not Found

Block/Page AddressTranslation

Load UTLB Entry

See Figure 6-4

Perform Table Search Operation


Note that if either the BAT array or UTLB results in a hit, the access is qualified with theappropriate protection bits. If the access is determined to be protected (not allowed), anexception (instruction access or data access) is generated.

6.1.7 Memory/MMU Coherency ModelThe memory model of the 601 provides the following features:

• Performance benefits of weak ordering of memory accesses

• Memory coherency among processors and between a processor and I/O devices controlled at the block and page level

• Instructions that ensure a coherent and ordered memory state.

• Processor address order guaranteed

The memory implementations in 601 systems can take advantage of the performancebenefits of weak ordering of memory accesses between processors or between processorsand other external devices without any additional complications. The MMU assumes thatall accesses are ordered. Thus, the priority of accesses is determined at the externalinterface in a way that provides maximum throughput for most cases.

In addition, at the system level, the memory coherency among processors and between aprocessor and I/O devices is programmed through the following three mode control bits inthe MMU:

• Write-through (W bit)• Caching-inhibited (I bit)• Memory coherency (M bit)

Both the block and page address translation mechanisms contain the WIM bits for eachTLB entry; these bits are used to control all accesses that correspond to the particular blockor page. The four possible combinations of the W and M bits yield modes that are supportedfor I = 0 (caching-allowed) as shown in Table 6-2. For the caching-inhibited (I = 1) case,there are only two modes defined, corresponding to W=0/M=0, and W=0/M=1.

Table 6-2. Defined WIM Combinations

W I M

0 0 0

0 0 1

1 0 0

1 0 1

0 1 0

0 1 1


The 601 also provides instructions (the cache instructions, isync, sync, eieio, lwarx, andstwcx.) to ensure a coherent and ordered memory state. These instructions are described inChapter 3, “Addressing Modes and Instruction Set Summary,” and in Chapter 10,“Instruction Set.” Memory accesses performed by a single processor appear to completesequentially from the view of the programming model but may complete out of order withrespect to the ultimate destination in the memory hierarchy. Order is guaranteed at eachlevel of the memory hierarchy for accesses to the same address from the same processor.

Memory coherency can be enforced externally by a snooping bus design, a centralizedcache directory design, or other design that can take advantage of these coherency features.

6.1.8 Effects of Instruction Fetch on MMUSpeculative instruction execution occurs when the 601 executes instructions in advance incase the result is needed. If subsequent events indicate that the speculative instructionshould not have been executed, the processor abandons the results produced by thatinstruction. Typically, the 601 executes instructions speculatively when it has resources thatwould otherwise be idle, so the operation is done at little or no cost.

The 601 executes computational instructions speculatively (beyond a branch instruction)and performs instruction fetches speculatively (it fetches instructions ahead in theinstruction stream). However, the 601 does not execute any load or store instructionsspeculatively. Speculative execution of computational instructions does not involve theMMU.

To avoid instruction fetch delay, the processor typically fetches instructions in advance ofwhen they are needed. Such instruction fetching is speculative in that fetched instructionsmay not be executed due to intervening branches or exceptions.

Instruction fetching from I/O controller interface segments (T = 1) only occurs from thosesegments designated as memory-forced I/O controller interface segment. See Section 6.10,“I/O Controller Interface Address Translation” for more information about I/O controllerinterface segments.

Machine check exceptions that result from instruction fetching may be generated, even ifthe instruction fetched would not have been executed because of a previous branch orchange in program flow. See Section 5.4.2, “Machine Check Exception (x'00200'),” formore information on the machine check exception.

Memory in 601 systems is considered not “guarded” in the sense that fetching may occurto any area of memory. For example, if a data area is adjacent to an instruction area ofmemory, the 601 could fetch from that data area. Furthermore, if a word in that data areacontains the encoding for an unconditional branch instruction, the processor could evencontinue to fetch from the address it interprets as the target of the branch. Care may berequired to prevent these situations, particularly if peripheral devices that cannot recoverfrom extraneous accesses reside in these areas. Areas of memory in other PowerPC


processors may be designated as guarded within the MMU in that speculative operationsdo not occur.

6.1.9 Breakpoint FacilityThrough the use of the HIDx registers (HID1, HID2, and HID5), the 601 has the ability toperform a breakpoint operation for both instruction and data accesses independently. Forinstruction accesses, the logical addresses of instructions in decode are compared with theaddress specified in the instruction address breakpoint register (IABR). If there is a match,then the processor takes a run mode exception. Similarly, data breakpoints occur when thelogical address of a data access matches the address specified in the data address breakpointregister (DABR) register. However, when a data address matches, the 601 takes a dataaccess exception.

The instruction and data breakpoint functionality is controlled by bit settings in the601debug modes register (HID1). Various combinations and levels of breakpoints can beenabled. Section 2.3.3.13, “601 Implementation-Specific HID Registers” describes thebreakpoint functionality provided in the 601. Note that these breakpoints occur completelyindependently of the MSR[DT] and MSR[IT] bit settings.

6.1.10 MMU Exceptions SummaryIn order to complete any memory access, the logical address must be translated to aphysical address. An MMU exception condition occurs if this translation fails for one of thefollowing reasons:

• There is no valid entry in the page table for the page specified by the logical address (and segment register).

• An address translation is found but the access is not allowed by the memory protection mechanism.

Most MMU exception conditions cause either the instruction access exception or the dataaccess exception to be taken. The state saved by the 601 for each of these exceptionscontains information that identifies the address of the failing instruction. Refer toChapter 5, “Exceptions,” for a more detailed description of exception processing.

There are 11 types of conditions that can cause an MMU exception to occur. The exceptionconditions map to the 601 exception as shown in Table 6-3. The only MMU exceptioncondition recognized when MSR[IT] = 0 is the instruction breakpoint match condition. Theonly exception conditions that occur when MSR[DT] = 0 are the data breakpoint matchcondition and the conditions that cause the alignment exception for data accesses. For moredetailed information about the conditions that cause the alignment exception (in particularfor string/multiple instructions) see 5.4.6, “Alignment Exception (x'00600').”


Table 6-3. MMU Exception Conditions/Exception Mapping

Condition Description Exception

Page fault No matching PTE found in page tables

I access: instruction access exceptionSRR1[1] = 1

D access: data access exceptionDSISR[1] = 1

Block protection violation Conditions described in Table 6-7 for block


D access: data access exceptionDSISR[4] =1

Page protection violation Conditions described in Table 6-7 for page


D access: data access exceptionDSISR[4] = 1

dcbz with W or I = 1 dcbz instruction to write-through or cache-inhibited segment or block

Alignment exception

Instruction access to I/O controller interface space

Attempt to fetch instruction when SR[T] = 1, SR[BUID] ≠ '07F'

Instruction access exceptionCauses no SRR1 bits to be set*

lwarx, stwcx., lscbx instruction to I/O controller interface space

Reservation instruction or load string and compare byte instruction when SR[T] = 1, SR[BUID] ≠ '07F'

Data access exceptionDSISR[5] = 1

Floating-point load or store to I/O controller interface space

FP memory access when SR[T] = 1, SR[BUID] ≠ '07F'

Alignment exception

Instruction breakpoint match Instruction address matches the address in HID2

Run mode exception

Data breakpoint match Data address matches the address in HID5

Data access exceptionDSISR[9] = 1

Operand misalignment: 256-Mbyte boundary

Operand crosses a 256-Mbyte boundary (regardless of MSR[DT] and MSR[IT] setting)

Alignment exception

Operand misalignment: 4-Kbyte boundary

Translation enabled and operand crosses a 4 Kbyte boundary (page or block)

Alignment exception

* This is only true for the 601; other PowerPC processors will set SRR1[3] for this case.


6.1.11 MMU Instructions and Register SummaryTable 6-4 summarizes the instructions of the 601 that specifically control the MMU. Formore detailed information about the instructions, refer to Chapter 10, “Instruction Set.”

Table 6-5 summarizes the registers that the operating system uses to program the MMU.These registers are accessible to supervisor-level software only. These registers aredescribed in detail in Chapter 2, “Registers and Data Types.”

6.1.12 TLB Entry InvalidationThe UTLB (and ITLB) maintains on-chip copies of the PTEs that are resident in physicalmemory. The 601 has the ability to invalidate resident UTLB entries through the use of thetlbie instruction. Additionally, the tlbie instruction optionally (as programmed in the HID1register—see Section 2.3.3.13.2, “601 Debug Modes Register—HID1”) causes a TLBinvalidate broadcast (an address-only operation) to occur on the system bus so that otherprocessors also invalidate their resident copies of the matching PTE. See Chapter 10,

Table 6-4. Instruction Summary—Control MMU

Instruction Description

mtsr SR,rS Move to Segment RegisterSR[SR#]← rS

mtsrin rS,rB Move to Segment Register IndirectSR[rB[0–3]]←rS

mfsr rD,SR Move from Segment RegisterrD←SR[SR#]

mfsrin rD,rB Move from Segment Register IndirectrD←SR[rB[0–3]]

tlbie rB Translation Lookaside Buffer Invalidate EntryIf TLB hit (for logical address specified as rB), TLB[V]←0 Causes TLBI operation on the system bus.

Table 6-5. MMU Registers

Register Description

Segment registers(SR0–SR15)

The sixteen 32-bit segment registers are present only in 32-bit implementations of the PowerPC architecture. Figure 6-13 shows the format of a segment register. The fields in the segment register are interpreted differently depending on the value of bit 0. The segment registers are accessed by the mtsr, mtsrin, mfsr, and mfsrin instructions

BAT registers (BAT0U–BAT3U and BAT0L–BAT3L)

The 601 includes eight block-address translation registers (BATs), organized as four pairs (BAT0U–BAT3U and BAT0L–BAT3L). Figure 6-6 and Figure 6-7 show the format of the upper and lower BAT registers. These are special-purpose registers that are accessed by the mtspr and mfspr instructions.

Table search description register 1(SDR1)

The 32-bit table search description register 1 (SDR1) specifies the variables used in accessing the page tables in memory. This is a special-purpose register that is accessed by the mtspr and mfspr instructions.


“Instruction Set” for detailed information about the tlbie instruction and Section 9.3.2.2.1,“Transfer Type (TT0–TT4) Signals” for more information on address-only bustransactions.

The snooping hardware of the 601 detects when other processors perform a TLB invalidatebroadcast on the bus. In the case of a hit with an on-chip UTLB entry, the 601 performs thefollowing:




4. Invalidates all entries in the ITLB

5. Resumes normal execution

6.2 ITLB DescriptionThe 601 implements a four-entry, fully-associative TLB for storing the most recently usedinstruction address translations. The 601 automatically generates an entry in the ITLBwhenever the page or block address translation mechanism generates a new logical-to-physical mapping for a page or block used for instruction fetch. Each ITLB entry cancontain the translation information for either an entire block or a page. The 601 uses theITLB for address translation of instruction accesses when MSR[IT] = 1.

The instruction unit accesses the ITLB independently of the rest of the MMU. Therefore,when instruction accesses hit in the ITLB, the page and block translation mechanisms areavailable for use by data accesses simultaneously.

The 601 also automatically maintains the integrity of the entries in the ITLB by purging thecontents when any of the following conditions occur:

• An mtsr or mtsrin instruction is executed• An mtspr instruction that modifies any of the BAT registers is executed• A tlbie instruction is executed• A TLB invalidate operation is detected on the system interface (via snooping)

Since these conditions potentially cause the MMU context to be changed, the ITLB entriesmay no longer be valid. Therefore, the MMU automatically detects these conditions andclears all the valid bits in the ITLB array.

Finally, the 601 replaces ITLB entries on a least-recently-used (LRU) basis. Throughout theremainder of this chapter, the page and block translations that are resident in the ITLB aredescribed within the context of page address translation and block address translation, as


the contents of the ITLB are always a subset of translations that were generated for theUTLB and/or the BAT array.

Accesses to the ITLB are transparent to the executing program, except that hits in the ITLBcontribute to a higher overall instruction throughput by allowing data translations to occurin parallel.

6.3 Memory/Cache Access ModesAll instruction and data accesses are performed under the control of the three mode controlbits that are defined by the MMU for each access. The three mode control bits, W, I, and M,have the following effects. The W and I bits control how the processor performing theaccess uses its own cache. The M bit specifies whether the processor performing the accessmust use the memory coherency protocol to ensure that all copies of the addressed memorylocation are consistent.

When an access requires coherency, the processor performing the access must inform thecoherency mechanisms throughout the system that the access requires memory coherency.The M bit determines the kind of access performed on the bus (global or local). Note thatthese mode-control bits are relevant only when an address is translated and are not savedalong with data in the on-chip cache (for cacheable accesses). Once an access has beentranslated, the MESI bits in the cache then control the coherency to that cache locationmade by subsequent accesses from other processors. See Chapter 4, “Cache and MemoryUnit Operation,” for more information about cache accesses.

The operating system programs the WIM bits for each page or block as required. The WIMbits reside in the BAT registers for block address translation and in the PTEs for pageaddress translation. Thus these bits are programmed as follows:

• The operating system uses the mtspr instruction to program the WIM bits in the BAT registers for block address translation.

• The operating system programs the WIM bits for each page into the PTEs in system memory as it sets up the page tables.

Note that for accesses performed with direct address translation (MSR[IT] = 0 orMSR[DT] = 0 for instruction or data access, respectively), the WIM bits are automaticallygenerated as b'001' (the data is write-back, caching is enabled, and memory coherency isenforced).

6.3.1 Write-Through Bit (W)When an access is designated as write-through (W = 1), if the data is in the cache, a storeoperation updates the cached copy of the data. In addition, the update is written to theexternal memory location (as described below). Store-combining compiler optimizationsare allowed for write-through accesses except when the store instructions are separated bya sync instruction. Note that a store operation that uses the write-through mode may causeany part of valid data in the cache to be written back to main memory.


The definition of the external memory location to be written to in addition to the on-chipcache depends on the implementation of the memory system but can be illustrated by thefollowing examples:

• RAM—The store must be sent to the RAM controller to be written into the target RAM.

• I/O device—The store must be sent to the I/O control hardware to be written to the target register or memory location.

In systems with multilevel caching, the store must be written to at least a depth in thememory hierarchy that is seen by all processors and devices.

Accesses that correspond to W = 0 are considered write-back. For this case, although thestore operation is performed to the cache, it is only made to external memory when a copy-back operation is required. Use of the write-back mode (W = 0) can improve overallperformance for areas of the memory space that are seldom referenced by other masters inthe system. See Chapter 4, “Cache and Memory Unit Operation,” for more informationabout cache accesses.

6.3.2 Caching Inhibited Bit (I)If I = 1, the memory access is completed by referencing the location in main memory,completely bypassing the on-chip cache of the 601. During the access, the accessedlocation is not loaded into the cache nor is the location allocated in the cache. If a copy ofthe accessed data is in the cache, that copy is not updated, flushed, or invalidated. Dataaccesses from more than one instruction may not be combined (as a compiler optimization)for cache-inhibited operations.

6.3.3 Memory Coherence Bit (M)This mode control bit is provided to allow improved performance in systems wherehardware-enforced coherency is relatively slow, and software is able to enforce the requiredcoherency. When M = 0, the 601 does not enforce data coherency. When M = 1, theprocessor enforces data coherency and the corresponding access is considered to be aglobal access. When the M bit is set, and the access is performed to external memory, theGBL signal is asserted and the access is designated as global. Other processors affected bythe access must then respond to this global access and signal whether it is shared. If the datain another processor is modified, then address retry is signaled.


6.3.4 W, I, and M Bit CombinationsTable 6-6 summarizes the six combinations of the WIM bits defined for the 601..

If the system software maps the same physical page with multiple page table entries thathave different W, I, or M values, the results of the translation are undefined.

Table 6-6. Combinations of W, I, and M Bits

WIM Setting Meaning

000 Data may be cached. Loads or stores whose target hits in the cache use that entry in the cache.

Exclusive ownership of the block containing the target location is not required for store accesses and coherency operations for the block do not occur when fetching the block, storing it back, or changing its state from shared to exclusive.

001 Data may be cached. Loads or stores whose target hits in the cache use that entry in the cache.

Memory coherency is enforced by hardware as follows: exclusive ownership of the block containing the target location is required before store accesses are allowed. When fetching the block, the processor indicates on the bus transaction that coherency is to be enforced. If the state of the block is shared-unmodified, the processor must gain exclusive use of the block before storing into it.This encoding is used for addresses translated via direct address translation (MSR[IT] = 0 or MSR[DT] = 0).

010 Caching is inhibited. The access is performed to external memory, completely bypassing the cache.

Hardware enforced memory coherency is not required.

011 Caching is inhibited. The access is performed to external memory, completely bypassing the cache.

Memory coherency must be enforced by external hardware (601 asserts GBL).

100 Data may be cached. Loads whose target hits in the cache use that entry in the cache.

Stores are written to external memory. The target location of the store may be cached and is updated on a hit.

Exclusive ownership of the block containing the target location is not required for store accesses and coherency operations for the block do not occur when fetching the block, storing it back, or changing its state from shared to exclusive.

101 Data may be cached. Loads whose target hits in the cache use that entry in the cache.

Stores are written to external memory. The target location of the store may be cached and is updated on a hit.

Memory coherency is enforced by hardware as follows: exclusive ownership of the block containing the target location is required before store accesses are allowed. When fetching the block, the processor indicates on the bus transaction that coherency is to be enforced. If the state of the block is shared, the processor must gain exclusive use of the block before storing into it.


6.4 General Memory Protection MechanismAnother aspect of the MMU that is programmed at the block and page level is the memoryprotection option. The memory protection mechanism allows selectively granting readaccess, granting read/write access, and prohibiting access to areas of memory based on anumber of control criteria.

The memory protection mechanism is used by both the block and page address translationmechanisms in a similar way, as described here. However, the protection mechanism in the601 differs from that defined in the PowerPC architecture in that the PowerPC operatingenvironment architecture defines valid bits rather than key bits for the block addresstranslation mechanism. For specific information unique to block address translation, referto Section 6.7.4, “Block Memory Protection.” For specific information unique to pageaddress translation, refer to Section 6.8.5, “Page Memory Protection.”

For both block and page address translation in the 601, the memory protection mechanismis controlled by the following:

• MSR[PR], which defines the mode of the access as follows:

— MSR[PR] = 0 corresponds to supervisor mode — MSR[PR] = 1 corresponds to user mode

• Ks and Ku, the supervisor and user key bits, which define the key for the block or page

• The PP bits, which define the access options for the block or page

The key bits (Ks and Ku) and the PP bits are located as follows for block and page addresstranslation:

• Ks and Ku are located in the upper BAT register for block address translation and in the selected segment register for a page address translation.

• The PP bits are located in the upper BAT register for block address translation and in the PTE for page address translation.

The key bits, the PP bits, and the MSR[PR] bit are used as follows:

• When an access is generated, one of the key bits (Ks or Ku) is selected to be the key as follows:

— For supervisor accesses (MSR[PR] = 0), the Ks bit is used and Ku is ignored— For user accesses (MSR[PR] = 1), the Ku bit is used and Ks is ignored

• The selected key is used with the PP bits to determine if the load or store access is allowed.


Table 6-7 shows the types of accesses that are allowed for the general case (all possible Ks,Ku, and PP bit combinations).

Thus, the conditions that cause a protection violation are depicted in Table 6-8. Any accessattempted (read or write) when the key = 1 and PP = 00, results in a protection violationexception condition. When key = 1 and PP = 01, an attempt to perform a write access causesa protection violation exception condition. When PP = 10, all accesses are allowed, andwhen PP = 11, write accesses always cause an exception. The 601 takes either theinstruction access exception or the data access exception (for an instruction or data access,respectively) when there is an attempt to violate the memory protection.

Although any combination of the Ks, Ku and PP bits is allowed, the Ks and Ku bits can beprogrammed so that the value of the key bit for Table 6-7 directly matches the MSR[PR]bit for the access. In this case, the encoding of Ks = 0 and Ku = 1 is used for the BAT arrayentry or the PTE, and the PP bits then enforce the protection options shown in Table 6-9.

Table 6-7. Access Protection Control with Key

Key1 PP2 Block or Page Type

0 00 Read/write

0 01 Read/write

0 10 Read/write

0 11 Read only

1 00 No access

1 01 Read only

1 10 Read/write

1 11 Read only

1 Ks or Ku selected by state of MSR[PR] 2 PP protection option bits in BAT array entry or

PTE

Table 6-8 . Exception Conditions for Key and PP Combinations

Key PPProhibited Accesses

1 00 Read/write

1 01 Write

x 10 None

x 11 Write


However, if the setting Ks = 1 is used, supervisor accesses are treated as user reads andwrites with respect to Table 6-9. Likewise, if the setting Ku = 0 is used, user accesses to theblock or page are treated as supervisor accesses in relation to Table 6-9. Therefore, bymodifying one of the key bits (in either the BAT register or the segment register), the waythe 601 interprets accesses (supervisor or user) in a particular block or segment can easilybe changed. Note, however, that only supervisor programs can modify the key bits for theblock or the segment as access to the BAT registers and the segment registers is privileged.

When the memory protection mechanism prohibits a reference, one of the following occurs,depending on the type of access that was attempted:

• For data accesses, a data access exception is generated and bit 4 of DSISR is set. If the access is a store, bit 6 of DSISR is also set.

• For instruction accesses, an instruction access exception is generated and bit 4 of SRR1 is set.

See Chapter 5, “Exceptions,” for more information about these exceptions.

6.5 Selection of Address Translation TypeA description of the selection flow for determining the type of address translation to beperformed is provided in Figure 6-4. The selection of address translation type differs forinstruction and data accesses in that I/O controller interface accesses are not allowed forinstruction accesses when instruction address translation is disabled, and I/O controllerinterface accesses for data occur without regard for the enabling of data address translation.

6.5.1 Address Translation Selection for Instruction AccessesAddresses for instruction accesses are translated under control of the IT bit of MSR. Whenany context-synchronizing event occurs within the 601, any fetched instructions arediscarded and refetched using the updated state of MSR[IT].

Table 6-9. Access Protection Encoding of PP Bits

PP Field

OptionUser Read(Key = 1)

User Write(Key = 1)

Supervisor Read

(Key = 0)

Supervisor Write

(Key = 0)

00 Supervisor-only Not allowed Not allowed √ √

01 Supervisor-write-only √ Not allowed √ √

10 Both user/supervisor √ √ √ √

11 Both read-only √ Not allowed √ Not allowed


Figure 6-4. Address Translation Type Selection

6.5.1.1 Instruction Address Translation Disabled: MSR[IT] = 0When instruction address translation is disabled, designated by MSR[IT] = 0, the logicaladdress is interpreted as described in Section 6.6, “Direct Address Translation.”

6.5.1.2 Instruction Address Translation Enabled: MSR[IT] = 1When instruction address translation is enabled (MSR[IT] = 1), instruction fetching occursunder control of one of the following address translation mechanisms:

• Page address translation • Block address translation

Note that for either of these translation mechanisms, the ITLB is first checked for theaddress translation. If the ITLB misses, then the corresponding segment register is accessedto see if the access is to the I/O controller interface space. If the access is not to the I/Ocontroller interface space, the page and block address translation mechanisms are used asshown in Figure 6-3.

Instruction Access Data Access



Translation Enabled (MSR[DT]=1)

Translation Disabled (MSR[DT]=0)

Translation Enabled (MSR[IT]=1)

Translation Disabled (MSR[IT]=0)

Perform Direct Translation

Perform Lookupin ITLB

ITLBMiss

ITLBHit

Extract Physical Address from ITLB


Continue Accessto Cache Perform Direct

Translation

Perform I/O Cont. I/F Translation

Perform Block or Page Address Translation

Perform I/O Cont. I/F Translation

otherwise (T=0)

Not I/O Cont. I/F Address (T=0)

I/O Cont. I/F Address (T=1)

I/O Cont. I/F Address (T=1)


In most cases, instructions cannot be fetched from the I/O controller interface segments andattempting to fetch an instruction from an I/O controller interface segment causes aninstruction access exception. However, instruction fetches are allowed when the addresstranslation maps to segments with the T bit set (I/O controller interface segment) and withthe memory-forced I/O controller interface encoding. This case is described in more detailin Section 6.10.4, “Memory-Forced I/O Controller Interface Accesses.”

6.5.2 Address Translation Selection for Data AccessesAs shown in Figure 6-4, for data accesses, the corresponding segment register is selectedindependent of the DT bit of MSR. Addresses for data accesses are translated first undercontrol of the T bit of the selected segment register. If T = 1, the translation is to an I/Ocontroller interface segment. Otherwise, the translation is governed by the state of the DTbit of MSR. When the state of MSR[DT] changes, subsequent accesses are made using thenew state of MSR[DT].

6.5.2.1 I/O Controller Interface Address Translation: T = 1 in Segment Register

I/O controller interface segments are used independently of MSR[DT]. When the segmentregister indexed by the upper-order logical address bits has the T bit set, the access isconsidered an I/O controller interface access and the I/O controller interface protocol of theexternal interface is used to perform the access to I/O controller space.

Note, however, that an x'07F' encoding in the BUID field of the segment register defines anaccess as a memory-forced I/O controller interface access. In this case, the memoryprotocol is used on the external interface. See Section 6.10, “I/O Controller InterfaceAddress Translation” for more information on address translation for I/O controllerinterface accesses.

6.5.2.2 Data Translation Disabled: MSR[DT] = 0When MSR[DT] = 0, the logical address is interpreted as described in Section 6.6, “DirectAddress Translation.” Note that as shown in Figure 6-4, the determination of whether theaddress maps to an I/O controller interface segment occurs prior to the checking ofMSR[DT]. Therefore, I/O controller interface address translation occurs independently ofMSR[DT] for data accesses. The attempted execution of the eciwx or ecowx instructionswhile MSR[DT] = 0 causes boundedly undefined results.

6.5.2.3 Data Translation Enabled: MSR[DT] = 1When data address translation is enabled (MSR[DT] = 1), data accesses employ one of thefollowing translation mechanisms:

• Page address translation • Block address translation

The block and page address translation mechanisms locate the physical address for theaccess as described in Figure 6-3.


6.6 Direct Address TranslationIf address translation is disabled (MSR[IT] = 0 or MSR[DT] = 0) for a particular access(fetch, load, or store), the logical address is treated as the physical address and is passeddirectly to the memory subsystem as a direct address translation.

The addresses for accesses that occur in direct translation mode bypass all memoryprotection checks as described in Section 6.4, “General Memory Protection Mechanism,”and do not cause the recording of reference and change information (described inSection 6.8.4, “Page History Recording”). Such accesses are performed as though thememory access mode bits (“WIM”) were 001. That is, the cache is write-back and systemmemory does not need to be updated (W = 0), caching is enabled (I = 0), and data coherencyis enforced with memory, I/O, and other processors (caches) (M = 1 so data is global).

Whenever an exception occurs, the 601 clears both the MSR[IT] and MSR[DT] bits.Therefore, at least at the beginning of all exception handlers (including reset), the 601operates in direct address translation mode for instruction accesses (and data accesses thatdo not map to I/O controller interface space). If address translation is required for theexception handler code, the software must explicitly enable address translation byaccessing the MSR as described in Chapter 2, “Registers and Data Types.”

Note that when translation is disabled, I/O controller interface segments can still be usedfor data accesses as the T bit of the segment registers is checked and segment registers withT = 1 are used independently of MSR[DT].

Note also that an attempt to fetch from, load from, or store to a physical address that is notphysically present in the system may cause a machine check exception (or even a checkstopcondition), depending on the response by the system for this case. See Section 5.4.2,“Machine Check Exception (x'00200'),” for more information on machine checkexceptions.

6.7 Block Address TranslationThe block address translation (BAT) mechanism in the 601provides a way to map ranges oflogical addresses larger than a single page into contiguous areas of physical memory. Suchareas can be used for data that is not subject to normal virtual memory handling (paging),such as a memory-mapped display buffer or an extremely large array of numerical data.

The implementation of block address translation in the 601 including the block protectionmechanism is described followed by a block translation summary with a detailed flowdiagram.


6.7.1 BAT Array OrganizationThe block address translation mechanism in the 601 is implemented as a software-controlled array (BAT array). The BAT array maintains the address translation informationfor four blocks of memory. The BAT array in the 601 is maintained by the system softwareand is implemented as a set of eight special-purpose registers (SPRs). Each block is definedby a pair of SPRs called upper and lower BAT registers.

The BAT registers can be read from or written to by the mfspr and mtspr instructions;access to the BAT registers is privileged. Section 6.7.3, “BAT Register Implementation ofBAT Array,” gives more information about the BAT registers. Note that the BAT arrayentries are completely ignored for TLB invalidate operations detected on the system busand in the execution of the tlbie instruction.

Figure 6-5 shows the organization of the BAT array. Four pairs of BAT registers areprovided for translating instruction and data addresses. These four pairs of BAT registerscomprise the four-entry fully-associative BAT array (each BAT array entry corresponds toa pair of BAT registers). The BAT array is fully-associative in that all four entries arecompared with the logical address of the access to check for a match simultaneously.

Figure 6-5. BAT Array Organization

Each pair of BAT registers defines the starting address of a block in the logical addressspace, the size of the block, and the start of the corresponding block in physical addressspace. If a logical address is within the range defined by a pair of BAT registers, its physicaladdress is defined as the starting physical address of the block plus the lower order logicaladdress bits.

Blocks are restricted to a finite set of sizes, from 128 Kbytes (217 bytes) to 8 Mbytes (223

bytes). The starting address of a block in both logical address space and physical addressspace is defined as a multiple of the block size.

Unmasked bits of LA0–LA14(Instruction or Data Access)

BAT array

CompareBLPI

Compare

Compare

Compare

BAT0UBAT0L

BAT3UBAT3L

SPR528

SPR535


Because the BAT array entries are used for both instruction and data access, if the samememory address is to be mapped for both instruction fetching and data load and storeoperations, the address mapping must only be loaded into one register pair.

It is an error for system software to program the BAT registers such that a logical addressis translated by more than one BAT pair. If this occurs, the results are undefined and mayinclude a spurious violation of the memory protection mechanism, a machine checkexception, or a check stop condition.

6.7.2 Recognition of Addresses in BAT ArrayThe BAT array (BAT registers) is accessed in parallel with segmented address translationto determine whether a particular logical address corresponds to a block defined by the BATarray. If a logical address is within a valid BAT area, the physical address for the memoryaccess is determined, as described in Section 6.7.5, “Block Physical Address Generation.”

Block address translation is enabled only when address translation is enabled(MSR[IT] = 1 and/or MSR[DT] = 1) and only when the indexed segment register specifiesT = 0. That is, the BAT mechanism in the 601 does not apply to I/O controller interfacesegments (T = 1). When the segment register has T = 1, the segment register translation isused. (This is true for both I/O controller interface segments and memory-forced I/Ocontroller interface segments.) Note, however, that this differs from other PowerPCprocessors, as the PowerPC operating environment architecture defines that a matchingBAT array entry always takes precedence over any segment register translation,independent of the setting of the SR[T] bit.

The BAT registers and the segmented address translation mechanism can be programmedsuch that a particular logical address is within a BAT area and that logical address also hasa segment register translation that corresponds to page address translation (T = 0 in thesegment register). When this occurs, the block address translation is used as shown inTable 6-10 and the segment address translation is ignored.

Additionally, a block can be defined to overlay part of a segment such that the block portionis non-paged although the rest of the segment is pageable. This allows non-paged areas tobe specified within a segment, and PTEs for the part of the segment overlaid by the blockare not required.

Table 6-10. Address Translation Precedence for Blocks and Segments

Segment Register T bit

Address Translation

0 Matching BAT array entry prevails

1 Segment register prevails


6.7.3 BAT Register Implementation of BAT ArrayRecall that the BAT array is comprised of four entries used for instruction accesses and dataaccesses. Each BAT array entry consists of a pair of BAT registers—an upper and a lowerBAT register for each entry. The BAT registers are accessed with the mtspr and mfsprinstructions and are only accessible to supervisor-level programs. See Section 3.7,“Processor Control Instructions,” for a list of simplified mnemonics for use with the BATregisters.

Figure 6-6 shows the format of the upper BAT registers and Figure 6-7 shows the format ofthe lower BAT registers. The format and bit definitions of the upper and lower BAT registersin the 601 differs from that of the BAT registers in other PowerPC processors.

Figure 6-6. Format of Upper BAT Registers

.

Figure 6-7. Format of Lower BAT Registers

The BAT registers contain the logical to physical address mappings for blocks of memory.This mapping information includes the logical address bits that are compared with thelogical address of the access, the memory/cache access mode bits (WIM) and the protectionbits for the block. In addition, the size of the block and the starting address of the block aredefined by the block page number and block size mask fields.

Table 6-11 describes the bits in the upper and lower BAT registers.

BLPI 0 0 0 0 0 0 0 0 0 0 WIM KS KU PP

0 14 15 24 25 27 28 29 30 31

Reserved

PBN 0 0 0 0 0 0 0 0 0 0 V BSM

Reserved

0 14 15 24 25 26 31


The BSM field in the lower BAT register is a mask that encodes the size of the block.Table 6-12 defines the bit encodings for the BSM field of the lower BAT register. Note thatthe range of block sizes is a subset of that defined by the PowerPC architecture.

Table 6-11. BAT Registers—Field and Bit Descriptions

Register Bits Name Description

Upper BAT Registers

0–14 BLPI Block logical page index. This field is compared with bits 0–14 of the logical address to determine if there is a hit in that BAT array entry.


25–27 WIM Memory/cache access mode bitsW Write-throughI Caching-inhibitedM Memory coherenceFor detailed information about the WIM bits, see Section 6.3, “Memory/Cache Access Modes."

28 Ks Supervisor mode key. This bit interacts with MSR[PR] and the PP field to determine the protection for the block. For more information, see Section 6.4, “General Memory Protection Mechanism."

29 Ku User mode key. This bit also interacts with MSR[PR] and the PP field to determine the protection for the block. For more information, see Section 6.4, “General Memory Protection Mechanism."

30–31 PP Protection bits for block. This field interacts with MSR[PR] and the Ks or Ku to determine the protection for the block as described in Section 6.4, “General Memory Protection Mechanism."

Lower BAT Registers

0–14 PBN Physical block number. This field is used in conjunction with the BSM field to generate bits 0-14 of the physical address of the block.


25 V BAT register pair (BAT array entry) is valid if V = 1

26–31 BSM Block size mask (0...5). BSM is a mask that encodes the size of the block. Values for this field are listed in Table 6-12.

Table 6-12. Lower BAT Register Block Size Mask Encodings

Block Size BSM Encoding

128 Kbytes 00 0000

256 Kbytes 00 0001

512 Kbytes 00 0011

1 Mbyte 00 0111

2 Mbytes 00 1111

4 Mbytes 01 1111

8 Mbytes 11 1111


Only the values shown in Table 6-12 are valid for BSM. A logical address is determined tobe within a BAT area if the appropriate bits (determined by the BSM field) of the logicaladdress matches the value in the BLPI field of the upper BAT register, and if the valid bit(V) of the corresponding lower BAT register is set.

The boundary between the strings of zeros and ones in the BSM field determines the bitsof the logical address that are used in the comparison with the BLPI field to determine ifthere is a hit in that BAT array entry. The rightmost bit of the BSM field is aligned with bit14 of the logical address; bits of the logical address corresponding to ones in the BSM fieldare then forced to zero for the comparison.

The value loaded into the BSM field determines both the length of the block and thealignment of the block in both logical address space and physical address space. The valuesloaded into the BLPI and PBN fields must have at least as many low-order zeros as thereare ones in BSM.

6.7.4 Block Memory ProtectionIf a logical address is determined to be within a block defined by the BAT array, the accessis next validated by the memory protection mechanism. If this protection mechanismprohibits the access, a block protection violation exception condition (data access exceptionor instruction access exception) is generated.

The block protection mechanism provides protection at the granularity defined by the blocksize (128 Kbyte to 8 Mbyte) and is described in Section 6.4, “General Memory ProtectionMechanism.”

The Ks, Ku, and PP bits are located in the upper BAT register for block address translation.Note, however, that the block protection defined by the PowerPC architecture’s operatingenvironment implements valid bits rather than key bits in defining user and supervisorblocks.

When the block protection mechanism prohibits a reference, one of the following occurs,depending on the type of access that was attempted:

• For data accesses, a data access exception is generated and bit 4 of DSISR is set. If the access was a store, bit 6 of DSISR is additionally set.


6.7.5 Block Physical Address GenerationIf the block protection mechanism validates the access, a physical address is formed asshown in Figure 6-8. Bits in the logical address corresponding to ones in the BSM field,concatenated with the 17 lower-order bits of the logical address form the offset within theblock of memory in the case of a hit. Bits in the logical address corresponding to zeros inthe BSM field are then logically ORed with the corresponding bits in the PBN field to form


the next higher-order bits of the physical address. Finally, the highest-order nine bits of thePBN field form bits 0–8 of the physical address (PA0–PA8).

Access to the physical memory within the block is made according to the memory/cacheaccess mode defined by the WIM bits in the upper BAT register. These bits apply to theentire block rather than to an individual page and are described in Section 6.3,“Memory/Cache Access Modes.”

Figure 6-8. Block Physical Address Generation

6.7.6 Block Address Translation SummaryFigure 6-9 provides the detailed flow for the block address translation mechanism.Figure 6-9 is an expansion of the “BAT Array Hit” branch of Figure 6-3. Note that if thedcbz instruction is attempted to be executed with either W = 1 or I = 1, the alignmentexception is generated.

Physical Block Number

Logical Address

Physical Address

9-bit 6-bit 17-bit

Block Size

9-bit 6-bit

9-bit 6-bit 17-bit

MASK

OR

6-bit 17-bit

0 8 9 14 15 31

0 8 9 14 15 31


Figure 6-9. Block Address Translation Flow

Figure 6-10 further expands on the determination of a memory protection violation and thesubsequent actions taken by the processor in this case. Note that in the case of a memoryprotection violation for the attempted execution of a dcbt of dcbtst instruction, thetranslation is aborted and the instruction executes as a no-op (no violation is reported).

BAT Array Hit

BLPI (0–8) = LA0–LA8, andBLPI (9–14) = (LA9–LA14) & (BSM), and

V = 1

Alignment Exception

otherwisedcbz Instructionwith W or I = 1

Select Key:If MSR[PR] = 0, Key = KsIf MSR[PR] = 1, Key = Ku

PA0–PA31 = PBN (0–8) ||PBN (9–14) OR ((LA9–LA14) & (BSM)) || LA15–LA31

Continue Access to Cache with WIM in Upper BAT Register

Memory Protection Violation Flow

otherwise Memory Protection Violation

(See Figure 6-10)


Figure 6-10. Memory Protection Violation Flow

Store Operation

Load Operation

Data Access Exception

Set DSISR[6] = 1

Memory Protection Violation

Set DSISR[4] = 1

Write Access with Key || PP = any of:

011100101111

Read Accesswith Key || PP = 100

Abort Access

dcbt/dcbtst Instruction

otherwise

InstructionAccess

DataAccess

Set SRR1[4] = 1

Instruction Access Exception


6.8 Memory Segment ModelMemory in the 601 is divided into sixteen 256-Mbyte segments. The segmented memorymodel of the 601 provides a way to map 4-Kbyte pages of logical addresses to 4-Kbytepages in physical memory (page address translation), while providing the programmingflexibility afforded by a large virtual address (52 bits).

The page address translation mechanism may be superseded by the block addresstranslation (BAT) mechanism described in Section 6.7, “Block Address Translation.” If not,the translation proceeds in two steps: from logical address to the 52-bit virtual address(which never exists as a specific entity but can be considered to be the concatenation of thevirtual page number and the byte offset within a page), and from virtual address to physicaladdress.

The implementation of the page address translation mechanism in the 601 is describedfollowed by a summary of page address translation with a detailed flow diagram.

6.8.1 Page Address Translation ResourcesThe page address translation performed by the 601 is facilitated by the 16 segment registers,which provide virtual address and protection information, and by the UTLB, whichmaintains 256 recently-used page table entries (PTEs). The segment registers areprogrammed by the operating system to provide the virtual ID for a segment. In addition,the operating system also creates the page tables in memory that provide the logical tophysical address mappings (in the form of PTEs) for the pages in memory.

As shown in Figure 6-11, when an access occurs, one of the 16 segment registers is selectedwith LA0–LA3. For page address translation, the virtual ID field in the segment register isthen compared with the corresponding field of the two entries in the UTLB selected byLA13–LA19 (one entry corresponding to set 0 and the other to set 1). In the case of a hit,the result of this comparison is then used to select which physical page number (PPN) (fromset 0 or 1) to use for the access.


Figure 6-11. Segment Register and UTLB Organization

In the case of a UTLB miss, the table search hardware in the MMU automatically searchesfor the required PTE in the page tables in memory. The MMU then automatically loads theUTLB with the PTE and the address translation is performed. Note that for an instructionaccess, the required PTE is also loaded into the ITLB for future use.

If the table search operations fail to locate the required PTE, then the appropriate exception(instruction access exception or data access exception) is taken. See Section 6.9.2, “PageTable Search Operation” for more information on the context for these exceptionconditions.

6.8.2 Recognition of Addresses in SegmentsAs described in Section 6.7.2, “Recognition of Addresses in BAT Array,” the block andpage translation mechanisms operate in parallel such that if the logical address of an accesshits in the BAT array (the address can be translated as a block address), the selected segmentregister is ignored, unless T = 1 in the segment register.

Segment Registers

0 7 8 31

T

T VSID

0

15

UTLB

0

127

V

V

VSID

LA0–LA31

LA0–LA3

LA13–LA19

Compare

Compare

Set 1

Set 0

MUX

Set 1/Set 0 Hit

PA0–PA19

PPN

Select

Select

LA4–LA12


Segments in the 601 are defined as one of the following two types:

• Memory segment—A logical address in these segments represents a virtual address that is used to define the physical address of the page. Note that for best performance, I/O devices can be memory-mapped (SR[T] = 0).

• I/O controller interface segment—References made to I/O controller interface segments use the I/O controller interface bus protocol described in Section 9.6, “Memory- vs. I/O-Mapped I/O Operations,” and do not use the virtual paging mechanism of the 601. See Section 6.10, “I/O Controller Interface Address Translation,” for a complete description of the mapping of I/O controller interface segments.

The T bit in a segment register selects between memory segments and I/O controllerinterface segments, as shown in Table 6-13.

The types of address translation used by the 601 MMU are shown in the flow diagram ofFigure 6-4.

6.8.2.1 Selection of Memory SegmentsAll accesses generated by the 601 index into the array of segment registers and select oneof the 16 with LA0–LA3. If MSR[IT] = 0 or MSR[DT] = 0 for an instruction or data access,respectively, then direct address translation is performed as described in Section 6.6,“Direct Address Translation.” Otherwise, if T = 0 for the selected segment register, theaccess maps to memory space and page address translation is performed.

After a memory segment is selected, the 601 creates the 52-bit virtual address for thesegment and searches for the PTE (first in the UTLB, then in the page tables in memory)that dictates the physical page number to be used for the access. Note that I/O devices caneasily be mapped into memory space and used as memory-mapped I/O.

6.8.2.2 Selection of I/O Controller Interface SegmentsAll data accesses generated by the 601 index into the array of segment registers and selectone of the 16 with LA0–LA3. If T = 1 for the selected segment register, the access maps tothe I/O controller interface and the access proceeds as described in Section 6.10, “I/OController Interface Address Translation.” This is true, even if data address translation isdisabled (MSR[DT] = 0).

For the case of instruction accesses, however, the 601 checks the state of the MSR[IT] bitbefore checking the T bit in the segment register. If MSR[IT] = 0, direct address translation

Table 6-13. Segment Register Types

Segment Register T Bit

Segment Type

0 Memory segment

1 I/O controller interface segment


is performed as described in Section 6.6, “Direct Address Translation.” If MSR[IT] = 1 andthe T bit of the selected segment register is set, then the MMU further checks the state ofthe BUID field of the segment register. If BUID has the encoding x'07F', the segment isdesignated as a memory-forced I/O controller interface segment and the instruction fetchoccurs as described in Section 6.10.4, “Memory-Forced I/O Controller Interface Accesses.”Otherwise, an instruction access exception occurs.

6.8.3 Page Address TranslationThe first step in page address translation is the conversion of the 32-bit logical address ofan access into the 52-bit virtual address. The virtual address is then used to locate the PTEeither in the UTLB or in the page tables in memory. The physical page number is thenextracted from the PTE and used in the formation of the physical address of the access.

Figure 6-12 shows the translation of a logical address to a physical address as follows:

• Bits 0–3 of the logical address comprise the segment register number used to select a segment register, from which the virtual segment ID (VSID) is extracted.

• Bits 4–19 of the logical address correspond to the page number within the segment; these are concatenated with the VSID from the segment register to form the virtual page number (VPN). The VPN is used to search for the PTE in either the UTLB or the page table. The PTE then provides the physical page number (PPN).

• Bits 20–31 of the logical address are the byte offset within the page; these are concatenated with the PPN field of a PTE to form the physical address used to access memory.


Figure 6-12. Page Address Translation Overview

6.8.3.1 Segment Register DefinitionThe fields in the 16 segment registers are interpreted differently depending on the value ofbit 0 (the T bit). When T = 1, the segment register defines an I/O controller interfacesegment, and the format is described in Section 6.10.1, “Segment Register Format for I/OController Interface.” Figure 6-13 shows the format of a segment register used in pageaddress translation (T = 0).

Figure 6-13. Segment Register Format for Page Address Translation

Table 6-14 provides the definitions of the segment register bits for page address translation.

52-Bit Virtual Address

32-Bit Logical Address

32-Bit Physical Address

SR# API Byte Offset(4-bit) (6-bit) (12-bit)

Virtual Segment ID (VSID) Page Index Byte Offset(24-bit) (16-bit) (12-bit)

Physical Page Number (PPN) Byte Offset(20-bit) (12-bit)

UTLB/Page Table

Page Index (16-bit)

PTE

0 3 4 19 20 31

Segment Registers

0 23 24 39 40 51

Virtual Page Number (VPN)

T Ks Ku 0 0 0 0 0 VSID

Reserved

0 1 2 3 7 8 31


The Ks and Ku bits partially define the access protection for the pages within the segment.The page protection provided in the 601 is described in Section 6.8.5, “Page MemoryProtection.” The virtual segment ID field is used as the high-order bits of the virtual pagenumber (VPN) as shown in Figure 6-12.

The segment registers are programmed with 601-specific instructions that implicitlyreference the segment registers. The 601 segment register instructions are summarized inTable 6-15. These instructions are privileged in that they are executable only whileoperating in supervisor mode. See Section 2.3.3.1, “Synchronization for Supervisor-LevelSPRs and Segment Registers” for information about the synchronization requirementswhen modifying the segment registers. See Chapter 10, “Instruction Set,” for more detailon the encodings of these instructions.

6.8.3.2 Page Table Entry (PTE) FormatPage table entries (PTEs) are generated and placed in page tables in memory by theoperating system using the hashing algorithm described in Section 6.9.1.3, “HashingFunctions.” Each PTE is a 64-bit entity (two words) that maps one virtual page number(VPN) to one physical page number (PPN). Information in the PTE controls the tablesearch process and provides input to the memory protection mechanism. Figure 6-14 showsthe format of both words that comprise a PTE.

Table 6-14. Segment Register Bit Definition for Page Address Translation





3–7 — Reserved


Table 6-15. Segment Register Instructions

Instruction Description

mtsr SR#,rS Move to Segment RegisterSR[SR#]← rS

mtsrin rS,rB Move to Segment Register IndirectSR[rB[0–3]]←rS

mfsr rD,SR# Move from Segment RegisterrD←SR[SR#]

mfsrin rD,rB Move from Segment Register IndirectrD←SR[rB[0–3]]

These instructions are specific to the 601 and not guaranteed on other PowerPC processors.


Figure 6-14. Page Table Entry Format

Table 6-16 lists the bit definitions for each word in a PTE.

All other fields are reserved.

The PTE contains an abbreviated page index rather than the complete page index fieldbecause at least ten of the low-order bits of the page index are used in the hash function toselect a PTE group (PTEG) address (PTEG addresses define the location of a PTE). Thesebits are not repeated in the PTEs of that PTEG. However, when a PTE is loaded into theUTLB, the entire page index (PI) field must be loaded into the UTLB entry. The PI field isthen compared with incoming logical address bits LA4–LA12 (LA13–LA16 select theUTLB entries to be compared) to determine if there is a hit.

The virtual segment ID field corresponds to the high-order bits of the virtual page number(VPN), and, along with the H bit, it is used to locate the PTE. The R and C bits maintainhistory information for the page as described in Section 6.8.4, “Page History Recording.”The WIM bits define the memory/cache control mode for accesses to the page. Finally, thePP bits define the remaining access protection constraints for the page. The page protectionprovided in the 601 is described in Section 6.8.5, “Page Memory Protection.”

Table 6-16. PTE Bit Definitions

Word Bit Name Description

0 0 V Entry valid (V = 1) or invalid (V = 0)


25 H Hash function identifier

26–31 API Abbreviated page index

1 0–19 PPN Physical page number


23 R Reference bit

24 C Change bit

25–27 WIM Memory/cache control bits


30–31 PP Page protection bits

Reserved

0 19 20 22 23 24 25 27 28 29 30 31

V VSID H API

0 1 24 25 26 31

PPN 0 0 0 R C WIM 0 0 PP


Conceptually, the page table is searched by the address translation hardware to translate theaddress of every reference. For performance reasons, the UTLB maintains recently-usedPTEs so that the table search time is eliminated for most accesses. The UTLB is searchedfor the address translation first. If the PTE is found, then no page table search is performed.As a result, software that changes the page tables in any way must perform the appropriateTLB invalidate operations to keep the UTLB (and ITLB) coherent with respect to the pagetables.

6.8.4 Page History RecordingReference (R) and change (C) bits are automatically maintained by the 601 in the PTE foreach physical page (for accesses made with page table address translation) to keep historyinformation about the page. This information can then be used by the operating system todetermine which areas of memory to write back to disk when new pages must be allocatedin main memory. Reference and change recording is not performed for translations madewith the BAT or for accesses that correspond to I/O controller interface (T = 1) segments.Furthermore, R and C bits are maintained only for accesses made while address translationis enabled (MSR[IT] = 1 or MSR[DT] = 1).

The reference and change bits are automatically updated by the 601 under the followingcircumstances:

• For UTLB hits, if the C bit requires updating (as shown in Table 6-16).

• For UTLB misses, when a table search is in progress to locate a PTE. The R and C bits are updated (set, if required) to reflect the status of the page based on this access.

Note that the processor updates the C bit based only on the status of the C bit in the UTLBentry in the case of a UTLB hit (the R bit is assumed to be set in the page tables if there isa UTLB hit). Therefore, when software clears the R and C bits in the page tables in memory,it must invalidate the UTLB entries associated with the pages whose reference and changebits were cleared. See Section 6.9.3, “Page Table Updates,” for all of the constraintsimposed on the software when updating the reference and change bits in the page tables.

The R bit or the C bit for a page is not set by the execution of the Data Cache Block Touchinstructions (dcbt, or dcbtst).

Table 6-17. Table Search Operations to Update History Bits—UTLB Hit Case

R and C bits in UTLB entry

PowerPC 601 Microprocessor Action

00 Combination doesn’t occur

01 Combination doesn’t occur

10 Read: No special actionWrite: Table search operation to update C

11 No special action for read or write


6.8.4.1 Reference BitThe reference bit of a page is located both in the PTE in the page table and in the copy ofthe PTE loaded into the UTLB. Every time a page is referenced (with a read or write access)the reference bit is set in the page table by the 601. Because the reference to a page is whatcauses a PTE to be loaded into the UTLB, the reference bit in all UTLB entries is alwaysset. The processor never automatically clears the reference bit.

The reference bit is only a hint to the operating system about the activity of a page. At times,the reference bit may be set although the access was not logically required by the programor even if the access was prevented by memory protection. Examples of this include thefollowing:

• Fetching of instructions not subsequently executed• Accesses that cause exceptions and are not completed

6.8.4.2 Change BitThe change bit of a page is also located both in the PTE in the page table and in the copyof the PTE loaded into the UTLB. Whenever a data store instruction is executedsuccessfully, if the UTLB search (for page address translation) results in a hit, the changebit in the matching UTLB entry is checked. If it is already set, the processor does not changethe C bit. If the UTLB change bit is 0, it is set and a table search operation is performed toset the C bit also in the corresponding PTE in the page table.

The change bit (in both the UTLB and the PTE in the page tables) is set only when a storeoperation is allowed by the page memory protection mechanism.

The automatic update of the reference and change bits in the 601 is performed with single-beat read and write transactions on the bus (not with atomic read/modify/write operations).

During a table search operation, PTEs are fetched as global, nonexclusive read transactions(not as read-with-intent-to-modify transactions). This reduces cache thrashing in otherprocessors (in a multiprocessor system) caused by UTLB load operations because otherprocessors do not have to invalidate their resident copies of the PTEs. The response on thebus to a PTE load transaction should then be exclusive (SHD signal not asserted) if no otherprocessor has a copy. Because PTEs are considered as cacheable, the MESI protocol of thecache then ensures that coherency is maintained among multiple processors for C bitupdates to the page tables.

6.8.5 Page Memory ProtectionSimilar to the block memory protection mechanism, the page memory protection of the 601provides selective access to each page in memory. If a logical address is determined to bewithin a page defined by the segment registers and an entry in the UTLB, the access is nextvalidated by the page protection mechanism. If this protection mechanism prohibits theaccess, a page protection violation (data access exception or instruction access exception)is generated.


When the page protection mechanism prohibits a reference, one of the following occurs,depending on the type of access that was attempted.

• For data accesses, a data access exception is generated and bit 4 of DSISR is set. If the access was a store, bit 6 of DSISR is additionally set.


See Chapter 5, “Exceptions,” for more information on these types of exceptions

A store operation that is not permitted because of the page protection mechanism does notcause the change (C) bit to be set in the PTE (in either the UTLB or in the page tables inmemory); however, a prohibited store access may cause a PTE to be loaded into the UTLBand consequently cause the reference bit to be set in a PTE (both in the UTLB and in thepage table in memory).

6.8.6 Page Address Translation SummaryFigure 6-15 provides the detailed flow for the page address translation mechanism. Thefigure is an expansion of the “UTLB Hit” branch of Figure 6-3. The detailed flow for the“UTLB Miss” branch of Figure 6-3 is described in Section 6.9.2, “Page Table SearchOperation." Note that as in the case of block address translation, if the dcbz instruction isattempted to be executed with either W = 1 or I = 1, the alignment exception is generated.Also note that the memory protection violation flow for page address translation is identicalto that of the block memory protection violation and is provided in Figure 6-10.

6.9 Hashed Page TablesWhen an access that is to be translated by the page address translation mechanism resultsin a miss in the UTLB (a PTE corresponding to the VSID of the segment register is notresident in either of the UTLB entries indexed by LA13–LA19), the 601 automaticallysearches the page tables set up by the operating system in main memory.

The algorithm used by the processor in searching the page tables includes a hashingfunction on some of the virtual address bits. Thus, the addresses for PTEs are allocatedmore evenly within the page tables and the hit rate of the page tables is maximized. Thisalgorithm must be synthesized by the operating system for it to correctly place the pagetable entries in main memory.

This section describes the format of the page tables and the algorithm used to access them.In addition, the constraints imposed on the software in updating the page tables (and otherMMU resources) are described.


Figure 6-15. Page Address Translation Flow—UTLB Hit

UTLB Hit Case

Alignment Exception


Generate 52-bit Virtual Address

from Segment Regis-ter

Compare Virtual Address with UTLB

Entries

LA13–LA19 select a UTLB entry for each set (Set 0 and Set 1)Segment Register [VSID] = VSID in UTLB entryLA4–LA12 = PI in UTLB entry UTLB entry [V] = 1

otherwisedcbz Instruction with W or I = 1

Memory Protection Violation

See Figure 6-10

Continue Access to Cache with WIM from UTLB entry

otherwise

Store Access with UTLB entry [C] = 0

Table SearchOperation

otherwise

PA0–PA31←PPN||A20–A31

Select Key:If MSR[PR] = 0, key = KsIf MSR[PR] = 1, key = Ku

Invalidate UTLB entry (and force to LRU)


6.9.1 Page Table DefinitionThe hashed page table is a variable-sized data structure that defines the mapping betweenvirtual page numbers and physical page numbers. The page table size is a power of 2, andits starting address is a multiple of its size.

The page table contains a number of page table entry groups (PTEGs). A PTEG containseight page table entries (PTEs) of eight bytes each; therefore, each PTEG is 64 bytes long.PTEG addresses are entry points for table search operations. Figure 6-16 shows two PTEGaddresses (PTEGaddr1 and PTEGaddr2) where a given PTE may reside.

Figure 6-16. Page Table Definitions

A given PTE can reside in one of two possible PTEGS. For each PTEG address, there is acomplementary PTEG address—one is the primary PTEG and the other is the secondaryPTEG. Additionally, a given PTE can reside in any of the PTE locations within anaddressed PTEG. Thus, a given PTE may reside in one of 16 possible locations within thepage table. If a given PTE is not resident within either the primary or secondary PTEG, apage table miss occurs, corresponding to a page fault condition.

A table search operation is defined as the search of a PTE within a primary and secondaryPTEG. When a table search operation commences, a primary hashing function is performedon the virtual address. The output of the hashing function is then concatenated with bits(some of them masked) programmed into the SDR1 register by the operating system tocreate the physical address of the primary PTEG. The PTEs in the PTEG are then checked,one by one, to see if there is a hit within the PTEG. In case the PTE is not located duringthis PTEG, a secondary hashing function is performed, a new physical address is generated

PTE0 PTE1 PTE7 PTEG0

PTE0 PTE1 PTE7

PTE0 PTE1 PTE7

PTEGn

PTEGaddr1

PTEGaddr2

Page Table


for the PTEG, and the PTE is searched for again, this time using the secondary PTEGaddress.

Note, however, that although a given PTE may reside in one of 16 possible locations, anaddress that is a primary PTEG address for some accesses also functions as a secondaryPTEG address for a second set of accesses (as defined by the secondary hashing function).Therefore, these 16 possible locations are really shared by two different sets of logicaladdresses. Section 6.9.1.5.1, “Page Table Structure Example” illustrates how PTEs mapinto the 16 possible locations as primary and secondary PTEs.

6.9.1.1 Table Search Description Register (SDR1)The SDR1 register contains the control information for the table structure in that it definesthe highest order bits for the physical base address of the page table and it defines the sizeof the table. The format of the SDR1 register is shown in Figure 6-17 and the bit settingsare described in Table 6-18.

Figure 6-17. SDR1 Register Format

The HTABORG field in SDR1 contains the high-order 7–16 bits of the 32-bit physicaladdress of the page table. Therefore, the beginning of the page table lies on a 216 byte (64Kbyte) boundary at a minimum.

A page table can be any size 2n where 16 ≤ n ≤ 25. The HTABMASK field in SDR1contains a mask value that determines how many bits from the output of the hashingfunction are used as the page table index. This mask must be of the form b'00...011...1' (astring of 0 bits followed by a string of 1 bits). As the table size increases, more bits areused from the output of the hashing function to index into the table. The 1 bits inHTABMASK determine how many additional bits (beyond the minimum of 10) from thehash are used as the index; the HTABORG field must have the same number of lower-orderbits equal to 0 as the HTABMASK field has lower-order bits equal to 1.

Table 6-18. SDR1 Register Bit Settings


0–15 HTABORG Physical base address of page table (Base Address bits plus Maskable bits)


23–31 HTABMASK Mask for page table address

Base Address Maskable Bits 0 0 0 0 0 0 0 HTABMASK

Reserved

0 6 7 15 16 22 23 31

HTABORG


6.9.1.2 Page Table Size The number of entries in the page table directly affects performance because it influencesthe hit ratio in the page table and thus the rate of page fault exception conditions. If the tableis too small, not all virtual pages that have physical page frames assigned may be mappedvia the page table. This can happen if there are more than 16 entries that map to the sameprimary/secondary pair of PTEGs; in this case, many hash collisions may occur.

The minimum allowable size for a page table is 64 Kbytes (210 PTEGs of 64 bytes each).However, it is recommended that the total number of PTEGs (primary plus secondary) inthe page table be greater than half the number of physical page frames to be mapped. Whileavoidance of hash collisions cannot be guaranteed for any size page table, making the pagetable larger than the recommended minimum size reduces the frequency of such collisions,by making the primary PTEGs more sparsely populated, and further reducing the need touse the secondary PTEGs.

Table 6-18 shows some example sizes for total main memory. The recommended minimumpage table size for these example memory sizes are then outlined, along with theircorresponding HTABORG and HTABMASK settings. Note that systems with less thaneight Mbytes of main memory may be designed with the 601, but the minimum amount ofmemory that can be used for the page tables is 64 Kbytes.

As an example, if the physical memory size is 229 bytes (512 Mbyte), then there are 229–

212 (4 Kbyte page size) = 217 (128 Kbyte) total page frames. If this number of page frames

Table 6-19. Recommended Page Table Sizes (Minimum)

Total Main Memory

Recommended MinimumSettings for

Recommended Minimum

Memory for Page Tables

Number of Mapped Pages (PTEs)

Number of PTEGs

HTABORG (Maskable bits 7-15)

HTABMASK

8 Mbytes (223) 64 Kbytes (216) 213 210 x xxxx xxxx 0 0000 0000

16 Mbytes (224) 128 Kbytes (217) 214 211 x xxxx xxx0 0 0000 0001

32 Mbytes (225) 256 Kbytes (218) 215 212 x xxxx xx00 0 0000 0011

64 Mbytes (226) 512 Kbytes (219) 216 213 x xxxx x000 0 0000 0111

128 Mbytes (227) 1 Mbytes (220) 217 214 x xxxx 0000 0 0000 1111

256 Mbytes (228) 2 Mbytes (221) 218 215 x xxx0 0000 0 0001 1111

512 Mbytes (229) 4 Mbytes (222) 219 216 x xx00 0000 0 0011 1111

1 Gbytes (230) 8 Mbytes (223) 220 217 x x000 0000 0 0111 1111

2 Gbytes (231) 16 Mbytes (224) 221 218 x 0000 0000 0 1111 1111

4 Gbytes (232) 32 Mbytes (225) 222 219 0 0000 0000 1 1111 1111


is divided by 2, the resultant minimum recommended page table size is 216 PTEGs, or 222

bytes (4 Mbytes) of memory for the page tables.

6.9.1.3 Hashing FunctionsThe processor uses two different hashing functions, a primary and a secondary, in thecreation of the physical addresses used in a page table search operation. These hashingfunctions efficiently distribute the PTEs within the page table, in that there are two possiblePTEGs where a given PTE can reside. Additionally, there are eight possible PTE locationswithin a PTEG where a given PTE can reside. If a PTE is not found using the primaryhashing function, the secondary hashing function is performed, and the secondary PTEG issearched. Note that these two functions must also be used by the operating system toappropriately set up the page tables in memory.

The use of the two hashing functions provides a high probability that a required PTE isresident in the page tables, without requiring the definition of all possible PTEs in mainmemory. However, if a PTE is not found in the secondary PTEG, then a page fault occursand an exception is taken. Thus, the required PTE can then be placed into either the primaryor secondary PTEG by the system software, and on the next UTLB miss to this page, thePTE will be found.

The address of a page table is derived from the HTABORG field of the SDR1 register, andthe output of the corresponding hashing function (primary hashing function for primaryPTEG and secondary hashing function for a secondary PTEG). The value in HTABMASKdetermines how many of the higher-order hash value bits are masked and how many areused in the generation of the physical address of the page table.

Figure 6-18 depicts the hashing functions used by the 601. The inputs to the primaryhashing function are the lower-order 19 bits of the VSID field of the selected segmentregister (bits 5–23 of the 52-bit virtual address), and the page index field of the logicaladdress (bits 24–39 of the virtual address) concatenated with three zero higher-order bits.The XOR of these two values generates the output of the primary hashing function (hashvalue 1).


Figure 6-18. Hashing Functions

When the secondary hashing function is required, the output of the primary hashingfunction is complemented with one’s complement arithmetic, to provide hash value 2.

6.9.1.4 Page Table AddressesFigure 6-19 illustrates the generation of the addresses used for accessing the hashed pagetables defined for the 601. As stated earlier, the operating system must synthesize the tablesearch algorithm for setting up the tables.

As shown in Figure 6-19, two of the elements that define the 52-bit virtual address (thesegment register VSID field and the page index field of the logical address) are used asinputs into a hashing function. Depending on whether the primary or secondary PTEG is tobe accessed, the processor uses either the primary or secondary hashing function.

Low-Order 19 Bits of VSID (From Segment Register)

5 23

24 39

Primary Hash:

XOR

Output of Hashing Function 1

0 8 9 18

=

Secondary Hash:

Hash Value 1

0 18

Output of Hashing Function 2

0 8 9 18

Hash Value 1

Hash Value 2

One’s Complement Function

0 0 0 Page Index (From Logical Address)


Figure 6-19. Generation of Addresses for Page Tables

Virtual Segment ID API Byte Offset(24-bit) (6-bit) (12-bit)

Virtual Page Number (VPN)

PAGE TABLE

(3-bit)

0 4 5 23 24 29 30 394051

SDR1

xxxx xx . . . . . . 00 00 . . . . 011 . . .1

0 6 7 15 16 22 23 31 0 18

32-bit Physical Address of Page Table Entry

PTE0

64 Bytes

52-Bit Virtual Address

PTE78 bytes

32-bit Physical Address

VSID API(24-bit) (6-bit)

V H

Physical Page Number (PPN)(20-bit)

0 19 25 27 31

RC

WIM

AND

OR

(7-bit) (9-bit) (10-bit)

PPN Byte Offset(20-bit) (12-bit)

(16-bit)

Hash Function

Hash Value(19-bit)

PTEG0

PTEGn

Page Index (16-bit)

Mask

0 6 7 15 16 25 26 31

PTEG Select

PP

PTE01 24 25 26 31

9 bits 10 bits

0 0 0 0 0 0 0

00 0

0 0 0

0 0 0 0 0 0(6-bit)

BaseAddress Maskable

bits


The base address of the page table is defined by the higher order bits of SDR1[HTABORG].Bits 7–15 of the PTEG address are derived from the masking of the higher-order bits of thehash value (as defined by SDR1[HTABMASK]) concatenated with (implemented as an ORfunction) the remaining bits of SDR1[HTABORG]. Bits 16-25 of the PTEG address are the10 lower order bits of the hash value, and bits 26-31 of the PTEG address are zero. In theprocess of searching for a PTE, the processor first checks PTE0 (at the PTEG base address).

6.9.1.5 Page Table StructureIn the process of searching for a PTE, the processor interprets the values read from memoryas described in Section 6.8.3.2, “Page Table Entry (PTE) Format.” The VSID and theabbreviated page index (API) fields of the 52-bit virtual address of the access are comparedto those same fields of the PTEs in memory. In addition, the valid (V) bit and the hashingfunction (H) bit are also checked. For a hit to occur, the V bit of the PTE in memory mustbe set. If the fields match and the entry is valid, the PTE is considered a hit if the H bit isset as follows:

• If this is the primary PTEG, H = 0• If this is the secondary PTEG, H = 1

The physical address of the PTEs to be checked is derived as shown in Figure 6-19, and isthe address of a group of eight PTEs (a PTEG). During a table search operation, theprocessor first compares the PTE0 location defined by the primary hashing function. If theVSID and API fields do not match (or if V or H are not set appropriately), the processorincrements the lower order address bits by eight bytes and checks the PTE1 location and soon, until all eight PTEs in the PTEG have been checked.

If no match is found, the secondary hashing function is performed, and the secondaryPTEG address is derived. The eight PTEs within the secondary PTEG are then similarlychecked. If the required PTE is not found in any of the 16 possible locations (the eight PTEswithin the primary PTEG and the eight PTEs within the secondary PTEG), then a page faultoccurs and an exception is taken. Thus, if a valid PTE is located in the page tables, the pageis considered resident; if no matching (and valid) PTE is found for an access, the page isinterpreted as non-resident (page fault) and the operating system must load the PTE (andpossibly the page) into main memory.

Note that for performance reasons, PTEs should be allocated by the operating system firstbeginning with the PTE0 locations within the primary PTEG, then PTE1, and so on. If morethan eight PTEs are required within the address space that defines a PTEG address, thesecondary PTEG can be used. Nonetheless, it may be desirable to place the PTEs that willrequire most frequent access at the beginning of a PTEG and reserve the PTEs in thesecondary PTEG for the least frequently accessed PTEs.


6.9.1.5.1 Page Table Structure ExampleFigure 6-20 shows the structure of an example page table. The base address of this pagetable is defined by bits 0–13 in SDR1[HTABORG]; note that bits 14 and 15 of HTABORGmust be zero because the lower order two bits of HTABMASK are ones. The addresses forindividual PTEGs within this page table are then defined by bits 14–25 as an offset frombits 0–13 of this base address. Thus the size of the page table is defined as 4096 PTEGs.

Figure 6-20. Example Page Table Structure

Two example PTEG addresses are shown in the figure as PTEGaddr1 and PTEGaddr2. Bits14–25 of each PTEG address in this example page table are derived from the output of thehashing function (bits 26–31 are zero to start with PTE0 of the PTEG). In this example, the'b' bits in PTEGaddr2 are the one’s complement of the 'a' bits in PTEGaddr1. The 'm' bits

PTE0 PTE1 PTE7 PTEG0

PTE0 PTE1 PTE7

PTE0 PTE1 PTE7

PTEG4095

PTEGaddr1

PTEGaddr2

Page Table

Example:

Given: SDR1 1010 0110 0000 0000 0000 0000 0000 0011

0 15 23 31

Base Address

$A600 0000

PTEGaddr1 = 1010 0110 0000 00mm aaaa aaaa aa00 0000

0 14 25 31

PTEGaddr2 = 1010 0110 0000 00nn bbbb bbbb bb00 0000

0 14 25 31

HTABORG HTABMASK


are also the one’s complement of the 'n' bits, but these two bits are generated from bits 7–8of the output of the hashing function, logically ORed with bits 14–15 of the HTABORGfield (which are zero in this example). If bits 14–25 of PTEGaddr1 were derived by usingthe primary hashing function, then PTEGaddr2 corresponds to the secondary PTEG.

Note, however, that bits 14–25 in PTEGaddr2 can also be derived from a combination oflogical address bits, segment register bits, and the primary hashing function. In this case,then PTEGaddr1 corresponds to the secondary PTEG. Thus, while a PTEG may beconsidered a primary PTEG for some logical addresses (and segment register bits), it mayalso correspond to the secondary PTEG for a different logical address (and segment registervalue).

It is the value of the H bit in each of the individual PTEs that identifies a particular PTE aseither primary or secondary (there may be PTEs that correspond to a primary PTEG andPTEs that correspond to a secondary PTEG, all within the same physical PTEG addressspace). Thus, only the PTEs that have H = 0 are checked for a hit during a primary PTEGsearch. Likewise, only PTEs with H = 1 are checked in the case of a secondary PTEGsearch.

6.9.1.5.2 PTEG Address Mapping ExampleFigure 6-21 shows an example of a logical address and how its address translation (thePTE) maps into the primary PTEG in physical memory. The example illustrates how theprocessor generates PTEG addresses for a table search operation; this is also the algorithmthat must be used by the operating system in creating the page tables.

In the example, the value in SDR1 defines a page table at address x'0F98 0000' that contains8192 PTEGs. The example logical address selects segment register 0 (SR0) with the highestorder four bits. The contents of SR0 are then used along with bits 4–19 of the logicaladdress to create the 52-bit virtual address.

To generate the address of the primary PTEG, bits 5–23, and bits 24–39 of the virtualaddress are then used as inputs into the primary hashing function (XOR) to generate hashvalue 1. The lower order 13 bits of hash value 1 are then concatenated with the higher order13 bits of HTABORG, defining the address of the primary PTEG (x'0F9F F980').


Figure 6-21. Example Primary PTEG Address Generation

Figure 6-22 shows the generation of the secondary PTEG address for this example. If thesecondary PTEG is required, the secondary hash function is performed and the lower order13 bits of hash value 2 are then concatenated with the higher order 13 bits of HTABORG,defining the address of the secondary PTEG (x'0F98 0640').

As described in Figure 6-19, the 10 lower-order bits of the page index field are always usedin the generation of a PTEG address (through the hashing function). This is why only theabbreviated page index (API) is defined for a PTE (the entire page index field does not needto be checked). For a given logical address, the lower order 10 bits of the page index (atleast) contribute to the PTEG address (both primary and secondary) where thecorresponding PTE may reside in memory. Therefore, if the higher order 6 bits (the APIfield) of the page index match with the API field of a PTE within the specified PTEG, thePTE mapping is guaranteed to be the unique PTE required.

Example:

Given: SDR1 0000 1111 1001 1000 0000 0000 0000 0111

0 15 23 31HTABORG HTABMASK

0000 0000 1111 1111 1010 0000 0001 1011

0 4 19 20 25 31

LA = x'00FF A01B':

SR0

Segment Register Select

0010 0000 1100 1010 0111 0000 0001 1100

x'C A 7 0 1 C’

8 31

1100 1010 0111 0000 0001 1100 0000 1111 1111 1010 0000 0001 1011

5 24 25 39

Virtual Address:

Byte Offset

Page IndexVSID

Primary Hash: 010 0111 0000 0001 1100

XOR

000 0000 1111 1111 1010Hash Value 1 010 0111 1111 1110 0110

9-bits 10-bits

0000 1111 1001 1111 1111 1001 1000 0000

x’ 0 F 9 F F 9 8 0’

Primary PTEG Address:

Start at PTE0HTABORG 13 16 25


Figure 6-22. Example Secondary PTEG Address Generation

Note that a given PTEG address does not map back to a unique logical address. Not onlycan a given PTEG be considered both a primary and a secondary PTEG (as described inSection 6.9.1.5.1, “Page Table Structure Example”), but in this example, bits 24–26 of thepage index field of the virtual address are not used to generate the PTEG address. Therefore,any of the eight combinations of these bits will map to the same primary PTEG address.(However, these bits are part of the API and are therefore compared for each PTE withinthe PTEG to determine if there is a hit.) Furthermore, a logical address can select a differentsegment register with a different value such that the output of the primary (or secondary)hashing function happens to equal the hash values shown in the example. Thus these logicaladdresses would also map to the same PTEG addresses shown.

Hash Value 2: 101 1000 0000 0001 1001

Secondary PTEG Address:

0000 1111 1001 1000 0000 0110 0100 0000

x’ 0 F 9 8 0 6 4 0’

9 bits 10 bits

1) First compare 8 PTEs at x'0F9F F980'

2) Then, compare 8 PTEsat x'0F98 0640',

if necessary

HTABORG

x'0F98 0000'

x'0F98 0640'

x'0F9F F980'

PTEG0

PTEG25

PTEG8166

PTEG8191

PTE0 PTE7

PTE0 PTE7

010 0111 1111 1110 0110

One’s Complement

Secondary Hash:

010 0111 1111 1110 0110Hash Value 1:

Start at PTE0


6.9.2 Page Table Search OperationAn outline of the table search process performed by the 601 in the search of a PTE is asfollows:

1. The 32-bit physical address of the primary PTEG is generated as described in Section 6.9.1.4, “Page Table Addresses”.

2. The first PTE (PTE0) in the primary PTEG is read from memory. PTE reads occur with an implied WIM memory/cache mode control bit setting of b'001'. Therefore, they are considered cacheable and burst in from memory and placed in the cache.

3. The PTE in the selected PTEG is tested for a match with the virtual page number (VPN) of the access. The VPN is the VSID concatenated with the page index fields of the virtual address. For a match to occur, the following must be true:— PTE[H] = 0— PTE[V] = 1— PTE[VSID] = VA[0–23]— PTE[API] = VA[24–29]

4. If a match is not found, step 3 is repeated for each of the other seven PTEs in the primary PTEG. If a match is found, the table search process continues as described in step 8. If a match is not found within the 8 PTEs of the primary PTEG, the address of the secondary PTEG is generated.

5. The first PTE (PTE0) in the secondary PTEG is read from memory. Again, because PTE reads have an implied WIM bit combination of b'001', an entire cache line is burst into the on-chip cache.

6. The PTE in the selected secondary PTEG is tested for a match with the virtual page number (VPN) of the access. For a match to occur, the following must be true:— PTE[H] = 1— PTE[V] = 1— PTE[VSID] = VA[0–23]— PTE[API] = VA[24–29]

7. If a match is not found, step 6 is repeated for each of the other seven PTEs in the secondary PTEG.

8. If a match is found, the PTE is written into the UTLB and the R bit is updated in the PTE in memory (if necessary). If there is no memory protection violation, the C bit is also updated in memory and the table search is complete.

9. If a match is not found within the 8 PTEs of the secondary PTEG, the search fails, and a page fault exception condition occurs (either an instruction access exception or a data access exception).

Reads from memory for table search operations are performed as global (but not exclusive),cacheable operations, and are loaded into the on-chip cache of the 601.


Figure 6-23 and Figure 6-24 provide detailed flow diagrams of the table search operationsperformed by the 601. Figure 6-23 shows the case of a dcbz instruction executed withW = 1 or I = 1, and that the R bit is updated in memory (if required) before the alignmentexception occurs. The R bit is also updated (if required) in the case of a memory protectionviolation except for the case of a dcbt or a dcbtst instruction. If either of these instructionsis executed and a protection violation occurs, the translation is simply aborted, the R bit isnot set in memory and the instruction execution becomes a no-op (not shown in the figure).


Figure 6-23. Primary Table Search Flow

Generate PA using Primary Hash FunctionPA ← Base PA of PTEG

(PA26-PA31=0)

Fetch PTE from PTEG

Select Key:If MSR[PR] = 0, key = KSIf MSR[PR] = 1, key = KU

Burst read from PA (fill cache line)Fetch PTE (64 bits)

otherwise

otherwise

PA ← PA+8(Fetch next PTE in PTEG)

Perform Secondary Table Search

PA26-PA31=56(Last PTE in PTEG)

PTE [VSID, API, H, V]= Segment Register [VSID], LA[API], 0, 1

Secondary Table Search Hit

PTE[R]=0PTE[R]=1

PTE[R] ← 1R_Flag ← 1

Write PTE into UTLB

Primary Table Search

otherwise dcbz Instruction with W or I = 1

otherwise

Alignment Exception

otherwise Memory Protection Violation

otherwise

See Figure 6-10

Byte write to update PTE[R] in

memory

otherwise

Table SearchComplete

UTLB[PTE[C]] ← 1PTE[C] ← 1

Half word write to update PTE[C]

in memory

Store operation with PTE[C]=0

otherwise

Table SearchComplete

R_Flag=1 R_Flag=1

R_Flag=1


memory


memory


Figure 6-24. Secondary Table Search Flow

6.9.3 Page Table UpdatesThis section describes the requirements on the software when updating page tables inmemory via some pseudo-code examples. In a multiprocessor system the rules described inthis section must be followed so that all processors operate with a consistent set of pagetables. Even in a single processor system, certain rules must be followed, regardingreference and change bit updates, because software changes must be synchronized withautomatic updates made by the hardware. Updates to the tables include the followingoperations:

• Adding a PTE• Modifying a PTE, including modifying the R and C bits of a PTE• Deleting a PTE

PTEs must be ‘locked’ on multiprocessor systems. Access to PTEs must be appropriatelysynchronized by software locking of (i.e., guaranteeing exclusive access to) PTEs orPTEGs if more than one processor can modify the table at that time. In the examples below,

Secondary Table Search

Generate PA using Secondary Hash FunctionPA ← Base PA of PTEG

(PA26–PA31=0)

Fetch PTE from PTEG

Single-beat read from PA Fetch PTE (64 bits)

otherwise

otherwise

PA ← PA+8(Fetch next PTE in PTEG)

PTE [VSID, API, H, V]= Segment Register [VSID], LA[API], 1, 1

Secondary Table Search Hit

PA26–PA31=56(Last PTE in PTEG)

Page Fault


Instruction Access Exception

Set SRR1[1]=1 Set SDISR[1]=1

Data AccessInstruction Access

(See Figure 6-23)


“lock()” and “unlock()” refer to software locks that must be performed to provide exclusivestatus for the PTE being updated. See Appendix G, “Synchronization ProgrammingExamples,” for more information about the use of the lwarx and stwcx. instructions toperform software interlocks.

On single processor systems, PTEs need not be locked. To adapt the examples given belowfor the single processor case, simply delete the “lock()” and “unlock()” lines from theexamples. The sync instructions shown are required even for single processor systems.

The UTLB (and ITLB) are non-coherent caches of the page tables. UTLB entries must beflushed explicitly with the TLB invalidate entry instruction (tlbie) whenever thecorresponding PTE is modified. In a multiprocessor system, the tlbie instruction must becontrolled by software locking, so that the tlbie is issued on only one processor at a time.The sync instruction causes the processor to wait until the TLB invalidate operation inprogress by this processor is complete.

The PowerPC architecture defines the tlbsync instruction (an illegal instruction in the 601)that ensures that TLB invalidate operations executed by this processor have caused allappropriate actions in other processors on the system bus. In a system that contains both601 processors and other PowerPC processors, the tlbsync functionality must be emulatedfor the 601 in order to ensure proper synchronization with the other PowerPC processors.

Any processor, including the processor modifying the page table, may access the page tableat any time in an attempt to reload a UTLB entry. An inconsistent page table entry mustnever accidentally become visible; thus there must be synchronization betweenmodifications to the valid bit and any other modifications. This requires as many as twosync operations for each PTE update.

The 601 writes reference and change bits with unsynchronized, atomic byte storeoperations. Note that the V, R, and C bits each resides in a distinct byte of a PTE. Therefore,extreme care must be taken to ensure that no store operation inadvertently overwrites oneof these bytes.

6.9.3.1 Adding a Page Table EntryAdding a page table entry requires only a lock on the PTE in a multiprocessor system. Thebytes in the PTE are then written, except for the valid bit. A sync instruction then ensuresthat the updates have been made to memory, and lastly, the valid bit is set.

lock(PTE)PTE[VSID,H,API] ← new valuesPTE[PPN,R,C,WIM,PP] ← new valuessyncPTE[V] ← 1unlock(PTE)


6.9.3.2 Modifying a Page Table EntryThis section describes several scenarios for modifying a PTE.

6.9.3.2.1 General CaseIn the general case, a currently-valid PTE must be changed. To do this, the PTE must belocked, marked invalid, flushed from the TLB, updated, marked valid again, and unlocked.The sync instruction must be used at appropriate times to wait for modifications tocomplete.

Note that the tlbsync and the sync instruction that follow are only required if compatibilityis must be maintained with other PowerPC processors that implement the tlbsyncinstruction. The tlbsync instruction is not implemented in the 601 but can be emulated inthe illegal instruction exception handler.

lock(PTE)PTE[V] ← 0synctlbie(PTE)synctlbsyncsyncPTE[VSID,H,API] ← new values PTE[PPN,R,C,WIM,PP] ← new valuessyncPTE[V] ← 1unlock(PTE)

6.9.3.2.2 Clearing the Reference (R) BitWhen the PTE is modified only to clear the R bit to 0, a much simpler algorithm sufficesbecause the R bit need not be maintained exactly.

lock(PTE)oldR ← PTE[R]PTE[R] ← 0if oldR = 1, then tlbie(PTE)unlock(PTE)

Since only the R and C bits are modified by the processor, and since they reside in differentbytes, the R bit can be cleared by reading the current contents of the byte in the PTEcontaining R (bits 16–23 of the second word), ANDing the value with x'FE', and storing thebyte back into the PTE.


6.9.3.2.3 Modifying the Virtual AddressIf the virtual address is being changed to a different address within the same hash class(primary or secondary), the following flow suffices:

lock(PTE)val ← PTE[VSID,API,H,V]val ← new VSIDPTE[VSID,API,H,V] ← valsynctlbie(PTE)synctlbsyncsyncunlock(PTE)

In this pseudo-code flow, note that the store into the first word of the PTE is performedatomically. Also, the tlbsync and the sync instruction that follow are only required ifcompatibility is must be maintained with other PowerPC processors that implement thetlbsync instruction. The tlbsync instruction is not implemented in the 601 but can beemulated in the illegal instruction exception handler.

6.9.3.3 Deleting a Page Table EntryIn this example, the entry is locked, marked invalid, invalidated in the TLBs, and unlocked.

Again, note that the tlbsync and the sync instruction that follow are only required ifcompatibility is must be maintained with other PowerPC processors that implement thetlbsync instruction. The tlbsync instruction is not implemented in the 601 but can beemulated in the illegal instruction exception handler.

lock(PTE)PTE[V] ← 0synctlbie(PTE)synctlbsyncsyncunlock(PTE)

6.9.4 Segment Register UpdatesThere are certain synchronization requirements for using the move to segment registerinstructions. These are described in Section 2.3.3.1, “Synchronization for Supervisor-LevelSPRs and Segment Registers.”


6.10 I/O Controller Interface Address TranslationAn I/O controller interface segment is a mapping of logical addresses to the I/O controllerinterface bus protocol. I/O controller interface segments are provided for POWERcompatibility. Applications that require low-latency load/store access to external addressspace should use memory-mapped I/O, rather than the I/O controller interface.

A logical address within the I/O controller interface space corresponds to a segment registerwhich has T = 1. For more details about memory references to I/O controller interfacesegments, refer to Chapter 9, “System Interface Operation.”

As a subset of I/O controller interface address translation, the 601 also provides a way toforce I/O controller interface accesses to be made to memory. This memory-forced I/Ocontroller interface capability allows a 256-Mbyte segment of memory to be mapped withonly one segment register and no page translation overhead. Note that this functionalitymay not be provided in other PowerPC processors.

6.10.1 Segment Register Format for I/O Controller InterfaceFigure 6-25 shows the register format for the segment registers when the T bit is set.

Figure 6-25. Segment Register Format for I/O Controller Interface

Table 6-20 shows the bit definitions for the segment registers when the T bit is set.

Table 6-20. Segment Register Bit Definitions for I/O Controller Interface



1 Ks Supervisor mode memory key

2 Ku User mode memory key

3–11 BUID Bus unit ID

12–27 — Device specific data for I/O controller

28–31 Packet 1(0–3) This field contains address bits 0–3 of the packet 1 cycle (address-only).

T Ks Ku BUID Controller Specific Information Packet 1(0–3)

0 1 2 3 11 12 27 28 31


6.10.2 I/O Controller Interface AccessesWhen the address translation process determines that the segment has T = 1, I/O controllerinterface address translation is selected and any match due to block address translation (seeSection 6.7, “Block Address Translation”) is ignored. Additionally, no reference is madeto the page tables. The following data is sent to the memory controller in the protocol (twopackets consisting of address-only cycles) described in Section 9.6, “Memory- vs. I/O-Mapped I/O Operations”:

• Packet 0

— One of the Kx bits (Ks or Ku) is selected to be the key as follows:

– For supervisor accesses (MSR[PR] = 0), the Ks bit is used and Ku is ignored

– For user accesses (MSR[PR] = 1), the Ku bit is used and Ks is ignored

— The contents of bits 3–31 of the segment register, which is the BUID field concatenated with the “controller-specific” field.

• Packet 1—SR[28–31] concatenated with the 28 lower-order bits of the logical address, LA4–LA31.

The WIM bits for I/O controller interface accesses are forced to b'010'. Some instructionscause multiple address/data transactions to occur on the bus. The address for eachtransaction is handled individually with respect to the MMU.

6.10.3 I/O Controller Interface Segment ProtectionPage-level protection as described in Section 6.8.5, “Page Memory Protection,” is notprovided by the 601 for I/O controller interface segments. The appropriate key bit (Ks orKu) from the segment register is sent to the memory controller, and the memory controllerimplements any protection required. Frequently, no such mechanism is provided; the factthat a I/O controller interface segment is mapped into the address space of a process maybe regarded as sufficient authority to access the segment.

6.10.4 Memory-Forced I/O Controller Interface AccessesThe 601 performs memory-forced I/O controller interface accesses when the T bit in theselected segment register is set and the BUID field in the segment register is x'07F'. In thiscase, the processor bypasses all protection mechanisms and generates a memory accesswith the physical address specified by the lowest-order four bits in the segment register(SR[28–31]) concatenated with LA4–LA31. In this case, the processor assumes the WIMbits to be '011', denoting the access as cache-inhibited and global. An example of addressgeneration for a memory-forced I/O controller interface access is shown in Figure 6-26.


Figure 6-26. Memory-Forced I/O Controller Interface Access Example

6.10.5 Instructions Not Supported in I/O Controller Interface Segments

The following instructions are not supported when issued with a logical address that selectsa segment register that has T = 1:

• lwarx• stwcx.• lscbx

If one of the above instructions is executed with a logical address corresponding to asegment with T = 1, a data access exception occurs and DSISR[5] is set.

The following instructions are not supported at all and cause boundedly undefined resultswhen issued with a logical address that selects a segment register that has T = 1(or whenMSR[DT] = 0):

• eciwx• ecowx

6.10.6 Instructions with No Effect in I/O Controller Interface Segments

The following instructions are executed as no-ops when issued with a logical address thatselects a segment where T = 1:

• dcbt• dcbtst• dcbf• dcbi• dcbst• dcbz

1 Ks Ku 0 0111 1111 x.........x PPPP

0 1 2 3 1112 27 28 31

SR6

0110

0 3 4 31

LA

PA PPPP

0 3 4 31


6.10.7 I/O Controller Interface Summary FlowFigure 6-27 shows the flow used by the MMU when I/O controller interface addresstranslation is selected. This figure expands the I/O Controller Interface Translation stubfound in Figure 6-4 for both instruction and data accesses.

Figure 6-27. I/O Controller Interface Translation Flow

Memory AccessPerformed

Perform I/O Controller Interface Access

Instruction Access Exception*

Data AccessInstruction Access

I/O ControllerInterface Translation

T=1

SR[BUID] = X’07F’ (Memory-Forced)otherwise

otherwise

Floating-PointLoad or Store

Alignment Exception

lwarx, stwcx., or lscbx Instruction otherwise

Cache Instruction (dcbt, dcbtst, dcbf,

dcbi, dcbst, or dcbz)

No-Op

Set DSISR[5]=1

PA0–PA31 ← SR[28–31] || LA4–LA31

otherwise


PA0–PA31 ← SR[28–31] || LA4–LA31

* No SRR1 bits are set for this case; this differs from the PowerPC architecture, which specifies that SRR1[3]is set for this condition.

Chapter 7. Instruction Timing 7-1

Chapter 7 Instruction Timing7070

This chapter describes how instructions flow through the PowerPC 601 microprocessor. Alogical model of the 601 pipeline is presented as a framework for understanding thefunctionality and performance of the hardware. While this pipeline model is an abstractionof the hardware implementation, it can yield accurate instruction timing information.

7.1 Terminology and ConventionsThis section describes terminology and conventions used in this chapter and in Appendix I,“Instruction Timings.”

7.1.1 Definition of TermsThis section defines terms used in this chapter.

• Stage—An element in the pipeline at which certain actions are performed, such as decoding the instruction, performing an arithmetic operation, and writing back the results. A stage typically takes a cycle to perform its operation, however, some stages are repeated (a double-precision floating-point multiply, for example). When this occurs, an instruction immediately following it in the pipeline is forced to stall in its cycle.

An instruction may also occupy more than one stage simultaneously; for example, the IQ0 position in the dispatch stage (DS) is usually identical to the integer decode (ID) stage, or an instruction in the integer store buffer stage (ISB) remains in that stage until it completes the cache access stage (CACC).

After an instruction is fetched, it can always be defined as being in a stage.

• Boundary—A boundary is the infinitely small period of time between stages.

• Pipeline—In the context of instruction timing, the term pipeline refers to the interconnection of the stages. The events necessary to process an instruction are broken into several cycle-length tasks to allow work to be performed on several instructions simultaneously—analogous to an assembly line. As an instruction is processed, it passes from one stage to the next. When it does, the completed stage is now available for the next instruction.

Although it may take many cycles to complete the processing of an individual instruction (the number of cycles is called the instruction latency), pipelining makes


it possible to overlap the processing so that the throughput (number of instructions completed per cycle) is greater than if pipelining were not implemented.

Note that with respect to the 601, each of the three executions units (floating-point unit, or FPU; branch processing unit, or BPU; and the integer unit, or IU) has its own pipeline. This implementation of parallel pipelines is called superscalar architecture.

• Branch folding—The elimination of a branch instruction from the pipeline after the direction has been determined. The next sequential instruction is allowed to take its place in the pipeline.

• Branch prediction—The process of guessing whether a branch will be taken. Such predictions can be correct or incorrect; the term predicted as it is used here does not imply that the prediction is correct (successful).

• Branch resolution—The determination of whether a branch is taken or not taken. A branch is said to be resolved when it can exactly be determined which path it will take. If branch is resolved as predicted, speculatively executed instructions can complete execution. If the branch is not resolved as predicted, instructions are purged from the instruction pipeline and are replaced with the instructions from the nonpredicted path.

• Instruction stream collapsing—The elimination of empty queue positions between instructions in the instruction queue (functionally part of the IU pipeline) when branch and floating-point instructions are dispatched out of order. The remaining integer instructions comprise the reference pipeline and are assigned tags that permit all instructions to complete writeback in program order. This is shown in Section 7.3.1.4.4, “Synchronization Tags for the Precise Exception Model.”

• Program order—The original order in which program instructions are provided to the instruction queue from the cache.

• Bubble—A bubble is caused by a lost opportunity to execute an instruction resulting from conditions such as pipeline stalls and by dynamics of the IQ and dispatch mechanism. A bubble reduces the throughput (number of instructions completed/number of cycles required) by increasing the size of the denominator.

Note that unless the bubble is a tagged bubble, it can be eliminated if the preceding instruction causes a stall.

• Tagged bubble—A bubble in the integer pipeline that exists solely to allow branch and floating-point instructions dispatched out of order to be reordered correctly at the writeback stage. Sometimes it is necessary to create a tagged bubble in order to maintain the precise exception model.

• Stall—An occurrence when an instruction cannot proceed to the next stage because that stage is occupied by the previous instruction.

• Latency—The number of cycles required to complete the processing of a particular instruction.


• Throughput—A measure of the number of instructions that are processed per cycle. For example, a series of integer add instructions can execute at a throughput of one instruction per clock cycle.

• Commitment (of an instruction)—Execution has completed, data dependencies have been resolved, and the process of recording the results has begun and must complete. Once an instruction is committed, an exception cannot prevent the instruction from writing back its results.

• Feed-forwarding—The action of passing the results of one instruction directly to a subsequent, data dependent instruction. For example, there is a feed-forwarding mechanism in the 601 IU pipeline that allows the results of the execute (IE) stage to be available to the subsequent instruction that is in the integer decode (ID) stage.

Note that most of the discussions in this chapter and in the examples in Appendix I,“Instruction Timing Examples,” assume that the floating-point precise mode is disabled(that is MSR[FE0] and MSR[FE1] are not both set).

7.1.2 Timing TablesThis section describes how to use the tables in this chapter that illustrate the timings ofinstructions through their respective pipeline. An example showing the flow of rotate, mask,and shift instructions through the IU pipeline is given in Table 7-1.

The first row in the timing tables indicates the number of cycles an instruction spends ineach pipeline stage. The second row shows the pipeline stages. For integer instructions, thestages are—integer decode (ID), integer execute (IE), integer completion (IC), integerwriteback for ALU operations (IWA), integer writeback for load operations (IWL), andcache access (CACC). Sub rows are included with a different pipeline stage in each sub-row. Some instructions simultaneously occupy multiple stages, while some instructionsspend several cycles in the same stage. The classic RISC instruction flow is shown inTable 7-1—the instruction moves from ID to IE to IWA spending one cycle in each stage.The third row in the tables shows which resources are required nonexclusively. Typicallythese resources are registers that are read by the instruction. The fourth row shows

a. Table 7-23 summarizes the resources required by individual instructions.

Table 7-1. Rotate, Mask, and Shift Instruction Timing

Number of Cycles 1 1 1

Pipeline stages ID

IE

IC

IWA

Resources required nonexclusivelya rA, rB, rS, CA, MQ

Resources required exclusivelya. rA, MQ, CR0


resources that are required exclusively; such resources are typically registers that are beingwritten by the instruction.

The columns indicate a cycle progression. The leftmost column being the first cycle.

7.2 Pipeline DescriptionThe 601 is a pipelined superscalar processor. A pipelined processor is one in which theprocessing of an instruction is broken down into discrete stages, such as decode, execute,and writeback. Because the tasks required to process an instruction are broken into a seriesof tasks, an instruction does not require the entire resources of an execution unit. Forexample, after an instruction completes the decode stage, it can pass on to the next stage,while the subsequent instruction can advance into the decode stage. This improves thethroughput of the instruction flow. For example, it may take three cycles for an integerinstruction to complete, but if there are no stalls in the integer pipeline, a series of integerinstructions can have a throughput of one instruction per cycle.

A superscalar processor is one in which multiple pipelines are provided to allowinstructions to execute in parallel. The 601 has three execution units, one each for integerinstructions, floating-point instructions, and branch instructions. The IU and the FPU eachhave dedicated register files for maintaining operands (GPRs and FPRs, respectively),allowing integer calculations and floating-point calculations to occur simultaneouslywithout interference.

Note that in the 601, instructions are typically dispatched out of order. Branch and floating-point instructions are tagged to instructions in the integer pipeline, and any order-relateddependencies are tracked and handled appropriately by completion logic in the IU (the ICstage). Note that it is not always necessary for instructions to complete in order, and the 601allows instructions to complete in a manner that ensures the integrity of data and registers.

The 601 pipeline description can be broken into two parts, the processor core, whereinstruction execution takes place, and the memory subsystem, the interface between theprocessor core and system memory. The system memory includes a unified 32-Kbyte cacheand the bus interface unit.

7.2.1 Processor CoreThe 601 processor core, shown in Figure 7-1, contains 20 distinct pipeline stages.


Figure 7-1. Instruction Flow Diagram Showing the Processor Core

CARB

IQ7

IQ6

IQ1

IQ5

IQ4

IQ3

IQ2

IC

FD

FPA

FWA

FPM

F1BE

MR

Dispatch Unit

Floating-Point

Integer Unit (IU)Branch Processing

BW

CACC

Cache (memory subsystem)

IWA IWL

FWL

Data Access Queueing Unit

Fetch Arbitration

Unit (BPU) = Unit Boundary

= Cycle Boundary

Unit (FPU)

FA

ISB

FPSB

ID

IE

IQ0

1 An integer instruction can be passed to the ID stage in the same cycle in which it enters IQ0.

1

(Instructions in the IQare said to be in the dispatch stage (DS))


Note that some stages can contain multiple instructions during a given cycle, and someinstructions can reside in multiple stages during a given cycle. This section gives a briefdescription of each stage. Later sections go into more detail about each stage and howindividual instructions flow through the pipeline.

The processor core is shown in Figure 7-1.

Table 7-2 describes the common stages in the 601.

Table 7-3 describes the stages in the integer pipeline. Not all integer instructions passthrough every IU stage.

Table 7-2. PowerPC 601 Microprocessor Pipeline Stages—Common Stages

Stage Description

FA Fetch Arbitration—During fetch arbitration, the address of the next instructions (fetch group) to be fetched is generated and sent to the memory subsystem (the cache arbitration stage). Any instructions that are going to arrive at the dispatch stage as a result of the cache access associated with the address generated during the FA stage is considered to be in the FA stage. All instructions must pass through the FA stage.

CARB Cache Arbitration—For most operations, the CARB stage of the cache is overlapped with one or more other stages. The CARB and CACC stages may be used by the memory subsystem for cache reload operations and for some snoop operations. These relationships are shown in the multiple instruction timing diagrams in Appendix I.

CACC Cache Access—The cache is the only interface point between the memory subsystem and the processor core; if the data being accessed by the instruction in the CACC stage is in the cache, it is passed to the processor core during that cycle (that is, a single-cycle cache access). If the data is not in the cache, no data is passed to the processor core. The 601 cache is nonblocking, that is, once an instruction misses in the cache, the CACC stage is free to service another instruction. For more information, see Section 7.2.2, “Memory Subsystem.” The CARB and CACC stages may be used by the memory subsystem for cache reload operations, and some snoop operations. During a cache reload, the data being reloaded is brought in from the memory system and is available for use in the processor core on the next cycle.

DS Dispatch—The dispatch stage is associated with the eight-entry instruction queue (IQ0–IQ7). The 601 can dispatch as many as three instructions on every clock cycle, one to each of the processing units (the IU, the BPU, and the FPU) from the same dispatch stage. As many as eight instructions can be in the dispatch stage (instruction queue), but instructions can be dispatched only from IQ0–IQ3. Note that only three of the first four (in program order) can be dispatched on a given cycle. Note that the bottom element of the instruction queue (IQ0) can be viewed as part of the integer decode (ID) stage.


The data access queueing unit provides a buffer between the IU and the memory subsystem.It contains two stages—the floating-point store buffer stage (FPSB), and the integer storebuffer stage (ISB). The stages in this unit are described in Table 7-4. The data accessqueueing unit is described with the IU in Section 7.3.3, “Integer Pipeline Stages.”

The floating-point pipeline contains six stages below the DS stage. These stages aredescribed in Table 7-5. Note that all floating-point arithmetic instructions must passthrough each of these stages (with the exception of F1); details of how each type ofinstruction makes use of each floating-point stage is provided in Section 7.3.4, “Floating-Point Pipeline Stages.”

Table 7-3. PowerPC 601 Microprocessor Pipeline Stages—Integer Pipeline

Stage Description

ID Integer Decode—In the ID stage, integer instructions are decoded and the operands are fetched from the GPRs. Note that an integer instruction typically enters the decode stage when it enters IQ0; when and instruction stalls in the ID stage, a new instruction may move into IQ0; however, IQ0 and ID will be the same after the instruction is no longer stalled in the ID stage.

IE Integer Execute—In the IE stage, integer ALU operations are executed, the EA for memory access instructions is calculated and translated, and the request is made to the memory subsystem (that is, the instruction simultaneously occupies the CARB and CACC stages). There is feed-forwarding for ALU operations in the IE stage. This means that the results calculated in the IE stage are available as sources to the instruction that enters IE stage in the next cycle. This eliminates stalls due to data dependencies between consecutive integer ALU instructions. There is also feed-forwarding from the CACC stage to the IE stage resulting in only a one-cycle stall for a dependent operation directly following a load instruction. For more information, see Section 7.3.3, “Integer Pipeline Stages.”

IC Integer Completion—In the IC stage, results of instructions are made available for use unless synchronous exceptions are detected. Tags for branch and floating-point instructions must pass through this stage.

IWA Integer Arithmetic Writeback—In the IWA stage, the general purpose registers (GPRs) are updated with the results from integer arithmetic operations.

IWL Integer Load Writeback—In the IWL stage, integer load operations a write operation in the GPRs with from the cache or from memory.

Table 7-4. PowerPC 601 Microprocessor Pipeline Stages—Data Access Queueing Unit

Stage Description

FPSB Floating-Point Store Buffer— The FPSB stage is used for floating-point store instructions that have been committed (or are being committed) but for which the floating-point data is not yet available. All floating-point store instructions must pass through this stage. This stage allows the instruction to free up the IE stage. An instruction remains in this stage until it completes the CACC stage. The data in this stage is kept memory-coherent by the processor.

ISB Integer Store Buffer—The ISB stage is used to buffer data accesses that were not arbitrated into the cache due to a higher priority access (such as a cache reload). This stage allows the instruction to free up the IE stage. An instruction remains in this stage until it completes the CACC stage.The buffer used in this stage is kept memory-coherent by the processor.


The BPU pipeline contains three stages below the DS stage—the branch execute stage(BE), the mispredict recovery stage (MR), and the branch writeback stage (BW). These aredescribed in Table 7-6.

Some integer and floating-point instructions must repeat stages in their respective pipelines,which causes subsequent instructions in the pipeline to stall. In addition to the multicycleoperations, data dependencies can cause stalls as described in Section 7.3.3, “IntegerPipeline Stages,” and Section 7.3.4, “Floating-Point Pipeline Stages.”

Synchronization is handled relative to the integer pipeline. To ensure that LR and CR areupdated in order, neither the BPU nor the FPU can perform their write-back operations untiltheir tags complete the IC stage in the integer pipeline. Instructions in the IU cannot getahead of instructions in the BPU, but may get ahead of instructions in the FPU by as many

Table 7-5. PowerPC 601 Microprocessor Pipeline Stages—Floating-Point Pipeline

Stage Description

F1 Floating-Point Instruction Queue—The F1 stage is a one-entry queue that buffers floating-point instructions that have been dispatched but cannot be decoded because an instruction is stalled in the FD stage.

FD Floating-Point Decode—In the FD stage, instructions are decoded and operands are fetched from the FPRs.

FPM Floating-Point Multiply—In the FPM stage, operands are fed through a multiplier that performs the first part of a multiply operation. The multiplier performs a single-precision multiply with a throughput of one per cycle or a double-precision multiply with a throughput of one per two cycles.

FPA Floating-Point Add—In the FPA stage, an addition is performed for add instructions or to complete multiply or accumulate instructions.

FWA Floating-Point Arithmetic Writeback—In the FWA stage, normalization and rounding occur, FPRs are updated, store data is sent to the memory subsystem, and bits are set in the FPSCR. Data written to the FPRs during the FWA stage is available to the FD stage in the following cycle.

FWL Float Load Writeback—During the FWL stage, load data is written into the FPRs. Load data that is being written to the FPRs is also available to the FD stage in the same cycle.

Table 7-6. PowerPC 601 Microprocessor Pipeline Stages—Branch Pipeline

Stage Description

BE Branch Execute—In the BE stage, the target address of a branch is calculated and the branch direction is either determined or predicted depending on the state of the condition register (CR) and the type of branch instruction. Note that the BE stage is parallel with the FA stage of the target instructions of a taken branch.

MR Mispredict Recovery—Conditional branches also go into the MR stage (in parallel with entering the BE stage) and stay there until the branch is resolved (the CR is coherent). If a branch was predicted incorrectly, the MR stage logic allows the fetcher to recover and start executing the correct path.

BW Branch Writeback—In the BW stage, branch instructions that update the link register (LR) or count register (CTR) do so. Note that many branches can be in the BW stage at any given cycle, but that no more than two can write back in one cycle. An infinite stream of taken branches that hit in the cache has a throughput of one branch every two cycles.


as three instructions if floating-point exceptions are disabled (that is, the processor logs theexception but does not trap to an exception vector). However, when floating-pointexceptions are enabled, (controlled by MSR[FE0] and MSR [FE1]) the IU cannot get aheadof the FPU. This is described in Section 7.3.1.4.4, “Synchronization Tags for the PreciseException Model.”

Performance is affected whenever the IU is required to synchronize with the FPU, asdescribed in Section 7.3.4, “Floating-Point Pipeline Stages.”

When a floating-point or branch instruction is dispatched, it is tagged to the previous integerinstruction (in program order). This tag is used to ensure that execution appears to be inprogram order. There are certain restrictions, such as the fact that only one floating-pointinstruction can be tagged to an integer instruction, that may cause an instruction to betagged to a bubble (nonexecuting placeholder) in the IU pipeline, thus reducing throughput.These restrictions are described in Section 7.3.1.4, “Common Stages—Dispatch (DS)Stage,” and Section 7.3.1.4.4, “Synchronization Tags for the Precise Exception Model.”

7.2.1.1 Dispatch Stage LogicThe DS stage, which contains instruction queue (IQ), resides between the memorysubsystem and the execution units and accounts for the time required to fetch the instructionand to dispatch it to one of the execution units. The instruction queue can hold as many aseight instructions, so an entire cache sector can be fetched into the DS stage in one cycle.

The following characteristics and restrictions regarding instruction dispatch should benoted:

• Floating-point, branch, and integer instructions can be dispatched out-of-order.

• Floating-point and branch instructions are tagged to a previous integer instruction in program order or to a bubble in the IU pipeline. Typically, a floating-point or branch instruction is dispatched before the integer instruction or bubble to which it is tagged. Tagging is described in Section 7.3.1.4.4, “Synchronization Tags for the Precise Exception Model. “

• Floating-point and integer instructions flow through their respective pipelines in program order. Under certain conditions, branch instructions can be dispatched out of order, as described in Section 7.2.1.4, “Branch Processing Unit (BPU).”

• Integer and load/store instructions are dispatched in order relative to all instructions (that is, an integer instruction cannot be dispatched until all instructions before it in program order have been dispatched).

• Branch and floating-point instructions can be dispatched in program order from IQ0–IQ3. However, an integer instruction is dispatched only when it is the first instruction in the DS stage (typically IQ0) in program order. Branch and integer instructions are dispatched in zero cycles, while floating-point instructions take one cycle to dispatch.


Several situations can cause dispatch stalls—instruction synchronization requirements ofthe precise exception model, resource dependencies and data dependencies. These stalls aredescribed in Section 7.3.1.4, “Common Stages—Dispatch (DS) Stage.”

7.2.1.2 Integer Unit (IU)The integer unit (IU) executes the following types of instructions:

• Integer arithmetic instructions • Integer logical instructions• All load operations and store operations (both integer and floating-point)• CR instructions• Memory management instructions• Miscellaneous special purpose register instructions.

Different types of instructions use different portions of the IU pipeline.

All instructions that execute in the IU use the ID stage in the same way—the instruction isdecoded, immediate constants are pulled from the instruction, and the appropriate operandsare fetched from the GPRs.

All instructions that execute in the IU use the IE stage and writeback stages (IWA andIWL), but different types of instructions use these stages in different ways. Theseinstructions can grouped by function. Within a group, the functions performed in thesestages are very similar. However, the functions performed in these stages differ greatly fromone group to another. Instructions that execute in the IU can be divided into the followinggroups:

• The single-cycle integer operations—These operations take one cycle in the IE stage, one cycle in the IWA stage, and an overlapping cycle in the IC stage (typically these instructions are in the IC stage at the same time as the IWA stage). This class of operations includes integer addition operations, rotate/shift operations, integer logical operations, CR manipulations, and some register move operations.

• The multicycle integer operations use the IE stage, the IWA stage, and the IC stage, but take multiple cycles in at least one of these stages. Included in this class are integer multiply instructions, integer divide instructions, and some register move operations.

• All single-cycle load/store instructions spend one cycle (simultaneously) in the IE and CARB stages, followed by at least one cycle in the CACC and in the IC stages. Note that additional cycles may be required in either the FPSB or the ISB stage if there is a stall, and note also that when the instruction is in the CACC stage it must also occupy either the FPSB or the ISB stage in case a cache retry condition forces the instruction to back out of the CACC stage. Finally, they may spend at least one cycle in either the IWL or the FWL stage. Single-cycle load/store operations include all integer and floating-point load/store operations except for the load/store


string/multiple operations. Logical addresses are translated to physical addresses during the IE stage. The update of rA for update-form load and store operations uses the IWA stage in parallel with the IC stage.

• The load/store multiple/string instructions use the same stages as the single-cycle load/store instructions but they spend multiple cycles in each stage. Note there are no floating-point load/store multiple/string instructions.

The execution of some instructions is shared between the IU and other units on the 601. Forexample, floating-point store operations are executed in both the IU and FPU. The IUgenerates the effective store address while the FPU provides the store data. Cacheinstructions (such as dcbz and dcbi) are decoded in the IU, and as the instruction is passedinto the IE stage it is simultaneously passed to the CARB stage, and subsequently to theCACC (and ISB) stage where the appropriate actions are performed.

The flow of instructions through the integer pipeline, and dependencies and considerationsassociated with the IU pipeline, are described in Section 7.3.3, “Integer Pipeline Stages.”

7.2.1.3 Floating-Point Unit (FPU)The FPU executes all floating-point arithmetic instructions and floating-point storeoperations, as described in Section 7.2.1.2, “Integer Unit (IU).” The FPU conforms toIEEE/ANSI standards for both single- and double-precision arithmetic. Double-precisionfloating-point values are stored in the thirty-two 64-bit floating-point registers (FPRs)contained within the FPU. Single-precision floating-point values are also contained in theFPRs (although only 32 single-precision values may be stored in the FPRs).

With the exception of the F1 stage, which acts as a buffer between the DS and FD stageswhen the FD stage stalls or is busy, all floating-point instructions must complete each stagein the floating-point pipeline. Special case numbers, such as denormalized numbers, NaNs,and infinity, are fully handled in hardware and may be required to repeat the path throughthe floating-point pipeline.

In the FD stage, instructions are decoded and operands are fetched from the FPRs. Thenumber of cycles required to decode a floating-point instruction depends on the type ofinstruction:

• Single-precision multiply and add instructions, double-precision add instructions, register move instructions, conversion instructions, and store instructions each spend one cycle in the FD stage.

• Double-precision multiply instructions spend two cycles in the FD stage. The 601 employs a single-precision multiply-add array spread across the FPM and FPA stages. Double-precision multiply operations are split into two operations, each of which uses the FD, FPM, and FPA stages twice. The results are added in the second cycle in the FPA stage. Note that this operation is fully pipelined—the second cycle in FD overlaps the first cycle of FPM; and the second cycle of FPM overlaps the first


cycle of FPA. This results in a throughput of one double-precision multiply per two cycles. This process is described in Section 7.3.4.5.2, “Double-Precision Instruction Timing.”

• Divide instructions continue to occupy the FD stage until it enters its final FWL cycle—either about 18 (single-precision) or about 30 (double-precision).

• Some special case numbers cause stalls in the FD stage.

Floating-point arithmetic instructions write back in the FWA stage and do not use the FWLstage, while floating-point load instructions write back in the FWL stage, and do not usethe FWA stage.

7.2.1.4 Branch Processing Unit (BPU)The BPU handles branch prediction and resolution in addition to branch target calculation.The BPU generates two addresses that feed the FA stage—the branch target address (fortaken branches), and the mispredict recovery address (the recovery address formispredicted branches).

Branch instructions can be divided into four groups, depending upon whether they requirethe MR and BW stages. These are shown in Table 7-7. Note that branch instructions that donot require MR or BW can always be folded, and they never require tags in the IU pipeline.

* Conditions that can prevent branch folding are described in Section 7.3.1.4.5, “Dispatch Considerations Related to IU/FPU Synchronization.”

All branch instructions use the BE stage, during which, the target address for the branch iscalculated and if the branch is either resolved to be or predicted to be taken, the address forthe branch is passed to the FA stage (in zero cycles).

Branches that are conditional on the CR are predicted in the BE stage (unless they canalready be resolved). The address of the nonpredicted path is stored in the MR stage (seebelow) in case the prediction is incorrect. The 601 uses a static prediction scheme.

Branches that are conditional on the CR also enter the MR stage in parallel with BE stage.The branch stays in the MR stage until it is resolved. If the prediction is incorrect, themispredict address (stored in the MR stage) is sent to the FA stage. The MR stage can holdonly one mispredicted branch address, so the 601 can only handle one unresolved

Table 7-7. Branch Instruction Folding

Branch Instruction Type MR Stage BW Stage Folded

Nonconditional/nonupdating No No Always*

Nonconditional/updating No Yes (synchronous with IU WB) Usually*

Conditional/nonupdating Yes No Sometimes*

Conditional/updating Yes Yes (synchronous with IU WB) Usually*


conditional branch at a time—any other conditional branch stalls at DS, although branchesthat do not depend on the CR can still be dispatched and executed.

The BPU contains two link shadow registers to store link addresses when branch-and-linkinstructions are executed out of order. When a branch-and-link instruction is dispatched, atag is generated in the integer pipeline. This tag is used to synchronize completion ofinstructions. Branch-and-link instruction can execute and clear the BE stage, but the LR isupdated only when the tag completes the IC stage. Similarly, branch-and-decrementinstructions do not immediately update the CTR when they execute; instead, they generatea tag used to decrement the CTR when it clears the IC stage. This is described inSection 7.3.1.4.4, “Synchronization Tags for the Precise Exception Model.”

Branch instructions that update LR or CTR must go through the BW stage. The branchinstruction enters the BW stage on the cycle immediately following the BE stage and stallsthere waiting for its tag to complete (tags complete by passing through the IC stage). Asmany as nine branches can reside in the BW stage simultaneously, which eliminates stallsdue to the BW stage filling up; however, only two branches can write back on a given cycle(one to the LR and one to the CTR).

7.2.1.5 Memory Subsystem Pipeline StagesThe CARB and CACC stages are part of the memory subsystem. These stages represent theonly path between the processor core and main system memory. Memory accesses mustpass through both the CARB and the CACC stages. The CARB stage is responsible forarbitration between different memory access requests generated by the processor core andthe memory subsystem. The different types of requests are prioritized as follows:

1. Cache maintenance accesses (generated by the memory subsystem). These include the following in order of priority:

a) The reload access for reloading a cache line on a cache miss

b) The cast-out access for casting-out a modified adjacent sector

c) The snoop push required by a snoop operation

2. Data load requests (generated by the processor core). Note that loads precede stores with respect to the memory queues; however cache accesses occur in program order (assuming cache hits). The memory queues are described in Section 7.2.2.2, “Bus Interface Unit.”

3. Data store requests (generated by the processor core). Note that loads precede stores with respect to the memory queues; however cache accesses occur in program order (assuming cache hits).

4. Instruction fetch requests (generated by the processor core)

Only one request is granted in a given cycle, and that request is forwarded to the CACCstage in the next cycle. If a cache miss occurs, no data is returned from the CACC stage.There are conditions where a stall may occur in the CARB or CACC stages associated withcache retry, as described in Chapter 4, “Cache and Memory Unit Operation.” The data is


available during the CACC stage of the cache reload, as described in Section 7.2.2.3.3,“Cache Miss Timing”; thus in this case, a request can be satisfied even if it is not arbitratedinto the cache (the CARB stage).

Throughout much of this document, cache hits are assumed on all requests generated by theprocessor core; requests generated by the memory subsystem are ignored. Timinginformation including memory subsystem activity and cache misses can be determined byapplying the timing of the memory substage to the CARB and CACC stage timing, asdescribed in the following section, Section 7.2.2, “Memory Subsystem.”

7.2.2 Memory SubsystemThe memory subsystem of the 601 consists of a unified cache and a bus interface thatemploys extensive queueing to increase performance. The cache supports the four-stateMESI protocol and the bus interface implements snooping protocol to maintain cachecoherency for symmetric multiprocessor implementations.

This section describes the components of the memory subsystem—the cache unit,described in Section 7.2.2.3, “Cache Unit,” and the system interface, described inSection 7.2.2.2, “Bus Interface Unit.” This section also describes how memory accessesaffect instruction timing.

7.2.2.1 Memory Management Unit (MMU)The 601 MMU supports both the standard page table translation mechanism and the blockaddress translation (BAT) mechanism. There is a unified 256-entry, two-way set associativetranslation lookaside buffer (TLB) in the MMU. The TLB is maintained in hardware,including a hardware reload on a TLB miss. The BAT registers are maintained by softwareas specified by the PowerPC architecture. One MMU is used for both instruction and datatranslation with data accesses given priority over instruction accesses. Translation occursduring execute stage for load operations and store operations. For more detailedinformation about translation mechanism, see Chapter 6, “Memory Management Unit.”

7.2.2.2 Bus Interface UnitThe bus interface unit of the 601, shown in Figure 7-2, consists of a memory queue and buscontrol logic. The memory queue contains two read queues and three write queues (one ofwhich is reserved for snoop pushes).


Figure 7-2. Bus Interface Unit

The read queues may hold any combination of the following three cache misses—one fetch,one load, and one cacheable write-back store operation.

For better overall processor performance, the queueing allows high priority operations(such as read misses) to bypass low priority operations (such as cache write-backoperations). This prioritizing is discussed in Section 7.2.2.2.3, “Bus Interface Arbitration.”The bus interface features include pipelining of up to two operations, data forwarding (aftertwo data beats have been received), bus parking, and optional loading of the adjacent sectorin a cache line to decrease the latency of memory accesses. This is described in detail inChapter 9, “System Interface Operation.”

The bus interface allows zero wait state data to stream into or out of the chip, providing amaximum data bus bandwidth of 320 Mbytes/second (at 50 MHz). This high bandwidth isachieved by using the 601’s zero wait state capability and the ability to burst data, topipeline two addresses onto the bus and overlap slave access time for the second tenure withthat of the first.

7.2.2.2.1 Write QueueFor each position in the write queue, there is space for the physical address and for theassociated data (each capable of holding a sector). One position, marked snoop inFigure 7-2, is provided for high-priority bus operations when a snoop hits a modified sector.This is described in Section 9.10, “Using DBWO (Data Bus Write Only).”

Burst writes are always sector aligned.

7.2.2.2.2 Read QueueFor burst read operations, the address is modified to be the quadword address of therequested data. By requesting the quadword, the hardware can forward data to the internaltarget after only two beats of the data have been received. On the other hand, the memorysystem need only be able to provide two orderings of data coming back (first quadword ofa sector then second or second quadword then first).

READ QUEUE

WRITE QUEUE

SNOOP

ADDRESS DATA

ADDRESS(from cache)

DATA(from cache)

SYSTEM INTERFACE

(to cache)

DATA QUEUE

(four word)


Data that is read from memory is sent to a buffer that accepts a quadword of data beforeaccessing the cache to write the data (and forward it to the processor if required).

The read queues also allow an optional reload of the adjacent sector of a cache line if it isnot already valid in the cache, a read request is queued for that sector if no other read missesare present. How these operations are prioritized is shown in Section 7.2.2.2.3, “BusInterface Arbitration.”

For code and data with good locality of reference, this can lead to significant performanceincreases by reducing the number of cache misses. This optional reload can be disabled forinstruction fetches by setting HID0[26] and for load/store misses by setting HID0[27]. Formore information about the HID0 register, see Section 2.3.3.13.1, “Checkstop Sources andEnables Register—HID0.”

7.2.2.2.3 Bus Interface ArbitrationThe 601 arbitrates all queue positions for bus access using the following priority scheme(which is optimized for minimal processor stall):

1. High-priority cache push-out operations (using the DBWO signal)

2. Normal snoop push-out operations

3. I/O controller interface segment accesses that incur no additional delays (that is, they have not been retried because of I/O latency). When these accesses have been retried, these operations are given lowest priority.

4. Cache instruction operations

5. Read requests, such as load operations, RWITMs, and instruction fetches

6. Single-beat write operations

7. sync instructions

8. Optional cache-line fill operations

9. Cache sector cast-out operations

10. I/O controller interface segment accesses that incur additional delays (that is, they have been retried because of I/O latency). This is described in Chapter 4, “Cache and Memory Unit Operation.”

Because write operations do not stall the processor, they are performed after readoperations. If, however, a read misses and a write to the same block is in the queue, thatwrite is prioritized ahead of any other read operations. This allows the read miss to bequeued so the cache can be accessed for some other request.

I/O controller interface operations are typically ARTRYed multiple times. To allow otheroperations to finish in the interim, these I/O controller interface operations are given lowestpriority (until the 601 begins another bus operation).


A read miss can be arbitrated directly from the cache access point to the bus if no otherreads are queued. This eliminates having to queue the operation before arbitrating for thebus.

7.2.2.2.4 Bus ParkingBus parking is associated with the address portion of the bus interface unit. It is a part ofthe bus protocol that eliminates as many as two cycles of bus arbitration overhead by havingthe bus pregranted to a particular master. If the master is given a bus grant (BG) on the cyclethat it needs to request the bus, it begins a tenure by asserting the TS signal. Bus parkingsaves the cycle where bus request would have been asserted and the cycle of the grant, asshown in Figure 7-3. In some systems, these two cycles may be the same cycle. The transferstart occurs up to two cycles earlier when the bus is parked compared to when the bus is notparked.

Figure 7-3. Bus Timing for Parked and Nonparked Bus Masters

For more information about bus parking refer to Section 9.3.1, “Address Bus Arbitration.”

7.2.2.3 Cache UnitThe 601 has a unified 32-Kbyte, eight-way set associative cache. The cache access time isone processor cycle and it is nonblocking on misses. The cache tag directory is dual portedwith one port dedicated for bus snooping. The cache array is not accessed by snoops unlessa snoop push is required (that is, a snoop results in a data access action). The 601 uses asingle cache for both data and instruction accesses. The cache uses a least-recently-used(LRU) algorithm for replacing cache lines.

7.2.2.3.1 Cache ArbiterBecause the cache is single-ported and unified, all cache accesses must be prioritized.Cache arbitration is prioritized as shown in 7.2.1.5, “Memory Subsystem Pipeline Stages.”

Note that instruction fetch accesses have the lowest priority access to the cache.

7.2.2.3.2 Cache Hit TimingTo determine if the appropriate data resides in the cache, the cache data array and the cachetag directory are accessed in parallel. As shown in Figure 7-4, for a cache hit, data is

BR

BG

TS

BR

BG

TS

Cache miss detected

(Inactive)

(Active)

Processor Not Parked

Processor Parked


available at the output of the cache one cycle after the read address is arbitrated. Ifnecessary, the tag directory is updated in the same cycle that the cache data is available(cycle 1). The processor can forward the data to the IE stage in cycle 1. This feed-forwarding mechanism is described further in Section 7.3.3, “Integer Pipeline Stages.”

Figure 7-4. Cache Hit Timing for a Load Access and a Fetch Access

If a cache hit is detected when a write access is arbitrated into the cache, the cache isaccessed and written in the same cycle. If necessary, the tag directory is also updated in thesame cycle (cycle 1) in Figure 7-4. Store operations have the same arbitration priority asload operations. As long as they hit in the cache, a read access can immediately follow awrite access and a write access can immediately follow a read access unless there are datadependencies. Data dependencies are shown in the examples in Appendix I, “InstructionTiming Examples.”

7.2.2.3.3 Cache Miss TimingThe cache miss timing is a function of several variables, including bus speed, memoryaccess time and interleaving structure, and bus arbitration schemes. This section considersthe fastest possible system, then provides formulas for calculating numbers for any realsystem. For the rest of this section, the terms processor cycle and bus cycle are used todifferentiate the actual amount of time elapsing. Bus cycles are integer multiples ofprocessor cycles, and the multiple is system-dependent.

7.2.2.3.4 Timings When the Processor Clock Frequency Equals the Bus Clock Frequency

The following discussion assumes a 1:1 bus clock to processor clock frequency ratio. Forinformation about 601 clock operation, refer to Section 8.2.11, “Clock Signals.” Figure 7-5shows the best-case timing for a cache miss with the processor and bus operating at thesame frequency. This example assumes a one bus cycle access time to main memory, busrunning at full speed, no delay between the data beats (that is the transfer acknowledge, TA,signal is asserted without delay for each beat), no address retry (ARTRY) on the addresstenure, and the processor is parked on the bus.

Arbitrate load into

the cache.

Access cache, determine that there is a cache hit. Data available to the IE

stage.

Target GPR

written, dependent operation executes.

Arbitrate fetch into the cache.(FA stage)

Access cache, determine that

there is a cache hit. Instructions are placed in the IQ.

(DS stage)

0 1Cycle:0 1 2Cycle:

Instruction available

for dispatch

2

Load Access: Fetch Access:


Figure 7-5. Cache Miss Timing When the Processor Clock Equals the Bus Clock (Best-Case Timing)

When a cache miss is detected it is immediately arbitrated onto the bus if nothing of higherpriority is queued (Clock 0). The bus grant signal (BG, is asserted in the next clock cycle(Clock 1) if there is no higher priority transaction on the bus. However, if the processor isparked on the bus (BG is already asserted), a transfer begins on the next bus cycle. In thisexample, the transfer begins in Clock 2. The operation is also placed in the read queue(shown in Figure 7-2), so the data can be pipelined back to the processor and the cache.Sometime after the transfer has begun (perhaps as soon as the next bus cycle as shown herein Clock 3), data begins to come into the processor. Data arrives in four double-word beats(for a cache sector of data). After the second beat of data arrives (Clock 4), the bus interfaceunit makes a request (Clock 5) to write the four words into the cache (a “reload dump”).The first four words contain the critical word.

During the next processor cycle (Clock 6), data is written to the cache and forwarded to theprocessor core if required (either to the register or to the IQ, depending on whether theaction was a load or a code fetch). After the last two beats of data arrive, a second requestto write to the cache is made (Clock 7). During Clock 8, the cache receives the remainingfour words of data (bursts three and four) and the cache tag is validated. In Clock 9, if anaccess to this sector is arbitrated into the cache, the cache tags signal a cache hit, as shownin Figure 7-5.

Figure 7-6 shows a formula for calculating the number of processor cycles from cache miss(that is, cycle 1 in Figure 7-5) to dependent operation execution (cycle 7 in Figure 7-5)assuming a 1:1 processor-to-bus-clock ratio.

Arbitrate load

into the cache

(CARB).

Cache miss.

Receive bus grant (BG).

Load in memory queue,

assert TS, receive

data bus grant (DBG)

1st data beat

returned

2nd data beat

returned

3rd data beat

returned; arbitrate

reload dump into

cache.

4th data beat, first two beats of data written to cache and forwarded to IE stage, and

IQ.

dependent operation executes, arbitrate reload dump 2

into cache.

Reload dump 2, cache tags

validated for

sector.

New sector usable

in cache.

0 1 2 3 4 5 6 7 8 9


Figure 7-6. Formula for Calculating Total Time from Cache Miss to Execution of Dependent Operation

Bus parking is described in Section 7.2.2.2.4, “Bus Parking.”

Figure 7-7 provides a formula for calculating the total number of processor cycles fromcache miss to data available for next cache hit assuming the processor clock frequencyequals bus clock frequency.

Figure 7-7. Formula for Calculating Total Time from Cache Miss to Data Available for the Next Cache Hit

7.2.2.3.5 Timings When the Processor Clock Frequency Does Not Equal the Bus Clock Frequency

This section describes timings when the bus clock frequency does not equal the processorclock. Information regarding configuring the clock signals is given in Section 8.2.11,“Clock Signals.” Figure 7-8 shows the timing for a cache miss when the processor clock istwice as fast as the bus clock.

Total = 3 + 3 + AT + TD + SDD1 + BNP

cycles from the assertion of TS to the assertion of the first TA

delay between the first and second beats

cycles while DRTRY is asserted after second TA

processor cycles spent in the 601

minimum time spent on the bus

penalty if the bus is not parked

Total = 3 + 5 + AT + STD + SDD2 + BNP

cycles from the assertion of TS to the assertion of the first TA

sum of delays between TAs

cycles while DRTRY is asserted after all TAs





Figure 7-8. Cache Miss Timing when the Processor Clock Frequency is Twice the Bus Clock Frequency (Best-Case)

Processor cycles are indicated by both dashed and solid lines, while bus cycles are indicatedby the solid lines only.

Note that typically there is a one-half processor cycle penalty for synchronizing to the busclock when the processor clock frequency is twice the bus clock frequency (half the timethe cache miss occurs one processor clock before a bus transition processor clock, half thetime it occurs in a bus transition processor clock). This penalty is larger for larger processorclock to bus clock ratios, as the average number of processor cycles of delay increases.Also, all bus related parameters now have to be multiplied by the processor clock to busclock ratio.

Figure 7-9 shows these factors in calculating total processor cycles for cache-miss-to-dependent-operation execution when the bus clock frequency does not equal the processorclock frequency.

Load in cache, cache miss.

Load in memory queue,

arbitrate to the

bus, BG asserted

Assert TS

Receive DBG.

First data beat

returned

Second data beat

returned.

Third data beat

returned and

arbitrate reload dump into

Data written to

cache and forwarded to IE stage

or IQ

Fourth data beat

returned, dependent operation executes

0 1 2 3 4 5 6 7 8 9 10 11

Arbitrate reload dump2 into the cache

Reload dump2, cache tags

validated for

sector.

New sector usable.

12 13 14 15

NOTE: Processor cycles are indicated by both solid and dashed lines. Bus cycles are indicated by solid lines.

50% of the time, this clock is not wasted because the miss occurs on the bus transition cycle instead of the nontransition cycle as shown here.


Figure 7-9. Formula for Calculating Total Time from Cache Miss to Dependent Operation Execution (Processor Clock/Bus Clock ≠ 1:1)

Figure 7-10 shows a formula for calculating the total number of processor cycles fromcache miss to data available for next cache hit on a bus that is not running at the samefrequency as the processor.

Figure 7-10. Formula for Calculating Total Time from Cache Miss to Data Available for Next Cache Hit (Processor Clock ≠ Bus Clock)

7.3 Pipeline TimingThis section, describes the exact timing of the pipeline stages, including specificinformation for every instruction and descriptions of data dependencies that may affectthroughput. It is organized into three main parts. The first part discusses timing for stagescommon to all instructions and the closely related branch pipeline. Subsequent sectionsdiscuss timing for stages specific to the IU and FPU pipelines.

Total = 3 + ASP + (3 + AT + TD + SDD1 + BNP) * PBR

memory access time

delays between first and second TAs


processor clock/bus clock ratio



average number of processor cycles lost for synchronizing to the bus

cycles while DRTRY is asserted after second TA

Total = 3 + ASP + (5 + AT + TD + SDD2 + BNP) * PBR

memory access time

delays between first and second TAs


processor clock/bus clock ratio



average number of processor cycles lost for synchronizing to the bus

cycles while DRTRY is asserted after all TAs


7.3.1 Common Stages/BPU Pipeline StagesThis section describes timing of the pipeline stages that are common to every instructionand the closely related branch pipeline.

The common stages are as follows:

• Fetch arbitration stage (FA)—In this stage, the fetch address is sent to the memory subsystem. The FA stage is described in Section 7.3.1.1, “Common Stages—Fetch Arbitration (FA) Stage.”

• Cache arbitration (CARB) and cache access (CACC) stages—In these stages instructions are read from the memory subsystem (from the address sent to the memory subsystem in the FA stage) and loaded into the instruction queue (DS stage). The CARB stage is described in Section 7.3.1.1, “Common Stages—Fetch Arbitration (FA) Stage and Section 7.3.1.3, “Common Stages—Cache Access (CACC) Stage.”

• Dispatch stage (DS)—The common stages end when instructions are dispatched to the appropriate execution unit(s) in the dispatch stage (DS). The dispatch stage has different timing for different instructions.

The BPU pipeline is included in this discussion because it is tightly coupled to the commonstages in that the results of a branch instruction determine which instructions enter the FAstage. The BPU generates the branch target address and the mispredict address that can goto the memory subsystem in the FA stage.

7.3.1.1 Common Stages—Fetch Arbitration (FA) StageDuring the FA stage, the address for the next instruction needed by the processor core issent to the memory subsystem. The possible sources of stalls can be grouped into the twofollowing categories:

• External events—Anything that occurs outside the BPU that stops the fetch address from accessing the memory subsystem. For more information see Section 7.2.2.3.1, “Cache Arbiter.”

• Inability to generate a correct address—These conditions are primarily related to an inability to translate the fetch address. For more information see Section 7.2.2.1, “Memory Management Unit (MMU).”

Note that because the fetch bandwidth (up to eight instructions per cycle) is greater than thedispatch bandwidth (up to three instructions per cycle), stalls in the FA stage do notnecessarily cause bubbles in the dispatch stage or in execution unit pipeline stages.

The address that the fetcher sends to the memory subsystem comes from one of threesources (in order of decreasing priority): the mispredict recovery address, the branch targetaddress, or, the next sequential address.

The sequential fetcher is used to generate the next sequential address. This address is usedwhen there is not a taken branch or a mispredicted branch recovery, and is calculated by


taking the current fetch address and adding to it the number of instructions loaded into theDS stage this cycle (times four—because each instruction is four bytes long).

When a branch is dispatched to the BPU, its target EA is calculated and then sent to theMMU for translation. For conditional branches that depend on the CR, while the target EAis being calculated, the CR is also being checked for coherency (on a four-bit fieldgranularity).

If the CR is coherent (that is, no instructions remaining in the pipeline below the branch aregoing to update the CR field to which the branch refers), the branch direction is resolved.If the CR is not coherent, the branch is predicted. The recovery address (the nonpredictedaddress) is saved in the MR stage. Based on the direction of the branch prediction, thetranslated target address (from the MMU) or the translated next-sequential address (fromthe MMU) is sent to the memory subsystem. If a previous branch prediction is determinedto have failed, the translated address is not sent to the memory system, and the currentbranch instruction is removed as part of the mispredict recovery.

After a branch is predicted, the CR is checked for coherency in each subsequent cycle.When the CR becomes coherent, the branch is resolved. If the branch is predicted correctly,the MR stage is cleared and fetching continues from the current fetch address (the currentaddress could be either a sequential fetch address or a target address from a subsequentbranch that does not depend on the CR). If the branch was predicted incorrectly, anyinstructions fetched from the predicted path are purged and the translated mispredictrecovery address is sent to the memory subsystem as discussed in Section 7.3.2.1,“Speculative Execution and Mispredict Recovery Mechanism.”

7.3.1.2 Common Stages—Cache Arbitration (CARB) StageThe commons stages include cache arbitration and cache access stages. During the CARBstage, the instruction fetching mechanism arbitrates for access to the cache. For mostoperations, the CARB stage of the cache is overlapped with one or more other stages (suchas fetch arbitration and cache arbitration occur in one cycle). Note that the CARB andCACC stages may be used by the memory subsystem for cache reload operations and forsome snoop operations. For more details about the CARB stage, including cache accesspriorities, see Section 7.2.1.5, “Memory Subsystem Pipeline Stages.”

7.3.1.3 Common Stages—Cache Access (CACC) Stage During the CACC stage, instructions are read from the cache and loaded into the instructionqueue (DS stage). At most, eight instructions are loaded into the IQ during the CACC stage;the number of instructions fetched depends on the following:

• The type of access—A cache hit (eight instructions maximum), a cache reload (four instructions maximum if the fetch occurs between the second and the fourth data beat, otherwise eight instructions) or a noncacheable fetch (two instructions maximum)


• The alignment of the fetch address

• How full the IQ is—Any instructions that do not fit into the IQ are refetched later. Any instructions being dispatched from the DS stage during this cycle may be replaced with incoming instructions.

7.3.1.4 Common Stages—Dispatch (DS) StageDuring the DS stage, instructions are dispatched to the appropriate execution units.Different instructions take different numbers of cycles for the dispatch stage—this is thefirst stage where different types of instructions are distinguished. In general, branch andinteger instructions are dispatched in zero cycles, while floating-point instructions aredispatched in one cycle.

Instructions in the IQ are considered to be in the dispatch stage, and in some contexts, suchas instruction timing diagrams, it is useful to refer to the different elements of theinstruction queue (IQ0–IQ7) as stages.

7.3.1.4.1 Branch DispatchBranches take zero cycles to dispatch—the branch being dispatched is in the BE stage inthe same cycle in which it is in the DS stage. The following is a list of reasons for a branchto be stalled at the DS stage:

1. There is a dependency on the LR caused by an mtlr instruction preceding the branch instruction in program order. For more information, see Table 7-9.

2. There is a dependency on the CTR caused by an mtctr instruction preceding the branch instruction in program order. For more information see Table 7-10.

3. The link shadow registers are both full and the branch instruction that needs to be dispatched next has the link bit set. For more information, see Section 7.2.1.4, “Branch Processing Unit (BPU).”

4. The branch that is to be dispatched next depends on if the CR and the MR stage is occupied. For more information, see Section 7.3.2.1, “Speculative Execution and Mispredict Recovery Mechanism.

5. The branch is on a nonpurged path. For more information, see Section 7.3.2.1, “Speculative Execution and Mispredict Recovery Mechanism.”

Notice that the BPU is the only unit for which dependency checking is done in the dispatchstage.

7.3.1.4.2 Integer DispatchAs noted in Section 7.2.1.1, “Dispatch Stage Logic,” the integer pipeline is used to ensurethat completion of instruction appears to be in program order. An integer instruction isdispatched only after all instructions in front of it have been dispatched. To enforce thisrequirement, an integer instruction is only dispatched when it is the first instruction in theIQ stage (IQ0). Unless there are stalls in the IU pipeline, dispatch takes zero cycles.Although instructions can stall in the dispatch stage, there are no conditions inherent to thedispatch stage that can cause an integer instruction to stall.


7.3.1.4.3 Floating-Point DispatchFloating-point instructions take a full cycle to dispatch. Floating-point instructions aredispatched from the DS stage into either the FD stage or the F1 stage, depending on whetherFD is stalled on the dispatch cycle. If the F1 stage is full, floating-point instructions cannotbe dispatched (even if the F1 stage is being emptied during this cycle). This is the onlydispatch stall for floating-point instructions.

7.3.1.4.4 Synchronization Tags for the Precise Exception ModelThe 601 supports a precise exceptions model for integer and branch instructions, whileprecise floating-point exceptions mode can be enabled or disabled through MSR[FE0] andMSR[FE1]. This section describes timing considerations when floating-point exceptionsare disabled (MSR[FE0] = MSR[FE1] = 0). Floating-point exceptions enabled mode isdiscussed in Section 7.3.1.4.5, “Dispatch Considerations Related to IU/FPUSynchronization.”

Regardless of whether floating-point precise exceptions are enabled, branch instructionsthat update the CTR or LR and all floating-point instructions generate tags when they aredispatched. These tags follow integer instructions (or bubbles in the IU pipeline when nointeger instruction is available) through the IU pipeline to keep the program orderinformation that is used to synchronize the three pipelines. When either a floating-point ora branch instruction is dispatched, the remaining instructions are collapsed into the spaceleft by the dispatched instructions, as shown in Figure 7-11.

Figure 7-11. Collapsing of Instruction Stream after Out-of-Order Dispatch

All floating-point and branch instructions except unconditional branch instructions that donot update the LR generate a tag that ensures that instructions will write back in an orderlyfashion. The three types of synchronization tags are as follows:

Integer 2

Branch 1 (B1)

Float 1 (F1)

Integer 1

—

—

—

Integer 2

—

—

—

—

IQ3:

IQ2:

IQ1:

IQ0:

State 1 State 2 State 3

Integer 1

—

—

Integer 2

Integer 1;

—

—

Integer 2

ID:

IE:

IC:

TAGF1; TAGB1

Integer 1;TAGF1; TAGB1


• Floating-point tags

• LR tags

• CTR tags

• There is a fourth tag that is used for mispredicted branch recovery called a predicted branch tag; this tag is discussed in Section 7.3.2.1, “Speculative Execution and Mispredict Recovery Mechanism.”

Generally, the synchronization tag is placed with the closest integer instruction precedingit in program order. However, in some situations an instruction is tagged to a bubble in theIU pipeline. For example, this occurs when a tag is already taken—when there are twofloating-point instructions with no intervening integer instruction, the first floating-pointinstruction can use the floating-point tag of the previous integer instruction, but since thattag has been used, the second floating-point instruction must tag to a bubble in the integerpipeline. A tagged bubble must also be created when a floating-point instruction updatesthe CR as described in Section 7.3.1.4.5, “Dispatch Considerations Related to IU/FPUSynchronization.” This also occurs when floating-point precise mode is enabled.

Multiple instructions can be tagged to a bubble in the same way that multiple instructionscan be tagged to an integer instruction. This is shown in Figure 7-12.

Figure 7-12. Instruction Tagging in the IU Pipeline

The actual synchronization between units is performed relative to the integer completionstage (IC). As an integer instruction completes the IC stage, the completion logic checks forany tags.When tags are detected, any dependencies are checked, and writeback is withhelduntil any dependencies related to the tags are resolved. Note that instructions are not forcedto complete in strict program order; the IC stage logic ensures that dependencies (such asthe CR) implied by program order are handled properly.

Float 2 (F2)

Float 1 (F1)

Integer 2

Integer 1

—

Branch 1 (Link) (L1)

Float 2

Integer 2; TagF1

—

—

—

TagF2; TagL1

IQ3:

IQ2:

IQ1:

IQ0:

State 1 State 2 State 3

Integer 1

—

—

Integer 2; TagF1

Integer 1

—

TagF2; TagL1

Integer 2; TagF1

Integer 1

ID:

IE:

IC:


Figure 7-12 illustrates the necessity of creating a bubble in the IU pipeline to provide a tagwhen there are two consecutive floating-point instructions. Note that Float 2 cannot befolded because Integer 2 already has a floating-point tag on it. As a result, the tag for Float2 occupies integer decode (ID) in state 3.

If a floating-point instruction is in the floating-point writeback stage (FW), it is allowed tocomplete; otherwise, the completion logic remembers that a floating-point instruction cancomplete when it arrives in the FW stage. The completion logic can keep track of as manyas three floating-point tags for which no floating-point instruction has written back. In otherwords, the IU can complete instructions past three incomplete floating-point instructionswithout stalling. It stalls when the fourth floating-point tag is in the IC stage. Again, thisassumes the processor is in floating-point instruction exceptions disabled mode, asdescribed in Section 7.3.1.4.5, “Dispatch Considerations Related to IU/FPUSynchronization.” Link and count tags cause the LR and CTR to be updated in the BPU asthe integer instruction clears the IC stage. The BPU is always ready to write backinstructions by the time their tags reach the IC stage, so it can always update these registerssynchronously with the IC stage.

Branch and floating-point instructions are considered to complete when their tags leave theIC stage (floating-point instructions may write back later, and branch instructions may getresolved later). Thus the 601 supports in-order completion, but permits out-of-orderwriteback.

7.3.1.4.5 Dispatch Considerations Related to IU/FPU Synchronization There are some additional dispatch considerations regarding synchronization between theFPU and the IU:

• Floating-point store operations are dispatched to both the FPU (for data fetch) and the IU (for EA calculation). Floating-point store operations may be dispatched to the FPU before they are dispatched to the IU; they can be dispatched to the FPU as soon as the standard floating-point dispatch criteria are met. For more information, see Section 7.3.1.4.3, “Floating-Point Dispatch.”

When a floating-point store is dispatched to the FPU, a float tag is generated and placed on the floating-point store that needs to be dispatched to the IU. From here on, the floating-point store behaves like a standard integer instruction tagged with a floating-point tag. When a floating-point store clears the IC stage, if the FPU is behind, (that is, the store has not arrived at the FW stage), then the store address (physical address—rA) is put into a floating-point store buffer to wait for the floating-point data—the FPSB stage. Only one instruction can reside in the FPSB stage at a time, so if another floating-point store arrives in the IC stage before the first store clears the FW stage, the second one remains in the IC stage until the FPSB stage is available. This is shown in Appendix I, “Instruction Timing Examples.”

• Floating-point instructions that update the CR, complete synchronously with the IC stage. Additionally, they are never dispatched out of order and are always tagged to a bubble in the integer pipeline rather than to an integer instruction. Thus a floating-


point instruction that updates the CR never shares the position in the IU pipeline with an integer instruction. The bubble in the IU pipeline stalls at the IC stage until the instruction reaches the FW stage.

• When precise floating-point exceptions mode is enabled, a floating-point instruction may cause an exception at the FW stage, preventing its completion. Therefore, its tag cannot leave the IC stage until the instruction is in the FW stage. In this mode, floating-point instructions are forced to be dispatched in order and must be tagged to bubbles in the IU pipeline to facilitate this synchronization requirement. Thus in floating-point exceptions enabled mode, floating-point and integer instructions are dispatched strictly in order (although branch instructions can be tagged to the integer instruction or to the bubble in the IU pipeline required by the floating-point instruction).

7.3.2 BPU Pipeline StagesThe BPU has three unique pipeline stages:

• Branch execute (BE)—During the BE stage, the branch target address is calculated and the branch direction is either determined (for nonconditional branches or conditional branches for which the CR is coherent), or predicted (for conditional branches for which the CR is not coherent). The BE stage is one cycle for all branches.

• Mispredict recovery (MR)—During the MR stage, the conditional branches stay until they are resolved. They enter the MR stage and the BE stage at the same time. A resolved conditional branch can leave the MR stage in one cycle, but an unresolved conditional branch stays in the MR stage until it is resolved. A (conditional) branch in the MR stage does not prevent another (unconditional) branch from entering the BE stage.

• Branch writeback (BW)—The BW stage can contain as many as nine branch instructions waiting to write back. They can leave after they complete (their tags leave the IC stage). Only branches that update the LR or the CTR must pass through the BW stage.

7.3.2.1 Speculative Execution and Mispredict Recovery MechanismThe 601 uses a static branch prediction algorithm for predicting unresolved conditionalbranches. The prediction algorithm is shown in Table 7-8. Notice that if the information isavailable, the branch instruction should be coded such that the prediction is correct most ofthe time.


When a branch is predicted, (recall that the CR is not coherent on the cycle in which thebranch is in BE), the address of the nonpredicted path is held in the MR stage along withthe information required to determine whether the prediction is correct. If the branch isresolved as correctly predicted, the MR stage is cleared and no recovery is performed.

If the branch is not resolved as predicted, the mispredict recovery address at MR is used asthe fetch address and any instructions from the nonpredicted path are purged from the IQ.Note that the MR and BW stages are different and that a branch does not necessarily leavethe MR stage before it leaves the BW stage. Also note that only branches that areconditional on the CR need to pass through the MR stage; unconditional branches andbranch-and-decrement branches that do not need the CR need not pass through the MRstage and can execute even when the MR stage is busy.

When a branch is resolved as predicted, there are generally instructions behind it inprogram order already in the DS stage. Because the processor could not determine whetherthose instructions are needed before the branch is resolved, it waits until the targetinstructions are loaded into the IQ before writing the instructions from the predicted pathover those from the nonpredicted path. This is referred to as a delayed purge. If the branchis not resolved as predicted and resolution occurs before the target instructions are writteninto the IQ, the sequential path is kept and the target instructions are not loaded into the IQ.While the nonpredicted path is in the IQ, it is referred to as the nonpurged path. Examplesshowing how these instructions are handled are provided in Appendix I, “InstructionTiming Examples.”

The speculative instructions in the pipeline are marked with the predicted branch tag, whichmarks the position of the predicted branch in program order in the integer pipeline. Noinstructions can be dispatched onto a predicted branch tag (although a predicted branch tagcan be placed on top of other tags). Also, instructions cannot be dispatched onto conditionalbranch instructions that are going to generate a predicted branch tag (that is, any branchesthat are conditional on the CR). Speculative execution is supported for the FPU and theBPU in the 601. The predicted branch tag is different from other tags in that it is not allowedto progress beyond the IE stage; thus speculative integer instructions are not allowed pastthe ID stage (they cannot go past the predicted branch tag in the IE stage).

Table 7-8. Branch Prediction

InstructionBO[y]

0 1

bc (forward) Not-taken Taken

bc (backward) Taken Not-taken

bclr Not-taken Taken

bcctr Not-taken Taken


The predicted branch tag is used for mispredict recovery to determine which instructions topurge. Any instructions behind the predicted branch tag in program order are speculativeand therefore purged. This tag is also used to mark the division between nonspeculativeinstructions and the nonpurged path when a delayed purge occurs. When a branch isresolved as predicted, the predicted branch tag is deleted. Note that there can only be oneoutstanding predicted branch and therefore only one predicted branch tag at a time. This isshown in Appendix I, Section I.1.1.6, “Conditional Branch Timing.”

7.3.2.2 Branch Pipeline TimingTable 7-9 through Table 7-12 show the timings of the four different types of branchinstructions executed by the BPU. These tables describe the ideal pipeline timing and theprocessor resources used. Resources are used either exclusively, (that is, no otherinstruction can access the resource at the same time—a write lock), or nonexclusively, (thatis, other instructions can access the resource at the same time—a read lock).

Note that for these tables, all memory accesses are assumed to hit in the cache. The numberof cycles for a memory access is larger for a cache miss. For more information, seeSection 7.2.2.3.3, “Cache Miss Timing.” Timing examples for multiple instructionsequences involving branches are given in Appendix I, Section I.1, “Branch InstructionTiming Examples.”

Table 7-9 shows the timing for the b, ba, bl, and bla instructions. Note that branchwriteback is delayed until the branch’s tag completes the IU pipeline. For branches with thelink bit set (LK = 1), the new link value is stored in a link shadow register and is written tothe architected LR in the BW stage. The branch-and-link instruction use one of the two linkshadow registers from the cycle it is dispatched until the cycle after the LR is updated(BW).


Table 7-10 shows the instruction timing for the bc, bcl, bca, and bcla instructions. Forbranches that are resolved k cycles after they are executed, k may be 0, in which case theCR is accessed in the DS/BE/MR cycle—this corresponds to the case when the conditionupon which the branch depends is already resolved.

Branch writeback is delayed until the branch tag completes (n cycles). Note that onlybranches that update either the CTR or LR need to pass through the BW stage. If n is lessthan k, the branch remains in the MR stage but leaves the BW stage.

a. Branch writeback is delayed n cycles until the branch’s tag completes. Note that only branches with LK=1 need to pass through the BW stage.

b. This is the FA stage for the instructions on the taken path for taken branches.

c. For branches with the link bit set (LK = 1), the new link value is stored in a link shadow register and is written to the architected LR in the BW stage. The branch-and-link instruction use one of the two link shadow registers from the cycle it is dispatched until the cycle after the LR is updated (BW).

Table 7-9. Branch Instruction Timing—b, ba, bl, bla

Number of Cycles 1 1 1 na

Pipeline stages FA FAb

CARB CARB

CACC

DS

BE

BW

Resources required nonexclusively

Resources required exclusively Link shadow registerc

Link shadow registerc.; LRc.

a. For branches that are resolved k cycles after they are executed. k may be 0 in which case the architected CR is accessed in the DS/BE/MR cycle—this corresponds to the case when the condition upon which the branch depends is already resolved.

b. Branch writeback is delayed until the branch tag completes (n cycles). Note that only branches that update either the CTR or the LR need to pass through the BW stage. If n < k, the branch remains in the MR stage but leaves the BW stage.

c. This is the FA stage for the target instructions of the branch (only for taken branches).

d. For branches with BO field specifying ‘decrement count and branch.’

e. Conditional branches access the architected CR on the last cycle of the MR stage; this access is what ends the MR stage.

f. For branches with the link bit set (LK=1), the new link value is stored in a link shadow register and is written to the architected LR in the BW stage. The branch-and-link instruction will use one of the two link shadow registers from the cycle it is dispatched until the cycle after the LR is updated (BW).

Table 7-10. Branch Instruction Timing—bc, bcl, bca, bcla

Number of Cycles 1 1 1 ka n-kb

Pipeline stages FA FAc

CARB CARB

CACC

DS

BE

MR MR

BW BW

Resources required nonexclusively CTRd CRe

Resources required exclusively Link shadow registerf

Link shadow registerf.

LRf., CTRd.

Table 7-11 shows the instruction timing for the bclr and bclrl instructions.

a. For branches that are resolved k cycles after they are executed. k may be 0 in which case the CR is accessed in the DS/BE/MR cycle—this corresponds to the case when the condition upon which the branch depends is already resolved.

b. Branch writeback is delayed n cycles—until the branch tag completes. Note that only branches that update either the CTR or the LR need to pass through the BW stage. If n < k, then the branch remains in the MR stage but leaves the BW stage.


d. For branches with BO field specifying ‘decrement count and branch.’

e. The bclr instruction needs the most recent LR (look-ahead state) and can access the link shadow registers.

f. Conditional branches access the architected CR on the last cycle of the MR stage; this access is what ends the MR stage.

g. For branches with the link bit set (LK = 1), the new link value is stored in a link shadow register and is written to the architected LR in the BW stage. The branch-and-link instruction will use one of the two link shadow registers from the cycle it is dispatched until the cycle after the LR is updated (BW).

Table 7-12 shows the instruction timing for the bcctr and bcctrl instructions.

Table 7-11. Branch Instruction Timing—bclr, bclrl



CARB CARB

CACC

D

BE

MR MR

BW BW

Resources required nonexclusively CTRd, LRe CRf

Resources required exclusively Link shadow registerg

Link shadow registerg.

LRg., CTRd.

7.3.3 Integer Pipeline StagesThis section discusses the IU pipeline. The objective is to explain how each instructionflows through the IU pipeline. A block diagram of the IU pipeline is shown in Figure 7-13.

a. For branches that are resolved k cycles after they are executed. k may be 0 in which case the architected CR is accessed in the DS/BE/MR cycle—this corresponds to the case when the condition upon which the branch depends is already resolved.

b. Branch writeback is delayed n cycles—until the branch tag completes. Note that only branches that update either the CTR or the LR need to pass through the BW stage. If n < k, then the branch remains in the MR stage but leaves the BW stage.


d. The bcctr instruction needs the most recent CTR (look-ahead state) and can access the results of previous ‘decrement count branches’ before they writeback (leave BW).

e. For branches with the link bit set (LK = 1), the new link value is stored in a link shadow register and is written to the architected LR in the BW stage. The branch-and-link instruction will use one of the two link shadow registers from the cycle it is dispatched until the cycle after the LR is updated (BW).

Table 7-12. Branch Instruction Timing—bcctr, bcctrl



CARB CARB

CACC

D

BE

MR MR

BW BW

Resources required nonexclusively CTRd CR

Resources required exclusively Link shadow registere

Link shadow registere.

LRe.


Figure 7-13. IU Pipeline Showing Data, Instruction, and Tag Flow

Each block in the diagram represents a different stage (except for the box labeled “GPRs”,which represents the IU general purpose registers). There are three different flows shownin the diagram—the instruction flow; the tag flow, and the data flow.

In most cases, it takes one cycle to go from one stage to another, as shown by the markedcycle boundaries in the diagram. However, an instruction that wishes to access the cache(load or store) resides in the CARB stage (cache arbitration) on the same cycle it is in IE(integer execute). Also, instructions that are in writeback can provide data to the ID/IEboundary through the GPRs in the same cycle. Each instruction uses a subset of the stages

GPRs

ID

IE

ISB FPSB

CARB

IWLIWAIC

From Floating-Point Unit

Feed-Forwarding

= Instruction Flow

= Data Flow

= Tag Flow

CACC

CycleBoundary


identified, although the subset varies from instruction to instruction. Likewise, theoperations performed in each stage may vary from one instruction to another.

All instructions executed by the IU are dispatched to integer decode (ID), which is identicalto IQ0 unless there is a stall in the IU pipeline, in which case the next integer instructionmay fall into IQ0 without entering ID. ID is the first IU stage for all IU instructions.

From ID all instructions go to integer execute (IE) where, for most instructions, most of theinstruction’s function is carried out (e.g. add instructions add two numbers together, etc...).All IU instructions spend one or more cycles in this stage. Instructions that use the cacheor bus interface unit (BIU) arbitrate for cache access while the instruction is in IE. Theblock marked CARB is the cache arbitration block.

Where an instruction goes from IE depends on the instruction type. Arithmetic and logicalinstructions go to integer writeback for ALU operations (IWA). Integer instructions thataccess the cache simultaneously enter the CARB and IE stages and then go to the integerstore buffer (ISB) and cache access (CACC) stages. Note that while the instruction is in theCACC stage, it must also remain in the ISB stage simultaneously in case the instruction inCACC is unable to successfully access the cache because of a cache retry. If that is the case,ISB holds the original request so that it can participate in the arbitration process onsubsequent cycles, and eventually the instruction accesses the cache successfully.

Floating-point store operations are unique in that the store address is provided by the IUand the store data is provided by the FPU. The FPSB holds the store request and addresswhile the data is being generated in the FPU. When the data is available, the instruction inthe FPSB participates in the arbitration. Link integer instructions, floating-point storeinstructions in the CACC stage also reside in the FPSB stage in case of a cache retry.

The IC stage is entered the first cycle after an instruction leaves IE in parallel with theinstruction moving into some other stage (CACC for load/stores; IWA for arithmeticoperations). When an instruction moves into IC, it indicates that the instruction iscommitted, even though that instruction’s results may not have been written back to theappropriate register (or cache) yet. Tags flowing through IC represent instructioncompletion for instructions that are executed in the BPU or the FPU.

Load operations have one more stage to enter beyond CACC. Integer load operations moveto integer writeback for load operations (IWL) the cycle after a successful cache access, andfloating-point load operations move to floating-point writeback for load operations (FWL)on the cycle after a successful cache access. From these stages the registers are written withdata read from the cache/memory subsystem.

7.3.3.1 Integer Pipeline—Integer Decode (ID) StageThe Integer Unit (IU) pipeline receives instructions from the fetcher/dispatch unit into theID stage. The instructions are decoded and data is read from the register file. All IUinstructions take one cycle in the IU decode stage. Instructions may be held in the decodestage by the 601 control logic for two reasons.


• There is an instruction in the IU execute stage (IE) and it is being held there for some reason, such as when additional cycles are required to perform a calculation or gain access to a resource.

• The instruction in decode is trying to read the CR or XER that is being written by the instruction in execute.

Note that instructions are not held in decode for most register hazards. There is a forwardpath, shown in Figure 7-13, that brings ALU results to decode (bypassing the GPR filewhen the instruction in decode wants to use the results of the instruction in execute).

7.3.3.2 Integer Pipeline—Integer Execute (IE) StageThe IU execute stage receives the instruction and its associated register data from decode.The following is a list of the major functions performed in the IU execute (IE) stage.

• The IE stage is where data manipulation occurs for all of the ALU instructions—arithmetic, logical, CR logical, shifts, and rotates.

• Many of the SPRs are written and read during the IE stage.

• Instructions that require translation are translated by the MMU when they are in the IE stage.

• Instructions that need to access the cache send a request from IE to the cache arbiter. That is, such an instruction enters the CARB stage simultaneously with the last cycle of the IE stage.

• There are several reasons an instruction may be held (stall) in IE. These reasons include:

— The instruction needs to access the cache and the cache is busy.

— The instruction in IWA is being held (stalled)—perhaps for pipeline synchronization—so the instruction in IE cannot proceed.

— The IE instruction may have a GPR dependency with an outstanding load.

— Floating-point load operations may have an outstanding floating-point ALU instruction.

When an instruction is stalled in IE, instructions in ID also stall.

7.3.3.2.1 IE Stage—ALU Instruction OperationMost ALU instructions spend one cycle in IE. Data is taken from latches that separate IDfrom IE and is manipulated as required by the instruction. As shown in Figure 7-13, theresults are made available both to the IWA stage and to the latch between the ID and IEstages for use by the next instruction for use in the next cycle.

Multiply and divide instructions take more than one cycle in IE, which reduces thethroughput accordingly. For example, the mulli instruction always takes five cycles in IE,as shown in Table 7-16, and each of the divide instructions always take 36 cycles in IE. Thenumber of cycles in IE taken by mul, mullw, mulhw, and mulhwu depends on the dataspecified in rB, as shown in Table 7-15. If bits 0–16 of the rB operand are sign bits

(positive: rB < (215 – 1 or negative: rB ≤ –215), the instruction spends five cycles in IE.Otherwise, the instructions spends nine cycles in IE. For the mulhwu instruction only, ifthe most significant bit of rB is a 1, the instruction takes 10 cycles in IE.

Many IU instructions have certain side effects defined as part of their operation. These sideeffects include the update of the CR field 0, XER[CA], XER[SO], and XER[OV], whichoccurs on the last cycle that the instruction is in IE.

7.3.3.2.2 IE Stage—Basic Load and Store Instruction OperationLoad operations and store operations perform different logic functions in IE than ALUoperations. The EA of the load or store is generated in IE and EA is passed to the MMU fortranslation. The translated address and a set of control bits are passed to the cache arbiterto request cache access. The control bits include information that identifies the operation asa load, store, or cache operation; it indicates the size of the operand; and identifies the targetregister for load operations.

Accesses to the memory subsystem are described in Section 7.3.3.3, “Integer Operationsthat Access the Memory Subsystem.”

Load operations and store operations that are aligned on an addressing boundary at least aslarge as the size of the operand can do all of this work in one IE cycle. For example, an lwinstruction with an EA of x'00004C34' is properly aligned and completes in one IE cycle.On the 601, improperly aligned load operations and store operations can be handled in onecycle only if the operand does not cross a double-word addressing boundary; otherwise anadditional cycle is required for the additional access.

The IU performs the EA generation, translates the address, and makes cache requests onbehalf of the FPU for floating-point load operations and store operations. Floating-pointload operations are only dispatched to the IU. Floating-point store operations aresimultaneously dispatched to both the FPU and the IU. The FPU provides the store data;however, it may not be available until after the cache request is made by the IU. Therefore,there is a floating-point store buffer (FPSB) that holds the address and request informationuntil the floating-point store data is available. This buffer allows the IU pipeline to continueexecuting instructions after a floating-point store.

7.3.3.3 Integer Operations that Access the Memory SubsystemSome operations in the integer pipeline require access to the memory subsystem. As shownin Figure 7-13, such operations enter the CARB stage during the same cycle that they arein the IE stage. The instruction continues to occupy the IE stage until it can either enter theCACC stage or one of the store buffer stages (ISB or FPSB).

7.3.3.3.1 Integer Pipeline—CARB Stage If the instruction in IE wins the arbitration, the cache access occurs on the next cycle;otherwise, the request is saved in a set of retry latches inside the data access queueing unit,shown as the integer store buffer (ISB) and floating-point store buffer (FPSB) inFigure 7-13. Examples of when the store buffers are required are shown in the multiple


instruction timing examples in Appendix I, “Instruction Timing Examples.” A request inthe store buffers is arbitrated as soon as it is the highest priority request. The requestinginstruction is allowed to move out of IE because the request is either granted access orqueued by the arbiter. However, each store buffer is only one element deep. Therefore, ifthere is an IU request in the ISB stage, the instruction in IE cannot arbitrate for cacheaccess. In other words, the instruction is held (stalled) in execute until the ISB stage isempty.

7.3.3.3.2 Integer and Floating-Point Store Buffer Stages (ISB and FPSB)When an integer instruction enters the CACC stage after the IE stage is completed, it alsooccupies the appropriate store buffer. This provides a place for the instruction if a cacheretry is necessary. The instruction remains in the buffer until the CACC stage is complete.

7.3.3.3.3 Address TranslationA TLB miss occurs if the translation for an instruction is not in the BAT registers or thetranslation lookaside buffer (TLB). A TLB miss causes the cache request to be aborted andstarts a state machine that performs a table search operation. A page fault occurs if the pagetable in memory does not contain the page requested. If the TLB reload is successful, therequest goes through arbitration for the cache (the CARB stage). It then proceeds like anormal load/store operation to the CACC and IWL stages.

7.3.3.3.4 Unaligned Load/Store OperationsUnaligned load operations and store operations and load/store string/multiple instructionsmay spend more than one cycle in IE. Load operations and store operations with unalignedoperands may access two different aligned double words of data in cache or memory.However, the 601 can only handle data contained within one double word when servicinga load or store. Therefore, the IU splits load operations and store operations that cross adouble word addressing boundary into two pieces. The first piece (called a splice 1 request)accesses data in the first double word. The second piece (called a splice 2 request) accessesdata within the second double word. Logic in the IE stage detects these double-wordboundary crossings and splits the request into two pieces: the splice 1 and the splice 2. Todo this the instruction is held in IE one extra cycle—for a total of two cycles in IE. For loadoperations, the data from the two requests are spliced together before the data is written intothe register file. For store operations the two requests are serviced by the memorysubsystem completely independently of each other. For both load operations and storeoperations, unaligned support requires two cache access (CACC) cycles (one each for thesplice 1 and the splice 2 requests). If the load operations/store operations are to a cacheinhibited or write-through page, the splice 1 and splice 2 each require a separate busarbitration and data transfer.

As a result, load/store accesses that cross a double-word addressing boundary provide halfthe throughput of aligned access. Load/store accesses that cross a 4-Kbyte page boundaryor 256-Mbyte segment boundary may take an alignment exception. For more details on thealignment exceptions see Chapter 5, “Exceptions.”


7.3.3.3.5 Load/Store String/Multiple Operations Load/store string/multiple instructions typically transfer several words of data. On the 601,these instructions split the transfer up into word-length pieces of data. Each word of datacorresponds to a register to be read from (store operations) or written to (load operations).These instructions make one cache request for every cycle that they are in IE until enoughrequests are made to transfer all of the data specified by the instruction. For example, lmw28,0(r1) transfers four words from memory addressed by r1 to register r28, r29, r30, andr31. A request is made for a word of data for r28 during the first cycle in IE; for r29 duringthe second cycle in IE; for r30 during the third cycle in IE; and for r31 during the fourthcycle in IE. The cache accesses occur in consecutive cycles (one cycle delayed from therequests made from IE) as long as the load hits in the cache and there are no higher priorityrequests made to the cache arbiter.

Note that in future implementations these instructions are likely to have greater latency andtake longer to execute, perhaps much longer, than a sequence of individual load or storeinstructions.

Since string operations specify the transfer size in bytes, the last request generated by astring is for one, two, three, or four bytes of data.

Load/store string/multiple operations are subject to the same behavior for unaligned data asnormal load operations and store operations. As previously discussed, cache requests aremade for word size pieces of data (one for each register source/target register specified bythe instruction). If the word-length pieces of data cross a double-word addressing boundary,the access for that word is broken into two pieces (splice 1 and splice 2). In the limit, aload/store string/multiple operation that is not aligned on a word boundary takes 50%longer than the same instruction aligned on a word boundary (ignoring alignmentexceptions at 4-Kbyte page crossings).

7.3.3.3.6 Integer Pipeline—Cache Access (CACC) StageFor load, store, and cache operations, there is a cache access stage immediately followingthe IE stage. That is, store data is written into the cache and load data is read from the cacheon this cycle. If it is a cache miss, the request is sent to the bus interface unit. On a cachehit, load data from the cache is forwarded to the latches above the execute (IE) stage. All ofthe load data modification necessary before writing the GPRs is done on this cycle beforethe data is latched up. (Load data modifications include: sign extension, zero extension, andbyte reversal.) On the next cycle, the data is available for the instruction in the IE stage tobegin execution, and the data is written into the GPRs in the IWL stage.

A sequence of load operations and store operations may be processed by the cache, one oneach consecutive cycle if they all hit in the cache. There are other system activities that mayrequire the cache on a given cycle; in which case, the load or store from IU would stall ifthe ISB stage is already full. These activities would include cast out operations resultingfrom a bus snoop hit in the cache or reload activity due to a previous cache miss.


7.3.3.4 Integer Pipeline—Integer Writeback Stages (IWA and IWL)The IU writeback stage is different for ALU operations than for load operations. Storeoperations (except for store with update operations) do not have a writeback stage. ALUoperations have a writeback stage (IWA) on the cycle immediately following IE stage. TheALU writeback stage has a dedicated write port into the GPR register file. Load operationshave a writeback stage (IWL) on the cycle that follows the cache access. The IWL also hasa dedicated write port into the register file. These two writeback stages are independent ofeach other and may have different instructions on the same cycle. Both write ports can beused simultaneously—with the exception that they cannot both write the same register onthe same cycle.

Update-form load operations and store operations write the EA into the addressing register(rA) during the ALU writeback stage (IWA) in parallel with the cache access.

Some instructions are executed with assistance from the sequencer. These instructionstypically spend two cycles in execute and N cycles in writeback. (N varies from 1 to 20cycles depending on the instruction.)

All integer instructions pass through the IWA stage, even though some of these instructions(such as cmp and store instructions) write to the GPRs.

7.3.3.5 Integer Pipeline Instruction TimingsThis section describes the flow of an integer instruction through the integer pipelineignoring any interaction with other instructions. In the real world there are datadependencies and pipeline synchronization issues that must be considered. Such situationsare shown in Appendix I, “Instruction Timing Examples.”

7.3.3.5.1 Arithmetic InstructionsThe timing of the add and subtract instructions is described in Table 7-13. This timing isappropriate for the following instructions: addi, addis, add, addic, addic., subfic, subf,addc, subfc, adde, subfe, addme, subfme, addze, subfze, neg, abs, nabs, doz, and dozi.

The instruction spends one cycle in IE. The status bits (XER[CA], XER[OV SO], and CR0)are all written from IE. Resource usage of the various types of instructions is summarizedin Table 7-14.

Table 7-13. Integer Addition Instruction Timing


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively rA, rB, CA

Resources required exclusively CA, SO, OV, CR0, rD


The multiply instructions are shown in Table 7-15 (mul, mullw, mulhw, mulhwu) andTable 7-16 (mulli). These instructions take 5, 9, or 10 cycles in IE as described inSection 7.3.3.2.1, “IE Stage—ALU Instruction Operation.” All of the multiply instructionsin the 601 use the MQ register, which is undefined after an mulhw or mulhwu instruction.

Table 7-15 shows the timing for the mul, mullw, mulhw, and mulhwu instructions.

Table 7-14. Resource Usage for Integer Addition Instructions

Instruction

Nonexclusive Resource Usage Exclusive Resource Usage

Read CA Read rA Read rB Update CA Update CR0Update SO,

OV

add No If !(rA = 0) Yes No If (RC = 1) If (OE = 1)

addc No Yes Yes Yes If (RC = 1) If (OE = 1)

adde Yes Yes Yes Yes If (RC = 1) If (OE = 1)

addi No If !(rA = 0) No No No No

addic No Yes No Yes No No

addic. No Yes No Yes Yes No

addis No If !(rA = 0) No No No No

addme Yes Yes No Yes If (RC = 1) If (OE = 1)

addze No Yes Yes Yes If (RC = 1) If (OE = 1)

neg No Yes No No If (RC = 1) If (OE = 1)

subf No Yes Yes No If (RC = 1) If (OE = 1)

subfc No Yes Yes Yes If (RC = 1) If (OE = 1)

subfe Yes Yes Yes Yes If (RC = 1) If (OE = 1)

subfic No Yes No Yes No No

subfme Yes Yes No Yes If (RC = 1) If (OE = 1)

subfze No Yes Yes Yes If (RC = 1) If (OE = 1)


As shown in Table 7-16, the mulli instruction takes only five cycles in IE.

Table 7-17 shows the timing for divide instructions (div, divs, divw, and divwu). Eachdivide instructions takes 36 cycles in the IE stage. They all use the MQ register. PowerPCdivide instructions do not return the remainder and the contents of the MQ after theinstruction completes is undefined. The 601 handles the MQ for POWER divideinstructions as defined in the POWER architecture. These instructions are described inChapter 10, “Instruction Set.”

a. Number of cycles is data dependent.

b. Update CR0 if (RC=1), updates SO and OV if (OE=1).

Table 7-15. Multiply Instruction Timing (mul, mullw, mulhw, mulhwu)

Number of Cycles 1 5/9/10a 1

Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively rA, rB

Resources required exclusively rD, MQ, CR0, SO,OVb

Table 7-16. Multiply Instruction Timing (mulli)


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively rA

Resources required exclusively rD, MQ


Compare instructions spend one cycle in ID and one cycle in IE, but essentially have noIWA stage. The compare results are written to the CR and forwarded to the BPU (forconditional branch evaluation) in the middle of the IE cycle. The timing for the compareinstructions is shown in Table 7-18.

The flow of the trap instructions (tw and twi) depends on whether they cause a programexception. If they do not cause a program exception, one cycle is required in IWA. If theycause a program exception, 22 cycles are required in the IWA stage. It is important to notethat this performance penalty occurs only when the trap condition is met. Typically, this isa rare condition and performance is not critical.

a. rB is used only by the cmp and cmpl instructions.

b. Compare results are forwarded to the BPU for branch resolution.

Table 7-17. Divide Instruction Timings (div, divs, divw, divwu)


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively rA, rB, MQ,

Resources required exclusively rD, MQ, CR0, SO OV

Table 7-18. Integer Compare Instruction Timing (cmp, cmpi, cmpl, cmpli)


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively rA, rBa

Resources required exclusively CR[BF]b


7.3.3.5.2 Boolean Logic Instruction TimingsBoolean logic instructions include the standard Boolean logic functions, sign extension ofa byte within a register, sign extension of a halfword within the register, and a count leadingzeros in a register instruction. These instructions include the following: andi., andis., ori,oris, xori, xoris, and, andc, eqv, nand, nor, or, orc, xor, extsb, extsh, and cntlzw. Theseinstructions flow through the integer pipeline in the same manner, spending a single cyclein IE and completing in IWA. Timing for Boolean instructions is shown in Table 7-20.

Table 7-21 lists the resources required by the Boolean logic instructions.

a. One cycle is required if the trap is not taken; 22 cycles are required if it is taken.

b. Only the tw instruction uses rB.

a. Table 7-21 summarizes resources used by individual instructions.

Table 7-19. Trap Instruction Timing (tw, twi)

Number of Cycles 1 1 1 or 22a

Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively rA, rBb

Resources required exclusively

Table 7-20. Boolean Logic Instruction Timing


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively rS, rB

Resources required exclusively rA, CR0a


7.3.3.5.3 Rotate, Shift, and Mask Instruction TimingsThe 601 supports rotate, shift and mask instructions of both the POWER and PowerPCarchitectures. These instructions each spend one cycle in each of the ID, IE, and IWAstages. However, there is a wide variation in the resources that these instructions read to andwrite from.

Table 7-22 shows the timing for rotate, mask, and shift instructions, which include thefollowing: rlimi, rlmi, rlinm, rlnm, rrib, sl, sr, slq, srq, sliq, sriq, slliq, srliq, sllq, srlq,sle, sre, sleq, sreq, srai, sra, sraiq, sraq, srea, maskg, and maskir.

a. Table 7-23 summarizes the resources required by individual instructions.

Table 7-21. Resource Usage for Boolean Logic Instructions

InstructionNonexclusive Resource Usage Exclusive Resource Usage

Read rA Read rB Update CR0

andi., andis. Yes No Yes

ori, oris, xori, xoris Yes No No

and, andc, eqv, nand, nor, or, orc, xor

Yes Yes If (RC=1)

extsb, extsh, cntlzw Yes No If (RC=1)

Table 7-22. Rotate, Mask, and Shift Instruction Timing


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusivelya rA, rB, rS, CA, MQ

Resources required exclusivelya. rA, MQ, CR0


Table 7-23 summarizes resources required by each instruction and indicates which are usedexclusively (writes) and which are used nonexclusively (reads).

Table 7-23. Resource Usage for Rotate, Shift, and Mask Instructions

InstructionNonexclusive Resource Usage Exclusive Resource Usage

Read rB Read MQ Update CA Update MQ Update CR0

maskg Yes No No No If (RC=1)

maskir Yes No No No If (RC=1)

rlimi No No No No If (RC=1)

rlmi Yes No No No If (RC=1)

rlinm No No No No If (RC=1)

rlnm Yes No No No If (RC=1)

rrib Yes No No No If (RC=1)

sl Yes No No No If (RC=1)

sr Yes No No No If (RC=1)

slq Yes No No Yes If (RC=1)

srq Yes No No Yes If (RC=1)

sliq No No No Yes If (RC=1)

sriq No No No Yes If (RC=1)

slliq No Yes No Yes If (RC=1)

srliq No Yes No Yes If (RC=1)

sllq Yes Yes No No If (RC=1)

srlq Yes Yes No No If (RC=1)

sle Yes No No Yes If (RC=1)

sre Yes No No Yes If (RC=1)

sleq Yes Yes No Yes If (RC=1)

sreq Yes Yes No Yes If (RC=1)

srai No No Yes No If (RC=1)

sra Yes No Yes No If (RC=1)

sraiq No No Yes Yes If (RC=1)

sraq Yes No Yes Yes If (RC=1)

srea Yes No Yes Yes If (RC=1)


7.3.3.5.4 Condition Register (CR) Instruction TimingsAlthough the CR can be modified in many different ways, it is most often altered when theRC bit on an instruction encoding is set. Setting RC does not increase the instructionexecution time for integer instructions; however, for floating-point instructions, setting theRC bit requires that the IU and FPU pipelines be synchronized. This section shows timingdiagrams for IU instructions that read or write the CR.

Integer ALU instructions that have the RC bit set write the CR from IE and there is noadditional latency. See Table 7-13.

As shown in Table 7-24, each of the eight CR logical instructions (crand, cror, crxor,crnand, crnor, crandc, creqv, and crorc) flow through the IU pipeline in the same way.

The mtcrf instruction, shown in Table 7-25, copies the contents of a GPR into the CR on afield-by-field basis. Although the PowerPC architecture allows performance to degrade ifmore than one field is being written, the 601 has the same optimal performance regardlessof the field mask specified in the instruction. The CR is written during the IE stage.

a. Any combination of fields can be written as specified by the mask field in the instruction.

Table 7-24. CR Logical Instruction Timing


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively CR[BA], CR[BB]

Resources required exclusively CR[BT]

Table 7-25. mtcrf Instruction Timing


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively rA

Resources required exclusively CR0–CR7a


The mfcr instruction, shown in Table 7-26, reads the CR from IE and copies that value intothe specified GPR during IWA.

The mcrf instruction, shown in Table 7-27, copies the contents of one specified field of theCR into another. This instruction is unusual in that it reads the CR from the decode stage.The timing for this instruction is the same regardless of which fields are specified.

The mcrxr instruction, shown in Table 7-28, copies bits 0–3 from the XER to the specifiedfield in the CR. This instruction reads the XER in the ID stage and writes to the CR in theIE stage. The performance is the same regardless of which bit field is specified.

Table 7-26. mfcr Instruction Timing


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively CR

Resources required exclusively rD

Table 7-27. mcrf Timing


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively CR[BFA]

Resources required exclusively CR[BF]

Table 7-28. mcrxr Instruction Timing


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively XER[0–3]

Resources required exclusively CR[BF]


7.3.3.5.5 Move to SPR (mtspr) Instruction TimingsThe mtspr and mfspr instructions flow differently through the IU pipeline depending onwhich SPR is accessed. Timing of the mtspr instruction through the IU pipeline dependson the SPR being accessed.

• SPR accesses that require one cycle per stage—MQ, XER, PIR, DABR, EAR, RTCL, DEC, LR, and CTR (shown in Table 7-29) are both read from and written in the IE cycle.

The SDR1 register requires one cycle per stage, but is written from IE and can be read from the IWA stage (see Table 7-29).

• SPR accesses that require two cycles in the IE stage and one cycle in each of the other stages—This group includes accesses to SPRGn, DSISR, DAR, RTCU, SRR0, SRR1. These accesses require two cycles in IE and one cycle in IWA. Note that if the next instruction is another of these instructions, it requires two additional cycles in IE.

Table 7-29. mtspr Instruction Timing (One Cycle per Stage)


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively rS

Resources required exclusively SPR

Table 7-30. mtspr Instruction Timing (SDR1)


Pipeline stages ID

IE

IC

IWA


Resources required exclusively SPR


Timing for this access is shown in Table 7-31.

• Three of the 601’s hardware implementation-dependent registers (HID0, HID1, and HID2)—These SPRs take 11, 12, and 17 cycles in writeback respectively. Timing for these instruction is shown in Table 7-32. Writes to these SPRs are context-synchronizing, which causes the instructions in the IQ to be purged and restarts fetching under the new context. This causes a minimum three-cycle penalty before the next instruction can be executed (in IE).

7.3.3.5.6 Move to MSR (mtmsr) InstructionThe mtmsr instruction flows through the IU pipeline similarly to the mtspr for HID0,HID1, and HID2 (shown in Table 7-31), except that it is context-synchronizing. It spendstwo cycles in execute and either 17 or 20 cycles in IWA. The instruction may cause animmediate exception if an exception condition is present and the enable is turned from offto on. If it does not cause an exception, it takes 17 cycles in IWA if the FPSCR[FEX] bit iscleared; otherwise it takes 20 cycles in IWA.

Because mtmsr instruction is context-synchronizing, it is held in ID until IE and IWA areempty, and instructions in the IQ are purged. After the mtmsr instruction completes,subsequent instructions are refetched, which causes at least a three-cycle stall before the

Table 7-31. mtspr Instruction Timing (Two Cycles in the IE Stage and One Cycle in the Writeback Stage)


Pipeline stages ID

IE

IC

IWA


Resources required exclusively SPR SPR

Table 7-32. mtspr Instruction Timing (Two Cycles in the IE Stage and Multiple Cycles in the Writeback Stage)—(HID0, HID1, and HID2)

Number of Cycles 1 2 n

Pipeline stages ID

IE

IC

IWA


Resources required exclusively SPR SPR


subsequent instructions can enter the ID stage. Therefore, it is the only instruction in themachine at the time. Timing for this instruction is shown in Table 7-33.

In Table 7-33, n is 17 cycles if the FPSCR[FEX] bit is clear, and is 20 if the FPSCR[FEX]bit is set.

7.3.3.5.7 Move to Segment Register InstructionsThe mtsr and mtsrin instructions spend one cycle in the IE stage and write the registerfrom the IWA stage. An mtsr instruction in the IWA stage writes the register during the firsthalf of the cycle. Timing for these instruction are shown in Table 7-34.

a. The mtmsr instruction remains in the decode stage (ID) until all previous instructions in program order have completed processing. This latency is variable for this reason.

b. Because the mtspr instruction is context-synchronizing, there are no other instructions in the pipeline, so all resources are not considered to be required exclusively or nonexclusively in the conventional sense.

Table 7-33. mtmsr Instruction Timing

Number of Cycles na 2 n

Pipeline stages ID

IE

IC

IWA

Resources requiredb rS, SPR SPR

Table 7-34. mtsr and mtsrin Instruction Timing


Pipeline stages ID

IE

IC

IWA


Resources required exclusively SR


7.3.3.5.8 Move from Special Purpose Register (mfspr)The timing of the mfspr instruction depends on the SPR being accessed, which are asfollows:

• Some SPRs take one cycle in IE and one cycle in IWA. These include the following registers—MQ, XER, PIR, DABR, EAR, RTCL, DEC, LR, AND CTR. The SPR is read from IE and the GPR is written during IWA. Timing for these accesses is shown in Table 7-35.

• Some SPR accesses spend two cycles in IE and multiple cycles in IWA. Accesses to the SPRGn, DSISR, DAR, HID0, HID1, HID2, RTCU, SRR0, and SRR1 registers require four cycles in IWA; accesses to the PVR register take seven cycles. These accesses take longer than mtspr accesses to the same registers. Timing for these accesses is shown in Table 7-36.

• As with the mtspr instruction, accesses to the SDR1 register with the mfspr instruction have a unique timing. These accesses take one cycle in IE where the SDR1 is read, and it spends one cycle in IWA where the GPR is written. Data cannot be forwarded for this register; therefore, a dependent operation that follows immediately after mfspr stalls for one cycle in IE. Timing for this instruction is shown in Table 7-37.

a. PVR is the only SPR that takes seven cycles in the IWA stage. The rest of the SPR’s listed in this table take four cycles in IWA.

Table 7-35. mfspr Instructions—One Cycle in IE and One Cycle in IWA


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively SPR

Resources required exclusively rD

Table 7-36. mfspr Instructions—Two Cycles in IE and Multiple Cycles in IWA

Number of Cycles 1 2 4 or 7 a

Pipeline stages ID

IE

IC

IWA


Resources required exclusively rD rD


7.3.3.5.9 Move from Machine State Register (mfmsr) Instruction TimingThe timing of the mfmsr instruction is similar to the mfspr instruction shown inTable 7-36—two cycles are spent in IE and two cycles are spent in the IWA stage. Theregister can be read during the IWA stage. Timing for this instruction is shown inTable 7-38.

7.3.3.5.10 Move from Segment Register Instruction TimingThe mfsr and mfsrin instructions have the same timing. The segment registers are readduring the IE stage and written to the GPR during the IWA stage. The timings for theseinstructions are shown in Table 7-39. Because data written into the GPR cannot not beforwarded to the instruction that immediately follows, a one-cycle stall occurs if the nextinstruction depends on data in this GPR.

Table 7-37. mfspr Instruction Timing (SDR1)


Pipeline stages ID

IE

IC

IWA



Table 7-38. mfmsr Instruction Timing


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively MSR



.

7.3.3.5.11 System Call (sc) and Return from Interrupt (rfi) Instruction Timings

Both the sc and rfi instructions are context synchronizing; they establish a new context andthen branch to that target. All prefetched instructions are discarded and fetching beginsagain at the new address under the new machine state. This causes at least a three-cycle stallbefore the next instruction can enter the IE stage. These instructions are held in the ID stageuntil all preceding instructions have completed. The sc instruction takes 16 cycles in IWA,and the rfi instruction takes 13 cycles in IWA. Timing for these instructions is shown inTable 7-40.

7.3.3.5.12 Cache Instruction TimingsTable 7-41 and Table 7-42 provide best-case examples for cache instructions. Timings forcache instructions depend upon the MESI state of the cache block. Timing for cacheinstructions that require bus activity, shown in Table 7-41, also depend upon bus speed,timing of the AACK signal, whether the processor is parked on the bus, and bussynchronization. The timing for the CACC stage may be longer for a given systemimplementation.

a. The sc instruction takes 16 cycles in IWA; the rfi instruction takes 13 cycles in IWA. Both instructions are context-synchronizing, which causes at least a three-cycle stall before the next instruction can enter the IE stage.

Table 7-39. mfsr and mfsrin Instruction Timing


Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively SR


Table 7-40. rfi and sc Instruction Timings

Number of Cycles 1 2 13 or 16a

Pipeline stages ID

IE

IC

IWA

Resources required nonexclusively MSR



Timings here assume that the WIM bits are set to b'001'—that is, write-back, cachingenabled, and memory coherency enforced.

There are three cycles of processor activity and at least three cycles of bus activity for a totalof at least six cycles in the CACC (cache access) stage. The following formula can be usedto calculate the delay for a specific system implementation:

CACC latency = 3 + ASP + (2 + AD + BNP) * BPR

In this formula, ASP is a synchronization parameter. AD is the number of bus cycles delayfor the AACK signal, BNP is 0 if the bus is parked and 1 otherwise. BPR is the bus-to-processor clock frequency ratio (1 for a full-speed bus, 2 for a half-speed bus, 3 for a third-speed bus, etc...).

Table 7-41 shows the timing for the tlbi, sync, dcbf(E,S,I), dcbst(E,S,I), dcbi, dcbz(S,I),and store(S) instructions.

Table 7-42 shows best-case timings for the dcbf(M), dcbst(M), and dcbz(M,E)instructions. Note that the CACC stage requires only one cycle because these instructionsdo not require access to the system interface.

a. Note that this is the best-case timing. The time in this stage may be longer for a given system implementation. Actual timing is dependent on bus speed, timing of the AACK signal, whether the processor is parked on the bus, and bus synchronization.

Table 7-41. Cache Instruction Timings—tlbi, sync, dcbf(E,S,I), dcbst(E,S,I), dcbi, dcbz(S,I), store(S)

Number of Cycles 1 1 6a

Pipeline stages ID

IE

CARB

CACC

IC




7.3.3.5.13 Load Instruction Timing All load operations (except string and multiple load operations) spend one cycle in IE if theoperand does not cross a double-word boundary. Table 7-43 shows the timing for theseinstructions, which include the lbz, lbzx, lhz, lhzx, lha, lhax, lwz, lwzx, lwarx, lfs, lfsx,lfd, and lfdx instructions.For more information, see Section 7.3.3.2.2, “IE Stage—BasicLoad and Store Instruction Operation.” Note that for floating-point loads, the target registeris an FPR.

As shown in Table 7-44, the same load instructions (lbz, lbzx, lhz, lhzx, lha, lhax, lwz,lwzx, lwarx, lfs, lfsx, lfd, lfdx) take two cycles in IE and CACC stages if the operandcrosses a double-word addressing boundary; this is because the data from doublewordsrequires separate cache accesses (splice 1 and 2). Note that in the third cycle the first accessin the cache (splice 1) overlaps with the second arbitration for the cache (splice 2). It thentakes one extra cycle for the register data to become available. For more information, seeSection 7.3.3.3.4, “Unaligned Load/Store Operations.”

Table 7-42. Cache Instruction Timings—dcbf(M), dcbst(M), dcbz(M,E)


Pipeline stages ID

IE

CARB

CACC

IC



Table 7-43. Single-Cycle Load Instruction Timing—Operand Does Not Cross a Double-Word Boundary

Number of Cycles 1 1 1 1

Pipeline stages ID

IE

CARB

CACC

IC

IWL or FWL




Note that for floating-point loads, the target register is an FPR.

7.3.3.5.14 Load with Update Instruction TimingLoad operations that use the update option also write the addressing register rA with theEA in addition to loading the target register. These operations are similar to conventionalload operations except that rA is required exclusively when the instruction is in IE. Timingfor these instructions (lbzu, lbzux, lhzu, lhzux, lhau, lhaux, lwzu, lwzux, lfsu, lfsux, lfduand lfdux) is shown in Table 7-45.


Like load operations that do not update, when these instructions (lbzu, lbzux, lhzu, lhzux,lhau, lhaux, lwzu, lwzux, lfsu, lfsux, lfdu, and lfdux) have operands that cross a double-

Table 7-44. Single-Cycle Load Instruction Timing—Operand Crosses Double-Word Boundary

Number of Cycles 1 1 1 1 1

Pipeline stages ID

IE IE

CARB CARB

CACC CACC

IC

IWL or FWL


Resources required exclusively rD rD rD

Table 7-45. Update Form Load Instruction Timing—Operand Does Not Cross Double-Word Boundary


Pipeline stages ID

IE

CARB

CACC

IC

IWA

IWL or FWL


Resources required exclusively rA, rD rD


word addressing boundary, they must spend an extra cycle in IE and CACC. Timing forthese operations is shown in Table 7-46.


7.3.3.5.15 Load Multiple Word (lmw) and Load String Word Immediate (lswi)

Load multiple word (lmw) and load string word immediate (lswi) instructions spend onecycle in IE for each register of data to be transferred (shown as the variable n). Timing forthese operations is shown in Table 7-47.

a. where n is the number of registers to be loaded.

b. where i ranges from 0 to n-3 (incrementing each cycle).

Table 7-46. Update Form Load Instruction Timing—Operand Crosses Double-Word Boundary


Pipeline stages ID

IE IE

CARB CARB

CACC CACC

IC

IWA

IWL or FWL


Resources required exclusively rD rD rD

Table 7-47. Load Multiple Word (lmw) and Load String Word Immediate (lwsi)—Operand is Word-Aligned

Number of Cycles 1 1 1 na-2 1 1

Pipeline stages ID

IE IE IE

CARB CARB CARB

CACC CACC CACC

IC

IWL[rD+i]b IWL[rD+n-2] IWL[rD+n-1]


rA


rD rD, rD+1 rD+i+1, rD+i+2b.

rD+n-1


The instruction flow becomes more complex when the initial address is not on a word-length addressing boundary, causing every other register to receive data that spans a double-word addressing boundary. Requests for data for those registers is split into two pieces(splice 1 and splice 2). Such load operations take 50% longer when the word is aligned. Thetiming for this operation is shown in Table 7-48.

7.3.3.5.16 Load String Word Indexed (lswx) and Load String and Compare Byte Indexed (lscbx) Instruction Timing

Timing for the lswx and lscbx instructions differs from the lswi and lmw instructions in thatboth rB and the XER are read in ID. Also the lscbx instruction reads the XER throughoutthe execution of the instruction and may write the XER if a byte-compare match is found.Since the determination of when the XER is written is data-dependent, the XER must beaccessed exclusively throughout the execution of the instruction. Timing for theseinstructions is shown in Table 7-49.


b. where i ranges from 0 to n-3 (incrementing by 2 each iteration (every three cycles).

Table 7-48. Load Multiple Word (lmw) and Load String Word Immediate (lswi) Timing —Not Word-Aligned

Number of Cycles

1 1 1 (na-2)/2 (3 cycle iteration) 1 1

Pipeline stages

ID

IE IE IE IE IE

CARB CARB CARB CARB CARB

CACC CACC CACC CACC CACC

IC

IWL[rD+i]b IWL[rD+n-2] IWL[rD+n-2] IWL[rD+n-1]

Resources required non exclusively

rA


rD rD rD+i, rD+i+1b.

rD+i+1, rD+i+2b.

rD+i+2b. rD+n-1


When the operand is not on a word boundary, these two instructions take an extra cycle forevery other register of data (that is, each time the word boundary is crossed). For moreinformation, see Section 7.3.3.5.15, “Load Multiple Word (lmw) and Load String WordImmediate (lswi).” Timing is shown in Table 7-50.


b. where i ranges from 0 to n-3 (incrementing each cycle).

c. The lscbx instruction reads the XER continuously and writes the XER when a byte compare match occurs. The lswx never writes the XER and only reads the XER from ID.


b. where i ranges from 0 to n-3 (incrementing by 2 each iteration (every three cycles).

Table 7-49. lswx and lscbx Instruction Timing—Word-Aligned

Number of Cycles 1 1 1 na-2 1 1

Pipeline stages ID

IE IE IE

CARB CARB CARB

CACC CACC CACC

IC

IWL[rD+i]b IWL[rD+n-2] IWL[rD+n-1]


XER rA, rB


rD, XERc

rD, rD+1, XER

rD+i+1, rD+i+2b., XER

rD+n-1, XER

Table 7-50. lswx and lscbx Instruction Timings—Operand Not on a Word Boundary

Number of Cycles 1 1 1 na-2/2 (3 cycle iteration) 1 1

Pipeline stages ID

IE IE IE IE IE

CARB CARB CARB CARB CARB

CACC CACC CACC CACC CACC

IC

IWL[rD+i]b IWL[rD+n-2] IWL[rD+n-2] IWL[rD+n-1]


rA


rD rD rD+i, rD+i+1b.

rD+i+1, rD+i+2b.

rD+i+2b. rD+n-1


7.3.3.5.17 Integer Store Instruction TimingsThis section describes integer store operations, except for the string/multiple instructions(stmw, stswx, and stswi) and store instructions that specify the update option. Timings forstore operations differ from load operations primarily in that they do not write back toregisters. Store operations calculate the EA in the IE stage and arbitrate (CARB) for cacheaccess in parallel with the IE stage. Store operations have the same alignmentproperties/requirements as load operations.

Timings for scalar store operations are shown in Table 7-51.

If the operand specified by the store crosses a double-word boundary, the access is split intotwo pieces (See Section 7.3.3.3.4, “Unaligned Load/Store Operations”); therefore, the storespends two cycles in IE (as do load operations). The timings for store instructions (stb,stbx, sth, sthx, stw, stwx, sthbrx, and stwbrx) whose operands cross a double-wordboundary are shown in Table 7-52.

Table 7-51. Integer Store Instruction Timings—Operand Does Not Cross a Double-Word Boundary


Pipeline stages ID

IE

CARB

CACC

IC

Resources required nonexclusively rA, rB, rS


Table 7-52. Integer Store Instruction Timings—Operand Crosses a Double-Word Boundary


Pipeline stages ID

IE IE

CARB CARB

CACC CACC

IC




7.3.3.5.18 Store with Update Instruction TimingStore operations that update register write the rA addressing register with the EA calculatedby the store, requiring the IWA stage. The timings for these store operations (stbu, stbux,sthu, sthux, stwu, stwux) that have operands that do not cross a double-word boundary areshown in Table 7-53.

The timings for misaligned store operations with update (stbu, stbux, sthu, sthux, stwu,and stwux) with operands crossing a double-word boundary are shown in Table 7-54.

7.3.3.5.19 Floating-Point Store Instruction TimingFloating-point store operations are dispatched to the IU, which generates the address andprovides access to the MMU cache interface, and to the FPU, which provides the dataproperly formatted. Note that the tag does not have to be dispatched in the IU pipeline atthe same time that the floating-point instruction is dispatched to the FPU pipeline.Therefore the timing for floating-point store operations is very different from other

Table 7-53. Update Form Store Instruction Timings—Operand Does Not Cross a Double-Word Boundary


Pipeline stages ID

IE

CARB

CACC

IC

IWA



Table 7-54. Update Form Store Instruction Timings—Operand Crosses a Double-Word Boundary


Pipeline stages ID

IE IE

CARB CARB

CACC CACC

IC

IWA




instructions. The instruction may complete in the IC stage and wait in the FPSB stage forthe data to be available before it can arbitrate for the cache.

Table 7-55 shows the timing for the stfs, stfsx, stfd, and stfdx instructions when no operandcrosses a double-word boundary.

Store accesses that cross a double-word boundary must be broken into two pieces, each ofwhich requires a cache access. However, because the FPSB stage is only one-element deep,it stays full with the first request until the data is made available from the FPU, delayingarbitration for the second cache access for three cycles. As shown in Table 7-56, this takesfive cycles in IE for stfs, stfsx, stfd, stfdx instructions instead of one for an aligned floating-point store.

Table 7-55. Floating-Point Store Instruction Timings—Operand Does Not Cross a Double-Word Boundary

Number of Cycles 1 1 1 1 1 1

Pipeline stages ID

IE

FPSB FPSB FPSB

CARB

CACC

IC

FD

FPM

FPA

FW

Resources required nonexclusively rA, rB, frS



7.3.3.5.20 Update-Form Floating-Point Store Instruction TimingsThe timings for floating-point store operations that update differ from the nonupdate formsin that they update rA from the IWA stage. Table 7-57 shows timings for the stfsu, stfsux,stfdu, and stfdux instructions when operands do not cross a double-word boundary.

Table 7-56. Floating-Point Store Instruction Timings—Operand Crosses a Double-Word Boundary

Number of Cycles 1 1 1 1 1 1 1 1

Pipeline stages ID

IE IE IE IE IE

FPSB FPSB FPSB FPSB FPSB FPSB

CARB CARB

CACC CACC

IC

FD

FPM

FPA

FW




Table 7-58 shows timings for the stfsu, stfsx, stfdu, and stfdux instructions when theoperand crosses a double-word boundary.

Table 7-57. Floating-Point Store Instruction Timings—Operand Does Not Cross a Double-Word Boundary


Pipeline stages ID

IE

FPSB FPSB FPSB

CARB

CACC

IC

IWA

FD

FPM

FPA

FW




7.3.3.5.21 Store Multiple Word (stmw) and Store String Word Immediate (stswi)

The stmw and stswi instructions spend a cycle in each of the IE, CARB, and CACC stagesfor each register of data to be transferred. The timings for these instructions are shown inTable 7-59. The number of registers to be transferred in this example is 3 (9–12 bytes ofdata for the stswi instruction, 12 bytes of data for the stmw instruction). If four registers ofdata were specified, there would be four IE, CARB, and CACC cycles—all overlapping asshown.

Table 7-58. Floating-Point Store Instruction Timings—Operand Crosses a Double-Word Boundary

Number of Cycles 1 1 1 1 1 1 1 1

Pipeline stages ID

IE IE IE IE IE

FPSB FPSB FPSB FPSB FPSB FPSB

CARB CARB

CACC CACC

IC

IWA

FD

FPM

FPA

FW


rA, rB, frS



Because the accesses are made one word at a time, if the EA specified is not on a wordboundary, every other access crosses a double-word boundary. Accesses that cross adouble-word boundary are split into two pieces just as for scalar store operations. Timingfor stmw and stswi when operands are not on a word boundary is shown in Table 7-60.

7.3.3.5.22 Store String Word Indexed (stswx) Instruction TimingsTiming for the stswx instruction differs from the timing for the stswi instruction in that itreads the XER (byte count) from ID and it reads the rB. The timing for this instruction


b. rS is only accessed on one cycle because the four memory locations associated with these four bytes of data are within the same double-word address.

c. rS + 1 is accessed twice in the IE stage because the four memory locations associated with these four bytes of data are not within the same double-word address.

d. rS + 2 is only accessed on one cycle because the four memory locations associated with these four bytes of data are within the same double-word address.

Table 7-59. stmw and stswi Instruction Timing (Word-Aligned)


Pipeline stages ID

IE IE IE

CARB CARB CARB

CACC CACC CACC

IC

Resources required nonexclusively rA, rS rS rS + 2

Resources required exclusively rS rS+1 rS +2

Table 7-60. stmw and stswi Instruction Timing (Not Word-Aligned)

Number of Cycles 1 1 (na-1)/2 (3 cycle iteration) 1

Pipeline stages ID

IE IE IE IE

CARB CARB CARB CARB

CACC CACC CACC CACC

Resource required nonexclusively rA, rSb rS + 1c rS + 1 rS + 2d



when the operand is on a word boundary is shown in Table 7-61. The number of registersto be transferred in this example is 3 (9 to 12 bytes of data). If four registers of data arespecified, then there would be four IE, CARB, and CACC cycles—all overlapping asshown.

When the operand is not on a word boundary, every other access takes two cycles in the IE,the CARB, and the CACC stages. The timing for the stswx instruction when the operandsare not word-aligned is shown in Table 7-62.

a. The variable n represents the number of registers to be loaded.

b. rS is only accessed on one cycle because the four memory locations associated with these four bytes of data are within the same double-word address.

c. rS + 1 is accessed twice in the IE stage because the four memory locations associated with these four bytes of data are not within the same double-word address.

d. rS + 2 is only accessed on one cycle because the four memory locations associated with these four bytes of data are within the same double-word address.

Table 7-61. stswx Instruction Timing (Word-Aligned)


Pipeline stages ID

IE IE IE

CARB CARB CARB

CACC CACC CACC

IC

Resources required nonexclusively rA, rS rS rS + 2

Resources required exclusively rS rS+1 rS +2

Table 7-62. stswx Instruction Timing (Not Word-Aligned)

Number of Cycles 1 1 (na-1)/2 (3 cycle iteration) 1

Pipeline stages ID

IE IE IE IE

CARB CARB CARB CARB

CACC CACC CACC CACC

Resources required nonexclusively rA, rSb rS + 1c rS + 1 rS + 2d



7.3.3.5.23 Store Conditional Word Indexed InstructionThe stwcx. instruction differs from that of other store operations in that it must set the CRafter it determines whether it can successfully complete the store operations. Thearchitecture does not support this instruction with an operand address that is not on a wordboundary (results of a misaligned stwcx. on the 601 are boundedly undefined). The timingfor the stwcx. instruction is shown in Table 7-63.

Because the reservation bit may not be set by the time the stwcx. instruction is executed,only one cycle is spent in IWA. The flow with the reservation bit cleared is shown inTable 7-64.

a. The XXX’s indicate that an instruction cannot be executing in IE while a stwcx. is in IWA. An instruction may be present in IE, but if there is one, it will be held (stall).

a. The XXX’s indicate that an instruction cannot be executing in the IE stage while a stwcx. is in the IWA stage. An instruction may be present in IE, but if there is one, it will be held (stall).

Table 7-63. stwcx. Instruction Timing—Reservation Set


Pipeline stages ID

IE XXXa XXX

CARB

CACC

IC

IWA IWA



Table 7-64. stwcx. Instruction Timing—Reservation Cleared


Pipeline stages ID

IE XXXa

CARB

CACC

IC

IWA

Resources required nonexclusively rA,rB, rS



7.3.4 Floating-Point Pipeline StagesAs shown in Figure 7-1, the FPU has a four-stage pipeline with a one-entry queue (F1)above it that holds an instruction if the previous instruction stalls in FD. All instructionsoccupy each of the remaining FPU stages for at least one cycle. The FPU pipeline stagesare shown in Figure 7-14.

Figure 7-14. Floating-Point Data and Instruction Flow

Some instructions require more than one cycle in a stage, and some of those instructionsmay occupy multiple stages simultaneously. This process is described in Section 7.3.1.4.3,“Floating-Point Dispatch.”

The FPU uses the multiply/add (fmadd) instruction (D = AC + B) as the base for otherfloating-point operations. Other instructions are implemented using this instruction andsetting one or more of the operands with a constant (in the FD stage). These are shown asfollows:

FPM(Multiply Stage)

Normalizer,

FPA(Add Stage)

FD(Decode Stage)

F1(Queue)

Feedback forDoublePrecisionMultiply

to Memory

= Instruction flow

= Data flow

FWA

FPR(Floating Point

Registers)

Load Data (From Memory Subsystem)

Store Data

Subsystem

= Cycle boundary

Rounder

(Writeback)


• Accumulate instructions (AC + B)—This is the base instruction in the FPU. There are both single- and double-precision versions of these instructions.

• Add instructions (A + B)— These are treated as accumulate instructions with C=1. This group of instructions includes add, subtract, round-to-single, and floating-point compare instructions. There are both single- and double-precision versions of these instructions.

• Multiply instructions (AC)—These are treated as accumulate instructions with B=0. There are both single- and double-precision versions of these instructions.

• Divide instructions —This instruction is handled with repeated iterations of the accumulate instructions. There are both single- and double-precision versions of these instructions.

• Move instructions —These instructions are treated as accumulate instructions with A = 0.

• Load instructions —These are handled entirely in the IU. Because the FPRs have a dedicated port for load data being written back, loads never interfere with arithmetic instructions writing back their data. There are both single- and double-precision versions of these instructions.

• Store instructions —These behave like move instructions, except that they do not write back to the FPRs. There are both single- and double-precision versions of these instructions.

• Special instructions —Depending on the instruction, these instructions are handled in control logic or similarly to the move instructions.

• Convert-to-integer instructions —These instructions work on both single- and double-precision data. There is a convert-to-integer instruction with a round-to-zero mode.

PowerPC floating-point arithmetic is IEEE-compliant, and the following sections assume afamiliarity with the PowerPC implementation of IEEE floating-point numbers andarithmetic, which is described in Section 2.5, “Floating-Point Execution Models.”

7.3.4.1 Floating-Point Decode Stage (FD)In the FD stage, floating-point instructions are decoded, their operands are fetched, and anyrequired constants in the base instruction (D = AC + B) are set. Stalls can occur in the FDstage for the following reasons:

• Data dependencies—Instructions stall in FD because of true data dependencies (read-after-write, or RAW, dependencies) between the source operands of the instruction in the FD stage and the target registers required by instructions in the FPM, FPA, or FWA stages, or between the source operands of the instructions in the FD stage and the target register of an outstanding load instruction. All dependency checking is performed at FD regardless of the stage at which the operand is first required. In other words, a floating-point move instruction may stall in the FD stage


because of a dependency on an instruction in FPA even though the register move instruction does not need its operand until the FW stage. For more information, see Appendix I, Section I.3.1, “Floating-Point Data Dependency Stalls.”

However, floating-point store instructions stall in FD stage not because of a dependency on an instruction in the FPM stage, but rather because of a dependency on an instruction in FPA or FPW.

• Multicycle floating-point operations—Two types of multicycle operations in the 601 can appear in the FPU pipeline—divide and double-precision accumulate/multiply operations. The divide instructions are implemented with a 2-bit, nonrestoring division algorithm that iterates using accumulate instructions and requires the entire floating-point pipeline for many cycles.

Double-precision multiply/accumulate instructions must repeat both the FPM and FPA stages twice (pipelined) because the 53- by 53-bit multiplication is split into two 27- by 53-bit multiplications. This is described in Section 7.2.1.3, “Floating-Point Unit (FPU).” Timing for these instructions is shown in Section 7.3.4.5.2, “Double-Precision Instruction Timing.”

• Special-case numbers (such as denormalized numbers)—Two conditions related to special-case numbers cause stalls in the FD stage. The first case is when one or more of the operands of the instruction in the FD stage require prenormalization. To perform the prenormalization function, the operands must be cycled through the entire pipeline and make use of the normalizer in the FW stage, as described in Section 7.3.4.4, “Floating-Point Write-Back Stage (FWA).” Denormalized operands for multiply and divide instructions require prenormalization in the 601.

Stalls also occur in the FD stage when the FPU predicts that the result of the operation produces a denormalized number (that is, an underflow condition), in which case, output must be cycled through the entire pipeline again in order to denormalize the answer before it is written back. This is described in Section 7.3.4.5.6, “Floating-Point Special-Case Number-Handling Stalls.”

7.3.4.2 Floating-Point Multiply Stage (FPM)The FPM stage performs 27- by 53-bit multiplication (which is the first half of a 53- by 53-bit multiplication for double-precision operations). Although instructions that must repeatthis stage may cause stalls in FD (as described in Section 7.3.4.1, “Floating-Point DecodeStage (FD)”) there are no stalls inherent to this stage; instructions can stall here onlybecause of an instruction in a subsequent stage. Operands used by this stage are set up inthe FD stage—that is, constants are put in the appropriate places and double-precisionmultiply instructions are split into two pieces.

7.3.4.3 Floating-Point Add Stage (FPA)The FPA stage is similar to the FPM stage; however, the FPA stage must shift the results ofthe two halves of the double-precision multiply/accumulate instructions appropriately tocorrectly calculate the double-precision sum. Although instructions that must repeat this


stage may cause stalls in FD (as described in Section 7.3.4.1, “Floating-Point Decode Stage(FD)”) there are no stalls inherent to this stage.

7.3.4.4 Floating-Point Write-Back Stage (FWA)Normalization, rounding, and writeback occur in the FWA stage. Stalls can occur in theFWA stage for the following reasons:

• Normalization—The normalizer in the FWA stage can shift by as many as 16 bits in a cycle. Additionally, a shift of as many as 48 bits can occur at the bottom of the FPA stage. Thus if the intermediate result of an operation contains more than 64 zeros, there is at least a one-cycle stall in the FWA stage. The intermediate result in the FPU contains 161 bits, so the worst case normalization time (corresponding to a one in the 161st position with 160 leading zeros), is seven cycles (48 bits are shifted out in the FPA stage, then 16 additional bits are shifted out in each of the next seven cycles in the FWA stage).

• Synchronization—The stalls associated with synchronization relate back to the precise exception model. A floating-point instruction cannot complete before an instruction ahead of it in program order that may still cause a synchronous exception. See Section 7.3.1.4.4, “Synchronization Tags for the Precise Exception Model.”

The FPRs contain two write ports, one dedicated to load data and one dedicated toarithmetic instruction results; thus there is never interference between these two types ofinstructions at the FWA stage and the FWL stage. One load and one arithmetic instructioncan write back in one cycle.

7.3.4.5 Floating-Point Pipeline TimingIn the following sections, we give timing examples of the floating-point instructions.

7.3.4.5.1 Single-Precision InstructionsThis section contains timing diagrams for the single-precision arithmetic instructions thatare implemented in the FPU. Table 7-65 shows the timing for a single-precision addinstruction with no special-case data.


Table 7-66 shows the timing for a single-precision multiple instruction with no special-casedata.

Table 7-67 shows the timing for a single-precision divide instruction with no special-casedata.

Table 7-65. Single-Precision Add (fadds, fsubs) Instruction Timing—No Special-Case Data


Pipeline stages

FD

FPM

FPA

FWA

Resources required nonexclusively frA, frB

Resources required exclusively frD

Cache access

Table 7-66. Single-Precision Multiply Instruction (fmuls)—No Special-Case Data


Pipeline stages

FD

FPM

FPA

FWA

Resources required nonexclusively frA, frC


Cache access


Table 7-68 shows the timing for single-precision accumulate instructions with no special-case data.

7.3.4.5.2 Double-Precision Instruction TimingThis section contains timing diagrams for the double-precision arithmetic instructionsimplemented in the FPU.

Table 7-69 shows the timing for a double-precision add instruction with no special-casedata. Note that the fcmpu and fcmpo instructions write to CR[BF] instead of frD

Table 7-67. Single-Precision Divide Instruction (fdivs)—No Special-Case Data


Pipeline stages

FD FD FD FD

FPM FPM FPM

FPA FPA

FWA FWA



Cache access

Table 7-68. Single-Precision Accumulate Instructions (fmadds, fmsubs, fnmadds, fnmsubs)—No Special-Case Data


Pipeline stages

FD

FPM

FPA

FWA

Resources required nonexclusively frA, frB, frC


Cache access


.

Table 7-70 shows the timing for a double-precision multiply instruction with no special-case data. Note that in this example, the instruction requires two cycles in the FD, FPM, andFPM stages; however, because the instruction is self-pipelining, there is only a one-cycledelay before the next instruction can enter the decode stage.

Table 7-71 shows the timing for a double-precision divide instruction with no special-casedata. Note that the instruction occupies the decode stage and subsequent stage until theinstruction enters the last cycle of the writeback stage.

a. The fcmpu and fcmpo instructions write to CR[BF] instead of frD.

Table 7-69. Double-Precision Add Instructions (fadd, fsub, frsp, fcmpu, fcmpo)—No Special-Case Data


Pipeline stages

FD

FPM

FPA

FWA


Resources required exclusively frDa

Cache access

Table 7-70. Double-Precision Multiply Instructions (fmul)—No Special-Case Data


Pipeline stages

FD FD

FPM FPM

FPA FPA

FWA

Resources required nonexclusively frA, frC frA, frC


Cache access


Table 7-72 shows the timing for a double-precision accumulate instruction with no special-case data. Note that the timing for this instruction is similar to that of the fmul instruction, in that it requires two cycles in the FD, FPM, and FPA stages. However, because the instruction is self-pipelining, its can begin the first cycle of the FPM stage while it enters the second cycle of the FD stage. This allows the next instruction to enter the decode stage on the third clock cycle.

Table 7-71. Double-Precision Divide Instructions (fdiv)—No Special-Case Data


Pipeline stages

FD FD FD FD

FPM FPM FPM

FPA FPA

FWA FWA



Cache access

Table 7-72. Double-Precision Accumulate Instructions (fmadd, fmsub, fnmadd, fnmsub)—No Special-Case Data


Pipeline stages FD FD

FPM FPM

FPA FPA

FWA

Resources required nonexclusively frA, frB, frC frA, frB, frC


Cache access


7.3.4.5.3 Floating-Point Move/Store Instruction TimingTable 7-73 illustrates the timing of floating-point move instructions.

7.3.4.5.4 Convert-to-Integer Instruction TimingThis section contains timing information for the convert to integer instructions.

Table 7-74 shows the timing for floating-point convert to integer instructions with nospecial-case data.

7.3.4.5.5 Special Instructions Implemented in the FPUThis section contains timing information for the special instructions (mtfsfi, mtfsf, mtfsb0,and mtfsb1; see Table 7-75) implemented in the FPU—these instructions fit none of theprevious categories. Note that in this example frB is used only by the mtfsf instruction.

Table 7-73. Floating Point Move/Store Instructions (fmr, fabs, fneg, fnabs, stfs, stfsu, stfsx, stfsux, stfd, stfdu, stfdx, stfdux)—No Special-Case Data


Pipeline stages FD

FPM

FPA

FWA

Resources required nonexclusively frB


Cache access

Table 7-74. Floating-Point Convert to Integer Instructions (fctiw, fctiwz)—No Special-Case Data


Pipeline stages FD

FPM

FPA

FWA

Resources required nonexclusively frB


Cache access


Table 7-76 shows the timing for the Move From FPSCR instructions, which areimplemented in the FPU. Note that in this example, frD is the target only of the mffsinstruction and CR[BF] is the target of the mcrfs instruction.

7.3.4.5.6 Floating-Point Special-Case Number-Handling StallsSpecial-case numbers can generate stalls and require the instruction to go through a portionof the pipeline twice in order to prenormalize or denormalize an operand.

Table 7-77 shows the timing for a single-precision instruction when the result causes anoverflow. Single-precision accumulate instructions are shown in this example, but thetiming can be generalized for single-precision instructions in general. Note that todenormalize the result of this instruction, it must pass through the FPU pipeline. Note that

a. Note that frB is used only by the mtfsf instruction.

a. frD is the target of the mffs instruction.

b. CR[BF] is the target of the mcrfs instruction.

Table 7-75. Move to FPSCR Instruction Timing (mtfsfi, mtfsf, mtfsb0, mtfsb1)


Pipeline stages

FD FD FD FD

FPM FPM FPM

FPA FPA

FWA

Resources required nonexclusively frBa

Resources required exclusively FPSCR

Cache access

Table 7-76. Move from FPSCR Instruction Timing (mffs, mcrfs)


Pipeline stages

FD

FPM

FPA

FWA

Resources required nonexclusively FPSCR

Resources required exclusively frDa, CR[BF]b

Cache Access


a stall occurs if an underflow condition is predicted, even though the condition may notoccur. In either case, the instruction occupies the decode stage for four cycles. Note thatwhile the latency of the instruction may differ in these two cases, the throughput is thesame.

Table 7-78 shows the timing for single-precision accumulate instructions (fmadds,fmsubs, fnmadds, and fnmsubs). In this example, one operand requires prenormalization.This operand travels through the pipeline before execution can begin.

a. The operand that needs prenormalization traverses the pipeline before the execution of the instruction starts.

b. The operands are now ready and the instruction can actually start execution.

Table 7-77. Single-Precision Accumulate Instruction Timing (fmadds, fmsubs, fnmadds, fnmsubs)—Result Underflow

Number of Cycles 1 1 1 1 1 1 1

Pipeline stages

FD FD FD FD

FPM FPM

FPA FPA

FW FW



Cache access

Table 7-78. Single-Precision Accumulate Instruction Timing

Number of Cycles 1 1 1 1 1 1 1

Pipeline stages

FDa FD FD FD FDb

FPM FPM

FPA FPA

FW FW



Cache access


Table 7-79 shows the timing when two operands of a single-precision accumulateinstruction require prenormalization.

7.3.4.5.7 Floating-Point Normalization StallsTable 7-80 shows the worst-case timing for when an operand of a double-precisionaccumulate instruction requires normalization.

a. The first operand that needs prenormalization traverses the pipeline before the execution of the instruction starts.

b. The second operand that needs prenormalization traverses the pipeline (immediately behind the first operand) before the execution of the instruction starts.

c. The operands are now ready and the instruction can actually start execution.

Table 7-79. Single-Precision Accumulate Instructions (fmadds, fmsubs, fnmadds, fnmsubs)—Two Operands Require Prenormalization

Number of Cycles 1 1 1 1 1 1 1 1 1

Pipeline stages

FDa FDb FD FD FD FDc

FPM FPM FPM

FPA FPA FPA

FW FPW FW



Table 7-80. Double-Precision Accumulate Instruction Timing (Worst-Case Normalization)


Pipeline stages

FD FD

FPM FPM

FPA FPA

FW FW


frA, frB, frC Shift by 48 bits

Shift by 16 bits per cycle

Shift by 16 bits per cycle


frD


7.4 Execute Stage Delay SummaryTable 7-81 shows potential delays that may be caused by instructions implemented in the601. The pipeline column indicates which execution unit is used to process the instruction(for example, floating-point loads and stores are processed in the integer pipeline). Thetable shows the number of cycles each instruction spends in the execute stage, assumingnormal operations. The table also shows the delay that a subsequent instruction mayencounter if there is a data dependency. Note that this total time ignores the instructiondecode and fetch times for integer instructions but not for floating-point instructions.

As previously stated, the FPU has no feed-forwarding capabilities. In other words, as afloating-point operation completes, another floating-point instruction that may be waitingfor those results must wait for the data to be written into the register file before decode canbegin. This extra time is accounted for in Table 7-81.

Note that Table 7-1 contains no 64-bit architected instructions. These instructions trap to anillegal instruction exception handler when encountered.

Table 7-81. PowerPC 601 Microprocessor Instruction Latencies

Mnemonic Instruction PipelineNumber of Cycles in

Execute Stage

Execute Stage Delay if Next Instruction is Dependent

abs Absolute IU 1 0

add[o][.] Add IU 1 0

addc[o][.] Add Carrying IU 1 0

adde[o][.] Add Extended IU 1 0

addi. Add Immediate IU 1 0

addic Add Immediate Carrying IU 1 0

addic. Add Immediate Carrying and Record IU 1 0

addis Add Immediate Shifted IU 1 0

addme[o][.] Add to Minus One Extended IU 1 0

addze[o][.] Add to Zero Extended IU 1 0

and[.] AND IU 1 0

andc[.] AND with Complement IU 1 0

andi. AND Immediate IU 1 0

andis. AND Immediate Shifted IU 1 0

b[l][a] Branch BPU 1 —

bc[l][a] Branch Conditional BPU 1 —

bcctr[l] Branch Conditional to CTR BPU 1 —

bclr[l] Branch Conditional to LR BPU 1 —


cmp Compare IU 1 0

cmpi Compare Immediate IU 1 0

cmpl Compare Logical IU 1 0

cmpli Compare Logical Immediate IU 1 0

cntlzw[.] Count Leading Zeros Word IU 1 0

crand CR AND IU 1 0

crandc CR AND with Complement IU 1 0

creqv CR Equivalent IU 1 0

crnand CR NAND IU 1 0

crnor CR NOR IU 1 0

cror CR OR IU 1 0

crorc CR OR with Complement IU 1 0

crxor CR XOR IU 1 0

dcbf Data Cache Block Flush IU 11 02

dcbi Data Cache Block Invalidate IU 11 02

dcbst Data Cache Block Store IU 11 02

dcbt Data Cache Block Touch IU 11 02

dcbtst Data Cache Block Touch for Store IU 11 02

dcbz Data Cache Block Set to Zero IU 11 02

div[o][.] Divide IU 36 0

divs[o][.] Divide Short IU 36 0

divw[o][.] Divide Word IU 36 0

divwu[o][.] Divide Word Unsigned IU 36 0

doz[o][.] Difference or Zero IU 1 0

dozi Difference or Zero Immediate IU 1 0

eciwx External Control Input Word Indexed IU 11 Bus dependent

ecowx External Control Output Word Indexed IU 11 0

eieio Enforce In-Order Execution of I/O IU 11 0 2

eqv[.] Equivalent IU 1 0

extsb[.] Extend Sign Byte IU 1 0

extsh[.] Extend Sign Half Word IU 1 0

Table 7-81. PowerPC 601 Microprocessor Instruction Latencies (Continued)


Execute Stage



fabs[.] Floating-Point Absolute Value FPU 1 3

fadd[.] Floating-Point Add FPU 1 3

fadds[.] Floating-Point Add Single-Precision FPU 1 3

fcmpo Floating-Point Compare Ordered FPU 1 1

fcmpu Floating-Point Compare Unordered FPU 1 1

fctiw[.] Floating-Point Convert to Integer Word FPU 1 3

fctiwz[.] Floating-Point Convert to Integer Word with Round toward Zero

FPU 1 3

fdiv[.] Floating-Point Divide FPU 31 0

fdivs[.] Floating-Point Divide Single-Precision FPU 17 0

fmadd[.] Floating-Point Multiply-Add FPU 2 3

fmadds[.] Floating-Point Multiply-Add Single-Precision

FPU 1 3

fmr[.] Floating-Point Move Register FPU 1 3

fmsub[.] Floating-Point Multiply-Subtract FPU 2 3

fmsubs[.] Floating-Point Multiply-Subtract Single-Precision

FPU 1 3

fmul[.] Floating-Point Multiply FPU 2 3

fmuls[.] Floating-Point Multiply Single-Precision FPU 1 3

fnabs[.] Floating-Point Negative Absolute Value FPU 1 3

fneg[.] Floating-Point Negate FPU 1 3

fnmadd[.] Floating-Point Negative Multiply-Add FPU 2 3

fnmadds[.] Floating-Point Negative Multiply-Add Single-Precision

FPU 1 3

fnmsub[.] Floating-Point Negative Multiply-Subtract FPU 2 3

fnmsubs[.] Floating-Point Negative Multiply-Subtract Single-Precision

FPU 1 3

fres[.] Floating-Point Reciprocal Estimate Single-Precision

— Not implemented (trap)

—

frsp[.] Floating-Point Round to Single-Precision FPU 1 3

frsqrte[.] Floating-Point Reciprocal Square Root Estimate


—



Execute Stage



fsel[.] Floating-Point Select — Not implemented (trap)

—

fsqrt[.] Floating-Point Square Root — Not implemented (trap)

—

fsqrts[.] Floating-Point Square Root Single-Precision


—

fsub[.] Floating-Point Subtract FPU 1 3

fsubs[.] Floating-Point Subtract Single-Precision FPU 1 3

icbi Instruction Cache Block Invalidate (no-op in the 601)

IU 1 0

isync Instruction Synchronize IU Serialize Serialize

lbz Load Byte and Zero IU 1 1

lbzu Load Byte and Zero with Update IU 1 1

lbzux Load Byte and Zero with Update Indexed IU 1 1

lbzx Load Byte and Zero Indexed IU 1 1

lfd Load Floating-Point Double-Precision IU 1 2

lfdu Load Floating-Point Double-Precision with Update

IU 1 2

lfdux Load Floating-Point Double-Precision with Update Indexed

IU 1 2

lfdx Load Floating-Point Double-Precision Indexed

IU 1 2

lfs Load Floating-Point Single-Precision IU 1 2

lfsu Load Floating-Point Single-Precision with Update

IU 1 2

lfsux Load Floating-Point Single-Precision with Update Indexed

IU 1 2

lfsx Load Floating-Point Single-Precision Indexed

IU 1 2

lha Load Half Word Algebraic IU 1 1

lhau Load Half Word Algebraic with Update IU 1 1

lhaux Load Half Word Algebraic with Update Indexed

IU 1 1

lhax Load Half Word Algebraic Indexed IU 1 1



Execute Stage



lhbrx Load Half Word Byte-Reverse Indexed IU 1 1

lhz Load Half Word and Zero IU 1 1

lhzu Load Half Word and Zero with Update IU 1 1

lhzux Load Half Word and Zero with Update Indexed

IU 1 1

lhzx Load Half Word and Zero Indexed IU 1 1

lmw Load Multiple Word IU Number of registers transferred

1

lscbx Load String and Compare Byte Indexed IU Number of registers transferred

1

lswi Load String Word Immediate IU Number of registers transferred

1

lswx Load String Word Indexed IU Number of registers transferred

1

lwarx Load Word and Reserve Indexed IU 1 1

lwbrx Load Word Byte-Reverse Indexed IU 1 1

lwz Load Word and Zero IU 1 1

lwzu Load Word and Zero with Update IU 1 1

lwzux Load Word and Zero with Update Indexed IU 1 1

lwzx Load Word and Zero Indexed IU 1 1

maskg[.] Mask Generate IU 1 0

maskir[.] Mask Insert from Register IU 1 0

mcrf Move CR Field IU 1 0

mcrfs Move to CR from FPSCR IU 1 1

mcrxr Move to CR from XER IU 1 0

mfcr Move from CR IU 1 0

mffs[.] Move from FPSCR FPU 1 3

mfmsr Move from Machine State Register IU 2 1

mfspr Move from Special Purpose Register IU Variable 1

mfsr Move from Segment Register IU 1 1



Execute Stage



mfsrin Move from Segment Register Indirect IU 1 1

mftb Move from Time Base — Not implemented (trap)

—

mtcrf Move to CR Fields IU 1 0

mtfsb0[.] Move to FPSCR Bit 0 IU 1 3

mtfsb1[.] Move to FPSCR Bit 1 IU 1 3

mtfsf[.] Move to FPSCR Fields IU 1 3

mtfsfi[.] Move to FPSCR Field Immediate IU 1 3

mtmsr Move to Machine State Register IU Serialize Serialize

mtspr Move to Special Purpose Register IU Variable 0

mtsr Move to Segment Register IU 1 0

mtsrin Move to Segment Register Indirect IU 1 0

mul[o][.] Multiply IU 5/93 0

mulhw[.] Multiply High Word IU 5/93 0

mulhwu[.] Multiply High Word Unsigned IU 5/9/104 0

mulli Multiply Low Immediate IU 5 0

mulw[o][.] Multiply Low Word IU 5/93 0

nabs Negative Absolute IU 1 0

nand[.] NAND IU 1 0

neg[o][.] Negate IU 1 0

nor[.] NOR IU 1 0

or[.] OR IU 1 0

orc[.] OR with Complement IU 1 0

ori OR Immediate IU 1 0

oris OR Immediate Shifted IU 1 0

rfi Return from Interrupt IU Serialize Serialize

rlmi[.] Rotate Left then Mask Insert IU 1 0

rlwimi[.] Rotate Left Word Immediate then Mask Insert

IU 1 0

rlwinm[.] Rotate Left Word Immediate then AND with Mask

IU 1 0



Execute Stage



rlwnm[.] Rotate Left Word then AND with Mask IU 1 0

rrib[.] Rotate Right and Insert Bit IU 1 0

sc System Call IU Serialize Serialize

sle[.] Shift Left Extended IU 1 0

sleq[.] Shift Left Extended with MQ IU 1 0

sliq[.] Shift Left Immediate with MQ IU 1 0

slliq[.] Shift Left Long Immediate with MQ IU 1 0

sllq[.] Shift Left Long with MQ IU 1 0

slq[.] Shift Left with MQ IU 1 0

slw[.] Shift Left Word IU 1 0

sraq[.] Shift Right Algebraic with MQ IU 1 0

sraiq[.] Shift Right Algebraic Immediate with MQ IU 1 0

sraw[.] Shift Right Algebraic Word IU 1 0

srawi[.] Shift Right Algebraic Word Immediate IU 1 0

sre[.] Shift Right Extended IU 1 0

srea[.] Shift Right Extended Algebraic IU 1 0

sreq[.] Shift Right Extended with MQ IU 1 0

sriq[.] Shift Right Immediate with MQ IU 1 0

srliq[.] Shift Right Long Immediate with MQ IU 1 0

srlq[.] Shift Right Long with MQ IU 1 0

srq[.] Shift Right with MQ IU 1 0

srw[.] Shift Right Word IU 1 0

stb Store Byte IU 1 0

stbu Store Byte with Update IU 1 0

stbux Store Byte with Update Indexed IU 1 0

stbx Store Byte Indexed IU 1 0

stfd Store Floating-Point Double-Precision IU 1 0

stfdu Store Floating-Point Double-Precision with Update

IU 1 0

stfdux Store Floating-Point Double-Precision with Update Indexed

IU 1 0



Execute Stage



stfdx Store Floating-Point Double-Precision Indexed

IU 1 0

stfiwx Store Floating-Point as integer Word Indexed

IU 1 0

stfs Store Floating-Point Single-Precision IU 1 0

stfsu Store Floating-Point Single-Precision with Update

IU 1 0

stfsux Store Floating-Point Single-Precision with Update Indexed

IU 1 0

stfsx Store Floating-Point Single-Precision Indexed

IU 1 0

sth Store Half Word IU 1 0

sthbrx Store Half Word Byte-Reverse Indexed IU 1 0

sthu Store Half Word with Update IU 1 0

sthux Store Half Word with Update Indexed IU 1 0

sthx Store Half Word Indexed IU 1 0

stmw Store Multiple Word IU Number of registers transferred

0

stswi Store String Word Immediate IU Number of registers transferred

0

stswx Store String Word Indexed IU Number of registers transferred

0

stw Store Word IU 1 0

stwbrx Store Word Byte-Reverse Indexed IU 1 0

stwcx. Store Word Conditional Indexed IU 2 0

stwu Store Word with Update IU 1 0

stwux Store Word with Update Indexed IU 1 0

stwx Store Word Indexed IU 1 0

subf[o][.] Subtract from IU 1 0

subfc[o][.] Subtract from Carrying IU 1 0

subfe[o][.] Subtract from Extended IU 1 0

subfic Subtract from Immediate Carrying IU 1 0



Execute Stage



1These instructions access the system bus, thus the latency may vary depending on the exact state of the machine

2 A delay may be incurred if the subsequent instruction requires access to the cache; for example a store instruction.

3The longer latency may occur if the contents of rB is larger than 16 bits (not including sign-extending bits.

4Shortest latency occurs if rB ≤ 16 bits. Longer latency occurs if rB > 16 bits, but most significant bit is still 0. Longest latency occurs if most significant bit is 1.

5These instructions serialize the processor if the trap is taken.

subfme[o][.] Subtract from Minus One Extended IU 1 0

subfze[o][.] Subtract from Zero Extended IU 1 0

sync Synchronize IU Serialize bus operations

Serialize bus operations

tlbia Translation Lookaside Buffer Invalidate All — Not implemented (trap)

—

tlbie Translation Lookaside Buffer Invalidate Entry

IU Serialize Serialize

tw Trap Word IU 15 0

twi Trap Word Immediate IU 15 0

xor[.] XOR IU 1 0

xori XOR Immediate IU 1 0

xoris XOR Immediate Shifted IU 1 0



Execute Stage


Chapter 8. Signal Descriptions 8-1

Chapter 8 Signal Descriptions8080

This chapter describes the PowerPC 601 microprocessor’s external signals. It contains aconcise description of individual signals, showing behavior when the signal is asserted andnegated and when the signal is an input and an output.

NOTEA bar over a signal name indicates that the signal is activelow—for example, ARTRY (address retry) and TS (transferstart). Active-low signals are referred to as asserted (active)when they are low and negated when they are high. Signals thatare not active-low, such as AP0–AP3 (address bus paritysignals) and TT0–TT4 (transfer type signals) are referred to asasserted when they are high and negated when they are low.

The 601 signals are grouped as follows:

• Address arbitration signals—The 601 uses these signals to arbitrate for address bus mastership.

• Address transfer start signals—These signals indicate that a bus master has begun a transaction on the address bus.

• Address transfer signals—These signals, which consist of the address bus, address parity, and address parity error signals, are used to transfer the address and to ensure the integrity of the transfer.

• Transfer attribute signals—These signals provide information about the type of transfer, such as the transfer size and whether the transaction is bursted, write-through, or cache-inhibited.

• Address transfer termination signals—These signals are used to acknowledge the end of the address phase of the transaction. They also indicate whether a condition exists that requires the address phase to be repeated.

• Data arbitration signals—The 601 uses these signals to arbitrate for data bus mastership.

• Data transfer signals—These signals, which consist of the data bus, data parity, and data parity error signals, are used to transfer the data and to ensure the integrity of the transfer.


• Data transfer termination signals—Data termination signals are required after each data beat in a data transfer. In a single-beat transaction, the data termination signals also indicate the end of the tenure, while in burst accesses, the data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat. They also indicate whether a condition exists that requires the data phase to be repeated.

• System status signals—These signals include the external interrupt signal, checkstop signals, and both soft- and hard-reset signals. These signals are used to interrupt and, under various conditions, to reset the processor.

• COP/scan interface signals—The common on-chip processor (COP) unit and scan (IEEE 1149.1) interface provides a serial interface to the system for performing monitoring and boundary tests.

• Test signals—These signals are used for internal testing.

• Clock signals—These signals determine the system clock frequency. These signals can also be used to synchronize multiprocessor systems.

8.1 Signal ConfigurationFigure 8-1 illustrates the 601 microprocessor’s pin configuration, showing how the signalsare grouped.

NOTEA pinout showing actual pin numbers is included in thePowerPC 601 RISC Microprocessor Hardware Specifications.


Figure 8-1. PowerPC 601 Microprocessor Signal Groups

8.2 Signal DescriptionsThis section describes individual 601 signals, grouped according to Figure 8-1. Note thatthe following sections are intended to provide a quick summary of signal functions.Chapter 9, “System Interface Operation,” describes many of these signals in greater detail,both with respect to how individual signals function and how groups of signals interact.

8.2.1 Address Bus Arbitration SignalsThe address arbitration signals are a collection of input and output signals the 601 uses torequest the address bus, recognize when the request is granted, and indicate to other deviceswhen mastership is granted. For a detailed description of how these signals interact, seeSection 9.3.1, “Address Bus Arbitration.”

111

11

32

41

1423111131

111

1111

111

6481

111

1111111

7

21111

601

BR

BGABB

TSXATS

A0–A31

AP0–AP3

APE

TT4TT0–TT3

TC0–TC1TSIZ0–TSIZ2

TBSTCIWT

GBLCSE0–CSE2

HP_SNP_REQ

AACKARTRY

SHD

2X_PCLKPCLK_ENBCLK_EN

RTC

DBGDBWODBB

DH0–DH31, DL0–DL31

DP0–DP7

DPE

TADRTRY

TEA

INTCKSTP_IN

CKSTP_OUTHRESETSRESET

RSRVSC_DRIVE

SCAN INTERFACE

TEST INTERFACE

SYS_QUIESCRESUME

QUIESC_REQ

ADDRESSARBITRATION

ADDRESSTRANSFER

START

ADDRESSTRANSFER

TRANSFERATTRIBUTE

ADDRESSTERMINATION

CLOCKS

DATAARBITRATION

DATATRANSFER

DATATERMINATION

SYSTEMSTATUS

COP/ SCANINTERFACE

TESTSIGNALS

+3.6 V

59 59


8.2.1.1 Bus Request (BR)—OutputThe bus request (BR) signal is an output signal on the 601. Following are the state meaningand timing comments for the BR signal.

State Meaning Asserted—Indicates that the 601 is requesting mastership of the address bus. See Section 9.3.1, “Address Bus Arbitration.”

Negated—Indicates that the 601 is not requesting the address bus. The 601 may have no bus operation pending, it may be parked, or the ARTRY input was asserted on the previous bus clock cycle.

Timing Comments Assertion—Occurs when the 601 is not parked and a bus transaction is needed. This may occur even if the two possible pipeline accesses have occurred.

Negation—Occurs for at least one bus clock cycle after an accepted, qualified bus grant (see BG and ABB), even if another transaction is pending. It is also negated for at least one bus clock cycle when the assertion of ARTRY is detected on the bus.

8.2.1.2 Bus Grant (BG)—Input The bus grant (BG) signal is an input signal on the 601. Following are the state meaningand timing comments for the BG signal.

State Meaning Asserted—Indicates that the 601 may, with the proper qualification, assume mastership of the address bus. A qualified bus grant occurs when BG is asserted and ABB and ARTRY are not asserted. The ABB signal is driven by the 601 or another bus master, and ARTRY is driven by other bus masters. If the 601 is parked, BR need not be asserted for the qualified bus grant. See Section 9.3.1, “Address Bus Arbitration.”

Negated— Indicates that the 601 is not the next potential address bus master.

Timing Comments Assertion—May occur at any time to indicate the 601 is free to use the address bus. After the 601 assumes bus mastership, it does not check for a qualified bus grant again until the cycle during which the address bus tenure is completed (assuming it has another transaction to run). The 601 does not accept a BG in the cycles between the assertion of any TS or XATS and AACK.

Negation—May occur at any time to indicate the 601 cannot use the bus. The 601 may still assume bus mastership on the bus clock cycle of the negation of BG because during the previous cycle BG indicated to the 601 that it was free to take mastership (if qualified).


8.2.1.3 Address Bus Busy (ABB)The address bus busy (ABB) signal is both an input and an output signal.

8.2.1.3.1 Address Bus Busy (ABB)—OutputFollowing are the state meaning and timing comments for the ABB output signal.

State Meaning Asserted—Indicates that the 601 is the address bus master. See Section 9.3.1, “Address Bus Arbitration.”

Negated— Indicates that the 601 is not using the address bus. If ABB is negated during the bus clock cycle following a qualified bus grant, the 601 did not accept mastership, even if BR was asserted. This can occur if a potential transaction is aborted internally before the transaction is started.

Timing Comments Assertion—Occurs on the bus clock cycle following a qualified BG that is accepted by the processor (see Negated).

Negation—Occurs on the bus clock cycle following the assertion of AACK. If ABB is negated during the bus clock cycle following a qualified bus grant, the 601 did not accept mastership, even if BR was asserted.

High Impedance—Occurs one-half processor clock cycle after ABB is negated.

8.2.1.3.2 Address Bus Busy (ABB)—InputFollowing are the state meaning and timing comments for the ABB input signal.

State Meaning Asserted—Indicates that the address bus is in use. This condition effectively blocks the 601 from assuming address bus ownership, regardless of the BG input. (See Section 9.3.1, “Address Bus Arbitration.”).

Negated—Indicates that the address bus is not owned by another bus master and that it is available to the 601 when accompanied by a qualified bus grant.

Timing Comments Assertion—May occur when the 601 must be prevented from using the address bus (and the processor is not currently asserting ABB).

Negation—May occur whenever the 601 can use the address bus.

Note that this signal is logically ORed with an internally generated address bus busy signal.For more information, see Section 9.3.1, “Address Bus Arbitration.”


8.2.2 Address Transfer Start SignalsAddress transfer start signals are input and output signals that indicate that an address bustransfer has begun. The transfer start (TS) signal identifies the operation as a memorytransaction; extended address transfer start (XATS) identifies the transaction as an I/Ocontroller interface operation.

For detailed information about how TS and XATS interact with other signals, refer toSection 9.3.2, “Address Transfer,” and Section 9.6, “Memory- vs. I/O-Mapped I/OOperations,” respectively.

8.2.2.1 Transfer Start (TS)The TS signal is both an input and an output signal on the 601.

8.2.2.1.1 Transfer Start (TS)—OutputFollowing are the state meaning and timing comments for the TS output signal.

State Meaning Asserted—Indicates that the 601 has begun a memory bus transaction and that the address-bus and transfer-attribute signals are valid. It is also an implied data bus request for a memory transaction (unless it is an address-only operation.)

Negated—Is negated during an I/O controller interface operation.

Timing Comments Assertion—Coincides with the assertion of ABB. Negation—Occurs one bus clock cycle after TS is asserted.High Impedance—Coincides with the negation of ABB.

8.2.2.1.2 Transfer Start (TS)—InputFollowing are the state meaning and timing comments for the TS input signal.

State Meaning Asserted—Indicates that another master has begun a bus transaction and that the address bus and transfer attribute signals are valid for snooping (see GBL).

Negated—Indicates that no bus transaction is occurring.

Timing Comments Assertion—May occur during the assertion of ABB.Negation—Must occur one bus clock cycle after TS is asserted.

8.2.2.2 Extended Address Transfer Start (XATS)The XATS signal is both an input and an output signal on the 601.


8.2.2.2.1 Extended Address Transfer Start (XATS)—OutputFollowing are the state meaning and timing comments for the XATS output signal.

State Meaning Asserted—Indicates that the 601 has begun an I/O controller interface operation and that the first address cycle is valid. It is also an implied data bus request for certain I/O controller interface operation (unless it is an address-only operation.)

Negated—Is negated during an entire memory transaction.

Timing Comments Assertion—Coincides with the assertion of ABB. Negation—Occurs one bus clock cycle after the assertion of XATS.

High Impedance—Coincides with the negation of ABB.

8.2.2.2.2 Extended Address Transfer Start (XATS)—InputFollowing are the state meaning and timing comments for the XATS input signal.

State Meaning Asserted—Indicates that the 601 must check for an I/O controller interface operation reply with a receiver tag that matches bits 28–31 of the 601 PID register.

Negated—Indicates that there is no need to check for an I/O controller interface operation reply.

Timing Comments Assertion—May occur while ABB is asserted. Negation—Must occur one bus clock cycle after XATS is asserted.

8.2.3 Address Transfer SignalsThe address transfer signals are used to transmit the address and to generate and monitorparity for the address transfer. For a detailed description of how these signals interact, referto Section 9.3.2, “Address Transfer.”

8.2.3.1 Address Bus (A0–A31)The address bus (A0–A31) consists of 32 signals that are both input and output signals.

8.2.3.1.1 Address Bus (A0–A31)—Output (Memory Operations)Following are the state meaning and timing comments for the A0–A31 output signals.

State Meaning Asserted/Negated—Represents the physical address of the data to be transferred. On burst transfers, the address bus presents the quad-word–aligned address containing the critical code/data that missed the cache. See Section 9.3.2, “Address Transfer.”

Timing Comments Assertion/Negation—Occurs on the bus clock cycle after a qualified bus grant (coincides with assertion of ABB and TS.)

High Impedance—Occurs one bus clock cycle after AACK is asserted.


8.2.3.1.2 Address Bus (A0–A31)—Input (Memory Operations)Following are the state meaning and timing comments for the A0–A31 input signals.

State Meaning Asserted/Negated—Represents the physical address of a snoop operation.

Timing Comments Assertion/Negation—Must occur on the same bus clock cycle as the assertion of TS.

8.2.3.1.3 Address Bus (A0–A31)—Output (I/O Controller Interface Operations)

Following are the state meaning and timing comments for the address bus signals (A0 toA31) for output I/O controller interface operations on the 601.

State Meaning Asserted/Negated—For I/O controller interface operations where the 601 is the master, the address tenure consists of two packets (each requiring a bus cycle). For packet 0, these signals convey control and tag information. For packet 1, these signals represent the physical address of the data to be transferred.

Timing Comments Assertion/Negation—Address tenure consists of two beats. The first beat occurs on the bus clock cycle after a qualified bus grant, coinciding with XATS. The address bus transitions to the second beat on the next bus clock cycle.

High Impedance—Occurs on the bus clock cycle after AACK is asserted.

8.2.3.1.4 Address Bus (A0–A31)—Input (I/O Controller Interface Operations)

Following are the state meaning and timing comments for input I/O controller interfaceoperations on the 601.

State Meaning Asserted/Negated—When the 601 is not the master, it snoops (and checks address parity) on the first address beat only of all I/O controller interface operations for an I/O reply operation with a receiver tag that matches its PID tag. See Section 9.6, “Memory- vs. I/O-Mapped I/O Operations.”

Timing Comments Assertion/Negation—The 601 looks for only the first beat of the I/O transfer address tenure, which coincides with XATS. The second address bus beat is not required by the 601.

8.2.3.2 Address Bus Parity (AP0–AP3)The address bus parity (AP0–AP3) signals are both input and output signals reflecting onebit of odd-byte parity for each of the four bytes of address when a valid address is on thebus.


8.2.3.2.1 Address Bus Parity (AP0–AP3)—OutputFollowing are the state meaning and timing comments for the AP0–AP3 output signal onthe 601.

State Meaning Asserted/Negated—Represents odd parity for each of four bytes of the physical address for a transaction. Odd parity means that an odd number of bits, including the parity bit, are driven high. The signal assignments correspond to the following:

AP0 A0–A7AP1 A8–A15AP2 A16–A23AP3 A24–A31

For more information, see Section 9.3.2.1, “Address Bus Parity.”

Timing Comments Assertion/Negation—The same as A0–A31.High Impedance—The same as A0–A31.

8.2.3.2.2 Address Bus Parity (AP0–AP3)—InputFollowing are the state meaning and timing comments for the AP0–AP3 input signal on the601.

State Meaning Asserted/Negated—Represents odd parity for each of four bytes of the physical address for snooping and I/O controller interface operations. Detected even parity causes the processor to enter the checkstop state if address parity checking is enabled in the HID0 register (see Section 2.3.3.13.1, “Checkstop Sources and Enables Register—HID0”). (See also the APE signal description below.)

Timing Comments Assertion/Negation—The same as A0–A31.

8.2.3.3 Address Parity Error (APE)—OutputThe address parity error (APE) signal is an output signal on the 601. Note that the (APE)signal is an open-drain type output, and requires an external pull-up resistor (for example,10 KΩ to Vdd) to assure proper de-assertion of the (APE) signal. Following are the statemeaning and timing comments for the APE signal on the 601. For more information, seeSection 9.3.2.1, “Address Bus Parity.”

State Meaning Asserted—Indicates incorrect address bus parity has been detected by the 601 on a snoop (GBL asserted). This includes the first address beat of an I/O controller interface operation.

Negated—Indicates that the 601 has not detected a parity error (even parity) on the address bus.

Timing Comments Assertion—Occurs on the second bus clock cycle after TS or XATS is asserted.High Impedance—Occurs on the third bus clock cycle after TS or XATS is asserted.


8.2.4 Address Transfer Attribute SignalsThe transfer attribute signals are a set of signals that further characterize the transfer—suchas the size of the transfer, whether it is a read or write operation, and whether it is a burstor single-beat transfer. For a detailed description of how these signals interact, seeSection 9.3.2, “Address Transfer.”

Note that some signal functions vary depending on whether the transaction is a memoryaccess or an I/O access. For a description of how these signals function for I/O controllerinterface operations, see Section 9.6, “Memory- vs. I/O-Mapped I/O Operations.”

8.2.4.1 Transfer Type (TT0–TT4)The transfer type (TT0–TT4) signals consist of four input/output signals and one output-only signal on the 601. For a complete description of TT0–TT4 signals, see Table 8-1 andfor transfer type encodings, see Table 8-2.

8.2.4.1.1 Transfer Type (TT0–TT4)—OutputFollowing are the state meaning and timing comments for the TT0–TT4 output signals onthe 601.

State Meaning Asserted/Negated—Indicates the type of transfer in progress. These bits roughly correspond to the following decoded operations:

• Atomic • Read/write • Invalidate • Memory cycle

For I/O controller interface operations these signals are part of the extended address transfer code (XATC) along with TSIZ and TBST:

XATC(0–7)=TT(0–3)||TBST||TSIZ(0–2).

TT4 is driven negated as an output on the 601 and is defined for future expansion.

Timing Comments Assertion/Negation/High Impedance—The same as A0–A31.

8.2.4.1.2 Transfer Type (TT0–TT3)—InputFollowing are the state meaning and timing comments for the TT0–TT3 input signals onthe 601.

State Meaning Asserted/Negated—Indicates the type of transfer in progress (see Table 8-2). For I/O controller interface operations these signals form part of the XATC and are snooped by the 601 if XATS is asserted.



Table 8-1 provides the signal descriptions for TT0–TT4.

Table 8-2 describes the encodings for TT0–TT3.

Table 8-1. TT0–TT4 Signal Description

Signal Description

TT0 Special operations: This signal is asserted whenever a bus transaction is run in response to a lwarx/stwcx. instruction pair, a TLBI (translation lookaside buffer invalidate) operation, or either an eciwx or ecowx instruction.

TT1 Read (or write) operations: This signal indicates whether the transaction is a read (TT1 high) or a write (TT1 low). This assumes that the transaction is not address-only.

TT2 Invalidate operations: When asserted with GBL, the TT2 output signal indicates that all other caches in the system should invalidate the cache entry on a snoop hit. If the snoop hit is to a modified entry, the sector should be copied back before being invalidated.

TT3 Address-only operations: This signal, when asserted, indicates that the data transfer is to/from memory. External logic can synthesize a data bus request from the combined assertions of TS (or XATS) and TT3. If TT3 is not asserted with the address, the associated bus transaction is considered to be a broadcast operation that all potential bus masters must honor (or a reserved operation), except for the external control functions (eciwx and ecowx) which require both address and data tenures.

TT4 Reserved. Always negated (low state). (For expandability)

Table 8-2. Transfer Type Encodings

TT0 TT1 TT2 TT3 Operation Bus Transaction 1 Comment

0 0 0 0 Clean sector Address only Due to cache control operation2

0 0 0 1 Write with flush Single-beat write —

0 0 1 0 Flush sector Address only Due to cache control operation 2

0 0 1 1 Write with kill Burst Cache sector writes (replacement sector copy backs and snoop push operations)

0 1 0 0 sync Address only Due to cache control operation 2

0 1 0 1 Read Single-beat read or burst —

0 1 1 0 Kill sector Address only Store hit on shared sector or cache control operation 2

0 1 1 1 Read with intent to modify Burst Store cache miss

1 0 0 0 — — Reserved

1 0 0 1 Write with flush atomic Single-beat write Caused by stwcx.

1 0 1 0 External control out Single-beat write Caused by ecowx 3

1 0 1 1 — — Reserved


1. These are the transactions the 601 produces for the given encodings, and may not be the sametransactions produced by other bus masters with the same encoding. For example the encoding b'0001' isa single-beat write coming from the 601, but another master may use this encoding or another type ofwrite transaction. Bus participants should use the TT pins in conjunction with the other transfer attributepins to determine the type of transaction.

2. Cache control operations resulting from explicit cache control instructions (for example, dclf, sync, dclz,dcli).

3. The signal encodings for these operations do not use the TT0 and TT3 signals in the manner described inTable 8-1. Note that TT4 is reserved.

8.2.4.2 Transfer Size (TSIZ0–TSIZ2)The transfer size (TSIZ0–TSIZ2) signals consist of three input/output signals on the 601.

8.2.4.2.1 Transfer Size (TSIZ0–TSIZ2)—OutputFollowing are the state meaning and timing comments for the TSIZ0–TSIZ2 output signalson the 601.

State Meaning Asserted/Negated—For memory accesses, these signals along with TBST, indicate the data transfer size for the current bus operation, as shown in Table 8-3. Table 9-2 shows how the TSIZ signals are used with the address signals for aligned transfers. Table 9-3 shows how the TSIZ signals are used with the address signals for misaligned transfers. For I/O transfer protocol, these signals form part of the I/O transfer code (see the description in Section 8.2.4.1, “Transfer Type (TT0–TT4)”).

For external control instructions (eciwx and ecowx), TSIZ0–TSIZ2 are used to output bits 29–31 of the external access register (EAR), which are used to form the resource ID (TBST||TSIZ0–TSIZ2).


1 1 0 0 TLB invalidate Address only —

1 1 0 1 Read atomic Single-beat read or burst Caused by lwarx instruction

1 1 1 0 External control in Single-beat read Caused by eciwx 3

1 1 1 1 Read with intent to modify atomic

Burst Caused by stwcx. instruction

Table 8-2. Transfer Type Encodings (Continued)

TT0 TT1 TT2 TT3 Operation Bus Transaction 1 Comment


8.2.4.2.2 Transfer Size (TSIZ0–TSIZ2)—InputFollowing are the state meaning and timing comments for the TSIZ0–TSIZ2 input signalson the 601.

State Meaning Asserted/Negated—Represents the size of the current transfer, as shown in Table 8-3. For the I/O controller interface protocol, these signals form part of the I/O transfer code (see Section 8.2.4.1, “Transfer Type (TT0–TT4)”).


8.2.4.3 Transfer Burst (TBST)The transfer burst (TBST) signal is an input/output signal on the 601.

8.2.4.3.1 Transfer Burst (TBST)—OutputFollowing are the state meaning and timing comments for the TBST output signal.

State Meaning Asserted—Indicates that a burst transfer is in progress.

Negated—Indicates that a burst transfer is not in progress. Also, part of I/O transfer code (see Section 8.2.4.1, “Transfer Type (TT0–TT4)”).

For external control instructions (eciwx and ecowx), TBST is used to output bit 28 of the EAR, which is used to form the resource ID (TBST||TSIZ0–TSIZ2).


Table 8-3. Data Transfer Size

TBSTTSIZ0–TSIZ2

TransferSize

Asserted 010 Burst (32 bytes)

Negated 000 8 bytes

Negated 001 1 byte

Negated 010 2 bytes

Negated 011 3 bytes

Negated 100 4 bytes

Negated 101 5 bytes

Negated 110 6 bytes

Negated 111 7 bytes


8.2.4.3.2 Transfer Burst (TBST)—InputFollowing are the state meaning and timing comments for the TBST input signal.

State Meaning Asserted/Negated—Indicates that a burst transfer is in progress when asserted and TSIZ0–TSIZ2 are set to 010. For the I/O transfer protocol, this signal forms part of the I/O transfer code (see Section 8.2.4.1, “Transfer Type (TT0–TT4)”).


8.2.4.4 Transfer Code (TC0–TC1)—OutputThe transfer code (TC0–TC1) consists of two output signals on the 601. Following are thestate meaning and timing comments for the TC0–TC1 signals.

State Meaning Asserted/Negated—Represents a special encoding for the transfer in progress (see Table 8-4).


8.2.4.5 Cache Inhibit (CI)—OutputThe cache inhibit (CI) signal is an output signal on the 601. Following are the state meaningand timing comments for the CI signal.

State Meaning Asserted—Indicates that a single-beat transfer will not change the cache, reflecting the setting of the I bit for the block or page that contains the address of the current transaction.

Negated—Indicates that a burst transfer will allocate a sector in the 601 data cache.


Table 8-4. Encodings for TC0–TC1

Signal Description

TC0 Depends on whether the current transaction is a read or write operation; therefore, TC0 should be used with TT1. On a read operation, TC0 asserted indicates the transaction is an instruction fetch operation; otherwise, the read operation is a data operation. Asserting TC0 for write operations indicates the cache sector associated with a write is being invalidated; TC0 negated indicates the cache sector associated with a write is not being invalidated.

TC1 TC1, when asserted, indicates that an operation to reload the other sector is queued; therefore, the next bus transaction will likely be to the same page of memory. After the addressed sector in a cache line is loaded from memory, the 601 attempts to load the other sector in the cache line. This is a low-priority bus operation and may not be the next transaction. The assertion of TC1 suggests that the next access may be to the same page; the hint may be wrong depending on the bus traffic/code execution dynamics.


8.2.4.6 Write-Through (WT)—OutputThe write-through (WT) signal is an output signal on the 601. Following are the statemeaning and timing comments for the WT signal.

State Meaning Asserted—Indicates that a single-beat transaction is write-through, reflecting the value of the W bit for the block or page that contains the address of the current transaction. For burst writes, this indicates that the write is the result of a dcbf and dcbst instruction.

Negated—Indicates that a transaction is not write-through. For bursts it is negated for cast-outs and snoop pushes.


8.2.4.7 Global (GBL)The global (GBL) signal is an input/output signal on the 601.

8.2.4.7.1 Global (GBL)—OutputFollowing are the state meaning and timing comments for the GBL output signal.

State Meaning Asserted—Indicates that a transaction is global, reflecting the setting of the M bit for the block or page that contains the address of the current transaction (except in the case of copy-back operations, which are non-global.)

Negated—Indicates that a transaction is not global.


8.2.4.7.2 Global (GBL)—InputFollowing are the state meaning and timing comments for the GBL input signal.

State Meaning Asserted— Indicates that a transaction must be snooped by the 601.

Negated—Indicates that a transaction is not snooped by the 601 (even if TT0–TT4 indicate an invalidation transaction).


8.2.4.8 Cache Set Element (CSE0–CSE2)—OutputThe cache set element (CSE0–CSE2) signals consist of three output signals on the 601.Following are the state meaning and timing comments for the CSE signals.


State Meaning Asserted/Negated—Represents the cache replacement set element for the current transaction reloading into or writing out of the cache. Can be used with the address bus and the transfer attribute signals to externally track the state of each cache sector in the 601’s cache. See Section 4.7.4, “MESI Hardware Considerations.”


8.2.4.9 High-Priority Snoop Request (HP_SNP_REQ)The high-priority snoop request (HP_SNP_REQ) signal is an input signal (input-only) onthe 601. This pin must be enabled by setting HID0[31] if it is to be used. Following are thestate meaning and timing comments for the HP_SNP_REQ signal

State Meaning Asserted—Indicates that the 601 may add an additional reserved queue position to the list of available queue positions for push transactions that are a result of a snoop hit.

Negated—Indicates that the 601 will not make available the reserved queue for a snoop hit push resulting from a transaction. This is the “normal” mode.

Timing Comments Assertion/Negation—Must be valid through the entire address tenure.

Note: This pin is a feature of the 601 only and will not be available in any other PowerPCprocessors.

8.2.5 Address Transfer Termination SignalsThe address transfer termination signals are used to indicate either that the address phaseof the transaction has completed successfully or must be repeated, and when it should beterminated. These signals are also used to maintain MESI protocol. For detailedinformation about how these signals interact, see Section 9.3.3, “Address TransferTermination.”

8.2.5.1 Address Acknowledge (AACK)—InputThe address acknowledge (AACK) signal is an input signal (input-only) on the 601.Following are the state meaning and timing comments for the AACK signal.

State Meaning Asserted—Indicates that the address phase of a transaction is complete. The address bus will go to a high impedance state on the next bus clock cycle. The 601 samples ARTRY on the bus clock cycle following the assertion of AACK.

Negated—(During ABB) indicates that the address bus and the transfer attributes must remain driven.


Timing Comments Assertion—May occur as early as the bus clock cycle after TS or XATS is asserted; assertion can be delayed to allow adequate address access time for slow devices. For example, if an implementation supports slow snooping devices, an external arbiter can postpone the assertion of AACK.

Negation—Must occur one bus clock cycle after the assertion of AACK.

8.2.5.2 Address Retry (ARTRY)The address retry (ARTRY) signal is both an input and output signal on the 601.

8.2.5.2.1 Address Retry (ARTRY)—OutputFollowing are the state meaning and timing comments for the ARTRY output signal.

State Meaning Asserted—Indicates that the 601 detects a condition in which a snooped address tenure must be retried (see Table 8-5 for encoding). If the 601 needs to update memory as a result of the snoop that caused the retry, the 601 asserts BR (unless it is parked).

High Impedance—Indicates that the 601 does not need the snooped address tenure to be retried.

Timing Comments Assertion—Occurs two bus cycles immediately following the assertion of TS if a retry is required.

Negation—Occurs the second bus cycle after the assertion of AACK. Since this signal may be simultaneously driven by multiple devices, it negates in a unique fashion. First the buffer goes to high impedance for one bus cycle, then it is driven high for one 2XPCLK cycle before returning to high impedance.

This special method of negation may be disabled by setting HID0[29].

Table 8-5 shows the relationship between the SHD and ARTRY signals.

Table 8-5. SHD and ARTRY Signals

SHD ARTRY Description

Z Z No snoop hit, no busy pipeline

Z A Pipeline busy

A Z Snoop hit shared

A A Snoop hit modified


8.2.5.2.2 Address Retry (ARTRY)—InputFollowing are the state meaning and timing comments for the ARTRY input signal.

State Meaning Asserted—If the 601 is the address bus master, ARTRY indicates that the 601 must retry the preceding address tenure and immediately negate BR (if asserted). If the 601 is not the address bus master, this input indicates that the 601 should immediately negate BR for one bus clock cycle following the assertion of ARTRY. Note that the subsequent address retried may not be the same one associated with the assertion of the ARTRY signal.

Negated/High Impedance—Indicates that the 601 does not need to retry the last address tenure.

Timing Comments Assertion—Must occur by the bus clock cycle immediately following the assertion of AACK if a retry is required.

Negation—Must occur during the second cycle after the assertion of AACK. Note that this signal is sampled only following the assertion of AACK.

8.2.5.3 Shared (SHD)The shared (SHD) signal is both an input and output signal on the 601.

8.2.5.3.1 Shared (SHD)—OutputFollowing are the state meaning and timing comments for the SHD output signal.

State Meaning Asserted—Indicates that the 601 either needs the data to be marked shared (in response to a snoop hit for transaction not requiring invalidation) or with ARTRY indicates the 601 has a hit on a cache sector marked as modified.

Negated/High Impedance—Indicates that the 601 did not have a cache hit on the snooped address.

Timing Comments Assertion—The same as ARTRY.Negation—The same as ARTRY.High Impedance—The same as ARTRY.

See Table 8-5 for information about SHD and ARTRY signals.


8.2.5.3.2 Shared (SHD)—InputFollowing are the state meaning and timing comments for the SHD input signal.

State Meaning Asserted—Indicates that for a self-generated transaction, the 601 must allocate the incoming sector as shared (unmodified). If ARTRY is asserted, the transaction must be retried after the other master updates memory.

Negated—Indicates that the address for the current transaction is not in any other cache.

Timing Comments Assertion—The same as ARTRY.Negation—The same as ARTRY.

8.2.6 Data Bus Arbitration SignalsLike the address bus arbitration signals, data bus arbitration signals maintain an orderlyprocess for determining data bus mastership. Note that there is no data bus arbitration signalequivalent to the address bus arbitration signal BR (bus request), because, except foraddress-only transactions, TS and XATS imply data bus requests. For a detailed descriptionon how these signals interact, see Section 9.4.1, “Data Bus Arbitration.”

One special signal, DBWO, allows the 601 to be configured dynamically to write data outof order with respect to read data. For detailed information about using DBWO, seeSection 9.10, “Using DBWO (Data Bus Write Only).”

8.2.6.1 Data Bus Grant (DBG)—InputThe data bus grant (DBG) signal is an input signal (input-only) on the 601. Following arethe state meaning and timing comments for the DBG signal.

State Meaning Asserted—Indicates that the 601 may, with the proper qualification, assume mastership of the data bus. The 601 derives a qualified data bus grant when DBG is asserted and DBB, DRTRY, and ARTRY are negated; that is, the data bus is not busy (DBB is negated), there is no outstanding attempt to retry the current data tenure (DRTRY is negated), and there is no outstanding attempt to perform an ARTRY of the associated address tenure.

Negated—Indicates that the 601 must hold off its data tenures.

Timing Comments Assertion—May occur any time to indicate the 601 is free to take data bus mastership. It is not sampled until TS or XATS is asserted.

Negation—May occur at any time to indicate the 601 cannot assume data bus mastership.


8.2.6.2 Data Bus Write Only (DBWO)—InputThe data bus write only (DBWO) signal is an input signal (input-only) on the 601.Following are the state meaning and timing comments for the DBWO signal.

State Meaning Asserted—Indicates that the 601 may run the data bus tenure for an outstanding write address even if a read address is pipelined before the write address. If DBWO is asserted, the 601 only assumes data bus ownership for a pending data bus write operation (that is, the 601 does not take the data bus for a pending read operation if this input is asserted along with DBG). Refer to Section 9.10, “Using DBWO (Data Bus Write Only),” for detailed instructions for using DBWO.

Negated—Indicates that the 601 must run the data bus tenures in the same order as the address tenures.

Timing Comments Assertion—Must occur no later than a qualified DBG for a previous write tenure. Do not assert if no pending data bus write tenures are pending from previous address tenures.

Negation—May occur any time after a qualified DBG and before the next assertion of DBG.

8.2.6.3 Data Bus Busy (DBB)The data bus busy (DBB) signal is both an input and output signal on the 601.

8.2.6.3.1 Data Bus Busy (DBB)—OutputFollowing are the state meaning and timing comments for the DBB output signal.

State Meaning Asserted—Indicates that the 601 is the data bus master. The 601 always assumes data bus mastership if it needs the data bus and is given a qualified data bus grant (see DBG).

Negated—Indicates that the 601 is not using the data bus.

Timing Comments Assertion—Occurs during the bus clock cycle following a qualified DBG.

Negation—Occurs during the bus clock cycle following the assertion of the final TA.

High Impedance—Occurs one-half processor clock cycle after DBB is negated.

8.2.6.3.2 Data Bus Busy (DBB)—InputFollowing are the state meaning and timing comments for the DBB input signal.

State Meaning Asserted—Indicates that another device is bus master.Negated—Indicates that the data bus is free (with proper qualification, see DBG) for use by the 601.


Timing Comments Assertion—Must occur when the 601 must be prevented from using the data bus.

Negation—May occur whenever the data bus is available.

8.2.7 Data Transfer SignalsLike the address transfer signals, the data transfer signals are used to transmit data and togenerate and monitor parity for the data transfer. For a detailed description of how the datatransfer signals interact, see Section 9.4.2, “Data Transfer.”

8.2.7.1 Data Bus (DH0–DH31, DL0–DL31) The data bus (DH0–DH31 and DL0–DL31) consists of 64 signals that are both input andoutput on the 601. Following are the state meaning and timing comments for the DH andDL signals.

State Meaning The data bus has two halves—data bus high (DH) and data bus low (DL). See Table 8-6 for the data bus lane assignments. I/O controller interface operations use DH exclusively (that is, there are no 64-bit, I/O transfers).

Timing Comments The data bus is driven once for non-cached transactions and four times for cache transactions (bursts).

Table 8-6. Data Bus Lane Assignments

Data Bus Signals Byte Lane

DH0–DH7 0

DH8–DH15 1

DH16–DH23 2

DH24–DH31 3

DL0–DL7 4

DL8–DL15 5

DL16–DL23 6

DL24–DL31 7


8.2.7.1.1 Data Bus (DH0–DH31, DL0–DL31)—OutputFollowing are the state meaning and timing comments for the DH and DL output signals.

State Meaning Asserted/Negated—Represents the state of data during a data write. Unused byte lanes are driven to deterministic values.

Timing Comments Assertion/Negation—Initial beat coincides with DBB and, for bursts, transitions on the bus clock cycle following each assertion of TA.High Impedance—Occurs on the bus clock cycle after the final assertion of TA.

8.2.7.1.2 Data Bus (DH0–DH31, DL0–DL31)—InputFollowing are the state meaning and timing comments for the DH and DL input signals.

State Meaning Asserted/Negated—Represents the state of data during a data read transaction.

Timing Comments Assertion/Negation—Data must be valid on the same bus clock cycle that TA is asserted; however, if DRTRY is asserted, valid data must coincide with the assertion of the final DRTRY for a given data beat.

8.2.7.2 Data Bus Parity (DP0–DP7)The eight data bus parity (DP0–DP7) signals on the 601 are both output and input signals.

8.2.7.2.1 Data Bus Parity (DP0–DP7)—OutputFollowing are the state meaning and timing comments for the DP output signals.

State Meaning Asserted/Negated—Represents odd parity for each of eight bytes of data write transactions. Odd parity means that an odd number of bits, including the parity bit, are driven high. The signal assignments are listed in Table 8-7.

Timing Comments Assertion/Negation—The same as DL0–DL31.High Impedance—The same as DL0–DL31.


8.2.7.2.2 Data Bus Parity (DP0–DP7)—InputFollowing are the state meaning and timing comments for the DP input signals.

State Meaning Asserted/Negated—Represents odd parity for each byte of read data. Parity is checked on all data byte lanes, regardless of the size of the transfer. Detected even parity causes a checkstop if data parity errors are enabled in the HID register. (See DPE.)

Timing Comments Assertion/Negation—The same as DL0–DL31.

8.2.7.3 Data Parity Error (DPE)—OutputThe data parity error (DPE) signal is an output signal (output-only) on the 601. Note thatthe (DPE) signal is an open-drain type output, and requires an external pull-up resistor (forexample, 10 KΩ to Vdd) to assure proper de-assertion of the (DPE) signal. Following arethe state meaning and timing comments for the DPE signal.

State Meaning Asserted—Indicates incorrect data bus parity.Negated—Indicates correct data bus parity.

Timing Comments Assertion—Occurs on the second bus clock cycle after TA is asserted to the 601.

High Impedance—Occurs on the third bus clock cycle after TA is asserted to the 601.

8.2.8 Data Transfer Termination SignalsData termination signals are required after each data beat in a data transfer. Note that in asingle-beat transaction, the data termination signals also indicate the end of the tenure,while in burst accesses, the data termination signals apply to individual beats and indicatethe end of the tenure only after the final data beat.

Table 8-7. DP0–DP7 Signal Assignments

Signal Name Signal Assignments

DP0 DH0–DH7

DP1 DH8–DH15

DP2 DH16–DH23

DP3 DH24–DH31

DP4 DL0–DL7

DP5 DL8–DL15

DP6 DL16–DL23

DP7 DL24–DL31


For a detailed description of how these signals interact, see Section 9.4.3, “Data TransferTermination.”

8.2.8.1 Transfer Acknowledge (TA)—InputThe transfer acknowledge (TA) signal is an input signal (input-only) on the 601. Followingare the state meaning and timing comments for the TA signal.

State Meaning Asserted— Indicates that a single-beat data transfer completed successfully or that a data beat in a burst transfer completed successfully (unless DRTRY is asserted on the next bus clock cycle). Note that TA must be asserted for each data beat in a burst transaction. For more information refer to Section 9.4.3, “Data Transfer Termination.”

Negated—(During DBB) indicates that, until TA is asserted, the 601 must continue to drive the data for the current write or must wait to sample the data for reads.

Timing Comments Assertion—Must not occur before AACK for the current transaction (if the address retry mechanism is to be used; otherwise, assertion may occur at any time during the assertion of DBB). The system can withhold assertion of TA to indicate that the 601 should insert wait states to extend the duration of the data beat.

Negation—Must occur after the bus clock cycle of the final (or only) data beat of the transfer. For a burst transfer, the system can assert TA for one bus clock cycle and then negate it to advance the burst transfer to the next beat and insert wait states during the next beat.

8.2.8.2 Data Retry (DRTRY)—InputThe data retry (DRTRY) signal is input only on the 601. Following are the state meaningand timing comments for the DRTRY signal.

State Meaning Asserted—Indicates that the 601 must invalidate the data from the previous read operation.

Negated—Indicates that data presented with TA on the previous read operation is valid. This is essentially a late TA to allow speculative forwarding of data (with TA) during reads. Note that DRTRY is ignored for write transactions

Timing Comments Assertion—Must occur during the bus clock cycle immediately after TA is asserted if a retry is required. The DRTRY signal may be held asserted for multiple bus clock cycles. When DRTRY is negated, data must be valid.

Negation—Must occur during the bus clock cycle after a valid data


beat. This may occur several cycles after DBB is negated, effectively extending the data bus tenure.

8.2.8.3 Transfer Error Acknowledge (TEA)—InputThe transfer error acknowledge (TEA) signal is input only on the 601. Following are thestate meaning and timing comments for the TEA signal.

State Meaning Asserted—Indicates that a bus error occurred. Causes a machine check exception (and possibly causes the processor to enter checkstop state if machine check enable bit is cleared (MSR[ME] = 0)). For more information see Section 5.4.2.2, “Checkstop State (MSR[ME] = 0).” Assertion terminates the current transaction; that is, assertion of TA and DRTRY are ignored. The assertion of TEA causes the negation/high impedance of DBB in the next clock cycle. However, data entering the GPR or the cache are not invalidated.

Negated—Indicates that no bus error was detected.

Timing Comments Assertion—May be asserted while DBB and/or DRTRY is asserted.

Negation— TEA must be negated no later than the negation of DBB or the last DRTRY.

8.2.9 System Status SignalsMost system status signals are input signals that indicate when exceptions are received,when checkstop conditions have occurred, and when the 601 must be reset. The 601generates the output signal, CKSTP_OUT, when it detects a checkstop condition. For adetailed description of these signals, see Section 9.7, “Interrupt, Checkstop, and ResetSignals.”

8.2.9.1 Interrupt (INT)—InputThe interrupt (INT) signal is input only. Following are the state meaning and timingcomments for the INT signal.

State Meaning Asserted—The 601 latches the interrupt condition if MSR[EE] is set; otherwise, the 601 ignores the interrupt condition. To guarantee that the 601 will take the external interrupt, the INT pin must be held active until the 601 takes the interrupt; otherwise, whether the 601 takes an external interrupt, depends on whether the MSR[EE] bit was set while the INT signal was held active.

Negated—Indicates that normal operation should proceed. See Section 9.7.1, “External Interrupt.”


Timing Comments Assertion—May occur at any time. Negation—May occur any time after the minimum pulse width has been met. (Minimum pulse width is three processor clock cycles.) After the minimum pulse width has been met, an interrupt exception occurs.

8.2.9.2 Checkstop Input (CKSTP_IN)—InputThe checkstop input (CKSTP_IN) signal is input only on the 601. Following are the statemeaning and timing comments for the CKSTP_IN signal.

State Meaning Asserted—Indicates that the 601 must terminate operation by internally gating off all clocks. Once CKSTP_IN has been asserted it must remain asserted until the system has been reset.

Negated—Indicates that normal operation should proceed. See Section 9.7.2, “Checkstops.”

Timing Comments Assertion—May occur at any time and may be asserted asynchronously to the input clocks. CKSTP_IN must be asserted for a minimum of three PCLK_EN clock cycles. Or, it may be asserted synchronously meeting setup and hold times (specified in the PowerPC 601 RISC Microprocessor Hardware Specifications) and must be asserted for at least two PCLK_EN clock cycles.

Negation—May occur any time after the CKSTP_OUT output signal has been asserted.

8.2.9.3 Checkstop Output (CKSTP_OUT)—OutputThe checkstop output (CKSTP_OUT) signal is output only on the 601. Note that the(CKSTP_OUT) signal is an open-drain type output, and requires an external pull-upresistor (for example, 10 KΩ to Vdd) to assure proper de-assertion of the (CKSTP_OUT)signal. Following are the state meaning and timing comments for the CKSTP_OUT signal.

State Meaning Asserted—Indicates that the 601 has detected a checkstop condition and has ceased operation.

Negated—Indicates that the 601 is operating normally.See Section 9.7.2, “Checkstops.”

Timing Comments Assertion—May occur at any time and may be asserted asynchronously to the 601 input clocks.

Negation—Requires HRESET assertion.

8.2.9.4 Reset SignalsThere are two reset signals on the 601—hard reset (HRESET) and soft reset (SRESET).Descriptions of the reset signals are as follows.


8.2.9.4.1 Hard Reset (HRESET)—InputThe hard reset (HRESET) signal is input only and must be used at power-on to properlyreset the processor. Following are the state meaning and timing comments for the HRESETsignal.

State Meaning Asserted—Initiates a complete hard reset operation when this input transitions from asserted to negated. Causes a reset exception as described in Section 5.4.1.2, “Hard Reset.” Output drivers are released to high impedance within three clocks after the assertion of HRESET.

Negated—Indicates that normal operation should proceed. See Section 9.7.3, “Reset Inputs.”

Timing Comments Assertion—May occur at any time and may be asserted asynchronously to the 601 input clocks.

Negation—May occur any time after the minimum reset pulse width has been met. (Minimum pulse width is 300 processor clock cycles.)

This input has additional functionality in certain test modes.

8.2.9.4.2 Soft Reset (SRESET)—InputThe soft reset (SRESET) signal is input only. Following are the state meaning and timingcomments for the SRESET signal.

State Meaning Asserted— Initiates processing for a reset exception as described in Section 5.4.1.1, “Soft Reset.”

Negated—Indicates that normal operation should proceed. See Section 9.7.3, “Reset Inputs.”

Timing Comments Assertion—May occur at any time.

Negation—May occur any time after the minimum soft-reset pulse width has been met. (Minimum pulse width is 10 processor clock cycles.)

This input has additional functionality in certain test modes.

8.2.9.5 System Quiesced (SYS_QUIESC)The system quiesced (SYS_QUIESC) signal is input only. Following are the state meaningand timing comments for the SYS_QUIESC signal.

State Meaning Asserted—Enables soft stop in the 601. For more information about soft stop state, see Section 9.7.4, “Soft Stop Control Signals.”

Negated—Indicates that soft stop is not enabled in the 601processor.

Timing Comments Assertion/Negation—Must meet setup and hold times as described in the PowerPC 601 RISC Microprocessor Hardware Specifications.


Note that systems that do not use this signal should tie it low.

8.2.9.6 Resume (RESUME)The resume (RESUME) signal is input only. Following are the state meaning and timingcomments for the RESUME signal.

State Meaning Asserted—Restarts the 601 after a soft stop.

Negated—Indicates that the 601 is not allowed to resume normal operation if a soft stop has occurred.

Timing Comments Assertion—May occur at any time and may be asserted asynchronously to the 601 input clocks. RESUME must be asserted for a minimum of three PCLK_EN clock cycles. Or, it may be asserted synchronously meeting setup and hold times (specified in the PowerPC 601 RISC Microprocessor Hardware Specifications) and must be asserted for at least two PCLK_EN clock cycles.

Negation—May occur any time after the minimum pulse width has been met.

Note that systems that do not use this signal should tie it low.

8.2.9.7 Quiesce Request (QUIESC_REQ) The quiesce request (QUIESC_REQ) signal is output only. Following are the state meaningand timing comments for the QUIESC_REQ signal.

State Meaning Asserted—Indicates that the 601 is requesting a soft stop for the system.

Negated—Indicates that the 601 is operating normally.

Timing Comments Assertion—May occur at any time to indicate that the 601 is requesting a soft stop.

Negation—May occur at any time to indicate that the 601 is not requesting a soft stop.

8.2.9.8 Reservation (RSRV)—OutputThe reservation (RSRV) signal is output only on the 601. Following are the state meaningand timing comments for the RSRV signal.

State Meaning Asserted/Negated—Represents the state of the reservation coherency bit in the reservation address register that is used by the lwarx and stwcx. instructions. See Section 9.8.1, “Support for the lwarx/stwcx. Instruction Pair.”


Timing Comments Assertion/Negation—Occurs synchronously with respect to bus clock cycles. The execution of an lwarx instruction sets the internal reservation condition. When the next bus transition occurs, RSRV is asserted.

8.2.9.9 Driver Mode (SC_DRIVE) The driver mode (SC_DRIVE) signal is input only on the 601. Following are the statemeaning and timing comments for the SC_DRIVE signal.

State Meaning Asserted—Indicates that the drive current for the following output buffers is increased; ABB, DBB, ARTRY, SHD, TS, XATS (approximately 2x).

Negated—The drive current for the six signals above will be the same as all other signals for the 601.

Timing Comments Assertion/Negation—This is not a dynamic signal; it must not change after HRESET is negated.

8.2.10 COP/Scan InterfaceThe 601 has extensive on-chip test capability including the following:

• Built-in self test (BIST)• Debug control/observation (COP)• Boundary scan (IEEE 1149.1 compatible interface)

The BIST hardware is exercised as part of the POR sequence. The COP and boundary scanlogic are not used under typical operating conditions.

Detailed discussion of the 601 test functions is beyond the scope of this document;however, sufficient information has been provided to allow the system designer to disablethe test functions that would impede normal operation.

The interface is shown in Figure 8-2. For more information, refer to Section 9.9, “IEEE1149.1-Compatible Interface.”


Figure 8-2. IEEE 1149.1-Compatible Boundary Scan Interface

See Table 8-8 for the COP/scan interface signals.

8.2.11 Clock SignalsThe clock signal inputs of the 601 determine the system clock frequency and provide aflexible clocking scheme that allows the processor to operate at an integer multiple of thesystem clock frequency.

Refer to the PowerPC 601 RISC Microprocessor Hardware Specifications for exact timingrelationships of the clock signals.

8.2.11.1 Double-Speed Processor Clock (2X_PCLK)—InputThe double-speed processor clock (2X_PCLK) signal is input only on the 601. This signalis the highest frequency input to the 601; it switches at twice the frequency of the internalP_CLOCK provided that the PCLK_EN signal is half the frequency of the 2X_PLCLK asshown in Figure 8-3. This input clocks the latch that samples the PCLK_EN input,providing duty-cycle control for the internal P_CLOCK (see Figure 8-3).

Table 8-8. COP/Scan Interface

SIgnal Name I/O Timing Comments

SCAN_CTL I This input signal should be driven high for normal operation.

SCAN_CLK I This input signal should be driven high for normal operation.

SCAN_SIN I This input signal should be driven low for normal operation.

ESP_EN I This input signal should be driven high for normal operation.

BSCAN_E N I This input signal should be driven high for normal operation.

RUN_NSTOP O This output signal is a no connect (NC) for normal operation.

SCAN_OUT O This output signal is a no connect (NC) for normal operation.

TDI (Test Data Input)

TMS (Test Mode Select)

TCK (Test Clock input)

BSCAN_EN (Boundary Scan Enable)

TDO (Test Data Output)

TRST/HRESET (Test Reset)


Following are the state meaning and timing comments for the 2X_PCLK signal.

State Meaning Rising edge—Is the clocking edge for a synchronizing latch used to generate the internal processor clock (see PCLK_EN).

Timing Comments Duty cycle—Refer to the PowerPC 601 RISC Microprocessor Hardware Specifications.

8.2.11.2 Clock Phase (PCLK_EN)—InputThe clock phase (PCLK_EN) signal is input only on the 601. The PCLK_EN signalswitches at the same frequency as the internal CPU clock (P_CLOCK in Figure 8-3). ThePCLK_EN signal determines the phase of the internal P_CLOCK (timing and duty cycleare derived from the 2X_PCLK input); therefore, this input can be used to synchronizemultiple 601s.

Figure 8-3 shows how the internal P_CLOCK is always identical to the PCLK_EN signalexcept it is inverted and delayed by one full 2X_PCLK cycle.

The 601 can tolerate dynamic P_CLOCK cycle stretching. This can be accomplished byaltering the duty cycle of the PCLK_EN input. For example, the system can extend a givenCPU clock cycle by negating PCLK_EN for more than one 2X_PCLK cycle. Thiseffectively delays the bus clock input sampling points and output drive points in half of aprocessor cycle increments and further delays execution of instructions accordingly.

Figure 8-3. Internal P_CLOCK Generation

Following are the state meaning and timing comments for the PCLK_EN signal.

State Meaning Asserted—Indicates that the 601 should generate the high phase of the internal processor clock synchronized to 2X_PCLK.

Negated—Indicates that the 601 should generate the low phase of the internal processor clock synchronized to 2X_PCLK.

Timing Comments Assertion—May occur one 2X_PCLK cycle after the negation of PCLK_EN with appropriate setup to the falling edge of 2X_PCLK.

Negation—Must occur one 2X_PCLK cycle after the assertion of PCLK_EN with appropriate setup to the falling edge of 2X_PCLK.

2X_PCLK

DPCLK_EN D(IN)

OUT

CLK CLK

(internal)P_CLOCK


8.2.11.3 Bus Phase (BCLK_EN)—InputThe bus phase (BCLK_EN) signal is input only on the 601. This input determines, inconjunction with PCLK_EN and 2X_PCLK, the transition timing for the 601 bus interface.While all timing is derived from the rising edge of the 2X_PCLK input, the two phaseinputs qualify the edge on which the processor and bus interface sequential logic canproceed. Inputs are sampled and outputs are driven with the qualified rising edge of the2X_PCLK input (see Figure 8-4).

Following are the state meaning and timing comments for the BCLK_EN signal.

State Meaning Asserted— Indicates that the 601 must use the rising edge of the internal processor clock to sample and drive the bus interface.

Negated—Indicates that the 601 outputs must not change state, and the inputs will not be sampled. This signal can be treated as a synchronous enable for the bus clock cycle clock.

Timing Comments Assertion/Negation—With appropriate setup and hold time to the 2X_PCLK provided the rising edge of the internal processor clock coincides with the 2X_PCLK.

Figure 8-4 through Figure 8-8 illustrate how the 601 clocking signals can be used togenerate a logical bus clock. Note that the resulting logical bus clock is represented as anarrow coincident with the rising edge of the resulting signal. It should not be inferred thatthe duty cycle of the bus clock signal is 50 percent.

Figure 8-4 shows how the clock inputs can be used to control the 601. Note that the signalIN is the output of the inverter shown in Figure 8-3.

Figure 8-4. Generation of Internal Clock (P_CLK)

Figure 8-5 shows a simple 601 clock implementation with the frequency of the logical busclock equal to that of the P_CLK.

2X_PCLK

PCLK_EN

IN

P_CLK

0 1 2 3 4 5 6 7

* * * *

***

*Delay of inverter output


Figure 8-5. Generation of Bus Transitions—Logical Bus Clock = P_CLK

Figure 8-6 shows the generation of the logical bus clock at one-half the frequency of theP_CLK.

Figure 8-6. Generation of Bus Transitions—Logical Bus Clock = 1/2 P_CLK

Figure 8-7 shows the generation of the logical bus clock at one-third the frequency of theP_CLK.

2X_PCLK

PCLK_EN

IN

P_CLK

BCLK_EN

Logical Bus Clock

0 1 2 3 4 5 6 7

* * * *

***


2X_PCLK

PCLK_EN

IN

P_CLK

BCLK_EN

Bus Transition

0 1 2 3 4 5 6 7 8 9 10 11


* * * *

***

* *

* *


Figure 8-7. Generation of Bus Transitions—Logical Bus Clock=1/3 P_CLK

Figure 8-8 shows how the PCLK_EN signal can be manipulated to perform cycle stretchingon the 601.

Figure 8-8. Generation of Bus Transitions—Cycle Stretching

In this document, processor clock refers to the internal P_CLOCK signal; bus clock refersto the clock that causes the bus transitions.

Figure 8-5 and Figure 8-6 show two examples of the generation of bus transitions. In thefirst example, BCLK_EN is grounded (always asserted) and the bus clock period isequivalent to the P_CLOCK cycle period. In the second example, the BCLK_EN input isdriven by a clock switching at PCLK_EN/2 frequency. This allows the 601 bus interface torun at half the frequency of the CPU P_CLOCK, easing system design constraints. Notethat the BCLK_EN input can be divided further (with respect to PCLK_EN), allowing aneven greater ratio between the clock- and bus-cycle frequencies.

2X_PCLK

PCLK_EN

IN

P_CLK

BCLK_EN

Bus Transition

0 1 2 3 4 5 6 7 8 9 10 11


* * * *

***

* *

* *

2X_PCLK

PCLK_EN

IN

P_CLK

BCLK_EN

Bus Transition

0 1 2 3 4 5 6 7 8 9 10 11


* * * *

***

* *

* *


To operate the bus interface slower than P_CLOCK/2, BCLK_EN must be asserted only forthe intended P_CLOCK window (for example, the duty cycle can be skewed such that thebus logic increments only once during each assertion of BCLK_EN).

8.2.11.4 Real-Time Clock (RTC)—InputThe real-time clock (RTC) signal is input only on the 601, and should be driven by a 7.8125MHz oscillator. Following are the state meaning and timing comments for the RTC signal.

State Meaning Rising edge—Increments the real-time clock in the 601.

Timing Comments Duty cycle—See the PowerPC 601 RISC Microprocessor Hardware Specifications.

8.3 Clocking in a Multiprocessor SystemClocking in a multiprocessor system adds a level of complexity. The 601 defines the ACtiming specifications for the chip inputs and outputs to allow for a reasonable amount ofsystem-level skew and still allow the chip to meet its timing goals. These timingspecifications can be found in the PowerPC 601 RISC Microprocessor HardwareSpecifications.

Chapter 9. System Interface Operation 9-1

Chapter 9 System Interface Operation9090

This section describes the PowerPC 601 microprocessor bus interface and its operation. Itshows how the 601 signals, defined in Chapter 8, “Signal Descriptions,” interact to performaddress and data transfers.

9.1 PowerPC 601 Microprocessor System Interface Overview

The system interface performs external accesses for loading and storing data and fetchinginstructions.

Instructions are automatically fetched from the memory system into the instruction unitwhere they are dispatched to the execution units at a maximum rate of three instructions perclock. Conversely, load and store instructions explicitly specify the movement of operandsto and from the integer and floating-point register files and the memory system.

When the 601 encounters an instruction or data access, it calculates the logical address(effective address) and uses the low-order address bits to check for a hit in the on-chip, 32-Kbyte cache. Operation of the cache is described in Section 9.1.1, “Operation of the On-Chip Cache.” During the cache lookup, the memory management unit (MMU) uses theupper-order address bits to calculate the virtual address, from which it calculates thephysical address. The physical address bits are then compared with the correspondingcache tag bits to determine if a cache hit occurred. If the access misses in the cache, thephysical address is used to access system memory.

In addition to the loads, stores, and instruction fetches, the 601 performs other read andwrite operations for table searches, cache cast-out operations when least-recently usedsectors are written to memory after a cache miss, and cache-sector snoop push-outoperations when a modified sector experiences a snoop hit from another bus master.

All read and write operations are handled by the memory unit, which consists of a two-element read queue that holds addresses for read operations, and a three-element writequeue that contains addresses and data for write operations. To maintain coherency, thequeues are included in snooping. The interface allows one level of pipelining; that is, withcertain restrictions discussed later, there can be two outstanding transactions at any given


time. Accesses are prioritized. The operation of the memory unit is described inSection 9.1.2, “Operation of the Memory Unit for Loads and Stores.”

Figure 9-1 shows the address path from the execution units and instruction fetcher, throughthe translation logic to the cache and system interface logic.

The 601 uses separate address and data buses and a variety of control and status signals forperforming reads and writes. The address bus is 32 bits wide and the data bus is 64 bitswide. The interface is synchronous—all timing is derived from the start of the bus cycle.All 601 inputs are sampled at and all outputs are driven from this edge. The bus can run atthe full processor-clock frequency or at an integer division of the processor-clock speed.The 601 provides a TTL-compatible interface.

9.1.1 Operation of the On-Chip Cache The 601’s cache is a combined instruction and data (or unified) cache. It is a physically-addressed, virtually-indexed, 32-Kbyte cache with eight-way set associativity. The cacheconsists of eight sets of 128 sectors. Each 16-word cache line consists of two 8-wordsectors. Both sectors share the same line address tag. Cache coherency, however, ismaintained for each sector, so there are separate coherency state bits for each sector. If onesector of the line is filled from memory, the 601 attempts to load the other sector as a low-priority bus operation. There is no guarantee that the other sector will be loaded.

Because the cache on the 601 is an on-chip, write-back primary cache, the predominanttype of transaction for most applications is burst-read memory operations, followed byburst-write memory operations, I/O controller interface operations, and single-beat(noncacheable or write-through) memory read and write operations. Additionally, there canbe address-only operations, variants of the burst and single-beat operations (global memoryoperations that are snooped, and atomic memory operations, for example), and addressretry activity (for example, when a snooped read access hits a modified line in the cache).

The cache tag directory has one address port dedicated to instruction fetch and load/storeaccesses and one dedicated to snooping transactions on the system interface. Therefore,snooping does not require additional clock cycles unless a snoop hit that requires a cachestatus update occurs.


Figure 9-1. PowerPC 601 Microprocessor Block Diagram

(INSTRUCTION FETCH)

RTC

IU BPU FPU

MMU

SYSTEM INTERFACE

RTCU

RTCL

INSTRUCTIONQUEUE

+

+ * / +

INSTRUCTIONINSTRUCTION

8 WORDS

XERGPRFILE

CTRCRLR

2 WORDS1 WORDDATA

ADDRESS

UTLB

BATARRAY

ITLB TAGS32-KBYTE

CACHE(INSTRUC-

TION AND DA-

PHYSICAL ADDRESS

ADDRESS

DATA

SNOOPADDRESS

ADDRESS

DATA

64-BIT DATA BUS

32-BIT ADDRESS BUS

MEMORY UNIT

SNOOPAB

READ QUEUE

WRITE QUEUE

+ * /

FPRFILE FPSCR

INSTRUCTION UNIT

ISSUE LOGIC

2 WORDS

4 WORDSDATA

8 WORDS


9.1.2 Operation of the Memory Unit for Loads and Stores As shown in Figure 9-1, the memory unit includes two read-queue elements and threewrite-queue elements. The read queue buffers are used for holding addresses for readoperations; the write queue buffers are used for holding addresses and data for writeoperations and to support such features as address pipelining, snooping, and writebuffering, described as follows:

• The two read-queue elements allow the system interface logic to buffer as many as two outstanding read operations. There are two restrictions that apply to filling the two read-queue elements described as follows:

— There cannot be two outstanding load operations.— There cannot be two outstanding read-with-intent-to-modify instructions.

• Note that when a read miss causes the cache to be updated, only the sector with the required data is guaranteed to be updated. The other sector can be updated only if both read-queue elements are free. The update of the other sector can be disabled by setting bits in the HID0 register. HID0[DRF], bit 26, can be used to disable fetches and HID0[DRL], bit 27, can be used to disable loads and stores.

• Two of the three write-queue elements, marked “A” and “B” in Figure 9-1, are buffers for write operations. They buffer store operations and sectors that are written back to memory such as when a cache location is updated after a cache miss. This allows the cache to be updated before the replaced sector is written back to system memory. Write-queue elements A or B will be used for snoop push operations if high-priority copyback is not enabled.

• The third queue element, marked “snoop” in Figure 9-1, has two modes of operation. The default mode provides high-priority copy-back operations that result from snoop hits to modified data (cache-sector snoop push-out operations while a read operation is pending on the bus). Snoop hits to modified data create a high-priority store operation that allows the processor to become bus master to store the modified data to memory, where it in turn is read by the snooping device. The override mode uses the HP_SNP_REQ signal to determine if the snoop queue is to be used. This mode is enabled by setting HID0[31].

9.1.3 Operation of the System Interface Memory accesses can occur in single-beat (1–8 bytes) and four-beat burst (32 bytes) datatransfers. The address and data buses are independent for memory accesses to supportpipelining and split transactions. The 601 can pipeline as many as two transactions and haslimited support for out-of-order split-bus transactions.

Access to the system interface is granted through an external arbitration mechanism thatallows devices to compete for bus mastership. This arbitration mechanism is flexible,allowing the 601 to be integrated into systems that implement various fairness and busparking procedures to avoid arbitration overhead. Additional multiprocessor support isprovided through coherency mechanisms that provide snooping, external control of the on-


chip cache and TLB, and support for a secondary cache. Multiprocessor software supportis provided through the use of atomic memory operations.

Typically, memory accesses are weakly ordered—sequences of operations, includingload/store string and multiple instructions, do not necessarily complete in the order theybegin—maximizing the efficiency of the bus without sacrificing coherency of the data. The601 allows read operations to precede store operations (except when a dependency exists,of course). In addition, the 601 can be configured to reorder high priority write operationsahead of lower priority store operations. Because the processor can dynamically optimizerun-time ordering of load/store traffic, overall performance is improved.

Note that the Synchronize (sync) or Enforce In-Order Execution of I/O (eieio) instructioncan be used to enforce strong ordering.

The following sections describe how the 601 interface operates, providing detailed timingdiagrams that illustrate how the signals interact. A collection of more general timingdiagrams are included as examples of typical bus operations.

Figure 9-2 is a legend of the conventions used in the timing diagrams.

This is a synchronous interface—all 601 input signals except the PCLK_EN signals aresampled relative to the start of the bus clock cycle. Outputs are driven off the start of thebus cycle (see the PowerPC 601 RISC Microprocessor Hardware Specifications for exacttiming information).

9.1.4 I/O Controller Interface Accesses In addition to supporting memory-mapped I/O with the high-performance memoryinterface, the 601 also implements the I/O controller interface protocol for compatibilitywith certain external devices. Memory and I/O controller interface accesses use the 601signals differently.

The 601 defines separate memory and I/O address spaces, or segments, distinguished by thesegment register T bit in the address translation logic of the 601. If the T-bit is cleared, thememory reference is a normal memory access and can use the virtual memory managementhardware of the 601. If the T-bit is set, the memory reference is an I/O controller interfaceaccess.

The function and timing of some address transfer and attribute signals (such as TT0–TT3,TBST, and TSIZ0–TSIZ2) are changed for I/O controller interface accesses. Additionalcontrols are required to facilitate transfers between the 601 and intelligent I/O devices. I/Ocontroller interface and memory transfers are distinguished from one another by theiraddress transfer start signals—TS indicates that a memory transfer is starting and XATSindicates that an I/O controller interface transaction is starting.

Unlike memory accesses, I/O controller interface accesses cannot be pipelined and must bestrongly ordered—each access occurs in strict program order and completes before another


access can begin. For this reason, I/O controller interface accesses are less efficient thanmemory accesses. The I/O extensions also allow for additional bus pacing and multipletransaction operations for variably-sized data transfers (1 to 128 bytes), and they support atagged, split request/response protocol. The I/O controller interface access protocol alsorequires the slave device to function as a bus master.

Figure 9-2. Timing Diagram Legend

9.2 Memory Access Protocol Memory accesses are divided into address and data tenures. Each tenure has three phases—bus arbitration, transfer, and termination. The 601 also supports address-only transactions.Note that address and data tenures can overlap, as shown in Figure 9-3.

Figure 9-3 shows that the address and data tenures are distinct from one another and thatboth consist of three phases—arbitration, transfer, and termination. Address and data

601 input (while the 601 is a bus master)

601 output (while the 601 is a bus master)

601 output (grouped: here, address plus attributes)

601 internal signal (inaccessible to the user, but used indiagrams to clarify operations)

Compelling dependency—event will occur on thenext clock cycle

Prerequisite dependency—event will occur on anundetermined subsequent clock cycle

601 three-state output or input

601 nonsampled input

Signal with sample point

A sampled condition (dot on high or low state) with multiple dependencies

Timing for a signal had it been asserted (it is notactually asserted)

Bar over signal name indicates active low

ap0

BR

ADDR+

qual BG_


tenures are independent (indicated in Figure 9-3 by the fact that the data tenure beginsbefore the address tenure ends), which allows split-bus transactions to be implemented atthe system level in multiprocessor systems. Figure 9-3 shows a data transfer that consistsof a single-beat transfer of as many as 64 bits. Four-beat burst transfers of 32-byte cachesectors require data transfer termination signals for each beat of data.

Figure 9-3. Overlapping Tenures on the PowerPC 601 Microprocessor Bus for a Single-Beat Transfer

The basic functions of the address and data tenures are as follows:

• Address tenure

— Arbitration: During arbitration, address bus arbitration signals are used to gain mastership of the address bus.

— Transfer: After the 601 is the address bus master, it transfers the address on the address bus. The address signals and the transfer attribute signals control the address transfer. The address parity and address parity error signals ensure the integrity of the address transfer.

— Termination: After the address transfer, the system signals that the address tenure is complete or that it must be repeated.

• Data tenure

— Arbitration: To begin the data tenure, the 601 arbitrates for mastership of the data bus.

— Transfer: After the 601 is the data bus master, it samples the data bus for read operations or drives the data bus for write operations. The data parity and data parity error signals ensure the integrity of the data transfer.

— Termination: Data termination signals are required after each data beat in a data transfer. Note that in a single-beat transaction, the data termination signals also indicate the end of the tenure, while in burst accesses, the data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat.

ARBITRATION TRANSFER TERMINATION

ADDRESS TENURE

ARBITRATION SINGLE-BEAT TRANSFER TERMINATION

DATA TENURE

INDEPENDENT ADDRESS AND DATA


The 601 bus supports address-only transfers, which use only the address bus, with no datatransfer involved. This is useful in multiprocessor environments where external control ofon-chip primary caches and TLB entries is desirable. Additionally, the 601’s retrycapability provides an efficient snooping protocol for systems with multiple memorysystems (including caches) that must remain coherent.

9.2.1 Arbitration Signals Arbitration for both address and data bus mastership in a multiprocessor system isperformed by a central, external arbiter and, minimally, by the arbitration signals shown inSection 8.2.1, “Address Bus Arbitration Signals.” Most arbiter implementations requireadditional signals to coordinate bus master/slave/snooping activities. Note that address busbusy (ABB) and data bus busy (DBB) are bidirectional signals. These signals are inputsunless the 601 has mastership of one or both of the respective buses; they must be connectedhigh through pull-up resistors so that they remain negated when no devices have control ofthe buses.

The following list describes the address arbitration signals:

• BR (bus request)—Assertion indicates that the 601 is requesting mastership of the address bus.

• BG (bus grant)—Assertion indicates that the 601 may, with the proper qualification, assume mastership of the address bus. A qualified bus grant occurs when BG is asserted and ABB and ARTRY are negated.

If the 601 is parked, BR need not be asserted for the qualified bus grant.

• ABB (address bus busy)— Assertion indicates that the 601 is the address bus master.

The following list describes the data arbitration signals:

• DBG (data bus grant)—Indicates that the 601 may, with the proper qualification, assume mastership of the data bus. A qualified data bus grant occurs when DBG is asserted while DBB, DRTRY, and ARTRY are negated.

DBB signal is driven by the current bus master, DRTRY is only driven from the bus, and ARTRY is from the bus, but only for the address bus tenure associated with the current data bus tenure (that is, not from another address tenure).

• DBWO (data bus write only)—Assertion indicates that the 601 may run the data bus tenure for an outstanding write address even if a read address is pipelined before the write address. If DBWO is asserted, the 601 only assumes data bus mastership for a pending data bus write operation (that is, the 601 does not take the data bus for a pending read operation if this input is asserted along with DBG). Care must be taken with DBWO to ensure the desired write is queued (for example, a cache-sector snoop push-out operation).


• DBB (data bus busy)—Assertion indicates that the 601 is the data bus master. The 601 always assumes data bus mastership if it needs the data bus and is given a qualified data bus grant (see DBG).

For more detailed information on the arbitration signals, refer to Section 8.2.1, “Address Bus Arbitration Signals,” and Section 8.2.6, “Data Bus Arbitration Signals.”

9.2.2 Address Pipelining and Split-Bus Transactions The 601 protocol provides independent address and data bus capability to support pipelinedand split-bus transaction system organizations. Address pipelining allows a new bustransaction to begin before the current transaction has finished. Split-bus transactioncapability allows the address bus and data bus to have different masters at the same time.

While this capability does not inherently reduce memory latency, support for addresspipelining and split-bus transactions can greatly improve effective bus/memory throughput.For this reason, these techniques are most effective in shared-memory multiprocessorimplementations where bus bandwidth is an important measurement of systemperformance.

External arbitration is required in systems in which multiple devices must compete for thesystem bus. The design of the external arbiter affects pipelining by regulating address busgrant (BG), data bus grant (DBG), and AACK signals. For example, a one-level pipeline isenabled by asserting AACK to the current address bus master and granting mastership ofthe address bus to the next requesting master before the current data bus tenure hascompleted. Two address tenures can occur before the current data bus tenure completes.

The 601 can pipeline its own transactions to a depth of one level (intraprocessor pipelining);however, the 601 bus protocol does not constrain the maximum number of levels ofpipelining that can occur on the bus between multiple masters (interprocessor pipelining).The external arbiter must control the pipeline depth and synchronization between mastersand slaves.

In a pipelined implementation, data bus tenures are kept in strict order with respect toaddress tenures. However, external hardware can further decouple the address and databuses, allowing the data tenures to occur out of order with respect to the address tenures.This requires some form of system tag to associate the out-of-order data transaction withthe proper originating address transaction (not defined for the 601 interface). Individual busrequests and data bus grants from each processor can be used by the system to implementtags to support interprocessor, out-of-order transactions.

The 601 supports a limited intraprocessor out-of-order, split-transaction capability via thedata bus write only (DBWO) signal. For more information about using DBWO, seeSection 9.10, “Using DBWO)—(Data Bus Write Only).”


9.3 Address Bus TenureThis section describes the three phases of the address tenure—address bus arbitration,address transfer, and address termination.

9.3.1 Address Bus Arbitration When the 601 needs access to the external bus and it is not parked (BG is negated), it assertsbus request (BR) until it is granted mastership of the bus and the bus is available (seeFigure 9-4). The external arbiter must grant master-elect status to the potential master byasserting the bus grant (BG) signal. The 601 requesting the bus determines that the bus isavailable when the ABB input is negated. When the address bus is not busy (ABB input isnegated), BG is asserted and the address retry (ARTRY) input is negated. This is referredto as a qualified bus grant. The potential master assumes address bus mastership byasserting ABB when it receives a qualified bus grant.

The 601 also provides an internally generated address bus busy signal, which it logicallyORs with the ABB signal received off of the bus. This internal address bus busy signal isasserted with any TS or XATS signal and is negated with a valid AACK. This internallygenerated address bus busy signal is useful in systems that do not use ABB.

Figure 9-4. Address Bus Arbitration

External arbiters must allow only one device at a time to be address bus master. Inimplementations in which no other device can be a master, BG can be grounded (alwaysasserted) to continually grant mastership of the address bus to the 601.

-1 0 1

need_bus

BR

bg

abb

artry

qual BG

ABB

Logical Bus Clock


If the 601 asserts BR before the external arbiter asserts BG, the 601 is considered to beunparked, as shown in Figure 9-4. Figure 9-5 shows the parked case, where a qualified busgrant exists on the clock edge following a need_bus condition. Notice that the bus clockcycle required for arbitration is eliminated if the 601 is parked, reducing overall memorylatency for a transaction. The 601 always negates ABB for at least one bus clock cycle afterAACK is asserted, even if it is parked and has another transaction pending.

Typically, bus parking is provided to the device that was the most recent bus master;however, system designers may choose other schemes, such as providing unrequested busgrants in situations where it is easy to correctly predict the next device requesting busmastership.

Figure 9-5. Address Bus Arbitration Showing Bus Parking

When the 601 receives a qualified bus grant, it assumes address bus mastership by assertingABB and negating the BR output signal. Meanwhile, the 601 drives the address for therequested access onto the address bus and asserts TS to indicate the start of a newtransaction.

When designing external bus arbitration logic, note that the 601 may assert BR withoutusing the bus after it receives the qualified bus grant. For example, in a system using bussnooping, if the 601 asserts BR to perform a replacement copy-back operation, anotherdevice can invalidate that sector before the 601 is granted mastership of the bus. Once the601 is granted the bus, it no longer needs to perform the copy-back operation; therefore, the601 does not assert ABB and does not use the bus for the copy-back operation. Note thatthe 601 asserts BR for at least one clock cycle in these instances.

-1 0 1

need_bus

BR

bg

abb

artry

qual BG

ABB


9.3.2 Address Transfer During the address transfer, the physical address and all attributes of the transaction aretransferred from the bus master to the slave device(s). Snooping logic may monitor thetransfer to enforce cache coherency (see discussion about snooping in Section 9.3.3,“Address Transfer Termination”). The signals used in the address transfer include thefollowing signal groups (see Figure 8-1):

• Address transfer start signal: Transfer start (TS)

Note that extended address transfer start (XATS) is used for I/O controller interface operations and has no function for memory accesses. See Section 9.6, “Memory- vs. I/O-Mapped I/O Operations.”

• Address transfer signals: Address bus (A0–A31), address parity (AP0–AP3), and address parity error (APE)

• Address transfer attribute signals: Transfer type (TT0–TT4), transfer code (TC0–TC3), transfer size (TSIZ0–TSIZ2), transfer burst (TBST), cache inhibit (CI), write-through (WT), global (GBL), and cache set element (CSE0–CSE2)

Figure 9-6 shows that the timing for all of these signals, except TS and APE, is identical.All of the address transfer and address transfer attribute signals are combined into theADDR+ grouping in Figure 9-6. The TS signal indicates that the 601 has begun an addresstransfer and that the address and transfer attributes are valid (within the context of asynchronous bus). The 601 always asserts TS (or XATS for I/O controller interfaceoperations) coincident with ABB. As an input, TS need not coincide with the assertion ofABB on the bus (that is, either TS or XATS can be asserted with, or on, a subsequent clockcycle after ABB is asserted; the 601 tracks this transaction correctly).

Figure 9-6. Address Bus Transfer

0 1 2 3 4

qual BG

TS

ABB

ADDR+

aack

artry_in


In Figure 9-6, the address transfer occurs during bus clock cycles 1 and 2 (arbitration occursin bus clock cycle 0 and the address transfer is terminated in bus clock 3). In this diagram,the address bus termination input, AACK, is asserted to the 601 on the bus clock followingassertion of TS (as shown by the dependency line). This is the minimum duration of theaddress transfer for the 601; the duration can be extended by delaying the assertion ofAACK for one or more bus clocks.

9.3.2.1 Address Bus Parity The 601 always generates one bit of correct odd-byte parity for each of the four bytes ofaddress when a valid address is on the bus. The calculated values are placed on the AP0–AP3 outputs when the 601 is the address bus master. If the 601 is not the master and TS andGBL are asserted together (qualified condition for snooping memory operations), thecalculated values are compared with the AP0–AP3 inputs. If there is an error, the APEoutput is asserted. An address bus parity error causes a checkstop condition if the bus paritycheckstop source is enabled in HID0. For more information, see Chapter 5, “Exceptions.”

9.3.2.2 Address Transfer Attribute SignalsThe transfer attribute signals include several encoded signals such as the transfer type(TT0–TT4) signals, transfer burst (TBST) signal, transfer size (TSIZ0–TSIZ2) signals, andtransfer code (TC0–TC1) signals. Section 8.2.4, “Address Transfer Attribute Signals,”describes the encodings for the address transfer attribute signals. Note that TT0–TT4,TBST, and TSIZ0–TSIZ2 have alternate functions for I/O controller interface operations(see Section 9.6, “Memory- vs. I/O-Mapped I/O Operations).”

9.3.2.2.1 Transfer Type (TT0–TT4) SignalsSnooping logic should fully decode the transfer type signals if the GBL signal is asserted.Slave devices can use the individual transfer type signals without fully decoding the group.The transfer type signals generally have the following individual functions:

• TT0—Special operations: This signal is asserted by the 601 whenever a bus transaction occurs in response to a lwarx/stwcx. (Load Word and Reserve Indexed/Store Word Conditional Indexed) instruction pair (see Chapter 3, “Addressing Modes and Instruction Set Summary”), an eciwx or ecowx instruction, or for a Translation Lookaside Buffer Invalidate Entry (tlbie) operation.

• TT1—Read (/write) operations: The TT1 signal indicates whether the transaction is a read (TT1 high) or a write (TT1 low) transaction. This is valid for transactions that are not address only.

• TT2—Invalidate operations: When asserted with GBL, the TT2 output signal indicates that all other caches in the system should invalidate the cache entry on a snoop hit. If the snoop hit is to a modified entry, the sector should be copied back before being invalidated.

• TT3—Memory (/address-only) operations: Except for eciwx or ecowx instructions (TT0–TT3 encodings 1010 or 1110) the TT3 signal, when asserted, indicates that the associated data transfer is to/from memory and that use of the data bus will be


required in order for the transaction to complete. External logic can synthesize the data bus request from the combination of TS (or XATS) and TT3 (DBR=TS&TT3). If TT3 is not asserted with the address, the associated bus transaction is considered to be a broadcast operation that all bus participants must honor (or a reserved operation). This is an address-only transaction; the 601 does not need and will not acquire data bus ownership, even if it receives a qualified data bus grant. Figure 9-7 shows an address-only transaction. On the start of bus cycle 2, TT3 is not asserted; therefore, the data bus will not be needed.

• TT4—The TT4 signal is reserved for future expansion.

Figure 9-7. Address-Only Bus Transaction

9.3.2.2.2 Transfer Size (TSIZ0–TSIZ2) SignalsThe transfer size signals (TSIZ0–TSIZ2) indicate the size of the requested data transfer asshown in Table 9-1. The TSIZ0–TSIZ2 signals may be used along with TBST and A29–A31 to determine which portion of the data bus contains valid data for a write transactionor which portion of the bus should contain valid data for a read transaction. Note that for aburst transaction (as indicated by the assertion of TBST) TSIZ0–TSIZ2 are always set tob'010'. Therefore, if the TBST signal is asserted, the memory system should transfer a totalof eight words (32 bytes), regardless of the TSIZ0–TSIZ2 encoding.

0 1 2 3 4

bg

TS

ABB

ADDR+

TT3

aack

artry_in


The basic coherency size of the bus is defined to be 32 bytes (corresponding to one cachesector). Data transfers that cross an aligned, 32-byte boundary either must present a newaddress onto the bus at that boundary (for coherency consideration) or must operate asnoncoherent data with respect to the 601.

9.3.2.3 Effect of Alignment in Data TransfersTable 9-2 lists the aligned transfers that can occur on the 601 bus. These are transfers inwhich the data is aligned to an address that is an integer multiple of the size of the data. Forexample, Table 9-2 shows that one-byte data is always aligned; however, for a four-byteword to be aligned, it must be oriented on an address that is a multiple of four.

Table 9-1. Transfer Size Signal Encodings

TBST TSIZ0 TSIZ1 TSIZ2 Transfer Size

Asserted 0 1 0 Eight-word burst

Negated 0 0 0 Eight bytes

Negated 0 0 1 One byte

Negated 0 1 0 Two bytes

Negated 0 1 1 Three bytes

Negated 1 0 0 Four bytes

Negated 1 0 1 Five bytes

Negated 1 1 0 Six bytes

Negated 1 1 1 Seven bytes


Notes:

√ The byte portions of the requested operand that are read or written during that bus transaction.

— These entries are not required and are ignored during read transactions and are driven with undefined data during all write transactions (except noncacheable write transfers, in which data is mirrored on both word lanes if the transfer does not exceed four bytes).

Data bus byte lane 0 corresponds to DH0-DH7, byte lane 7 corresponds to DL24-DL31.

The 601 also supports misaligned memory operations. These transfers address memory thatis not aligned to the size of the data being transferred (such as, a word read of an odd byteaddress). Although most of these operations hit in the primary cache (or generate burstmemory operations if they miss), the 601 interface supports misaligned transfers within adouble-word (64-bit aligned) boundary, as shown in Table 9-3. Note that the three-bytetransfer in Table 9-3 is only one example of misalignment. As long as the attempted transferdoes not cross a double-word boundary, the 601 can transfer the data on the misalignedaddress (for example, a word read from an odd byte-aligned address, or a seven-byte readfrom an odd byte-aligned address).

An attempt to address data that crosses a double-word boundary requires two bus transfersto access the data. This is illustrated in the last example of a three-byte transfer in Table 9-3.The transfer requires two accesses—the first for the last two bytes of one double-word

Table 9-2. Aligned Data Transfers

Transfer SizeData Bus Byte Lane(s)

TSIZ0 TSIZ1 TSIZ2 A29–A31 0 1 2 3 4 5 6 7

Byte 0 0 1 000 √ — — — — — — —

0 0 1 001 — √ — — — — — —

0 0 1 010 — — √ — — — — —

0 0 1 011 — — — √ — — — —

0 0 1 100 — — — — √ — — —

0 0 1 101 — — — — — √ — —

0 0 1 110 — — — — — — √ —

0 0 1 111 — — — — — — — √

Half word 0 1 0 000 √ √ — — — — — —

0 1 0 010 — — √ √ — — — —

0 1 0 100 — — — — √ √ — —

0 1 0 110 — — — — — — √ √

Word 1 0 0 000 √ √ √ √ — — — —

1 0 0 100 — — — — √ √ √ √

Double word 0 0 0 000 √ √ √ √ √ √ √ √


address, the second for one byte from the next double-word address. The TBST, TSIZ0–TSIZ2, and A29–A31 signals provide enough information to determine the size of thetransfer and the data bus byte lanes involved in the misaligned transfer.

Although misaligned transfers are supported, they may degrade performance substantially.In addition to the double-word straddle boundary condition, the address translation logiccan generate substantial exception overhead when the microcoded, sequenced, load/storemultiple and load/store string instructions access misaligned data. It is stronglyrecommended that software attempt to align code and data where possible.

9.3.2.3.1 Alignment of External Control InstructionsThe size of the data transfer associated with the eciwx and ecowx instructions is always fourbytes. However, if the eciwx or ecowx instruction is unaligned and crosses a double-wordboundary, the 601 will generate two bus operations, each with a size of fewer than fourbytes. For the first bus operation, bits A29–A31 will equal bits 29–31 of the effectiveaddress of the instruction, which will be b'101', b'110', or b'111'. The size associated withthe first bus operation will be 3, 2, or 1 bytes, respectively. For the second bus operation,bits A29–A31 will equal b'000', and the size associated with the operation will be 1, 2, or 3bytes, respectively. For both operations, TSIZ0–TSIZ2 equal bits 29–31 of the EAR, notthe size. The size of the second bus operation cannot be deduced from the operation itself;

Table 9-3. Misaligned Data Transfer (Three-Byte Examples)

Transfer SizeData Bus Byte Lanes

TSIZ(0–2) A29–A31 0 1 2 3 4 5 6 7

Three bytes 011 000 A A A — — — — —

011 001 — A A A — — — —

011 010 — — A A A — — —

011 011 — — — A A A — —

011 100 — — — — A A A —

011 101 — — — — — A A A

First transfer:two bytes

010 110 — — — — — — A A

Second transfer: one byte

001 000 A — — — — — — —

First transfer: one byte

001 111 — — — — — — — A

Second transfer: two bytes

010 000 A A — — — — —

A: Byte lane used—: Byte lane not used


the system must determine how many bytes were transferred on the first bus operation todetermine the size of the second operation.

Furthermore, the two bus operations associated with such an unaligned external controlinstruction are not atomic. That is, the 601 may initiate other types of memory operationsbetween the two transfers. Also, the two bus operations associated with an unaligned ecowxmay be interrupted by an eciwx bus operation, and vice versa. The 601 does guarantee thatthe two operations associated with an unaligned ecowx will not be interrupted by anotherecowx operation; and likewise for eciwx.

Because an unaligned external control address is considered a programming error, thesystem may choose to assert TEA or otherwise cause an exception when an unalignedexternal control bus operation occurs.

9.3.2.4 Transfer Code (TC0–TC1) SignalsThe TC0 and TC1 signals provide supplemental information about the correspondingaddress. Note that the TCx signals can be used with the TT0–TT4 and TBST signals tofurther define the current transaction. These encodings may be useful for debugging.

The meaning of TC0 depends on whether the current transaction is a read or writeoperation. On a read operation, TC0 asserted indicates that the transaction is an instructionfetch operation; otherwise, the read operation is a data operation. On a 601 write operation,TC0 asserted indicates that the associated sector is invalidated for a copy-back replacement,a Data Cache Block Flush instruction (dcbf), or snoop that causes invalidation (for examplea flush or kill). TC0 negated indicates the write is not invalidating any cache sector (forexample, write-through or cache-inhibited write operations.)

The TC1 signal is asserted on read and RWITM operations to indicate that a low-priorityoperation to load the sector adjacent to one that was previously loaded due to a cache missis queued; therefore, the next bus transaction will likely access the same page of memory.This operation may not be the next transaction if, for instance, a copy-back operation thatresulted from a snoop hit is required. Note that TC1 asserted indicates to the memorysystem the likelihood that the next access is on the same page, but it does not guarantee thiswill occur because of transfer priorities and the bus traffic/code execution dynamics. TC1is negated for all write operations on the bus.

Table 9-4 shows the encodings of the TC0 and TC1 signals.


9.3.3 Address Transfer TerminationThe 601 does not terminate the address transfer until the AACK (address acknowledge)input is asserted; therefore, the system can extend the address transfer phase by delayingthe assertion of AACK to the 601. Although AACK can be asserted as early as the bus clockcycle following TS (see Figure 9-8), to support snooping, address transfers require at leastthree bus clock cycles to negate and tristate the shared ARTRY and SHD signals with nocontention between devices. As shown in Figure 9-8, these signals are asserted for one busclock cycle, tristated for the next bus clock cycle, driven high for the next 2X_PCLK cycletime, and finally tristated. Note that AACK is asserted for only one bus clock cycle.

Note that precharging of the ARTRY and SHD signals during the negation period can bedisabled by enabling HID0[29]. After ARTRY and SHD are asserted, they will be three-stated for two bus cycles and the system is responsible for precharging both ARTRY andSHD signals. This allows masters in a system that uses both 3.6-V and 5-V levels to use thesame system bus.

The address transfer can be terminated with the requirement to retry if ARTRY is assertedduring the bus clock cycle following AACK. If ARTRY is asserted in this window, the 601negates BR in the following bus clock cycle; after that, it attempts to retry the addresstransfer. By delaying the bus request by one bus clock cycle, the protocol provides anopportunity for the snooping device that asserted the ARTRY to access the bus next, andtherefore retry determinacy is possible. In order for the retry determinacy to be guaranteed,however, the external bus arbitration logic must ensure that the snooping device is grantedthe bus next.

Table 9-4. Transfer Code Signal Encodings

Signal State Definition

TC0 Asserted Bus operation is an instruction fetchWrite: Operation is invalidating the cache line in the 601.Kill (address only): Operation is invalidating the cache line in the 601.

Negated Bus operation is a data readWrite: Operation is not invalidating the cache line in the 601.Kill (address only): Operation is not invalidating the cache line.

TC1 Asserted The next access is likely to be on same page. A sector has been loaded, and a low-priority load of the adjacent sector is queued.

Negated The next access is not likely to be on the next page; an optional low-priority load of an adjacent sector is not queued.


The only valid window for the ARTRY input is the one bus clock cycle following theassertion of AACK. Snooping devices must monitor the assertion of AACK to know whento deassert/tristate ARTRY, as shown in Figure 9-8. The assertion of ARTRY/SHD can bederived in one of the following ways:

• ARTRY/SHD can be asserted on the second clock after TS is asserted.

• ARTRY/SHD can be asserted before AACK is asserted, but is not qualified by the master 601 until the clock after AACK is asserted.

The 601 requires that the first (or only) TA not be asserted before AACK (note that TA canbe held off directly by the slave device delaying AACK assertion or indirectly by anexternal arbiter delaying DBG assertion). This requirement guarantees the relationshipbetween TA and ARTRY/SHD such that, in the case of an address retry, the 601 can purgethe data/instructions from its data path queues and waive off the data/instructions beforethey are forwarded to the cache/CPU.

Figure 9-8. Snooped Address Cycle with ARTRY

When the data tenure begins before the address tenure is complete, if the 601 has assertedDBB, assertion of ARTRY causes the 601 to terminate the data bus transaction and retryboth the address and data tenures later. If the transfer is a single-beat transfer and TA occursas early as the AACK window, there is no indication of an early data bus termination.However, if a burst transaction is in progress, the 601 negates DBB early in response toARTRY. The system logic does not need to assert TA for four bus clock cycles in this case.

1 2 3 4 5 6 7

ts

abb

addr

aack

ARTRY

SHD

qualBG

ABB


If DBG is not asserted until the ARTRY window and ARTRY is asserted, the 601 does notbecome data bus master. Note that some system designs, such as single-master systems, donot require the use of ARTRY.

For information about ARTRY scenarios, see Section 9.3.3.1, “Address Retry Sources.”For information about MESI protocol and its effect on address tenure termination, refer toSection 9.4.4, “Memory Coherency—MESI Protocol.”

9.3.3.1 Address Retry Sources The assertion of the SHD and ARTRY input signals provide sufficient information for theappropriate handling of cache sector coherency. They encode information about atransaction, as shown in Table 9-5.

If the SHD and ARTRY inputs are not asserted for a cache-sector fill operation, the sectoris marked as exclusive (see Section 9.4.4, “Memory Coherency—MESI Protocol”). If theSHD input is asserted without ARTRY, the sector is marked as shared.

Note: If the invalidate (TT2) output signal is asserted for the transaction, the sector ismarked exclusive regardless of the state of the SHD signal. If ARTRY is asserted withoutSHD, a device cannot service the address transaction currently (because of queuingconstraints) and the transaction is retried later. The 601 reacts to the assertion of ARTRYthe same way, regardless of the state of SHD. The timing of the SHD input is the same asthe timing for ARTRY.

One or more devices can indicate a queuing retry condition by asserting ARTRY while oneor more devices separately indicate the snoop-hit shared condition by asserting SHD. Thiscondition appears as a snoop hit modified condition on the bus, since both SHD andARTRY are asserted. This is not a problem for the 601 since ARTRY is not qualified bySHD (that is, SHD is a don't care if ARTRY is asserted to the 601).

9.4 Data Bus TenureThis section describes the data bus arbitration, transfer, and termination phases defined bythe 601 memory access protocol. The phases of the data tenure are identical to those of theaddress tenure, underscoring the symmetry in the control of the two buses.

Table 9-5. Address Retry Causes

SHD ARTRY Definition

High impedance High impedance Exclusive. No snoop hit. Pipeline not busy.

High impedance Asserted Pipeline busy. Queuing retry.

Asserted High impedance Snoop hit (shared).

Asserted Asserted Snoop hit (modified).


9.4.1 Data Bus ArbitrationData bus arbitration uses the data arbitration signal group, that is, DBG, DBWO, and DBB.Additionally, the combination of TS or XATS and TT3 (address-only signal) function as adata bus request.

The TS signal is an implied data bus request from the 601; the arbiter must qualify TS withthe transfer type (TT) encodings to determine if the current address transfer is an address-only operation, which does not require a data bus transfer (see Figure 9-7). If the data busis needed, the arbiter grants data bus mastership by asserting the DBG input to the 601. Aswith the address-bus arbitration phase, the 601 must qualify the DBG input with a numberof input signals before assuming bus mastership, as shown in Figure 9-9.

Figure 9-9. Data Bus Arbitration

A qualified data bus grant can be expressed as the following:

QDBG = DBG asserted while DBB, DRTRY, and ARTRY (associated with the data bus operation) are negated.

When a data tenure overlaps with its associated address tenure, a qualified ARTRYassertion coincident with a data bus grant does not result in data bus mastership (DBB isnot asserted). Otherwise, the 601 always asserts DBB on the bus clock cycle afterrecognition of a qualified data bus grant. Since the 601 can pipeline transactions, there maybe an outstanding data bus transaction when a new address transaction is retried. In thiscase, the 601 becomes the data bus master to complete the previous transaction.

9.4.1.1 Using the DBB SignalThe DBB signal should be connected between masters only if data tenure hand-off is leftto the masters. The memory system can control data hand-off directly with DBG.

0 1 2 3

TS

dbg

dbb

drtry

qual DBG

DBB


The 601 asserts DBB throughout the data transaction; however, the 601 does not park thedata bus and assert DBB across multiple transactions. DBB is negated on the bus clockcycle after a final TA is received from the bus.

9.4.2 Data TransferThe data transfer signals include DH0–DH31, DL0–DL31, DP0–DP7 and DPE. Formemory accesses, the DH and DL signals form a 64-bit data path for read and writeoperations.

The 601 transfers data in either single- or four-beat burst transfers. Single-beat operationscan transfer from one to eight bytes at a time and can be misaligned (see Section 9.3.2.3,“Effect of Alignment in Data Transfers”). Burst operations always transfer eight words andare aligned to four- or eight-word address boundaries. Burst transfers can achievesignificantly higher bus throughput than single-beat operations.

The type of transaction initiated by the 601 depends on whether the code or data iscacheable and, for store operations, whether the cache is operated in write-back or write-through mode which software controls at either the page or block basis. Burst transferssupport cacheable operations only; that is, memory structures must be marked as cacheable(and write-back for data store operations) in the respective TLB entry to take advantage ofburst transfers.

The 601 output TBST indicates to the system whether the current transaction is a single- orfour-beat transfer. A burst transfer has an assumed address order. For load or storeoperations that miss in the cache (and are marked as cacheable and, for stores, write-backin the MMU), the 601 presents the quad-word–aligned address associated with the criticalcode or data that initiated the transaction. This minimizes latency by allowing the criticalcode or data to be forwarded to the processor before the rest of the sector is filled. For allother burst operations, however, the sector is transferred beginning with the oct-wordaligned data. Note that this difference can complicate cache-to-cache implementations.

The 601 does not directly support interfacing to subsystems with less than a 64-bit data path(except for I/O controller interface operations, which are discussed in Section 9.6,“Memory- vs. I/O-Mapped I/O Operations”). However, the 601 duplicates, or mirrors, thetransfer data on the unused word lane, for store operations to pages marked asnoncacheable. This means, for example, that for a noncacheable byte store operation, thevalid byte is present on two byte lanes—one in the upper word and one in the lower word.For a word store operation, the word is mirrored across both word lanes. Unused byte lanesare undefined.

The data is not mirrored, however, for other store operations (including write-through). Acache hit causes the double word of data containing the data being transferred to be outputon the data bus lanes.


CAUTION While this information may be useful to some applications thatdo not cache data structures, data mirroring may not besupported on future versions of the 601 or other PowerPCprocessors.

9.4.3 Data Transfer TerminationFour signals are used to terminate data bus transactions: TA, DRTRY (data retry), TEA(transfer error acknowledge), and in some cases ARTRY. The TA signal indicates normaltermination of data transactions. DRTRY indicates invalid read data in the previous busclock cycle. TEA indicates a nonrecoverable bus error event.

ARTRY can also terminate a data bus transaction. For burst transactions, this ARTRY mustoccur no later than the cycle of the second TA. For single-beat transactions, it must occurno later than the cycle following TA. In either case, the ARTRY must be for the addressbus tenure associated with the data bus tenure.

9.4.3.1 Normal Single-Beat TerminationNormal termination of a single-beat data read operation occurs when TA is asserted by aresponding slave. The TEA and DRTRY signals must remain negated during the transfer(see Figure 9-10).

Figure 9-10. Normal Single-Beat Read Termination

Normal termination of a single-beat data write transaction occurs when TA is asserted bya responding slave. TEA must remain negated during the transfer. The DRTRY signal is not

0 1 2 3 4

TS

qual DBG

DBB

data

ta

drtry

TT2

AACK


sampled during data writes, as shown in Figure 9-11. As shown in both Figure 9-10 andFigure 9-11, the TT1 signal driven low by the 601 indicates a write is in progress.

Figure 9-11. Normal Single-Beat Write Termination

Normal termination of a burst transfer occurs when TA is asserted during four bus clockcycles, as shown in Figure 9-12. The bus clock cycles need not be consecutive, thusallowing pacing of the data transfer beats. For read bursts to terminate successfully, TEAand DRTRY must remain negated during the transfer. For write bursts, TEA must remainnegated during the transfer. DRTRY is ignored during data writes.

Figure 9-12. Normal Burst Transaction

0 1 2 3

TS

qual DBG

DBB

data

ta

drtry

TT2

AACK

1 2 3 4 5 6 7

TS

qual DBG

DBB

data

ta

drtry


For read bursts, DRTRY may be asserted one bus clock cycle after TA is asserted to signalthat the data presented with TA is invalid and that the processor must wait for the negationof DRTRY before forwarding data to the processor (see Figure 9-13). Thus, a data beat canbe speculatively terminated with TA and then one bus clock cycle later confirmed with thenegation of DRTRY. The DRTRY signal is valid only for read transactions. TA must beasserted on the bus clock cycle before the first bus clock cycle of the assertion of DRTRY;otherwise the results are undefined.

The DRTRY signal extends data bus mastership such that other processors cannot use thedata bus until DRTRY is negated. Therefore, in the example in Figure 9-13, DBB cannotbe asserted until bus clock cycle 5. This is true for both read and write operations eventhough DRTRY does not hold the master on write operations.

Figure 9-13. Termination with DRTRY

Figure 9-14 shows the effect of using DRTRY during a burst read. It also shows the effectof using TA to pace the data transfer rate. Notice that in bus clock cycle 3 of Figure 9-14,TA is negated for the second data beat. The 601 data pipeline does not proceed until busclock cycle 4 when the TA is reasserted.

Note that DRTRY is useful for systems that implement speculative forwarding of data suchas those with direct-mapped, second-level caches where hit/miss is determined on thefollowing bus clock cycle, or for parity- or ECC-checked memory systems.

Note that DRTRY may not be implemented on other PowerPC processors.

9.4.3.2 Data Transfer Termination Due to a Bus ErrorThe TEA signal indicates that a bus error occurred. It may be asserted while DBB (and/orDRTRY for read operations) is asserted. Asserting TEA to the 601 terminates thetransaction; that is, further assertions of TA and DRTRY are ignored and DBB is negated.

1 2 3 4 5

TS

qual DBG

DBB

data

ta

drtry


Figure 9-14. Read Burst with TA Wait States and DRTRY

Assertion of the TEA signal causes a machine-check exception (and possibly a check-stopcondition within the 601). For more information, see Section 5.4.2, “Machine CheckException (x'00200').” However assertion of TEA does not invalidate data entering the GPRor the cache; therefore, the 601 may act on invalid code/data (although the exception willeventually be recognized, if enabled). Additionally, the corresponding address of the accessthat caused TEA to be asserted is not latched by the 601. To recover, the 601 must be reset;therefore, this function should only be used to flag fatal system conditions to the processor(such as parity or uncorrectable ECC errors).

After the 601 has committed to run a transaction, that transaction must eventually complete.Address retry causes the transaction to be restarted; TA wait states and DRTRY assertionfor reads delay termination of individual data beats. Eventually, however, the system musteither terminate the transaction or assert the TEA signal to put the 601 into checkstop mode.For this reason, care must be taken to check for the end of physical memory and the locationof certain system facilities.

Note that TEA generates a machine-check exception depending on the ME bit in the MSR.Setting the checkstop enable control bits properly leads to a true checkstop condition.

Note also that the 601 does not implement a synchronous error capability for memoryaccesses (see Section 9.6, “Memory- vs. I/O-Mapped I/O Operations”). This means that theexception instruction pointer does not point to the memory operation that caused theassertion of TEA, but to the instruction about to be executed (perhaps several instructionslater).

TS

qual DBG

DBB

data

ta

drtry

1 2 3 4 5 6 7 8 9


9.4.4 Memory Coherency—MESI ProtocolThe 601 provides dedicated hardware to provide memory coherency by snooping bustransactions. The address retry capability enforces the four-state, MESI cache-coherencyprotocol (see Figure 9-15). In addition to the hardware required to monitor bus traffic forcoherency, the 601 has a cache port dedicated to snooping so that comparing cache entriesto address traffic on the bus does not tie up the 601's on-chip cache.

The global (GBL) signal output, indicates whether the current transaction must be snoopedby other snooping devices on the bus. Address bus masters assert GBL to indicate that thecurrent transaction is a global access (that is, an access to memory shared by more than oneprocessor/cache). If GBL is not asserted for the transaction, that transaction is not snooped.When other devices detect the GBL input asserted, they must respond by snooping thebroadcast address.

Normally, GBL reflects the M-bit value specified for the memory reference in thecorresponding translation descriptor(s). Note that care must be taken to minimize thenumber of pages marked as global, because the retry protocol discussed in the previoussection is used to enforce coherency and can require significant bus bandwidth.

When the 601 is not the address bus master, GBL is an input. The 601 snoops a transactionif TS and GBL are asserted together in the same bus clock cycle (this is a qualified snoopingcondition). No snoop update to the 601 cache occurs if the snooped transaction is notmarked global. This includes invalidation cycles.

When the 601 detects a qualified snoop condition, the address associated with the TS iscompared against the unified cache tags through a dedicated cache-tag port. Snoopingcompletes if no hit is detected. If, however, the address hits in the cache, the 601 reactsaccording to the MESI protocol shown in Figure 9-15, assuming the WIM bits are set towrite-back mode, caching allowed, and coherency enforced (WIM = 001).

Note that write hits to clean lines of nonglobal pages do not generate invalidate broadcasts.There are several types of bus transactions that involve the movement of data that can nolonger access the TLB M-bit (for example, replacement sector copy-back, snoop push, andtable-search operations). In these cases, the hardware cannot determine whether the sectorwas originally marked global; therefore, the 601 marks these transactions as nonglobal toavoid retry deadlocks.

The 601's on-chip cache is implemented as an eight-way set-associative cache. To facilitateexternal monitoring of the internal cache tags, the cache set element (CSE0–CSE2) signalsindicate which sector of the cache set is being replaced on read operations (includingRWITM). Note that these signals are valid only for 601 burst operations; for all other busoperations, the CSE signals should be ignored.


Figure 9-15. MESI Cache Coherency Protocol—State Diagram (WIM = 001)

Table 9-6 shows the CSE encodings.

Table 9-6. CSE(0–2) Signals

CSE0–CSE2 Cache Set Element

000 Set 0

001 Set 1

010 Set 2

011 Set 3

100 Set 4

101 Set 5

110 Set 6

111 Set 7

SHARED

SHR

RH

RH

EXCLUSIVE

SHW

RMS

SH

R

SHWSHR

RME WH

WH

WH

RH

MODIFIED

SH

W

SH

W(b

urst

)

INVALID(On a miss, the old

line is first invalidated and copied back

if M)

WM

BUS TRANSACTIONS

RH = Read Hit = Snoop PushRMS = Read Miss, SharedRME = Read Miss, Exclusive = Invalidate TransactionWH = Write HitWM = Write Miss = Read-with-Intent-to-Modify

SHR = Snoop Hit on a ReadSHW = Snoop Hit on a Write or = Cache Sector Fill

Read-with-Intent-to-Modify


9.5 Timing ExamplesThis section shows timing diagrams for various scenarios. Figure 9-16 illustrates the fastestsingle-beat reads. This figure shows both minimal latency and maximum single-beatthroughput. By delaying the data bus tenure, the latency increases, but, because of split-transaction pipelining, the overall throughput is not affected unless the data bus latencycauses the third address tenure to be delayed.

Note that all bidirectional signals go to high-impedance between bus tenures.

Figure 9-16. Fastest Single-Beat Reads

BR

BG

ABB

TS

A0–A31

TT0–TT3

TBST

GBL

AACK

ARTRY

DBG

DBB

D0–D63

TA

DRTRY

TEA

1 2 3 4 5 6 7 8 9 10 11 12

1 2 3 4 5 6 7 8 9 10 11 12

CPU A CPU A CPU A

Read Read Read

In In In


Figure 9-17 illustrates the fastest single-beat writes. Note that all bidirectional signals go tohigh-impedance between bus tenures. TT1–TT3 are binary encoded b'x001'. TT0 can beeither 0 or 1, TT1 and TT2 are 0, and TT3 is 1.

Figure 9-17. Fastest Single-Beat Writes

Figure 9-18 shows three ways to delay single-beat reads showing data-delay controls:

• The TA hold-off can be used to insert wait states in clock cycles 3 and 4.• For the second access, DBG could have been asserted in clock cycle 6.• In the third access, DRTRY is asserted in clock cycle 11 to flush the previous data.

BR

BG

ABB

TS

A0–A31

TT0–TT3

TBST

GBL

AACK

ARTRY

DBG

DBB

D0–D63

TA

DRTRY

TEA

1 2 3 4 5 6 7 8 9 10 11 12

1 2 3 4 5 6 7 8 9 10 11 12

CPU A CPU A CPU A

SBW SBW SBW

Out Out Out


Note that all bidirectional signals go to high-impedance between bus tenures. Thepipelining shown in Figure 9-18 can occur if the second access is not another load, (forexample, an instruction fetch).

Figure 9-18. Single-Beat Reads Showing Data-Delay Controls

BR

BG

ABB

TS

A0–A31

TT0–TT3

TBST

GBL

AACK

ARTRY

DBG

DBB

D0–D63

TA

DRTRY

TEA

CPU A CPU A CPU A

Read Read Read

In In Bad

1 2 3 4 5 6 7 8 9 10 11 12 13 14

1 2 3 4 5 6 7 8 9 10 11 12 13 14

In


Figure 9-19 shows data-delay controls in a single-beat write. Note that all bidirectionalsignals are set to high impedance between bus tenures. Data transfers are delayed in thefollowing ways:

• The TA holdoff is used to insert wait states in clocks 3 and 4.• In clock 6, DBG is held negated, delaying the start of the data tenure.

The last access is not delayed (DRTRY is valid only for read operations).

Figure 9-19. Single-Beat Writes Showing Data Delay Controls

BR

BG

ABB

TS

A0–A31

TT0–TT3

TBST

GBL

AACK

ARTRY

DBG

DBB

D0–D63

TA

DRTRY

TEA

1 2 3 4 5 6 7 8 9 10 11 12

1 2 3 4 5 6 7 8 9 10 11 12

CPU A CPU A CPU A

SBW SBW SBW

Out Out Out


Figure 9-20 shows three single-beat transfers back-to-back. Note that all bidirectionalsignals are set at high-impedance state between tenures.

Figure 9-20. Back-to-Back Single-Beat Transfers

CPU A BR

CPU A BG

CPU B BR

CPU B BG

ABB

TS

A0–A31

TT0–TT3

TBST

GBL

AACK

ARTRY

CPU A DBG

CPU B DBG

DBB

D0–D63

TA

DRTRY

TEA

1 2 3 4 5 6 7 8 9 10 11 12

1 2 3 4 5 6 7 8 9 10 11 12

CPU A CPU B CPU A

Read SBW Read

In Out In


Figure 9-21 shows the use of data-delay controls with burst transfers. Note that allbidirectional signals are set to high impedance between bus tenures. Note the following:

• The first data beat of bursted read data (clock 0) is the critical quad word.• The write burst shows the use of TA holdoff on the third data beat.• The final read burst shows the use of DRTRY on the third data beat.• The address for the third transfer is held off until the first transfer completes.

Figure 9-21. Burst Transfers with Data Delay Controls

BR

BG

ABB

TS

A0–A31

TT0–TT3

TBST

GBL

AACK

ARTRY

DBG

DBB

D0–D63

TA

DRTRY

TEA

CPU A

In 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

CPU A CPU A

Read Write Read

In 1 In 2 In 3 Out 0 Out 1 Out 2 Out 3 In 0 In 1 In 2 In 3In 2


Figure 9-22 shows the use of the TEA signal. Note that all bidirectional signals are set tohigh impedance between bus tenures. Note the following:

• The first data beat of the read burst (in clock 0) is the critical quad-word.

• The TEA signal truncates the burst write transfer on the third data beat.

• The 601 eventually interrupts on the TEA event.

Figure 9-22. Use of Transfer Error Acknowledge (TEA)

BR

BG

ABB

TS

A0–A31

TT0–TT3

TBST

GBL

AACK

ARTRY

DBG

DBB

D0–D63

TA

DRTRY

TEA

CPU A

In 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

CPU A CPU A

Read Write Read

In 1 In 2 In 3 Out 0 Out 1 Out 2 In 0 In 1 In 3In 2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17


9.6 Memory- vs. I/O-Mapped I/O OperationsThe 601 defines separate memory and I/O address spaces, or segments, distinguished by thesegment register T-bit in the address translation logic of the 601. If the T-bit is cleared, thememory reference is a normal memory access and can use the virtual memory managementhardware of the 601. For highest performance, it is recommended that I/O devices bemapped through the memory interface.

However, if the T-bit is set, the external reference is to an I/O controller interface area. Thismapping is provided to implement the I/O controller interface protocol for compatibilitywith certain devices that respond to it.

The following points should be considered for I/O controller interface accesses:

• I/O controller interface accesses must be strongly ordered; for example, these accesses must run on the bus strictly in order with respect to the instruction stream.

• I/O controller interface accesses must provide synchronous error reporting. Chapter 4, “Cache and Memory Unit Operation,” describes architectural aspects of I/O controller interface segments, as well as an overview of the PowerPC architecture’s segmented address space management.

The 601 defines two types of I/O controller interface segments (segment register T-bit set)based on the value of the bus unit ID (BUID), as follows:

• I/O controller interface (BUID ≠ x'07F')—I/O controller interface accesses include all transactions between the 601 and subsystems (referred to as bus unit controllers (BUCs) mapped through I/O controller interface address space).

• Memory-forced I/O controller interface (BUID = x'07F')—Memory-forced I/O controller interface operations access memory space. They do not use the extensions to the memory protocol described for I/O controller interface accesses, and they bypass the page- and block-translation and protection mechanisms. The physical address is found by concatenating bits 28–31 of the respective segment register with bits 4–31 of the effective address. This address is marked as noncacheable, write-through, and global.

Because memory-forced I/O controller interface accesses address memory space, they are subject to the same coherency control as other memory reference operations. More generally, accesses to memory-forced I/O controller interface segments are considered to be cache-inhibited, write-through and memory-coherent operations with respect to the 601 cache and bus interface.

See Section 9.6.2, “I/O Controller Interface Transaction Protocol Details,” for moreinformation about the BUID.

The 601 has a single bus interface to support accesses to both memory accesses and I/Ocontroller interface segment accesses.


The system recognizes the assertion of the TS signal as the start of a memory access. Theassertion of XATS indicates an I/O controller interface access. This allows memory devicesto ignore I/O controller interface transactions. If XATS is asserted, the access is to I/Ospace and the following extensions to the memory access protocol apply:

• A new set of bus operations are defined. The transfer type, transfer burst, and transfer size signals are redefined for I/O controller interface operations; they convey the opcode for the I/O transaction (see Table 9-7).

• There are two beats of address for each I/O controller interface transfer. The first beat (packet 0) provides basic address information such as the segment register and the sender tag and several control bits; the second beat (packet 1) provides additional addressing bits from the segment register and the logical address.

• Explicit sender/receiver tags are provided.

• The sender that initiated the transaction must wait for a reply from the receiver bus-unit controller (BUC) before starting a new operation.

• The 601 does not burst I/O controller interface transactions, but streaming is permitted. Streaming (in this context) allows multiple single-beat transactions to occur before a reply from the I/O receiver is required.

I/O controller interface transactions use separate arbitration for the split address and databuses and define address-only and single-beat transactions. The address-retry vehicle isidentical, although there is no hardware coherency support for I/O controller interfacetransactions. ARTRY is useful, however, for pacing 601 transactions, effectively indicatingto the 601 that the BUC is in a queue-full condition and cannot accept new data.

In addition to the extensions noted above, there are fundamental differences betweenmemory and I/O controller interface operations. For example, use of DRTRY is undefinedfor 601 I/O controller interface operations. Additionally, only half of the 64-bit data path isavailable for 601 I/O controller interface transactions. This lowers the pin-count for I/Ointerfaces but generally results in substantially less bandwidth than memory accesses.Additionally, load/store instructions that address I/O controller interface segments cannotcomplete successfully without an error-free reply from the addressed BUC. Becausenormal I/O controller interface accesses involve multiple I/O transactions (streaming), theyare likely to be very long latency instructions; therefore, I/O controller interface operationsusually stall 601 instruction issue.

Figure 9-23 shows an I/O controller interface tenure. Note that the I/O response is anaddress-only bus transaction.

The decision on whether to map I/O peripherals into memory or I/O controller interfacespace depends on many factors; however, it should be noted that in the best case, the use ofthe 601 I/O controller interface protocol degrades performance and requires the addressedcontrollers to implement 601 bus master capability to generate the reply transactions.


Figure 9-23. I/O Controller Interface Tenures

9.6.1 I/O Controller Interface TransactionsSeven I/O controller interface transaction operations are defined by the 601, as shown inTable 9-7. These operations permit communication between the 601 and BUCs. A single601 store or load instruction (that translates to an I/O controller interface access) generatesone or more I/O controller interface operations (two or more I/O controller interfaceoperations for loads) from the 601 and one reply operation from the addressed BUC.

For the first beat of the address bus, the extended address transfer code (XATC), containsthe I/O opcode as shown in Table 9-7; the opcode is formed by concatenating the transfertype, transfer burst, and transfer size signals defined as follows:

XATC = TT(0–3)||TBST||TSIZ(0–2)

Table 9-7. I/O Controller Interface Bus Operations

Operation Address Only Direction XATC Encoding

Load start (request) Yes 601 ⇒ IO 0100 0000

Load immediate No 601 ⇒ IO 0101 0000

Load last No 601 ⇒ IO 0111 0000

Store immediate No 601 ⇒ IO 0001 0000

Store last No 601 ⇒ IO 0011 0000

Load reply Yes IO ⇒ 601 1100 0000

Store reply Yes IO ⇒ 601 1000 0000


ADDRESS TENURE

DATA TENURE

INDEPENDENT ADDRESS AND DATA


I/O RESPONSE

ARBITRATION TRANSFER TERMINATIONNO DATA TENURE FOR I/O RESPONSE

(I/O responses are address-only)


9.6.1.1 Store OperationsThere are three operations defined for I/O controller interface store operations from the 601to the BUC, defined as follows:

• Store immediate operations transfer up to 32 bits of data each from the 601 to the BUC.

• Store last operations transfer up to 32 bits of data each from the 601 to the BUC

• Store reply from the BUC reveals the success/failure of that I/O controller interface access to the 601.

An I/O controller interface store access consists of one or more data transfer operationsfollowed by the I/O store reply operation from the BUC. If the data can be transferred inone 32-bit data transaction, it is marked as a store last operation followed by the store replyoperation; no store immediate operation is involved in the transfer, as shown in thefollowing sequence:

STORE LAST (from the 601)••

STORE REPLY (from BUC)However, if more data is involved in the I/O controller interface access, there will be one ormore store immediate operations. The BUC can detect when the last data is beingtransferred by looking for the store last opcode, as shown in the following sequence:

STORE IMMEDIATE(s)••

STORE LAST••

STORE REPLY

9.6.1.2 Load OperationsI/O controller interface load accesses are similar to store operations, except that the 601latches data from the addressed BUC rather than supplying the data to the BUC. As withmemory accesses, the 601 is the master on both load and store operations; the externalsystem must provide the data bus grant to the 601 when the BUC is ready to supply the datato the 601.


The load request I/O controller interface operation has no analogous store operation; itinforms the addressed BUC of the total number of bytes of data that the BUC must provideto the 601 on the subsequent load immediate/load last operations. For I/O controllerinterface load accesses, the simplest, 32-bit (or fewer) data transfer sequence is as follows:

LOAD REQUEST••

LOAD LAST••

LOAD REPLY(from BUC)However, if more data is involved in the I/O controller interface access, there will be one ormore load immediate operations. The BUC can detect when the last data is beingtransferred by looking for the load last opcode, as seen in the following sequence:

LOAD REQUEST••

LOAD IMM(s)••

LOAD LAST••

LOAD REPLYNote that three of the seven defined operations are address-only transactions and do not usethe data bus. However, unlike the memory transfer protocol, these transactions are notbroadcast from one master to all snooping devices; The I/O controller interface address-only transaction protocol strictly controls communication between the 601 and the BUC.

9.6.2 I/O Controller Interface Transaction Protocol DetailsAs mentioned previously, there are two address-bus beats corresponding to two packets ofinformation about the address. The two packets contain the sender and receiver tags, theaddress and extended address bits, and extra control and status bits. The two beats of theaddress bus (plus attributes) are shown at the top of Figure 9-24 as two packets. The firstpacket, packet 0, is then expanded to depict the XATC and address bus information indetail.


9.6.2.1 Packet 0Figure 9-24 shows the organization of the first packet in an I/O controller interfacetransaction.

The XATC contains the I/O opcode, as discussed earlier and as shown in Table 9-7. Theaddress bus contains the following:

Key bit || segment register || sender tag

Figure 9-24. I/O Controller Interface Operation—Packet 0

This information is organized as follows:

• Bits 0 and 1 of the address bus are reserved—The 601 always drives these bits to zero.

• Key bit—Bit 2 is the key bit from the segment register (either SR[Ku] or SR[Ks]). Ku indicates user-level access and Ks indicate supervisor-level access. The 601 multiplexes the correct key bit into this position according to the current operating context (user or supervisor).

• Segment register—Address bits 3–27 correspond to bits 3–27 of the segment register. Note that address bits 3–11 form the nine-bit receiver tag. Software must initialize these bits in the segment register to the ID of the BUC to be addressed; they are referred to as the BUID (bus unit ID) bits.

• PID (sender tag)—Address bits 28–31 form the four-bit sender tag. These bits come from bits 28–31 of the 601 PID (processor ID) register. A four-bit tag allows a maximum of 16 processor IDs to be defined for a given system. If more bits are needed for a very large multiprocessor system, for example, it is envisioned that the second-level cache (or equivalent logic) can append a larger processor tag as needed. The BUC addressed by the receiver tag should latch the sender address required by the subsequent I/O reply operation.

I/O Opcode

0 1 2 3 1112 27 28 310 7

A (0–31) + Attributes

Address Bus (A0–A31)

PKT 0 PKT 1

+XATC

Reserved

Key bit

From Segment Register

BUID PID


9.6.2.2 Packet 1The second address beat, packet 1, transfers byte counts and the physical address for thetransaction, as shown in Figure 9-25.

Figure 9-25. I/O Controller Interface Operation—Packet 1

For packet 1, the XATC is defined as follows:

• Load request operations—XATC contains the total number of bytes to be transferred (128 bytes maximum for the 601)

• Immediate/last (load or store) operations—XATC contains the current transfer byte count (one to four bytes.)

Address bits 0–31 contain the physical address of the transaction. The physical address isgenerated by concatenating segment register bits 28–31 with bits 4-31 of the effectiveaddress, as follows:

Segment register (bits 28–31) || effective address (bits 4–31)

While the 601 provides the address of the transaction to the BUC, the BUC must maintaina valid address pointer for the reply.

9.6.3 I/O Reply OperationsBUCs must respond to 601 I/O controller interface transactions with an I/O reply operation,as shown in Figure 9-26. The purpose of this reply operation is to inform the 601 of thesuccess or failure of the attempted I/O controller interface access. This requires the systemI/O controller interface to have 601 bus mastership capability—a substantially morecomplex design task than bus slave implementations that use memory-mapped I/O access.

Reply operations from the BUC to the 601 are address-only transactions. As with packet 0of the address bus on 601 I/O controller interface operations, the XATC contains the opcodefor the operation (see Table 9-7). Additionally, the I/O reply operation transfers thesender/receiver tags in the first beat.

Byte Count

0 7

ADDR +


PKT 0 PKT 1

+XATC Bus Address0 3 4 31SR(28-31)


Figure 9-26. I/O Reply Operation

The address bits are described in Table 9-8.

The second beat of the address bus is reserved; the XATC and address buses should bedriven to zero to preserve compatibility with future protocol enhancements.

The following sequence occurs when the 601 detects an error bit set on an I/O replyoperation:

1. The 601 completes the instruction that initiated the access.

2. If the instruction is a load, the data is forwarded onto the register file(s)/sequencer.

3. An I/O controller interface error exception is generated, which transfers 601 control to the I/O controller interface error exception handler to recover from the error. Refer to Section 5.4.10, “I/O Controller Interface Error Exception (x'00A00'),” for more information.

If the error bit is not set, the 601 instruction that initiated the access completes andinstruction execution resumes.

Table 9-8. Address Bits for I/O Reply Operations

Address Bits Description

0–1 Reserved. These bits should be set to zero for compatibility with future PowerPC microprocessors.

2 Error bit. It is set if the BUC records an error in the access.

3–11 BUID. Sender tag of a reply operation. Corresponds with bits 3–11 of one of the 601 segment registers.

12–27 Address bits 12–27 are BUC-specific and are ignored by the 601.

28–31 PID (receiver tag). The 601 effectively snoops operations on the bus and, on reply operations, compares this field to bits 28–31 of the PID register to determine if it should recognize this I/O reply.

I/O Opcode

0 7


+XATC

Reserved

ErrorBit

Segment Register

BUID PIDBUC Specific

0 1 2 3 1112 27 28 31


System designers should note the following:

• “Misplaced” reply operations (that match the processor tag and arrive unexpectedly) cause a checkstop condition. Refer to Chapter 5, “Exceptions,” for more information.

• External logic must assert AACK for the 601, even though it is the receiver of the reply operation. AACK is an input-only to the 601.

• The 601 monitors address parity when enabled by software and XATS and reply operations (load or store).

9.6.4 I/O Controller Interface Operation Timing The following timing diagrams show the sequence of events in a typical 601 I/O controllerinterface load access (Figure 9-27) and a typical 601 I/O controller interface store access(Figure 9-28). All arbitration signals except for ABB and DBB have been omitted forclarity. Note that for either case, the number of immediate operations depends on theamount and the alignment of data to be transferred. If no more than four bytes are beingtransferred, and the data is double-word aligned (that is, does not straddle an eight-byteaddress boundary), there will be no immediate operation as shown in the figures.

The 601 can transfer as many as 128 bytes of data in one load or store instruction (requiringmore than 33 immediate operations in the case of misaligned operands).

In Figure 9-27, XATS is asserted with the same timing relationship as TS in a memoryaccess. Notice, however, that the address bus (and XATC) transition on the next bus clockcycle. The first of the two beats on the address bus is valid for one bus clock cycle windowonly, and that window is defined by the assertion of XATS. The second address bus beat,however, can be extended by delaying the assertion of AACK until the system has latchedthe address.

The load request and load reply operations shown in Figure 9-27 are address-onlytransactions as denoted by the negated TT3 signal during their respective address tenures.Note that other types of bus operations can occur between the individual I/O controllerinterface operations on the bus. The 601 involved in this transaction, however, does notinitiate any other I/O controller load or store operations once the first I/O controllerinterface operation has begun address tenure; however, if the I/O operation is retried, otherhigher-priority operations can occur.

Notice that, in this example (zero wait states), 13 bus clock cycles are required to transferno more than eight bytes of data.


Figure 9-27. I/O Controller Interface Load Access Example

Figure 9-28 shows an I/O store access, comprised of three I/O controller interfaceoperations in this example. As with the example in Figure 9-27, notice that data istransferred only on the 32 bits of the DH bus. As opposed to Figure 9-27, there is no requestoperation since the 601 has the data ready for the BUC.

The TEA signal may be asserted on any I/O controller interface operation. If it is asserted,the processor enters a checkstop condition if MSR[ME] is cleared, or it will queue amachine check exception if ME is set. After TEA is asserted, it must be reasserted for alltenures associated with the current I/O controller interface operation until the load last orstore last operation occurs. When the operation occurs, the execution unit is released to takethe machine check exception. If the TEA signal is asserted for an I/O controller interfaceoperation, the reply operations (store reply or load reply) must not occur. If it does, it causesa checkstop condition. If the TEA signal is not asserted with each tenure of a given I/Ocontroller interface operation, the result of the assertion of TEA is unpredictable. The 601may take a machine check exception or cause a checkstop condition.

ABB

XATS

ADDR, XAT0

XATC(3)

DBB

DH0–DH31

TA

1 2 3 4 5 6 7 8 9 10 11 12 13

PKT 0 PKT 1 PKT 0 PKT 1 PKT 0 PKT 1 Reply Rsrvd

REQUEST OP IMM. OP LAST OP REPLY OP

PKT 0 PKT 1 PKT 0 PKT 1 PKT 0 PKT 1


Figure 9-28. I/O Controller Interface Store Access Example

9.7 Interrupt, Checkstop, and Reset SignalsThis section describes external interrupts, checkstop operations, and hard and soft resetinputs.

9.7.1 External InterruptThe maskable interrupt input (INT) to the 601 eventually forces the processor to take theexternal interrupt vector if the MSR(EE) bit is set. See Chapter 5, “Exceptions,” for moreinformation about interrupts and exceptions.

9.7.2 CheckstopsThe 601 has two checkstop signals, an input (CKSTP_IN) and an output (CKSTP_OUT).If CKSTP_IN is asserted, the 601 halts operations by gating off all internal clocks. The 601does not assert CKSTP_OUT if CKSTP_IN if asserted.

If CKSTP_OUT is asserted, the 601 has checkstopped internally. The CKSTP_OUT signalcan be asserted for various reasons including receiving a TEA signal, as the result of thelack of an instruction dispatch, or internal and external parity errors. For more informationon checkstop state, refer to Section 5.4.2.2, “Checkstop State (MSR[ME] = 0).”

Note that checkstop conditions can be disabled by setting bits in the HID0 register. Forinformation, see Section 2.3.3.13.1, “Checkstop Sources and Enables Register—HID0.”

ABB

XATS

ADDR, XATC

XATC(3)

DBB

DH0–DH31

TA

1 2 3 4 5 6 7 8 9 10

PKT 0 PKT 1 PKT 0 PKT 1 Reply Rsrvd

IMM. OP LAST OP REPLY OP

PKT 0 PKT 1 PKT 0 PKT 1


9.7.3 Reset InputsThe 601 has two reset inputs, described as follows:

• HRESET (hard reset)—The HRESET signal is used for power-on reset sequences, or for situations in which the 601 must go through the entire cold-start sequence of self-tests. Once asserted, this input must be held asserted for a minimum of 300 processor clock cycles to ensure that the processor has had enough time to recognize the input and initialize registers. For more information about hard reset, see Section 2.7.1, “Hard Reset.”

• SRESET (soft reset)—The soft reset input provides warm reset capability. This input can be used to avoid forcing the 601 to complete the cold start sequence. This can be useful to recover from such conditions as check stop or some machine-check states that cannot be restarted.

When either reset input is negated and if the self-test sequence completes without error, theprocessor attempts to fetch code from the system reset exception vector. The vector islocated at offset x'00100' from the exception prefix (all zeros or ones, depending on thesetting of the exception prefix bit in the machine state register (MSR[EP]). The EP bit is setfor HRESET.

9.7.4 Soft Stop Control SignalsThe soft stop control signals allow the processor to stop the clocks and bring the activity toa quiescent state in an orderly fashion (as opposed to a hard stop, which simply halts theclocks without regard to system activity).

The soft stop state is entered by asserting the QUIESC_REQ signal. This signal allows thesystem to complete any bus activities that might be affected by stopping the clocks. Whenthe system is ready to enter the soft stop state, it asserts the SYS_QUIESC signal. At thistime the 601 takes a soft stop.

During a soft stop all internal clocking is disabled after the system activity quiesces in anorderly manner, that is, there are no partially finished instructions. Soft stop is typicallyused for debugging; during the soft stop, the state bits in the chip can be scanned, examinedand scanned back in. The processor returns to normal operation when the RESUME signalis asserted.

9.8 Processor State SignalsThis section describes the 601's support for atomic update and memory through the use ofthe lwarx/stwcx. opcode pair and the configuration options for the 601 output buffer.

9.8.1 Support for the lwarx/stwcx. Instruction PairThe Load Word and Reserve Indexed (lwarx) and the Store Word Conditional Indexed(stwcx.) instructions provide a means for atomic memory updating. Memory can beupdated atomically by setting a reservation on the load and checking that the reservation is


still valid before the store is performed. In the 601, the reservations are made on behalf ofaligned, 32-byte sections of the memory address space.

The reservation (RSRV) output signal is always driven by bus clock cycle and reflects thestatus of the reservation coherency bit in the reservation address register (see Chapter 4,“Cache and Memory Unit Operation,” for more information). See Section 8.2.9.8,“Reservation (RSRV)—Output,” for information about timing.

9.9 IEEE 1149.1-Compatible InterfaceThe 601 boundary-scan interface is IEEE 1149.1 compatible only and is not a fullycompliant implementation of the IEEE 1149.1 standard. Although the standard allowsbuilt-in self-test (BIST), the 601 interface supports only boundary scan. This sectiondescribes the 601 interface and its differences with the IEEE 1149.1 interface.

9.9.1 Deviations from the IEEE 1149.1 Boundary-Scan Specifications The 601 deviates from the IEEE 1149.1 specifications in the following ways:

• In the IEEE 1149.1 specifications, no mode pin is required to use the IEEE 1149.1 boundary-scan interface. However, in the 601, the scan enable mode input (BSCAN_EN) signal must be asserted to run boundary-scan testing. The signal must be pulled up when boundary-scan testing is not being performed.

• Whereas the IEEE 1149.1 specifications indicate that only the TCK signal should be used to clock data-register latches, in the 601 the processor system clock must be active (oscillating) during testing.

• The 601 implements only the PRELOAD portion of the SAMPLE/PRELOAD function.

• IEEE 1149.1 specifies that data on the primary output should be held valid while the processor is in the SHIFT DR state and that data should change only in the UPDATE DR or UPDATE IR states (assuming the instruction is valid). In the 601, no stable values are held on primary outputs for the SHIFT DR state. The SHIFT DR state forces primary outputs to high impedance. Outputs are enabled if the instruction register (IR) contains a valid instruction and the test access port (TAP) is in the UPDATE DR or UPDATE IR state.

• IEEE 1149.1 specifies that asserting the TRST signal should reset only the TAP. In the 601, the TRST signal resets the TAP, system logic, and the COP.

IEEE 1149.1 also specifies the use of the TRST signal to disable TAP. On the 601, this can be done by negating the BSCAN_EN signal, which prohibits resetting the TAP and system logic independently. The TRST signal should not be used to disable the TAP in the system functional environment; the BSCAN_EN signal should be used. The user can use the TRST signal as described above or hold TMS high for five TCK cycles. Note that not all SRLs in the 601 are boundary-scan SRLs. The boundary-scan chain includes functional system SRLs.


9.9.2 Additional Information about the IEEE 1149.1 InterfaceNote the following points concerning the IEEE 1149.1 interface:

• Because the driver inhibit to all COMMON I/O signals is controlled by a common signal, all COMMON INPUT/OUTPUT devices are either configured as input pins or output pins, as determined by the JTAGEN bit (bit position 418) in the boundary-scan chain.

• Not all latches in the boundary-scan chain are boundary-scan latches.

9.9.3 IEEE 1149.1 Interface DescriptionThere are five device pins used for the IEEE 1149.1 interface on the 601. These pins alsoperform other functions in the 601 and consequently have signal names describing theirother (non-IEEE 1149.1) functions. Table 9-9 shows the signal name, IEEE 1149.1function and package pin number of the five TAP pins.

Note: The SCAN_SIN must be pulled-up while performing IEEE 1149.1 testing and mustbe pulled down when IEEE 1149.1 testing is not being performed.

The BSCAN_EN input pin must be driven low to enable the boundary-scan test mode.Additionally, the BSCAN_EN input must be pulled-up when boundary-scan testing is notbeing performed. The addition of this pin is a deviation from the IEEE 1149.1 specificationwhich only defines five pins for the test interface.

9.9.4 IEEE Interface Clock RequirementsIn addition to the five standard IEEE 1149.1 signals, the 601 requires that its 2X_PCLK andPCLK_EN clock inputs remain active during IEEE 1149.1 operation. The 2X_PCLK andPCLK_EN signals can be supplied by automatic test equipment or the clock generationcircuits on the unit under test. Timing of the IEEE 1149.1 signals is pseudo asynchronousto the 601 clock inputs and mimics typical IEEE 1149.1 timing. The timing of the IEEE1149.1 signals can be treated as asynchronous to the 601 clocks as long as they remainasserted for several PCLK_EN cycles.

Table 9-9. IEEE Interface Pin Descriptions

Signal Name

IEEE 1149.1 Function

PackagePin

SCAN_CTL TMS 184

SCAN_CLK TCK 187

SCAN_SIN TDI 186

SCAN_OUT TDO 78

HRESET TRST 279


The recommended method of IEEE 1149.1 operation is to limit TCK frequency to 20% orless of the PCLK_EN frequency; this allows five PCLK_EN cycles for each TCK cycle.This insures that at least one positive transition of PCLK_EN will occur on each half cycleof TCK within the required set-up and hold times even if the duty cycle of TCK varies by20%. Figure 9-29 shows TCK = 0.2 (PCLK_EN) and the duty cycle of TCK ± 20%. Notethat there is no edge timing relationship between TCK and 2X_PCLK, even thoughFigure 9-29 may tend to indicate otherwise.

Figure 9-29. IEEE 1149.1 Interface TCK Requirements

Typical PCLK_EN frequency will be 50–66 MHz if supplied by the clock circuits on theunit under test. This allows TCK frequencies of 10–13 MHz. In most IEEE 1149.1 testingscenarios, the TCK frequency is likely to be much less than 20% of the PCLK_ENfrequency. When PCLK_EN and 2X_PCLK are provided by the automatic test equipment,it is acceptable to run these clocks at a much lower frequency than they would normally run.For example, if PCLK_EN = 5 MHz and 2X_PCLK = 10 MHz, then TCK must be ≤ 1MHz.

The timing relationships between the IEEE 1149.1 signals is shown in Figure 9-30 and thesignal timing requirements are given in Table 9-10. All the timing parameters are specifiedas a portion of the PCLK_EN cycle time.

PCLK_EN

TCKNormal

TCK–20%

TCK+20%


Figure 9-30. IEEE 1149.1 Signal Clocking Requirements

Table 9-10 provides signal timing requirements for the IEEE 1149.1 interface for thesignals shown in Figure 9-30.

9.9.5 IEEE 1149.1 Interface Reset RequirementsThe TRST (HRESET) input is used to reset the TAP, and must be held low for a minimumof 60 TCK cycles to accomplish a reset. The TRST (HRESET) signal will reset the wholechip including the TAP controller. This deviates from the IEEE 1149.1 specification in thatTRST should only reset the TAP, and not the system logic. The TAP controller can be resetto the “Test-Logic-Reset” state at any time by holding TMS (SCAN_CTL) high for 5 TCKcycles. This 5-cycle reset will only reset the TAP controller to the “Test-Logic-Reset” stateand fill the instruction register with all 1’s (BYPASS command), it does not reset theboundary-scan chain or any other chip logic.

The IEEE specification also provides for the ability to disable the TAP via the TRST input.This can be performed on the 601 via the BSCAN_EN input. Using the TRST input todisable the TAP will cause the entire chip to reset. When IEEE 1149.1 testing is not beingperformed the TAP should be disabled by pulling the BSCAN_EN input high.

Table 9-10. IEEE 1149.1 Signal Timing Requirements

Label Characteristic Minimum Maximum

1 TCK frequency 0.2 (PCLK_EN) frequency

2 TCK duty cycle 30% 80%

3 Input setup time for TDI and TMS One PCLK_EN period

4 Input hold time for TDI and TMS One PCLK_EN period

5 Output delay time for TDO 1 PCLK_EN period +10 ns

2

3 4

5

1

TCK

TDIor

TMS

TDO


9.9.6 IEEE Interface Instruction SetThe 601 processor implements three IEEE 1149.1 instructions, BYPASS, EXTEST andSAMPLE/PRELOAD. The 601 provides no SAMPLE function in theSAMPLE/PRELOAD instruction, which is a deviation from the IEEE 1149.1 specification.The instruction register is three bits long. Table 9-11 provides the binary encoding for theIEEE 1149.1 instructions.

When using the EXTEST instruction, note that no stable logic levels will be held on theoutputs while in the SHIFT DR state. The 601 outputs are forced to the high impedancestate while the TAP controller is in the SHIFT DR state. The 601 outputs will be enabled ifa valid instruction is in the instruction register and the TAP controller is in the UPDATE DRor UPDATE IR state. This is a deviation from the IEEE 1149.1 standard which requiresoutputs to be held valid while in the SHIFT DR state.

9.9.7 IEEE 1149.1 Interface Boundary-Scan ChainThe 601 boundary-scan chain is 424 bits long (0–423.) Not every bit in the boundary- scanchain is directly controllable or observable during inbound and outbound testing in theEXTEST mode. Table 9-12 shows the pin number, signal name, and scan chain bit positionfor each boundary-scan pin. Note that the pins that do not have boundary-scan latches areshown with N/A in the bit position column. The last column of the table shows which ofthe 601 pins have an inverter between the package pin and the boundary-scan latch. Allbidirectional I/O pins have two boundary latches associated with them, one for the inputand one for the output. In the table, the bidirectional I/O pins are shown as type I/O and theboundary-scan chain bit positions for these pins are shown in Input/Output order.Table 9-12 describes the boundary-scan string.

Table 9-11. IEEE Interface Instruction Set

Instruction Code

BYPASS 111

EXTEST 000

SAMPLE/PRELOAD 101


Table 9-12. IEEE 1149.1 Boundary-Scan Chain Description

Pin Number Signal Name TypeBit

PositionInverted

1 TST21 Input N/A

3 TST20 Input N/A

4 TST16 Input N/A

5 TST11 Input N/A

7 TST13 Input N/A

8 TST15 Input N/A

9 TST17 Input N/A

10 TST14 Input N/A

13 TST9 Input N/A

14 TST6 Input N/A

15 TST7 Input N/A

17 TST8 Input N/A

18 A0 I/O 141/108

19 A1 I/O 142/109

21 A2 I/O 143/110

22 A3 I/O 144/111

23 A4 I/O 145/112

26 A5 I/O 146/113

27 A6 I/O 147/114

28 A7 I/O 148/115

30 A8 I/O 149/116

31 A9 I/O 150/117

32 A10 I/O 151/118

34 A11 I/O 152/119

35 A12 I/O 153/120

36 A13 I/O 154/121

41 A14 I/O 155/122

42 A15 I/O 156/123

43 A16 I/O 157/124

45 A17 I/O 158/125

46 A18 I/O 159/126


47 A19 I/O 160/127

49 A20 I/O 161/128

50 A21 I/O 162/129

51 A22 I/O 163/130

54 A23 I/O 164/131

55 A24 I/O 165/132

56 A25 I/O 166/133

58 A26 I/O 167/134

59 A27 I/O 168/135

60 A28 I/O 169/136

62 A29 I/O 170/137

63 A30 I/O 171/138

64 A31 I/O 172/139

67 AP0 I/O 198/269

68 AP1 I/O 199/270

69 AP2 I/O 200/271

70 TST19 Output N/A

71 AP3 I/O 201/272

72 CKSTP_OUT Output 423

74 RUN_NSTOP Output N/A

75 DH31 I/O 376/32

78 SCAN_OUT Output N/A

80 DH30 I/O 375/31

81 DH29 I/O 374/30

82 DH28 I/O 373/29

83 DH27 I/O 372/28

84 DH26 I/O 371/27

85 DH25 I/O 370/26

86 DH24 I/O 369/25

90 DH23 I/O 368/24

91 DH22 I/O 367/23

93 DH21 I/O 366/22

Table 9-12. IEEE 1149.1 Boundary-Scan Chain Description (Continued)


PositionInverted


94 DH20 I/O 365/21

95 DH19 I/O 364/20

97 DH18 I/O 363/19

98 DH17 I/O 362/18

99 DH16 I/O 361/17

103 DH15 I/O 360/16

104 DH14 I/O 359/15

106 DH13 I/O 358/14

107 DH12 I/O 357/13

108 DH11 I/O 356/12

110 DH10 I/O 355/11

111 DH9 I/O 354/10

112 DH8 I/O 353/9

118 DH7 I/O 352/8

119 DH6 I/O 351/7

121 DH5 I/O 350/6

122 DH4 I/O 349/5

123 DH3 I/O 348/4

125 DH2 I/O 347/3

126 DH1 I/O 346/2

127 DH0 I/O 345/1

130 DL31 I/O 413/69

131 DL30 I/O 412/68

132 DL29 I/O 411/67

134 DL28 I/O 410/66

135 DL27 I/O 409/65

136 DL26 I/O 408/64

138 DL25 I/O 407/63

139 DL24 I/O 406/62

140 DL23 I/O 405/61

143 DL22 I/O 404/60

144 DL21 I/O 403/59



PositionInverted


145 DL20 I/O 402/58

147 DL19 I/O 401/57

148 DL18 I/O 400/56

149 DL17 I/O 399/55

151 DL16 I/O 398/54

155 DL15 I/O 397/53

157 DL14 I/O 396/52

159 DL13 I/O 395/51

161 DL12 I/O 394/50

165 DL11 I/O 393/49

167 DL10 I/O 392/48

168 DL9 I/O 391/47

169 DL8 I/O 390/46

172 DL7 I/O 389/45

173 DL6 I/O 388/44

178 DL5 I/O 387/43

180 DL4 I/O 386/42

181 DL3 I/O 385/41

182 DL2 I/O 384/40

184 SCAN_CTL Input N/A

185 DL1 I/O 383/39

186 SCAN_SIN Input N/A

187 SCAN_CLK Input N/A

188 DL0 I/O 382/38

194 DP7 I/O 417/73

195 DP6 I/O 416/72

197 DP5 I/O 415/71

198 DP4 I/O 414/70

199 DP3 I/O 380/36

201 DP2 I/O 379/35

202 DP1 I/O 378/34

203 DP0 I/O 377/33



PositionInverted


210 SC_DRIVE Input 231

211 CSE1 Output 87

212 CSE2 Output 88

214 WT Output 84 Inverted

215 CSE0 Output 86

216 CI Output 83 Inverted

219 BR Output 209 Inverted

220 DBB I/O 177/217

221 ARTRY I/O 175/326

222 DPE Output 97

224 ABB I/O 205/216

226 TS I/O 186/214 Inverted input only

227 TT1 I/O 191/274

228 TT0 I/O 190/273

229 XATS I/O 194/211 Inverted input only

231 APE Output 98

232 TSIZ1 I/O 188/279

233 GBL I/O 181/85 Inverted output only

235 SHD I/O 182/325

236 TBST I/O 184/277

237 TSIZ2 I/O 189/280

238 TT4 Output 286

241 TSIZ0 I/O 187/278

243 TC0 Output 89

244 TT3 I/O 193/276

246 TST2 Output N/A

247 TST3 Output N/A

248 TT2 I/O 192/275

250 HP_SNP_REQ IN 196

251 TC1 Output 90

254 RSRV Output 210

255 TST22 Input 232



PositionInverted


Note that the internal signal JTAGEN shown at the end of Table 9-12 is used to control thedirection of all the bidirectional pins of the 601. JTAGEN is bit position 420 in theboundary-scan chain and needs to be set by the user when EXTEST operation is desired.Setting the JTAGEN bit to 1 places all the 601 bidirectional I/O pins in the output enabled(drive) mode. When JTAGEN is cleared to 0 all the 601 bidirectional I/O pins are set to theinput (receive) mode.

256 QUIESC_REQ Input N/A

258 CKSTP_IN Input 423

260 SYS_QUIESC Input N/A

262 INT Input 233

264 SRESET Input 234

271 BCLK_EN Input 204

273 RTC Input N/A

275 ESP_EN Input N/A

277 RESUME Input N/A

279 HRESET Input N/A

282 2X_PCLK Input N/A

285 PCLK_EN Input N/A

288 TST5 Input N/A

290 TA Input 183

291 TEA Input 185 Inverted

292 DRTRY Input 180

295 AACK Input 174

297 DBWO Input 179

298 BG Input 176

299 BSCAN_EN Input N/A

300 DBG Input 178

302 TST12 Input N/A

303 TST18 Input N/A

304 TST10 Input N/A

N/A JTAGEN Internal 420



PositionInverted


9.10 Using DBWO (Data Bus Write Only)The 601 supports split transaction pipelined transactions. Additionally, the DBWO signalallows the 601 to be configured dynamically to source write data out of order with respectto read data.

In general, an address tenure on the bus is followed strictly in order by its associated datatenure. Transactions pipelined by a single 601 complete strictly in order. However, the 601can run bus transactions out of order only when the external system allows the 601 toperform a cache-sector snoop push-out operation (or other write transaction, if pending inthe 601 write queues) between the address and data tenures of a read operation through theuse of DBWO. This effectively envelopes the write operation within the read operation.This can be useful in some external queued controller scenarios or for more complexmemory implementations that can support so-called dump-and-run operations. Theseinclude the cache sector cast out of a modified sector caused by a load miss. A replacementcopyback operation can be written to memory buffers while the memory location is beingaccessed for the line fill. The sector is written (dumped) into memory buffers while thememory is accessed for the load operation. Optimally, the replacement copy-back operationcan be absorbed by the memory system without affecting load memory latency. Figure 9-31gives an example of the use of the DBWO input.

Figure 9-31 illustrates the following sequence of operations:

1. Processor A begins a read operation. (Bus clock cycle 2)

2. Processor B attempts a global read but is interrupted by a retry from processor A (bus clock cycle 7)

3. Processor A performs a cache-sector snoop push-out operation out of order because of the assertion of DBWO (bus clock cycle 8)

4. Processor B successfully performs the global read (bus clock cycle 13)

5. Processor A successfully concludes its original read operation (bus clock cycle 16)

Note that steps 4 and 5 can occur in either order.


Figure 9-31. Data Bus Write-Only Transaction

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

CPU A BR

CPU A BG

CPU B BR

CPU B BG

ABB

TS

A0–A31

TT0–TT3

TBST

GBL

AACK

ARTRY

CPU A DBG

CPU A DBWO

CPU B DBG

DBB

D0–D63

TA

DRTRY

TEA

CPU A CPU B CPU A CPU B

Read Read Snoop Push CPU B

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

In* Out Out Out Out In In

* Indicates the 601 flushed this data due to address retry

CPU ACPU B CPU B CPU A


Note that although the 601 can pipeline any write transaction behind the read transaction,special care should be used when using the enveloped write feature. It is envisioned thatmost system implementations will not need this capability; for these applications DBWOshould remain negated. In systems where this capability is needed, DBWO should beasserted under the following scenario:

1. The 601 initiates a read transaction (either single-beat or burst) by completing the read address tenure with no address retry.

2. Then, the 601 initiates a write transaction by completing the write address tenure, with no address retry.

3. At this point, if DBWO is asserted with a qualified data bus grant to the 601, the 601 asserts DBB and drives the write data onto the data bus, out of order with respect to the address pipeline. The write transaction concludes with the 601 negating DBB.

4. The next qualified data bus grant signals the 601 to complete the outstanding read transaction by latching the data on the bus. This assertion of DBG should not be accompanied by an asserted DBWO.

Any number of bus transactions by other bus masters can be attempted between any of thesesteps.

Note the following regarding DBWO:

• DBWO cannot be asserted if no data bus write tenures are pending.

• DBWO can be asserted if no data bus read is pending, but it has no effect on write ordering.

• The ordering and presence of data bus writes is determined by the writes in the write queues at the time BG is asserted for the write address (not DBG). If a particular write is desired (for example, a cache-sector snoop push-out operation), then BG must be asserted after that particular write is in the queue and it must be the highest priority write in the queue at that time. A cache-sector snoop push-out operations may be the highest priority write, but more than one may be queued.

• Because more than one write may be in the write queue when DBG is asserted for the write address, more than one data bus write may be enveloped by a pending data bus read.

The arbiter must monitor bus operations and coordinate the various masters and slaves withrespect to the use of the data bus when DBWO is used. Individual DBG signals associatedwith each bus device should allow the arbiter to synchronize both pipelined and split-transaction bus organizations. Individual DBG signals provide a primitive form of source-level tagging for the granting of the data bus.

Note that use of the DBWO signal allows some operation-level tagging with respect to the601 and the use of the data bus.

Chapter 10. Instruction Set 10-1

Chapter 10 Instruction Set100100

This chapter describes individual instructions, including a description of instructionformats and notation and an alphabetical listing of the PowerPC 601 microprocessor’sinstructions by mnemonic.

10.1 Instruction FormatsInstructions are four bytes long and word-aligned, so when instruction addresses arepresented to the processor (as in branch instructions) the two low-order bits are ignored.Similarly, whenever the processor develops an instruction address, its two low-order bitsare zero.

Bits 0–5 always specify the primary opcode. Many instructions also have a secondaryopcode. The remaining bits of the instruction contain one or more fields for the differentinstruction formats.

Some instruction fields are reserved or must contain a predefined value as shown in theindividual instruction layouts. If a reserved field does not have all bits set to 0, or if a fieldthat must contain a particular value does not contain that value, the instruction form isinvalid and the results are as described in Appendix D, “Classes of Instructions”.

10.1.1 Split-Field NotationSome instruction fields occupy more than one contiguous sequence of bits or occupy acontiguous sequence of bits used in permuted order. Such a field is called a split field. Inthe format diagrams and in the individual instruction layouts, the name of a split field isshown in small letters, once for each of the contiguous sequences. In the pseudocodedescription of an instruction having a split field and in some places where individual bits ofa split field are identified, the name of the field in small letters represents the concatenationof the sequences from left to right. Otherwise, the name of the field is capitalized andrepresents the concatenation of the sequences in some order, which need not be left to right,as described for each affected instruction.


10.1.2 Instruction Fields Table 10-1 describes the instruction fields used in the various instruction formats.

Table 10-1. Instruction Formats

Field Description

AA (30) Absolute address bit0 The immediate field represents an address relative to the current instruction address. The

effective (logical) address of the branch is either the sum of the LI field sign-extended to 32 bits and the address of the branch instruction or the sum of the BD field sign-extended to 32 bits and the address of the branch instruction.

1 The immediate field represents an absolute address. The effective address of the branch is the LI field sign-extended to 32 bits or the BD field sign-extended to 32 bits.

BD (16–29) Immediate field specifying a 14-bit signed two's complement branch displacement that is concatenated on the right with b'00' and sign-extended to 32 bits.

BI (11–15) Field used to specify a bit in the CR to be used as the condition of a branch conditional instruction

BO (6–10) Field used to specify options for the branch conditional instructions. The encoding is described in Section 3.6.2, “Conditional Branch Control.”

crbA (11–15) Field used to specify a bit in the CR to be used as a source

crbB (16–20) Field used to specify a bit in the CR to be used as a source

crbD (6–10) Field used to specify a bit in the CR or in the FPSCR as the destination of the result of an instruction

crfD (6–8) Field used to specify one of the CR fields or one of the FPSCR fields as a destination

crfS (11–13) Field used to specify one of the CR fields or one of the FPSCR fields as a source

CRM (12–19) Field mask used to identify the CR fields that are to be updated by the mtcrf instruction

d(16–31) Immediate field specifying a 16-bit signed two's complement integer that is sign-extended to 32 bits

FM (7–14) Field mask used to identify the FPSCR fields that are to be updated by the mtfsf instruction

frA (11–15) Field used to specify an FPR as a source of an operation

frB (16–20) Field used to specify an FPR as a source of an operation

frC (21–25) Field used to specify an FPR as a source of an operation

frD (6–10) Field used to specify an FPR as the destination of an operation

frS (6–10) Field used to specify an FPR as a source of an operation

IMM (16–19) Immediate field used as the data to be placed into a field in the FPSCR

LI (6–29) Immediate field specifying a 24-bit, signed two's complement integer that is concatenated on the right with b'00' and sign-extended to 32 bits

LK (31) Link bit.0 Does not update the link register.1 Updates the link register. If the instruction is a branch instruction, the address of the instruction

following the branch instruction is placed into the link register.


MB (21–25) and ME (26–30)

Fields used in rotate instructions to specify a 32-bit mask consisting of “1” bits from bit MB+32 through bit ME+32 inclusive, and “0” bits elsewhere, as described in Section 3.3.4, “Integer Rotate and Shift Instructions”.

NB (16–20) Field used to specify the number of bytes to move in an immediate string load or store

opcode (0–5) Primary opcode field

OE (21) Used for extended arithmetic to enable setting OV and SO in the XER

rA (11–15) Field used to specify a GPR to be used as a source or as a destination

rB (16–20) Field used to specify a GPR to be used as a source

Rc (31) Record bit0 Does not update the condition register1 Updates the condition register (CR) to reflect the result of the operation.

For integer instructions, CR bits 0–3 are set to reflect the result as a signed quantity. The result as an unsigned quantity or a bit string can be deduced from the EQ bit. For floating-point instructions, CR bits 4–7 are set to reflect floating-point exception, floating-point enabled exception, floating-point invalid operation exception, and floating-point overflow exception.

rD(6–10) Field used to specify a GPR to be used as a destination

rS (6–10) Field used to specify a GPR to be used as a source

SH (16–20) Field used to specify a shift amount

SIMM (16–31) Immediate field used to specify a 16-bit signed integer

SPR (11–20) Field used to specify a special purpose register for the mtspr and mfspr instructions. The encoding is described in Section 3.7.2, “Move to/from Special-Purpose Register Instructions.”

TO (6–10) Field used to specify the conditions on which to trap. The encoding is described in Section 3.6.9, “Trap Instructions and Mnemonics

UIMM (16–31) Immediate field used to specify a 16-bit unsigned integer

XO (21–30, 22–30, 26–30, or 30)

Secondary opcode field

Table 10-1. Instruction Formats (Continued)

Field Description

10.1.3 Notation and ConventionsThe operation of some instructions is described by a semiformal language (pseudocode).See Table 10-2 for a list of pseudocode notation and conventions used throughout thischapter.

Table 10-2. Pseudocode Notation and Conventions

Notation/Convention Meaning

← Assignment

←iea Assignment of an instruction effective address.

¬ NOT logical operator

∗ Multiplication

÷ Division (yielding quotient)

+ Two’s-complement addition

- Two’s-complement subtraction, unary minus

=, ≠ Equals and Not Equals relations

<,≤,>,≥ Signed comparison relations

<U,>U Unsigned comparison relations

? Unordered comparison relation

&, | AND, OR logical operators

|| Used to describe the concatenation of two values (i.e., 010 || 111 is the same as 010111)

⊕ , ≡ Exclusive-OR, Equivalence logical operators ((a ≡ b) = (a ⊕ ¬ b))

b'nnnn' A number expressed in binary format

x'nnnn' A number expressed in hexadecimal format

(rA|0) The contents of rA if the rA field has the value 1–31, or the value 0 if the rA field is 0

. (period) As the last character of an instruction mnemonic, a period (.) means that the instruction updates the condition register field.

CEIL(x) Least integer ≥ x

DOUBLE(x) Result of converting x form floating-point single format to floating-point double format.

EXTS(x) Result of extending x on the left with sign bits

GPR(x) General purpose register x

MASK(x, y) Mask having 1s in positions x through y (wrapping if x > y) and 0s elsewhere

MEM(x, y) Contents of y bytes of memory starting at address x

ROTL[32](x, y) Result of rotating the 64-bit value x||x left y positions, where x is 32 bits long

SINGLE(x) Result of converting x from floating-point double format to floating-point single format

SPR(x) Special purpose register x

Precedence rules for pseudocode operators are summarized in Table 10-3.

xn x is raised to the nth power

(n)x The replication of x, n times (i.e., x concatenated to itself n-1 times). (n)0 and (n)1 are special cases

x[n] n is a bit or field within x, where x is a register

TRAP Invoke the system trap handler

Undefined An undefined value. The value may vary from one implementation to another, and from one execution to another on the same implementation.

Characterization Reference to the setting of status bits, in a standard way that is explained in the text

CIA Current instruction address, which is the 32-bit address of the instruction being described by a sequence of pseudocode. Used by relative branches to set the next instruction address (NIA). Does not correspond to any architected register.

NIA Next instruction address, which is the 32-bit address of the next instruction to be executed (the branch destination) after a successful branch. In pseudocode, a successful branch is indicated by assigning a value to NIA. For instructions which do not branch, the next instruction address is CIA + 4.

if...then...else... Conditional execution, indenting shows range, else is optional

Do Do loop, indenting shows range. “To” and/or “by” clauses specify incrementing an iteration variable, and “while” and/or “until” clauses give termination conditions, in the usual manner.

Leave Leave innermost do loop, or do loop described in leave statement

Table 10-3. Precedence Rules

Operators Associativity

x[n], function evaluation Left to right

(n)x or replication,x(n) or exponentiation

Right to left

unary -, ¬ Right to left

∗ , ÷ Left to right

+,- Left to right

|| Left to right

=,≠,<,≤,>,≥,<U,>U,? Left to right

&,⊕,≡ Left to right

| Left to right

– (range) None

← None

Table 10-2. Pseudocode Notation and Conventions (Continued)

Notation/Convention Meaning


Note that operators higher in Table 10-3 are applied before those lower in the table.Operators at the same level in the table associate from left to right, from right to left, or notat all, as shown.

10.2 Instruction SetThe remainder of this chapter lists and describes the instruction set for the 601. Theinstructions are listed in alphabetical order by mnemonic and include those instructions thatare specific to the 601 that are not specified as part of the PowerPC architecture. Figure 10-1shows the format for each instruction description page.

Figure 10-1. Instruction Description

Note in Figure 10-1 that the execution unit that executes the instruction may not be the samefor other PowerPC processors.

addx addxAdd Integer Unit

add rD,rA,rB (OE=0 Rc=0)

add. rD,rA,rB (OE=0 Rc=1)

addo rD,rA,rB (OE=1 Rc=0)

addo. rD,rA,rB (OE=1 Rc=1)

[POWER mnemonics: cax, cax., caxo, caxo.]

rD ← (rA) + (rB)

The sum (rA) + (rB) is placed into rD.

Other registers altered:

• Condition Register (CR0 Field):

Affected: LT, GT, EQ, SO (if Rc=1)

• XER:

Affected: SO, OV (if OE=1)

0 5 6 10 11 15 16 20 21 22 30 3131 D A B OE 266 Rc

Instruction name

Instruction syntax

Equivalent POWER mnemonics

Instruction encoding

Pseudocode description of instruction operationText description of instruction operationRegisters altered by instruction


absx POWER Architecture Instruction absxAbsolute Integer Unit

abs rD,rA (OE=0 Rc=0)abs. rD,rA (OE=0 Rc=1)abso rD,rA (OE=1 Rc=0)abso. rD,rA (OE=1 Rc=1)

This instruction is not part of the PowerPC architecture.

The absolute value |(rA)| is placed into rD. If rA contains the most negative number (i.e.,x'8000 0000'), the result of the instruction is the most negative number and sets XER[OV]if overflow signaling is enabled.




• XER:


Note: This instruction is specific to the 601.

0 5 6 10 11 15 16 20 21 22 30 31

Reserved

31 D A 0 0 0 0 0 OE 360 Rc


addx addxAdd Integer Unit

add rD,rA,rB (OE=0 Rc=0)add. rD,rA,rB (OE=0 Rc=1) addo rD,rA,rB (OE=1 Rc=0) addo. rD,rA,rB (OE=1 Rc=1)

[POWER mnemonics: cax, cax., caxo, caxo.]

rD ← (rA) + (rB)





• XER:


0 5 6 10 11 15 16 20 21 22 30 31

31 D A B OE 266 Rc


addcx addcxAdd Carrying Integer Unit

addc rD,rA,rB (OE=0 Rc=0)addc. rD,rA,rB (OE=0 Rc=1)addco rD,rA,rB (OE=1 Rc=0)addco. rD,rA,rB (OE=1 Rc=1)

[POWER mnemonics: a, a., ao, ao.]

rD ← (rA) + (rB)





• XER:

Affected: CA


0 5 6 10 11 15 16 20 21 22 30 31

31 D A B OE 10 Rc


addex addexAdd Extended Integer Unit

adde rD,rA,rB (OE=0 Rc=0) adde. rD,rA,rB (OE=0 Rc=1) addeo rD,rA,rB (OE=1 Rc=0) addeo. rD,rA,rB (OE=1 Rc=1)

[POWER mnemonics: ae, ae., aeo, aeo.]

rD ← (rA) + (rB) + XER[CA]

The sum (rA) + (rB) + XER[CA] is placed into rD.




• XER:

Affected: CA


0 5 6 10 11 15 16 20 21 22 30 31

31 D A B OE 138 Rc


addi addiAdd Immediate Integer Unit

addi rD,rA,SIMM

[POWER mnemonic: cal]

if rA=0 then rD←EXTS(SIMM)else rD←(rA)+EXTS(SIMM)

The sum (rA| 0) + SIMM is placed into rD.


• None


subi rA,rB,value equivalent to addi rD,rA,-value

0 5 6 10 11 15 16 31

14 D A SIMM


addic addicAdd Immediate Carrying Integer Unit

addic rD,rA,SIMM

[POWER mnemonic: ai]

rD ← (rA) + EXTS(SIMM)

The sum (rA) + SIMM is placed into rD.


• XER:

Affected: CA

0 5 6 10 11 15 16 31

12 D A SIMM


addic. addic. Add Immediate Carrying and Record Integer Unit

addic. rD,rA,SIMM

[POWER mnemonic: ai.]

rD ← (rA) + EXTS(SIMM)

The sum (rA) + SIMM is placed into rD.



Affected: LT, GT, EQ, SO

• XER:

Affected: CA

0 5 6 10 11 15 16 31

13 D A SIMM


addis addis Add Immediate Shifted Integer Unit

addis rD,rA,SIMM

[POWER mnemonic: cau]

if rA=0 then rD←(SIMM || (16)0)else rD←(rA)+(SIMM || (16)0)

The sum (rA| 0) + (SIMM || x'0000') is placed into rD.


• None

0 5 6 10 11 15 16 31

15 D A SIMM


addmex addmexAdd to Minus One Extended Integer Unit

addme rD,rA (OE=0 Rc=0)addme. rD,rA (OE=0 Rc=1)addmeo rD,rA (OE=1 Rc=0)addmeo. rD,rA (OE=1 Rc=1)

[POWER mnemonics: ame, ame., ameo, ameo.]

rD ← (rA) + XER[CA] - 1

The sum (rA)+XER[CA]+x'FFFFFFFF' is placed into rD.




• XER:

Affected: CA


0 5 6 10 11 15 16 20 21 22 30 31

Reserved

31 D A 0 0 0 0 0 OE 234 Rc


addzex addzexAdd to Zero Extended Integer Unit

addze rD,rA (OE=0 Rc=0)addze. rD,rA (OE=0 Rc=1)addzeo rD,rA (OE=1 Rc=0)addzeo. rD,rA (OE=1 Rc=1)

[POWER mnemonics: aze, aze., azeo, azeo.]

rD ← (rA) + XER[CA]

The sum (rA)+XER[CA] is placed into rD.




• XER:

Affected: CA


0 5 6 10 11 15 16 20 21 22 30 31

Reserved

31 D A 0 0 0 0 0 OE 202 Rc


andx andxAND Integer Unit

and rA,rS,rB (Rc=0)and. rA,rS,rB (Rc=1)

rA ← (rS) & (rB)

The contents of rS is ANDed with the contents of rB and the result is placed into rA.




0 5 6 10 11 15 16 20 21 30 31

31 S A B 28 Rc


andcx andcxAND with Complement Integer Unit

andc rA,rS,rB (Rc=0)andc. rA,rS,rB (Rc=1)

rA←(rS)+ ¬ (rB)

The contents of rS is ANDed with the one’s complement of the contents of rB and the resultis placed into rA.




0 5 6 10 11 15 16 20 21 30 31

31 S A B 60 Rc


andi. andi.AND Immediate Integer Unit

andi. rA,rS,UIMM

[POWER mnemonic: andil.]

)

rA ← (rS) & ((48)0 || UIMM)

The contents of rS is ANDed with x'0000' || UIMM and the result is placed into rA.




0 5 6 10 11 15 16 31

28 S A UIMM


andis. andis. AND Immediate Shifted Integer Unit

andis. rA,rS,UIMM

[POWER mnemonic: andiu.]

rA ← (rS) + ((32)0 || UIMM || (16)0)

The contents of rS is ANDed with x'0000_0000' || UIMM || x'0000' and the result is placedinto rA.




0 5 6 10 11 15 16 31

29 S A UIMM


bx bxBranch Branch Processing Unit

b target_addr (AA=0 LK=0)ba target_addr (AA=1 LK=0)bl target_addr (AA=0 LK=1) bla target_addr (AA=1 LK=1)

if AA, then NIA ←iea EXTS(LI || b'00')else NIA ←iea CIA+EXTS(LI || b'00')if LK, then

LR ←iea CIA+4

target_addr specifies the branch target address.

If AA=0, then the branch target address is the sum of LI || b'00' sign-extended and theaddress of this instruction.

If AA=1, then the branch target address is the value LI || b'00' sign-extended.

If LK=1, then the effective address of the instruction following the branch instruction isplaced into the link register.


Affected: Link Register (LR) (if LK=1)

0 5 6 29 30 31

18 LI AA LK


bcx bcxBranch Conditional Branch Processing Unit

bc BO,BI,target_addr (AA=0 LK=0)bca BO,BI,target_addr (AA=1 LK=0)bcl BO,BI,target_addr (AA=0 LK=1)bcla BO,BI,target_addr (AA=1 LK=1)

if ¬ BO[2], then CTR ← CTR-1ctr_ok ← BO[2] | ((CTR≠0) ⊕ BO[3])cond_ok ← BO[0] | (CR[BI] ≡ BO[1])if ctr_ok & cond_ok, then

if AA, then NIA ←iea EXTS(BD || b'00') else NIA ←iea CIA+EXTS(BD || b'00')

if LK, then LR ←iea CIA+4

The BI field specifies the bit in the Condition Register (CR) to be used as the condition ofthe branch. The BO field is used as described above.

target_addr specifies the branch target address.

If AA=0, the branch target address is the sum of BD || b'00' sign-extended and the addressof this instruction.

If AA=1, the branch target address is the value BD || b'00' sign-extended.

If LK=1, the effective address of the instruction following the branch instruction is placedinto the link register.


Affected: Count Register (CTR) (if BO[2]=0)



blt target equivalent to bc 12,0,target

bne cr2,target equivalent to bc 4,10,target

bdnz target equivalent to bc 16,0,target

0 5 6 10 11 15 16 29 30 31

16 BO BI BD AA LK


bcctrx bcctrxBranch Conditional to Count Register Branch Processing Unit

bcctr BO,BI (LK=0)bcctrl BO,BI (LK=1)

[POWER mnemonics: bcc, bccl]

cond_ok ← BO[0] | (CR[BI] ≡ BO[1])if cond_ok then

NIA ←iea CTR || b'00' if LK then

LR ←iea CIA+4

The BI field specifies the bit in the condition register to be used as the condition of thebranch. The BO field is used as described above, and the branch target address is CTR[0–29] || b'00'.

If LK=1, the effective address of the instruction following the branch instruction is placedinto the link register.

If the “decrement and test CTR” option is specified (BO[2]=0), the instruction form isinvalid.

In the case of BO[2]=0 on the 601, the decremented count register is tested for zero andbranches based on this test, but instruction fetching is directed to the address specified bythe nondecremented version of the count register. The use of this invalid form of the bcctrxinstruction is not recommended. This description is provided for informational purposesonly.




bltctr equivalent to bcctr 12,0

bnectr cr2 equivalent to bcctr 4,10

0 5 6 10 11 15 16 20 21 30 31

Reserved

19 BO BI 0 0 0 0 0 528 LK


bclrx bclrxBranch Conditional to Link Register Branch Processing Unit

bclr BO,BI (LK=0)bclrl BO,BI (LK=1)

[POWER mnemonics: bcr, bcrl]

if ¬ BO[2] then CTR ← CTR-1ctr_ok ← BO[2] | ((CTR≠0) ⊕ BO[3])cond_ok ← BO[0] | (CR[BI] ≡ BO[1])if ctr_ok & cond_ok then

NIA ←iea LR || b'00' if LK then

LR ←iea CIA+4

The BI field specifies the bit in the condition register to be used as the condition of thebranch. The BO field is used as described above, and the branch target address is LR[0–29] || b'00'.

If LK=1 then the effective address of the instruction following the branch instruction isplaced into the link register.


Affected: Count Register (CTR) (if BO[2]=0)



bltlr equivalent to bclr 12,0

bnelr cr2 equivalent to bclr 4,10

bdnzlr equivalent to bclr 16,0

0 5 6 10 11 15 16 20 21 30 31

Reserved

19 BO BI 0 0 0 0 0 16 LK


clcs POWER Architecture Instruction clcsCache Line Compute Size Integer Unit

clcs rD,rA

This instruction is not part of the PowerPC architecture.This instruction places the cache line size specified by rA into rD, according to thefollowing:

The value placed in rD shall be 64 for valid values of rA.



Affected: Undefined (if Rc=1)

(rA) Line Size Returned in rD

00xxx Undefined

010xx Undefined

01100 Instruction Cache Line Size (64)

01101 Data Cache Line Size (64)

01110 Minimum Line Size (64)

01111 Maximum Line Size (64)

1xxxx Undefined

0 5 6 10 11 15 16 20 21 30 31

Reserved

31 D A 0 0 0 0 0 531 Rc

cmp cmp Compare Integer Unit

cmp crfD,L,rA,rB

a ← (rA)b ← (rB)if a b then c ← b'010'else c ← b'001'CR[4∗ crfD–4∗ crfD+3] ← c || XER[SO]

The contents of rA is compared with the contents of rB, treating the operands as signedintegers. The result of the comparison is placed into CR Field crfD.

The L operand controls whether the instruction operands are treated as 64- or 32-bitoperands, with L=0 indicating 32-bit operands and L=1 indicating 64-bit operands. Thestate of the L operand does not affect the operation of the 601.


• Condition Register (CR Field specified by operand crfD):


0 5 6 8 9 10 11 15 16 20 21 30 31

Reserved

B 0 031 crfD 0 L A

cmpi cmpiCompare Immediate Integer Unit

cmpi crfD,L,rA,SIMM

a ← (rA)if a < EXTS(SIMM) then c ← b'100'else if a > EXTS(SIMM) then c ← b'010'else c ← b'001'CR[4∗ crfD–4∗ crfD+3] ← c || XER[SO]

The contents of rA is compared with the sign-extended value of the SIMM field, treatingthe operands as signed integers. The result of the comparison is placed into CR Field crfD.





0 5 6 8 9 10 11 15 16 31

Reserved

SIMM11 crfD 0 L A

cmpl cmplCompare Logical Integer Unit

cmpl crfD,L,rA,rB

a ← (rA)b ← (rB)if a U b then c ← b'010'else c ← b'001'CR[4∗ crfD–4∗ crfD+3] ← c || XER[SO]

The contents of rA is compared with the contents of rB, treating the operands as unsignedintegers. The result of the comparison is placed into CR Field crfD.





0 5 6 8 9 10 11 15 16 20 21 31

Reserved

31 crfD 0 L A B 32 0

cmpli cmpli Compare Logical Immediate Integer Unit

cmpli crfD,L,rA,UIMM

a ← (rA)if a U ((48)0 || UIMM) then c ← b'010'else c ← b'001'CR[4∗ crfD–4∗ crfD+3] ← c || XER[SO]

The contents of rA is compared with x'0000' || UIMM, treating the operands as unsignedintegers. The result of the comparison is placed into CR Field crfD.





0 5 6 8 9 10 11 15 16 31

Reserved

UIMM10 crfD 0 L A

cntlzwx cntlzwxCount Leading Zeros Word Integer Unit

cntlzw rA,rS (Rc=0)cntlzw. rA,rS (Rc=1)

[POWER mnemonics: cntlz, cntlz.]

n ← 0do while n < 32

if rS[n]=1 then leaven ← n+1

rA ← n

A count of the number of consecutive zero bits starting at bit 0 of rS is placed into rA. Thisnumber ranges from 0 to 32, inclusive.




For count leading zeros instructions, if Rc=1 then LT is cleared to zero in the CR0 field.

0 5 6 10 11 15 16 20 21 30 31

Reserved

31 S A 0 0 0 0 0 26 Rc


crand crandCondition Register AND Integer Unit

crand crbD,crbA,crbB

CR[crbD] ← CR[crbA] & CR[crbB]

The bit in the condition register specified by crbA is ANDed with the bit in the conditionregister specified by crbB. The result is placed into the condition register bit specified bycrbD.


• Condition Register:

Affected: Bit specified by operand crbD

0 5 6 10 11 15 16 20 21 30 31

Reserved

19 crbD crbA crbB 257 0


crandc crandcCondition Register AND with Complement Integer Unit

crandc crbD,crbA,crbB

CR[crbD] ← CR[crbA] & ¬ CR[crbB]

The bit in the condition register specified by crbA is ANDed with the complement of thebit in the condition register specified by crbB and the result is placed into the conditionregister bit specified by crbD.




0 5 6 10 11 15 16 20 21 30 31

Reserved



creqv creqvCondition Register Equivalent Integer Unit

creqv crbD,crbA,crbB

CR[crbD] ← CR[crbA] ≡ CR[crbB]

The bit in the condition register specified by crbA is XORed with the bit in the conditionregister specified by crbB and the complemented result is placed into the condition registerbit specified by crbD.




0 5 6 10 11 15 16 20 21 30 31

Reserved



crnand crnandCondition Register NAND Integer Unit

crnand crbD,crbA,crbB

CR[crbD] ← ¬ (CR[crbA] & CR[crbB])

The bit in the condition register specified by crbA is ANDed with the bit in the conditionregister specified by crbB and the complemented result is placed into the condition registerbit specified by crbD.




0 5 6 10 11 15 16 20 21 30 31

Reserved



crnor crnorCondition Register NOR Integer Unit

crnor crbD,crbA,crbB

CR[crbD] ← ¬ (CR[crbA] | CR[crbB])

The bit in the condition register specified by crbA is ORed with the bit in the conditionregister specified by crbB and the complemented result is placed into the condition registerbit specified by crbD.




0 5 6 10 11 15 16 20 21 30 31

Reserved



cror crorCondition Register OR Integer Unit

cror crbD,crbA,crbB

CR[crbD] ← CR[crbA] | CR[crbB]

The bit in the condition register specified by crbA is ORed with the bit in the conditionregister specified by crbB. The result is placed into the condition register bit specified bycrbD.




0 5 6 10 11 15 16 20 21 30 31

Reserved



crorc crorcCondition Register OR with Complement Integer Unit

crorc crbD,crbA,crbB

CR[crbD] ← CR[crbA] | ¬ CR[crbB]

The bit in the condition register specified by crbA is ORed with the complement of thecondition register bit specified by crbB and the result is placed into the condition registerbit specified by crbD.




0 5 6 10 11 15 16 20 21 30 31

Reserved



crxor crxorCondition Register XOR Integer Unit

crxor crbD,crbA,crbB

CR[crbD] ← CR[crbA] ⊕ CR[crbB]

The bit in the condition register specified by crbA is XORed with the bit in the conditionregister specified by crbB and the result is placed into the condition register specified bycrbD.



Affected: Bit specified by crbD

0 5 6 10 11 15 16 20 21 30 31

Reserved



dcbf dcbfData Cache Block Flush Integer Unit

dcbf rA,rB

EA is the sum (rA|0)+(rB).

The action taken depends on the memory mode associated with the target address, and onthe state of the block. The list below describes the action taken for the various cases. Theactions described will be executed regardless of whether the page or block containing theaddressed byte is designated as write-through or if it is in caching-inhibited or cachingallowed mode.

• Coherency Required (WIM = xx1)

— Unmodified Block—Invalidates copies of the block in the caches of all processors.

— Modified Block—Copies the block to memory. Invalidates copies of the block in the caches of all processors.

— Absent Block—If modified copies of the block are in the caches of other processors, causes them to be copied to memory and invalidated. If unmodified copies are in the caches of other processors, causes those copies to be invalidated.

• Coherency Not Required (WIM = xx0)

— Unmodified Block—Invalidates the block in the processor’s cache.

— Modified Block—Copies the block to memory. Invalidates the block in the processor’s cache.

— Absent Block—Does nothing.

This instruction operates as a load from the addressed byte with respect to addresstranslation and protection.

If EA specifies a memory address for which SR[T]=1, the instruction is treated as a no-op.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 86 031 0 0 0 0 0 A


dcbi dcbi Data Cache Block Invalidate Integer Unit

dcbi rA,rB


The action taken is dependent on the memory mode associated with the target, and the stateof the block. The list below describes the action to take if the block containing the byteaddressed by EA is or is not in the cache. The actions described must be executed regardlessof whether the page containing the addressed byte is in caching-inhibited or caching-allowed mode. This is a supervisor-level instruction.

• Coherency Required (WIM = xx1)

— Unmodified Block—Invalidates copies of the block in the caches of all processors.

— Modified Block—Invalidates copies of the block in the caches of all processors. (Discards the modified contents.)

— Absent Block—If copies are in the caches of any other processor, causes the copies to be invalidated. (Discards any modified contents.)

• Coherency Not Required (WIM = xx0)

— Unmodified Block—Invalidates the block in the local cache.

— Modified Block—Invalidates the block in the local cache. (Discards the modified contents.)

— Absent Block—No action is taken.

This instruction operates as a store to the addressed byte with respect to address translationand protection. The reference and change bits are modified appropriately. If EA specifies amemory address for which SR[T]=1, the instruction is treated as a no-op.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 470 031 0 0 0 0 0 A


dcbst dcbstData Cache Block Store Integer Unit

dcbst rA,rB


If the block containing the byte addressed by EA is in coherency required mode, and ablock containing the byte addressed by EA is in the data cache of any processor and hasbeen modified, the writing of it to main memory is initiated.

If the block containing the byte addressed by EA is in coherency not required mode, and ablock containing the byte addressed by EA is in the data cache of this processor and hasbeen modified, the writing of it to main memory is initiated.

The function of this instruction is independent of the write-through and cachinginhibited/allowed modes of the page or block containing the byte addressed by EA.

This instruction operates as a load from the addressed byte with respect to addresstranslation and protection.

If the EA specifies a memory address for an I/O controller interface segment (segmentregister T-bit=1), the dcbst instruction operates as a no-op.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 54 031 0 0 0 0 0 A


dcbt dcbtData Cache Block Touch Integer Unit

dcbt rA,rB


This instruction is a hint that performance will probably be improved if the blockcontaining the byte addressed by EA is fetched into the data cache, because the programwill probably soon load from the addressed byte. Executing dcbt does not cause anyexceptions to be invoked.

This instruction operates as a load from the addressed byte with respect to addresstranslation and protection except that no exception occurs in the case of a translation faultor protection violation.

If the EA specifies a memory address for which SR[T]=1, the instruction is treated as a no-op.

The purpose of this instruction is to allow the program to request a cache block fetch beforeit is actually needed by the program. The program can later perform loads to put data intoregisters. However, the processor is not obliged to load the addressed block into the datacache. If the sector is loaded, it will be either in shared state or exclusive unmodified state.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 278 031 0 0 0 0 0 A


dcbtst dcbtstData Cache Block Touch for Store Integer Unit

dcbtst rA,rB


This instruction is a hint that performance will probably be improved if the blockcontaining the byte addressed by EA is fetched into the data cache, because the programwill probably soon store into the addressed byte. Executing dcbtst does not cause anyexceptions to be invoked.

This instruction operates as load from the addressed byte with respect to address translationand protection, except that no exception occurs in the case of a translation fault orprotection violation. Since dcbtst does not modify memory, it is not recorded as a store (thechange (C) bit is not modified in the page tables).

If the EA specifies a memory address for which SR[T]=1, the instruction is treated as a no-op.

The dcbtst instruction behaves exactly like the dcbt instruction as implemented on the 601.

The purpose of this instruction is to allow the program to schedule a cache block fetchbefore it is actually needed by the program. The program can later perform stores to putdata into memory. However the processor is not obliged to load the addressed block intothe data cache.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 246 031 0 0 0 0 0 A


dcbz dcbzData Cache Block Set to Zero Integer Unit

dcbz rA,rB

[POWER mnemonic: dclz]


If the block containing the byte addressed by EA is in the data cache, all bytes of the blockare cleared to zero.

If the block containing the byte addressed by EA is not in the data cache and thecorresponding page is caching allowed, the block is allocated in the data cache withoutfetching the block from main memory, and all bytes of the block are set to zero.

If the page containing the byte addressed by EA is caching inhibited or write-through, thenthe alignment exception handler is invoked and the handler should clear to zero all bytes ofthe area of memory that corresponds to the addressed block. If the block containing the byteaddressed by EA is in coherency required mode, and the block exists in the data cache(s)of any other processor(s), it is kept coherent in those caches.

This instruction is treated as a store to the addressed byte with respect to address translationand protection.

If the EA specifies a memory address for an I/O controller interface segment (segmentregister T-bit=1), the dcbz instruction is treated as a no-op.

See Chapter 5, “Exceptions” for a discussion about a possible delayed machine checkexception that can occur by use of dcbz if the operating system has set up an incorrectmemory mapping.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 1014 031 0 0 0 0 0 A


divx POWER Architecture Instruction divxDivide Integer Unit

div rD,rA,rB (OE=0 Rc=0)div. rD,rA,rB (OE=0 Rc=1)divo rD,rA,rB (OE=1 Rc=0)divo. rD,rA,rB (OE=1 Rc=1)


The quotient [(rA)||(MQ)]÷(rB) is placed into rD. The remainder is placed in the MQregister. The remainder has the same sign as the dividend, except that a zero quotient or azero remainder is always positive. The results obey the equation:

dividend=(divisor x quotient)+remainder

where dividend is the original (rA)||(MQ), divisor is the original (rB), quotient is the final(rD), and remainder is the final (MQ).

If Rc=1, then CR bits LT, GT, and EQ reflect the remainder. If OE=1, then SO and OV areset to one if the quotient cannot be represented in 32 bits. For the case of –231÷ –1, the MQregister is cleared to zero and –231 is placed in rD. For all other overflows, MQ, rD and theCR0 field are undefined (if Rc=1).




• XER:



0 5 6 10 11 15 16 20 21 22 30 31

31 D A B OE 331 Rc


divsx POWER Architecture Instruction divsxDivide Short Integer Unit

divs rD,rA,rB (OE=0 Rc=0)divs. rD,rA,rB (OE=0 Rc=1)divso rD,rA,rB (OE=1 Rc=0)divso. rD,rA,rB (OE=1 Rc=1)


The quotient (rA)÷(rB) is placed into rD. The remainder is placed in the MQ register. Theremainder has the same sign as the dividend, except that a zero quotient or a zero remainderis always positive. The results obey the equation:

dividend=(divisor∗ quotient)+remainder

where dividend is the original rA, divisor is the original rB, quotient is the final rD, andremainder is the final MQ.

If Rc=1 then the CR bits LT, GT, and EQ reflect the remainder. If OE=1,then SO and OVare set to one if the quotient cannot be represented in 32 bits (e.g., as is the case when thedivisor is zero, or the dividend is –231 and the divisor is –1), the MQ register is cleared tozero and –231 is placed in rD. For all other overflows, MQ, rD and the CR0 field (if Rc=1)are undefined.




• XER:

Affected: SO, OV (If OE=1)


0 5 6 10 11 15 16 20 21 22 30 31

31 D A B OE 363 Rc

divwx divwxDivide Word Integer Unit

divw rD,rA,rB (OE=0 Rc=0)divw. rD,rA,rB (OE=0 Rc=1) divwo rD,rA,rB (OE=1 Rc=0) divwo. rD,rA,rB (OE=1 Rc=1)

dividend ←(rA)divisor ←(rB)rD ← dividend ÷ divisor

Register rA is the 32-bit dividend. Register rB is the 32-bit divisor. A 32-bit quotient isformed and placed into rD. The remainder is not supplied as a result.

Both operands are interpreted as signed integers. The quotient is the unique signed integerthat satisfies the following:

dividend=(quotient times divisor)+r

where

0≤ r < |divisor|

if the dividend is non-negative, and

–|divisor| < r ≤ 0

if the dividend is negative.

If an attempt is made to perform any of the divisions

x'8000 0000' / –1

<anything> / 0

then the contents of rD are undefined as are (if Rc=1) the contents of the LT, GT, and EQbits of the CR0 field. In these cases, if OE=1 then OV is set to 1.

0 5 6 10 11 15 16 20 21 22 30 31

31 D A B OE 491 Rc





• XER:


The 32-bit signed remainder of dividing rA by rB can be computed as follows, except inthe case that rA=–231 and rB=–1.

divw rD,rA,rB # rD=quotient

mull rD,rD,rB # rD=quotient∗ divisor

subf rD,rD,rA # rD=remainder

divwux divwuxDivide Word Unsigned Integer Unit

divwu rD,rA,rB (OE=0 Rc=0)divwu. rD,rA,rB (OE=0 Rc=1)divwuo rD,rA,rB (OE=1 Rc=0)divwuo. rD,rA,rB (OE=1 Rc=1)

dividend ← (rA)divisor ← (rB)rD ← dividend ÷ divisor

The dividend is the contents of rA. The divisor is the contents of rB. A 32-bit quotient isformed and placed into rD. The remainder is not supplied as a result.

Both operands are interpreted as unsigned integers. The quotient is the unique unsignedinteger that satisfies the following:

dividend=(quotient ∗ divisor)+r

where

0≤ r < divisor.

If an attempt is made to divide by zero, the contents of rD are undefined as are (if Rc=1)the contents of the LT, GT, and EQ bits of the CR0 field. In this case, if OE=1 then OV isset to 1.




• XER:


The 32-bit signed remainder of dividing rA by rB can be computed as follows, except inthe case that rA=–231 and rB=–1.

divwu rD,rA,rB # rD=quotient

mull rD,rD,rB # rD=quotient∗ divisor

subf rD,rD,rA # rD=remainder

0 5 6 10 11 15 16 20 21 22 30 31

31 D A B OE 459 Rc


dozx POWER Architecture Instruction dozxDifference or Zero Integer Unit

doz rD,rA,rB (OE=0 Rc=0)doz. rD,rA,rB (OE=0 Rc=1)dozo rD,rA,rB (OE=1 Rc=0)dozo. rD,rA,rB (OE=1 Rc=1)


The sum ¬ (rA)+(rB) +1 is placed into rD. If the value in rA is algebraically greater thanthe value in rB, rD is set to zero. If Rc=1, the CR0 field is set to reflect the result placed inrD (i.e., if rD is set to zero, EQ is set to 1). If OE=1, OV can only be set on positiveoverflows.




• XER:



0 5 6 10 11 15 16 20 21 22 30 31

31 D A B OE 264 Rc


dozi POWER Architecture Instruction doziDifference or Zero Immediate Integer Unit

dozi rD,rA,SIMM


The sum ¬ (rA)+SIMM+1 is placed into rD. If the value in rA is algebraically greater thanthe value of the SIMM field, rD is set to zero.


• None


0 5 6 10 11 15 16 31

9 D A SIMM


eciwx eciwxExternal Control Input Word Indexed Integer Unit

eciwx rD,rA,rB

if rA=0 then b ← 0else b ← (rA)EA ← b+(rB)if EAR[E]=1 then

paddr ← address translation of EA send load request for paddr to device identified by EAR[RID] rD ← word from device

else DSISR[11] ← 1 generate data access exception


If EAR[E]=1, a load request for the physical address corresponding to EA is sent to thedevice identified by EAR[RID], bypassing the cache. The word returned by the device isplaced in rD. The EA sent to the device must be word aligned, or the results will beboundedly undefined.

If EAR[E]=0, a data access exception is taken, with bit 11 of DSISR set to 1.

The eciwx instruction is supported for effective addresses that reference ordinary(SR[T]=0) segments, and for EAs mapped by the BAT registers. The eciwx instructionsupport EAs generated when MSR[DT]=0 and MSR[DT]=1 when executed by the 601,while the PowerPC architecture only supports EAs generated when MSR[DT]=1. Theinstruction is treated as a no-op for EAs that correspond to I/O controller interface(SR[T]=1) segments.

The access caused by this instruction is treated as a load from the location addressed by EAwith respect to protection and reference and change recording.

This instruction is defined as an optional instruction by the PowerPC architecture, and maynot be available in all PowerPC implementations.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

31 D A B 310 0


ecowx ecowxExternal Control Output Word Indexed Integer Unit

ecowx rS,rA,rB

if rA=0 then b ← 0else b ← (rA)EA ← b+(rB)if EAR[E]=1 then

paddr ← address translation of EA send store request for paddr to device identified by EAR[RID] send rS to device

else DSISR[11] ← 1 generate data access exception


If EAR[E]=1, a store request for the physical address corresponding to EA and the contentsof rS are sent to the device identified by EAR[RID], bypassing the cache. The EA sent tothe device must be word aligned, or the results will be boundedly undefined.

If EAR[E]=0, a data access exception is taken, with bit 11 of DSISR set to 1.

The ecowx instruction is supported for effective addresses that reference ordinary(SR[T]=0) segments, and for EAs mapped by the BAT registers. The ecowx instructionsupport EAs generated when MSR[DT]=0 and MSR[DT]=1 when executed by the 601,while the PowerPC architecture only supports EAs generated when MSR[DT]=1. Theinstruction is treated as a no-op for EAs that correspond to I/O controller interface(SR[T]=1) segments. The access caused by this instruction is treated as a store to thelocation addressed by EA with respect to protection and reference and change recording.

This instruction is defined as an optional instruction by the PowerPC architecture, and maynot be available in all PowerPC implementations.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

31 S A B 438 0


eieio eieioEnforce In-Order Execution of I/O Integer Unit

The eieio instruction provides an ordering function for the effects of load and storeinstructions executed by a given processor. Executing an eieio instruction ensures that allmemory accesses previously initiated by the given processor are complete with respect tomain memory before any memory accesses subsequently initiated by the given processoraccess main memory.

The synchronize (sync) and the enforce in-order execution of I/O (eieio) instructions arehandled in the same manner internally to the 601. These instructions delay execution ofsubsequent instructions until all previous instructions have completed to the point that theycan no longer cause an exception, all previous memory accesses are performed globally,and the sync or eieio operation is broadcast onto the 601 bus interface.

eieio orders loads/stores to caching inhibited memory and stores to write-through requiredmemory.


• None

The eieio instruction is intended for use only in performing memory-mapped I/Ooperations and to prevent load/store combining operations in main memory. It can bethought of as placing a barrier into the stream of memory accesses issued by a processor,such that any given memory access appears to be on the same side of the barrier to both theprocessor and the I/O device.

The eieio instruction may complete before previously initiated memory accesses have beenperformed with respect to other processors and mechanisms.

0 5 6 10 11 15 16 20 21 30 31

Reserved

31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 854 0


eqvx eqvxEquivalent Integer Unit

eqv rA,rS,rB (Rc=0)eqv. rA,rS,rB (Rc=1)

rA ← ((rS) ≡ (rB))

The contents of rS is XORed with the contents of rB and the complemented result is placedinto rA.




0 5 6 10 11 15 16 21 22 30 31

31 S A B 284 Rc


extsbx extsbxExtend Sign Byte Integer Unit

extsb rA,rS (Rc=0)extsb. rA,rS (Rc=1)

S ← rS[24]rA[24–31] ← rS[24–31]rA[0–23] ← (24)S

The contents of rS[24–31] are placed into rA[24–31]. Bit 24 of rS is placed into rA[0–23].




0 5 6 10 11 15 16 20 21 30 31

Reserved

31 S A 0 0 0 0 0 954 Rc


extshx extshxExtend Sign Half Word Integer Unit

extsh rA,rS (Rc=0)extsh. rA,rS (Rc=1)

[POWER mnemonics: exts, exts.]

S ← rS[16]rA[16–31]← rS[16–31]rA[0–15] ← (16)S

The contents of rS[16–31] are placed into rA[16–31]. Bit 16 of rS is placed into rA[0–15].




0 5 6 10 11 15 16 20 21 30 31

Reserved

31 S A 0 0 0 0 0 922 Rc


fabsx fabsxFloating-Point Absolute Value Floating-Point Unit

fabs frD,frB (Rc=0)fabs. frD,frB (Rc=1)

The contents of frB with bit 0 cleared to zero is placed into frD.



Affected: FX, FEX, VX, OX (if Rc=1)

0 5 6 10 11 15 16 20 21 30 31

Reserved

63 D 0 0 0 0 0 B 264 Rc


faddx faddxFloating-Point Add Floating-Point Unit

fadd frD,frA,frB (Rc=0)fadd. frD,frA,frB (Rc=1)

[POWER mnemonics: fa, fa.]

The floating-point operand in frA is added to the floating-point operand in frB. If the mostsignificant bit of the resultant significand is not a one, the result is normalized. The resultis rounded to the target precision under control of the floating-point rounding control fieldRN of the FPSCR and placed into frD.

Floating-point addition is based on exponent comparison and addition of the twosignificands. The exponents of the two operands are compared, and the significandaccompanying the smaller exponent is shifted right, with its exponent increased by one foreach bit shifted, until the two exponents are equal. The two significands are then addedalgebraically to form an intermediate sum. All 53 bits in the significand as well as all threeguard bits (G, R, and X) enter into the computation.

If a carry occurs, the sum's significand is shifted right one bit position and the exponent isincreased by one. FPSCR[FPRF] is set to the class and sign of the result, except for invalidoperation exceptions when FPSCR[VE]=1.




• Floating-point Status and Control Register:

Affected: FPRF, FR, FI, FX, OX, UX, XX,VXSNAN, VXISI

0 5 6 10 11 15 16 20 21 25 26 30 31

Reserved

63 D A B 0 0 0 0 0 21 Rc


faddsx faddsxFloating-Point Add Single-PrecisionFloating-Point Unit

fadds frD,frA,frB (Rc=0)fadds. frD,frA,frB (Rc=1)

The floating-point operand in frA is added to the floating-point operand in frB. If the mostsignificant bit of the resultant significand is not a one, the result is normalized. The resultis rounded to the target precision under control of the floating-point rounding control fieldRN of the FPSCR and placed into frD.

Floating-point addition is based on exponent comparison and addition of the twosignificands. The exponents of the two operands are compared, and the significandaccompanying the smaller exponent is shifted right, with its exponent increased by one foreach bit shifted, until the two exponents are equal. The two significands are then addedalgebraically to form an intermediate sum. All 53 bits in the significand as well as all threeguard bits (G, R, and X) enter into the computation.

If a carry occurs, the sum's significand is shifted right one bit position and the exponent isincreased by one. FPSCR[FPRF] is set to the class and sign of the result, except for invalidoperation exceptions when FPSCR[VE]=1.





Affected: FPRF, FR, FI, FX, OX, UX, XX,VXSNAN, VXISI

0 5 6 10 11 15 16 20 21 25 26 30 31

Reserved

59 D A B 0 0 0 0 0 21 Rc


fcmpo fcmpo Floating-Point Compare Ordered Floating-Point Unit

fcmpo crfD,frA,frB

The floating-point operand in frA is compared to the floating-point operand in frB. Theresult of the compare is placed into CR Field crfD and the FPCC.

If one of the operands is a NaN, either quiet or signaling, then CR Field crfD and the FPCCare set to reflect unordered. If one of the operands is a signaling NaN, then VXSNAN is set,and if invalid operation is disabled (VE=0) then VXVC is set. Otherwise, if one of theoperands is a QNaN then VXVC is set.



Affected: FPCC, FX, VXSNAN, VXVC

0 5 6 8 9 10 11 15 16 20 21 30 31

Reserved

B 32 063 crfD 0 0 A


fcmpu fcmpu Floating-Point Compare Unordered Floating-Point Unit

fcmpu crfD,frA,frB

The floating-point operand in register frA is compared to the floating-point operand inregister frB. The result of the compare is placed into CR Field crfD and the FPCC.

If one of the operands is a NaN, either quiet or signaling, then CR Field crfD and the FPCCare set to reflect unordered. If one of the operands is a signaling NaN, then VXSNAN is set.



Affected: FPCC, FX, VXSNAN

0 5 6 8 9 10 11 15 16 20 21 30 31

Reserved

B 0 063 crfD 0 0 A


fctiwx fctiwxFloating-Point Convert to Integer Word Floating-Point Unit

fctiw frD,frB (Rc=0)fctiw. frD,frB (Rc=1)

The floating-point operand in register frB is converted to a 32-bit signed integer, using therounding mode specified by FPSCR[RN], and placed in bits 32–63 of frD. Bits 0–31 of frDare undefined.

If the contents of frB is greater than 231 – 1, bits 32–63 of frD are set to x '7FFF_FFFF '.

If the contents of frB is less than –231, bits 32–63 of frD are set to x '8000_0000 '.

The conversion is described fully in Section F.2, “Conversion from Floating-Point Numberto Unsigned Fixed-Point Integer Word.”

Except for trap-enabled invalid operation exceptions, FPSCR[FPRF] is undefined.FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the resultis inexact.





Affected: FPRF (undefined), FR, FI, FX, XX, VXSNAN, VXCVI

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 14 Rc63 D 0 0 0 0 0


fctiwzx fctiwzxFloating-Point Convert to Integer Word with Round toward Zero Floating-Point Unit

fctiwz frD,frB (Rc=0)fctiwz. frD,frB (Rc=1)

The floating-point operand in register frB is converted to a 32-bit signed integer, using therounding mode round toward zero, and placed in bits 32–63 of frD. Bits 0–31 of frD areundefined.

If the operand in frB is greater than 231 – 1, bits 32–63 of frD are set to x '7FFF_FFFF '.

If the operand in frB is less than –231, bits 32–63 of frD are set to x '8000_0000 '.

The conversion is described fully in Section F.2, “Conversion from Floating-Point Numberto Unsigned Fixed-Point Integer Word.”

Except for trap-enabled invalid operation exceptions, FPSCR[FPRF] is undefined.FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the resultis inexact.





Affected: FPRF (undefined), FR, FI, FX, XX, VXSNAN, VXCVI

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 15 Rc63 D 0 0 0 0 0


fdivx fdivxFloating-Point Divide Floating-Point Unit

fdiv frD,frA,frB (Rc=0)fdiv. frD,frA,frB (Rc=1)

[POWER mnemonics: fd, fd.]

The floating-point operand in register frA is divided by the floating-point operand inregister frB. No remainder is preserved.

If an operand is a denormalized number then it is prenormalized before the operation isstarted. If the most significant bit of the resultant significand is not a one the result isnormalized. The result is rounded to the target precision under control of the floating-pointrounding control field RN of the FPSCR and placed into frD.


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operationexceptions when FPSCR[VE]=1 and zero divide exceptions when FPSCR[ZE]=1.





Affected: FPRF, FR, FI, FX, OX, UX, ZX, XX, VXSNAN, VXIDI, VXZDZ

0 5 6 10 11 15 16 20 21 25 26 30 31

Reserved

B 0 0 0 0 0 18 Rc63 D A


fdivsx fdivsxFloating-Point Divide Single-Precision Floating-Point Unit

fdivs frD,frA,frB (Rc=0)fdivs. frD,frA,frB (Rc=1)

The floating-point operand in register frA is divided by the floating-point operand inregister frB. No remainder is preserved.



FPSCR[FPRF] is set to the class and sign of the result, except for invalid operationexceptions when FPSCR[VE]=1 and zero divide exceptions when FPSCR[ZE]=1.





Affected: FPRF, FR, FI, FX, OX, UX, ZX, XX, VXSNAN, VXIDI, VXZDZ

0 5 6 10 11 15 16 20 21 25 26 30 31

Reserved

B 0 0 0 0 0 18 Rc59 D A


fmaddx fmaddxFloating-Point Multiply-Add Floating-Point Unit

fmadd frD,frA,frC,frB (Rc=0)fmadd. frD,frA,frC,frB (Rc=1)

[POWER mnemonics: fma, fma.]

The following operation is performed:

frD ← [(frA)∗ (frC)]+(frB)

The floating-point operand in register frA is multiplied by the floating-point operand inregister frC. The floating-point operand in register frB is added to this intermediate result.


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operationexceptions when FPSCR[VE]=1.





Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI, VXIMZ

0 5 6 10 11 15 16 20 21 25 26 30 31

B C 29 Rc63 D A


fmaddsx fmaddsxFloating-Point Multiply-Add Single-Precision Floating-Point Unit

fmadds frD,frA,frC,frB (Rc=0) fmadds. frD,frA,frC,frB (Rc=1)


frD ← [(frA)∗ (frC)]+(frB)

The floating-point operand in register frA is multiplied by the floating-point operand inregister frC. The floating-point operand in register frB is added to this intermediate result.








0 5 6 10 11 15 16 20 21 25 26 30 31

B C 29 Rc59 D A


fmrx fmrxFloating-Point Move Register Floating-Point Unit

fmr frD,frB (Rc=0)fmr. frD,frB (Rc=1)

The contents of register frB is placed into frD.




0 5 6 10 11 15 16 20 21 30 31

Reserved

B 72 Rc63 D 0 0 0 0 0


fmsubx fmsubxFloating-Point Multiply-Subtract Floating-Point Unit

fmsub frD,frA,frC,frB (Rc=0)fmsub. frD,frA,frC,frB (Rc=1)

[POWER mnemonics: fms, fms.]


frD ← [(frA)∗ (frC)] - (frB)

The floating-point operand in register frA is multiplied by the floating-point operand inregister frC. The floating-point operand in register frB is subtracted from this intermediateresult.








0 5 6 10 11 15 16 20 21 25 26 30 31

B C 28 Rc63 D A


fmsubsx fmsubsxFloating-Point Multiply-Subtract Single-Precision Floating-Point Unit

fmsubs frD,frA,frC,frB (Rc=0)fmsubs. frD,frA,frC,frB (Rc=1)


frD ← [(frA)∗ (frC)] - (frB)









0 5 6 10 11 15 16 20 21 25 26 30 31

B C 28 Rc59 D A


fmulx fmulxFloating-Point Multiply Floating-Point Unit

fmul frD,frA,frC (Rc=0)fmul. frD,frA,frC (Rc=1)

[POWER mnemonics: fm, fm.]

The floating-point operand in register frA is multiplied by the floating-point operand inregister frC.


Floating-point multiplication is based on exponent addition and multiplication of thesignificands.






Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXIMZ

0 5 6 10 11 15 16 20 21 25 26 30 31

Reserved

0 0 0 0 0 C 25 Rc63 D A


fmulsx fmulsxFloating-Point Multiply Single-Precision Floating-Point Unit

fmuls frD,frA,frC (Rc=0)fmuls. frD,frA,frC (Rc=1)

The floating-point operand in register frA is multiplied by the floating-point operand inregister frC.


Floating-point multiplication is based on exponent addition and multiplication of thesignificands.






Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXIMZ

0 5 6 10 11 15 16 20 21 25 26 30 31

Reserved

0 0 0 0 0 C 25 Rc59 D A


fnabsx fnabsxFloating-Point Negative Absolute Value Floating-Point Unit

fnabs frD,frB (Rc=0)fnabs. frD,frB (Rc=1)

The contents of register frB with bit 0 set to one is placed into frD.




0 5 6 10 11 15 16 20 21 25 26 30 31

Reserved

B 136 Rc63 D 0 0 0 0 0


fnegx fnegxFloating-Point Negate Floating-Point Unit

fneg frD,frB (Rc=0)fneg. frD,frB (Rc=1)

The contents of register frB with bit 0 inverted is placed into frD.




0 5 6 10 11 15 16 20 21 30 31

Reserved

B 40 Rc63 D 0 0 0 0 0


fnmaddx fnmaddx Floating-Point Negative Multiply-Add Floating-Point Unit

fnmadd frD,frA,frC,frB (Rc=0)fnmadd. frD,frA,frC,frB (Rc=1)

[POWER mnemonics: fnma, fnma.]


frD ← - ([(frA)∗ (frC)]+(frB))

The floating-point operand in register frA is multiplied by the floating-point operand inregister frC. The floating-point operand in register frB is added to this intermediate result.If an operand is a denormalized number then it is prenormalized before the operation isstarted. If the most significant bit of the resultant significand is not a one the result isnormalized. The result is rounded to the target precision under control of the floating-pointrounding control field RN of the FPSCR, then negated and placed into frD.

This instruction produces the same result as would be obtained by using the floating-pointmultiply-add instruction and then negating the result, with the following exceptions:

• QNaNs propagate with no effect on their sign bit.

• QNaNs that are generated as the result of a disabled invalid operation exception have a sign bit of zero.

• SNaNs that are converted to QNaNs as the result of a disabled invalid operation exception retain the sign bit of the SNaN.







0 5 6 10 11 15 16 20 21 25 26 30 31

B C 31 Rc63 D A


fnmaddsx fnmaddsx Floating-Point Negative Multiply-Add Single-Precision Floating-Point Unit

fnmadds frD,frA,frC,frB (Rc=0)fnmadds. frD,frA,frC,frB (Rc=1)


frD ← -([(frA)∗ (frC)]+(frB))

The floating-point operand in register frA is multiplied by the floating-point operand inregister frC. The floating-point operand in register frB is added to this intermediate result.If an operand is a denormalized number then it is prenormalized before the operation isstarted. If the most significant bit of the resultant significand is not a one the result isnormalized. The result is rounded to the target precision under control of the floating-pointrounding control field RN of the FPSCR, then negated and placed into frD.

This instruction produces the same result as would be obtained by using the floating-pointmultiply-add instruction and then negating the result, with the following exceptions:










0 5 6 10 11 15 16 20 21 25 26 30 31

B C 31 Rc59 D A


fnmsubx fnmsubxFloating-Point Negative Multiply-Subtract Floating-Point Unit

fnmsub frD,frA,frC,frB (Rc=0)fnmsub. frD,frA,frC,frB (Rc=1)[POWER mnemonics: fnms, fnms.

]


frD ← - ([(frA)∗ (frC)] - (frB))


If an operand is a denormalized number, it is prenormalized before the operation is started.If the most significant bit of the resultant significand is not one, the result is normalized.The result is rounded to the target precision under control of the floating-point roundingcontrol field RN of the FPSCR, then negated and placed into frD.

This instruction produces the same result obtained by negating the result of a floatingmultiply-subtract instruction with the following exceptions:






• Condition Register (CR1 Field)




0 5 6 10 11 15 16 20 21 25 26 30 31

B C 30 Rc63 D A


fnmsubsx fnmsubsxFloating-Point Negative Multiply-Subtract Single-Precision

fnmsubs frD,frA,frC,frB (Rc=0)fnmsubs. frD,frA,frC,frB (Rc=1)

)


frD ← -([(frA)∗ (frC)] - (frB))


If an operand is a denormalized number, it is prenormalized before the operation is started.If the most significant bit of the resultant significand is not one, the result is normalized.The result is rounded to the target precision under control of the floating-point roundingcontrol field RN of the FPSCR, then negated and placed into frD.

This instruction produces the same result obtained by negating the result of a floatingmultiply-subtract instruction with the following exceptions:






• Condition Register (CR1 Field)




0 5 6 10 11 15 16 20 21 25 26 30 31

B C 30 Rc59 D A


frspx frspxFloating-Point Round to Single-Precision Floating-Point Unit

frsp frD,frB (Rc=0)frsp. frD,frB (Rc=1)

If it is already in single-precision range, the floating-point operand in register frB is placedinto frD. Otherwise the floating-point operand in register frB is rounded to single-precisionusing the rounding mode specified by FPSCR[RN] and placed into frD.

The rounding is described fully in Appendix F, Section F.1, “Conversion from Floating-Point Number to Signed Fixed-Point Integer Word.”






Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 12 Rc63 D 0 0 0 0 0


fsubx fsubxFloating-Point Subtract Floating-Point Unit

fsub frD,frA,frB (Rc=0)fsub. frD,frA,frB (Rc=1)[POWER mnemonics: fs, fs.]

The floating-point operand in register frB is subtracted from the floating-point operand inregister frA. If the most significant bit of the resultant significand is not a one the result isnormalized. The result is rounded to the target precision under control of the floating-pointrounding control field RN of the FPSCR and placed into frD.

The execution of the floating-point subtract instruction is identical to that of floating-pointadd, except that the contents of frB participates in the operation with its sign bit (bit 0)inverted.






Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI

0 5 6 10 11 15 16 20 21 25 26 30 31

Reserved

B 0 0 0 0 0 20 Rc63 D A


fsubsx fsubsxFloating-Point Subtract Single-Precision Floating-Point Unit

fsubs frD,frA,frB (Rc=0)fsubs. frD,frA,frB (Rc=1)

The floating-point operand in register frB is subtracted from the floating-point operand inregister frA. If the most significant bit of the resultant significand is not a one the result isnormalized. The result is rounded to the target precision under control of the floating-pointrounding control field RN of the FPSCR and placed into frD.

The execution of the floating-point subtract instruction is identical to that of floating-pointadd, except that the contents of frB participates in the operation with its sign bit (bit 0)inverted.






Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI

0 5 6 10 11 15 16 20 21 25 26 30 31

Reserved

B 0 0 0 0 0 20 Rc59 D A


icbi icbiInstruction Cache Block Invalidate Integer Unit

icbi rA,rB

EA is the sum (rA|0)+(rB)

In other PowerPC processors, if the block containing the byte addressed by EA is incoherency required mode, and a block containing the byte addressed by EA is in theinstruction cache of any processor, the block is made invalid in all such processors, so thatsubsequent references cause the block to be refetched.

Also, if the block containing the byte addressed by EA is in coherency not required mode,and a block containing the byte addressed by EA is in the instruction cache of thisprocessor, the block is made invalid in this processor, so that subsequent references causethe block to be fetched from main memory (or perhaps from a data cache).

Since the 601 has a unified cache, it treats the icbi instruction as a no-op, even to the extentof not validating the EA.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 982 031 0 0 0 0 0 A


isync isyncInstruction Synchronize Integer Unit

isync

[POWER mnemonic: ics]

This instruction waits for all previous instructions to complete and then discards anyfetched instructions, causing subsequent instructions to be fetched (or refetched) frommemory and to execute in the context established by the previous instructions. Thisinstruction has no effect on other processors or on their caches.

This instruction is context synchronizing.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 150 0


lbz lbzLoad Byte and Zero Integer Unit

lbz rD,d(rA)

if rA=0 then b ← 0else b ← (rA)EA ← b+EXTS(d)rD ← (24)0 || MEM(EA, 1)

The effective address is the sum (rA|0) + d. The byte in memory addressed by EA is loadedinto rD[24–31]. Bits rD[0–23] are cleared to 0.


• None

0 5 6 10 11 15 16 31

d34 D A


lbzu lbzuLoad Byte and Zero with Update Integer Unit

lbzu rD,d(rA)

EA ← rA+EXTS(d)rD←(24)0 || MEM(EA, 1)rA←EA

EA is the sum (rA|0) + d. The byte in memory addressed by EA is loaded into rD[24–31].Bits rD[0-23] are cleared to 0.

EA is placed into rA.

If operand rA=0 the 601 does not update register r0, or if rA=rD the load data is loadedinto register rD and the register update is suppressed. The PowerPC architecture definesload with update instructions with operand rA=0 or rA=rD as invalid forms


• None

0 5 6 10 11 15 16 31

d35 D A


lbzux lbzuxLoad Byte and Zero with Update Indexed Integer Unit

lbzux rD,rA,rB

EA ← (rA)+(rB)rD ← (24)0 || MEM(EA, 1)rA ← EA

EA is the sum (rA|0) + (rB). The byte addressed by EA is loaded into rD[24–31]. BitsrD[0–23] are set to 0.


If operand rA=0 the 601 does not update register r0, or if rA=rD the load data is loadedinto register rD and the register update is suppressed. The PowerPC architecture definesload with update instructions with operand rA=0 or rA=rD as invalid forms


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 119 031 D A


lbzx lbzxLoad Byte and Zero Indexed Integer Unit

lbzx rD,rA,rB

if rA=0 then b ← 0else b ← (rA)EA ← b+(rB)rD ← (24)0 || MEM(EA, 1)

EA is the sum (rA|0) + (rB). The byte in memory addressed by EA is loaded into rD[24–31].

Bits rD[0–23] are set to 0.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 87 031 D A


lfd lfd Load Floating-Point Double-Precision Integer Unit and

Floating-Point Unit

lfd frD,d(rA)

if rA=0 then b ← 0else b ← (rA)EA ← b+EXTS(d)frD ← MEM(EA, 8)

EA is the sum (rA|0) + d.

The double word in memory addressed by EA is placed into frD.


• None

0 5 6 10 11 15 16 31

d50 D A


lfdu lfdu Load Floating-Point Double-Precision with Update Integer Unit and

Floating-Point Unit

lfdu frD,d(rA)

EA ← rA+EXTS(d)frD ← MEM(EA, 8)rA ← EA




If operand rA=0 the 601 does not update register r0. The PowerPC architecture defines loadwith update instructions with operand rA=0 as an invalid form.


• None

0 5 6 10 11 15 16 31

d51 D A


lfdux lfdux Load Floating-Point Double-Precision with Update Indexed Integer Unit and

Floating-Point Unit

lfdux frD,rA,rB

EA ← (rA)+(rB)frD ← MEM(EA, 8)rA ← EA

EA is the sum (rA|0) + (rB).





• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 631 031 D A


lfdx lfdx Load Floating-Point Double-Precision Indexed Integer Unit and

Floating-Point Unit

lfdx frD,rA,rB

if rA=0 then b ← 0else b ← (rA)EA ← b+(rB)frD ← MEM(EA, 8)




• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 599 031 D A


lfs lfs Load Floating-Point Single-Precision Integer Unit and

Floating-Point Unit

lfs frD,d(rA)

if rA=0 then b ← 0else b ← (rA)EA ← b+EXTS(d)frD ← DOUBLE(MEM(EA, 4))


The word in memory addressed by EA is interpreted as a floating-point single-precisionoperand. This word is converted to floating-point double-precision (see Section 3.5.9.1,“Double-Precision Conversion for Floating-Point Load Instructions”) and placed into frD.


• None

0 5 6 10 11 15 16 31

d48 D A


lfsu lfsu Load Floating-Point Single-Precision with Update Integer Unit and

Floating-Point Unit

lfsu frD,d(rA)

EA ← (rA)+EXTS(d)frD ← DOUBLE(MEM(EA, 4))rA ← EA






• None

0 5 6 10 11 15 16 31

d49 D A


lfsux lfsux Load Floating-Point Single-Precision with Update Indexed Integer Unit and

Floating-Point Unit

lfsux frD,rA,rB

EA ← (rA)+(rB)frD ← DOUBLE(MEM(EA, 4))rA ← EA






• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 567 031 D A


lfsx lfsx Load Floating-Point Single-Precision Indexed Integer Unit and

Floating-Point Unit

lfsx frD,rA,rB

if rA=0 then b ← 0else b ← (rA)EA ← b+(rB)frD ← DOUBLE(MEM(EA, 4))




• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 535 031 D A


lha lha Load Half Word Algebraic Integer Unit

lha rD,d(rA)

if rA=0 then b ← 0else b ← (rA)EA ← b+EXTS(d)rD ← EXTS(MEM(EA, 2))

EA is the sum (rA|0) + d. The half word in memory addressed by EA is loaded into rD[16–31]. Bits rD[0–15] are filled with a copy of the most significant bit of the loadedhalf word.


• None

0 5 6 10 11 15 16 31

d42 D A


lhau lhauLoad Half Word Algebraic with Update Integer Unit

lhau rD,d(rA)

EA ← (rA)+EXTS(d)rD ← EXTS(MEM(EA, 2))rA ← EA

EA is the sum (rA|0) + d. The half word in memory addressed by EA is loaded into rD[16–31].

Bits rD[0–15] are filled with a copy of the most significant bit of the loaded half word.


If operand rA=0 the 601 does not update register r0, or if rA = rD the load data is loadedinto register rD and the register update is suppressed. The PowerPC architecture definesload with update instructions with operand rA = 0 or rA = rD as invalid forms


• None

0 5 6 10 11 15 16 31

d43 D A


lhaux lhauxLoad Half Word Algebraic with Update Indexed Integer Unit

lhaux rD,rA,rB

EA ← (rA)+(rB)rD ← EXTS(MEM(EA, 2))rA ← EA

EA is the sum (rA|0) + (rB). The half word in memory addressed by EA is loaded intorD[16-31]. Bits rD[0–15] are filled with a copy of the most significant bit of the loaded halfword.




• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 375 031 D A


lhax lhaxLoad Half Word Algebraic Indexed Integer Unit

lhax rD,rA,rB

if rA=0 then b ← 0else b ← (rA)EA ← b+(rB)rD ← EXTS(MEM(EA, 2))

EA is the sum (rA|0) + (rB). The half word in memory addressed by EA is loaded intorD[16-31]. Bits rD[0–15] are filled with a copy of the most significant bit of the loaded halfword.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 343 031 D A


lhbrx lhbrxLoad Half Word Byte-Reverse Indexed Integer Unit

lhbrx rD,rA,rB

if rA=0 then b ← 0else b ← (rA)EA ← b+(rB)rD ← (16)0 || MEM(EA+1, 1) || MEM(EA,1)

EA is the sum (rA|0) + (rB). Bits 0–7 of the half word in memory addressed by EA areloaded into rD[24–31]. Bits 8–15 of the half word in memory addressed by EA are loadedinto rD[16–23]. Bits rD[0–15] are cleared to 0.

The PowerPC architecture cautions programmers that some implementations of thearchitecture may run the lhbrx instructions with greater latency than other types of loadinstructions. This is not the case in the 601. This instruction operates with the same latencyas other load instructions.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 790 031 D A


lhz lhzLoad Half Word and Zero Integer Unit

lhz rD,d(rA)

if rA=0 then b←0else b ← rAEA ← b+EXTS(d)rD ← (16)0 || MEM(EA, 2)

EA is the sum (rA|0) + d. The half word in memory addressed by EA is loaded into rD[16–31]. Bits rD[0–15] are cleared to 0.


• None

0 5 6 10 11 15 16 31

d40 D A


lhzu lhzuLoad Half Word and Zero with Update Integer Unit

lhzu rD,d(rA)

EA ← rA+EXTS(d)rD ← (16)0 || MEM(EA, 2)rA ← EA

EA is the sum (rA|0) + d. The half word in memory addressed by EA is loaded into rD[16–31]. Bits rD[0–15] are cleared to 0.


If operand rA=0 the 601 does not update register r0, or if rA = rD the load data is loadedinto register rD and the register update is suppressed. The PowerPC architecture definesload with update instructions with operand rA = 0 or rA = rD as invalid forms.


• None

0 5 6 10 11 15 16 31

d41 D A


lhzux lhzuxLoad Half Word and Zero with Update Indexed Integer Unit

lhzux rD,rA,rB

EA ← (rA)+(rB)rD←(16)0 || MEM(EA, 2)rA←EA

EA is the sum (rA|0) + (rB). The half word in memory addressed by EA is loaded into rD[16–31]. Bits rD[0–15] are cleared to 0.


If operand rA=0 the 601 does not update register r0, or if rA = rD the load data is loadedinto register rD and the register update is suppressed. The PowerPC architecture definesload with update instructions with operand rA = 0 or rA = rD as invalid forms.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 311 031 D A


lhzx lhzxLoad Half Word and Zero Indexed Integer Unit

lhzx rD,rA,rB

if rA=0 then b←0else b←rAEA←b+rBrD←(16)0 || MEM(EA, 2)

The effective address is the sum (rA|0) + (rB). The half word in memory addressed by EAis loaded into rD[16–31]. Bits rD[0–15] are cleared to 0.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 279 031 D A


lmw lmwLoad Multiple Word Integer Unit

lmw rD,d(rA)

[POWER mnemonic: lm]

if rA=0 then b←0else b←rAEA←b+EXTS(d)r←rDdo while r ≤ 31

GPR(r)← MEM(EA, 4)r←r+1EA←EA+4


n=(32-rD).

n consecutive words starting at EA are loaded into GPRs rD through r31. EA must be amultiple of 4; otherwise, the system alignment exception handler is invoked if the loadcrosses a page boundary. For additional information about data alignment exceptions, seechapter 5, section 5.4.3, “Data Access Exception (x'00300').

If rA is in the range of registers specified to be loaded, it will be skipped in the load process.If operand rA=0, the register is not considered as used for addressing, and will be loaded.


• None

In future implementations, this instruction is likely to have greater latency and take longerto execute, perhaps much longer, than a sequence of individual load instructions thatproduce the same results.

Note that on other PowerPC implementations, load and store multiple instructions that arenot on a word boundary either take an alignment exception or generate results that areboundedly undefined.

0 5 6 10 11 15 16 31

d46 D A


lscbxx POWER Architecture Instruction lscbxxLoad String and Compare Byte Indexed Integer Unit

lscbx rD,rA,rB (Rc=0)lscbx. rD,rA,rB (Rc=0)

This instruction is not part of the PowerPC architecture.EA is the sum (rA|0) + (rB). XER[25–31] contains the byte count. Register rD is thestarting register.

n=XER[25–31], which is the number of bytes to be loaded. nr=CEIL(n/4), which is thenumber of registers to receive data.

Starting with the leftmost byte in rD, consecutive bytes in memory addressed by the EA areloaded into rD through rD + nr – 1, wrapping around back through GPR 0 if required, untileither a byte match is found with XER[16–23] or n bytes have been loaded. If a byte matchis found, that byte is also loaded.

Bytes are always loaded left to right in the register. In the case when a match was foundbefore n bytes were loaded, the contents of the rightmost byte(s) not loaded of that registerand the contents of all succeeding registers up to and including rD + nr – 1 are undefined.Any reference made to memory after the matched byte is found will not cause a memoryexception. In the case when a match was not found, the contents of the rightmost byte(s)not loaded of rD + nr – 1 is undefined.

When XER[25–31]=0, the content of rD is undefined.

The count of the number of bytes loaded up to and including the matched byte, if a matchwas found, is placed in XER[25–31]. If there is no match, the contents of XER[25–31] areunchanged.

If rA and rB are in the range of registers specified to be loaded, it will be skipped in theload process. If operand rA=0, the register is not considered as used for addressing, and willbe loaded.

Under certain conditions (for example, segment boundary crossings) the data alignmenterror handler may be invoked. For additional information about data alignment exceptions,see chapter 5, section 5.4.3, “Data Access Exception (x'00300').

0 5 6 10 11 15 16 20 21 30 31

B 277 Rc31 D A


Other registers affected:



• XER:

Affected: XER[25–31]=# of bytes loaded

Note: If Rc=1 and XER[25–31]=0 then the CR0 field is undefined. If Rc=1 and XER[25–31]≠0 then the CR0 field is set as follows:

LT, GT, EQ, SO =b'00' || match || XER(SO)



lswi lswiLoad String Word Immediate Integer Unit

lswi rD,rA,NB

[POWER mnemonic: lsi]

if rA=0 then EA←0else EA←rAif NB=0 then n←32else n←NBr←rD - 1i←32do while n ≥ 0

if i=32 thenr←r+1 (mod 32)GPR(r)←0GPR(r)[i–i+7]←MEM(EA, 1)

i←i+8

The EA is (rA|0).

Let n = NB if NB ≠ 0, n = 32 if NB = 0; n is the number of bytes to load. Let nr = CEIL(n/4);nr is the number of registers to be loaded with data.

n consecutive bytes starting at the EA are loaded into GPRs rD through rD+nr–1. Bytesare loaded left to right in each register. The sequence of registers wraps around to r0 ifrequired. If the four bytes of register rD+nr–1 are only partially filled, the unfilled low-order byte(s) of that register are cleared to 0.

If rA is in the range of registers specified to be loaded, it will be skipped in the load process.If operand rA=0, the register is not considered as used for addressing, and will be loaded.

Under certain conditions (for example, segment boundary crossing) the data alignmenterror handler may be invoked. For additional information about data alignment exceptions,see chapter 5, section 5.4.3, “Data Access Exception (x'00300').


• None


0 5 6 10 11 15 16 20 21 30 31

Reserved

NB 597 031 D A


lswx lswxLoad String Word Indexed Integer Unit

lswx rD,rA,rB

[POWER mnemonic: lsx]

if rA=0 then b←0else b←rAEA←b+rBn←XER[25–31]r←rD - 1i←32do while n > 0

if i=32 thenr←r+1 (mod 32)GPR(r)←0GPR(r)[i–i+7]←MEM(EA, 1)

i←i+8

EA is the sum (rA|0) + (rB). Let n = XER[25–31]; n is the number of bytes to load. Letnr = CEIL(n/4): nr is the number of registers to receive data. If n>0, n consecutive bytesstarting at EA are loaded into GPRs rD through rD + nr – 1.

Bytes are loaded left to right in each register. The sequence of registers wraps aroundthrough r0 if required. If the bytes of rD + nr – 1 are only partially filled, the unfilled low-order byte(s) of that register are cleared to 0. If n=0, the content of rD is undefined.

If rA and rB are in the range of registers specified to be loaded, it will be skipped in theload process. If operand rA = 0, the register is not considered as used for addressing, andwill be loaded.

Under certain conditions (for example, segment boundary crossings) the alignment errorhandler may be invoked. For additional information about alignment exceptions, seeSection 5.4.6, “Alignment Exception (x'00600').”


• None


0 5 6 10 11 15 16 20 21 30 31

Reserved

B 533 031 D A


lwarx lwarxLoad Word and Reserve Indexed Integer Unit

lwarx rD,rA,rB

if rA=0 then b←0else b←rAEA←b+rBRESERVE←1RESERVE_ADDR←func(EA)rD←MEM(EA,4)

EA is the sum (rA|0) + (rB). The word in memory addressed by EA is loaded into rD.

This instruction creates a reservation for use by a store word conditional instruction. Thephysical address computed from the EA is associated with the reservation, and replaces anyaddress previously associated with the reservation.

The EA must be a multiple of 4. If it is not, the alignment exception handler will be invokedif the load crosses a page boundary, or the results will be boundedly undefined.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 20 031 D A


lwbrx lwbrxLoad Word Byte-Reverse Indexed Integer Unit

lwbrx rD,rA,rB[POWER mnemonic: lbrx]

if rA=0 then b←0else b←rAEA←b+rBrD←MEM(EA+3, 1) || MEM(EA+2, 1) || MEM(EA+1, 1) || MEM(EA, 1)

EA is the sum (rA|0)+(rB). Bits 0–7 of the word in memory addressed by EA are loadedinto rD[24–31]. Bits 8–15 of the word in memory addressed by EA are loaded into rD[16–23]. Bits 16–23 of the word in memory addressed by EA are loaded into rD[8–15].Bits 24–31 of the word in memory addressed by EA are loaded into rD[0–7].

The PowerPC architecture cautions programmers that some implementations of thearchitecture may run the lwbrx instructions with greater latency than other types of loadinstructions. This is not the case in the 601. This instruction operates with the same latencyas other load instructions.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 534 031 D A


lwz lwzLoad Word and Zero Integer Unit

lwz rD,d(rA)[POWER mnemonic: l]

if rA=0 then b←0else b←rAEA←b+EXTS(d)rD←MEM(EA, 4)

EA is the sum (rA|0) + d. The word in memory addressed by EA is loaded into rD.


• None

0 5 6 10 11 15 16 31

d32 D A


lwzu lwzuLoad Word and Zero with Update Integer Unit

lwzu rD,d(rA)[POWER mnemonic: lu]

EA ← rA+EXTS(d)rD←MEM(EA, 4)rA←EA

EA is the sum (rA|0) + d. The word in memory addressed by EA is loaded into rD.


If operand rA=0 the 601 does not update register r0, or if rA = rD the load data is loadedinto rD and the register update is suppressed. The PowerPC architecture defines load withupdate instructions with operand rA = 0 or rA = rD as invalid forms.


• None

0 5 6 10 11 15 16 31

d33 D A


lwzux lwzuxLoad Word and Zero with Update Indexed Integer Unit

lwzux rD,rA,rB[POWER mnemonic: lux]

EA ← (rA)+(rB)rD←MEM(EA, 4)rA←EA

EA is the sum (rA|0)+(rB). The word in memory addressed by EA is loaded into rD.




• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 55 031 D A


lwzx lwzxLoad Word and Zero Indexed Integer Unit

lwzx rD,rA,rB

[POWER mnemonic: lx]

if rA=0 then b←0else b←rAEA←b+rBrD←MEM(EA, 4)

EA is the sum (rA|0) + (rB). The word in memory addressed by EA is loaded into rD.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 23 031 D A

maskgx POWER Architecture Instruction maskgxMask Generate Integer Unit

maskg rA,rS,rB (Rc=0)maskg. rA,rS,rB (Rc=1)


Let mstart=rS[27–31], specifying the starting point of a mask of ones. Let mstop=rB[27–31], specifying the end point of the mask of ones.

If mstart < mstop + 1 then

MASK(mstart..mstop) = ones

MASK(all other bits) = zeros

If mstart = mstop = 1 then

MASK(0–31) = ones

If mstart > mstop + 1 then

MASK(mstop + 1..mstart – 1) = zeros

MASK(all other bits) = ones

MASK is then placed in rA.





0 5 6 10 11 15 16 20 21 30 31

B 29 Rc31 S A


maskirx POWER Architecture Instruction maskirxMask Insert from Register Integer Unit

maskir rA,rS,rB (Rc=0)maskir. rA,rS,rB (Rc=1)

This instruction is not part of the PowerPC architecture.Register rS is inserted into rA under control of the mask in rB.





0 5 6 10 11 15 16 20 21 30 31

B 541 Rc31 S A


mcrf mcrfMove Condition Register Field Integer Unit

mcrf crfD,crfS

CR[4∗ crfD–4∗ crfD+3] ← CR[4∗ crfS–4∗ crfS+3]

The contents of condition register field crfS are copied into condition register field crfD.All other condition register fields remain unchanged.

Note that if the link bit (bit 31) is set for this instruction, the PowerPC architecture considersthe instruction to be of an invalid form. Relative to the 601, this instruction executes andthe link register is left in an undefined state.

Note: Use of invalid instruction forms is not recommended. This description is providedfor informational purposes only.


• Condition Register (CR field specified by operand crfD):


0 5 6 8 9 10 11 13 14 15 16 20 21 30 31

Reserved

19 crfD 0 0 crfS 0 0 0 0 0 0 0 0 0


mcrfs mcrfsMove to Condition Register from FPSCR Floating-Point Unit

mcrfs crfD,crfS

The contents of FPSCR field crfS are copied to CR Field crfD. All exception bits copiedare reset to zero in the FPSCR.


• Condition Register (CR Field specified by operand crfS):

Affected: FX, OX (if crfS=0)

Affected: UX, ZX, XX, VXSNAN (if crfS=1)

Affected: VXISI, VXIDI, VXZDZ, VXIMZ (if crfS=2)

Affected: VXVC (if crfS=3)

Affected: VXSOFT, VXSQRT, VXCVI (if crfS=5)

0 5 6 8 9 10 11 13 14 15 16 20 21 30 31

Reserved

63 crfD 0 0 crfS 0 0 0 0 0 0 0 64 0


mcrxr mcrxrMove to Condition Register from XER Integer Unit

mcrxr crfD

CR[4∗ crfD+3]←XER[0–3]XER[0–3]← b'0000'

The contents of XER[0–3] are copied into the condition register field designated by crfD.All other fields of the condition register remain unchanged. XER[0–3] is cleared to zero.


• Condition Register (CR Field specified by crfD operand):


• XER[0–3]

0 5 6 8 9 10 11 15 16 20 21 30 31

Reserved

31 crfD 0 0 0 0 0 0 0 0 0 0 0 0 512 0


mfcr mfcr Move from Condition Register Integer Unit

mfcr rD

rD← CR

The contents of the condition register are placed into rD.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

0 0 0 0 0 19 031 D 0 0 0 0 0


mffsx mffsx Move from FPSCR Floating-Point Unit

mffs frD (Rc=0)mffs. frD (Rc=1)

The contents of the FPSCR are placed into bits 32–63 of register frD. Bits 0–31 of registerfrD are undefined.




POWER Compatibility Note: The PowerPC architecture defines bits 0–31 of floating-point register frD as undefined. In the 601, these bits take on the value x'FFF8_ 0000'.

0 5 6 10 11 15 16 20 21 30 31

Reserved

0 0 0 0 0 583 Rc63 frD 0 0 0 0 0


mfmsr mfmsr Move from Machine State Register Integer Unit

mfmsr rD

rD← MSR

The contents of the MSR are placed into rD.



• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

0 0 0 0 0 83 031 D 0 0 0 0 0


mfspr mfspr Move from Special Purpose Register Integer Unit

mfspr rD,SPR

n←SPR[5–9] || SPR[0–4] rD← SPR(n)

The SPR field denotes a special purpose register, encoded as shown in Table 10-4. Thecontents of the designated special purpose register are placed into rD.

The value of SPR[0] is 1 if and only if reading the register is at the supervisor-level.Execution of this instruction specifying a supervisor-level register when MSR[PR]=1 willresult in a supervisor-level instruction type program exception.

If the SPR field contains a value that is not valid for the 601, the instruction is treated as ano-op. For an invalid instruction form in which SPR[0]=1, if MSR[PR]=1 a supervisor-level instruction type program exception will occur instead of a no-op.


• None

Table 10-4. SPR Encodings for mfspr

SPR1

Register Name AccessDecimal SPR[5–9] SPR[0–4]

0 00000 00000 MQ User

1 00000 00001 XER User

4 00000 00100 RTCU2 User

5 00000 00101 RTCL2 User

6 00000 00110 DEC3 User

8 00000 01000 LR User

9 00000 01001 CTR User

18 00000 10010 DSISR Supervisor

19 00000 10011 DAR Supervisor

22 00000 10110 DEC3 Supervisor

25 00000 11001 SDR1 Supervisor

0 5 6 10 11 20 21 30 31

Reserved

SPR 339 031 D


1Note that the order of the two 5-bit halves of the SPR number is reversed compared with actual instructioncoding. If the SPR field contains any value other than one of these implementation-specific values or one ofthe values shown in Table 3-40, the instruction form is invalid. SPR[0]=1 if and only if the register is beingaccessed at the supervisor level. Execution of this instruction specifying a defined and supervisor-levelregister when MSR[PR]=1 results in a privilege violation type program exception.

For mtspr and mfspr instructions, SPR number coded in assembly language does not appear directly as a10-bit binary number in the instruction. The number coded is split into two 5-bit halves that are reversed inthe instruction, with high-order 5 bits appearing in bits 16–20 of the instruction and low-order 5 bits in bits 11to 15.

SPR encodings for DEC, MQ, RTCL, and RTCU are not part of the PowerPC architecture.

2On the 601, the mfspr instruction for the RTCU and RTCL registers must use these encodings (SPR4 andSPR5, respectively) regardless whether the processor is in supervisor or user mode. The mtspr instruction,which is supervisor-only for the RTCU and RTCL registers, must use the SPR20 and SPR21 encodings,respectively.

26 00000 11010 SRR0 Supervisor


272 01000 10000 SPRG0 Supervisor




282 01000 11010 EAR Supervisor

287 01000 11111 PVR Supervisor

528 10000 10000 BAT0U Supervisor

529 10000 10001 BAT0L Supervisor







1008 11111 10000 Checkstop Register (HID0) Supervisor

1009 11111 10001 Debug Mode Register (HID1) Supervisor




Table 10-4. SPR Encodings for mfspr(Continued)

SPR1

Register Name AccessDecimal SPR[5–9] SPR[0–4]


For forward compatability with other members of the PowerPC microprocessor family the mftb instructionshould be used to obtain the contents of the RTCL and RTCU registers. The mftb instruction is a PowerPCinstruction unimplemented by the 601, and will be trapped by the illegal instruction exception handler, whichcan then issue the appropriate mfspr instructions for reading the RTCL and RTCU registers

3Read access to the DEC register is supervisor-only in the PowerPC architecture, using SPR22. However,the POWER architecture allows user-level read access using SPR6. Note that the SPR6 encoding for theDEC will not be supported by other PowerPC processors.


mfsr mfsr Move from Segment Register Integer Unit

mfsr rD,SR

rD←SEGREG(SR)

The contents of segment register SR is placed into rD.


This instruction is defined only for 32-bit implementations; using it on a 64-bitimplementation causes an illegal instruction type program exception.


• None

0 5 6 10 11 12 15 16 20 21 30 31

Reserved

0 0 0 0 0 595 031 D 0 SR


mfsrin mfsrin Move from Segment Register Indirect Integer Unit

mfsrin rD,rB

rD←SEGREG(rB[0–3])

The contents of the segment register selected by bits 0–3 of rB are copied into rD.


This instruction is defined only for 32-bit implementations. Using it on a 64-bitimplementation causes an illegal instruction exception.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 659 031 D 0 0 0 0 0


mtcrf mtcrf Move to Condition Register Fields Integer Unit

mtcrf CRM,rS

mask←(4)(CRM[0]) || (4)(CRM[1]) ||... (4)(CRM[7])CR←(rS[32–63] & mask) | (CR & ¬ mask)

The contents of rS are placed into the condition register under control of the field maskspecified by CRM. The field mask identifies the 4-bit fields affected. Let i be an integer inthe range 0–7. If CRM(i) = 1, CR Field i (CR bits 4∗ i through 4∗ i+3) is set to the contentsof the corresponding field of the of rS.


• CR fields selected by mask

0 5 6 10 11 12 19 20 21 30 31

Reserved

CRM 0 144 031 S 0


mtfsb0x mtfsb0xMove to FPSCR Bit 0 Integer Unit

mtfsb0 crbD (Rc=0)mtfsb0. crbD (Rc=1)

Bit crbD of the FPSCR is cleared to zero.





Affected: FPSCR bit crbD

Note: Bits 1 and 2 (FEX and VX) cannot be explicitly reset.

0 5 6 10 11 15 16 20 21 30 31

Reserved

0 0 0 0 0 70 Rc63 crb D 0 0 0 0 0


mtfsb1x mtfsb1x Move to FPSCR Bit 1 Integer Unit

mtfsb1 crbD (Rc=0)mtfsb1. crbD (Rc=1)

Bit crbD of the FPSCR is set to one.





FPSCR bit crbD

Note: Bits 1 and 2 (FEX and VX) cannot be explicitly reset.

0 5 6 10 11 15 16 20 21 30 31

Reserved

0 0 0 0 0 38 Rc63 crbD 0 0 0 0 0


mtfsfx mtfsfxMove to FPSCR Fields Integer Unit

mtfsf FM,frB (Rc=0)mtfsf. FM,frB (Rc=1)

Bits 32–63 of register frB are placed into the FPSCR under control of the field maskspecified by FM. The field mask identifies the 4-bit fields affected. Let i be an integer in therange 0–7. If FM(i) = 1, FPSCR Field i (FPSCR bits 4∗ i through 4∗ i+3) is set to the contentsof the corresponding field of the low-order 32 bits of register frB.

The other PowerPC implementations, the move to FPSCR fields (mtfsf) instruction mayperform more slowly when only a portion of the fields are updated.





FPSCR fields selected by mask

Updating fewer than all eight fields of the FPSCR may have substantially poorerperformance on some implementations than updating all the fields.

When FPSCR[0–3] is specified, bits 0 (FX) and 3 (OX) are set to the values of frB[32] andfrB[35] (that is, even if this instruction causes OX to change from 0 to 1, FX is set fromfrB[32] and not by the usual rule that FX is set to 1 when an exception bit changes from 0to 1). Bits 1 and 2 (FEX and VX) are set according to the usual rule and not from frB[33–34].

0 5 6 7 14 15 16 20 21 30 31

Reserved

63 0 FM 0 frB 711 Rc


mtfsfix mtfsfix Move to FPSCR Field Immediate Integer Unit

mtfsfi crfD,IMM (Rc=0)mtfsfi. crfD,IMM (Rc=1)

The value of the IMM field is placed into FPSCR field crfD.





FPSCR field crfD

When FPSCR[0–3] is specified, bits 0 (FX) and 3 (OX) are set to the values of IMM[0] andIMM[3] (that is, even if this instruction causes OX to change from 0 to 1, FX is set fromIMM[0] and not by the usual rule that FX is set to 1 when an exception bit changes from 0to 1). Bits 1 and 2 (FEX and VX) are set according to the usual rule, given in Section 2.2.3,“Floating-Point Status and Control Register (FPSCR)” and not from IMM[1–2].

0 5 6 8 9 10 11 12 15 16 19 20 21 30 31

Reserved

63 crfD 0 0 0 0 0 0 0 IMM 0 134 Rc


mtmsr mtmsr Move to Machine State Register Integer Unit

mtmsr rS

MSR←rS[0–31]

The contents of rS are placed into the MSR.

This is a supervisor-level instruction and execution synchronizing.


• MSR

0 5 6 10 11 15 16 20 21 30 31

Reserved

0 0 0 0 0 146 031 S 0 0 0 0 0


mtspr mtspr Move to Special Purpose Register Integer Unit

mtspr SPR,rS

n = SPR[5–9] || SPR[0–4]SPREG(n)←rS[0–31]

The SPR field denotes a special purpose register, encoded as shown in Table 10-4. Thecontents of rS are placed into the designated special purpose register.

The value of SPR[0] is 1 if and only if writing the register is a supervisor-level operation.Execution of this instruction specifying a defined and supervisor-level register whenMSR[PR]=1 results in a supervisor-level instruction exception.

If the SPR field contains an invalid value, the instruction is treated as a no-op. For an invalidinstruction form in which SPR[0]=1, if MSR[PR]=1 a supervisor-level instructionexception will occur instead of a no-op.


• None

Table 10-4 lists the SPR encodings for the 601.

Table 10-5. SPR Encodings for mtspr

SPR1Register Name Access

Decimal SPR[5–9] SPR[0–4]

0 00000 00000 MQ User

1 00000 00001 XER User

8 00000 01000 LR User

9 00000 01001 CTR User

18 00000 10010 DSISR Supervisor

19 00000 10011 DAR Supervisor

20 00000 10100 RTCU2 Supervisor

21 00000 10101 RTCL2 Supervisor

22 00000 10110 DEC3 Supervisor

0 5 6 10 11 20 21 30 31

Reserved

SPR 467 031 S


1Note that the order of the two 5-bit halves of the SPR number is reversed compared with actual instructioncoding. If the SPR field contains any value other than one of these implementation-specific values or one ofthe values shown in Table 3-40, the instruction form is invalid. SPR[0]=1 if and only if the register is beingaccessed at the supervisor level. Execution of this instruction specifying a defined and supervisor-levelregister when MSR[PR]=1 results in a privilege violation type program exception.

For mtspr and mfspr instructions, SPR number coded in assembly language does not appear directly as a10-bit binary number in the instruction. The number coded is split into two 5-bit halves that are reversed in theinstruction, with high-order 5 bits appearing in bits 16–20 of the instruction and low-order 5 bits in bits 11–15.

SPR encodings for DEC, MQ, RTCL, and RTCU are not part of the PowerPC architecture.2On the 601, the mfspr instruction for the RTCU and RTCL registers must use these encodings (SPR4 andSPR5, respectively) regardless whether the processor is in supervisor or user mode. The mtspr instruction,which is supervisor-only for the RTCU and RTCL registers, must use the SPR20 and SPR21 encodings,respectively.3Read access to the DEC register is supervisor-only in the PowerPC architecture, using SPR22. However,the POWER architecture allows user-level read access using SPR6. Note that the SPR6 encoding for theDEC will not be supported by other PowerPC processors.

25 00000 11001 SDR1 Supervisor







282 01000 11010 EAR Supervisor

528 10000 10000 IBAT0U Supervisor

529 10000 10001 IBAT0L Supervisor







1008 11111 10000 Checkstop Register (HID0) Supervisor

1009 11111 10001 Debug Mode Register (HID1) Supervisor




Table 10-5. SPR Encodings for mtspr(Continued)

SPR1Register Name Access



mtsr mtsr Move to Segment Register Integer Unit

mtsr SR,rS

SEGREG(SR)←(rS)

The contents of rS is placed into SR.


This instruction is defined only for 32-bit implementations. Using it on a 64-bitimplementation causes an illegal instruction type program exception.


• None

0 5 6 10 11 12 15 16 20 21 30 31

Reserved

0 0 0 0 0 210 031 S 0 SR


mtsrin mtsrin Move to Segment Register Indirect Integer Unit

mtsrin rS,rB

[POWER mnemonic: mtsri]

SEGREG(rB[0–3])←(rS)

The contents of rS are copied to the segment register selected by bits 0–3 of rB.


This instruction is defined only for 32-bit implementations. Using it on a 64-bitimplementation causes an illegal instruction exception.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

B 242 031 S 0 0 0 0 0


mulx POWER Architecture Instruction mulxMultiply Integer Unit

mul rD,rA,rB (OE=0 Rc=0)mul. rD,rA,rB (OE=0 Rc=1)mulo rD,rA,rB (OE=1 Rc=0)mulo. rD,rA,rB (OE=1 Rc=1)


Bits 0–31 of the product (rA)∗ (rB) are placed into rD. Bits 32–63 of the product (rA)∗ (rB)are placed into the MQ register.

If Rc=1, then LT,GT and EQ reflect the result in the MQ register (the low order 32 bits). IfOE=1 then SO and OV are set to one if the product cannot be represented in 32 bits.

If the smaller absolute value of the two multipliers is placed in rB, the instruction maycomplete execution more quickly. See Chapter 7, “Instruction Timing,” for additionalinformation about instruction performance.




• XER:



0 5 6 10 11 15 16 20 21 22 30 31

B OE 107 Rc31 D A


mulhwx mulhwx Multiply High Word Integer Unit

mulhw rD,rA,rB (Rc=0)mulhw. rD,rA,rB (Rc=1)

prod[0–63]←rA[32–63]∗ rB[32–63]rD[32–63]←prod[0–31]rD[0–31]←undefined

The contents of rA and of rB are interpreted as 32-bit signed integers. They are multipliedto form a 64-bit signed integer product. The high-order 32 bits of the 64-bit product areplaced into rD.





Reserved

0 5 6 10 11 15 16 20 21 22 30 31

31 D A B 0 75 Rc


mulhwux mulhwux Multiply High Word Unsigned Integer Unit

mulhwu rD,rA,rB (Rc=0)mulhwu. rD,rA,rB (Rc=1)

prod[0–63]←rA[32–63]∗ rB[32–63]rD[32–63]←prod[0–31]rD[0–31]←undefined

The contents of rA and of rB are extracted and interpreted as 32-bit unsigned integers. Theyare multiplied to form a 64-bit unsigned integer product. The high-order 32 bits of the 64-bit product are placed into rD.


This instruction causes the contents of the MQ to become undefined.




Reserved

0 5 6 10 11 15 16 20 21 22 30 31

31 D A B 0 11 Rc


mulli mulli Multiply Low Immediate Integer Unit

mulli rD,rA,SIMM

[POWER mnemonic: muli]

prod[0–48]←rA∗ SIMMrD←prod[16–48]

The low-order 32 bits of the 48-bit product (rA)∗ SIMM are placed into rD. The low-orderbits of the 32-bit product are independent of whether the operands are treated as signed orunsigned integers.


• None

0 5 6 10 11 15 16 31

SIMM07 D A


mullwx mullwx Multiply Low Integer Unit

mullw rD,rA,rB (OE=0 Rc=0)mullw. rD,rA,rB (OE=0 Rc=1)mullwo rD,rA,rB (OE=1 Rc=0) mullwo. rD,rA,rB (OE=1 Rc=1)

[POWER mnemonics: muls, muls., mulso, mulso.]

rD←rA[32–63]∗ rB[32–63]

The low-order 32 bits of the 64-bit product (rA)∗ (rB) are placed into rD. The low-orderbits of the 32-bit product are independent of whether the operands are treated as signed orunsigned integers. However, OV is set based on the result interpreted as a signed integer.


If OE=1, then OV is set to one if the product cannot be represented in 32 bits.




• XER:


0 5 6 10 11 15 16 20 21 22 30 31

31 D A B OE 235 Rc


nabsx POWER Architecture Instruction nabsxNegative Absolute Integer Unit

nabs rD,rA (OE=0 Rc=0)nabs. rD,rA (OE=0 Rc=1)nabso rD,rA (OE=1 Rc=0)nabso. rD,rA (OE=1 Rc=1)


The negative absolute value –|(rA)| is placed into rD.




• XER:


Note that nabs never overflows. If OE=1 then XER(OV) is cleared to zero and XER(SO)is not changed.


Reserved

0 5 6 10 11 15 16 20 21 22 30 31

31 D A 0 0 0 0 0 OE 488 Rc


nandx nandx NAND Integer Unit

nand rA,rS,rB (Rc=0)nand. rA,rS,rB (Rc=1)

rA← ¬ ((rS) & (rB))

The contents of rS are ANDed with the contents of rB and the one’s complement of theresult is placed into rA.




NAND with rA=rB can be used to obtain the one's complement.

0 5 6 10 11 15 16 20 21 30 31

31 S A B 476 Rc


negx negx Negate Integer Unit

neg rD,rA (OE=0 Rc=0)neg. rD,rA (OE=0 Rc=1)nego rD,rA (OE=1 Rc=0)nego. rD,rA (OE=1 Rc=1)

rD← ¬ (rA) + 1

The sum ¬ (rA) + 1 is placed into rD.

If rA contains the most negative 32-bit number (x'8000_0000'), the low-order 32 bits of theresult contain the most negative 32-bit number and, if OE=1, OV is set.




• XER:

Affected: SO OV (if OE=1)

Reserved

0 5 6 10 11 15 16 20 21 22 30 31

31 D A 0 0 0 0 0 OE 104 Rc


norx norx NOR Integer Unit

nor rA,rS,rB (Rc=0)nor. rA,rS,rB (Rc=1)

rA← ¬ ((rS) | (rB))

The contents of rS are ORed with the contents of rB and the one’s complement of the resultis placed into rA.




0 5 6 10 11 15 16 20 21 30 31

31 S A B 124 Rc


orx orx OR Integer Unit

or rA,rS,rB (Rc=0)or. rA,rS,rB (Rc=1)

rA←(rS) | (rB)

The contents of rS is ORed with the contents of rB and the result is placed into rA.




0 5 6 10 11 15 16 20 21 30 31

31 S A B 444 Rc


orcx orcx OR with Complement Integer Unit

orc rA,rS,rB (Rc=0) orc. rA,rS,rB (Rc=1)

rA ← (rS) | ¬ (rB)

The contents of rS is ORed with the complement of the contents of rB and the result isplaced into rA.




0 5 6 10 11 15 16 20 21 30 31

31 S A B 412 Rc


ori ori OR Immediate Integer Unit

ori rA,rS,UIMM

[POWER mnemonic: oril]

rA←(rS) | ((16)0 || UIMM)

The contents of rS is ORed with x'0000' || UIMM and the result is placed into rA.

The preferred "no-op" (an instruction that does nothing) is:

ori 0,0,0


• None

0 5 6 10 11 15 16 31

24 S A UIMM


oris orisOR Immediate Shifted Integer Unit

oris rA,rS,UIMM

[POWER mnemonic: oriu]

rA←(rS) | (UIMM || (16)0)

The contents of rS is ORed with UIMM || x'0000' and the result is placed into rA.


• None

0 5 6 10 11 15 16 31

25 S A UIMM


rfi rfiReturn from Interrupt Integer Unit

MSR[16–31]←SRR1[16–31]NIA←iea SRR0[0–29] || 0b00

Bits 16–31 of SRR1 are placed into bits 16–31 of the MSR, then the next instruction isfetched, under control of the new MSR value, from the address SRR0[0–29] || b'00'.

This is a supervisor-level instruction and is context synchronizing.


• MSR

Reserved

0 5 6 10 11 15 16 20 21 30 31

19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 50 0


rlmix POWER Architecture Instruction rlmixRotate Left then Mask Insert Integer Unit

rlmi rA,rS,rB,MB,ME (Rc=0)rlmi. rA,rS,rB,MB,ME (Rc=1)

This instruction is not part of the PowerPC architecture.The contents of rS is rotated left the number of positions specified by bits 27–31 of rB. Therotated data is inserted into rA under control of the generated mask.





0 5 6 10 11 15 16 20 21 25 26 30 31

22 S A B MB ME Rc


rlwimix rlwimix Rotate Left Word Immediate then Mask Insert Integer Unit

rlwimi rA,rS,SH,MB,ME (Rc=0)rlwimi. rA,rS,SH,MB,ME (Rc=1)

[POWER mnemonics: rlimi, rlimi.]

n←SHr←ROTL(rS, n)m←MASK(MB, ME)rA←(r&m) | (rA & ¬ m)

The contents of rS are rotated left SH bits. A mask is generated having 1-bits from bit MBthrough bit ME and 0-bits elsewhere. The rotated data is inserted into rA under control ofthe generated mask.




0 5 6 10 11 15 16 20 21 25 26 30 31

20 S A SH MB ME Rc


rlwinmx rlwinmx Rotate Left Word Immediate then AND with Mask Integer Unit

rlwinm rA,rS,SH,MB,ME (Rc=0)rlwinm. rA,rS,SH,MB,ME (Rc=1)

[POWER mnemonics: rlinm, rlinm.]

n←SHr←ROTL(rS, n)m←MASK(MB, ME)rA←r & m

The contents of rS are rotated left SH bits. A mask is generated having 1-bits from bit MBthrough bit ME and 0-bits elsewhere. The rotated data is ANDed with the generated maskand the result is placed into rA.




The opcode rlwinm can be used to extract an n-bit field, that starts at bit position b in rS[0–31], right-justified into rA (clearing the remaining 32 – n bits of rA), by setting SH = b + n, MB = 32-n, and ME = 31. It can be used to extract an n-bit field, that starts at bit position b in rS[0–31], left-justified into rA (clearing the remaining 32 – n bits of rA), by setting SH = b, MB = 0, and ME = n – 1. It can be used to rotate the contents of a register left (or right) by n bits, by setting SH = n(32– n), MB = 0, and ME = 31. It can be used to shift the contents of a register right by n bits, by setting SH = 32 – N, MB = n, and ME = 31. It can be used to clear the high-order b bits of a register and then shift the result left by n bits by setting SH = n, MB = b – n and ME = 31 – n. It can be used to clear the low-order n bits of a register, by setting SH = 0, MB = 0, and ME = 31 – n.

0 5 6 10 11 15 16 20 21 25 26 30 31

21 S A SH MB ME Rc


rlwnmx rlwnmx Rotate Left Word then AND with Mask Integer Unit

rlwnm rA,rS,rB,MB,ME (Rc=0)rlwnm. rA,rS,rB,MB,ME (Rc=1)

[POWER mnemonics: rlnm, rlnm.]

n←rB[27-31]r←ROTL(rS, n)m←MASK(MB, ME)rA←r & m

The contents of rS are rotated left the number of bits specified by rB[27–31]. A mask isgenerated having 1-bit from bit MB through bit ME and 0-bits elsewhere. The rotated datais ANDed with the generated mask and the result is placed into rA.




The opcode rlwnm can be used to extract an n-bit field, that starts at variable bit position bin rS[0–31], right-justified into rA (clearing the remaining 32 – n bits of rA), by settingrB[27–31] = b + n, MB = 32 – n, and ME = 31. It can be used to extract an n-bit field, thatstarts at variable bit position b in rS[0–31], left-justified into rA (clearing the remaining32 – n bits of rA), by setting rB[27–31] = b, MB = 0, and ME = n – 1. It can be used torotate the contents of a register left (or right) by variable n bits, by setting rB[27–31] =n(32 – N), MB = 0, and ME = 31.

Equivalent mnemonics are provided for some of these uses.

0 5 6 10 11 15 16 20 21 25 26 30 31

23 S A B MB ME Rc


rribx POWER Architecture Instruction rribxRotate Right and Insert Bit Integer Unit

rrib rA,rS,rB (Rc=0)rrib. rA,rS,rB (Rc=1)


Bit 0 of rS is rotated right the amount specified by bits 27–31 of rB. The bit is then insertedinto rA.





0 5 6 10 11 15 16 20 21 30 31

31 S A B 537 Rc


sc scSystem Call Integer Unit

[POWER mnemonic: svca]

This instruction calls the operating system to perform a service. When control is returnedto the program that executed the system call, the content of the registers depends on theregister conventions used by the program providing the system service.

This instruction is context synchronizing, as described in Section 3.1.2, “ContextSynchronization”. Although the PowerPC architecture considers sc to be a branchprocessor instruction, it is executed by the integer processor in the 601.


• Dependent on the system service

POWER Compatibility Note: The PowerPC sc instruction is substantially different fromthe POWER svc instruction. The following aspects of these instructions were consideredwith respect to POWER compatibility:

The PowerPC architecture defines the sc instruction with the “LK” bit set to be an invalidform. POWER architecture defines the svc instruction (same opcode as PowerPC scinstruction) with the “LK” bit set as a valid form which places the address of the instructionfollowing the svc into the link register. In the case of the 601, an sc instruction with the“LK” bit set will execute correctly (as defined in the PowerPC architecture) and will updatethe link register with the address of the instruction following the sc instruction.

The PowerPC architecture defines the sc instruction in such a manner that requires bit 30of the instruction to be b'1' (when bit 30 is b'0', the instruction is considered reserved). ThePOWER architecture svc instruction does not have such a restriction, and uses this bit todefine an alternate form of the svc instruction. Although the 601 does not support thisalternate form of the svc instruction, it does ignore the state of bit 30 of the instructionduring decode and execution.

As a result of executing an sc instruction, the PowerPC architecture defines bits 0–15 ofregister SRR1 to be undefined. In the case of the 601, execution of the sc instruction willcause bits 16–31 of the instruction to be placed into bits 0–15 of register SRR1.

The effective (logical) address of the instruction following the system call instruction isplaced into SRR0. Bits 16–31 of the MSR are placed into bits 16–31 of SRR1.

Reserved

0 5 6 10 11 15 16 29 30 31

17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0


Then a system call exception is generated. The exception causes the MSR to be altered asdescribed in Section 5.4, “Exception Definitions.”

The exception causes the next instruction to be fetched from offset x'C00' from the physicalbase address indicated by the new setting of MSR[EP]. This instruction is context-synchronizing.


• SRR0 SRR1 MSR


slex POWER Architecture Instruction slexShift Left Extended Integer Unit

sle rA,rS,rB (Rc=0)sle. rA,rS,rB (Rc=1)


Register rS is rotated left n bits where n is the shift amount specified in bits 27–31 of rB.The rotated word is placed in the MQ register. A mask of 32 – n ones followed by n zerosis generated. The logical AND of the rotated word and the generated mask is placed in rA.





0 5 6 10 11 15 16 20 21 30 31

31 S A B 153 Rc


sleqx POWER Architecture Instruction sleqxShift Left Extended with MQ Integer Unit

sleq rA,rS,rB (Rc=0)sleq. rA,rS,rB (Rc=1)


Register rS is rotated left n bits where n is the shift amount specified in bits 27–31 of rB.A mask of 32 – n ones followed by n zeros is generated. The rotated word is then mergedwith the contents of the MQ register, under control of the generated mask. The merged wordis placed in rA. The rotated word is placed in the MQ register.





0 5 6 10 11 15 16 20 21 30 31

31 S A B 217 Rc


sliqx POWER Architecture Instruction sliqxShift Left Immediate with MQ Integer Unit

sliq rA,rS,SH (Rc=0)sliq. rA,rS,SH (Rc=1)


Register rS is rotated left n bits where n is the shift amount specified by SH. The rotatedword is placed in the MQ register. A mask of 32 – n ones followed by n zeros is generated.The logical AND of the rotated word is placed into rA.





0 5 6 10 11 15 16 20 21 30 31

31 S A SH 184 Rc


slliqx POWER Architecture Instruction slliqxShift Left Long Immediate with MQ Integer Unit

slliq rA,rS,SH (Rc=0)slliq. rA,rS,SH (Rc=1)


Register rS is rotated left n bits where n is the shift amount specified by SH. A mask of32 – n ones followed by n zeros is generated. The rotated word is then merged with thecontents of the MQ register, under control of the generated mask. The merged word isplaced into rA. The rotated word is placed into the MQ register.





0 5 6 10 11 15 16 20 21 30 31

31 S A SH 248 Rc


sllqx POWER Architecture Instruction sllqxShift Left Long with MQ Integer Unit

sllq rA,rS,rB (Rc=0)sllq. rA,rS,rB (Rc=1)


Register rS is rotated left n bits where n is the shift amount specified in bits 27–31 of rB.

When bit 26 of rB is a zero, a mask of 32 – n ones followed by n zeros is generated. A wordof zeros is then merged with the contents of the MQ register, under control of the generatedmask.

When bit 26 of rB is a one, a mask of 32 – n ones followed by n ones is generated. A wordof zeros is then merged with the contents of the MQ register, under control of the generatedmask.

The merged word is placed into rA. The MQ register is not altered.





0 5 6 10 11 15 16 20 21 30 31

31 S A B 216 Rc


slqx POWER Architecture Instruction slqxShift Left with MQ Integer Unit

slq rA,rS,rB (Rc=0)slq. rA,rS,rB (Rc=1)


Register rS is rotated left n bits where n is the shift amount specified in bits 27–31 of rB.The rotated word is placed in the MQ register.

When bit 26 of rB is a zero, a mask of 32 – n ones followed by n zeros is generated.

When bit 26 of rB is a one, a mask of all zeros is generated.

The logical AND of the rotated word and the generated mask is placed into rA.





0 5 6 10 11 15 16 20 21 30 31

31 S A B 152 Rc


slwx slwx Shift Left Word Integer Unit

slw rA,rS,rB (Rc=0)slw. rA,rS,rB (Rc=1)

[POWER mnemonics: sl, sl.]

n←rB[27-31]rA←ROTL(rS, n)

If bit 16 of rB=0, the contents of rS are shifted left the number of bits specified by rB[27–31]. Bits shifted out of position 0 are lost. Zeros are supplied to the vacated positions on theright. The 32-bit result is placed into rA. If bit 16 of rB=1, 32 zeros are placed into rA.




0 5 6 10 11 15 16 20 21 30 31

31 S A B 24 Rc


sraiqx POWER Architecture Instruction sraiqxShift Right Algebraic Immediate with MQ Integer Unit

sraiq rA,rS,SH (Rc=0)sraiq. rA,rS,SH (Rc=1)


Register rS is rotated left 32 – n bits where n is the shift amount specified by SH. A maskof n zeros followed by 32 – n ones is generated. The rotated word is placed in the MQregister. The rotated word is then merged with a word of 32 sign bits from rS, under controlof the generated mask.

The merged word is placed in rA.

The rotated word is ANDed with the complement of the generated mask. This 32-bit resultis ORed together and then ANDed with bit 0 of rS to produce XER[CA].




• XER:

Affected: CA

All shift right algebraic instructions can be used for a fast divide by 2(n) if followed withaddze.


0 5 6 10 11 15 16 20 21 30 31

31 S A SH 952 Rc


sraqx POWER Architecture Instruction sraqxShift Right Algebraic with MQ Integer Unit

sraq rA,rS,rB (Rc=0)sraq. rA,rS,rB (Rc=1)


Register rS is rotated left 32 – n bits where n is the shift amount specified in bits 27–31 ofrB. When bit 26 of rB is a zero, a mask of n zeros followed by 32 – n ones is generated.When bit 26 of rB is a one, a mask of all zeros is generated. The rotated word is placed inthe MQ register. The rotated word is then merged with a word of 32 sign bits from rS, undercontrol of the generated mask.






• XER:

Affected: CA

All shift right algebraic instructions can be used for a fast divide by 2(n) if followed withaddze.


0 5 6 10 11 15 16 20 21 30 31

31 S A B 920 Rc


srawx srawx Shift Right Algebraic Word Integer Unit

sraw rA,rS,rB (Rc=0)sraw. rA,rS,rB (Rc=1)

[POWER mnemonics: sra, sra.]

n←rB[27-31]rA←ROTL(rS, n)

If rB[26]=0,then the contents of rS are shifted right the number of bits specified by rB[27–31]. Bits shifted out of position 31 are lost. The result is padded on the left with sign bitsbefore being placed into rA. If rB[26]=1, then rA is filled with 32 sign bits (bit 0) from rS.CR0 is set based on the value written into rA.




• XER:

Affected: CA

0 5 6 10 11 15 16 20 21 30 31

31 S A B 792 Rc


srawix srawix Shift Right Algebraic Word Immediate Integer Unit

srawi rA,rS,SH (Rc=0) srawi. rA,rS,SH (Rc=1)

[POWER mnemonics: srai, srai.]

n←SHrA←ROTL(rS, 32-n)

The contents of rS are shifted right SH bits. Bits shifted out of position 31 are lost. Theshifted value is sign extended before being placed in rA. The 32-bit result is placed into rA.XER[CA] is set to 1 if rS contains a negative number and any 1-bits are shifted out ofposition 31; otherwise XER[CA] is cleared to 0. A shift amount of zero causes XER[CA]to be cleared to 0.




• XER:

Affected: CA

0 5 6 10 11 15 16 20 21 30 31

31 S A SH 824 Rc


srex POWER Architecture Instruction srexShift Right Extended Integer Unit

sre rA,rS,rB (Rc=0)sre. rA,rS,rB (Rc=1)


Register rS is rotated left 32 – n bits where n is the shift amount specified in bits 27–31 ofrB. The rotated word is placed in the MQ register. A mask of n zeros followed by 32 – nones is generated. The logical AND of the rotated word and the generated mask is placedin rA.





0 5 6 10 11 15 16 20 21 30 31

31 S A B 665 Rc


sreax POWER Architecture Instruction sreaxShift Right Extended Algebraic Integer Unit

srea rA,rS,rB (Rc=0)srea. rA,rS,rB (Rc=1)


Register rS is rotated left 32 – n bits where n is the shift amount specified in bits 27–31 ofrB. A mask of n zeros followed by 32 – n ones is generated. The rotated word is placed inthe MQ register. The rotated word is then merged with a word of 32 sign bits from rS, undercontrol of the generated mask.






• XER:

Affected: CA


0 5 6 10 11 15 16 20 21 30 31

31 S A B 921 Rc


sreqx POWER Architecture Instruction sreqxShift Right Extended with MQ Integer Unit

sreq rA,rS,rB (Rc=0)sreq. rA,rS,rB (Rc=1)


Register rS is rotated left 32 – n bits where n is the shift amount specified in bits 27–31 ofrB. A mask of n zeros followed by 32 – n ones is generated. The rotated word is thenmerged with the contents of the MQ register, under control of the generated mask. Themerged word is placed in rA. The rotated word is placed into the MQ register.





0 5 6 10 11 15 16 20 21 30 31

31 S A B 729 Rc


sriqx POWER Architecture Instruction sriqxShift Right Immediate with MQ Integer Unit

sriq rA,rS,SH (Rc=0)sriq. rA,rS,SH (Rc=1)


Register rS is rotated left 32 – n bits where n is the shift amount specified by SH. Therotated word is placed into the MQ register. A mask of n zeros followed by 32 – n ones isgenerated. The logical AND of the rotated word and the generated mask is placed in rA.





0 5 6 10 11 15 16 20 21 30 31

31 S A SH 696 Rc


srliqx POWER Architecture Instruction srliqxShift Right Long Immediate with MQ Integer Unit

srliq rA,rS,SH (Rc=0)srliq. rA,rS,SH (Rc=1)


Register rS is rotated left 32 – n bits where n is the shift amount specified by SH. A maskof n zeros followed by 32 – n ones is generated. The rotated word is then merged with theMQ register, under control of the generated mask. The merged word is placed in rA. Therotated word is placed into the MQ register.





0 5 6 10 11 15 16 20 21 30 31

31 S A SH 760 Rc


srlqx POWER Architecture Instruction srlqxShift Right Long with MQ Integer Unit

srlq rA,rS,rB (Rc=0)srlq. rA,rS,rB (Rc=1)


Register rS is rotated left 32 – n bits where n is the shift amount specified in bits 27–31 ofrB. When bit 26 of rB is a zero, a mask of n zeros followed by 32 – n ones is generated.The rotated word is then merged with the MQ register, under control of the generated mask.

When bit 26 of rB is a one, a mask of n ones followed by 32 – n zeros is generated. A wordof zeros is then merged with the contents of the MQ register, under control of the generatedmask.

The merged word is placed in rA. The MQ register is not altered.





0 5 6 10 11 15 16 20 21 30 31

31 S A B 728 Rc


srqx POWER Architecture Instruction srqxShift Right with MQ Integer Unit

srq rA,rS,rB (Rc=0)srq. rA,rS,rB (Rc=1)


Register rS is rotated left 32 – n bits where n is the shift amount specified in bits 27–31 ofrB. The rotated word is placed into the MQ register.

When bit 26 of rB is a zero, a mask of n zeros followed by 32 – n ones is generated.

When bit 26 of rB is a one, a mask of all zeros is generated.

The logical AND of the rotated word and the generated mask is placed in rA.





0 5 6 10 11 15 16 20 21 30 31

31 S A B 664 Rc


srwx srwx Shift Right Word Integer Unit

srw rA,rS,rB (Rc=0)srw. rA,rS,rB (Rc=1)

[POWER mnemonics: sr, sr.]

n←rB[27-31]rA←ROTL(rS, 32-n)

If rB[26]=0, the contents of rA are shifted right the number of bits specified by rA[27–31].Bits shifted out of position 31 are lost. Zeros are supplied to the vacated positions on theleft. The 32-bit result is placed into rA.

If rB[26]=1, then rA is filled with zeros.




0 5 6 10 11 15 16 20 21 30 31

31 S A B 536 Rc


stb stbStore Byte Integer Unit

stb rS,d(rA)

if rA = 0 then b←0else b←(rA)EA←b + EXTS(d)MEM(EA, 1)←rS[24-31]

EA is the sum (rA|0)+d. Register rS[24–31] is stored into the byte in memory addressedby EA. Register rS is unchanged.


• None

0 5 6 10 11 15 16 31

38 S A d


stbu stbuStore Byte with Update Integer Unit

stbu rS,d(rA)

EA←(rA) + EXTS(d)MEM(EA, 1)←rS[24-31]rA←EA

EA is the sum (rA|0)+d. Register rS[24–31] is stored into the byte in memory addressedby EA.


While the PowerPC architecture defines the instruction form as invalid if rA=0, the 601supports execution with rA=0 as shown above.


• None

0 5 6 10 11 15 16 31

39 S A d


stbux stbuxStore Byte with Update Indexed Integer Unit

stbux rS,rA,rB

EA←(rA) + (rB)MEM(EA, 1)←rS[24-31]rA←EA

EA is the sum (rA|0)+(rB). Register rS[24–31] is stored into the byte in memory addressedby EA.




• None

Reserved

0 5 6 10 11 15 16 21 22 30 31

31 S A B 247 0


stbx stbxStore Byte Indexed Integer Unit

stbx rS,rA,rB

if rA = 0 then b←0else b←(rA)EA←b + (rB)MEM(EA, 1) ← rS[24-31]

EA is the sum (rA|0)+(rB). Register rS[24–31] is stored into the byte in memory addressedby EA. Register rS is unchanged.


• None

Reserved

0 5 6 10 11 15 16 21 22 30 31

31 S A B 215 0


stfd stfdStore Floating-Point Double-Precision Floating-Point Unit

stfd frS,d(rA)

if rA = 0 then b←0else b←(rA)EA←b + EXTS(d)MEM(EA, 8)←(frS)




• None

0 5 6 10 11 15 16 30 31

54 frS A d


stfdu stfduStore Floating-Point Double-Precision with Update Floating-Point Unit

stfdu frS,d(rA)

if rA = 0 then b←0else b←(rA)EA←b + dMEM(EA, 4)← SINGLE(frS)rA←EA






• None

0 5 6 10 11 15 16 31

55 frS A d


stfdux stfduxStore Floating-Point Double-Precision with Update Indexed Floating-Point Unit

stfdux frS,rA,rB

EA←(rA) + (rB)MEM(EA, 8)←(frS)rA←EA






• None

Reserved

0 5 6 10 11 15 16 20 21 30 31

31 frS A B 759 0


stfdx stfdxStore Floating-Point Double-Precision Indexed Floating-Point Unit

stfdx frS,rA,rB

if rA = 0 then b ←0else b←(rA)EA←b + (rB)MEM(EA, 8)←(frS)




• None

Reserved

0 5 6 10 11 15 16 20 21 30 31

31 frS A B 727 0


stfs stfs Store Floating-Point Single-Precision Integer Unit and

Floating-Point Unit

stfs frS,d(rA)

if rA = 0 then b←0else b←(rA)EA←b + EXTS(d)MEM(EA, 4)←SINGLE(frS)

EA is the sum (rA|0)+d.

The contents of register frS is converted to single-precision and stored into the word inmemory addressed by EA.


• None

0 5 6 10 11 15 16 31

52 frS A d


stfsu stfsu Store Floating-Point Single-Precision with Update Integer Unit and

Floating-Point Unit

stfsu frS,d(rA)

EA←rA + EXTS(d)MEM(EA, 4)←SINGLE(frS)rA←EA


The contents of frS is converted to single-precision and stored into the word in memoryaddressed by EA.




• None

0 5 6 10 11 15 16 31

53 frS A d


stfsux stfsux Store Floating-Point Single-Precision with Update Indexed Integer Unit and

Floating-Point Unit

stfsux frS,rA,rB

EA←(rA) + (rB)MEM(EA, 4)←SINGLE(frS)rA←EA


The contents of frS is converted to single-precision and stored into the word in memoryaddressed by EA.




• None

Reserved

0 5 6 10 11 15 16 20 21 30 31

31 frS A B 695 0


stfsx stfsx Store Floating-Point Single-Precision Indexed Integer Unit and

Floating-Point Unit

stfsx frS,rA,rB

if rA=0 then b←0else b←(rA)EA←b + (rB)MEM(EA, 4)←SINGLE(frS)


The contents of register frS is converted to single-precision and stored into the word inmemory addressed by EA.


• None

Reserved

0 5 6 10 11 15 16 20 21 30 31

31 frS A B 663 0


sth sthStore Half Word Integer Unit

sth rS,d(rA)

if rA = 0 then b←0else b←(rA)EA←b + EXTS(d)MEM(EA, 2)←rS[16-31]

EA is the sum (rA|0) + d. Register rS[16–31] is stored into the half word in memoryaddressed by EA.


• None

0 5 6 10 11 15 16 31

44 S A d


sthbrx sthbrxStore Half Word Byte-Reverse Indexed Integer Unit

sthbrx rS,rA,rB

if rA = 0 then b←0else b←(rA)EA←b + (rB)MEM(EA, 2)←rS[24-31] || rS[16-23]

EA is the sum (rA|0)+(rB). The contents of rS[24–31] are stored into bits 0–7 of the halfword in memory addressed by EA. Bits rS[16–23] are stored into bits 8–15 of the half wordin memory addressed by EA.


• None

Reserved

0 5 6 10 11 15 16 20 21 30 31

31 S A B 918 0


sthu sthuStore Half Word with Update Integer Unit

sthu rS,d(rA)

EA←rA + EXTS(d)MEM(EA, 2)←rS[16-31]rA←EA

EA is the sum (rA|0)+d. The contents of rS[16–31] are stored into the half word in memoryaddressed by EA.




• None

0 5 6 10 11 15 16 31

45 S A d


sthux sthuxStore Half Word with Update Indexed Integer Unit

sthux rS,rA,rB

EA←(rA) + (rB)MEM(EA, 2)←rS[16-31]rA←EA

EA is the sum (rA|0)+(rB). Register rS[16–31] is stored into the half word in memoryaddressed by EA.




• None

Reserved

0 5 6 10 11 15 16 20 21 30 31

31 S A B 439 0


sthx sthxStore Half Word Indexed Integer Unit

sthx rS,rA,rB

if rA = 0 then b←0else b←(rA)EA←b + (rB)MEM(EA, 2)←rS[16-31]

EA is the sum (rA|0) + (rB). Register rS[16–31] is stored into the half word in memoryaddressed by EA.


• None

Reserved

0 5 6 10 11 15 16 20 21 30 31

31 S A B 407 0


stmw stmwStore Multiple Word Integer Unit

stmw rS,d(rA)

[POWER mnemonic: stm]

if rA = 0 then b←0else b←(rA)EA←b + EXTS(d)r←rSdo while r ≤ 31

MEM(EA, 4) ← GPR(r)r←r + 1EA← EA + 4


n = (32 – rS).

n consecutive words starting at EA are stored from the GPRs rS through 31. For example,if rS=30, 2 words are stored.

EA must be a multiple of 4; otherwise, the system alignment error handler may be invoked.For additional information about alignment and data access exceptions, see Section 5.4.3,“Data Access Exception (x'00300').”


• None

In future implementations, this instruction is likely to have greater latency and take longerto execute, perhaps much longer, than a sequence of individual store instructions thatproduce the same results.

Note that on other PowerPC implementations, load and store multiple instructions that arenot on a word boundary either take an alignment exception or generate results that areboundedly undefined.

0 5 6 10 11 15 16 31

47 S A d


stswi stswiStore String Word Immediate Integer Unit

stswi rS,rA,NB

[POWER mnemonic: stsi]

if rA = 0 then EA←0else EA←(rA)if NB = 0 then n←32else n←NBr←rS-1i←0do while n>0

if i = 0 then r←r+1 (mod 32)MEM(EA, 1)←GPR(r)[i–i+7]i←i+8if i = 32 then i←0EA←EA+1n←n-1

EA is (rA|0). Let n = NB if NB≠0, n = 32 if NB = 0; n is the number of bytes to store. Letnr = CEIL(n/4): nr is the number of registers to supply data.

n consecutive bytes starting at EA are stored from GPRs rS through rS + nr – 1.

Under certain conditions (for example, segment boundary crossings) the data alignmenterror handler may be invoked. For additional information about data alignment exceptions,see Section 5.4.3, “Data Access Exception (x'00300').”

Bytes are stored left to right from each register. The sequence of registers wraps aroundthrough GPR0 if required.


• None


Reserved

0 5 6 10 11 15 16 20 21 30 31

31 S A NB 725 0


stswx stswxStore String Word Indexed Integer Unit

stswx rS,rA,rB

[POWER mnemonic: stsx]

if rA = 0 then b←0else b←(rA)EA←b+(rB)n←XER[25-31]r←rS-1i←0do while n>0

if i = 0 then r←r+1 (mod 32)MEM(EA, 1)←GPR(r)[i–i+7]i←i+8if i = 32 then i←0EA←EA+1n←n-1

EA is the sum (rA|0)+(rB). Let n = XER[25–31]; n is the number of bytes to store.

Let nr = CEIL(n/4): nr is the number of registers to supply data.

n consecutive bytes starting at EA are stored from GPRs rS through rS + nr – 1.

Under certain conditions (for example, segment boundary crossings) the data alignmenterror handler may be invoked. For additional information about data alignment exceptions,see Section 5.4.3, “Data Access Exception (x'00300').”

Bytes are stored left to right from each register. The sequence of registers wraps aroundthrough GPR0 if required.


• None


Reserved

0 5 6 10 11 15 16 20 21 30 31

31 S A B 661 0


stw stwStore Word Integer Unit

stw rS,d(rA)

[POWER mnemonic: st]

if rA = 0 then b←0else b←(rA)EA←b + EXTS(d)MEM(EA, 4)←rS

EA is the sum (rA|0) + d. The contents of rS are stored into the word in memory addressedby EA.


• None

0 5 6 10 11 15 16 31

36 S A d


stwbrx stwbrxStore Word Byte-Reverse Indexed Integer Unit

stwbrx rS,rA,rB

[POWER mnemonic: stbrx]

if rA = 0 then b←0else b←(r

A)EA

←

b + (

r

B)MEM(EA, 4)

←

r

S[24-31] ||

r

S[16-23] ||

r

S[8-15] ||

r

S[0-7]

EA is the sum (

r

A|0) + (

r

B). The contents of

r

S[24–31] are stored into bits 0–7 of the wordin memory addressed by EA. Bits

r

S[16–23] are stored into bits 8–15 of the word inmemory addressed by EA. Bits

r

S[8–15] are stored into bits 16–23 of the word in memoryaddressed by EA. Bits

r

S[0–7] are stored into bits 24–31 of the word in memory addressedby EA.


• None

Reserved

0 5 6 10 11 15 16 20 21 30 31

31 S A B 662 0

10-202


stwcx. stwcx.

Store Word Conditional Indexed Integer Unit

stwcx. r

S

,r

A

,r

B

if

r

A = 0 then b

←

0else b

←

(

r

A)EA

←

b + (

r

B)if RESERVE then

MEM(EA, 4)

←

r

SRESERVE

←

0CR0

←

0b00 || 0b1|| XER[SO]else

CR0

←

0B00 || 0B0 || XER[SO]

EA is the sum (

r

A|0)+(

r

B).

If a reservation exists, the contents of

r

S are stored into the word in memory addressed byEA and the reservation is cleared. If no reservation exists, the instruction completes withoutaltering memory or cache.

CR0 Field is set to reflect whether the store operation was performed (i.e., whether areservation existed when the

stwcx.

instruction commenced execution) as follows.

CR0[LT GT EQ S0]

←

b'00' || store_performed || XER[SO]

The EQ bit in the condition register field CR0 is modified to reflect whether the storeoperation was performed (i.e., whether a reservation existed when the

stwcx.

instructionbegan execution). If the store was completed successfully, the EQ bit is set to one.

EA must be a multiple of 4; otherwise, the system alignment error handler may be invokedor the results may be boundedly undefined.




0 5 6 10 11 15 16 20 21 30 31

31 S A B 150 1

Chapter 10. Instruction Set

10-203

stwu stwu

Store Word with Update Integer Unit

stwu r

S

,

d

(r

A

)

[POWER mnemonic:

stu

]

EA

←

r

A + EXTS(d)MEM(EA, 4)

←

r

S

r

A

←

EA

EA is the sum (

r

A|0)+d. The contents of

r

S are stored into the word in memory addressedby EA.

EA is placed into

r

A.

While the PowerPC architecture defines the instruction form as invalid if

r

A=0, the 601supports execution with

r

A=0 as shown above.


• None

0 5 6 10 11 15 16 31

37 S A d

10-204


stwux stwux

Store Word with Update Indexed Integer Unit

stwux r

S

,r

A

,r

B

[POWER mnemonic:

stux

]

EA

←

(

r

A) + (

r

B)MEM(EA, 4)

←

r

S

r

A

←

EA

EA is the sum (

r

A|0)+(

r

B). The contents of

r

S are stored into the word in memoryaddressed by EA.

EA is placed into

r

A.

While the PowerPC architecture defines the instruction form as invalid if

r

A=0, the 601supports execution with

r

A=0 as shown above.


• None

Reserved

0 5 6 10 11 15 16 20 21 30 31

31 S A B 183 0

Chapter 10. Instruction Set

10-205

stwx stwx

Store Word Indexed Integer Unit

stwx r

S

,r

A

,r

B

[POWER mnemonic:

stx

]

if

r

A = 0 then b

←

0else b

←

(

r

A)EA

←

b + (

r

B)MEM(EA, 4)

←

r

S

EA is the sum (

r

A|0)+(

r

B). The contents of

r

S are is stored into the word in memoryaddressed by EA.


• None

Reserved

0 5 6 10 11 15 16 20 21 30 31

31 S A B 151 0

10-206


subf

x

subf

x

Subtract from Integer Unit

subf r

D,rA,rB (OE=0 Rc=0)subf. rD,rA,rB (OE=0 Rc=1) subfo rD,rA,rB (OE=1 Rc=0) subfo. rD,rA,rB (OE=1 Rc=1)

rD ← ¬ (rA) + (rB) + 1

The sum ¬ (rA) + (rB) +1 is placed into rD.




• XER:


0 5 6 10 11 15 16 20 21 22 30 31

31 D A B OE 40 Rc


subfcx subfcx Subtract from Carrying Integer Unit

subfc rD,rA,rB (OE=0 Rc=0)subfc. rD,rA,rB (OE=0 Rc=1)subfco rD,rA,rB (OE=1 Rc=0)subfco. rD,rA,rB (OE=1 Rc=1)

[POWER mnemonics: sf, sf., sfo, sfo.]

rD← ¬ (rA) + (rB) + 1

The sum ¬ (rA) + (rB) + 1 is placed into rD.




• XER:

Affected: CA


0 5 6 10 11 15 16 20 21 22 30 31

31 D A B OE 8 Rc


subfex subfex Subtract from Extended Integer Unit

subfe rD,rA,rB (OE=0 Rc=0)subfe. rD,rA,rB (OE=0 Rc=1)subfeo rD,rA,rB (OE=1 Rc=0)subfeo. rD,rA,rB (OE=1 Rc=1)

[POWER mnemonics: sfe, sfe., sfeo, sfeo.]

rD← ¬ (rA) + (rB) + XER[CA]

The sum ¬ (rA) + (rB) + XER[CA] is placed into rD.




• XER:

Affected: CA


0 5 6 10 11 15 16 20 21 22 30 31

31 D A B OE 136 Rc


subfic subfic Subtract from Immediate Carrying Integer Unit

subfic rD,rA,SIMM

[POWER mnemonic: sfi]

rD← ¬ (rA) + EXTS(SIMM) + 1

The sum ¬ (rA) + EXTS(SIMM) + 1 is placed into rD.


• XER:

Affected: CA

0 5 6 10 11 15 16 31

08 D A SIMM


subfmex subfmex Subtract from Minus One Extended Integer Unit

subfme rD,rA (OE=0 Rc=0)subfme. rD,rA (OE=0 Rc=1)subfmeo rD,rA (OE=1 Rc=0)subfmeo. rD,rA (OE=1 Rc=1)

[POWER mnemonics: sfme, sfme., sfmeo, sfmeo.]

rD← ¬ (rA) + XER[CA] – 1

The sum ¬ (rA) + XER[CA] + x'FFFFFFFF' is placed into rD.




• XER:

Affected: CA


0 5 6 10 11 15 16 20 21 22 30 31

31 D A

Reserved

0 0 0 0 0 OE 232 Rc


subfzex subfzex Subtract from Zero Extended Integer Unit

subfze rD,rA (OE=0 Rc=0)subfze. rD,rA (OE=0 Rc=1)subfzeo rD,rA (OE=1 Rc=0)subfzeo. rD,rA (OE=1 Rc=1)

[POWER mnemonics: sfze, sfze., sfzeo, sfzeo.]

rD← ¬ (rA) + XER[CA]

The sum ¬ (rA) + XER[CA] is placed into rD.




• XER:

Affected: CA


0 5 6 10 11 15 16 20 21 22 30 31

31 D A

Reserved

0 0 0 0 0 OE 200 Rc


sync syncSynchronize Integer Unit

[POWER mnemonic: dcs]

The sync instruction provides an ordering function for the effects of all instructionsexecuted by a given processor. Executing a sync instruction ensures that all instructionspreviously initiated by the given processor appear to have completed before any subsequentinstructions are initiated by the given processor. When the sync instruction completes, allexternal accesses initiated by the given processor prior to the sync will have beenperformed with respect to all other mechanisms that access memory.

The sync instruction can be used to ensure that the results of all stores into a data structure,performed in a “critical section” of a program, are seen by other processors before the datastructure is seen as unlocked. The eieio instruction may be more appropriate than sync forcases in which the only requirement is to control the order in which external references areseen by I/O devices.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 598 0


tlbie tlbie Translation Lookaside Buffer Invalidate Entry Integer Unit

tlbie rB

[POWER mnemonic: tlbi]

VPI←rB[4–19]Identify TLB entries corresponding to VPI each such TLB entry←invalid

EA is the contents of rB. The translation lookaside buffer (referred to as the TLB)containing entries corresponding to the EA are made invalid (i.e., removed from the TLB).Additionally, a TLB invalidate operation is broadcast on the system interface. The TLBsearch is done regardless of the settings of MSR[IT] and MSR[DT]. Block addresstranslation for EA, if any, is ignored.

Because the 601 supports broadcast of TLB entry invalidate operations, the following mustbe observed:

• The tlbie instruction(s) must be contained in a critical section, controlled by software locking, so that tlbie is issued on only one processor at a time.

• A sync instruction must be issued after every tlbie and at the end of the critical section. This causes the hardware to wait for the effects of the preceding tlbie instructions(s) to propagate to all processors.

A processor detecting a TLB invalidate broadcast performs the following:




4. Resumes normal execution

This is a supervisor-level instruction. It is optional in the PowerPC architecture.

Nothing is guaranteed about instruction fetching in other processors if the tlbie instructiondeletes the page in which some other processor is currently executing.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

31 0 0 0 0 0 0 0 0 0 0 B 306 0

tw tw Trap Word Integer Unit

tw TO,rA,rB

[POWER mnemonic: t]

a← EXTS(rA)b← EXTS(rB)if (a < b) & TO[0] then TRAPif (a > b) & TO[1] then TRAPif (a = b) & TO[2] then TRAPif (a U b) & TO[4] then TRAP

The contents of rA are compared with the contents of rB. If any bit in the TO field is set to1 and its corresponding condition is met by the result of the comparison, then the systemtrap handler is invoked.


• None

0 5 6 10 11 15 16 20 21 30 31

Reserved

31 TO A B 4 0

twi twi Trap Word Immediate Integer Unit

twi TO,rA,SIMM

[POWER mnemonic: ti]

a← EXTS(rA)if (a < EXTS(SIMM)) & TO[0] then TRAPif (a > EXTS(SIMM)) & TO[1] then TRAPif (a = EXTS(SIMM)) & TO[2] then TRAPif (a U EXTS(SIMM)) & TO[4] then TRAP

The contents of rA are compared with the sign-extended SIMM field. If any bit in the TOfield is set to 1 and its corresponding condition is met by the result of the comparison, thenthe system trap handler is invoked.


• None

0 5 6 10 11 15 16 31

03 TO A SIMM


xorx xorxXOR Integer Unit

xor rA,rS,rB (Rc=0)xor. rA,rS,rB (Rc=1)

rA←(rS) ⊕ (rB)

The contents of rA is XORed with the contents of rB and the result is placed into rA.




0 5 6 10 11 15 16 20 21 30 31

31 S A B 316 Rc


xori xori XOR Immediate Integer Unit

xori rA,rS,UIMM

[POWER mnemonic: xoril]

rA←(rS) ⊕ ((16)0 || UIMM)

The contents of rS is XORed with x'0000' || UIMM and the result is placed into rA.


• None

0 5 6 10 11 15 16 31

26 S A UIMM


xoris xoris XOR Immediate Shifted Integer Unit

xoris rA,rS,UIMM

[POWER mnemonic: xoriu]

rA←(rS) ⊕ (UIMM || (16)0)

The contents of rS is XORed with UIMM || x'0000' and the result is placed into rA.


• None

0 5 6 10 11 15 16 31

27 S A UIMM


10.3 Instructions Not Implemented by the 601Table 10-6 provides a list of 32-bit instructions that are not implemented by the 601, andthat generate an illegal instruction exception. Refer to Appendix C, “PowerPC InstructionsNot Implemented”, for a more detailed description of the instructions.

Table 10-7 provides a list of 32-bit SPR encodings that are not implemented by the 601.

Table 10-6. 32-Bit Instructions Not Implemented by the PowerPC 601 Microprocessor


fres Floating-Point Reciprocal Estimate Single-Precision

frsqrte Floating-Point Reciprocal Square Root Estimate

fsel Floating-Point Select

fsqrt Floating-Point Square Root

fsqrts Floating-Point Square Root Single-Precision

mftb Move from Time Base

stfiwx Store Floating-Point as Integer Word Indexed

tlbia Translation Lookaside Buffer Invalidate All

tlbsync Translation Lookaside Buffer Synchronize

Table 10-7. 32-Bit SPR Encodings Not Implemented by the PowerPC 601 Microprocessor

SPRRegister

NameAccess


284 01000 11100 TB Supervisor

285 01000 11101 TBU Supervisor

536 10000 11000 DBAT0U Supervisor

537 10000 11001 DBAT0L Supervisor








Table 10-8 provides a list of 64-bit instructions that are not implemented by the 601, andthat generate an illegal instruction exception. Refer to Appendix C, “PowerPC InstructionsNot Implemented.”

Table 10-8. 64-Bit Instructions Not Implemented by the PowerPC 601 Microprocessor


cntlzd Count Leading Zeros Double Word

divd Divide Double Word

divdu Divide Double Word Unsigned

extsw Extend Sign Word

fcfid Floating Convert From Integer Double Word

fctid Floating Convert to Integer Double Word

fctidz Floating Convert to Integer Double Word with Round to Zero

ld Load Double Word

ldarx Load Double Word and Reserve Indexed

ldu Load Double Word with Update

ldux Load Double Word with Update Indexed

ldx Load Double Word Indexed

lwa Load Word Algebraic

lwaux Load Word Algebraic with Update Indexed

lwax Load Word Algebraic Indexed

mulld Multiply Low Double Word

mulhd Multiply High Double Word

mulhdu Multiply High Double Word Unsigned

rldcl Rotate Left Double Word then Clear Left

rldcr Rotate Left Double Word then Clear Right

rldic Rotate Left Double Word Immediate then Clear

rldicl Rotate Left Double Word Immediate then Clear Left

rldicr Rotate Left Double Word Immediate then Clear Right

rldimi Rotate Left Double Word Immediate then Mask Insert

slbia SLB Invalidate All

slbie SLB Invalidate Entry

sld Shift Left Double Word

srad Shift Right Algebraic Double Word

sradi Shift Right Algebraic Double Word Immediate


Table 10-9 provides the 64-bit SPR encoding that is not implemented by the 601.

srd Shift Right Double Word

std Store Double Word

stdcx. Store Double Word Conditional Indexed

stdu Store Double Word with Update

stdux Store Double Word Indexed with Update

stdx Store Double Word Indexed

td Trap Double Word

tdi Trap Double Word Immediate

Table 10-9. 64-Bit SPR Encoding Not Implemented by the PowerPC 601 Microprocessor

SPRRegister

NameAccess


280 01000 11000 ASR Supervisor

Table 10-8. 64-Bit Instructions Not Implemented by the PowerPC 601 Microprocessor (Continued)


© Motorola Inc. 1995Portions hereof © International Business Machines Corp. 1991–1995. All rights reserved.

This document contains information on a new product under development by Motorola and IBM. Motorola and IBM reserve the right to change ordiscontinue this product without notice. Information in this document is provided solely to enable system and software implementers to use PowerPCmicroprocessors. There are no express or implied copyright or patent licenses granted hereunder by Motorola or IBM to design, modify the design of, orfabricate circuits based on the information in this document.

The PowerPC 601 microprocessor embodies the intellectual property of Motorola and of IBM. However, neither Motorola nor IBM assumes anyresponsibility or liability as to any aspects of the performance, operation, or other attributes of the microprocessor as marketed by the other party or byany third party. Neither Motorola nor IBM is to be considered an agent or representative of the other, and neither has assumed, created, or granted herebyany right or authority to the other, or to any third party, to assume or create any express or implied obligations on its behalf. Information such as datasheets, as well as sales terms and conditions such as prices, schedules, and support, for the product may vary as between parties selling the product.Accordingly, customers wishing to learn more information about the products as marketed by a given party should contact that party.

Both Motorola and IBM reserve the right to modify this manual and/or any of the products as described herein without further notice. NOTHING IN THISMANUAL, NOR IN ANY OF THE ERRATA SHEETS, DATA SHEETS, AND OTHER SUPPORTING DOCUMENTATION, SHALL BE INTERPRETED ASTHE CONVEYANCE BY MOTOROLA OR IBM OF AN EXPRESS WARRANTY OF ANY KIND OR IMPLIED WARRANTY, REPRESENTATION, ORGUARANTEE REGARDING THE MERCHANTABILITY OR FITNESS OF THE PRODUCTS FOR ANY PARTICULAR PURPOSE. Neither Motorola norIBM assumes any liability or obligation for damages of any kind arising out of the application or use of these materials. Any warranty or other obligationsas to the products described herein shall be undertaken solely by the marketing party to the customer, under a separate sale agreement between themarketing party and the customer. In the absence of such an agreement, no liability is assumed by Motorola, IBM, or the marketing party for any damages,actual or otherwise.

“Typical” parameters can and do vary in different applications. All operating parameters, including “Typicals,” must be validated for each customerapplication by customer’s technical experts. Neither Motorola nor IBM convey any license under their respective intellectual property rights nor the rightsof others. Neither Motorola nor IBM makes any claim, warranty, or representation, express or implied, that the products described in this manual aredesigned, intended, or authorized for use as components in systems intended for surgical implant into the body, or other applications intended to supportor sustain life, or for any other application in which the failure of the product could create a situation where personal injury or death may occur. Shouldcustomer purchase or use the products for any such unintended or unauthorized application, customer shall indemnify and hold Motorola and IBM andtheir respective officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and reasonableattorney fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if suchclaim alleges that Motorola or IBM was negligent regarding the design or manufacture of the part.

Motorola and are registered trademarks of Motorola, Inc. Motorola, Inc. is an Equal Opportunity/Affirmative Action Employer.

IBM and IBM logo are registered trademarks, and IBM Microelectronics is a trademark of International Business Machines Corp.The PowerPC name, PowerPC logotype,PowerPC 601, PowerPC 603, PowerPC 603e, PowerPC Architectur, and POWER Architecture are trademarksof International Business Machines Corp. used by Motorola under license from International Business Machines Corp. International Business MachinesCorp. is an Equal Opportunity/Affirmative Action Employer.

powerpc 601 - nxp semiconductorscache.freescale.com/files/32bit/doc/user_guide/mpc601um.pdfxlii...

Documents