

  • The RISC-V Instruction Set Manual
    Volume I: Unprivileged ISA

    Document Version 20190608-Base-Ratified

    Editors: Andrew Waterman¹, Krste Asanović¹,²

    ¹SiFive Inc.,
    ²CS Division, EECS Department, University of California, Berkeley

    [email protected], [email protected]

    June 8, 2019

  • Contributors to all versions of the spec in alphabetical order (please contact editors to suggest corrections): Arvind, Krste Asanović, Rimas Avižienis, Jacob Bachmeyer, Christopher F. Batten, Allen J. Baum, Alex Bradbury, Scott Beamer, Preston Briggs, Christopher Celio, Chuanhua Chang, David Chisnall, Paul Clayton, Palmer Dabbelt, Ken Dockser, Roger Espasa, Shaked Flur, Stefan Freudenberger, Jan Gray, Michael Hamburg, John Hauser, David Horner, Bruce Hoult, Bill Huffman, Alexandre Joannou, Olof Johansson, Ben Keller, David Kruckemyer, Yunsup Lee, Paul Loewenstein, Daniel Lustig, Yatin Manerkar, Luc Maranget, Margaret Martonosi, Joseph Myers, Vijayanand Nagarajan, Rishiyur Nikhil, Jonas Oberhauser, Stefan O’Rear, Albert Ou, John Ousterhout, David Patterson, Christopher Pulte, Jose Renau, Colin Schmidt, Peter Sewell, Susmit Sarkar, Michael Taylor, Wesley Terpstra, Matt Thomas, Tommy Thorn, Caroline Trippel, Ray VanDeWalker, Muralidaran Vijayaraghavan, Megan Wachs, Andrew Waterman, Robert Watson, Derek Williams, Andrew Wright, Reinoud Zandijk, and Sizhuo Zhang.

    This document is released under a Creative Commons Attribution 4.0 International License.

    This document is a derivative of “The RISC-V Instruction Set Manual, Volume I: User-Level ISA Version 2.1” released under the following license: © 2010–2017 Andrew Waterman, Yunsup Lee, David Patterson, Krste Asanović. Creative Commons Attribution 4.0 International License.

    Please cite as: “The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20190608-Base-Ratified”, Editors Andrew Waterman and Krste Asanović, RISC-V Foundation, March 2019.

  • Preface

    This document describes the RISC-V unprivileged architecture.

    The RVWMO memory model has been ratified at this time. The ISA modules marked Ratified have been ratified at this time. The modules marked Frozen are not expected to change significantly before being put up for ratification. The modules marked Draft are expected to change before ratification.

    The document contains the following versions of the RISC-V ISA modules:

    Base        Version   Status
    RVWMO       2.0       Ratified
    RV32I       2.1       Ratified
    RV64I       2.1       Ratified
    RV32E       1.9       Draft
    RV128I      1.7       Draft

    Extension   Version   Status
    Zifencei    2.0       Ratified
    Zicsr       2.0       Ratified
    M           2.0       Ratified
    A           2.0       Frozen
    F           2.2       Ratified
    D           2.2       Ratified
    Q           2.2       Ratified
    C           2.0       Ratified
    Ztso        0.1       Frozen
    Counters    2.0       Draft
    L           0.0       Draft
    B           0.0       Draft
    J           0.0       Draft
    T           0.0       Draft
    P           0.2       Draft
    V           0.7       Draft
    N           1.1       Draft
    Zam         0.1       Draft

    The changes in this version of the document include:

    • Moved description to Ratified for the ISA modules ratified by the board in early 2019.


    • Removed the A extension from ratification.

    • Changed document version scheme to avoid confusion with versions of the ISA modules.

    • Incremented the version numbers of the base integer ISA to 2.1, reflecting the presence of the ratified RVWMO memory model and exclusion of FENCE.I, counters, and CSR instructions that were in previous base ISA.

    • Incremented the version numbers of the F and D extensions to 2.2, reflecting that version 2.1 changed the canonical NaN, and version 2.2 defined the NaN-boxing scheme and changed the definition of the FMIN and FMAX instructions.

    • Changed name of document to refer to “unprivileged” instructions as part of move to separate ISA specifications from platform profile mandates.

    • Added clearer and more precise definitions of execution environments, harts, traps, and memory accesses.

    • Defined instruction-set categories: standard, reserved, custom, non-standard, and non-conforming.

    • Removed text implying operation under alternate endianness, as alternate-endianness operation has not yet been defined for RISC-V.

    • Changed description of misaligned load and store behavior. The specification now allows visible misaligned address traps in execution environment interfaces, rather than just mandating invisible handling of misaligned loads and stores in user mode. Also, now allows access exceptions to be reported for misaligned accesses (including atomics) that should not be emulated.

    • Moved FENCE.I out of the mandatory base and into a separate extension, with Zifencei ISA name. FENCE.I was removed from the Linux user ABI and is problematic in implementations with large incoherent instruction and data caches. However, it remains the only standard instruction-fetch coherence mechanism.

    • Removed prohibitions on using RV32E with other extensions.

    • Removed platform-specific mandates that certain encodings produce illegal instruction exceptions in RV32E and RV64I chapters.

    • Counter/timer instructions are now not considered part of the mandatory base ISA, and so CSR instructions were moved into separate chapter and marked as version 2.0, with the unprivileged counters moved into another separate chapter. The counters are not ready for ratification as there are outstanding issues, including counter inaccuracies.

    • A CSR-access ordering model has been added.

    • Explicitly defined the 16-bit half-precision floating-point format for floating-point instructions in the 2-bit fmt field.

    • Defined the signed-zero behavior of FMIN.fmt and FMAX.fmt, and changed their behavior on signaling-NaN inputs to conform to the minimumNumber and maximumNumber operations in the proposed IEEE 754-201x specification.

    • The memory consistency model, RVWMO, has been defined.

    • The “Zam” extension, which permits misaligned AMOs and specifies their semantics, has been defined.

    • The “Ztso” extension, which enforces a stricter memory consistency model than RVWMO, has been defined.

    • Improvements to the description and commentary.

    • Defined the term IALIGN as shorthand to describe the instruction-address alignment constraint.


    • Removed text of P extension chapter as now superseded by active task group documents.

    • Removed text of V extension chapter as now superseded by separate vector extension draft document.
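
    The NaN-boxing scheme named in the F and D change note above can be illustrated with a short sketch. It assumes a 64-bit floating-point register (FLEN = 64) holding a 32-bit single-precision value: a valid boxed value has all upper bits set to 1, and a register that is not properly boxed is read as the canonical single-precision NaN (0x7FC00000). The function and constant names are hypothetical and chosen only for this illustration; the normative definition is in the D extension chapter.

```python
FLEN = 64                      # assumed register width (illustration only)
CANONICAL_NAN_S = 0x7FC00000   # canonical single-precision quiet NaN

# Bits FLEN-1 down to 32, all set to 1.
UPPER_ONES = ((1 << (FLEN - 32)) - 1) << 32


def nan_box_single(bits32: int) -> int:
    """Place a 32-bit single-precision pattern in an FLEN-bit register."""
    return UPPER_ONES | (bits32 & 0xFFFFFFFF)


def unbox_single(reg: int) -> int:
    """Read a single from an FLEN-bit register, checking the box."""
    if (reg & UPPER_ONES) == UPPER_ONES:
        return reg & 0xFFFFFFFF
    # Not a valid NaN-boxed value: treat it as the canonical NaN.
    return CANONICAL_NAN_S
```

    For example, boxing the pattern for 1.0f (0x3F800000) gives 0xFFFFFFFF3F800000, while a register whose upper half is zero reads back as the canonical NaN.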

    Preface to Document Version 2.2

    This is version 2.2 of the document describing the RISC-V user-level architecture. The document contains the following versions of the RISC-V ISA modules:

    Base        Version   Draft Frozen?
    RV32I       2.0       Y
    RV32E       1.9       N
    RV64I       2.0       Y
    RV128I      1.7       N

    Extension   Version   Frozen?
    M           2.0       Y
    A           2.0       Y
    F           2.0       Y
    D           2.0       Y
    Q           2.0       Y
    L           0.0       N
    C           2.0       Y
    B           0.0       N
    J           0.0       N
    T           0.0       N
    P           0.1       N
    V           0.7       N
    N           1.1       N

    To date, no parts of the standard have been officially ratified by the RISC-V Foundation, but the components labeled “frozen” above are not expected to change during the ratification process beyond resolving ambiguities and holes in the specification.

    The major changes in this version of the document include:

    • The previous version of this document was released under a Creative Commons Attribution 4.0 International License by the original authors, and this and future versions of this document will be released under the same license.

    • Rearranged chapters to put all extensions first in canonical order.

    • Improvements to the description and commentary.

    • Modified implicit hinting suggestion on JALR to support more efficient macro-op fusion of LUI/JALR and AUIPC/JALR pairs.

    • Clarification of constraints on load-reserved/store-conditional sequences.

    • A new table of control and status register (CSR) mappings.

    • Clarified purpose and behavior of high-order bits of fcsr.


    • Corrected the description of the FNMADD.fmt and FNMSUB.fmt instructions, which had suggested the incorrect sign of a zero result.

    • Instructions FMV.S.X and FMV.X.S were renamed to FMV.W.X and FMV.X.W respectively to be more consistent with their semantics, which did not change. The old names will continue to be supported in the tools.

    • Specified behavior of narrower ([…] 64 bits to avoid moving the rd specifier in very long instruction formats.

    • CSR instructions are now described in the base integer format where the counter registers are introduced, as opposed to only being introduced later in the floating-point section (and the companion privileged architecture manual).

    • The SCALL and SBREAK instructions have been renamed to ECALL and EBREAK, respectively. Their encoding and functionality are unchanged.

    • Clarification of floating-point NaN handling, and a new canonical NaN value.

    • Clarification of values returned by floating-point to integer conversions that overflow.

    • Clarification of LR/SC allowed successes and required failures, including use of compressed instructions in the sequence.

    • A new RV32E base ISA proposal for reduced integer register counts, supports MAC extensions.

    • A revised calling convention.

    • Relaxed stack alignment for soft-float calling convention, and description of the RV32E calling convention.

    • A revised proposal for the C compressed extension, version 1.9.


    Preface to Version 2.0

    This is the second release of the user ISA specification, and we intend the specification of the base user ISA plus general extensions (i.e., IMAFD) to remain fixed for future development. The following changes have been made since Version 1.0 [24] of this ISA specification.

    • The ISA has been divided into an integer base with several standard extensions.

    • The instruction formats have been rearranged to make immediate encoding more efficient.

    • The base ISA has been defined to have a little-endian memory system, with big-endian or bi-endian as non-standard variants.

    • Load-Reserved/Store-Conditional (LR/SC) instructions have been added in the atomic instruction extension.

    • AMOs and LR/SC can support the release consistency model.

    • The FENCE instruction provides finer-grain memory and I/O orderings.

    • An AMO for fetch-and-XOR (AMOXOR) has been added, and the encoding for AMOSWAP has been changed to make room.

    • The AUIPC instruction, which adds a 20-bit upper immediate to the PC, replaces the RDNPC instruction, which only read the current PC value. This results in significant savings for position-independent code.

    • The JAL instruction has now moved to the U-Type format with an explicit destination register, and the J instruction has been dropped being replaced by JAL with rd=x0. This removes the only instruction with an implicit destination register and removes the J-Type instruction format from the base ISA. There is an accompanying reduction in JAL reach, but a significant reduction in base ISA complexity.

    • The static hints on the JALR instruction have been dropped. The hints are redundant with the rd and rs1 register specifiers for code compliant with the standard calling convention.

    • The JALR instruction now clears the lowest bit of the calculated target address, to simplify hardware and to allow auxiliary information to be stored in function pointers.

    • The MFTX.S and MFTX.D instructions have been renamed to FMV.X.S and FMV.X.D, respectively. Similarly, MXTF.S and MXTF.D instructions have been renamed to FMV.S.X and FMV.D.X, respectively.

    • The MFFSR and MTFSR instructions have been renamed to FRCSR and FSCSR, respectively. FRRM, FSRM, FRFLAGS, and FSFLAGS instructions have been added to individually access the rounding mode and exception flags subfields of the fcsr.

    • The FMV.X.S and FMV.X.D instructions now source their operands from rs1, instead of rs2. This change simplifies datapath design.

    • FCLASS.S and FCLASS.D floating-point classify instructions have been added.

    • A simpler NaN generation and propagation scheme has been adopted.

    • For RV32I, the system performance counters have been extended to 64-bits wide, with separate read access to the upper and lower 32 bits.

    • Canonical NOP and MV encodings have been defined.

    • Standard instruction-length encodings have been defined for 48-bit, 64-bit, and >64-bit instructions.

    • Description of a 128-bit address space variant, RV128, has been added.

    • Major opcodes in the 32-bit base instruction format have been allocated for user-defined custom extensions.


    • A typographical error that suggested that stores source their data from rd has been corrected to refer to rs2.
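
    Two of the changes listed above, the AUIPC computation and the JALR target calculation, can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference model: it assumes RV32 (XLEN = 32), takes the immediates as raw encoded fields, and uses hypothetical helper names that do not come from the specification.

```python
XLEN = 32                      # assumed register width for this sketch
MASK = (1 << XLEN) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def auipc(pc: int, imm20: int) -> int:
    """Value written to rd: pc plus the 20-bit immediate placed in bits 31..12."""
    return (pc + ((imm20 & 0xFFFFF) << 12)) & MASK


def jalr_target(rs1: int, imm12: int) -> int:
    """JALR target: rs1 plus the sign-extended 12-bit immediate, bit 0 cleared."""
    return (rs1 + sign_extend(imm12, 12)) & MASK & ~1
```

    Because the lowest bit is cleared, a function pointer such as 0x1001 that carries a tag in bit 0 still transfers control to 0x1000, which is the "auxiliary information" use mentioned in the JALR note.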

  • Contents

    Preface

    1 Introduction
    1.1 RISC-V Hardware Platform Terminology
    1.2 RISC-V Software Execution Environments and Harts
    1.3 RISC-V ISA Overview
    1.4 Memory
    1.5 Base Instruction-Length Encoding
    1.6 Exceptions, Traps, and Interrupts

    2 RV32I Base Integer Instruction Set, Version 2.1
    2.1 Programmers’ Model for Base Integer ISA
    2.2 Base Instruction Formats
    2.3 Immediate Encoding Variants
    2.4 Integer Computational Instructions
    2.5 Control Transfer Instructions
    2.6 Load and Store Instructions
    2.7 Memory Ordering Instructions
    2.8 Environment Call and Breakpoints
    2.9 HINT Instructions

    3 “Zifencei” Instruction-Fetch Fence, Version 2.0

    4 RV32E Base Integer Instruction Set, Version 1.9
    4.1 RV32E Programmers’ Model
    4.2 RV32E Instruction Set

    5 RV64I Base Integer Instruction Set, Version 2.1
    5.1 Register State
    5.2 Integer Computational Instructions
    5.3 Load and Store Instructions
    5.4 HINT Instructions

    6 RV128I Base Integer Instruction Set, Version 1.7

    7 “M” Standard Extension for Integer Multiplication and Division, Version 2.0
    7.1 Multiplication Operations
    7.2 Division Operations

    8 “A” Standard Extension for Atomic Instructions, Version 2.0
    8.1 Specifying Ordering of Atomic Instructions
    8.2 Load-Reserved/Store-Conditional Instructions
    8.3 Atomic Memory Operations

    9 “Zicsr”, Control and Status Register (CSR) Instructions, Version 2.0
    9.1 CSR Instructions

    10 Counters
    10.1 Base Counters and Timers
    10.2 Hardware Performance Counters

    11 “F” Standard Extension for Single-Precision Floating-Point, Version 2.2
    11.1 F Register State
    11.2 Floating-Point Control and Status Register
    11.3 NaN Generation and Propagation
    11.4 Subnormal Arithmetic
    11.5 Single-Precision Load and Store Instructions
    11.6 Single-Precision Floating-Point Computational Instructions
    11.7 Single-Precision Floating-Point Conversion and Move Instructions
    11.8 Single-Precision Floating-Point Compare Instructions
    11.9 Single-Precision Floating-Point Classify Instruction

    12 “D” Standard Extension for Double-Precision Floating-Point, Version 2.2
    12.1 D Register State
    12.2 NaN Boxing of Narrower Values
    12.3 Double-Precision Load and Store Instructions
    12.4 Double-Precision Floating-Point Computational Instructions
    12.5 Double-Precision Floating-Point Conversion and Move Instructions
    12.6 Double-Precision Floating-Point Compare Instructions
    12.7 Double-Precision Floating-Point Classify Instruction

    13 “Q” Standard Extension for Quad-Precision Floating-Point, Version 2.2
    13.1 Quad-Precision Load and Store Instructions
    13.2 Quad-Precision Computational Instructions
    13.3 Quad-Precision Convert and Move Instructions
    13.4 Quad-Precision Floating-Point Compare Instructions
    13.5 Quad-Precision Floating-Point Classify Instruction

    14 RVWMO Memory Consistency Model, Version 0.1
    14.1 Definition of the RVWMO Memory Model
    14.2 CSR Dependency Tracking Granularity
    14.3 Source and Destination Register Listings

    15 “L” Standard Extension for Decimal Floating-Point, Version 0.0
    15.1 Decimal Floating-Point Registers

    16 “C” Standard Extension for Compressed Instructions, Version 2.0
    16.1 Overview
    16.2 Compressed Instruction Formats
    16.3 Load and Store Instructions
    16.4 Control Transfer Instructions
    16.5 Integer Computational Instructions
    16.6 Usage of C Instructions in LR/SC Sequences
    16.7 HINT Instructions
    16.8 RVC Instruction Set Listings

    17 “B” Standard Extension for Bit Manipulation, Version 0.0

    18 “J” Standard Extension for Dynamically Translated Languages, Version 0.0

    19 “T” Standard Extension for Transactional Memory, Version 0.0

    20 “P” Standard Extension for Packed-SIMD Instructions, Version 0.2

    21 “V” Standard Extension for Vector Operations, Version 0.7

    22 “N” Standard Extension for User-Level Interrupts, Version 1.1
    22.1 Additional CSRs
    22.2 User Status Register (ustatus)
    22.3 Other CSRs
    22.4 N Extension Instructions
    22.5 Reducing Context-Swap Overhead

    23 “Zam” Standard Extension for Misaligned Atomics, v0.1

    24 “Ztso” Standard Extension for Total Store Ordering, v0.1

    25 RV32/64G Instruction Set Listings

    26 RISC-V Assembly Programmer’s Handbook

    27 Extending RISC-V
    27.1 Extension Terminology
    27.2 RISC-V Extension Design Philosophy
    27.3 Extensions within fixed-width 32-bit instruction format
    27.4 Adding aligned 64-bit instruction extensions
    27.5 Supporting VLIW encodings

    28 ISA Extension Naming Conventions
    28.1 Case Sensitivity
    28.2 Base Integer ISA
    28.3 Instruction-Set Extension Names
    28.4 Version Numbers
    28.5 Underscores
    28.6 Additional Standard Extension Names
    28.7 Supervisor-level Instruction-Set Extensions
    28.8 Hypervisor-level Instruction-Set Extensions
    28.9 Machine-level Instruction-Set Extensions
    28.10 Non-Standard Extension Names
    28.11 Subset Naming Convention

    29 History and Acknowledgments
    29.1 “Why Develop a new ISA?” Rationale from Berkeley Group
    29.2 History from Revision 1.0 of ISA manual
    29.3 History from Revision 2.0 of ISA manual
    29.4 History from Revision 2.1
    29.5 History from Revision 2.2
    29.6 History for Revision 2.3
    29.7 Funding

    A RVWMO Explanatory Material, Version 0.1
    A.1 Why RVWMO?
    A.2 Litmus Tests
    A.3 Explaining the RVWMO Rules
    A.3.1 Preserved Program Order and Global Memory Order
    A.3.2 Load Value Axiom
    A.3.3 Atomicity Axiom
    A.3.4 Progress Axiom
    A.3.5 Overlapping-Address Orderings (Rules 1–3)
    A.3.6 Fences (Rule 4)
    A.3.7 Explicit Synchronization (Rules 5–8)
    A.3.8 Syntactic Dependencies (Rules 9–11)
    A.3.9 Pipeline Dependencies (Rules 12–13)
    A.4 Beyond Main Memory
    A.4.1 Coherence and Cacheability
    A.4.2 I/O Ordering
    A.5 Code Porting and Mapping Guidelines
    A.6 Implementation Guidelines
    A.6.1 Possible Future Extensions
    A.7 Known Issues
    A.7.1 Mixed-size RSW

    B Formal Memory Model Specifications, Version 0.1
    B.1 Formal Axiomatic Specification in Alloy
    B.2 Formal Axiomatic Specification in Herd
    B.3 An Operational Memory Model
    B.3.1 Intra-instruction Pseudocode Execution
    B.3.2 Instruction Instance State
    B.3.3 Hart State
    B.3.4 Shared Memory State
    B.3.5 Transitions
    B.3.6 Limitations


  • Chapter 1

    Introduction

    RISC-V (pronounced “risk-five”) is a new instruction-set architecture (ISA) that was originally designed to support computer architecture research and education, but which we now hope will also become a standard free and open architecture for industry implementations. Our goals in defining RISC-V include:

    • A completely open ISA that is freely available to academia and industry.

    • A real ISA suitable for direct native hardware implementation, not just simulation or binary translation.

    • An ISA that avoids “over-architecting” for a particular microarchitecture style (e.g., microcoded, in-order, decoupled, out-of-order) or implementation technology (e.g., full-custom, ASIC, FPGA), but which allows efficient implementation in any of these.

    • An ISA separated into a small base integer ISA, usable by itself as a base for customized accelerators or for educational purposes, and optional standard extensions, to support general-purpose software development.

    • Support for the revised 2008 IEEE-754 floating-point standard [7].

    • An ISA supporting extensive ISA extensions and specialized variants.

    • Both 32-bit and 64-bit address space variants for applications, operating system kernels, and hardware implementations.

    • An ISA with support for highly-parallel multicore or manycore implementations, including heterogeneous multiprocessors.

    • Optional variable-length instructions to both expand available instruction encoding space and to support an optional dense instruction encoding for improved performance, static code size, and energy efficiency.

    • A fully virtualizable ISA to ease hypervisor development.

    • An ISA that simplifies experiments with new privileged architecture designs.

    Commentary on our design decisions is formatted as in this paragraph. This non-normative text can be skipped if the reader is only interested in the specification itself.

    The name RISC-V was chosen to represent the fifth major RISC ISA design from UC Berkeley (RISC-I [15], RISC-II [8], SOAR [21], and SPUR [11] were the first four). We also pun on the use of the Roman numeral “V” to signify “variations” and “vectors”, as support for a range of architecture research, including various data-parallel accelerators, is an explicit goal of the ISA design.

    The RISC-V ISA is defined avoiding implementation details as much as possible (although commentary is included on implementation-driven decisions) and should be read as the software-visible interface to a wide variety of implementations rather than as the design of a particular hardware artifact. The RISC-V manual is structured in two volumes. This volume covers the design of the base unprivileged instructions, including optional unprivileged ISA extensions. Unprivileged instructions are those that are generally usable in all privilege modes in all privileged architectures, though behavior might vary depending on privilege mode and privilege architecture. The second volume provides the design of the first (“classic”) privileged architecture. The manuals use IEC 80000-13:2008 conventions, with a byte of 8 bits.

    In the unprivileged ISA design, we tried to remove any dependence on particular microarchitectural features, such as cache line size, or on privileged architecture details, such as page translation. This is both for simplicity and to allow maximum flexibility for alternative microarchitectures or alternative privileged architectures.

    1.1 RISC-V Hardware Platform Terminology

    A RISC-V hardware platform can contain one or more RISC-V-compatible processing cores together with other non-RISC-V-compatible cores, fixed-function accelerators, various physical memory structures, I/O devices, and an interconnect structure to allow the components to communicate.

    A component is termed a core if it contains an independent instruction fetch unit. A RISC-V-compatible core might support multiple RISC-V-compatible hardware threads, or harts, through multithreading.

    A RISC-V core might have additional specialized instruction-set extensions or an added coprocessor. We use the term coprocessor to refer to a unit that is attached to a RISC-V core and is mostly sequenced by a RISC-V instruction stream, but which contains additional architectural state and instruction-set extensions, and possibly some limited autonomy relative to the primary RISC-V instruction stream.

    We use the term accelerator to refer to either a non-programmable fixed-function unit or a core that can operate autonomously but is specialized for certain tasks. In RISC-V systems, we expect many programmable accelerators will be RISC-V-based cores with specialized instruction-set extensions and/or customized coprocessors. An important class of RISC-V accelerators are I/O accelerators, which offload I/O processing tasks from the main application cores.

    The system-level organization of a RISC-V hardware platform can range from a single-core microcontroller to a many-thousand-node cluster of shared-memory manycore server nodes. Even small systems-on-a-chip might be structured as a hierarchy of multicomputers and/or multiprocessors to modularize development effort or to provide secure isolation between subsystems.


    1.2 RISC-V Software Execution Environments and Harts

    The behavior of a RISC-V program depends on the execution environment in which it runs. A RISC-V execution environment interface (EEI) defines the initial state of the program, the number and type of harts in the environment including the privilege modes supported by the harts, the accessibility and attributes of memory and I/O regions, the behavior of all legal instructions executed on each hart (i.e., the ISA is one component of the EEI), and the handling of any interrupts or exceptions raised during execution including environment calls. Examples of EEIs include the Linux application binary interface (ABI), or the RISC-V supervisor binary interface (SBI). The implementation of a RISC-V execution environment can be pure hardware, pure software, or a combination of hardware and software. For example, opcode traps and software emulation can be used to implement functionality not provided in hardware. Examples of execution environment implementations include:

    • “Bare metal” hardware platforms where harts are directly implemented by physical processor threads and instructions have full access to the physical address space. The hardware platform defines an execution environment that begins at power-on reset.

    • RISC-V operating systems that provide multiple user-level execution environments by multiplexing user-level harts onto available physical processor threads and by controlling access to memory via virtual memory.

    • RISC-V hypervisors that provide multiple supervisor-level execution environments for guest operating systems.

    • RISC-V emulators, such as Spike, QEMU or rv8, which emulate RISC-V harts on an underlying x86 system, and which can provide either a user-level or a supervisor-level execution environment.

    A bare hardware platform can be considered to define an EEI, where the accessible harts, memory, and other devices populate the environment, and the initial state is that at power-on reset. Generally, most software is designed to use a more abstract interface to the hardware, as more abstract EEIs provide greater portability across different hardware platforms. Often EEIs are layered on top of one another, where one higher-level EEI uses another lower-level EEI.

    From the perspective of software running in a given execution environment, a hart is a resource that autonomously fetches and executes RISC-V instructions within that execution environment. In this respect, a hart behaves like a hardware thread resource even if time-multiplexed onto real hardware by the execution environment. Some EEIs support the creation and destruction of additional harts, for example, via environment calls to fork new harts.

    The term hart was introduced in the work on Lithe [13, 14] to provide a term to represent an abstract execution resource as opposed to a software thread programming abstraction.

    The important distinction between a hardware thread (hart) and a software thread context is that the software running inside an execution environment is not responsible for causing progress of each of its harts; that is the responsibility of the outer execution environment. So the environment’s harts operate like hardware threads from the perspective of the software inside the execution environment.


    An execution environment implementation might time-multiplex a set of guest harts onto fewer host harts provided by its own execution environment but must do so in a way that guest harts operate like independent hardware threads. In particular, if there are more guest harts than host harts then the execution environment must be able to preempt the guest harts and must not wait indefinitely for guest software on a guest hart to “yield” control of the guest hart.

    1.3 RISC-V ISA Overview

    A RISC-V ISA is defined as a base integer ISA, which must be present in any implementation, plus optional extensions to the base ISA. The base integer ISAs are very similar to those of the early RISC processors except with no branch delay slots and with support for optional variable-length instruction encodings. A base is carefully restricted to a minimal set of instructions sufficient to provide a reasonable target for compilers, assemblers, linkers, and operating systems (with additional privileged operations), and so provides a convenient ISA and software toolchain “skeleton” around which more customized processor ISAs can be built.

    Although it is convenient to speak of the RISC-V ISA, RISC-V is actually a family of related ISAs, of which there are currently four base ISAs. Each base integer instruction set is characterized by the width of the integer registers and the corresponding size of the address space and by the number of integer registers. There are two primary base integer variants, RV32I and RV64I, described in Chapters 2 and 5, which provide 32-bit or 64-bit address spaces respectively. We use the term XLEN to refer to the width of an integer register in bits (either 32 or 64). Chapter 4 describes the RV32E subset variant of the RV32I base instruction set, which has been added to support small microcontrollers, and which has half the number of integer registers. Chapter 6 sketches a future RV128I variant of the base integer instruction set supporting a flat 128-bit address space (XLEN=128). The base integer instruction sets use a two’s-complement representation for signed integer values.
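    As an illustration only, the four base ISAs named above can be summarized by their XLEN and integer register count. The Python sketch below is not part of the specification; the class name `BaseISA` and the table `BASE_ISAS` are hypothetical, and the register counts follow from the text (32 x registers for RV32I, half that for RV32E) together with the assumption that RV64I and RV128I keep the same 32 registers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseISA:
    """Hypothetical summary record for one RISC-V base integer ISA."""
    name: str
    xlen: int          # width of an integer register in bits
    num_int_regs: int  # number of integer (x) registers

    @property
    def address_space_bytes(self) -> int:
        # The address space size corresponds to the register width: 2^XLEN bytes.
        return 1 << self.xlen


# Values taken from the surrounding text; RV128I is only sketched in the manual.
BASE_ISAS = {
    "RV32I": BaseISA("RV32I", 32, 32),
    "RV32E": BaseISA("RV32E", 32, 16),
    "RV64I": BaseISA("RV64I", 64, 32),
    "RV128I": BaseISA("RV128I", 128, 32),
}
```

    For example, `BASE_ISAS["RV32E"].num_int_regs` is 16, while its address space is the same size as that of RV32I.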

    Although 64-bit address spaces are a requirement for larger systems, we believe 32-bit address spaces will remain adequate for many embedded and client devices for decades to come and will be desirable to lower memory traffic and energy consumption. In addition, 32-bit address spaces are sufficient for educational purposes. A larger flat 128-bit address space might eventually be required, so we ensured this could be accommodated within the RISC-V ISA framework.

    The four base ISAs in RISC-V are treated as distinct base ISAs. A common question is why there is not a single ISA, and in particular, why RV32I is not a strict subset of RV64I. Some earlier ISA designs (SPARC, MIPS) adopted a strict superset policy when increasing address space size to support running existing 32-bit binaries on new 64-bit hardware.

    The main advantage of explicitly separating base ISAs is that each base ISA can be optimized for its needs without having to support all the operations needed for other base ISAs. For example, RV64I can omit instructions and CSRs that are only needed to cope with the narrower registers in RV32I. The RV32I variants can use encoding space otherwise reserved for instructions only required by wider address-space variants.

    The main disadvantage of not treating the design as a single ISA is that it complicates the hardware needed to emulate one base ISA on another (e.g., RV32I on RV64I). However, differences in addressing and illegal instruction traps generally mean some mode switch would be required in hardware in any case even with full superset instruction encodings, and the different RISC-V base ISAs are similar enough that supporting multiple versions is relatively low cost. Although some have proposed that the strict superset design would allow legacy 32-bit libraries to be linked with 64-bit code, this is impractical, even with compatible encodings, due to the differences in software calling conventions and system-call interfaces.

    The RISC-V privileged architecture provides fields in misa to control the unprivileged ISA at each level to support emulating different base ISAs on the same hardware. We note that newer SPARC and MIPS ISA revisions have deprecated support for running 32-bit code unchanged on 64-bit systems.

    A related question is why there is a different encoding for 32-bit adds in RV32I (ADD) and RV64I (ADDW). The ADDW opcode could be used for 32-bit adds in RV32I and ADDD for 64-bit adds in RV64I, instead of the existing design which uses the same opcode ADD for 32-bit adds in RV32I and 64-bit adds in RV64I with a different opcode ADDW for 32-bit adds in RV64I. This would also be more consistent with the use of the same LW opcode for 32-bit load in both RV32I and RV64I. The very first versions of RISC-V ISA did have a variant of this alternate design, but the RISC-V design was changed to the current choice in January 2011. Our focus was on supporting 32-bit integers in the 64-bit ISA, not on providing compatibility with the 32-bit ISA, and the motivation was to remove the asymmetry that arose from only some opcodes in RV32I having a *W suffix (e.g., ADDW, but AND not ANDW). In hindsight, this was perhaps not well-justified and a consequence of designing both ISAs at the same time as opposed to adding one later to sit on top of another, and also from a belief we had to fold platform requirements into the ISA spec which would imply that all the RV32I instructions would have been required in RV64I. It is too late to change the encoding now, but this is also of little practical consequence for the reasons stated above.

    It has been noted we could enable the *W variants as an extension to RV32I systems to provide a common encoding across RV64I and a future RV32 variant.
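    To make the ADD/ADDW distinction discussed above concrete, the following Python sketch models the arithmetic effect of the two opcodes on register values. It is only an illustration: the function names are hypothetical, register contents are modeled as unsigned bit patterns, and the ADDW behavior (add the low 32 bits and sign-extend the 32-bit result to 64 bits) is taken from the RV64I chapter rather than from this section.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):          # sign bit set
        return value - (1 << bits)
    return value


def add(a: int, b: int, xlen: int) -> int:
    """ADD: XLEN-bit addition; overflow is ignored (result wraps to XLEN bits)."""
    return (a + b) & ((1 << xlen) - 1)


def addw_rv64(a: int, b: int) -> int:
    """ADDW on RV64I: 32-bit addition, result sign-extended to 64 bits."""
    return sign_extend(a + b, 32) & ((1 << 64) - 1)
```

    With inputs `0x7FFFFFFF` and `1`, ADD on RV32I and ADD on RV64I both give the bit pattern `0x80000000`, but only in RV32I is that a negative number; ADDW on RV64I gives `0xFFFFFFFF80000000`, the sign-extended 32-bit result.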

    RISC-V has been designed to support extensive customization and specialization. Each base integer ISA can be extended with one or more optional instruction-set extensions, and we divide each RISC-V instruction-set encoding space (and related encoding spaces such as the CSRs) into three disjoint categories: standard, reserved, and custom. Standard encodings are defined by the Foundation, and shall not conflict with other standard extensions for the same base ISA. Reserved encodings are currently not defined but are saved for future standard extensions. We use the term non-standard to describe an extension that is not defined by the Foundation. Custom encodings shall never be used for standard extensions and are made available for vendor-specific non-standard extensions. We use the term non-conforming to describe a non-standard extension that uses either a standard or a reserved encoding (i.e., custom extensions are not non-conforming). Instruction-set extensions are generally shared but may provide slightly different functionality depending on the base ISA. Chapter 27 describes various ways of extending the RISC-V ISA. We have also developed a naming convention for RISC-V base instructions and instruction-set extensions, described in detail in Chapter 28.

    To support more general software development, a set of standard extensions are defined to provide integer multiply/divide, atomic operations, and single and double-precision floating-point arithmetic. The base integer ISA is named “I” (prefixed by RV32 or RV64 depending on integer register width), and contains integer computational instructions, integer loads, integer stores, and control-flow instructions. The standard integer multiplication and division extension is named “M”, and adds instructions to multiply and divide values held in the integer registers. The standard atomic instruction extension, denoted by “A”, adds instructions that atomically read, modify, and write memory for inter-processor synchronization. The standard single-precision floating-point extension, denoted by “F”, adds floating-point registers, single-precision computational instructions, and single-precision loads and stores. The standard double-precision floating-point extension, denoted by “D”, expands the floating-point registers, and adds double-precision computational instructions, loads, and stores. The standard “C” compressed instruction extension provides narrower 16-bit forms of common instructions.
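    The letter names introduced above can be illustrated with a small lookup. The Python sketch below is a simplification, not the naming convention of Chapter 28: it only handles names of the form RV, a register width, and the single letters described in this section, and the names `STANDARD_EXTENSIONS` and `describe_isa` are hypothetical.

```python
from typing import List, Tuple

# Letters and meanings as described in the paragraph above.
STANDARD_EXTENSIONS = {
    "I": "base integer ISA",
    "M": "integer multiplication and division",
    "A": "atomic instructions",
    "F": "single-precision floating-point",
    "D": "double-precision floating-point",
    "C": "compressed 16-bit instructions",
}


def describe_isa(name: str) -> Tuple[int, List[str]]:
    """Split a simple name such as 'RV64IMAC' into (XLEN, extension descriptions).

    Anything beyond the letters listed above is rejected; the full naming
    convention has more rules that this sketch does not attempt to cover.
    """
    if not name.startswith("RV"):
        raise ValueError("ISA name must start with 'RV'")
    body = name[2:]
    digits = ""
    while body and body[0].isdigit():
        digits += body[0]
        body = body[1:]
    if not digits:
        raise ValueError("missing register width")
    descriptions = []
    for letter in body:
        if letter not in STANDARD_EXTENSIONS:
            raise ValueError("letter not handled by this sketch: " + letter)
        descriptions.append(STANDARD_EXTENSIONS[letter])
    return int(digits), descriptions
```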

    Beyond the base integer ISA and the standard GC extensions, we believe it is rare that a new instruction will provide a significant benefit for all applications, although it may be very beneficial for a certain domain. As energy efficiency concerns are forcing greater specialization, we believe it is important to simplify the required portion of an ISA specification. Whereas other architectures usually treat their ISA as a single entity, which changes to a new version as instructions are added over time, RISC-V will endeavor to keep the base and each standard extension constant over time, and instead layer new instructions as further optional extensions. For example, the base integer ISAs will continue as fully supported standalone ISAs, regardless of any subsequent extensions.

    1.4 Memory

    A RISC-V hart has a single byte-addressable address space of 2^XLEN bytes for all memory accesses. A word of memory is defined as 32 bits (4 bytes). Correspondingly, a halfword is 16 bits (2 bytes), a doubleword is 64 bits (8 bytes), and a quadword is 128 bits (16 bytes). The memory address space is circular, so that the byte at address 2^XLEN − 1 is adjacent to the byte at address zero. Accordingly, memory address computations done by the hardware ignore overflow and instead wrap around modulo 2^XLEN.
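    The wrap-around rule can be illustrated with a one-line computation. The Python sketch below is an illustration only; the function name `effective_address` is hypothetical.

```python
def effective_address(base: int, offset: int, xlen: int) -> int:
    """Add a (possibly negative) offset to a base address, wrapping modulo 2^XLEN."""
    return (base + offset) % (1 << xlen)
```

    With XLEN=32, adding 1 to the highest address `2**32 - 1` yields address 0, and subtracting 1 from address 0 yields `2**32 - 1`, reflecting the circular address space.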

    The execution environment determines the mapping of hardware resources into a hart’s address space. Different address ranges of a hart’s address space may (1) be vacant, or (2) contain main memory, or (3) contain one or more I/O devices. Reads and writes of I/O devices may have visible side effects, but accesses to main memory cannot. Although it is possible for the execution environment to call everything in a hart’s address space an I/O device, it is usually expected that some portion will be specified as main memory.
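    As a rough illustration of the three kinds of address range, the Python sketch below classifies an address against a memory map. The map itself is entirely hypothetical (the ranges in `EXAMPLE_MAP` are invented for this example and do not describe any real platform); in practice the execution environment defines the mapping.

```python
from enum import Enum


class RegionKind(Enum):
    VACANT = "vacant"
    MAIN_MEMORY = "main memory"
    IO = "I/O device"


# Hypothetical map: (start, end_exclusive, kind). Invented for illustration.
EXAMPLE_MAP = [
    (0x1000_0000, 0x1000_1000, RegionKind.IO),
    (0x8000_0000, 0x9000_0000, RegionKind.MAIN_MEMORY),
]


def classify(addr: int, memory_map=EXAMPLE_MAP) -> RegionKind:
    """Return the kind of region containing addr; unmapped addresses are vacant."""
    for start, end, kind in memory_map:
        if start <= addr < end:
            return kind
    return RegionKind.VACANT
```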

    When a RISC-V platform has multiple harts, the address spaces of any two harts may be entirely the same, or entirely different, or may be partly different but sharing some subset of resources, mapped into the same or different address ranges.

    For a purely “bare metal” environment, all harts may see an identical address space, accessed entirely by physical addresses. However, when the execution environment includes an operating system employing address translation, it is common for each hart to be given a virtual address space that is largely or entirely its own.

    Executing each RISC-V machine instruction entails one or more memory accesses, subdivided into implicit and explicit accesses. For each instruction executed, an implicit memory read (instruction fetch) is done to obtain the encoded instruction to execute. Many RISC-V instructions perform no further memory accesses beyond instruction fetch. Specific load and store instructions perform an explicit read or write of memory at an address determined by the instruction. The execution environment may dictate that instruction execution performs other implicit memory accesses (such as to implement address translation) beyond those documented for the unprivileged ISA.

    The execution environment determines what portions of the non-vacant address space are accessible for each kind of memory access. For example, the set of locations that can be implicitly read for instruction fetch may or may not have any overlap with the set of locations that can be explicitly read by a load instruction; and the set of locations that can be explicitly written by a store instruction may be only a subset of locations that can be read. Ordinarily, if an instruction attempts to access memory at an inaccessible address, an exception is raised for the instruction. Vacant locations in the address space are never accessible.

    Except when specified otherwise, implicit reads that do not raise an exception and that have no side effects may occur arbitrarily early and speculatively, even before the machine could possibly prove that the read will be needed. For instance, a valid implementation could attempt to read all of main memory at the earliest opportunity, cache as many fetchable (executable) bytes as possible for later instruction fetches, and avoid reading main memory for instruction fetches ever again. To ensure that certain implicit reads are ordered only after writes to the same memory locations, software must execute specific fence or cache-control instructions defined for this purpose (such as the FENCE.I instruction defined in Chapter 3).

    The memory accesses (implicit or explicit) made by a hart may appear to occur in a different order as perceived by another hart or by any other agent that can access the same memory. This perceived reordering of memory accesses is always constrained, however, by the applicable memory consistency model. The default memory consistency model for RISC-V is the RISC-V Weak Memory Ordering (RVWMO), defined in Chapter 14 and in appendices. Optionally, an implementation may adopt the stronger model of Total Store Ordering, as defined in Chapter 24. The execution environment may also add constraints that further limit the perceived reordering of memory accesses. Since the RVWMO model is the weakest model allowed for any RISC-V implementation, software written for this model is compatible with the actual memory consistency rules of all RISC-V implementations. As with implicit reads, software must execute fence or cache-control instructions to ensure specific ordering of memory accesses beyond the requirements of the assumed memory consistency model and execution environment.

    1.5 Base Instruction-Length Encoding

    The base RISC-V ISA has fixed-length 32-bit instructions that must be naturally aligned on 32-bit boundaries. However, the standard RISC-V encoding scheme is designed to support ISA extensions with variable-length instructions, where each instruction can be any number of 16-bit instruction parcels in length and parcels are naturally aligned on 16-bit boundaries. The standard compressed ISA extension described in Chapter 16 reduces code size by providing compressed 16-bit instructions and relaxes the alignment constraints to allow all instructions (16 bit and 32 bit) to be aligned on any 16-bit boundary to improve code density.

    We use the term IALIGN (measured in bits) to refer to the instruction-address alignment constraint the implementation enforces. IALIGN is 32 bits in the base ISA, but some ISA extensions, including the compressed ISA extension, relax IALIGN to 16 bits. IALIGN may not take on any value other than 16 or 32.

    We use the term ILEN (measured in bits) to refer to the maximum instruction length supported by an implementation, and which is always a multiple of IALIGN. For implementations supporting only a base instruction set, ILEN is 32 bits. Implementations supporting longer instructions have larger values of ILEN.
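    The constraints on IALIGN and ILEN stated above can be written down as simple checks. The Python sketch below is illustrative only; the function names are hypothetical, and byte addresses are assumed, so an IALIGN of 16 or 32 bits corresponds to 2-byte or 4-byte alignment.

```python
def check_parameters(ialign: int, ilen: int) -> bool:
    """Return True if (IALIGN, ILEN) satisfy the constraints stated in the text."""
    if ialign not in (16, 32):       # IALIGN may only be 16 or 32
        return False
    if ilen < 32:                    # base instructions are 32 bits long
        return False
    return ilen % ialign == 0        # ILEN is always a multiple of IALIGN


def is_instruction_address_aligned(addr: int, ialign: int) -> bool:
    """Check a byte address against the IALIGN constraint (IALIGN is in bits)."""
    if ialign not in (16, 32):
        raise ValueError("IALIGN must be 16 or 32")
    return addr % (ialign // 8) == 0
```

    For example, address `0x1002` satisfies IALIGN=16 but not IALIGN=32.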

    Figure 1.1 illustrates the standard RISC-V instruction-length encoding convention. All the 32-bit instructions in the base ISA have their lowest two bits set to 11. The optional compressed 16-bit instruction-set extensions have their lowest two bits equal to 00, 01, or 10.

    Expanded Instruction-Length Encoding

    A portion of the 32-bit instruction-encoding space has been tentatively allocated for instructions longer than 32 bits. The entirety of this space is reserved at this time, and the following proposal for encoding instructions longer than 32 bits is not considered frozen.

    Standard instruction-set extensions encoded with more than 32 bits have additional low-order bits set to 1, with the conventions for 48-bit and 64-bit lengths shown in Figure 1.1. Instruction lengths between 80 bits and 176 bits are encoded using a 3-bit field in bits [14:12] giving the number of 16-bit words in addition to the first 5×16-bit words. The encoding with bits [14:12] set to 111 is reserved for future longer instruction encodings.

                                          xxxxxxxxxxxxxxaa   16-bit (aa ≠ 11)
                         xxxxxxxxxxxxxxxx xxxxxxxxxxxbbb11   32-bit (bbb ≠ 111)
                 ···xxxx xxxxxxxxxxxxxxxx xxxxxxxxxx011111   48-bit
                 ···xxxx xxxxxxxxxxxxxxxx xxxxxxxxx0111111   64-bit
                 ···xxxx xxxxxxxxxxxxxxxx xnnnxxxxx1111111   (80+16*nnn)-bit, nnn ≠ 111
                 ···xxxx xxxxxxxxxxxxxxxx x111xxxxx1111111   Reserved for ≥192-bits

    Byte address of each 16-bit parcel, left to right: base+4, base+2, base

    Figure 1.1: RISC-V instruction length encoding. Only the 16-bit and 32-bit encodings are considered frozen at this time.
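    The length-encoding convention of Figure 1.1 can be sketched as a decoder that inspects only the first (lowest-addressed) 16-bit parcel. This Python sketch is illustrative, the function name is hypothetical, and it follows the figure as given; note that only the 16-bit and 32-bit encodings are considered frozen, so the longer cases reflect the tentative proposal.

```python
from typing import Optional


def instruction_length_bits(first_parcel: int) -> Optional[int]:
    """Return the instruction length in bits, or None for the reserved encoding."""
    p = first_parcel & 0xFFFF
    if p & 0b11 != 0b11:
        return 16                      # aa != 11
    if p & 0b11100 != 0b11100:
        return 32                      # bbb != 111
    if p & 0b111111 == 0b011111:
        return 48
    if p & 0b1111111 == 0b0111111:
        return 64
    # Low seven bits are all ones: length is given by bits [14:12].
    nnn = (p >> 12) & 0b111
    if nnn != 0b111:
        return 80 + 16 * nnn
    return None                        # reserved for >= 192-bit encodings
```

    Because the decision depends only on the low-order bits of the first parcel, a fetch unit can determine the length before reading the rest of the instruction.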

    Given the code size and energy savings of a compressed format, we wanted to build in support for a compressed format to the ISA encoding scheme rather than adding this as an afterthought, but to allow simpler implementations we didn’t want to make the compressed format mandatory. We also wanted to optionally allow longer instructions to support experimentation and larger instruction-set extensions. Although our encoding convention required a tighter encoding of the core RISC-V ISA, this has several beneficial effects.

    An implementation of the standard IMAFD ISA need only hold the most-significant 30 bits in instruction caches (a 6.25% saving). On instruction cache refills, any instructions encountered with either low bit clear should be recoded into illegal 30-bit instructions before storing in the cache to preserve illegal instruction exception behavior.


    Perhaps more importantly, by condensing our base ISA into a subset of the 32-bit instruction word, we leave more space available for non-standard and custom extensions. In particular, the base RV32I ISA uses less than 1/8 of the encoding space in the 32-bit instruction word. As described in Chapter 27, an implementation that does not require support for the standard compressed instruction extension can map 3 additional non-conforming 30-bit instruction spaces into the 32-bit fixed-width format, while preserving support for standard ≥32-bit instruction-set extensions. Further, if the implementation also does not need instructions >32-bits in length, it can recover a further four major opcodes for non-conforming extensions.

    Encodings with bits [15:0] all zeros are defined as illegal instructions. These instructions are considered to be of minimal length: 16 bits if any 16-bit instruction-set extension is present, otherwise 32 bits. The encoding with bits [ILEN-1:0] all ones is also illegal; this instruction is considered to be ILEN bits long.
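    The two reserved illegal patterns can be expressed as a simple predicate. In the Python sketch below, which is illustrative only, `bits` is assumed to hold the ILEN bits starting at the instruction address with the lowest-addressed parcel in the low-order bits; the function name is hypothetical, and the check covers only these two defined patterns, not every encoding an implementation might treat as illegal.

```python
def is_defined_illegal(bits: int, ilen: int) -> bool:
    """True if `bits` matches one of the two illegal patterns defined above."""
    mask = (1 << ilen) - 1
    low16_all_zero = (bits & 0xFFFF) == 0      # bits [15:0] all zeros
    all_ones = (bits & mask) == mask           # bits [ILEN-1:0] all ones
    return low16_all_zero or all_ones


def minimal_illegal_length(has_16bit_extension: bool) -> int:
    """Length in bits of the all-zeros illegal instruction."""
    return 16 if has_16bit_extension else 32
```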

    We consider it a feature that any length of instruction containing all zero bits is not legal, as this quickly traps erroneous jumps into zeroed memory regions. Similarly, we also reserve the instruction encoding containing all ones to be an illegal instruction, to catch the other common pattern observed with unprogrammed non-volatile memory devices, disconnected memory buses, or broken memory devices.

    Software can rely on a naturally aligned 32-bit word containing zero to act as an illegal instruction on all RISC-V implementations, to be used by software where an illegal instruction is explicitly desired. Defining a corresponding known illegal value for all ones is more difficult due to the variable-length encoding. Software cannot generally use the illegal value of ILEN bits of all 1s, as software might not know ILEN for the eventual target machine (e.g., if software is compiled into a standard binary library used by many different machines). Defining a 32-bit word of all ones as illegal was also considered, as all machines must support a 32-bit instruction size, but this requires the instruction-fetch unit on machines with ILEN>32 to report an illegal instruction exception rather than an access fault when such an instruction borders a protection boundary, complicating variable-instruction-length fetch and decode.

    RISC-V base ISAs have little-endian memory systems. Instructions are stored in memory with each 16-bit parcel stored in a memory halfword. Parcels forming one instruction are stored at increasing halfword addresses, with the lowest-addressed parcel holding the lowest-numbered bits in the instruction specification.
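    The parcel ordering described above can be illustrated by reassembling an instruction from memory bytes. The Python sketch below is only an illustration with hypothetical function names; it assumes a little-endian memory image held in a `bytes` object.

```python
from typing import List


def read_parcels(memory: bytes, addr: int, count: int) -> List[int]:
    """Read `count` 16-bit parcels at increasing halfword addresses."""
    return [
        int.from_bytes(memory[addr + 2 * i: addr + 2 * i + 2], "little")
        for i in range(count)
    ]


def parcels_to_instruction(parcels: List[int]) -> int:
    """Combine parcels; the lowest-addressed parcel holds the lowest-numbered bits."""
    value = 0
    for i, parcel in enumerate(parcels):
        value |= (parcel & 0xFFFF) << (16 * i)
    return value
```

    For example, the bytes `13 05 A0 00` at increasing addresses form parcels `0x0513` and `0x00A0`, which combine into the 32-bit instruction word `0x00A00513`. The length-encoding bits sit in the first parcel, which is why they can be examined before the rest of the instruction is fetched.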

    We chose little-endian byte ordering for the RISC-V memory system because little-endian systems are currently dominant commercially (all x86 systems; iOS, Android, and Windows for ARM). A minor point is that we have also found little-endian memory systems to be more natural for hardware designers. However, certain application areas, such as IP networking, operate on big-endian data structures, and certain legacy code bases have been built assuming big-endian processors, so we expect that future specifications will describe big-endian or bi-endian variants of RISC-V.

    We have to fix the order in which instruction parcels are stored in memory, independent of memory system endianness, to ensure that the length-encoding bits always appear first in halfword address order. This allows the length of a variable-length instruction to be quickly determined by an instruction-fetch unit by examining only the first few bits of the first 16-bit instruction parcel. Once we had decided to fix on a native little-endian memory system and instruction parcel ordering, this naturally led to placing the length-encoding bits in the LSB positions of the instruction format to avoid breaking up opcode fields.


    1.6 Exceptions, Traps, and Interrupts

    We use the term exception to refer to an unusual condition occurring at run time associated with an instruction in the current RISC-V hart. We use the term interrupt to refer to an external asynchronous event that may cause a RISC-V hart to experience an unexpected transfer of control. We use the term trap to refer to the transfer of control to a trap handler caused by either an exception or an interrupt.

    The instruction descriptions in following chapters describe conditions that can raise an exception during execution. The general behavior of most RISC-V EEIs is that a trap to some handler occurs when an exception is signaled on an instruction (except for floating-point exceptions, which, in the standard floating-point extensions, do not cause traps). The manner in which interrupts are generated, routed to, and enabled by a hart depends on the EEI.

    Our use of “exception” and “trap” is compatible with that in the IEEE-754 floating-point standard.

    How traps are handled and made visible to software running on the hart depends on the enclosing execution environment. From the perspective of software running inside an execution environment, traps encountered by a hart at runtime can have four different effects:

    Contained Trap: The trap is visible to, and handled by, software running inside the execution environment. For example, in an EEI providing both supervisor and user mode on harts, an ECALL by a user-mode hart will generally result in a transfer of control to a supervisor-mode handler running on the same hart. Similarly, in the same environment, when a hart is interrupted, an interrupt handler will be run in supervisor mode on the hart.

    Requested Trap: The trap is a synchronous exception that is an explicit call to the execution environment requesting an action on behalf of software inside the execution environment. An example is a system call. In this case, execution may or may not resume on the hart after the requested action is taken by the execution environment. For example, a system call could remove the hart or cause an orderly termination of the entire execution environment.

    Invisible Trap: The trap is handled transparently by the execution environment and execution resumes normally after the trap is handled. Examples include emulating missing instructions, handling non-resident page faults in a demand-paged virtual-memory system, or handling device interrupts for a different job in a multiprogrammed machine. In these cases, the software running inside the execution environment is not aware of the trap (we ignore timing effects in these definitions).

    Fatal Trap: The trap represents a fatal failure and causes the execution environment to terminate execution. Examples include failing a virtual-memory page-protection check or allowing a watchdog timer to expire. Each EEI should define how execution is terminated and reported to an external environment.

    The following table shows the characteristics of each kind of trap:

                               Contained  Requested  Invisible  Fatal
    Execution terminates?      N          N¹         N          Y
    Software is oblivious?     N          N          Y          Y²
    Handled by environment?    N          Y          Y          Y

    Table 1.1: Characteristics of traps. Notes: 1) termination may be requested; 2) imprecise fatal traps might be observable by software.

    The EEI defines for each trap whether it is handled precisely, though the recommendation is to maintain preciseness where possible. Contained and requested traps can be observed to be imprecise by software inside the execution environment. Invisible traps, by definition, cannot be observed to be precise or imprecise by software running inside the execution environment. Fatal traps can be observed to be imprecise by software running inside the execution environment, if known-errorful instructions do not cause immediate termination.
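    The characteristics in Table 1.1 can also be captured as data, which may be convenient when modeling an execution environment. The Python sketch below is illustrative only; the names `TrapKind`, `TRAP_KINDS`, and `kinds_matching` are hypothetical, and the two table notes are recorded as comments rather than modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrapKind:
    execution_terminates: bool
    software_oblivious: bool
    handled_by_environment: bool


# One entry per column of Table 1.1.
TRAP_KINDS = {
    "contained": TrapKind(False, False, False),
    "requested": TrapKind(False, False, True),   # note 1: termination may be requested
    "invisible": TrapKind(False, True, True),
    "fatal": TrapKind(True, True, True),         # note 2: imprecise fatal traps might be observable
}


def kinds_matching(**flags: bool) -> List[str]:
    """Return the names of trap kinds whose fields equal the given flags."""
    return [
        name
        for name, kind in TRAP_KINDS.items()
        if all(getattr(kind, field) == value for field, value in flags.items())
    ]
```

    For example, `kinds_matching(handled_by_environment=False)` returns only the contained trap, since that is the only kind handled by software inside the execution environment.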

    Because this document describes unprivileged instructions, traps are rarely mentioned. Architectural means to handle contained traps are defined in the privileged architecture manual, along with other features to support richer EEIs. Unprivileged instructions that are defined solely to cause requested traps are documented here. Invisible traps are, by their nature, out of scope for this document. Instruction encodings that are not defined here and not defined by some other means may cause a fatal trap.


  • Chapter 2

    RV32I Base Integer Instruction Set, Version 2.1

    This chapter describes version 2.1 of the RV32I base integer instruction set.

    RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed to reduce the hardware required in a minimal implementation. RV32I contains 40 unique instructions, though a simple implementation might cover the ECALL/EBREAK instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE instruction as a NOP, reducing base instruction count to 38 total. RV32I can emulate almost any other ISA extension (except the A extension, which requires additional hardware support for atomicity).

    In practice, a hardware implementation including the machine-mode privileged architecture will also require the 6 CSR instructions.

    Subsets of the base integer ISA might be useful for pedagogical purposes, but the base has been defined such that there should be little incentive to subset a real hardware implementation beyond omitting support for misaligned memory accesses and treating all SYSTEM and FENCE instructions as a single trap.

    Most of the commentary for RV32I also applies to the RV64I base.

    2.1 Programmers’ Model for Base Integer ISA

    Figure 2.1 shows the unprivileged state for the base integer ISA. For RV32I, the 32 x registers are each 32 bits wide, i.e., XLEN=32. Register x0 is hardwired with all bits equal to 0. General purpose registers x1–x31 hold values that various instructions interpret as a collection of Boolean values, or as two’s complement signed binary integers or unsigned binary integers.

    There is one additional unprivileged register: the program counter pc holds the address of the current instruction.

    There is no dedicated stack pointer or subroutine return address link register in the Base Integer ISA; the instruction encoding allows any x register to be used for these purposes. However, the standard software calling convention uses register x1 to hold the return address for a call, with register x5 available as an alternate link register. The standard calling convention uses register x2 as the stack pointer.

    Hardware might choose to accelerate function calls and returns that use x1 or x5. See the descriptions of the JAL and JALR instructions.

    The optional compressed 16-bit instruction format is designed around the assumption that x1 is the return address register and x2 is the stack pointer. Software using other conventions will operate correctly but may have greater code size.

    The number of available architectural registers can have large impacts on code size, performance, and energy consumption. Although 16 registers would arguably be sufficient for an integer ISA running compiled code, it is impossible to encode a complete ISA with 16 registers in 16-bit instructions using a 3-address format. Although a 2-address format would be possible, it would increase instruction count and lower efficiency. We wanted to avoid intermediate instruction sizes (such as Xtensa’s 24-bit instructions) to simplify base hardware implementations, and once a 32-bit instruction size was adopted, it was straightforward to support 32 integer registers. A larger number of integer registers also helps performance on high-performance code, where there can be extensive use of loop unrolling, software pipelining, and cache tiling.

    For these reasons, we chose a conventional size of 32 integer registers for the base ISA. Dynamic register usage tends to be dominated by a few frequently accessed registers, and regfile implementations can be optimized to reduce access energy for the frequently accessed registers [20]. The optional compressed 16-bit instruction format mostly only accesses 8 registers and hence can provide a dense instruction encoding, while additional instruction-set extensions could support a much larger register space (either flat or hierarchical) if desired.

    For resource-constrained embedded applications, we have defined the RV32E subset, which only has 16 registers (Chapter 4).


    [Register diagram: general-purpose registers x0 / zero and x1 through x31, each XLEN bits wide (bits XLEN-1 down to 0), followed by the program counter pc, also XLEN bits wide.]

    Figure 2.1: RISC-V base unprivileged integer register state.


    2.2 Base Instruction Formats

    In the base RV32I ISA, there are four core instruction formats (R/I/S/U), as shown in Figure 2.2. All are a fixed 32 bits in length and must be aligned on a four-byte boundary in memory. An instruction-address-misaligned exception is generated on a taken branch or unconditional jump if the target address is not four-byte aligned. This exception is reported on the branch or jump instruction, not on the target instruction. No instruction-address-misaligned exception is generated for a conditional branch that is not taken.

    The alignment constraint for base ISA instructions is relaxed to a two-byte boundary when instruction extensions with 16-bit lengths or other odd multiples of 16-bit lengths are added (i.e., IALIGN=16).

    Instruction-address-misaligned exceptions are reported on the branch or jump that would cause instruction misalignment to help debugging, and to simplify hardware design for systems with IALIGN=32, where these are the only places where misalignment can occur.

    Bits:   31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0
    R-type: funct7 | rs2 | rs1 | funct3 | rd | opcode
    I-type: imm[11:0] (bits 31:20) | rs1 | funct3 | rd | opcode
    S-type: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
    U-type: imm[31:12] (bits 31:12) | rd | opcode

    Figure 2.2: RISC-V base instruction formats. Each immediate subfield is labeled with the bit position (imm[x]) in the immediate value being produced, rather than the bit position within the instruction’s immediate field as is usually done.

    The RISC-V ISA keeps the source (rs1 and rs2) and destination (rd) registers at the same position in all formats to simplify decoding. Except for the 5-bit immediates used in CSR instructions (Chapter 9), immediates are always sign-extended, and are generally packed towards the leftmost available bits in the instruction and have been allocated to reduce hardware complexity. In particular, the sign bit for all immediates is always in bit 31 of the instruction to speed sign-extension circuitry.
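    Because the register specifiers sit at fixed bit positions, a decoder can extract them without first identifying the format. The following Python sketch illustrates this; the function name `decode_fields` is hypothetical, and the OP opcode value used in the example comes from the manual's opcode listings rather than from this section.

```python
def decode_fields(inst: int) -> dict:
    """Extract the fixed-position fields of a 32-bit base instruction word.

    Not every field is meaningful for every format (for example, U-type
    has no rs1), but the bit positions never move between formats.
    """
    return {
        "opcode": inst & 0x7F,           # inst[6:0]
        "rd":     (inst >> 7) & 0x1F,    # inst[11:7]
        "funct3": (inst >> 12) & 0x7,    # inst[14:12]
        "rs1":    (inst >> 15) & 0x1F,   # inst[19:15]
        "rs2":    (inst >> 20) & 0x1F,   # inst[24:20]
        "funct7": (inst >> 25) & 0x7F,   # inst[31:25]
    }


# Example word for "add x3, x1, x2" (R-type, opcode OP = 0b0110011).
word = (0 << 25) | (2 << 20) | (1 << 15) | (0 << 12) | (3 << 7) | 0b0110011
fields = decode_fields(word)
```

    For this example word, `fields` reports rd=3, rs1=1, and rs2=2, read from the same positions that any other format would use.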

    Decoding register specifiers is usually on the critical paths in implementations, and so the instruction format was chosen to keep all register specifiers at the same position in all formats at the expense of having to move immediate bits across formats (a property shared with RISC-IV, a.k.a. SPUR [11]).

    In practice, most immediates are either small or require all XLEN bits. We chose an asymmetric immediate split (12 bits in regular instructions plus a special load-upper-immediate instruction with 20 bits) to increase the opcode space available for regular instructions.

    Immediates are sign-extended because we did not observe a benefit to using zero-extension for some immediates as in the MIPS ISA and wanted to keep the ISA as simple as possible.


    2.3 Immediate Encoding Variants

    There are a further two variants of the instruction formats (B/J) based on the handling of immediates, as shown in Figure 2.3.

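    Bits:   31 | 30:25 | 24:21 | 20 | 19:15 | 14:12 | 11:8 | 7 | 6:0
    R-type: funct7 (31:25) | rs2 (24:20) | rs1 | funct3 | rd (11:7) | opcode
    I-type: imm[11:0] (31:20) | rs1 | funct3 | rd (11:7) | opcode
    S-type: imm[11:5] (31:25) | rs2 (24:20) | rs1 | funct3 | imm[4:0] (11:7) | opcode
    B-type: imm[12] | imm[10:5] | rs2 (24:20) | rs1 | funct3 | imm[4:1] | imm[11] | opcode
    U-type: imm[31:12] (31:12) | rd (11:7) | opcode
    J-type: imm[20] | imm[10:1] (30:21) | imm[11] | imm[19:12] (19:12) | rd (11:7) | opcode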

    Figure 2.3: RISC-V base instruction formats showing immediate variants.

    The only difference between the S and B formats is that the 12-bit immediate field is used to encode branch offsets in multiples of 2 in the B format. Instead of shifting all bits in the instruction-encoded immediate left by one in hardware as is conventionally done, the middle bits (imm[10:1]) and sign bit stay in fixed positions, while the lowest bit in S format (inst[7]) encodes a high-order bit in B format.

    Similarly, the only difference between the U and J formats is that the 20-bit immediate is shifted left by 12 bits to form U immediates and by 1 bit to form J immediates. The location of instruction bits in the U and J format immediates is chosen to maximize overlap with the other formats and with each other.

    Figure 2.4 shows the immediates produced by each of the base instruction formats, and is labeled to show which instruction bit (inst[y]) produces each bit of the immediate value.

    Sign-extension is one of the most critical operations on immediates (particularly for XLEN>32), and in RISC-V the sign bit for all immediates is always held in bit 31 of the instruction to allow sign-extension to proceed in parallel with instruction decoding.

    Although more complex implementations might have separate adders for branch and jump calculations and so would not benefit from keeping the location of immediate bits constant across types of instruction, we wanted to reduce the hardware cost of the simplest implementations. By rotating bits in the instruction encoding of B and J immediates instead of using dynamic hardware muxes to multiply the immediate by 2, we reduce instruction signal fanout and immediate mux costs by around a factor of 2. The scrambled immediate encoding will add negligible time to static or ahead-of-time compilation. For dynamic generation of instructions, there is some small additional overhead, but the most common short forward branches have straightforward immediate encodings.


    Immediate bits: 31 | 30:20 | 19:12 | 11 | 10:5 | 4:1 | 0
    I-immediate: inst[31] (31:11) | inst[30:25] | inst[24:21] | inst[20]
    S-immediate: inst[31] (31:11) | inst[30:25] | inst[11:8] | inst[7]
    B-immediate: inst[31] (31:12) | inst[7] | inst[30:25] | inst[11:8] | 0
    U-immediate: inst[31] | inst[30:20] | inst[19:12] | 0 (11:0)
    J-immediate: inst[31] (31:20) | inst[19:12] | inst[20] | inst[30:25] | inst[24:21] | 0

    Figure 2.4: Types of immediate produced by RISC-V instructions. The fields are labeled with the instruction bits used to construct their value. Sign extension always uses inst[31].
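    The bit mappings in Figure 2.4 can be written out directly as software decoders. The Python sketch below is illustrative only; the helper names (`sext`, `bits`, `imm_i`, and so on) are hypothetical, and a hardware decoder would wire these bits rather than shift them.

```python
def sext(value: int, width: int) -> int:
    """Sign-extend the low `width` bits of value into a Python int."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def bits(inst: int, hi: int, lo: int) -> int:
    """Return inst[hi:lo] as an unsigned integer."""
    return (inst >> lo) & ((1 << (hi - lo + 1)) - 1)


def imm_i(inst: int) -> int:
    return sext(bits(inst, 31, 20), 12)


def imm_s(inst: int) -> int:
    return sext(bits(inst, 31, 25) << 5 | bits(inst, 11, 7), 12)


def imm_b(inst: int) -> int:
    # The low bit is always 0, so the result is a 13-bit signed value.
    raw = (bits(inst, 31, 31) << 12 | bits(inst, 7, 7) << 11
           | bits(inst, 30, 25) << 5 | bits(inst, 11, 8) << 1)
    return sext(raw, 13)


def imm_u(inst: int) -> int:
    # The low 12 bits are zero; interpreted here as a signed 32-bit value.
    return sext(bits(inst, 31, 12) << 12, 32)


def imm_j(inst: int) -> int:
    # The low bit is always 0, so the result is a 21-bit signed value.
    raw = (bits(inst, 31, 31) << 20 | bits(inst, 19, 12) << 12
           | bits(inst, 20, 20) << 11 | bits(inst, 30, 21) << 1)
    return sext(raw, 21)
```

    In every decoder the sign comes from inst[31], which is the property the text above relies on to let sign extension proceed in parallel with decoding.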

    2.4 Integer Computational Instructions

    Most integer computational instructions operate on XLEN bits of values held in the integer register file. Integer computational instructions are either encoded as register-immediate operations using the I-type format or as register-register operations using the R-type format. The destination is register rd for both register-immediate and register-register instructions. No integer computational instructions cause arithmetic exceptions.

    We did not include special instruction-set support for overflow checks on integer arithmetic operations in the base instruction set, as many overflow checks can be cheaply implemented using RISC-V branches. Overflow checking for unsigned addition requires only a single additional branch instruction after the addition: add t0, t1, t2; bltu t0, t1, overflow.

    For signed addition, if one operand’s sign is known, overflow checking requires only a single branch after the addition: addi t0, t1, +imm; blt t0, t1, overflow. This covers the common case of addition with an immediate operand.

    For general signed addition, three additional instructions after the addition are required, leveraging the observation that the sum should be less than one of the operands if and only if the other operand is negative.

        add t0, t1, t2
        slti t3, t2, 0
        slt t4, t0, t1
        bne t3, t4, overflow

    In RV64I, checks of 32-bit signed additions can be optimized further by comparing the results of ADD and ADDW on the operands.
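    The branch-based overflow checks above can be modeled in software to see why they work. The following Python sketch assumes XLEN=32 and mirrors the instruction sequences step by step; the function names are hypothetical and the code is a model of the idiom, not a definition of the instructions.

```python
MASK = 0xFFFFFFFF  # assumes XLEN = 32


def to_signed(x: int) -> int:
    """Interpret the low 32 bits of x as a two's complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def add_overflows_unsigned(a: int, b: int) -> bool:
    t1 = a & MASK
    t0 = (t1 + (b & MASK)) & MASK   # add  t0, t1, t2  (wraps)
    return t0 < t1                  # bltu t0, t1, overflow


def add_overflows_signed(a: int, b: int) -> bool:
    t1, t2 = to_signed(a), to_signed(b)
    t0 = to_signed(t1 + t2)         # add  t0, t1, t2  (wraps)
    t3 = int(t2 < 0)                # slti t3, t2, 0
    t4 = int(t0 < t1)               # slt  t4, t0, t1
    return t3 != t4                 # bne  t3, t4, overflow
```

    For example, `add_overflows_signed(0x7FFFFFFF, 1)` reports overflow because the wrapped sum is smaller than the first operand even though the second operand is not negative.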

    Integer Register-Immediate Instructions

    Bits:  31:20 | 19:15 | 14:12 | 11:7 | 6:0
    Field: imm[11:0] | rs1 | funct3 | rd | opcode
    Width: 12 | 5 | 3 | 5 | 7
    I-immediate[11:0] | src | ADDI/SLTI[U] | dest | OP-IMM
    I-immediate[11:0] | src | ANDI/ORI/XORI | dest | OP-IMM


    ADDI adds the sign-extended 12-bit immediate to register rs1. Arithmetic overflow is ignored and the result is simply the low XLEN bits of the result. ADDI rd, rs1, 0 is used to implement the MV rd, rs1 assembler pseudoinstruction.

    SLTI (set less than immediate) places the value 1 in register rd if register rs1 is less than the sign-extended immediate when both are treated as signed numbers, else 0 is written to rd. SLTIU is similar but compares the values as unsigned numbers (i.e., the immediate is first sign-extended to XLEN bits then treated as an unsigned number). Note, SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals zero, otherwise sets rd to 0 (assembler pseudoinstruction SEQZ rd, rs).

    ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd. Note, XORI rd, rs1, -1 performs a bitwise logical inversion of register rs1 (assembler pseudoinstruction NOT rd, rs).
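    The register-immediate semantics above, including the SEQZ and NOT idioms, can be checked with a small model. This Python sketch assumes XLEN=32; the function names simply reuse the mnemonics and are not part of any real library.

```python
XLEN = 32  # assumption: RV32I
MASK = (1 << XLEN) - 1


def sext12(imm: int) -> int:
    """Sign-extend a 12-bit immediate."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


def signed(x: int) -> int:
    """Interpret the low XLEN bits of x as two's complement."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def addi(rs1: int, imm: int) -> int:
    return (rs1 + sext12(imm)) & MASK          # overflow is ignored


def slti(rs1: int, imm: int) -> int:
    return int(signed(rs1) < sext12(imm))      # signed compare


def sltiu(rs1: int, imm: int) -> int:
    # The immediate is sign-extended first, then compared as unsigned.
    return int((rs1 & MASK) < (sext12(imm) & MASK))


def xori(rs1: int, imm: int) -> int:
    return (rs1 ^ sext12(imm)) & MASK


def seqz(rs: int) -> int:
    return sltiu(rs, 1)                        # SLTIU rd, rs, 1


def not_(rs: int) -> int:
    return xori(rs, -1)                        # XORI rd, rs, -1
```

    Note that `sltiu(5, -1)` yields 1: the immediate -1 sign-extends to the largest unsigned value, so every other register value compares below it.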

    Bits:  31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0
    Field: imm[11:5] | imm[4:0] | rs1 | funct3 | rd | opcode
    Width: 7 | 5 | 5 | 3 | 5 | 7
    0000000 | shamt[4:0] | src | SLLI | dest | OP-IMM
    0000000 | shamt[4:0] | src | SRLI | dest | OP-IMM
    0100000 | shamt[4:0] | src | SRAI | dest | OP-IMM

    Shifts by a constant are encoded as a specialization of the I-type format. The operand to be shifted is in rs1, and the shift amount is encoded in the lower 5 bits of the I-immediate field. The right shift type is encoded in bit 30. SLLI is a logical left shift (zeros are shifted into the lower bits); SRLI is a logical right shift (zeros are shifted into the upper bits); and SRAI is an arithmetic right shift (the original sign bit is copied into the vacated upper bits).
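    The difference between the three shifts is easiest to see on a value with the sign bit set. The Python sketch below models them for XLEN=32; as before, the names reuse the mnemonics for illustration only.

```python
MASK = 0xFFFFFFFF  # assumes XLEN = 32


def signed(x: int) -> int:
    """Interpret the low 32 bits of x as two's complement."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def slli(rs1: int, shamt: int) -> int:
    return (rs1 << (shamt & 31)) & MASK           # zeros enter the low bits


def srli(rs1: int, shamt: int) -> int:
    return (rs1 & MASK) >> (shamt & 31)           # zeros enter the high bits


def srai(rs1: int, shamt: int) -> int:
    # Python's >> on a negative int is arithmetic, so the sign bit is copied.
    return (signed(rs1) >> (shamt & 31)) & MASK
```

    With rs1 = 0x80000000 and a shift amount of 4, the logical right shift gives 0x08000000 while the arithmetic right shift gives 0xF8000000.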

    Bits:  31:12 | 11:7 | 6:0
    Field: imm[31:12] | rd | opcode
    Width: 20 | 5 | 7
    U-immediate[31:12] | dest | LUI
    U-immediate[31:12] | dest | AUIPC

    LUI (load upper immediate) is used to build 32-bit constants and uses the U-type format. LUI places the U-immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros.

    AUIPC (add upper immediate to pc) is used to build pc-relative addresses and uses the U-type format. AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the address of the AUIPC instruction, then places the result in register rd.

    The AUIPC instruction supports two-instruction sequences to access arbitrary offsets from the PC for both control-flow transfers and data accesses. The combination of an AUIPC and the 12-bit immediate in a JALR can transfer control to any 32-bit PC-relative address, while an AUIPC plus the 12-bit immediate offset in regular load or store instructions can access any 32-bit PC-relative data address.
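    Because the 12-bit immediate of the second instruction is sign-extended, the upper 20 bits must be adjusted whenever bit 11 of the offset is set. The Python sketch below shows one way to split a 32-bit value into the two parts; the function name `split_hi_lo` is hypothetical, and the split is the usual assembler convention rather than something this section defines.

```python
MASK = 0xFFFFFFFF


def split_hi_lo(value: int) -> tuple:
    """Split a 32-bit value into (hi20, lo12).

    hi20 is the U-immediate for LUI/AUIPC and lo12 is the signed 12-bit
    immediate for the following instruction, chosen so that
    (hi20 << 12) + lo12 == value modulo 2**32.
    """
    value &= MASK
    lo12 = value & 0xFFF
    if lo12 & 0x800:
        lo12 -= 0x1000                 # the low part will be sign-extended
    hi20 = ((value - lo12) >> 12) & 0xFFFFF
    return hi20, lo12


hi, lo = split_hi_lo(0x12345FFF)
```

    Here the low part becomes -1, so the high part is rounded up to 0x12346 to compensate.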


    The current PC can be obtained by setting the U-immediate to 0. Although a JAL +4 instruction could also be used to obtain the local PC (of the instruction following the JAL), it might cause pipeline breaks in simpler microarchitectures or pollute BTB structures in more complex microarchitectures.

    Integer Register-Register Operations

    RV32I defines several arithmetic R-type operations. All operations read the rs1 and rs2 registers as source operands and write the result into register rd. The funct7 and funct3 fields select the type of operation.

    Bits:  31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0
    Field: funct7 | rs2 | rs1 | funct3 | rd | opcode
    Width: 7 | 5 | 5 | 3 | 5 | 7
    0000000 | src2 | src1 | ADD/SLT/SLTU | dest | OP
    0000000 | src2 | src1 | AND/OR/XOR | dest | OP
    0000000 | src2 | src1 | SLL/SRL | dest | OP
    0100000 | src2 | src1 | SUB/SRA | dest | OP

    ADD performs the addition of rs1 and rs2. SUB performs the subtraction of rs2 from rs1. Overflows are ignored and the low XLEN bits of results are written to the destination rd. SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise logical operations.

    SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register rs1 by the shift amount held in the lower 5 bits of register rs2.

    NOP Instruction

    Bits:  31:20 | 19:15 | 14:12 | 11:7 | 6:0
    Field: imm[11:0] | rs1 | funct3 | rd | opcode
    Width: 12 | 5 | 3 | 5 | 7
    0 | 0 | ADDI | 0 | OP-IMM

    The NOP instruction does not change any architecturally visible state, except for advancing the pc and incrementing any applicable performance counters. NOP is encoded as ADDI x0, x0, 0.
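    As a quick check of the canonical encoding, the I-type fields can be packed by hand. The Python sketch below uses a hypothetical helper `encode_i_type`; the OP-IMM opcode and ADDI funct3 values are taken from the manual's instruction listings, not from this section.

```python
OP_IMM = 0b0010011     # OP-IMM major opcode (from the manual's opcode listings)
ADDI_FUNCT3 = 0b000    # funct3 value for ADDI


def encode_i_type(imm: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack the I-type fields into a 32-bit instruction word."""
    return (((imm & 0xFFF) << 20) | ((rs1 & 0x1F) << 15)
            | ((funct3 & 0x7) << 12) | ((rd & 0x1F) << 7) | (opcode & 0x7F))


# NOP is ADDI x0, x0, 0: every field except the opcode is zero.
nop = encode_i_type(0, 0, ADDI_FUNCT3, 0, OP_IMM)
```

    With all other fields zero, the word reduces to the opcode alone, 0x00000013.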

    NOPs can be used to align code segments to microarchitecturally significant address boundaries, or to leave space for inline code modifications. Although there are many possible ways to encode a NOP, we define a canonical NOP encoding to allow microarchitectural optimizations as well as for more readable disassembly output. The other NOP encodings are made available for HINT instructions (Section 2.9).

    ADDI was chosen for the NOP encoding as this is most likely to take fewest resources to execute across a range of systems (if not optimized away in decode). In particular, the instruction only reads one register. Also, an ADDI functional unit is more likely to be available in a superscalar design as adds are the most common operation. In particular, address-generation functional units can execute ADDI using the same hardware needed for base+offset address calculations, while register-register ADD or logical/shift operations require additional hardware.

    2.5 Control Transfer Instructions

    RV32I provides two types of control transfer instructions: unconditional jumps and conditional branches. Control transfer instructions in RV32I do not have architecturally visible delay slots.

    Unconditional Jumps

    The jump and link (JAL) instruction uses the J-type format, where the J-immediate encodes a signed offset in multiples of 2 bytes. The offset is sign-extended and added to the address of the jump instruction to form the jump target address. Jumps can therefore target a ±1 MiB range. JAL stores the address of the instruction following the jump (pc+4) into register rd. The standard software calling convention uses x1 as the return address register and x5 as an alternate link register.

    The alternate link register supports calling millicode routines (e.g., those to save and restore registers in compressed code) while preserving the regular return address register. The register x5 was chosen as the alternate link register as it maps to a temporary in the standard calling convention, and has an encoding that is only one bit different than the regular link register.

    Plain unconditional jumps (assembler pseudoinstruction J) are encoded as a JAL with rd=x0.

    Bits:  31 | 30:21 | 20 | 19:12 | 11:7 | 6:0
    Field: imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode
    Width: 1 | 10 | 1 | 8 | 5 | 7
    offset[20:1] (bits 31:12) | dest | JAL
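    To illustrate how an assembler might scatter a jump offset into the J-type fields, here is a Python sketch. The function name `encode_jal` is hypothetical, and the JAL opcode value is taken from the manual's instruction listings rather than from this section.

```python
JAL_OPCODE = 0b1101111  # from the manual's opcode listings


def encode_jal(rd: int, offset: int) -> int:
    """Encode JAL rd, offset, where offset is a byte offset from the JAL."""
    if offset % 2 != 0:
        raise ValueError("J-immediate encodes multiples of 2 bytes")
    if not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("offset outside the +/-1 MiB range")
    imm = offset & 0x1FFFFF                   # 21-bit two's complement
    return (((imm >> 20) & 0x1) << 31         # imm[20]    -> inst[31]
            | ((imm >> 1) & 0x3FF) << 21      # imm[10:1]  -> inst[30:21]
            | ((imm >> 11) & 0x1) << 20       # imm[11]    -> inst[20]
            | ((imm >> 12) & 0xFF) << 12      # imm[19:12] -> inst[19:12]
            | (rd & 0x1F) << 7
            | JAL_OPCODE)
```

    For example, `encode_jal(0, -4)` produces 0xFFDFF06F, a plain jump (rd=x0) to the preceding instruction.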

    The indirect jump instruction JALR (jump and link register) uses the I-type encoding. The target address is obtained by adding the sign-extended 12-bit I-immediate to the register rs1, then setting the least-significant bit of the result to zero. The address of the instruction following the jump (pc+4) is written to register rd. Register x0 can be used as the destination if the result is not required.

    Bits:  31:20 | 19:15 | 14:12 | 11:7 | 6:0
    Field: imm[11:0] | rs1 | funct3 | rd | opcode
    Width: 12 | 5 | 3 | 5 | 7
    offset[11:0] | base | 0 | dest | JALR
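    The JALR target calculation can be modeled in a few lines. This Python sketch assumes XLEN=32 and uses a hypothetical function `jalr` that returns both the target address and the link value written to rd.

```python
MASK = 0xFFFFFFFF  # assumes XLEN = 32


def sext12(imm: int) -> int:
    """Sign-extend a 12-bit immediate."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


def jalr(pc: int, rs1_value: int, imm12: int) -> tuple:
    """Return (target, link) for JALR rd, imm12(rs1) executed at address pc."""
    target = (rs1_value + sext12(imm12)) & MASK & ~1   # clear the low bit
    link = (pc + 4) & MASK                             # value written to rd
    return target, link
```

    For instance, `jalr(0x1000, 0x2001, 0)` yields a target of 0x2000: the set low bit of the base register is discarded rather than causing a misaligned target.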

    The unconditional jump instructions all use PC-relative addressing to help support position-independent code. The JALR instruction was defined to enable a two-instruction sequence to jump anywhere in a 32-bit absolute address range. A LUI instruction can first load rs1 with the upper 20 bits of a target address, then JALR can add in the lower bits. Similarly, AUIPC then JALR can jump anywhere in a 32-bit pc-relative address range.

    Note that the JALR instruction does not treat the 12-bit immediate as multiples of 2 bytes, unlike the conditional branch instructions. This avoids one more immediate format in hardware. In practice, most uses of JALR will have either a zero immediate or be paired with a LUI or AUIPC, so the slight reduction in range is not significant.

    Clearing the least-significant bit when calculating the JALR target address both simplifies the hardware slightly and allows the low bit of function pointers to be used to store auxiliary information. Although there is potentially a slight loss of error checking in this case, in practice jumps to an incorrect instruction address will usually quickly raise an exception.

    When used with a base rs1=x0, JALR can be used to implement a single instruction subroutine call to the lowest 2 KiB or highest 2 KiB address region from anywhere in the address space, which could be used to implement fast calls to a small runtime library. Alternatively, an ABI could dedicate a general-purpose register to point to a library elsewhere in the address space.

    The JAL and JALR instructions will generate an instruction-address-misaligned exception if the target address is not aligned to a four-byte boundary.

    Instruction-address-misaligned exceptions are not possible on machines that support extensions with 16-bit aligned instructions, such as the compressed instruction-set extension, C.

    Return-address prediction stacks are a common feature of high-performance instruction-fetch units, but require accurate detection of instructions used for procedure calls and returns to be effective. For RISC-V, hints as to the instructions’ usage are encoded implicitly via the register numbers used. A JAL instruction should push the return address onto a return-address stack (RAS) only when rd=x1/x5. JALR instructions should push/pop a RAS as shown in Table 2.1.

    rd | rs1 | rs1=rd | RAS action
    !link | !link | - | none
    !link | link | - | pop
    link | !link | - | push
    link | link | 0 | pop, then push
    link | link | 1 | push

    Table 2.1: Return-address stack prediction hints encoded in register specifiers used in the instruction. In the above, link is true when the register is either x1 or x5.
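    The rows of Table 2.1 amount to a small decision function over the rd and rs1 register numbers of a JALR. The Python sketch below is one possible rendering; the name `ras_action` and the string return values are illustrative choices, not part of the specification.

```python
LINK_REGS = {1, 5}  # x1 and x5 are the link registers in the table


def ras_action(rd: int, rs1: int) -> str:
    """Return the return-address stack hint for JALR with the given registers."""
    rd_link = rd in LINK_REGS
    rs1_link = rs1 in LINK_REGS
    if not rd_link and not rs1_link:
        return "none"
    if not rd_link:
        return "pop"             # e.g. a return: jalr x0, 0(x1)
    if not rs1_link:
        return "push"            # e.g. an indirect call through a non-link register
    return "push" if rd == rs1 else "pop, then push"
```

    A conventional function return, `jalr x0, 0(x1)`, maps to `ras_action(0, 1)` and therefore to a pop.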

    Some other ISAs added explicit hint bits to their indirect-jump instructions to guide return-address stack manipulation. We use implicit hinting tied to register numbers and the calling convention to reduce the encoding space used for these hints.

    When two different link registers (x1 and x5) are given as rs1 and rd, then the RAS is both popped and pushed to support coroutines. If rs1 and rd are the same link register (either x1 or x5), the RAS is only pushed to enable macro-op fusion of the sequences: lui ra, imm20; jalr ra, imm12(ra) and auipc ra, imm20; jalr ra, imm12(ra).

    Conditional Branches

    All branch instructions use the B-type instruction format. The 12-bit B-immediate encodes signed offsets in multiples of 2 bytes. The offset is sign-extended and added to the address of the branch instruction to give the target address. The conditional branch range is ±4 KiB.


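    Bits:  31 | 30:25 | 24:20 | 19:15 | 14:12 | 11:8 | 7 | 6:0
    Field: imm[12] | imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] | imm[11] | opcode
    Width: 1 | 6 | 5 | 5 | 3 | 4 | 1 | 7
    offset[12|10:5] (bits 31:25) | src2 | src1 | BEQ/BNE | offset[11|4:1] (bits 11:7) | BRANCH
    offset[12|10:5] (bits 31:25) | src2 | src1 | BLT[U] | offset[11|4:1] (bits 11:7) | BRANCH
    offset[12|10:5] (bits 31:25) | src2 | src1 | BGE[U] | offset[11|4:1] (bits 11:7) | BRANCH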

    Branch instructions compare two registers. BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively. BLT and BLTU take the branch if rs1 is less than rs2, using signed and unsigned comparison respectively. BGE and BGEU take the branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.

    Signed array bounds may be checked with a single BLTU instruction, since any negative index will compare greater than any nonnegative bound.
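    The single-comparison bounds check works because a negative index, reinterpreted as unsigned, becomes a very large number. The Python sketch below models this for XLEN=32; `in_bounds` is a hypothetical name standing in for the condition a BLTU index, bound branch would test.

```python
MASK = 0xFFFFFFFF  # assumes XLEN = 32


def in_bounds(index: int, bound: int) -> bool:
    """Model of BLTU index, bound: compare the 32-bit patterns as unsigned.

    bound is assumed to be nonnegative. A negative index wraps to a value
    of at least 2**31, which exceeds every nonnegative signed bound.
    """
    return (index & MASK) < (bound & MASK)
```

    So `in_bounds(-1, 10)` is false without a separate test for a negative index, while `in_bounds(3, 10)` is true.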

    Software should be optimized such that the sequential code path is the most common path, with less-frequently taken code paths placed out of line. Software should also assume that backward branches will be predicted taken and forward branches as not taken, at least the first time they are encountered. Dynamic predictors should quickly learn any predictable branch behavior.

    Unlike some other architectures, the RISC-V jump (JAL with rd=x0) instruction should always be used for unconditional branches instead of a conditional branch instruction with an always-true condition. RISC-V jumps are also PC-relative and support a much wider offset range than branches, and will not pollute conditional-branch prediction tables.

    The conditional branches were designed to include arithmetic comparison operations between two registers (as also done in PA-RISC, Xtensa, and MIPS R6), rather than use condition codes (x86, ARM, SPARC, PowerPC), or to only compare one register against zero (Alpha, MIPS), or two registers only for equality (MIPS). This design was motivated by the observation that a combined compare-and-branch instruction fits into a regular pipeline, avoids additional condition code state or use of a temporary register, and reduces static code size and dy