unapproved - content.riscv.org...unapproved standardised hart to trace encoder interfaceunapproved...

28
UNAPPROVED UNAPPROVED Debug and Trace Infrastructure for RISC-V SoC Hanan Moller, UltraSoC [email protected] RISC-V Service Tools Virtual Meetup 23 April 2020

Upload: others

Post on 12-Oct-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED

Debug and Trace Infrastructurefor RISC-V SoC

Hanan Moller, UltraSoC

[email protected]

RISC-V Service Tools Virtual Meetup

23 April 2020

Page 2: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED

23 April 2020 2

• Overview

• RISC-V Instruction Trace

• Cycle Accurate Instruction Trace

• Holistic System

• Summary

Page 3: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED• It is all about the system - not about ISAs or Cores

• Understanding today’s systems’ behaviour is hard… very hard

• Surprisingly, software sometimes does not behave as expected

• Software developers spend between 50-75% of development time debugging[1], this will only increase as systems become even more complex

• Realtime performance, safety and security are increasingly important

• In such systems non-intrusive visibility is crucial

• Processor branch trace alone is not enough

• Understanding CPU stalling behaviour is key

Overview

23 April 2020 3

[1] The Economic Impacts of Inadequate Infrastructure for Software Testing, http://www.rti.org/pubs/software_testing.pdf, 2002

Page 4: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED

23 April 2020 4

RISC-V StandardisedInstruction Trace

Page 5: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED• Trace execution progress from a known start address by communicating deltas taken by

the program

• Deltas are typically due to instructions such as branch, jump, call and return;interrupts and exceptions are also types of deltas

• The instructions between the deltas are assumed to be executed sequentially

• True for ISAs, such as RISC-V, where all instructions are executed unconditionally

• True when instruction execution can be determined based on the program (e.g., not predicated)

• The trace needs to report only:

• Whether or not a branch was taken – destination address known from execution binary

• The destination address of taken indirect branches or jumps – destination address not known statically

Processor Branch Trace

23 April 2020 5

Page 6: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED• Interrupts generally occur asynchronously to the program’s execution rather than

intentionally as a result of a specific instruction or event

• Exceptions can be thought of in the same way, even though they can be typically linked back to a specific instruction address

• The decoder generally does not know where an interrupt occurs in the instruction sequence, so the trace encoder must report the address where normal program flow ceased, as well as give an indication of the asynchronous destination which may be as simple as reporting the exception type

• When an interrupt or exception occurs, or the processor is halted, the final instruction executed beforehand must be traced

Interrupts and Exceptions

23 April 2020 6

Page 7: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVEDStandardised Hart to Trace Encoder Interface

23 April 2020 7

Signal Function

iretire Instruction has retired

itype [ITYPELEN-1:0] Type of instruction

cause [CAUSELEN-1:0] Exception cause

tval [XLEN-1:0] Exception data

priv [PRIVLEN-1:0] Privilege mode during execution

iaddr [XLEN-1:0] The address of the instruction

This shows the mandatory interface signals.

Optional information can include the instruction context.

The interface supports side-band signals, for example, to indicate core is in reset, halted, or stalled.

CPU Core(s)

Trace Encoder

Trace Decoder

(SW)

Debug Infrastructure

Page 8: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVEDStandardised Hart to Trace Encoder Interface

23 April 2020 8

Signal Function

iretire Instruction has retired

itype [ITYPELEN-1:0] Type of instruction

cause [CAUSELEN-1:0] Exception cause

tval [XLEN-1:0] Exception data

priv [PRIVLEN-1:0] Privilege mode during execution

iaddr [XLEN-1:0] The address of the instruction

This shows the mandatory interface signals.

Optional information can include the instruction context.

The interface supports side-band signals, for example, to indicate core is in reset, halted, or stalled.

CPU Core(s)

Trace Encoder

Trace Decoder

(SW)

Debug Infrastructure

Page 9: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVEDStandardised Hart to Trace Encoder Interface

23 April 2020 9

Optional for multi-retirement CPUs for improved efficiency.

CPU Core(s)

Trace Encoder

Trace Decoder

(SW)

Debug Infrastructure

Signal Function

caretire[CARETIRE-1:0] Number of instructions retired in this block

Page 10: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED• The Trace Encoder sends a packet containing one of the following:

1. Update – a branch map with optional differential address

2. Update – a branch map with optional full address

3. Update – address only (differential)

4. Synchronise - a context with or without a full current address

The above ensures an efficient packing to reduce data being routed on and subsequently transported off-chip

Standardised Trace Encoder Output

23 April 2020 10

Page 11: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVEDInstruction Trace Algorithm

• Formats 0 and 1 : branch map and address

• Differential or full address

• Format 2 : address only

• Format 3 : sync packet

• Subformat 0 : for when starting, resynchronizing or resuming from halt. No ecause, interrupt and tval

• Subformat 1 : for exception. All fields present

• Subformat 2 : for context change. No address, ecause, interrupt and tval.

• See spec in:https://github.com/riscv/riscv-trace-spec

23 April 2020 11

Page 12: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVEDBenchmark Results

23 April 2020 12

program BPI

coremark 0.1825

dhrystone 0.1816

median 0.3106

mm 0.0333

mt-matmul 0.0917

mt-vvadd 0.0789

multiply 0.1584

pmp 0.3555

qsort 0.2450

rsort 0.0042

spmv 0.1119

towers 0.0895

vvadd 0.0911

hello_world 0.0713

Average 0.1433

RISC-V Standard Compressed Branch Trace

Page 13: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED• Controlling when trace is generated is important

• Helps reduces volume of trace data

• Filters are required

• Using filters the following trace examples are available:

• Trace in an address range

• Start trace at an address; end trace at an address

• Trace particular privilege level

• Trace interrupt service routines

• Other examples

• Trace for fixed period of time

• Start trace when external (to the encoder) event detected

• Stop trace when an external (to the encoder) event detected

Trace Control

23 April 2020 13

Page 14: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED

23 April 2020 14

Cycle Accurate Instruction Trace Using Retirement Delta Trace

Page 15: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED• Stated objective is to be able to determine when each instruction retires, but more

importantly when the CPU stalls and does not retire instructions

• Can be achieved by reporting the number of clock cycles between each successive retirement

• The expectation is the CPU will be able to retire one instruction every cycle most of the time, so it will also be advantageous to be able to report the number of back-to-back retirements without stall

• Based on this, the basic reported unit (or group) should be a pair of numbers: the first being the number of contiguous retirements, the second being the number of stall cycles that follow

• For CPUs that support multiple retirement, need to be able to report details of how many instructions retire per cycle as well

Retirement Delta Trace

23 April 2020 15

Page 16: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED• Key is to efficiently represent a sequence of count values

• Various encoding formats were investigated

• Elias Delta[2] is most efficient but for values under 32 Elias Gamma has the same or better efficiency

• Coding number of cycles so large numbers will be less frequent so coding efficiency is less important

• Gamma will result in more efficient coding when it matters

• Amortising short stalls below some threshold will also improve efficiency

Retirement Delta Trace

23 April 2020 16

[2] https://commons.wikimedia.org/wiki/File:Fibonacci,_Elias_Gamma,_and_Elias_Delta_encoding_schemes.GIF

Page 17: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED• Packets comprise packed groups of count tokens. Five token types are defined:

1. Token A – Elias gamma encoded count of contiguous retire or amortised stall cycles

2. Token B – Elias gamma encoded count of missed retirement opportunities

3. Token C – Elias gamma encoded count of unamortised contiguous stall cycles

4. Token D – Count of retirements in a single cycle (usually binary, though Elias gamma is an option)

5. Token E – Elias gamma encoded count of number of cycles to next non-stall cycle

• Tokens are assembled into 3 group formats:

• {A, C}

• {A, B, C}

• {D, E}

This ensures most efficient packing to reduce data being routed on and subsequently transported off-chip

Retirement Delta Trace Encoder Output

23 April 2020 17

Page 18: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED

A=4, C=1(6 bits)

• Basic retirement delta for single retirement harts

• Uses packed groups of {A, C} tokens

Trace Encoder Output – Single Retirement Example

23 April 2020 18

R R R R S R R R S R R S S S S R

A=3, C=1(4 bits)

A=2, C=4(8 bits)

Group 1 Group 2 Group 3

R retire S stall

• Three groups, 18 bits in total (2 bits per instruction)

Page 19: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED• Amortise single stalls with retirements

• Uses groups of {A, B, C} tokens

Trace Encoder Output – Stall Amortisation Example

23 April 2020 19

R R R R S R R R S R R S S S S R

A=12, B=3, C=3(13 bits)

Group 1

R retire S stall

• One group, 13 bits in total – more efficient (1.4 bits per instruction)

• Precise location of single stalls not known – less accurate

Page 20: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED• Basic retirement delta for multi retirement harts (4 instructions per cycle in this example)

• Uses packed groups of {A, B, C} tokens

Trace Encoder Output – Multi Retirement Example

23 April 2020 20

R1 R4 R3 R2 S R4 R4 R4 S R1 R1 S S S S R

A=4, B=6, C=1(11 bits)

A=3, B=0, C=1(5 bits)

A=2, B=6, C=4(13 bits)

Group 1 Group 2 Group 3

R3 retire 3 instructions S stall

• Three groups, 29 bits in total (1.2 bits per instruction)

• With single stalls amortised: A=12, B=24, C=3; 19 bits (0.8 bits per instruction)

Page 21: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED• Per cycle detail for multi retirement harts (4 instructions per cycle in this example)

• Uses packed groups of {D, E} tokens

Trace Encoder Output – Intra-Cycle Example

23 April 2020 21

R1 R4 R3 R2 S R4 R4 R4 S R1 R1 S S S S R

D=1E=1(3b)

G1

R3 retire 3 instructions S stall

• Nine groups, 30 bits in total (1.25 bits per instruction)

• Highest precision detail of retirement activity

D=4E=1(3b)

G2

D=3E=1(3b)

G3

D=2E=2(5b)

G4

D=4E=1(3b)

G5

D=4E=1(3b)

G6

D=4E=2(5b)

G7

D=1E=1(3b)

G8

D=1E=5(7b)

G9

Page 22: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVEDRetirement Delta Trace Algorithm

• Shows the {A,C} token case

• Some boundary conditions omitted for clarity

• {A,B,C} and {D,E} cases are similar

23 April 2020 22

Y

Retired? C_Count > 0

Room in

Packet?C_Count++

Convert counts to Elias

Send packet

Init new packet

Add group to packet

Reset counts to 0A_Count++

Y Y

N

N

N

Page 23: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVEDBenchmark Results

23 April 2020 23

program BPI CPI

coremark 1.0120 4.0328

dhrystone 1.2741 4.6771

median 1.3347 5.1407

mm 1.2192 5.1468

mt-matmul 1.2774 5.3007

mt-vvadd 1.2966 5.1938

multiply 0.8720 3.5071

pmp 0.8538 3.3465

qsort 1.2064 4.5891

rsort 1.1924 4.4238

spmv 1.1248 4.6240

towers 1.3873 4.6516

vvadd 1.3291 5.0886

hello_world 1.0544 4.3205

Average 1.1739 4.5745

program BPI

coremark 0.1825

dhrystone 0.1816

median 0.3106

mm 0.0333

mt-matmul 0.0917

mt-vvadd 0.0789

multiply 0.1584

pmp 0.3555

qsort 0.2450

rsort 0.0042

spmv 0.1119

towers 0.0895

vvadd 0.0911

hello_world 0.0713

Average 0.1433

Retirement Delta Trace RISC-V Standard Compressed Branch Trace

Page 24: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED

Instruction Property Probability Additional Stall Cycle Ranges

STOREL1 cache hit 85% 0

L2 cache hit 10% 5 … 10

Cache Miss (posted-write)

5% 20 … 50

LOADL1 cache hit 85% 0

L2 cache hit 10% 5 … 10

Cache Miss (DRAM fetch)

5% 100 … 200

FENCE … always 100 … 1,000

WFI … always 0 … 20,000

I$ MISS … 2% 0 … 200

Pipeline Flush … 2% 3 … 5

Profile of Instructions Used

23 April 2020 24

Page 25: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED

23 April 2020 25

Holistic System

Page 26: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED

xtensa

Monitoring, Reliability, and Security for the Whole SoC

23 April 2020 26

Interconnect (AXI, ACE, ACE-lite, OCP, CHI NoC)

GPUDRAM

controllerCustom

Logic

BusMon

Trace Receiver

PAMTrace

EncoderPAM

Static Instrumentation

DMAStatus

Monitor

Message Engine Message Engine Message Engine

Message Engine

AXIComm

JTAGComm

USB 2Comm

AuroraComm

Portfolio of Analytic Modules

Family ofCommunicators

Flexible & Scalable

Message Fabric

System Block

UltraSoC IP

DSP

SystemMemory

Buffer

AnalysisSub

System

Bus Sentinel

UltraSoC IP With Lock

EBCComm

USB 3 controller

ETH controller

Lockstep Manager

Bus Sentinel

Page 27: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED• Today’s systems are too hard to understand with yesterday’s tools

• The systems must do what they were designed to do: safely and securely

• Understanding system behaviour is key : both in-field and realtime

• One dimension to this is to understand where and how CPUs stall

• An encoder providing this has been presented

• A number of different filtering and triggering schemes are supported

• Coupling this with a holistic non-intrusive monitoring infrastructure provides the means of understanding complete SoC behaviour

Summary

23 April 2020 27

Page 28: UNAPPROVED - content.riscv.org...UNAPPROVED Standardised Hart to Trace Encoder InterfaceUNAPPROVED 23 April 2020 7 Signal Function iretire Instruction has retired itype [ITYPELEN-1:0]

UNAPPROVED

UNAPPROVED

Contact details:

[email protected]@UltraSoC

23 April 2020 28