Posted on 27-Dec-2021


Mastering the challenge of multicore SoC debugging

• Aaron Bauch, Sr FAE

Agenda

• Different types of multicore processing
• Reasons for using multicore processors
• Challenges of using multicore devices
• Debugging in multicore environments
• Debugging demo
• Questions

Global organization

We are a dedicated team that offers superior technology and service that enable customers to create the products of today and the innovations of tomorrow.

• America: 4 offices, 38 employees
• Europe: 4 offices, 138 employees
• Asia: 3 offices, 24 employees

[Chart: IAR Systems employees by function: Product / Development; Sales / Market / Support; Administration]

System performance

Performance is improved by...

• Compiler optimizations
  – Code motion
  – Loop unrolling
  – Function inlining
• Parallelism
  – Bit
  – Instruction
  – Data
  – Task
• Clock frequency
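Function inlining replaces a call with the body of the called function, removing the call/return overhead and exposing the body to further optimization at the call site. A minimal sketch, with hypothetical function names: note that `inline` is only a hint, and whether a given call is actually inlined is the compiler's decision at the chosen optimization level.

```c
/* A small helper like this is a typical inlining candidate: its body
   is cheaper than the call and return around it. */
static inline int scale_add(int x, int k, int c)
{
    return x * k + c;
}

/* As written: one call per element. With optimization enabled, the
   compiler may replace each call with the body of scale_add... */
int sum_scaled(const int *p, int n, int k, int c)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += scale_add(p[i], k, c);
    return sum;
}

/* ...producing code equivalent to this hand-inlined loop. */
int sum_scaled_inlined(const int *p, int n, int k, int c)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += p[i] * k + c;
    return sum;
}
```

Both functions return the same result; the point of inlining is that the programmer keeps the readable helper while the generated code can look like the second version.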

Code motion

Before:
for (i = 0; i < 10; i++) {
    b = k * c;
    p[i] = b;
}

After:
b = k * c;
for (i = 0; i < 10; i++) {
    p[i] = b;
}
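Wrapping the two loops as functions makes the equivalence testable. A minimal sketch: `fill_before` and `fill_after` are hypothetical names, and `p` is assumed to point to at least 10 ints. Because `k` and `c` do not change inside the loop, the product is loop-invariant and can be hoisted without changing the result; an optimizing compiler normally performs this hoisting on its own.

```c
/* Before: b = k * c is recomputed on every iteration even though
   k and c never change inside the loop. */
void fill_before(int *p, int k, int c)
{
    int b;
    for (int i = 0; i < 10; i++) {
        b = k * c;      /* loop-invariant: same value every iteration */
        p[i] = b;
    }
}

/* After code motion: the invariant product is computed once,
   outside the loop. */
void fill_after(int *p, int k, int c)
{
    int b = k * c;
    for (int i = 0; i < 10; i++)
        p[i] = b;
}
```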

Loop unrolling

/* copy 20 elements */
for (i = 0; i < 20; ++i) {
    a[i] = b[i];
}

/* unrolled four times */
for (i = 0; i < 20; i += 4) {
    a[i] = b[i];
    a[i+1] = b[i+1];
    a[i+2] = b[i+2];
    a[i+3] = b[i+3];
}
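The unrolled loop above is correct only because 20 is a multiple of 4. For an arbitrary element count, an unrolled loop needs a remainder loop for the leftover elements. A minimal sketch, with `copy_unrolled4` as a hypothetical name:

```c
#include <stddef.h>

/* Copy n elements, unrolled four times. The main loop handles groups
   of four; the tail loop copies the 0-3 leftover elements when n is
   not a multiple of four. Writing the test as i + 4 <= n avoids
   unsigned underflow when n < 4. */
void copy_unrolled4(int *a, const int *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a[i]     = b[i];
        a[i + 1] = b[i + 1];
        a[i + 2] = b[i + 2];
        a[i + 3] = b[i + 3];
    }
    for (; i < n; i++)      /* remainder */
        a[i] = b[i];
}
```

Unrolling trades code size for fewer loop-counter updates and branches; compilers commonly apply it automatically at higher optimization levels, so hand-unrolling is worth checking against the compiler's own output.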

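Recognize coding patterns

The compiler recognizes common C idioms and maps them onto compact ARM instruction sequences:

Rotate right:
    x = (x >> n) | (x << (32 - n));
    MOV   R0,R0,ROR R2

Conditional execution:
    if ((x & 0x03) != 0) x >>= 2;
    TST   R0,#+0x3
    MOVNE R0,R0,LSR #+2

Multiply-accumulate:
    MUL   R1,R0,R3
    ADD   R2,R2,R1
is combined into
    MLA   R2,R0,R3,R2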
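The rotate idiom on this slide shifts left by `32 - n`, which is undefined behavior in C when `n` is 0, because shifting a 32-bit value by 32 bits is undefined. A common well-defined alternative masks both shift counts into the range 0..31; GCC and Clang typically recognize this form as a rotate as well. A minimal sketch with a hypothetical helper name `rotr32`; whether a particular compiler, including IAR's, emits a single ROR for it is an assumption to confirm in the disassembly.

```c
#include <stdint.h>

/* Rotate x right by n bits. Masking both shift counts keeps every
   shift in 0..31, so n == 0 (and n >= 32) is well defined. For n == 0
   both shifts are 0 and the result is x | x == x. */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> (n & 31u)) | (x << ((0u - n) & 31u));
}
```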

Parallelism

Some tasks are not suitable for parallelization

Parallelism

Other tasks are easily parallelized
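The slide images that contrast the two cases are not preserved. As a hypothetical illustration (function names invented here): a running sum carries a dependency from one iteration to the next, while an element-wise scale has no such dependency, so its index range can be split across cores freely.

```c
#include <stddef.h>

/* Not easily parallelized as written: out[i] depends on the running
   total from iteration i-1 (a loop-carried dependency). Parallel
   prefix-sum algorithms exist, but they require restructuring the
   computation rather than simply splitting the loop. */
void running_sum(int *out, const int *in, size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += in[i];
        out[i] = acc;
    }
}

/* Easily parallelized: out[i] depends only on in[i], so any subrange
   of indices can be processed by any core, in any order. */
void scale(int *out, const int *in, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}
```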

SIMD

SIMD parallelism = Single Instruction, Multiple Data
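SIMD can be tried from C through the GCC/Clang vector extensions (the `vector_size` attribute). These are compiler extensions rather than standard C and are assumed to be available here; on Arm, the hand-written alternative would be NEON intrinsics from `arm_neon.h`. The type name `v4si` and the function `add4` are hypothetical.

```c
#include <stdint.h>

/* GCC/Clang vector extension: a 16-byte vector of four int32_t lanes.
   On a target with a SIMD unit, the compiler can implement the
   addition below as one instruction operating on all four lanes;
   without one, it lowers the operation to scalar code. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Single instruction, multiple data: one '+' adds four pairs. */
static inline v4si add4(v4si a, v4si b)
{
    return a + b;
}
```

Usage: `v4si c = add4((v4si){1, 2, 3, 4}, (v4si){10, 20, 30, 40});` yields lanes 11, 22, 33, 44, and individual lanes can be read as `c[0]` to `c[3]`.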

Multicores

• Multiple cores on one chip can scale performance
• Each core is a full CPU and can work independently or in concert with other cores

[Diagram: single core vs. Core 1 and Core 2]

Different types of multicore processing

Homogeneous multicore

• Two or more identical processors (cores) that can share main memory, peripherals, an interrupt controller, etc.

• Each processor has its own registers and function units, and may have its own local memory or cache

[Diagram: identical cores, each with local memory, sharing main memory and I/O]

Heterogeneous multicore

• Cores of different types share main memory and peripherals
• Can be used for applications that need both real-time performance and signal-processing capabilities

[Diagram: Core 1 and Core 2, each with local memory, connected to shared memory]

SMP vs. AMP

Symmetric multiprocessing (SMP):
• Each core runs the same code from common memory (typically one OS instance that schedules tasks on any core)
• Requires a homogeneous system

Asymmetric multiprocessing (AMP):
• Each core runs its own code or its own part of the application
• Cores are independent of each other
• Can be used with both homogeneous and heterogeneous multicore processors

Overview

[Diagrams: CPU cores 1 to N, each with a local cache, connected to shared memory/cache, an interrupt controller, and I/O]

                 SMP    AMP
Homogeneous      Yes    Yes
Heterogeneous    No     Yes

Reasons for Using Multicore

Homogeneous SMP

• High performance requirements
  – Max clock is 1-2 GHz per core on ARM Cortex-A9
  – More cores mean more total performance, provided the workload can be parallelized
• A single multicore device makes communication and board layout easier than using multiple separate devices

[Diagram: CPU-1 at N MHz (three instances) vs. CPU-2 at N MHz]

Heterogeneous AMP

• Applications with multiple constraints, e.g.:
  – Throughput vs. interrupt latency
  – Constant sensor data only needs a small core
• Allows for “offloading” of processing by function

[Diagram: CPU-1 (three instances) vs. CPU-2]

Challenges of using multicore

Multicore considerations

• In general, applications run faster with more cores, to the extent that their work can run in parallel
• However:
  – When the application has defects, they are generally much harder to detect and correct
  – Traditional procedural coding may not lend itself well to parallelization
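Amdahl's law, which the slides do not name, quantifies why more cores do not automatically give proportionally more performance: if a fraction p of the single-core run time can run in parallel, the best-case speedup on n cores is S(n) = 1 / ((1 - p) + p / n). A minimal sketch of that formula (the function name is hypothetical):

```c
/* Amdahl's law: best-case speedup on n cores when a fraction p
   (0.0 to 1.0) of the single-core run time can run in parallel and
   the remaining (1 - p) stays serial. Communication, synchronization,
   and memory contention are ignored, so real speedups are lower. */
double amdahl_speedup(double p, unsigned n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

For example, with p = 0.9, four cores give at most about 3.1x, and no number of cores can exceed 1 / (1 - 0.9) = 10x. The serial fraction, not the core count, sets the ceiling.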

Problems with multicore:
• Inefficient parallelization
• Data bottlenecks
• I/O bottlenecks
• Imbalanced workload

An RTOS may help:
– Distribution of tasks/threads across the cores
– Load balancing
– Handling of inter-processor communication
– RTOS example: ThreadX SMP from Express Logic
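An imbalanced workload is easiest to see with static partitioning: when n work items are split across cores in unequal chunks, the core with the largest chunk determines the finishing time. A minimal sketch of an even split, using a hypothetical helper `partition_range`; an SMP RTOS typically balances at the level of threads rather than loop indices, which this code does not model.

```c
#include <stddef.h>

/* Split n work items as evenly as possible across `cores` cores.
   Core `id` (0-based) gets the half-open range [*begin, *end).
   The first n % cores cores take one extra item each, so chunk
   sizes differ by at most one. */
void partition_range(size_t n, size_t cores, size_t id,
                     size_t *begin, size_t *end)
{
    size_t base  = n / cores;
    size_t extra = n % cores;
    *begin = id * base + (id < extra ? id : extra);
    *end   = *begin + base + (id < extra ? 1 : 0);
}
```

For 10 items on 3 cores this yields the ranges [0, 4), [4, 7) and [7, 10).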

Multicore multitasking

[Diagram: the "Decode data" and "Filter data" tasks mapped onto Core 1 and Core 2 in two arrangements; in the second, each core runs its own "Decode data" and "Filter data" tasks]

Debugging in Multicore Environments

Debugging multicore processors

Desired software tool features:
• Visibility of all cores
• Start and stop cores simultaneously or individually
• Multicore breakpoints
  – A breakpoint on one core stops execution on all cores
  – A breakpoint on core A with a condition on core B
• Multicore trace
  – Very challenging for multicore devices whose cores have different trace capabilities

ARM CoreSight™

Source: ARM Ltd.

V8.40 new feature highlights

• Streaming trace
  – Enhanced profiling and code coverage
• C18 support
  – Latest C standard
    o Clarifies some undefined behaviors
    o No compatibility issues with C11
• Full C++17 support
• Enhanced multicore support
  – More than 2 core “groups”
• Improved source browser
  – Separate thread for dramatically enhanced performance
  – Enhanced diagnostic messages
• Documentation comments
  – Editor recognizes Doxygen-format comments
  – These comments appear in tooltips and parameter hints for variables and functions
• Performance monitor enhancements for Cortex-A and Cortex-R

IAR Embedded Workbench SMP support

IAR Embedded Workbench support today:
• One project and one debugger instance for all cores
• Cores can be stopped/run individually or together

IAR Embedded Workbench AMP support

[Screenshot: master (Cortex-A) and slave (Cortex-M4) debugger sessions, with controls to start/stop core0/core1 and to start/stop all cores]

Demo configuration

• ST Discovery board with a dual-core processor
  – Core 1: Cortex-M7 at 400 MHz
  – Core 2: Cortex-M4 at 200 MHz
• Both cores running FreeRTOS
  – Each core has its own project
  – Each core runs its own copy of FreeRTOS

Demo software

• Two separate projects in the same workspace
  – The CM7 project has a task that sends messages to tasks running on the CM4
  – The CM4 project runs two instances of a receive task
  – The CM7 project has a “check” task that verifies the system is still running
• The debugger loads both projects
  – It starts an additional instance of Embedded Workbench for the second debugger
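The demo passes messages from CM7 tasks to CM4 tasks, but the slides do not say which mechanism it uses; FreeRTOS itself provides stream and message buffers for this purpose. To illustrate the underlying idea only, here is a minimal single-producer/single-consumer queue built on C11 atomics. All names are hypothetical, and it assumes the `ring_t` object sits in memory both cores can access and that both cores see a coherent view of it (on a Cortex-M7 with its data cache enabled, that region would typically have to be non-cacheable or explicitly cache-maintained). A real design would also notify the receiver with an inter-core interrupt instead of polling.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared-memory queue: one core only calls ring_send,
   the other only calls ring_receive. RING_SIZE must be a power of
   two so the index mask and the unsigned head - tail difference stay
   correct when the 32-bit counters wrap around. */
#define RING_SIZE 8u

typedef struct {
    _Atomic uint32_t head;          /* written only by the producer */
    _Atomic uint32_t tail;          /* written only by the consumer */
    uint32_t msg[RING_SIZE];
} ring_t;

/* Producer side: returns false if the queue is full. */
bool ring_send(ring_t *r, uint32_t value)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;                               /* full */
    r->msg[head & (RING_SIZE - 1u)] = value;
    /* Release: the message store is visible before the new head. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: returns false if the queue is empty. */
bool ring_receive(ring_t *r, uint32_t *value)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* Acquire: pairs with the producer's release on head. */
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;                               /* empty */
    *value = r->msg[tail & (RING_SIZE - 1u)];
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

Keeping exactly one writer per index variable is what lets this design avoid locks; with more than one sender or receiver per queue, a different scheme would be needed.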

AMP setup for demo

• Pick the M7 project as “master”
  – The choice is arbitrary; one master is needed to launch the slave project
• In the master project, open Debug options, Multicore tab, and point it to the other project to launch at debug time
  – Check the box to enable master
  – Fill in the details of the slave project

Demonstration

Summary

• Multicore enables performance gains when Moore’s law runs out...
• However, multicore presents debugging challenges
• Modern hardware and software tools can help you overcome multicore debugging challenges

Thank you for your attention!

www.iar.com
