Mastering the challenge of multicore SoC debugging
• Aaron Bauch, Sr FAE
Agenda
• Different types of multicore processing
• Reasons for using multicore processors
• Challenges of using multicore devices
• Debugging in multicore environments
• Debugging Demo
• Questions
Global organization
We are a dedicated team that offers superior technology and service that enable customers to create the products of today and the innovations of tomorrow.
• America: 4 offices, 38 employees
• Asia: 3 offices, 24 employees
• Europe: 4 offices, 138 employees
[Chart: Employees IAR Systems, split into Product / Development, Sales / Market / Support, and Administration]
System performance
Performance is improved by...
• Compiler optimizations
– Code motion
– Loop unrolling
– Function inlining
• Parallelism
– Bit
– Instruction
– Data
– Task
• Clock Frequency
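Code motion
/* original: b is recomputed on every iteration */
for (i = 0; i < 10; i++) {
  b = k * c;
  p[i] = b;
}
/* loop-invariant computation moved out of the loop */
b = k * c;
for (i = 0; i < 10; i++) {
  p[i] = b;
}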
Loop unrolling
/* copy 20 elements */
for (i = 0; i < 20; ++i) {
  a[i] = b[i];
}
/* unrolled four times */
for (i = 0; i < 20; i += 4) {
  a[i] = b[i];
  a[i+1] = b[i+1];
  a[i+2] = b[i+2];
  a[i+3] = b[i+3];
}
Recognize coding patterns
• Rotate
x=(x>>n)|(x<<(32-n))
MOV R0,R0,ROR R2
• Conditional shift
if ((x & 0x03) != 0) x >>= 2;
TST R0,#+0x3
MOVNE R0,R0,LSR #+2
• Multiply followed by add becomes multiply-accumulate
MUL R1,R0,R3
ADD R2,R2,R1
MLA R2,R0,R3,R2
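The rotate idiom `x=(x>>n)|(x<<(32-n))` shifts by 32 when `n` is 0, which C leaves undefined. A minimal sketch of a variant that masks the shift counts is shown here; the helper name `rotr32` is hypothetical and not taken from the slides. Compilers commonly recognize this masked form as a rotate as well, though that depends on the toolchain and optimization level.

```c
#include <stdint.h>

/* Rotate x right by n bits (hypothetical helper name).
 * Masking keeps both shift counts in the range 0..31, so n == 0 and
 * n >= 32 never produce an out-of-range shift. */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}
```

With this form, rotating by 0 or by 32 returns the input unchanged.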
Parallelism
Some tasks are not suitable for parallelization
Parallelism
Other tasks are easily parallelized
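As a rough illustration of the difference, the sketch below contrasts a loop whose iterations depend on each other with one whose iterations are independent. Both function names are made up for this example.

```c
#include <stddef.h>

/* Hard to parallelize as written: iteration i needs the result of
 * iteration i-1 (a loop-carried dependency). */
void running_sum(const int *in, int *out, size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += in[i];
        out[i] = acc;
    }
}

/* Easy to parallelize: every element is computed independently, so the
 * index range can be split across cores and processed in any order. */
void scale(const int *in, int *out, size_t n, int factor)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}
```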
SIMD
SIMD parallelism = Single Instruction, Multiple Data
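To make the idea concrete, here is a plain-C emulation of one SIMD-style operation: adding four 8-bit lanes packed into a 32-bit word. This is only a sketch for illustration; real SIMD hardware performs the per-lane operation in a single instruction, and the function name `add_u8x4` is hypothetical.

```c
#include <stdint.h>

/* Add four unsigned 8-bit lanes packed into one 32-bit word.
 * Each lane wraps around on overflow independently; no carry crosses
 * a lane boundary. */
uint32_t add_u8x4(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every lane; the cleared top bit of each
     * lane absorbs the carry out of bit 6. */
    uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Restore the top bit of each lane: a7 XOR b7 XOR carry-in. */
    return low ^ ((a ^ b) & 0x80808080u);
}
```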
Multicores
• Multiple cores on one chip can scale performance
• Each core is a full CPU and can work independently or in concert with other cores
[Diagram: a single Core vs. Core 1 and Core 2]
Different types of multicore processing
Homogeneous multicore
• Two or more identical processors (cores) which can share a main memory, peripherals, interrupt controller etc.
• Each processor has its own registers and function units, and may have its own local memory or cache
[Diagram: cores, each with local memory, sharing I/O and main memory]
Heterogeneous multicore
• Different cores share a main memory and peripherals
• Can be used for applications that need both real-time performance and signal processing capabilities
[Diagram: Core 1 and Core 2, each with local memory, connected to shared memory]
SMP vs. AMP
Symmetric Multi Processing (SMP):
• Each core runs the same code from common memory
• Requires homogeneous system
Asymmetric Multi Processing (AMP):
• Each core runs its own code or part of the application
• Cores are independent of each other
• Can be done with both homogeneous and heterogeneous multicore processors
Overview
[Figure: matrix of block diagrams with columns SMP and AMP and rows Homogeneous and Heterogeneous. The diagrams show CPU Core 1 to CPU Core N, each with a local cache, together with shared memory/cache, an interrupt controller and I/O. An "x" marks the Heterogeneous SMP combination.]
Reasons for Using Multicore
Homogeneous SMP
• High performance requirements
– Max clock is 1-2 GHz per core on ARM Cortex-A9
– More cores means more performance
• Multicore has easier communication and board layout vs. multi-device
[Diagram: CPU-1 and CPU-2 blocks, each labeled N MHz, compared side by side (vs.)]
Heterogeneous AMP
• Applications with multiple constraints, e.g.:
– Throughput vs. interrupt latency
– Constant sensor data only needs small core
• Allows for “offloading” of processing by function
[Diagram: CPU-1 and CPU-2 blocks compared side by side (vs.)]
Challenges using multicore
Multicore considerations
• In general, applications perform faster with more cores
• However:
–When the application has defects, they are generally much harder to detect and correct
–Traditional procedural-based coding may not lend itself well to parallelization
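One defect of this kind that only shows up when cores really run concurrently is the lost update on a shared variable. The sketch below contrasts an unsafe counter with one built on C11 `<stdatomic.h>`. All names are hypothetical, and it assumes a toolchain with C11 atomics support.

```c
#include <stdatomic.h>

/* Shared between cores (hypothetical example variables). */
static int        plain_events;   /* unsafe if two cores increment it concurrently */
static atomic_int atomic_events;  /* safe: each increment is indivisible */

/* Read, add, write back as separate steps. Two cores can interleave
 * these steps so that one increment is lost. Shown only for contrast. */
void count_event_unsafe(void)
{
    plain_events = plain_events + 1;
}

/* Single atomic read-modify-write; no update can be lost. */
void count_event(void)
{
    atomic_fetch_add(&atomic_events, 1);
}

/* Number of events recorded through count_event(). */
int events_seen(void)
{
    return atomic_load(&atomic_events);
}
```

The unsafe version usually passes single-core testing, which is part of why such defects are hard to detect before the code runs on several cores.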
Problems with multicore
• Inefficient parallelization
• Data bottlenecks
• I/O bottlenecks
• Imbalanced workload
An RTOS may help:
– Distribution of tasks/threads across the cores
– Load balancing
– Handling of inter-processor communication
– RTOS Example: ThreadX SMP from Express Logic
Multicore multitasking
[Diagram: "Decode data" on Core 1 and "Filter data" on Core 2, compared with "Decode data" and "Filter data" both running on each of Core 1 and Core 2]
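If each core runs both the decode and the filter stage on its own share of the data, the work has to be split evenly to avoid an imbalanced workload. A minimal sketch of such a split follows; the type `chunk_t` and the function `chunk_bounds` are hypothetical names, not part of any RTOS API.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [begin, end) assigned to one core. */
typedef struct {
    size_t begin;
    size_t end;
} chunk_t;

/* Split n items across `cores` workers so that chunk sizes differ by at
 * most one item. The first (n % cores) workers get one extra item. */
chunk_t chunk_bounds(size_t n, size_t cores, size_t core_id)
{
    assert(cores > 0 && core_id < cores);

    size_t base  = n / cores;
    size_t extra = n % cores;
    chunk_t c;
    c.begin = core_id * base + (core_id < extra ? core_id : extra);
    c.end   = c.begin + base + (core_id < extra ? 1u : 0u);
    return c;
}
```

For 10 items on 4 cores this yields the ranges [0,3), [3,6), [6,8) and [8,10).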
Debugging in Multicore Environments
Debugging multicore processors
Software tool desired features:
• Visibility of all cores
• Start and stop cores simultaneously or individually
• Multicore breakpoints
– BP on 1 core stops execution on all cores
– BP on core A with condition on core B
• Multicore Trace
– Very challenging for multicores with different Trace capabilities
ARM CoreSight™
Source: ARM Ltd.
V8.40 New Feature Highlights
• Streaming Trace
– Enhanced Profiling and Code Coverage
• C18 Support
– Latest C Standard
o Clarifies some undefined behaviors
o No compatibility issues with C11
• Full C++17 Support
• Enhanced multicore support
– More than 2 core “groups”
• Improved source browser
– Separate thread for dramatically enhanced performance
– Enhanced diagnostic messages
• Documentation Comments
– Editor recognizes Doxygen format comments
– Will appear in tooltips and parameter hints for variables and functions
• Performance monitor enhancements for Cortex A and R
IAR Embedded Workbench SMP Support
IAR Embedded Workbench support today:
• 1 project and debugger instance for all cores
• Cores can be stopped/run individually or together
IAR Embedded Workbench AMP Support
Master (Cortex-A) Slave (Cortex-M4)
Start/stop core0/core1 Start/stop all cores
Demo configuration
• ST Discovery board with dual-core processor
– Core 1: M7 at 400 MHz
– Core 2: M4 at 200 MHz
• Both cores running FreeRTOS
– Each core has its own project
– Each core running its own copy of FreeRTOS
Demo software
• Two separate projects in the same workspace
– CM7 project has a task which sends messages to tasks running on CM4
– CM4 has two instances of receive task running
– CM7 has “check” task to see if things are still running
• Debugger loads both projects
– Starts an additional instance of Embedded Workbench for second debugger
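The slides do not show how the CM7 task passes messages to the CM4 tasks. As a rough illustration of the general technique, the sketch below is a single-producer, single-consumer ring buffer built on C11 atomics. It is not the demo's code and not a FreeRTOS API: every name and the queue size are assumptions. On a real dual-core device the structure would also have to live in memory that both cores can access, with caching configured accordingly, and with two receive tasks only one of them could drain a given mailbox.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAILBOX_SLOTS 8u  /* assumed size; must be a power of two */

typedef struct {
    uint32_t    slot[MAILBOX_SLOTS];
    atomic_uint head;  /* advanced only by the consumer */
    atomic_uint tail;  /* advanced only by the producer */
} mailbox_t;

/* Call once before either side uses the mailbox. */
void mailbox_init(mailbox_t *mb)
{
    atomic_init(&mb->head, 0u);
    atomic_init(&mb->tail, 0u);
}

/* Producer side (the sending core). Returns false if the queue is full. */
bool mailbox_send(mailbox_t *mb, uint32_t msg)
{
    unsigned tail = atomic_load_explicit(&mb->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&mb->head, memory_order_acquire);
    if (tail - head == MAILBOX_SLOTS) {
        return false;
    }
    mb->slot[tail % MAILBOX_SLOTS] = msg;
    /* Release: the slot write becomes visible before the new tail. */
    atomic_store_explicit(&mb->tail, tail + 1u, memory_order_release);
    return true;
}

/* Consumer side (the receiving core). Returns false if the queue is empty. */
bool mailbox_receive(mailbox_t *mb, uint32_t *msg)
{
    unsigned head = atomic_load_explicit(&mb->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&mb->tail, memory_order_acquire);
    if (head == tail) {
        return false;
    }
    *msg = mb->slot[head % MAILBOX_SLOTS];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&mb->head, head + 1u, memory_order_release);
    return true;
}
```

Because each index is written by only one side, no lock is needed. This is also the kind of code where halting both cores at a breakpoint helps, since the head and tail values can then be inspected in a consistent state.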
AMP Setup for demo
• Pick M7 project as “master”
– Arbitrary, we need one master to launch a slave project
• Set up Master project in Debug options, Multicore tab to point to other project to launch at debug time
– Check box to enable master
– Fill in details of slave project
Demonstration
Summary
• Multicore enables performance gains when Moore’s law runs out...
• However, multicore presents debugging challenges
• Modern hardware and software tools can help you overcome multicore debugging challenges
Thank you for your attention!
www.iar.com