MPSoC . . .
From System Specification to Hardware/Software Implementation
Frédéric Rousseau, Frédéric Pétrot{[email protected]}
TIMA Laboratory 2
Outline
• Definitions and basic concepts
• MPSoC Architecture (Hardware and Software)
• Traditional design flow
• Abstraction layers
• Conclusion
Definitions (1)
Systems: the meaning depends on the context of discourse
• "An aggregation or assemblage of things so combined by nature or man as to form an integral or complex whole" (Encyclopedia Americana)
• "A regularly interacting or interdependent group of items forming a unified whole" (Webster's Dictionary)
• "A combination of components that act together to perform a function not possible with any of the individual parts" (IEEE Standard Dictionary of Electrical and Electronic Terms)
Definitions (2)
System-on-Chip (SoC)
• System-on-a-chip or system on chip (SoC or SOC) refers to integrating all components of a computer or other electronic system into a single integrated circuit (chip). It may contain digital, analog, mixed-signal, and often radio-frequency functions, all on one chip. A typical application is in the area of embedded systems (Wikipedia)
Embedded systems
• An embedded system is a special-purpose computer system designed to perform one or a few dedicated functions, often with real-time computing constraints. It is usually embedded as part of a complete device including hardware and mechanical parts. In contrast, a general-purpose computer, such as a personal computer, can do many different tasks depending on programming (Wikipedia)
Definitions (3)
MPSoC (Multiprocessor SoC) (A. Jerraya & W. Wolf)
• MPSoCs are the latest incarnation of very large-scale integration (VLSI) technology (about 10^9 transistors in 2008). An MPSoC is simply a system-on-chip that contains multiple instruction-set processors (CPUs)
• In practice, most SoCs are MPSoCs because it is too difficult to design a complex system-on-chip without making use of multiple CPUs
Heterogeneous MPSoC
• MPSoC with CPUs of different kinds. Heterogeneity makes the programming task quite different from programming homogeneous chip multiprocessor (CMP) architectures
MPSoC characteristics
The fact that an MPSoC is a multiprocessor means that
• software design is an inherent part of the overall system design
• parallelism is required to exploit the available computation resources
MPSoC design is interesting and challenging as it mixes the hardware and software design disciplines
• it is different from usual parallel programming
• it takes advantage of integration: bandwidth, latency, wider memory accesses
Examples of MPSoCs and applications
Emotion Engine from the Sony PlayStation 2
• 3 processors (general purpose CPU, 2 vector processing units)
CELL processor from Sony, Toshiba, IBM (PlayStation 3)
• 9 processors (general purpose CPU, 8 processing elements)
Nomadik (from ST) for Nokia mobile phones
STi7200 (from ST) for DVD or HDTV
• 5 processors (general purpose CPU, 4 digital signal processors)
DaVinci (Texas Instruments) for cameras
• 3 processors (general purpose CPU, 2 digital signal processors)
Diopsis D940 (ATMEL) for a massively parallel processor system (Petaflop)
• 3 processors (general purpose CPU, 1 VLIW DSP, 1 network processor) × 2048
Software running on MPSoCs
Between 3 and 9 processors embedded for HDTV, set-top box, DVD, mobile and gaming applications
Each processor executes software
• Code size for HDTV or DVD applications: 1 million lines
• Code size for iPod: 20 million lines
• Human resources for a DVD application: 100 man-years
Challenges in MPSoC design
• Hardware design
• Software design
• Validation of SW and HW parts (here: simulation)
• HW/SW integration
With the following constraints
• Timing constraints (real-time computations)
• Energy efficiency
• Area efficiency
• I/O connection capabilities
• . . .
MPSoC Challenges: feature size
MPSoC Challenges: Number of PEs
Outline
• Definitions and basic concepts
• MPSoC Architecture (Hardware and Software)
• Traditional design flow
• Abstraction layers
• Conclusion
MPSoC generic architecture
Hardware is composed of:
• CPUs
• Memory
• IPs (Intellectual Properties)
• Communication network
• Communication mechanisms (DMA)
• Network interfaces
• Peripherals
Software runs on this architecture
• Parallel
• Relying on hardware support for synchronization, communication, coherency, …
[Figure: generic MPSoC architecture, with application software running on CPUs, memories (MEM), IPs, DMA, peripherals (I/O) and network interfaces]
MPSoC components (1)
CPU
• General purpose CPU, DSP (VLIW), ASIP
• Suppliers and/or processor families: ARM, MIPS, SPARC, POWERPC, INTEL, MOTOROLA, ST, TENSILICA, ATMEL, …
Memory
• Local or global, shared or private, distributed, caches
• Different types of technology: ROM, DRAM, SRAM, SDRAM, FLASH, …
• Different types of use: data and program memory, caches, scratchpad
IP (Intellectual Properties)
• Specific components, provided by IP vendors, that accelerate a given task
Peripherals
• Specific or standard I/O (serial or parallel ports, USB, …)
Implicit components
• Interrupt controller, memory controller, …
MPSoC components (2)
Simple communication network
• Buses (AHB, STbus, …)
Powerful communication network
• NoC (Network on Chip)
• A NoC interconnects processors in the SoC
Network Interface
• Component that performs the physical and structural adaptation between a communication network and another component (CPU, IP, communication network, …)
DMA (Direct Memory Access)
• DMA allows certain hardware subsystems to access system memory for reading and/or writing independently of the CPU. It is commonly used by disk drive controllers, graphics and sound cards, and in MPSoCs for intra-chip communication
Implicit components
• Arbiter, …
NoC: Example of the Spidergon (ST)
High-speed, low-power, small silicon area, packet-based communications protocol
In the Spidergon topology, all of the IP blocks are arranged in a ring and each IP block is connected to its clockwise and its counter-clockwise neighbour as in a simple polygonal ring topology. In addition, however, each IP block is also connected directly to its diagonal counterpart in the network, which allows the routing algorithm to minimize the number of nodes that a data packet has to traverse before reaching its destination (from www.st.com)
Routing strategy: distributed (each router decides the next hop)
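The slide does not name the routing algorithm. As a minimal sketch, assuming the across-first scheme commonly described for Spidergon (take the diagonal link when the destination is more than a quarter of the ring away, then follow the ring in the shorter direction), each router can pick the next hop from the destination address alone. The enum and function names are hypothetical.

```c
/* Output ports of a Spidergon router (names are illustrative). */
typedef enum { PORT_LOCAL, PORT_CW, PORT_CCW, PORT_ACROSS } spidergon_port;

/* Across-first routing decision at node `cur` for a packet bound to `dst`,
 * in a Spidergon of n nodes (n even, n >= 4). */
spidergon_port spidergon_route(int cur, int dst, int n)
{
    int d = ((dst - cur) % n + n) % n;  /* clockwise distance to dst */
    if (d == 0)
        return PORT_LOCAL;              /* arrived: deliver to the local IP */
    if (d <= n / 4)
        return PORT_CW;                 /* close enough: stay on the ring */
    if (d >= n - n / 4)
        return PORT_CCW;
    return PORT_ACROSS;                 /* far away: take the diagonal link */
}

/* Node reached from `cur` through port `p`. */
int spidergon_neighbour(int cur, spidergon_port p, int n)
{
    switch (p) {
    case PORT_CW:     return (cur + 1) % n;
    case PORT_CCW:    return (cur + n - 1) % n;
    case PORT_ACROSS: return (cur + n / 2) % n;
    default:          return cur;
    }
}

/* Links traversed from src to dst when every router applies
 * spidergon_route() locally (distributed routing). */
int spidergon_hops(int src, int dst, int n)
{
    int hops = 0;
    while (src != dst) {
        src = spidergon_neighbour(src, spidergon_route(src, dst, n), n);
        hops++;
    }
    return hops;
}
```

For example, in an 8-node Spidergon a packet from node 0 to node 3 goes across to node 4 and then one step counter-clockwise: 2 hops instead of 3 around the ring.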
Example of MPSoC architecture: CELL
• IBM, Sony (PlayStation 3)
– 241 million transistors, 90 nm and 65 nm technology, 3.2 GHz
Example of MPSoC architecture: STi7200
STi7200: Triple display, HDTV set-top box, dual decoder for H.264 and VC-1 (150 million transistors)
Example of MPSoC architecture: D940
• Diopsis D940
– ARM9 (200 MHz)
– mAgicV DSP
   • 10 arithmetic op/cycle
   • 1.8 GFLOPS (120 MHz)
   • 1.7 Mbits of on-chip SRAM
– A lot of peripherals
   • DBGU, USB, Ethernet, …
Example of MPSoC architecture: D940++
• Evolution of the D940
– 9 tiles (1 RDT-tile with 1 DSP + 1 ARM9, 8 DET-tiles with 1 DSP)
– A Network Processor (DNP) is inserted between each tile and the NoC
– A Spidergon NoC is added
– 6 DNPs for extra communications
– DXM for external memory
Software architecture: HdS
We usually describe software as layers
• Application (set of tasks)
• Hardware dependent Software (HdS)
   – Operating System (OS) and communication primitives
   – Hardware Abstraction Layer (HAL)
• The HdS is responsible for providing specific services to the application layer, and hides the characteristics of the hardware
   – List of services: scheduling the application tasks, communication between the different tasks, external communication with other subsystems, hardware resources management and control, …
   – It facilitates the portability of the application (making the application independent of the HW architecture)
[Figure: software stack, with the Application on top of the HdS, which comprises Comm and OS over the HAL]
Software architecture: The role of layers
Application
• The application layer may be a multi-tasking description or a single-task function of the application, targeted to be executed on the software (processor) subsystem
• A task or thread is a lightweight process that runs sequentially
• Multiple tasks can be executed concurrently by a single CPU or in parallel by multiple CPUs
• On a single CPU, multithreading generally occurs by time slicing, wherein a single processor switches between different threads. This is managed by an Operating System
Operating System (OS)
• The OS manages the sharing of the resources of the architecture. It is responsible for the initialization and management of the application tasks and the communication between them. It provides services such as task scheduling, context switch, synchronization and interrupt management
Communication
• This layer is responsible for managing the I/O operations and more generally the interaction with the hardware components and the other subsystems. It may include different communication protocols, such as a FIFO (first-in-first-out) implemented in software, or communication using dedicated hardware components (DMA)
HAL (Hardware Abstraction Layer)
• The HAL provides a unique programming interface to manipulate hardware devices
• The HAL is a thin software layer which totally depends on the type of processor that will execute the software, but also depends on the hardware resources interacting with the processor
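To illustrate time slicing on a single CPU, here is a minimal sketch of a round-robin policy, assuming each task is written as a step function that runs for one slice and returns. A real OS would instead preempt tasks on a timer interrupt and save their registers with a context switch; all names here (task_step, sched_run, demo_step, …) are hypothetical.

```c
#define MAX_TASKS 8

/* A task is modelled as a step function: it runs for one time slice and
 * returns 0 while it still has work, non-zero once it has finished. */
typedef int (*task_step)(void *arg);

typedef struct {
    task_step step;
    void *arg;
    int done;
} sw_task;

typedef struct {
    sw_task tasks[MAX_TASKS];
    int count;
    int current;            /* index of the task that ran last, -1 at start */
} scheduler_t;

void sched_init(scheduler_t *s)
{
    s->count = 0;
    s->current = -1;
}

/* Register a task; returns its index, or -1 if the table is full. */
int sched_add(scheduler_t *s, task_step step, void *arg)
{
    if (s->count == MAX_TASKS)
        return -1;
    s->tasks[s->count] = (sw_task){ step, arg, 0 };
    return s->count++;
}

/* Round robin: first unfinished task after `current`, wrapping around.
 * Returns -1 once every task has finished. */
int sched_next(const scheduler_t *s)
{
    for (int k = 1; k <= s->count; k++) {
        int i = (s->current + k) % s->count;
        if (!s->tasks[i].done)
            return i;
    }
    return -1;
}

/* Give one time slice at a time to each task until all have finished. */
void sched_run(scheduler_t *s)
{
    int i;
    while ((i = sched_next(s)) >= 0) {
        s->current = i;
        s->tasks[i].done = s->tasks[i].step(s->tasks[i].arg);
    }
}

/* Example task: appends its id to a shared trace for `slices` slices. */
typedef struct { int id; int slices; int *trace; int *len; } demo_arg;

int demo_step(void *p)
{
    demo_arg *a = p;
    a->trace[(*a->len)++] = a->id;
    return --a->slices == 0;
}
```

With task 1 needing 2 slices and task 2 needing 3, the slices interleave as 1, 2, 1, 2, 2. Only sched_next() would change for a priority-based policy.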
Software architecture: The need for an API
Application Programming Interface (API)
• It is a set of functions, procedures, methods or classes that an operating system, library or service provides to support requests made by programs (Wikipedia)
• An API facilitates the portability of each layer that uses it
• HdS API
   – POSIX
• HAL API
   – List of functions: context_switch(), boot() …
[Figure: application tasks (Task 1, Task 2, … Task n) call the HdS API; inside the HdS, Comm and OS call the HAL API, implemented by the HAL]
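The portability argument can be sketched in C, assuming a HAL exposed as a table of function pointers. This is a hypothetical interface, not the context_switch()/boot() API listed above: the HdS service only calls through the table, so porting to another processor means supplying another table. Here the HAL is a host-side simulation that logs register writes.

```c
#include <stdint.h>

/* Hypothetical HAL interface: the only way upper layers touch hardware.
 * Each processor/platform provides its own instance of this table. */
typedef struct {
    void     (*write_reg)(uintptr_t addr, uint32_t value);
    uint32_t (*read_reg)(uintptr_t addr);
} hal_api;

/* HdS communication service written only against the HAL API: it sends a
 * buffer word by word to a memory-mapped FIFO (the address is an assumption). */
void hds_send(const hal_api *hal, uintptr_t fifo_addr,
              const uint32_t *buf, int n)
{
    for (int i = 0; i < n; i++)
        hal->write_reg(fifo_addr, buf[i]);
}

/* One possible HAL: a simulation that logs writes instead of accessing
 * real hardware (a target HAL would use volatile pointer accesses). */
static uint32_t sim_log[16];
static int sim_len;

static void sim_write_reg(uintptr_t addr, uint32_t value)
{
    (void)addr;
    if (sim_len < 16)
        sim_log[sim_len++] = value;
}

static uint32_t sim_read_reg(uintptr_t addr)
{
    (void)addr;
    return 0;
}

const hal_api sim_hal = { sim_write_reg, sim_read_reg };
```

Calling hds_send(&sim_hal, …) exercises the HdS code on the host; the same hds_send() would run unchanged on a target with a different hal_api instance.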
Let's now come back down to earth!
Implementation of a KPN (Kahn process network) communication channel, given the pthread manuals and the dpn header file
• implement a SW FIFO communication API
   – FIFOs have a single producer and a single consumer
   – FIFOs are lossless
   – Initialization and accesses should follow the given SS1 prototypes
• implement a HW/SW FIFO communication API
   – same constraints as above
   – the producer is first SW and the consumer HW, and then the opposite
I'm afraid you'll need me, ...
Solution
RTFM: pthread_create, pthread_mutex_lock/unlock, pthread_cond_wait/signal/broadcast, sem_wait/post
Excellent pthread tutorial at: https://computing.llnl.gov/tutorials/pthreads
I wish I had had one like that when learning it
Think about how reads and writes occur concurrently
• reading wakes up the writer, and blocks when the FIFO is empty
• writing wakes up the reader, and blocks when the FIFO is full
• the channel should be able to sustain a constant throughput on parallel hardware
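The blocking behaviour described above can be sketched with one condition variable per direction. This is a simplified variant that moves one int per call, with hypothetical names rather than the dpn prototypes; the Solution slides copy bulk data outside the lock, which is what sustains throughput.

```c
#include <pthread.h>

#define FIFO_DEPTH 4

/* Lossless single-producer/single-consumer FIFO of ints (illustrative). */
typedef struct {
    int data[FIFO_DEPTH];
    int readp, writep, count;
    pthread_mutex_t mutex;
    pthread_cond_t not_empty, not_full;
} fifo_t;

void fifo_init(fifo_t *f)
{
    f->readp = f->writep = f->count = 0;
    pthread_mutex_init(&f->mutex, NULL);
    pthread_cond_init(&f->not_empty, NULL);
    pthread_cond_init(&f->not_full, NULL);
}

/* Blocks while the FIFO is full, then wakes a waiting reader. */
void fifo_write(fifo_t *f, int v)
{
    pthread_mutex_lock(&f->mutex);
    while (f->count == FIFO_DEPTH)      /* loop guards against spurious wakeups */
        pthread_cond_wait(&f->not_full, &f->mutex);
    f->data[f->writep] = v;
    f->writep = (f->writep + 1) % FIFO_DEPTH;
    f->count++;
    pthread_cond_signal(&f->not_empty);
    pthread_mutex_unlock(&f->mutex);
}

/* Blocks while the FIFO is empty, then wakes a waiting writer. */
int fifo_read(fifo_t *f)
{
    pthread_mutex_lock(&f->mutex);
    while (f->count == 0)
        pthread_cond_wait(&f->not_empty, &f->mutex);
    int v = f->data[f->readp];
    f->readp = (f->readp + 1) % FIFO_DEPTH;
    f->count--;
    pthread_cond_signal(&f->not_full);
    pthread_mutex_unlock(&f->mutex);
    return v;
}
```

A producer thread calling fifo_write() and a consumer thread calling fifo_read() on the same fifo_t then behave as a KPN channel: data arrive in order and nothing is lost.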
Solution
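static int channelRead(Channel *c, void *buf, unsigned long rsize)
{
    char *dst = buf;                       /* byte pointer: no arithmetic on void * */
    int n = rsize * c->cellSize;           /* bytes still to read */
    while (n) {
        pthread_mutex_lock(&c->mutex);     /* status is shared with the writer */
        int p = c->status;                 /* bytes available in the FIFO */
        pthread_mutex_unlock(&c->mutex);
        int m = n > p ? p : n;
        if (m <= c->bsize - c->readp)      /* no wrap-around */
            memcpy(dst, c->data + c->readp, m);
        else {                             /* wrap-around: copy in two parts */
            int end = c->bsize - c->readp;
            memcpy(dst, c->data + c->readp, end);
            memcpy(dst + end, c->data, m - end);
        }
        dst += m;
        c->readp = (c->readp + m) % c->bsize;
        n -= m;
        pthread_mutex_lock(&c->mutex);
        c->status -= m;
        pthread_cond_signal(&c->cond);     /* wake the writer */
        if (c->status == 0 && n != 0)      /* empty: wait for the writer */
            pthread_cond_wait(&c->cond, &c->mutex);
        pthread_mutex_unlock(&c->mutex);
    }
    return rsize;
}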
Solution
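int channelWrite(Channel *c, void *buf, unsigned long wsize)
{
    char *src = buf;                       /* byte pointer: no arithmetic on void * */
    int n = wsize * c->cellSize;           /* bytes still to write */
    while (n) {
        pthread_mutex_lock(&c->mutex);     /* status is shared with the reader */
        int p = c->bsize - c->status;      /* free bytes in the FIFO */
        pthread_mutex_unlock(&c->mutex);
        int m = n > p ? p : n;
        if (m <= c->bsize - c->writep)     /* no wrap-around */
            memcpy(c->data + c->writep, src, m);
        else {                             /* wrap-around: copy in two parts */
            int end = c->bsize - c->writep;
            memcpy(c->data + c->writep, src, end);
            memcpy(c->data, src + end, m - end);
        }
        src += m;
        c->writep = (c->writep + m) % c->bsize;
        n -= m;
        pthread_mutex_lock(&c->mutex);
        c->status += m;
        pthread_cond_signal(&c->cond);     /* wake the reader */
        if (c->bsize - c->status == 0 && n != 0)  /* full: wait for the reader */
            pthread_cond_wait(&c->cond, &c->mutex);
        pthread_mutex_unlock(&c->mutex);
    }
    return wsize;
}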
Solution
int channelReadHS1(Channel *c, void *buf, unsigned long rsize)
{
    int n = rsize * c->cellSize;           /* bytes still to read */
    int size = n;
    while (1) {
        /* bytes currently available in the hardware FIFO */
        int p = rkadr(c->producerAddress + FifoStatus) * c->cellSize;
        int m = n > p ? p : n;
        int i;
        switch (c->cellSize) {
        /* Adapt size here */
        default:
            for (i = (size - n) / 4; i < (m + size - n) / 4; i++)
                ((int *)buf)[i] = rkadr(c->producerAddress);
            break;
        }
        n -= m;
        if (n == 0)
            break;
        /* request an interrupt once enough cells (at most the depth) are ready */
        wkadr(c->producerAddress | FifoThreshold) =
            ((n / c->cellSize) > c->depth) ? c->depth : (n / c->cellSize);
        unmaskInterrupt(c->producerAddress, c->producerInterrupt);
        sem_wait(&c->sem);                 /* released by HS1InterruptHandler */
    }
    return rsize;
}
Solution
void HS1InterruptHandler(void *arg)
{
    Channel *c = (Channel *)arg;
    /* disable the interrupt until the reader re-arms it, then wake the reader */
    maskInterrupt(c->producerAddress, c->producerInterrupt);
    sem_post(&c->sem);
}
Outline
• Definitions and basic concepts
• MPSoC Architecture (Hardware and Software)
• Traditional design flow
• Abstraction layers
• Conclusion
Traditional overall design flow
Specification
• Behavior, constraints, performances, …
Hardware/Software partitioning
• To find the best trade-off between the HW part and the SW part
   – HW: performance, energy consumption
   – SW: flexibility, lower cost
Concurrent and separated HW and SW design
• HW: architecture and component design or reuse
• SW: parallelization, layer design
• Implicit steps: validation (by simulation)
Difficulty of the HW/SW integration
• The software can be validated only when the HW architecture is ready
• Part of the software is closely linked to the underlying hardware architecture (the HAL is usually written in assembly language)
[Figure: Specification, then Hardware/Software partitioning, then Hardware design and Software design in parallel, then Hardware/Software integration, yielding the MPSoC]
Traditional overall design flow: in the real life
The hardware architecture already exists
• Too long to design, too expensive for one application
   – Only designed by silicon vendors (ATMEL, ST, NXP, IBM, …)
• It is built to support a set of applications
   – With programmable devices (CPU)
   – With configurable features (caches, external memory, …)
We can focus on SW design and validation
• For each layer (the HAL is usually provided with the HW architecture)
• By simulation and prototyping
• We need to validate each SW layer separately
Outline
• Definitions and basic concepts
• MPSoC Architecture (Hardware and Software)
• Traditional design flow
• Abstraction layers
• Conclusion
The need of abstraction levels
An abstraction level is a way of hiding the implementation details of a particular set of functionality
• Indeed, not all details are required throughout the design process
   – For example, the synchronization method (polling or interrupts) may be decided later in the design process, as may the memory mapping, protocols, …
• A design flow usually includes different abstraction levels
   – Each level takes into account only the characteristics it needs
   – Refinement is the process of going from one abstraction level to a lower abstraction level
Abstraction levels in SW design for MPSoC
We define 4 abstraction levels in SW design for MPSoC
• System architecture
   – SW: a set of communicating functions (application)
   – HW: high-level description of the architecture
   – Interest: validation of the functionality
• Virtual architecture
   – SW: a set of tasks for each processor
   – HW: description of the architecture + communication mechanisms (inter and intra processor subsystem)
   – Interest: validation of the partitioning and mapping
• Transaction accurate architecture
   – SW: a set of tasks for each processor + OS
   – HW: description of the architecture + communication mechanisms (inter processor subsystem)
   – Interest: validation of the OS and communication (intra processor subsystem)
• Cycle accurate
   – SW: full SW with all layers
   – HW: full HW architecture
   – Interest: validation of the HAL
System Architecture
Characteristics
• SW: set of tasks or functions
• HW: a set of interconnected components
• Mapping: defines, for each task or function, the processing unit (CPU) on which it is executed
Possible implementation
• SW: Kahn process network, …
• HW: XML, Matlab, …
[Figure: application tasks T1, T2 and T3 mapped onto software subsystems SW-SS1 and SW-SS2]
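The mapping step can be illustrated with a small table. A sketch, assuming T1 is mapped on SW-SS1 and T2, T3 on SW-SS2 (an assumption read from the virtual architecture slide) and using hypothetical names: once the mapping is fixed, each channel can be classified as intra-subsystem (e.g. a software FIFO) or inter-subsystem (through the interconnect), which is what the lower abstraction levels then refine.

```c
#include <string.h>

/* Hypothetical mapping of the example tasks onto software subsystems. */
typedef struct { const char *task; const char *subsystem; } mapping_entry;

static const mapping_entry mapping[] = {
    { "T1", "SW-SS1" },
    { "T2", "SW-SS2" },
    { "T3", "SW-SS2" },
};

/* Subsystem a task is mapped on, or NULL if the task is unmapped. */
const char *subsystem_of(const char *task)
{
    for (size_t i = 0; i < sizeof mapping / sizeof mapping[0]; i++)
        if (strcmp(mapping[i].task, task) == 0)
            return mapping[i].subsystem;
    return NULL;
}

/* 1 if a producer->consumer channel crosses subsystems (interconnect),
 * 0 if both ends share a subsystem, -1 if either task is unmapped. */
int is_inter_subsystem(const char *producer, const char *consumer)
{
    const char *a = subsystem_of(producer);
    const char *b = subsystem_of(consumer);
    if (a == NULL || b == NULL)
        return -1;
    return strcmp(a, b) != 0;
}
```

With this table, a T1 to T2 channel must cross the interconnect, while a T2 to T3 channel can stay inside SW-SS2.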
Virtual Architecture
Characteristics
• SW: final application code & HdS API
• HW: explicit communication (done by the HW model)
• Explicit mapping to execution subsystems
• Implicit execution models & task control
Possible implementation
• C code for tasks
• SystemC for the HW
[Figure: virtual architecture, with abstract CPU subsystems SS1 (task T1) and SS2 (tasks T2, T3), each on top of an HdS API, and a HW subsystem, all connected through an interconnect]
Task code of T1 (computation code F1, F2 between communication API calls):

   static int B[10], C[20], D[10];
   void task_T1(void)
   {
       while (1) {
           recv_data(reg_ch, B, 10);     /* communication API */
           F1(B, C);                     /* computation code */
           F2(C, D);
           send_data(gmem_ch, D, 10);    /* communication API (to memory) */
       }
   }

Communication API:

   void recv_data(REG_Ch *ch, void *dst, int size)
   {
       ch->read(ch->address, (word *)dst, size);   /* copy into the caller's buffer */
   }

Communication primitive (SystemC channel):

   class REG_Ch : public sc_prim_channel {
       word *buffer;
   public:
       void read(unsigned int addr, word *dst, int size) {
           for (int i = 0; i < size; i++)
               dst[i] = buffer[addr + i];
       }
       …
   };
Transaction Accurate Architecture
Characteristics
• SW: explicit OS, specific I/O, HAL API
• HW: explicit communication and peripherals & abstract computation model of the CPU
Possible implementation
• C code for tasks and OS
• SystemC for the HW
[Figure: transaction accurate architecture, with CPU subsystems CPU-SS1 and CPU-SS2, each made of an abstract CPU, peripherals, interface and memory, running tasks (T1, T2, T3) over the HdS API, Comm/OS and HAL API; a HW subsystem; all connected through an interconnect]
Example of TA SW code

OS:

   void __schedule(void)
   {
       int old_tid = cur_tid;
       cur_tid = new_tid;                    /* task selected by the scheduling policy */
       __cxt_switch(tasks[old_tid].cxt, tasks[cur_tid].cxt);   /* tasks[]: task table */
       …
   }

task_T2.c:

   channel ch_fifo, ch_reg;
   void task_t2(void)
   {
       …
       recv_data(ch_fifo, B, 10);
       recv_data(ch_reg, C, 20);
       …
   }

Communication SW:

   void recv_data(channel ch, void *dst, int size)
   {
       switch (ch.protocol) {
       case FIFO:
           if (ch.state == EMPTY)
               __schedule();                 /* let another task run until data arrive */
           …
       case REG:
           _io_read_reg(ch.addr, dst, size); /* copy the registers into dst */
           …
       }
   }

main.c:

   extern void task_t2(void);
   void __start(void)
   {
       …
       create_task(task_t2);
       …
   }
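The protocol switch in recv_data() can be made concrete with a self-contained sketch. It assumes a hypothetical channel_t whose FIFO storage and register window are simulated in RAM, and it returns -1 at the point where the TA code would call __schedule().

```c
#include <string.h>

/* Hypothetical channel descriptor: the protocol is fixed when the software
 * is refined for a given architecture. */
typedef enum { PROTO_FIFO, PROTO_REG } protocol_t;

typedef struct {
    protocol_t protocol;
    int *base;       /* FIFO storage or register window (simulated in RAM) */
    int depth;       /* FIFO capacity in words (unused for PROTO_REG) */
    int readp;       /* FIFO read index */
    int count;       /* words currently in the FIFO */
} channel_t;

/* Communication-layer read: 0 on success, -1 when the FIFO holds fewer
 * than `size` words (an OS would call its scheduler here instead). */
int recv_words(channel_t *ch, int *dst, int size)
{
    switch (ch->protocol) {
    case PROTO_FIFO:
        if (ch->count < size)
            return -1;
        for (int i = 0; i < size; i++) {
            dst[i] = ch->base[ch->readp];
            ch->readp = (ch->readp + 1) % ch->depth;
        }
        ch->count -= size;
        return 0;
    case PROTO_REG:
        memcpy(dst, ch->base, (size_t)size * sizeof *dst);
        return 0;
    }
    return -1;
}
```

The task code only sees recv_words(); changing a channel from a register to a FIFO changes its descriptor, not the task.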
Cycle Accurate/Approximate, Bit Accurate
Characteristics
• SW: explicit HAL functions (in assembly language)
• HW: explicit execution models (CPU, communication)
• HdS implements the communication API & task control over the OS and architecture
Possible implementation
• SystemC and C code
• Simulator (executable model) and C code
[Figure: virtual prototype, with CPU subsystems CPU-SS1 and CPU-SS2, each made of an ISS (instruction set simulator), peripherals, interface and memory, running tasks T1, T2, T3 over the HdS API, Comm/OS, HAL API and HAL; a HW subsystem; all connected through an interconnect]
   /* Context switch routine for ARM architecture */
   void __cxt_switch(int old_fid_cxt, int cur_fid_cxt)
   {
       __asm {
           STMIA r0!, {r0-r14}   ; save the old context registers
           MRS   r5, cpsr        ; we get the cpsr
           MRS   r4, spsr        ; and the spsr
           . . .
           LDMIA r1, {r0-r14}    ; load the new context registers
           MOV   PC, LR          ; and we branch
       }
   }

   /* Context switch routine for MIPS R3000 architecture */
   void __cxt_switch(int old_fid_cxt, int cur_fid_cxt)
   {
       __asm {
           SW $at, 4*1($a0)      # save the old context registers
           . . .
           SW $ra, 4*31($a0)
           LW $at, 4*1($a1)      # load the new context registers
           . . .
           LW $ra, 4*31($a1)
       }
   }
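On a host machine, the save/restore that __cxt_switch() performs can be observed with the POSIX ucontext functions (getcontext, makecontext, swapcontext; obsolescent in POSIX but still provided by glibc). This is a minimal sketch with hypothetical task bodies, not the HAL of the slides: each swapcontext() saves the current registers and stack pointer and loads those of the target context, as the assembly routines do by hand.

```c
#define _XOPEN_SOURCE 700
#include <ucontext.h>

/* The scheduler (main flow) and two tasks each own a saved context. */
static ucontext_t sched_ctx, task_ctx[2];
static char stacks[2][64 * 1024];
static int trace[8], trace_len;

/* Each task records its id, yields back to the scheduler, then finishes. */
static void task_body(int id)
{
    trace[trace_len++] = id;
    swapcontext(&task_ctx[id], &sched_ctx);  /* save task regs, load scheduler */
    trace[trace_len++] = id + 10;
    /* Returning resumes uc_link, i.e. the scheduler context. */
}

void run_two_tasks(void)
{
    for (int id = 0; id < 2; id++) {
        getcontext(&task_ctx[id]);
        task_ctx[id].uc_stack.ss_sp = stacks[id];
        task_ctx[id].uc_stack.ss_size = sizeof stacks[id];
        task_ctx[id].uc_link = &sched_ctx;
        makecontext(&task_ctx[id], (void (*)(void))task_body, 1, id);
    }
    /* Round robin: each swapcontext saves the scheduler's registers and
     * restores the chosen task's, like __cxt_switch(old, new). */
    for (int round = 0; round < 2; round++)
        for (int id = 0; id < 2; id++)
            swapcontext(&sched_ctx, &task_ctx[id]);
}
```

After run_two_tasks(), trace holds 0, 1, 10, 11: both tasks start, yield, and are then resumed exactly where they left off.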
Outline
• Definitions and basic concepts
• MPSoC Architecture (Hardware and Software)
• Traditional design flow
• Abstraction layers
• Conclusion
Conclusion
Nowadays, most electronic systems include at least one MPSoC
While the HW architecture is designed by silicon vendors, user applications are developed by multimedia application providers
Software development is a very long and tedious process for today's applications
Different abstraction levels are required in order to develop and validate software layers early
Slides available soon at • http://users-tima.imag.fr/sls/petrot/