System on Chip Architectures


Page 1: System on chip architectures

SYSTEM DESIGN

Mr. A. B. Shinde

Assistant Professor,

Electronics Engineering,

PVPIT, Budhgaon.

[email protected]

Page 2: System on chip architectures

CONCEPT OF SYSTEM

A system is a collection of elements or components that are organized for a

common purpose.

A system is a set of interacting or interdependent components forming an

integrated design.

A system has structure: it contains parts (or components) that are directly

or indirectly related to each other;

A system has behavior: it exhibits processes that fulfill its function or

purpose;

A system has interconnectivity: the parts and processes are connected by

structural and/or behavioral relationships.

2

Page 3: System on chip architectures

SYSTEM

Elements of a system

Input: The inputs are fed to the system in order to produce the output.

Output: The elements that exist in the system due to the processing of the inputs are known as the output.

Processor: It is the operational component of a system which

processes the inputs.

Control: The control element guides the system. It is the decision-making sub-system that controls activities such as governing inputs, processing them, and generating output.

Boundary and interface: The limits that identify its components,

processes and interrelationships when it interfaces with another

system.

3

Page 4: System on chip architectures

IMPORTANCE OF SYSTEM ARCHITECTURES

A system architecture is the conceptual model that defines the structure, behavior (functioning), and other views of a system.

A system architecture can comprise:

system components,

the externally visible properties of those components,

the relationships between them.

It can provide a plan from which products can be procured, and

systems developed, that will work together to implement the overall

system.

4

Page 5: System on chip architectures

SYSTEM ON CHIP

System-on-a-chip (SoC or SOC) refers to integrating all components

of a computer or other electronic system into a single integrated circuit

(chip).

It may contain digital, analog, or mixed-signal functions, all on one semiconductor chip.

5

Page 6: System on chip architectures

SIMD: SINGLE INSTRUCTION MULTIPLE DATA

6

Page 7: System on chip architectures

SIMD

Single Instruction Multiple Data (SIMD) is a class of parallel computers in Flynn's taxonomy.

In computing, SIMD is a technique employed to achieve data-level parallelism.

7

Page 8: System on chip architectures

SIMD

SIMD machines are capable of applying the

exact same instruction stream to multiple

streams of data simultaneously.

This type of architecture is perfectly suited to

achieving very high processing rates, as

the data can be split into many different

independent pieces, and the multiple

instruction units can all operate on them at

the same time.

8

For example: each of 64,000 processors in a Thinking

Machines CM-2 would execute the same instruction at the same

time so that you could do 64,000 multiplies on 64,000 pairs of

numbers at a time.

Page 9: System on chip architectures

SIMD

9

[Figures: SIMD-processable patterns vs. SIMD-unprocessable patterns; brightness computation by SIMD operations]

Page 10: System on chip architectures

SIMD TYPES

Synchronous (lock-step):

These systems are synchronous, meaning that they are built in such a way as to guarantee that all instruction units will receive the same instruction at the same time, and thus all will potentially be able to execute the same operation simultaneously.

Deterministic SIMD architectures:

These are deterministic because, at any one point in time, there is only one instruction being executed, even though multiple units may be executing it. So, every time the same program is run on the same data, using the same number of execution units, exactly the same result is guaranteed at every step in the process.

Well-suited to instruction/operation level parallelism:

The "single" in single-instruction doesn't mean that there's only one instruction unit, as it does in SISD, but rather that there's only one instruction stream, and this instruction stream is executed by multiple processing units on different pieces of data, all at the same time, thus achieving parallelism.

10

Page 11: System on chip architectures

SIMD (ADVANTAGES)

Consider an application where the same value is being added to (or subtracted from) a large number of data points, a common operation in many multimedia applications.

One example would be changing the brightness of an image.

To change the brightness, the R, G, and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory.

The data is understood to be in blocks, and a number of values can be loaded all at once.

Instead of a series of instructions saying "get this pixel, now get the next pixel", a SIMD processor will have a single instruction that effectively says "get lots of pixels". This can take much less time than "getting" each pixel individually, as with a traditional CPU design.

If the SIMD system works by loading up eight data points at once, the add operation being applied to the data will happen to all eight values at the same time.
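As a rough illustration of this idea, here is a minimal C sketch of the brightness adjustment using x86 SSE2 intrinsics (the function name and the 16-bytes-at-a-time width are illustrative assumptions, not something specified in the slides):

/* Illustrative sketch: brightness adjustment, 16 pixel bytes per instruction. */
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>

void brighten(unsigned char *pixels, size_t n, unsigned char delta)
{
    __m128i d = _mm_set1_epi8((char)delta);                    /* 16 copies of the brightness step */
    for (size_t i = 0; i + 16 <= n; i += 16) {
        __m128i p = _mm_loadu_si128((__m128i *)(pixels + i));  /* "get lots of pixels" */
        p = _mm_adds_epu8(p, d);                               /* one add applied to all 16 values,
                                                                  saturating at 255 instead of wrapping */
        _mm_storeu_si128((__m128i *)(pixels + i), p);
    }
}

A scalar loop would need one load, add, and store per byte; the SIMD version issues the same three operations once per 16 bytes.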

11

Page 12: System on chip architectures

SIMD (DISADVANTAGES)

Not all algorithms can be vectorized.

Implementing an algorithm with SIMD instructions usually requires

human labor; most compilers don't generate SIMD instructions from a typical

C program, for instance.

Programming with particular SIMD instruction sets can involve numerous

low-level challenges.

It has restrictions on data alignment.

Gathering data into SIMD registers and scattering it to the correct

destination locations is tricky and can be inefficient.

Specific instructions like rotations or three-operand addition aren't in some SIMD instruction sets.

12

Page 13: System on chip architectures

SISD: SINGLE INSTRUCTION SINGLE DATA

13

Page 14: System on chip architectures

SISD

This is the oldest style of computer

architecture, and still one of the most

important: all personal computers fit within this category.

Single instruction refers to the fact that there

is only one instruction stream being acted on

by the CPU during any one clock tick;

single data means, analogously, that one and

only one data stream is being employed as

input during any one clock tick.

14

Page 15: System on chip architectures

SISD

In computing, SISD is a term referring to

a computer architecture in which a single processor (uniprocessor) executes a single instruction stream to operate on data stored in a single memory.

This corresponds to the Von Neumann

Architecture.

Instruction fetching and pipelined execution of

instructions are common examples found

in most modern SISD computers.

15

Page 16: System on chip architectures

CHARACTERISTICS OF SISD

Serial: Instructions are executed one after the other, in lock-step; this type of sequential execution is commonly called serial, as opposed to parallel, in which multiple instructions may be processed simultaneously.

Deterministic: Because each instruction has a unique place in the execution stream, and thus a unique time during which it and it alone is being processed, the entire execution is said to be deterministic, meaning that you (can potentially) know exactly what is happening at all times and, ideally, you can exactly recreate the process, step by step, at any later time.

Examples:

All personal computers,

All single-instruction-unit-CPU workstations,

Mini-computers, and

Mainframes.

16

Page 17: System on chip architectures

MIMD: MULTIPLE INSTRUCTION MULTIPLE DATA

17

Page 18: System on chip architectures

MIMD

In computing, MIMD is a technique employed to achieve parallelism.

Machines using MIMD have a number of processors that function asynchronously and independently.

At any time, different processors may be executing different instructions on different pieces of data.

MIMD architectures may be used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, and as communication switches.

18

Page 19: System on chip architectures

MIMD

MIMD machines can be of either

shared memory or distributed

memory categories.

Shared memory machines

may be of the bus-based,

extended or hierarchical type.

Distributed memory machines

may have hypercube or mesh

interconnection schemes.

19

Page 20: System on chip architectures

MIMD: SHARED MEMORY MODEL

The processors are all connected to a "globally available" memory, via either software or hardware means. The operating system usually maintains its memory coherence.

Bus-based: MIMD machines with shared memory have processors which share a

common, central memory.

Here all processors are attached to a bus which connects them to memory.

This setup works only up to the point where there is too much contention on the bus.

Hierarchical: MIMD machines with hierarchical shared memory use a hierarchy of buses to give processors access to each other's memory.

Processors on different boards may communicate through inter-nodal buses.

Buses support communication between boards.

With this type of architecture, the machine may support over a thousand processors.
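As a software-level analogy (threads standing in for processors; this sketch is not from the slides), the following C program has several threads updating one globally visible variable, with a lock providing the coordination that the shared-memory model requires:

#include <pthread.h>
#include <stdio.h>

static long counter = 0;                               /* the "globally available" memory */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);                     /* serialize access to the shared location */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);                /* 400000: every thread saw the same memory */
    return 0;
}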

20

Page 21: System on chip architectures

MIMD: DISTRIBUTED MEMORY MODEL

In distributed memory MIMD machines, each processor has its own individual memory location. Each processor has no direct knowledge about the other processors' memory.

For data to be shared, it must be passed from one processor to another as a message. Since there is no shared memory, contention is not as great a problem with these machines.

It is not economically feasible to connect a large number of processors directly to each other. A way to avoid this multitude of direct connections is to connect each processor to just a few others.

The amount of time required for processors to perform simple message routing can be substantial.

Systems were designed to reduce this time loss, and hypercube and mesh are two of the popular interconnection schemes.
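A minimal message-passing sketch, assuming MPI as the communication layer (MPI is an assumption here; the slides do not name a library), shows how data moves between the private memories described above:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);              /* each rank has its own private memory */

    if (rank == 0) {
        value = 42;                                    /* exists only in rank 0's memory ...   */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);         /* ... until it arrives as a message    */
    }
    MPI_Finalize();
    return 0;
}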

21

Page 22: System on chip architectures

MIMD: DISTRIBUTED MEMORY MODEL

Interconnection schemes:

Hypercube interconnection network: In an MIMD distributed memory machine with a hypercube system interconnection network containing four processors, a processor and a memory module are placed at each vertex of a square.

The diameter of the system is the minimum number of steps it takes for one processor to send a message to the processor that is the farthest away.

So, for example, in a hypercube system with eight processors, with each processor and memory module placed at a vertex of a cube, the diameter is 3. In general, for a system that contains 2^N processors, with each processor directly connected to N other processors, the diameter of the system is N.
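A small C sketch of this property (illustrative only): if node IDs are N-bit numbers and each link flips one bit, the minimum number of hops between two nodes is the Hamming distance of their IDs, so the diameter of a 2^N-node hypercube is N.

#include <stdio.h>

static int hops(unsigned a, unsigned b)
{
    unsigned diff = a ^ b;                 /* bits in which the two node IDs differ */
    int count = 0;
    while (diff) { count += diff & 1u; diff >>= 1; }
    return count;                          /* = minimum number of hypercube hops */
}

int main(void)
{
    /* 8 processors = 2^3, so node IDs run 0..7 and the diameter is 3. */
    printf("hops(0,7) = %d\n", hops(0, 7));    /* opposite corners of the cube: 3 */
    printf("hops(0,1) = %d\n", hops(0, 1));    /* directly connected neighbors: 1 */
    return 0;
}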

Mesh interconnection network: In an MIMD distributed memory machine with a mesh interconnection network, processors are placed in a two-dimensional grid.

Each processor is connected to its four immediate neighbors. Wraparound connections may be provided at the edges of the mesh.

One advantage of the mesh interconnection network over the hypercube is that the mesh system need not be configured in powers of two.

22

Page 23: System on chip architectures

MIMD: CATEGORIES

The most general of all of the major categories, a MIMD machine is capable of being programmed to operate as if it were in fact any of the four.

Synchronous or asynchronous: MIMD instruction streams can potentially be executed either synchronously or asynchronously, i.e., either in tightly controlled lock-step or in a more loosely bound "do your own thing" mode.

Deterministic or non-deterministic: MIMD systems are potentially capable of deterministic behavior, that is, of reproducing the exact same set of processing steps every time a program is run on the same data.

Well-suited to block, loop, or subroutine level parallelism: The more code each processor in an MIMD assembly is given domain over, the more efficiently the entire system will operate, in general.

Multiple Instruction or Single Program: MIMD-style systems are capable of running in true "multiple-instruction" mode, with every processor doing something different, or every processor can be given the same code; this latter case is called SPMD, "Single Program Multiple Data", and is a generalization of SIMD-style parallelism.

23

Page 24: System on chip architectures

MISD: MULTIPLE INSTRUCTION SINGLE DATA

24

Page 25: System on chip architectures

MISD

In computing, MISD is a type of parallel computing architecture where many functional units perform different operations on the same data.

Pipeline architectures belong to this type.

Fault-tolerant computers executing the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type.

Not many instances of this architecture exist, as MIMD and SIMD are often more appropriate for common data parallel techniques.

25

Page 26: System on chip architectures

MISD

Another example of an MISD process is one that is carried out routinely at the United Nations.

When a delegate speaks in a

language of his/her choice, his

speech is simultaneously

translated into a number of other

languages for the benefit of

other delegates present. Thus

the delegate's speech (a single

data) is being processed by a

number of translators

(processors) yielding different

results.

26

Page 27: System on chip architectures

MISD

MISD Examples:

Multiple frequency filters operating on a single signal stream.

Multiple cryptography algorithms attempting to crack a single

coded message.

Both of these are examples of this type of processing where

multiple, independent instruction streams are applied simultaneously

to a single data stream.
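As a loose illustration of the first example (a sequential sketch, not a real MISD machine; the two filters are made up for illustration), two different operations are applied to the same signal stream:

#include <stdio.h>

#define N 8

int main(void)
{
    double signal[N] = {1, 4, 2, 8, 5, 7, 3, 6};          /* the single data stream */
    double lowpass[N], highpass[N];

    for (int i = 1; i < N; i++) {
        lowpass[i]  = 0.5 * (signal[i] + signal[i - 1]);  /* smoothing filter  */
        highpass[i] = signal[i] - signal[i - 1];          /* difference filter */
    }
    for (int i = 1; i < N; i++)
        printf("%d: low=%.1f  high=%.1f\n", i, lowpass[i], highpass[i]);
    return 0;
}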

27

Page 28: System on chip architectures

PIPELINING

28

Page 29: System on chip architectures

PIPELINING

In computing, a pipeline is a set of data processing

elements connected in series, so that the output of one element is

the input of the next one.

The elements of a pipeline are often executed in parallel or in time-sliced fashion.

29

Page 30: System on chip architectures

PIPELINING (CONCEPT AND MOTIVATION)

Consider the washing of a car in three steps: washing, drying, and polishing.

A car on the washing line can have only one of the three steps done at

once. After the car has its washing, it moves for drying, leaving the

washing facilities available for the next car.

The first car then moves on to polishing, the second car to drying, and a

third car begins to have its washing.

If each operation needs 30 minutes, then finishing all three cars when only one car can be operated on at once would take 270 minutes (3 cars x 3 steps x 30 minutes).

On the other hand, using the washing line, the total time to complete all three is 150 minutes. At this point, an additional car comes off the washing line every 30 minutes.
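The arithmetic behind those two figures can be sketched in a few lines of C (illustrative only):

#include <stdio.h>

int main(void)
{
    int cars = 3, stages = 3, minutes_per_stage = 30;

    int sequential = cars * stages * minutes_per_stage;        /* 3 * 3 * 30 = 270 minutes       */
    int pipelined  = (stages + cars - 1) * minutes_per_stage;  /* (3 + 3 - 1) * 30 = 150 minutes */

    printf("sequential: %d minutes\n", sequential);
    printf("pipelined : %d minutes\n", pipelined);
    /* Once the pipeline is full, one finished car emerges every 30 minutes. */
    return 0;
}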

30

Page 31: System on chip architectures

PIPELINING (IMPLEMENTATIONS)

Buffered, Synchronous pipelines:

Conventional microprocessors are synchronous circuits that use buffered, synchronous pipelines. In these pipelines, "pipeline registers" are inserted in between pipeline stages and are clocked synchronously.

Buffered, Asynchronous pipelines:

Asynchronous pipelines are used in asynchronous circuits, and have their pipeline registers clocked asynchronously. Generally speaking, they use a request/acknowledge system, wherein each stage can detect when it is finished.

When a stage is finished and the next stage has sent it a "request" signal, the stage sends an "acknowledge" signal to the next stage, and a "request" signal to the previous stage. When a stage receives an "acknowledge" signal, it clocks its input registers, thus reading in the data from the previous stage.

Unbuffered pipelines:

Unbuffered pipelines, called "wave pipelines", do not have registers in between pipeline stages.

Instead, the delays in the pipeline are "balanced" so that, for each stage, the difference between the first stabilized output data and the last is minimized.

31

Page 32: System on chip architectures

INSTRUCTION PIPELINE

An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput (the number of instructions that can be executed in a unit of time).

The fundamental idea is to split the processing of a computer instruction into a series of independent steps, with storage at the end of each step. This allows the computer's control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all steps at once.

32

Page 33: System on chip architectures

INSTRUCTION PIPELINE

For example, the classic RISC

pipeline is broken into five stages

with a set of flip flops between

each stage.

Instruction fetch

Instruction decode and register

fetch

Execute

Memory access

Register write back
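A small C sketch (illustrative, assuming an ideal pipeline with no stalls) prints which of the five stages each instruction occupies on each clock cycle, showing how the stages overlap:

#include <stdio.h>

int main(void)
{
    const char *stage[5] = {"IF", "ID", "EX", "MEM", "WB"};
    int instructions = 4;

    for (int cycle = 0; cycle < instructions + 5 - 1; cycle++) {
        printf("cycle %d:", cycle + 1);
        for (int i = 0; i < instructions; i++) {
            int s = cycle - i;                  /* instruction i enters IF on printed cycle i+1 */
            if (s >= 0 && s < 5)
                printf("  I%d:%s", i + 1, stage[s]);
        }
        printf("\n");
    }
    return 0;
}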

33

Page 34: System on chip architectures

PIPELINING (ADVANTAGES AND DISADVANTAGES)

Pipelining does not help in all cases. An instruction pipeline is said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.

Advantages of Pipelining: The cycle time of the processor is reduced, thus increasing the instruction issue rate in most cases.

Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry.

Disadvantages of Pipelining: A non-pipelined processor executes only a single instruction at a time. This prevents branch delays and problems with serial instructions being executed concurrently. Consequently, the design is simpler and cheaper to manufacture.

The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is due to the fact that extra flip flops must be added to the data path of a pipelined processor.

A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.

34

Page 35: System on chip architectures

PARALLEL COMPUTING

35

Page 36: System on chip architectures

PARALLEL COMPUTING

Parallel computing is a form of computation in which many

calculations are carried out simultaneously, operating on the principle

that large problems can often be divided into smaller ones, which

are then solved concurrently ("in parallel").

There are several different forms of parallel computing:

bit-level,

instruction level,

data, and

task parallelism.

Parallelism has been employed for many years, mainly in high-performance computing.

As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multicore processors.

36

Page 37: System on chip architectures

PARALLEL COMPUTING

Computer software has traditionally been written for serial computation. To solve a problem, an algorithm is constructed and implemented as a serial stream of instructions. Only one instruction may execute at a time; after that instruction is finished, the next is executed.

Parallel computing, on the other hand, uses multiple processing elements simultaneously to solve a problem.

This is accomplished by breaking the problem into independent parts so that each processing element can execute its part of the algorithm simultaneously with the others.

The processing elements can be diverse and include resources such as a single computer with multiple processors, several networked computers, specialized hardware, or any combination of the above.

37

Page 38: System on chip architectures

TYPES OF PARALLELISM

Bit-level parallelism: From the advent of VLSI in the 1970s until about 1986, speed-up in computer architecture was driven by doubling the computer word size, the amount of information the processor can manipulate per cycle. Increasing the word size reduces the number of instructions the processor must execute to perform an operation on variables whose sizes are greater than the length of the word.

Instruction-level parallelism: A computer program is, in essence, a stream of instructions executed by a processor. These instructions can be re-ordered and combined into groups which are then executed in parallel without changing the result of the program. This is known as instruction-level parallelism.

Data parallelism: Data parallelism is parallelism inherent in program loops, which focuses on distributing the data across different computing nodes to be processed in parallel.

Task parallelism: Task parallelism is the characteristic of a parallel program that "entirely different calculations can be performed on either the same or different sets of data". This contrasts with data parallelism, where the same calculation is performed on the same or different sets of data.

38

Page 39: System on chip architectures

TYPES OF PARALLELISM

Bit-level parallelism is a form of parallel computing based on increasing processor word size.

Increasing the word size reduces the number of instructions the processor must execute in order to perform an operation on variables whose sizes are greater than the length of the word.

For example, consider a case where an 8-bit processor must add two 16-bit integers. The processor must first add the 8 lower-order bits from each integer, then add the 8 higher-order bits, requiring two instructions to complete a single operation. A 16-bit processor would be able to complete the operation with a single instruction.
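A C sketch of the same idea (illustrative only; the helper name is made up), doing a 16-bit addition with only 8-bit operations, a low-byte add followed by a high-byte add with carry:

#include <stdint.h>
#include <stdio.h>

static uint16_t add16_with_8bit_ops(uint16_t a, uint16_t b)
{
    uint8_t lo = (uint8_t)(a & 0xFF) + (uint8_t)(b & 0xFF);      /* first 8-bit add            */
    uint8_t carry = lo < (uint8_t)(a & 0xFF);                    /* did the low-byte add wrap? */
    uint8_t hi = (uint8_t)(a >> 8) + (uint8_t)(b >> 8) + carry;  /* second 8-bit add           */
    return (uint16_t)((hi << 8) | lo);
}

int main(void)
{
    printf("%u\n", add16_with_8bit_ops(300, 700));   /* prints 1000 */
    return 0;
}

A 16-bit (or wider) processor performs the same addition with one instruction.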

Historically, 4-bit microprocessors were replaced with 8-bit, then 16-bit, then 32-bit microprocessors. This trend generally came to an end with the introduction of 32-bit processors, which have been a standard in general-purpose computing for two decades. Only recently, with the advent of x86-64 architectures, have 64-bit processors become commonplace.

39

Page 40: System on chip architectures

TYPES OF PARALLELISM

Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously.

For example, consider the following program:

1. e = a + b

2. f = c + d

3. g = e * f

Here, operation 3 depends on the results of operations 1 and 2, so it cannot be calculated until both of them are completed. However, operations 1 and 2 do not depend on any other operation, so they can be calculated simultaneously.

If we assume that each operation can be completed in one unit of time, then these three instructions can be completed in a total of two units of time, giving an ILP of 3/2.
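Written as C (a sketch of the same three operations), the dependence structure looks like this:

#include <stdio.h>

int main(void)
{
    int a = 1, b = 2, c = 3, d = 4;

    int e = a + b;   /* operation 1: independent           */
    int f = c + d;   /* operation 2: independent of 1      */
    int g = e * f;   /* operation 3: must wait for e and f */

    printf("g = %d\n", g);   /* 3 operations in 2 time steps: ILP = 3/2 */
    return 0;
}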

40

Page 41: System on chip architectures

TYPES OF PARALLELISM

Instruction-level parallelism (ILP):

A goal of compiler and processor designers is to identify and take advantage of as much ILP as possible.

Ordinary programs are typically written under a sequential execution model where instructions execute one after the other and in the order specified by the programmer. ILP allows the compiler and the processor to overlap the execution of multiple instructions or even to change the order in which instructions are executed.

How much ILP exists in programs is very application specific. In certain fields, such as graphics and scientific computing, the amount can be very large. However, workloads such as cryptography exhibit much less parallelism.

41

Page 42: System on chip architectures

TYPES OF PARALLELISM

Data parallelism (also known as loop-level

parallelism) is a form of parallelization of computing

across multiple processors in parallel computing

environments.

Data parallelism focuses on distributing the data across

different parallel computing nodes.

In a multiprocessor system executing a single set of

instructions (SIMD), data parallelism is achieved when

each processor performs the same task on different

pieces of distributed data. In some situations, a single

execution thread controls operations on all pieces of

data.

42

Page 43: System on chip architectures

TYPES OF PARALLELISM

Data parallelism

For instance, consider a 2-processor system (CPUs A and B) in a parallel environment, and we wish to do a task on some data 'd'. It is possible to tell CPU A to do that task on one part of 'd' and CPU B on another part simultaneously, thereby reducing the duration of the execution.

The data can be assigned using conditional statements.

As a specific example, consider adding two matrices. In a data parallel implementation, CPU A could add all elements from the top half of the matrices, while CPU B could add all elements from the bottom half of the matrices.

Since the two processors work in parallel, the job of performing matrix addition would take one half the time of performing the same operation in serial using one CPU alone.
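A minimal data-parallel sketch, assuming OpenMP is available (OpenMP and the 4x4 matrix size are assumptions for illustration), splits the rows of the matrices across two threads:

#include <stdio.h>
#include <omp.h>

#define N 4

int main(void)
{
    double a[N][N], b[N][N], c[N][N];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) { a[i][j] = i + j; b[i][j] = i - j; }

    #pragma omp parallel for num_threads(2)    /* rows split between two threads (CPU A / CPU B) */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c[i][j] = a[i][j] + b[i][j];       /* the same calculation on different data */

    printf("c[3][3] = %.1f\n", c[3][3]);       /* prints 6.0 */
    return 0;
}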

43

Page 44: System on chip architectures

TYPES OF PARALLELISM

Task parallelism (also known as function parallelism and control parallelism) is a form of parallelization of computer code across multiple processors in parallel computing environments.

Task parallelism focuses on distributing execution processes (threads) across different parallel computing nodes.

In a multiprocessor system, task parallelism is achieved when each processor executes a different thread (or process) on the same or different data.

The threads may execute the same or different code. In the general case, different execution threads communicate with one another as they work. Communication usually takes place to pass data from one thread to the next as part of a workflow.

44

Page 45: System on chip architectures

TYPES OF PARALLELISM

Task parallelism

As a simple example, if we are running code on a 2-processor system (CPUs "a" and "b") in a parallel environment and we wish to do tasks "A" and "B", it is possible to tell CPU "a" to do task "A" and CPU "b" to do task "B" simultaneously, thereby reducing the runtime of the execution.

The tasks can be assigned using conditional statements.

Task parallelism emphasizes the distributed (parallelized) nature of the processing (i.e. threads), as opposed to the data (data parallelism).
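A minimal task-parallel sketch using POSIX threads (the two task bodies are made up for illustration) runs two different functions at the same time:

#include <pthread.h>
#include <stdio.h>

static void *task_a(void *arg) { (void)arg; printf("task A: scanning input\n");     return NULL; }
static void *task_b(void *arg) { (void)arg; printf("task B: compressing output\n"); return NULL; }

int main(void)
{
    pthread_t ta, tb;
    pthread_create(&ta, NULL, task_a, NULL);   /* CPU "a" runs task A */
    pthread_create(&tb, NULL, task_b, NULL);   /* CPU "b" runs task B */
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}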

45

Page 46: System on chip architectures

THANK YOU

46