TERM PAPER
"SIMD ARCHITECTURE"

Submitted By:
Nancy Mahajan
Roll No.: RB1801A22
Reg. No.: 10809333
B.Tech. CSE

Submitted To:
Mr. Vijay Garg
CSE Dept.
ACKNOWLEDGEMENT
First and foremost, I want to thank my teacher, Mr. Vijay Garg, for assigning me this term paper on "SIMD Architecture". Such term papers enhance our capabilities and mental ability and keep us up to date on the related topic and subject.
Secondly, I would like to thank the library staff of Lovely Professional University for helping me prepare this project.
Lastly, I would like to thank my parents for their cooperation and help throughout the project.
TABLE OF CONTENTS
S.No. TOPIC
1 Introduction
2 SIMD Operations
3 Types of SIMD Architecture
4 Advantages
5 Disadvantages
6 Bibliography
Introduction

SIMD (Single-Instruction stream, Multiple-Data stream) architectures are essential in the parallel world of computers. Their ability to manipulate large vectors and matrices in minimal time has created phenomenal demand in areas such as weather prediction and cancer radiation research. The power of this type of architecture is most apparent when the number of processor elements equals the size of the vector. In that situation, componentwise addition and multiplication of vector elements can be done simultaneously. Even when the vector is larger than the number of processor elements available, the speedup over a sequential algorithm is immense.
SIMD ARCHITECTURE
The SIMD model of parallel computing consists of two parts:
1) a front-end computer of the usual von Neumann style, and
2) a processor array.
The processor array is a set of identical synchronized processing elements
capable of simultaneously performing the same operation on different data.
Each processor in the array has a small amount of local memory where the
distributed data resides while it is being processed in parallel. The processor
array is connected to the memory bus of the front end so that the front end
can randomly access the local processor memories as if it were another memory.

[Figure: SIMD architecture model]

Thus, the front end can
issue special commands that cause parts of the memory to be operated on
simultaneously or cause data to move around in the memory. A program can
be developed and executed on the front end using a traditional serial
programming language. The application program is executed by the front
end in the usual serial way, but it issues commands to the processor array to
carry out SIMD operations in parallel. The similarity between serial and data
parallel programming is one of the strong points of data parallelism.
Synchronization is made irrelevant by the lock-step operation of the
processors: at any given moment, each processor either does nothing or
performs exactly the same operation as the others. In SIMD architecture,
parallelism is exploited by applying
simultaneous operations across large sets of data. This paradigm is most
useful for solving problems that have lots of data that need to be updated on
a wholesale basis. It is especially powerful in many regular numerical
calculations.
There are two main configurations that have been used in SIMD machines.
In the first scheme, each processor has its own local memory. Processors can
communicate with each other through the interconnection network. If the
interconnection network does not provide direct connection between a given
pair of processors, then this pair can exchange data via an intermediate
processor. The ILLIAC IV used such an interconnection scheme. The
interconnection network in the ILLIAC IV allowed each processor to
communicate directly with four neighboring
processors in an 8 × 8 matrix pattern such that the ith processor can
communicate directly with the (i − 1)th, (i + 1)th, (i − 8)th, and (i + 8)th
processors. In the second SIMD scheme, processors and memory modules
communicate with each other via the interconnection network. Two
processors can transfer data between each other via intermediate memory
module(s) or possibly via intermediate processor(s). The BSP (Burroughs’
Scientific Processor) used the second SIMD scheme.
[Figure: Two SIMD schemes]
SIMD OPERATIONS
The basic unit of SIMD processing is the vector, which is why SIMD computing is
also known as vector processing. A vector is nothing more than a row of
individual numbers, or scalars.
A regular CPU operates on scalars, one at a time. (A superscalar CPU can
operate on multiple scalars at once, but each instruction still performs its
own, possibly different, operation on its own data.) A vector processor, on
the other hand, lines up a whole row of
these scalars, all of the same type, and operates on them as a unit.
These vectors are represented in what is called packed data format. Data
are grouped into bytes (8 bits) or words (16 bits) and packed into a vector to
be operated on. One of the biggest issues in designing a SIMD
implementation is how many data elements it will be able to operate on in
parallel. If you want to do single-precision (32-bit) floating-point calculations
in parallel, then you can use a 4-element, 128-bit vector to do four-way
single-precision floating-point, or you can use a 2-element 64-bit vector to do
two-way SP FP. So the length of the individual vectors dictates how many
elements of what type of data you can work with.
TYPES OF SIMD ARCHITECTURE
There are two types of SIMD architectures we will be discussing. The first is
the True SIMD followed by the Pipelined SIMD. Each has its own advantages
and disadvantages but their common attribute is superior ability to
manipulate vectors.
True SIMD (Overview)
The two True SIMD organizations differ only in how the memory modules, M,
are connected to the arithmetic units, D. The arithmetic units are also called
the processing elements (PEs). In distributed memory, each memory module
is uniquely associated with a particular arithmetic unit. The synchronized
PEs are controlled by one control unit (CU).
Each PE is basically an arithmetic logic unit with attached working registers
and local memories for storage of distributed data. The CU decodes the
instructions and determines where they should be executed. Scalar or
control-type instructions are executed in the CU, whereas vector
instructions are broadcast to the PEs. In shared-memory SIMD machines, the
local memories attached to the PEs are replaced by memory modules shared
by all PEs through an alignment network. This configuration allows the
individual PEs to share their memory without accessing the CU.
True SIMD: Distributed Memory
The True SIMD architecture contains a single control unit (CU) with multiple
processor elements (PEs) acting as arithmetic units (AUs). In this situation, the
arithmetic units are slaves to the control unit. The AUs cannot fetch or
interpret any instructions; they are merely units capable of addition,
subtraction, multiplication, and division. Each AU has access only to its own
memory. Consequently, if an AU needs information held by a different AU, it
must put in a request to the CU, and the CU must manage the transfer of
that information. The advantage of this type of architecture is the ease of
adding more memory and AUs to the computer. The disadvantage lies in the
time wasted by the CU managing all memory exchanges.
True SIMD: Shared Memory
Another True SIMD architecture is designed with a configurable association
between the PEs and the memory modules (M). In this architecture, the local
memories that were attached to each AU above are replaced by memory
modules. These Ms are shared by all the PEs through an alignment network
or switching unit. This allows the individual PEs to share their memory
without accessing the control unit. This type of architecture is superior to
the distributed-memory organization above, but it inherits a disadvantage:
the difficulty of adding memory.
Pipelined SIMD
Pipelined SIMD architecture is composed of a pipeline of arithmetic units
with shared memory. The pipeline takes a stream of operands and performs
all the operations of an arithmetic unit on them, stage by stage. The pipeline
is a first-in, first-out structure, and its depth varies between
implementations. To take advantage of the pipeline, the data to be
evaluated must be stored in different memory modules so that the pipeline
can be fed as fast as possible. The advantage of this architecture lies in the
speed and efficiency of data processing, assuming the above stipulation is
met.
It is also possible for a single processor to perform the same instruction on a
large set of data items. In this case, parallelism is achieved by pipelining—
• One set of operands starts through the pipeline, and
• Before the computation is finished on this set of operands, another set of
operands starts flowing through the pipeline.
Advantages of SIMD
The main advantage of SIMD is that processing multiple data elements at the
same time, with a single instruction, can dramatically improve performance.
For example, processing 12 data items could take 12 instructions with scalar
processing, but would require only three instructions if four data elements
are processed per instruction using SIMD. While the exact increase
in code speed that you observe depends on many factors, you can achieve a
dramatic performance boost if SIMD techniques can be utilized. Not
everything is suitable for SIMD processing, and not all parts of an application
need to be SIMD accelerated to realize significant improvements.
SIMD offers greater flexibility and opportunities for better performance in
video, audio and communications tasks which are increasingly important for
applications. SIMD provides a cornerstone for robust and powerful
multimedia capabilities that significantly extend the scalar instruction set.
SIMD can provide a substantial boost in performance and capability for an
application that makes significant use of 3D graphics, image processing, audio
compression or other calculation-intense functions. Other features of a program
may be accelerated by recoding to take advantage of the parallelism and additional
operations of SIMD. Apple is adding SIMD capabilities to Core Graphics, QuickDraw
and QuickTime. An application that calls them today will see improvements from
SIMD without any changes. SIMD also offers the potential to create new applications
that take advantage of its features and power. To take advantage of SIMD, an
application must be reprogrammed or at least recompiled; however, you do not
need to rewrite the entire application. SIMD typically works best for the 10% of the
application that consumes 80% of your CPU time; these functions typically have
heavy computational and data loads, two areas where SIMD excels.
Disadvantages
Because, in an SIMD machine, a single array control unit (ACU) provides the
instruction stream for all of the array processors, the system will frequently
be under-utilized whenever programs are run that require only a few PEs. To
alleviate this
problem, multiple-SIMD (MSIMD) machines were designed. They consist of
multiple control units, each with its own program memory. The PEs are
controlled by U control units that divide the machine into U independent
virtual SIMD machines of various sizes. U is usually much smaller than N, the
total number of PEs, and determines the maximum number of SIMD
programs that can operate simultaneously. The distribution of the PEs onto
the ACUs can be either static
or dynamic.
The MSIMD machine architecture has several advantages over normal SIMD
machines, including:
• Efficiency: If a program requires only a subset of the available PEs, the
remaining PEs can be used for other programs.
• Multiple users: Up to U different users can execute different SIMD
programs on the machine simultaneously.
• Fault detection: A program can run on two independent machine
partitions, with errors detected by comparing the results.
• Fault tolerance: A faulty PE affects only one of the multiple SIMD
machines; the other machines can still operate correctly.
Bibliography
http://carbon.cudenver.edu/csprojects/CSC5809S01/Simd/archi.html
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=944733
http://arstechnica.com/old/content/2000/03/simd.ars
"Applications Tuning for Streaming SIMD Extensions."
http://developer.intel.com/technology/itj/Q21999/ARTICLES/art_5a.htm
Huff, Tom and Thakkar, Shreekant (1999). "Internet Streaming SIMD
Extensions." Computer, vol. 32, no. 12, pp. 26–34.