TERM PAPER
"SIMD ARCHITECTURE"

Submitted By:
Nancy Mahajan
Roll No.: RB1801A22
Reg. No.: 10809333
B.Tech. CSE

Submitted To:
Mr. Vijay Garg
CSE Dept.
ACKNOWLEDGEMENT
First and foremost, I want to thank my teacher, Mr. Vijay Garg, for assigning me this term paper on "SIMD Architecture". Such term papers enhance our capabilities and mental ability and keep us up to date on the related topic and subject.
Secondly, I would like to thank the library staff of Lovely Professional University for helping me prepare this project.
Lastly, I would like to thank my parents for their cooperation and help throughout the project.
TABLE OF CONTENTS
S.No. TOPIC
1 Introduction
2 SIMD Operations
3 Types of SIMD Architecture
4 Advantages
5 Disadvantages
6 Bibliography
Introduction

SIMD (Single-Instruction stream, Multiple-Data stream) architectures are essential in the parallel world of computers. Their ability to manipulate large vectors and matrices in minimal time has created phenomenal demand in areas such as weather prediction and cancer radiation research. The power of this type of architecture is most apparent when the number of processor elements equals the size of the vector. In that situation, componentwise addition and multiplication of vector elements can be done simultaneously. Even when the vector is larger than the number of processor elements available, the speedup over a sequential algorithm is immense.
SIMD ARCHITECTURE
The SIMD model of parallel computing consists of two parts:
1) a front-end computer of the usual von Neumann style, and
2) a processor array.
The processor array is a set of identical synchronized processing elements
capable of simultaneously performing the same operation on different data.
Each processor in the array has a small amount of local memory where the
distributed data resides while it is being processed in parallel. The processor
array is connected to the memory bus of the front end so that the front end
can randomly access the local processor memories as if it were another memory.

[Figure: SIMD architecture model]

Thus, the front end can
issue special commands that cause parts of the memory to be operated on
simultaneously or cause data to move around in the memory. A program can
be developed and executed on the front end using a traditional serial
programming language. The application program is executed by the front
end in the usual serial way, but it issues commands to the processor array to
carry out SIMD operations in parallel. The similarity between serial and data
parallel programming is one of the strong points of data parallelism.
Synchronization is made irrelevant by the lock-step operation of the
processors: at any given moment, each processor either does nothing or
performs exactly the same operation as the others. In SIMD architecture,
parallelism is exploited by applying
simultaneous operations across large sets of data. This paradigm is most
useful for solving problems that have lots of data that need to be updated on
a wholesale basis. It is especially powerful in many regular numerical
calculations.
There are two main configurations that have been used in SIMD machines.
In the first scheme, each processor has its own local memory. Processors can
communicate with each other through the interconnection network. If the
interconnection network does not provide direct connection between a given
pair of processors, then this pair can exchange data via an intermediate
processor. The ILLIAC IV used such an interconnection scheme. The
interconnection network in the ILLIAC IV allowed each processor to
communicate directly with four neighboring
processors in an 8 × 8 matrix pattern such that the ith processor can
communicate directly with the (i − 1)th, (i + 1)th, (i − 8)th, and (i + 8)th
processors. In the second SIMD scheme, processors and memory modules
communicate with each other via the interconnection network. Two
processors can transfer data between each other via intermediate memory
module(s) or possibly via intermediate processor(s). The BSP (Burroughs’
Scientific Processor) used the second SIMD scheme.
[Figure: Two SIMD schemes]
SIMD OPERATIONS
The basic unit of SIMD processing is the vector, which is why SIMD computing is
also known as vector processing. A vector is nothing more than a row of
individual numbers, or scalars.
A regular CPU operates on scalars, one at a time. (A superscalar CPU can
operate on multiple scalars at once, but each instruction still performs its
own, possibly different, operation on its own data.) A vector processor, on
the other hand, lines up a whole row of
these scalars, all of the same type, and operates on them as a unit.
These vectors are represented in what is called packed data format. Data
are grouped into bytes (8 bits) or words (16 bits) and packed into a vector to
be operated on. One of the biggest issues in designing a SIMD
implementation is how many data elements it will be able to operate on in
parallel. If you want to do single-precision (32-bit) floating-point calculations
in parallel, then you can use a 4-element, 128-bit vector to do four-way
single-precision floating-point, or you can use a 2-element 64-bit vector to do
two-way SP FP. So the length of the individual vectors dictates how many
elements of what type of data you can work with.
TYPES OF SIMD ARCHITECTURE
There are two types of SIMD architectures we will be discussing. The first is
the True SIMD followed by the Pipelined SIMD. Each has its own advantages
and disadvantages but their common attribute is superior ability to
manipulate vectors.
True SIMD (Overview)
The two True SIMD organizations differ only in how the memory modules, M,
are connected to the arithmetic units, D. The arithmetic units are also called
the processing elements (PEs). In distributed memory, each memory module
is uniquely associated with a particular arithmetic unit. The synchronized
PEs are controlled by one control unit (CU).
Each PE is basically an arithmetic logic unit with attached working registers
and local memories for storage of distributed data. The CU decodes the
instructions and determines where they should be executed. Scalar or
control-type instructions are executed in the CU, whereas vector
instructions are broadcast to the PEs. In shared-memory SIMD machines, the
local memories attached to the PEs are replaced by memory modules shared
by all PEs through an alignment network. This configuration allows the
individual PEs to share their memory without accessing the CU.
True SIMD: Distributed Memory
The True SIMD architecture contains a single control unit (CU) with multiple
processor elements (PEs) acting as arithmetic units (AUs). In this situation, the
arithmetic units are slaves to the control unit. The AUs cannot fetch or
interpret any instructions; they are merely units capable of addition,
subtraction, multiplication, and division. Each AU has access only to its own
memory. Consequently, if an AU needs information held by a different AU, it
must put in a request to the CU, and the CU must manage the transfer of
that information. The advantage of this type of architecture is the ease of
adding more memory and AUs to the computer. The disadvantage lies in the
time wasted by the CU managing all memory exchanges.
True SIMD: Shared Memory
Another True SIMD architecture is designed with a configurable association
between the PEs and the memory modules (M). In this architecture, the local
memories that were attached to each AU above are replaced by memory
modules. These Ms are shared by all the PEs through an alignment network
or switching unit. This allows the individual PEs to share their memory
without accessing the control unit. This type of architecture is superior to
the distributed-memory organization above, but it inherits a disadvantage:
the difficulty of adding memory.
Pipelined SIMD
Pipelined SIMD architecture is composed of a pipeline of arithmetic units
with shared memory. The pipeline takes a stream of operands and performs
all the operations of an arithmetic unit on them, stage by stage. The pipeline
is a first-in, first-out structure, and its depth varies between
implementations. To take advantage of the pipeline, the data to be
evaluated must be stored in different memory modules so that the pipeline
can be fed as fast as possible. The advantage of this architecture lies in the
speed and efficiency of data processing, assuming the above stipulation is
met.
It is also possible for a single processor to perform the same instruction on a
large set of data items. In this case, parallelism is achieved by pipelining—
• One set of operands starts through the pipeline, and
• Before the computation is finished on this set of operands, another set of
operands starts flowing through the pipeline.
Advantages of SIMD
The main advantage of SIMD is that processing multiple data elements at the
same time, with a single instruction, can dramatically improve performance.
For example, processing 12 data items could take 12 instructions with scalar
processing, but would require only three instructions if four data elements
are processed per instruction using SIMD. While the exact increase
in code speed that you observe depends on many factors, you can achieve a
dramatic performance boost if SIMD techniques can be utilized. Not
everything is suitable for SIMD processing, and not all parts of an application
need to be SIMD accelerated to realize significant improvements.
SIMD offers greater flexibility and opportunities for better performance in
video, audio and communications tasks which are increasingly important for
applications. SIMD provides a cornerstone for robust and powerful
multimedia capabilities that significantly extend the scalar instruction set.
SIMD can provide a substantial boost in performance and capability for an
application that makes significant use of 3D graphics, image processing, audio
compression or other calculation-intense functions. Other features of a program
may be accelerated by recoding to take advantage of the parallelism and additional
operations of SIMD. Apple is adding SIMD capabilities to Core Graphics, QuickDraw
and QuickTime. An application that calls them today will see improvements from
SIMD without any changes. SIMD also offers the potential to create new applications
that take advantage of its features and power. To take advantage of SIMD, an
application must be reprogrammed or at least recompiled; however, you do not
need to rewrite the entire application. SIMD typically works best for the 10% of the
application that consumes 80% of your CPU time; these functions typically have
heavy computational and data loads, two areas where SIMD excels.
Disadvantages
Because, in an SIMD machine, a single array control unit (ACU) provides the
instruction stream for all of the array processors, the system will frequently
be under-utilized whenever programs are run that require only a few PEs. To
alleviate this
problem, multiple-SIMD (MSIMD) machines were designed. They consist of
multiple control units, each with its own program memory. The PEs are
controlled by U control units that divide the machine into U independent
virtual SIMD machines of various sizes. U is usually much smaller than N, the
total number of PEs, and determines the maximum number of SIMD
programs that can operate simultaneously. The distribution of the PEs onto
the ACUs can be either static
or dynamic.
The MSIMD machine architecture has several advantages over normal SIMD
machines, including:
• Efficiency: If a program requires only a subset of the available PEs, the
remaining PEs can be used for other programs.
• Multiple users: Up to U different users can execute different SIMD
programs on the machine simultaneously.
• Fault detection: A program can run on two independent machine
partitions, with errors detected by comparing the results.
• Fault tolerance: A faulty PE affects only one of the multiple SIMD
machines; the other machines can still operate correctly.
Bibliography
http://carbon.cudenver.edu/csprojects/CSC5809S01/Simd/archi.html
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=944733
http://arstechnica.com/old/content/2000/03/simd.ars
"Applications Tuning for Streaming SIMD Extensions."
http://developer.intel.com/technology/itj/Q21999/ARTICLES/art_5a.htm
Huff, Tom and Thakkar, Shreekant (1999). "Internet Streaming SIMD
Extensions." Computer, vol. 32, no. 12, pp. 26–34.