PARALLEL COMPUTING IN INDIA

Upload: preeti-chauhan
Post on 09-Apr-2017

What is parallel computing?

Parallel computing is a form of computation in which many instructions are carried out simultaneously. It operates on the principle that large problems can often be divided into smaller ones, which are then solved concurrently (in parallel).
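The divide-and-solve-concurrently idea can be sketched with Python's standard library. This is only an illustration of the principle, not any particular Indian system's programming model; the function name parallel_sum and the chunking scheme are invented for the example.

```python
# Split a large problem (summing a big list) into smaller chunks,
# solve the chunks concurrently in worker threads, then combine
# the partial results.
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, workers=4):
    """Divide `data` into `workers` chunks and sum them concurrently."""
    chunk = (len(data) + workers - 1) // workers
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, parts)   # each chunk summed concurrently
    return sum(partials)                  # combine the partial results

result = parallel_sum(list(range(1_000_000)))
```

The same pattern (partition, compute in parallel, combine) underlies most of the applications listed later, from CFD to seismic data processing.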

Why Parallel Computing?

The primary reasons for using parallel computing:
Save time (wall-clock time)
Solve larger problems
Provide concurrency (do multiple things at the same time)

Other reasons might include:

Cost savings: using multiple "cheap" computing resources instead of paying for time on a supercomputer.
Overcoming memory constraints: a single computer has finite memory resources. For large problems, using the memories of multiple computers may overcome this obstacle.

Architecture Concepts: von Neumann Architecture

Comprised of four main components: Memory, Control Unit, Arithmetic Logic Unit, Input/Output

Memory is used to store both program instructions and data. Program instructions are coded data which tell the computer to do something; data is simply information to be used by the program.
The control unit fetches instructions/data from memory, decodes the instructions, and then sequentially coordinates operations to accomplish the programmed task.
The arithmetic logic unit performs basic arithmetic operations.
Input/Output is the interface to the human operator.
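The fetch-decode-execute cycle described above can be sketched as a toy interpreter. The three-instruction "machine language" here is invented purely for illustration; the point is that program and data share one memory and the control logic steps through it sequentially.

```python
# Toy von Neumann machine: instructions live in memory, a program
# counter (pc) drives the fetch-decode-execute loop, and the ALU
# performs the arithmetic.
memory = [("LOAD", 5), ("ADD", 3), ("STORE", None)]  # program stored in memory
acc, pc, result = 0, 0, None

while pc < len(memory):
    op, arg = memory[pc]          # control unit: fetch and decode
    if op == "LOAD":
        acc = arg                 # move data into the accumulator
    elif op == "ADD":
        acc = acc + arg           # ALU performs basic arithmetic
    elif op == "STORE":
        result = acc              # write the answer back out
    pc += 1                       # sequential coordination of operations
```

Because execution is strictly sequential, this is exactly the SISD model described next in Flynn's taxonomy.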

Flynn's Classical Taxonomy

Single Instruction, Single Data (SISD): a serial (non-parallel) computer.
Single instruction: only one instruction stream is acted on by the CPU during any one clock cycle.
Single data: only one data stream is used as input during any one clock cycle.
Deterministic execution.

Single Instruction, Multiple Data (SIMD): a type of parallel computer.
Single instruction: all processing units execute the same instruction at any given clock cycle.
Multiple data: each processing unit can operate on a different data element.
Best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing.
Two varieties: processor arrays and vector pipelines.
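Real SIMD happens in hardware (vector pipelines, processor arrays), but the programming model — one operation applied across many data elements — can be mimicked in software. The pixel values below are invented; this only illustrates the model, not actual vector hardware.

```python
# SIMD-style data parallelism: the same "instruction" (add 5) is
# applied to every element of the data, as in the image-processing
# workloads SIMD suits best.
from array import array

pixels = array('B', [10, 20, 30, 40])                # one data element per "lane"
brightened = array('B', (p + 5 for p in pixels))     # same operation on all elements
```

Each element could, on real SIMD hardware, be handled by a separate processing unit in the same clock cycle.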

Multiple Instruction, Single Data (MISD): a single data stream is fed into multiple processing units. Each processing unit operates on the data independently via independent instruction streams. Some conceivable uses:
Multiple frequency filters operating on a single signal stream
Multiple cryptography algorithms attempting to crack a single coded message

Multiple Instruction, Multiple Data (MIMD):
Multiple instruction: every processor may be executing a different instruction stream.
Multiple data: every processor may be working with a different data stream.
Execution can be synchronous or asynchronous, deterministic or non-deterministic.
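The MIMD model — different instruction streams on different data streams, running concurrently — can be sketched with a thread pool. The particular tasks below are invented for illustration.

```python
# MIMD sketch: each worker executes a *different* function (instruction
# stream) on a *different* input (data stream), concurrently.
from concurrent.futures import ThreadPoolExecutor

tasks = [(sum, [1, 2, 3]),     # worker 1: summation
         (max, [4, 9, 2]),     # worker 2: maximum
         (len, "hello")]       # worker 3: length of a string

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(fn, data) for fn, data in tasks]
    results = [f.result() for f in futures]
```

Because the workers run independently, completion order is not fixed — the asynchronous, possibly non-deterministic execution the slide mentions.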

Parallel Computer Memory Architectures:
Shared Memory: i) Uniform Memory Access (UMA), ii) Non-Uniform Memory Access (NUMA)
Distributed Memory
Hybrid Distributed-Shared Memory

Shared Memory: shared-memory parallel computers have in common the ability for all processors to access all memory as a global address space. Multiple processors can operate independently, but changes in a memory location effected by one processor are visible to all other processors. Shared-memory machines can be divided into two main classes based on memory access times: UMA and NUMA.
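The shared-address-space model can be sketched with threads, which share their process's memory the way shared-memory processors share the global address space. The counter and worker function are invented for the example; the lock stands in for the hardware coordination a real machine needs.

```python
# Shared-memory sketch: all workers read and write the same variable,
# so an update by one thread is immediately visible to the others.
# The Lock serializes the read-modify-write so no increments are lost.
import threading

counter = 0                      # lives in memory shared by all threads
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:               # coordinate access to the shared location
            counter += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, concurrent updates could interleave and lose increments — the kind of hazard cache coherency protocols and synchronization primitives exist to manage.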

Shared Memory: UMA
Uniform Memory Access (UMA): identical processors, as in a Symmetric Multiprocessor (SMP), with equal access and access times to memory. Sometimes called CC-UMA (Cache Coherent UMA): cache coherent means that if one processor updates a location in shared memory, all the other processors know about the update.

Shared Memory: NUMA
Non-Uniform Memory Access (NUMA): often made by physically linking two or more SMPs. One SMP can directly access the memory of another SMP, but not all processors have equal access time to all memories; memory access across the link is slower. If cache coherency is maintained, it may also be called CC-NUMA (Cache Coherent NUMA).

Distributed Memory: distributed-memory systems require a communication network to connect inter-processor memory. Each processor has its own local memory and therefore operates independently; hence, the concept of cache coherency does not apply. When a processor needs access to data in another processor, it is usually the task of the programmer to explicitly define how and when data is communicated. Synchronization between tasks is likewise the programmer's responsibility.
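The explicit send/receive style the programmer must adopt on distributed-memory machines can be mimicked in the standard library. Real systems communicate over a network with a library such as MPI; here threads and a Queue merely simulate the pattern, and the worker/rank/chunk names are invented for the example.

```python
# Distributed-memory sketch: workers do not touch each other's data
# directly. Each one computes on its own local chunk and explicitly
# "sends" its result through a message queue; the combining step
# explicitly "receives" them. How and when data moves is entirely
# up to the programmer.
import queue
import threading

def worker(rank, local_data, outbox):
    outbox.put((rank, sum(local_data)))   # explicit send of the local result

outbox = queue.Queue()
chunks = [[1, 2], [3, 4], [5, 6]]         # each worker's private memory
threads = [threading.Thread(target=worker, args=(r, c, outbox))
           for r, c in enumerate(chunks)]
for t in threads:
    t.start()
for t in threads:
    t.join()
results = dict(outbox.get() for _ in chunks)  # explicit receive and combine
```

Tagging each message with a rank mirrors how message-passing programs identify which processor a result came from.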

Hybrid Distributed-Shared Memory:

The shared-memory component is usually a cache-coherent SMP machine: processors on a given SMP can address that machine's memory as global. The distributed-memory component is the networking of multiple SMPs. SMPs know only about their own memory, not the memory on another SMP; therefore, network communications are required to move data from one SMP to another.

The main fields that need advanced computing are:

Computational fluid dynamics
Design of large structures
Computational physics and chemistry
Climate modeling
Vehicle simulation
Image processing
Signal processing
Oil reservoir management
Seismic data processing

India's contribution towards parallel computing

India has made significant strides in developing high-performance parallel computers.

Major Indian parallel computing projects:

PARAM (from CDAC, Pune)
ANUPAM (from BARC, Bombay)
MTPPS (from BARC, Bombay)
PACE (from ANURAG, Hyderabad)
CHIPPS (from CDOT, Bangalore)
FLOSOLVER (from NAL, Bangalore)

PARAM (Centre for Development of Advanced Computing)

India's first supercomputer was the PARAM 8000, indigenously built in 1990. It was a 1 GFLOPS (gigaflops) parallel machine: a 64-node prototype, i.e. it had 64 CPUs. PARAM is Sanskrit and means "supreme". Its programming environment was called PARAS.

Based on Transputers (T800/T805), its theoretical peak performance was 1 gigaflops; in practice it provided 100 to 200 MFLOPS. A hardware upgrade to the PARAM 8000 — the integration of Intel i860 coprocessors — produced the new PARAM 8600, a 256-CPU computer.

PARAM 9000

The mission was to deliver a teraflops-range parallel system, with an architecture that emphasizes flexibility. The PARAM 9000SS is based on SuperSPARC processors with an operating speed of 75 MHz. The PARAM 10000 has a peak speed of 6.4 GFLOPS.

PARAM Padma

Introduced in April 2003, with a top speed of 1024 GFLOPS (1 TFLOPS). It used IBM POWER4 CPUs, and its operating system was IBM AIX 5. It was the first Indian computer to break the 1 TFLOPS barrier.

PARAM Yuva

Unveiled in November 2008. Its maximum sustainable speed is 38.1 TFLOPS and its peak speed is 54 TFLOPS. It uses Intel 73XX processors at 2.9 GHz each, with a storage capacity of 25 TB, expandable up to 200 TB. PARAM Yuva II, released in February 2013, has a peak performance of 524 TFLOPS and uses less power than its predecessor.

ANUPAM

Developed by the Bhabha Atomic Research Centre (BARC), Bombay, which needed 200 MFLOPS of sustained computing power. Based on standard Multibus II i860 hardware.

ANUPAM 860

First developed in December 1991, it used the i860 microprocessor at 40 MHz, with an overall sustained speed of 30 MFLOPS. An upgraded version released in August 1992 had a computational speed of 52 MFLOPS; a further upgrade, released in November 1992, provided a sustained speed of 110 MFLOPS.

ANUPAM Alpha

Developed in July 1997 with a sustained speed of 1000 MFLOPS, it used the Alpha 21164 microprocessor at 400 MHz. This system used complete servers/workstations as compute nodes instead of processor boards. An updated version released in March 1998 had a sustained speed of 1.5 GFLOPS.

ANUPAM Pentium

Started in January 1998, the main focus of its development was minimizing cost. The first version, ANUPAM Pentium II/4, gave a sustained speed of 248 MFLOPS. ANUPAM Pentium II was expanded in March 1999, reaching a sustained speed of 1.3 GFLOPS.

In April 2000 the system was upgraded to Pentium III/16, which gave a sustained speed of 3.5 GFLOPS.

The ANUPAM PIV 64-node system has a sustained speed of 43 GFLOPS.

Applications

All three versions of ANUPAM were introduced to solve computational problems at BARC. The main fields in which ANUPAM is used are:
Molecular dynamics simulation
Neutron transport calculation
Gamma-ray simulation by the Monte Carlo method
Crystal structure analysis

PACE

Developed by ANURAG (Advanced Numerical Research and Analysis Group) under DRDO, as a result of R&D in parallel computing. It uses VLSI. Started in 1998 with the Motorola 68020 processor at 16.67 MHz.

PACE stands for Processor for Aerodynamic Computation and Evaluation. It is used for the computational fluid dynamics calculations needed in aircraft design. A developed version, PACE Plus 32, was used in missile development; a more advanced version is PACE++.

ANAMICA - Software

ANURAG's Medical Imaging and Characterization Aid (ANAMICA) is medical visualization software for data obtained from MRI, CT, and ultrasound. It offers both two-dimensional and three-dimensional visualization and is used for medical diagnosis, among other applications.

DHRUVA 3

Set up by DRDO for solving mission-critical defence research and development applications. Used in the design of aircraft, e.g. the Advanced Medium Combat Aircraft (AMCA).

FLOSOLVER

Started in 1986 by the National Aerospace Laboratories (NAL). Used in numerical weather prediction: the Varsha GCM, which runs on Flosolver, could predict the weather accurately up to two weeks in advance. Based on 16-bit Intel 8086 and 8087 processors; updated versions were released to increase performance.

CHIPPS

Developed to provide indigenous digital switching technology. Established in rural exchanges and secondary switching areas. A speed of 200 MFLOPS was achieved.

THANK YOU

Preeti Chauhan