
Page 1: Tuesday, September 04, 2006

Tuesday, September 04, 2006

I hear and I forget,

I see and I remember,

I do and I understand.

-Chinese Proverb

Page 2: Tuesday, September 04, 2006

Today

Course Overview

Why Parallel Computing?

Evolution of Parallel Systems

Page 3: Tuesday, September 04, 2006

Course URL: http://suraj.lums.edu.pk/~cs524a06

Folder on indus: \\indus\Common\cs524a06

Website – Check Regularly: Course announcements, office hours, slides, resources, policies …

Course Outline

CS 524 : High Performance Computing

Page 4: Tuesday, September 04, 2006

Several programming exercises will be given throughout the course. Assignments will include popular programming models for shared memory and message passing such as OpenMP and MPI.

The development environment will be C/C++ on UNIX.
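As a flavor of the shared-memory model mentioned above, here is a minimal OpenMP sketch in C, the course's stated environment. The array size and the reduction are illustrative choices, not taken from an actual assignment; MPI, the message-passing counterpart, would instead distribute the data across processes with explicit sends and receives.

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

/* Each thread is given a chunk of the iterations; the reduction
 * clause combines the per-thread partial sums at the end.
 * Compile with, e.g.: cc -fopenmp sum.c */
int main(void) {
    static double a[N];   /* static: too large for the stack */
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = 0.5 * i;
        sum += a[i];
    }

    printf("sum = %.1f (using up to %d threads)\n",
           sum, omp_get_max_threads());
    return 0;
}
```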

Page 5: Tuesday, September 04, 2006

Pre-requisites

Computer Organization & Assembly Language (CS 223)

Data Structures & Algorithms (CS 213)

Senior-level standing

Operating Systems?

Page 6: Tuesday, September 04, 2006

Five minute rule.

Page 7: Tuesday, September 04, 2006

Hunger For More Power!

Page 8: Tuesday, September 04, 2006

Hunger For More Power!

Endless quest for more and more computing power.

However much computing power there is, it is never enough.

Page 9: Tuesday, September 04, 2006

Why this need for greater computational power?

Science, engineering, business, entertainment, etc. are all providing the impetus.

Scientists – observe, theorize, test through experimentation.

Engineers – design, test prototypes, build.

Page 10: Tuesday, September 04, 2006

HPC offers a new way to do science:

Computation is used to approximate physical systems. Advantages include:

Playing with simulation parameters to study emergent trends

Possible replay of a particular simulation event

Study of systems where no exact theories exist

Page 11: Tuesday, September 04, 2006

Why Turn to Simulation?

When the problem is too . . .

Complex

Large

Expensive

Dangerous

Page 12: Tuesday, September 04, 2006

Why this need for greater computational power?

Less expensive to carry out computer simulations.

Able to simulate phenomena that cannot be studied through experimentation, e.g. the evolution of the universe.

Page 13: Tuesday, September 04, 2006

Why this need for greater computational power?

Problems such as the following are all computationally intensive:

Weather prediction

Aeronautics (airflow analysis, structural mechanics, engine efficiency, etc.)

Simulating the world economy

Pharmaceuticals (molecular modeling)

Understanding drug-receptor interactions in the brain

Automotive crash simulation

The more knowledge we acquire the more complex our questions become.

Page 14: Tuesday, September 04, 2006

Why this need for greater computational power?

In 1995, the first full-length computer-animated motion picture, Toy Story, was produced on a parallel system composed of hundreds of Sun workstations.

Decreased cost

Decreased time (several months on several hundred processors)

Page 15: Tuesday, September 04, 2006

Why this need for greater computational power?

Commercial Computing has also come to rely on parallel architectures.

Computer system speed and capacity determine the scale of business that can be supported.

OLTP (online transaction processing) benchmarks represent the relation between performance and scale of business.

They rate the performance of a system in terms of its throughput in transactions per minute.

Page 16: Tuesday, September 04, 2006

Why this need for greater computational power?

Vendors supplying database hardware or software offer multiprocessor systems that provide performance substantially greater than uniprocessor products.

Page 17: Tuesday, September 04, 2006

One solution in the past: Make the clock run faster.

The advance of VLSI technology allowed clock rates to increase and larger numbers of components to fit on a chip.

However, there are limits…

Electrical signals cannot propagate faster than the speed of light: 30 cm/ns in vacuum and 20 cm/ns in copper wire or optical fiber.

Page 18: Tuesday, September 04, 2006

Electrical signals cannot propagate faster than the speed of light: 30 cm/ns in vacuum and 20 cm/ns in copper wire or optical fiber.

10 GHz clock: total signal path length of 2 cm

100 GHz clock: 2 mm

A 1 THz (1000 GHz) computer will have to be smaller than 100 microns if the signal has to travel from one end to the other and back within a single clock cycle.
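As a quick check on these figures, using the 20 cm/ns copper/fiber speed above: round-trip distance per cycle = signal speed × clock period. At 1 THz the period is 0.001 ns, so 20 cm/ns × 0.001 ns = 0.2 mm round trip, i.e. at most 100 microns each way; at 10 GHz the period is 0.1 ns, giving 20 cm/ns × 0.1 ns = 2 cm.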

Page 19: Tuesday, September 04, 2006

Another fundamental problem: heat dissipation

The faster a computer runs, the more heat it generates.

High-end Pentium systems: the CPU cooling system is bigger than the CPU itself.

Page 20: Tuesday, September 04, 2006

Evolution of Parallel Architecture

New dimension added to design space: Number of processors.

Driven by demand for performance at acceptable cost.

Page 21: Tuesday, September 04, 2006

Evolution of Parallel Architecture

Advances in hardware capability enable new application functionality, which places a greater demand on the architecture.

This cycle drives the ongoing design, engineering and manufacturing effort.

Page 22: Tuesday, September 04, 2006

Evolution of Parallel Architecture

Microprocessor performance has been improving at a rate of about 50% per year.

A parallel machine of a hundred processors can be viewed as providing applications with the computing power that a single processor will offer in ten years' time.

1000 processors: a 20-year horizon.

The advantages of using small, inexpensive, mass-produced processors as building blocks for computer systems are clear.
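As a rough check on these horizons, assuming the 50% annual improvement compounds: 1.5^n = 100 gives n = ln 100 / ln 1.5 ≈ 11 years for a hundredfold gain, and 1.5^n = 1000 gives n ≈ 17 years for a thousandfold gain, in line with the 10- and 20-year figures quoted.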

Page 23: Tuesday, September 04, 2006

Technology Trends

With technological advances, transistors, gates, etc. have been getting smaller and faster; more can fit in the same area.

Processors are getting faster by making more effective use of ever larger volume of computing resources.

Possibilities:

Place more of the computer system on the chip, including memory and I/O (a building block for parallel architectures: system-on-a-chip)

Or place multiple processors on the chip (parallel architecture in the single-chip regime)

Page 24: Tuesday, September 04, 2006

Microprocessor Design Trends

Technology determines what is possible.

Architecture translates the potential of technology into performance.

Parallelism is fundamental to conventional computer architecture.

Current architectural trends are leading to multiprocessor designs.

Page 25: Tuesday, September 04, 2006

Bit level Parallelism

From 1970 to 1986, advances came from bit-level parallelism: 4-bit, 8-bit, 16-bit, and so on.

Doubling the data-path width reduces the number of cycles required to perform an operation.
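To make the cycle-count claim concrete, here is a hypothetical C sketch of a 32-bit addition done the way a 16-bit data path must do it, in two add steps with carry propagation; a 32-bit data path performs the same addition in a single step:

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: a 32-bit add emulated with 16-bit operations,
 * as a 16-bit data path would execute it. */
static uint32_t add32_via_16bit(uint32_t a, uint32_t b) {
    uint32_t lo = (a & 0xFFFFu) + (b & 0xFFFFu);   /* step 1: low halves  */
    uint32_t carry = lo >> 16;                     /* carry out of step 1 */
    uint32_t hi = (a >> 16) + (b >> 16) + carry;   /* step 2: high halves */
    return (hi << 16) | (lo & 0xFFFFu);
}

int main(void) {
    printf("%u\n", add32_via_16bit(70000u, 80000u));  /* prints 150000 */
    return 0;
}
```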

Page 26: Tuesday, September 04, 2006

Instruction level Parallelism

Mid 1980s to mid 1990s.

Performing portions of several machine instructions concurrently.

Pipelining (also a kind of parallelism).

Fetching multiple instructions at a time and issuing them in parallel to distinct functional units (superscalar).

Page 27: Tuesday, September 04, 2006

Instruction level Parallelism

However… instruction-level parallelism is worthwhile only if the processor can be supplied with instructions and data fast enough.

The gap between processor cycle time and memory cycle time has grown wider.

To satisfy increasing bandwidth requirements, larger and larger caches are placed on chip with the processor.

Limits: cache misses and control transfers.

Page 28: Tuesday, September 04, 2006

In the mid 1970s, the introduction of vector processors marked the beginning of modern supercomputing.

Vector processors perform operations on sequences of data elements rather than on individual scalar data.

They offered an advantage of at least one order of magnitude over conventional systems of that time.
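The kind of operation vector processors excel at is an element-wise kernel such as SAXPY, sketched below in C as an illustration; vector hardware applies the multiply-add across a sequence of elements with a few vector instructions, where a scalar processor loops one element at a time:

```c
/* SAXPY: y = a*x + y over whole arrays, a classic vector operation.
 * A vector processor operates on sequences of elements per
 * instruction; a scalar CPU executes this element by element. */
void saxpy(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```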

Page 29: Tuesday, September 04, 2006

In the late 1980s a new generation of systems came on the market: microprocessor-based supercomputers that initially provided about 100 processors, growing to roughly 1000 by 1990.

These aggregations of processors are known as massively parallel processors (MPPs).

Page 30: Tuesday, September 04, 2006

Factors behind the emergence of MPPs:

Increase in performance of standard microprocessors

Cost advantage

Usage of "off-the-shelf" microprocessors instead of custom processors

Fostered by government programs for scalable parallel computing using distributed memory

Page 31: Tuesday, September 04, 2006

MPPs were claimed to equal or surpass the performance of vector multiprocessors.

Top500: lists the sites that have the 500 most powerful installed computer systems.

LINPACK benchmark:

• Most widely used metric of performance on numerical applications

• Collection of Fortran subroutines that analyze and solve linear equations and linear least squares problems

Page 32: Tuesday, September 04, 2006

Top500 (updated twice a year since June 1993)

In the first Top500 list there were already 156 MPP and SIMD systems present (around one third).

Page 33: Tuesday, September 04, 2006

Some memory related issues

Time to access memory has not kept pace with CPU clock speeds.

SRAM

Each bit is stored in a latch made up of transistors.

Faster than DRAM, but less dense and requires more power.

DRAM

Each bit of memory is stored as a charge on a capacitor.

A 1 GHz CPU will execute 60 instructions before a typical 60 ns DRAM can return a single byte.
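The arithmetic behind that last bullet: at 1 GHz, an idealized one-instruction-per-cycle processor completes an instruction every 1 ns, so a 60 ns DRAM access spans 60 ns / 1 ns = 60 instruction times.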

Page 34: Tuesday, September 04, 2006

Some memory related issues

Hierarchy: cache memories

Temporal locality

Cache lines (64, 128, 256 bytes)
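A small C sketch of why cache lines and locality matter (array size and stride are illustrative): sequential traversal uses every byte of each cache line it loads, while a large-stride traversal pulls in a whole line to use only a few bytes of it. Both functions compute the same total; only the access order differs.

```c
#include <stddef.h>

#define N (1 << 20)   /* 1M ints, illustrative size */

/* Good spatial locality: consecutive accesses fall within the same
 * cache line, so every byte of each loaded line is used. */
long sum_sequential(const int *a) {
    long s = 0;
    for (size_t i = 0; i < N; i++)
        s += a[i];
    return s;
}

/* Poor locality: a stride of 1024 ints (4 KB) means each access
 * loads a new cache line but uses only 4 of its bytes. */
long sum_strided(const int *a) {
    long s = 0;
    for (size_t j = 0; j < 1024; j++)
        for (size_t i = j; i < N; i += 1024)
            s += a[i];
    return s;
}
```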

Page 35: Tuesday, September 04, 2006

Parallel Architectures: Memory Parallelism

One way to increase performance is to replicate computers.

Major choice is between shared memory and distributed memory

Page 36: Tuesday, September 04, 2006

Memory Parallelism

In the mid 1980s, when the 32-bit microprocessor was first introduced, computers containing multiple microprocessors sharing a common memory became prevalent.

In most of these designs all processors plug into a common bus.

However, only a small number of processors can be supported by a bus.

Page 37: Tuesday, September 04, 2006

UMA bus based SMP architecture

If the bus is busy when a CPU wants to read or write memory, the CPU waits for the bus to become idle.

Bus contention is manageable only for a small number of processors.

Beyond that, the system is limited by the bandwidth of the bus, and most of the CPUs will be idle most of the time.

Page 38: Tuesday, September 04, 2006

UMA bus based SMP architecture

One way to alleviate this problem is to add a cache to each CPU.

If most reads can be satisfied from the cache, there is less bus traffic and the system can support more CPUs.

A single bus limits a UMA multiprocessor to about 16-32 CPUs.

Page 39: Tuesday, September 04, 2006

SMP

SMP (symmetric multiprocessor): a shared-memory multiprocessor where the cost of accessing a memory location is the same for all processors.