Throughput-Oriented Architectures
A computer architecture article on throughput-oriented architectures.
1
Throughput Oriented Architectures
2
Contents
• Throughput-oriented processors
• Hardware multithreading
• Many simple processing units
• SIMD execution
• GPUs
• NVIDIA GPU architecture
• Throughput-oriented programming
• Conclusion
3
Key Points:
• Throughput oriented processors tackle problems where parallelism is abundant.
• Because of their design, programming throughput-oriented processors requires much more emphasis on parallelism and scalability than programming sequential processors.
• GPUs are the leading exemplars of modern throughput-oriented architectures.
4
Throughput-Oriented Architectures:
• Throughput and latency are two fundamental measures for processor performance.
• Traditional scalar microprocessors are latency-oriented architectures: they aim to finish a single task as quickly as possible.
• Throughput-oriented processors are designed on the assumption that they will run workloads in which parallelism is abundant.
• Throughput-oriented architectures rely on three key architectural features:
1. Emphasis on many simple processing cores
2. Extensive hardware multithreading
3. SIMD execution
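To make the latency/throughput distinction concrete, the Python sketch below compares two hypothetical chips on a batch of independent tasks: a latency-oriented chip with a few fast cores and a throughput-oriented chip with many slow cores. The core counts and per-task times are illustrative assumptions, not measurements of any real processor.

```python
import math

def batch_time(num_tasks, cores, task_latency):
    """Time to finish num_tasks independent tasks, assuming each core runs
    one task at a time and every task takes task_latency time units."""
    waves = math.ceil(num_tasks / cores)  # rounds of tasks run side by side
    return waves * task_latency

def throughput(num_tasks, cores, task_latency):
    """Tasks completed per time unit over the whole batch."""
    return num_tasks / batch_time(num_tasks, cores, task_latency)

# Hypothetical designs (assumed numbers, for illustration only).
LATENCY_CHIP = {"cores": 4, "task_latency": 1.0}       # few fast cores
THROUGHPUT_CHIP = {"cores": 256, "task_latency": 8.0}  # many slow cores

for n in (1, 4, 1024):
    lat = batch_time(n, **LATENCY_CHIP)
    thr = batch_time(n, **THROUGHPUT_CHIP)
    print(f"{n:5d} tasks: latency chip {lat:6.1f}, throughput chip {thr:6.1f}")
```

With a single task the fast-core chip finishes 8x sooner (1.0 vs 8.0 time units), but with 1024 tasks the many-core chip finishes the batch in 32 time units versus 256. The slow-core design only wins when there is enough parallelism to keep its cores busy.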
5
Hardware Multithreading:
• A computation in which parallelism is abundant can be decomposed into a collection of concurrent sequential tasks that can execute in parallel across many threads.
• Each thread executes the instruction stream corresponding to a single sequential task.
• Multithreading, whether in hardware or software, provides a way of tolerating latency: while one thread waits (for example, on a memory access), another thread can run.
• Hardware multi-threading as a design strategy for improving aggregate performance on parallel workloads has a long history.
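A back-of-the-envelope model shows how multithreading tolerates latency. The sketch below assumes a single core, threads that alternate between a fixed number of compute cycles and a fixed memory stall, and zero-cost switching between threads; these are simplifying assumptions, and the cycle counts used in the example are hypothetical.

```python
import math

def utilization(num_threads, compute_cycles, latency_cycles):
    """Fraction of cycles one core stays busy under a simple model: each
    thread computes for compute_cycles, then stalls for latency_cycles on
    memory, and the core switches to another ready thread at zero cost."""
    period = compute_cycles + latency_cycles
    return min(1.0, num_threads * compute_cycles / period)

def threads_to_hide_latency(compute_cycles, latency_cycles):
    """Smallest thread count that keeps the core fully busy in this model."""
    return math.ceil((compute_cycles + latency_cycles) / compute_cycles)

# Hypothetical workload: 4 cycles of work between 396-cycle memory stalls.
for n in (1, 10, 50, 100, 200):
    print(n, utilization(n, compute_cycles=4, latency_cycles=396))
print("threads needed:", threads_to_hide_latency(4, 396))
```

One thread keeps the core busy only 1% of the time in this example, while 100 threads hide the stall completely. Hardware multithreading approximates the zero-cost switch by keeping every thread's state on chip, which is why throughput-oriented processors provision so many hardware threads.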
6
Hardware Multithreading:
• Tera, Sun Niagara and NVIDIA GPUs use multithreading to achieve high throughput.
• Simultaneous multithreading is used to improve the efficiency of superscalar sequential processors.
• HEP, Tera and NVIDIA G20 show the characteristics of throughput-oriented processors.
7
Many simple processing units:
• High transistor density allows a single chip to hold many simple processing units.
• Throughput-oriented architectures achieve higher performance by using many simple processing units.
• Instructions execute in the order in which they appear in the program (in-order execution).
• The resulting savings in chip area allow more parallel processing units, which gives higher throughput on parallel workloads.
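One way to see why many simple cores pay off is Pollack's rule, an empirical observation (not stated in this article) that single-thread performance grows roughly with the square root of core area. The sketch below applies that assumption to a fixed, hypothetical area budget and a perfectly parallel workload.

```python
import math

def single_thread_perf(core_area):
    """Pollack's rule (empirical assumption): performance ~ sqrt(area),
    normalized so a core of area 1 has performance 1."""
    return math.sqrt(core_area)

def chip_throughput(area_budget, core_area):
    """Aggregate throughput on a perfectly parallel workload: number of
    cores that fit in the budget times per-core performance."""
    cores = area_budget // core_area
    return cores * single_thread_perf(core_area)

BUDGET = 64  # hypothetical chip area, in units of one simple core
for core_area in (1, 4, 16, 64):
    cores = BUDGET // core_area
    print(f"core area {core_area:2d}: {cores:2d} cores, "
          f"single-thread {single_thread_perf(core_area):.0f}x, "
          f"throughput {chip_throughput(BUDGET, core_area):.0f}x")
```

Under these assumptions, one large core is 8x faster per thread than a simple core, but 64 simple cores deliver 64x aggregate throughput versus 8x. The trade only pays off when the workload has enough parallelism to occupy every core.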
8
SIMD execution:
• Many parallel processors use some form of SIMD execution to improve aggregate throughput.
• The two basic categories of SIMD machines are SIMD processor arrays and vector processors.
• A SIMD processor array consists of many processing units driven by a single control unit.
• A vector processor provides traditional scalar instructions alongside vector instructions that operate on data vectors of a fixed width.
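A minimal Python sketch of the vector-processor style, assuming a hypothetical fixed vector width `VLEN = 8`: a scalar loop steps through the data one strip at a time, and each step stands in for one vector instruction that applies the same multiply-add to every lane. The function name `saxpy_strip_mined` is illustrative, and the code only emulates the lanes; it does not use real SIMD hardware.

```python
VLEN = 8  # hypothetical fixed vector width (lanes per vector instruction)

def saxpy_strip_mined(a, x, y):
    """Compute a*x + y the way a vector processor would: a scalar loop
    walks the arrays VLEN elements at a time, and each step models one
    vector instruction applying the same operation to every lane.
    Returns the result and the number of vector instructions issued."""
    n = len(x)
    result = [0.0] * n
    vector_instructions = 0
    for base in range(0, n, VLEN):      # scalar instructions: loop control
        end = min(base + VLEN, n)       # the last strip may be partial
        # One "vector instruction": identical multiply-add on each lane.
        result[base:end] = [a * xi + yi
                            for xi, yi in zip(x[base:end], y[base:end])]
        vector_instructions += 1
    return result, vector_instructions

x = [float(i) for i in range(20)]
y = [1.0] * 20
out, count = saxpy_strip_mined(2.0, x, y)
print(count, out[:4])
```

Twenty elements need three vector instructions (two full strips of 8 and one partial strip of 4) instead of twenty scalar multiply-adds. A SIMD processor array reaches the same effect differently, by giving each element its own processing unit under one shared control unit.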
9
GPU:
• A GPU is similar to a computer's CPU, but it is designed specifically for the complex mathematical and geometric calculations required for graphics rendering.
10
CPU and GPU:
• A CPU consists of a few cores optimized for serial, sequential processing.
• A GPU consists of thousands of smaller, more efficient cores designed to handle many tasks concurrently.
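GPUs put their many cores to work by running one lightweight thread per data element. The Python sketch below only emulates that style sequentially: `vector_add_kernel` plays the role of the per-thread function, and the parameters `block_idx`, `block_dim`, and `thread_idx` are hypothetical stand-ins modeled loosely on CUDA's `blockIdx`, `blockDim`, and `threadIdx`. Nothing here runs on a GPU.

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Work done by ONE GPU-style thread: compute its global index and
    handle a single element."""
    i = block_idx * block_dim + thread_idx
    if i < len(out):          # guard: the last block may have spare threads
        out[i] = a[i] + b[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Sequential stand-in for a kernel launch; on a GPU these
    grid_dim * block_dim invocations would run concurrently."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

n = 1000
a = list(range(n))
b = [2 * v for v in a]
out = [0] * n
block_dim = 256
grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n
launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
```

A CPU-style version would be a single loop over `n` run by one or a few threads; the GPU style instead describes the work of one element and relies on the hardware to run thousands of such threads at once.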
11
CPU and GPU:
12
NVIDIA Fermi Graphics Processing Unit:
• Floating-point performance: 1000 GFLOPS
• On-chip scratchpad memory: 48 KB per SM (streaming multiprocessor)
• Off-chip memory bandwidth: 100 GB/s
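The two Fermi figures above can be combined with the roofline model, an analysis technique brought in here for illustration rather than one named on the slide. Dividing peak compute by memory bandwidth gives the arithmetic intensity (FLOPs per byte of off-chip traffic) a kernel needs before compute, not memory, becomes the limit. The SAXPY byte count below assumes 4-byte floats and no cache reuse.

```python
PEAK_GFLOPS = 1000.0     # floating-point peak quoted on the slide
BANDWIDTH_GB_S = 100.0   # off-chip memory bandwidth quoted on the slide

def attainable_gflops(flops_per_byte):
    """Roofline model: performance is capped by either the compute peak
    or the rate at which memory can deliver operands."""
    return min(PEAK_GFLOPS, flops_per_byte * BANDWIDTH_GB_S)

# Intensity at which the memory and compute limits meet (FLOP/byte).
ridge_point = PEAK_GFLOPS / BANDWIDTH_GB_S

# Single-precision SAXPY (y = a*x + y): 2 FLOPs per element and 12 bytes
# moved (read x, read y, write y), assuming no reuse from caches.
saxpy_intensity = 2 / 12

print(f"ridge point: {ridge_point:.1f} FLOP/byte")
print(f"SAXPY: {attainable_gflops(saxpy_intensity):.1f} GFLOPS attainable")
```

A kernel needs about 10 FLOPs per byte of off-chip traffic to approach the 1000 GFLOPS peak; SAXPY, at 1/6 FLOP per byte, is limited to roughly 16.7 GFLOPS. This is one reason the on-chip scratchpad matters: reusing data held there raises effective arithmetic intensity.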
13
NVIDIA v Intel:
14
Performance per watt:
15
Microarchitecture of GPU
16
Reduction tree:
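The reduction-tree slide is a figure only. A minimal Python sketch of the idea, assuming an associative operator (sum by default): values are combined pairwise, level by level, and every pair within a level is independent, so a parallel machine can process a whole level in one step. The function name `tree_reduce` is illustrative.

```python
def tree_reduce(values, op=lambda a, b: a + b):
    """Combine values pairwise, level by level, as a reduction tree does.
    Returns (result, number_of_levels). op must be associative."""
    if not values:
        raise ValueError("tree_reduce needs at least one value")
    level = list(values)
    levels = 0
    while len(level) > 1:
        # All combinations in this level are independent of each other.
        nxt = [op(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:        # an odd element passes through unchanged
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], levels

total, depth = tree_reduce(list(range(1, 9)))
print(total, depth)
```

Summing 8 values still takes 7 additions, but they are arranged in 3 levels rather than a chain of 7 dependent steps; for 1000 values the tree is only 10 levels deep. That shorter dependency chain is what lets a throughput-oriented processor apply many cores to a reduction.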
17
Conclusion
• Throughput-oriented processors assume that parallelism is abundant rather than scarce, and their target is maximizing the total throughput of all tasks rather than minimizing the latency of any one task.
• A fully general-purpose chip cannot afford to trade single-thread performance this aggressively for increased total throughput.