gpu systems
DESCRIPTION
Our unique 1U GPU servers allow you to use the latest GPUs (Tesla, GTX285, Quadro FX5800) for visualization or offloading processing in a small form factor. These are built on Intel's latest Nehalem processors.
TRANSCRIPT
![Page 1: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/1.jpg)
GPU Systems
Advanced Clustering’s offerings for GPGPU computing
advanced clustering technologies
www.advancedclustering.com • 866.802.8222
![Page 2: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/2.jpg)
what is GPU computing
• The use of a GPU (graphics processing unit) to do general purpose scientific and engineering computing
• The model is to use a CPU and GPU together in a heterogeneous computing model
• CPU is used to run sequential portions of application
• Offload parallel computation onto the GPU
![Page 3: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/3.jpg)
history of GPUs
• GPUs designed with fixed function pipelines for real-time 3D graphics
• As the complexity of GPUs increased, they were designed to be more programmable so new features could be implemented easily
• Scientists and engineers discovered that these originally purpose-built GPUs could also be re-programmed for General Purpose computing on a GPU (GPGPU)
![Page 4: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/4.jpg)
history of GPUs - continued
• The nature of 3D graphics meant GPUs have very fast floating-point units, which are also great for scientific codes
• GPUs were originally very difficult to program; GPU vendors have since recognized another market for their products and developed specially designed GPUs and programming environments for scientific computing
• The most prominent are NVIDIA’s Tesla GPUs and the CUDA programming environment
![Page 5: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/5.jpg)
GPUs vs. CPUs
[Figure labels: Quad-core CPU, 240 Core Tesla GPU]
• Traditional x86 CPUs are available today with 4 cores; 6, 8, and 12 cores in the future
• NVIDIA’s Tesla GPU is shipping with 240 cores
![Page 6: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/6.jpg)
GPUs vs. CPUs - continued
![Page 7: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/7.jpg)
why use GPUs?
• Massively parallel design: 240 cores per GPU
• Nearly 1 teraflop of single precision floating-point performance
• Designed as an accelerator card to add into your existing system - does not replace your current CPU
• Maximum of 4GB of fast dedicated RAM per GPU
• If your code is highly parallel it’s worth investigating
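One way to judge whether code is "highly parallel" enough is Amdahl's law, which is not part of the original slides and is used here only as a rough guide. The sketch below assumes a hypothetical 20x speedup on the offloaded portion; the function name `overall_speedup` and the example fractions are illustrative, not measured results.

```python
def overall_speedup(parallel_fraction: float, gpu_speedup: float) -> float:
    """Estimate whole-program speedup using Amdahl's law.

    parallel_fraction: share of runtime that can be offloaded (0.0 to 1.0).
    gpu_speedup: assumed speedup of that offloaded share (hypothetical value).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if gpu_speedup <= 0.0:
        raise ValueError("gpu_speedup must be positive")
    # The sequential share still runs on the CPU at its original speed.
    sequential_fraction = 1.0 - parallel_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / gpu_speedup)


# Hypothetical 20x speedup on the parallel part, for three kinds of code.
for fraction in (0.50, 0.90, 0.99):
    estimate = overall_speedup(fraction, 20.0)
    print(f"{fraction:.0%} parallel: about {estimate:.1f}x overall")
```

Under this model the sequential portion that stays on the CPU quickly becomes the limit, which is why the parallel share of the code matters more than the raw core count.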
![Page 8: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/8.jpg)
why not use GPUs?
• Fixed RAM sizes on GPU - not upgradable or configurable
• Large power requirements of 188W
• Still requires a host server and CPU to operate
• Requires specialized development tools; does not run standard x86 code
• Current development tools are specific to NVIDIA cards - no support for other manufacturers’ GPUs
• Your code may be difficult to parallelize
![Page 9: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/9.jpg)
developing for GPUs
• Current development model: CUDA parallel environment
• The CUDA parallel programming model guides programmers to partition the problem into coarse sub-problems that can be solved independently in parallel.
• Fine grain parallelism in the sub-problems is then expressed such that each sub-problem can be solved cooperatively in parallel.
• Currently an extension for the C programming language - other languages in development
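The two levels described above can be imitated in plain Python. This is only a sequential sketch of the idea, not CUDA code: `parallel_style_sum` splits the input into independent blocks (the coarse sub-problems), and `block_sum` mimics threads within one block cooperating on a tree reduction (the fine-grained step). Both function names and the block size of 4 are assumptions made for illustration.

```python
from typing import List


def block_sum(block: List[float]) -> float:
    """Fine-grained step: 'threads' in one block cooperate on a tree reduction."""
    values = list(block)
    stride = 1
    # Each pass mimics one synchronized step in which the thread at index i
    # adds in the partial sum held at index i + stride.
    while stride < len(values):
        for i in range(0, len(values) - stride, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
    return values[0] if values else 0.0


def parallel_style_sum(data: List[float], block_size: int = 4) -> float:
    """Coarse step: split the input into independent blocks, then combine."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # On a GPU these blocks could run in any order or all at once; here they
    # simply run one after another.
    partial_sums = [block_sum(block) for block in blocks]
    return sum(partial_sums)


data = [float(x) for x in range(1, 11)]  # 1.0 through 10.0
print(parallel_style_sum(data))  # prints 55.0
```

Because no block depends on another block's result, the coarse sub-problems can be scheduled freely, while the cooperation is confined to the work inside each block.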
![Page 10: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/10.jpg)
NVIDIA GPUs
• All of NVIDIA’s recent GPUs support CUDA development
• Tesla cards designed exclusively for CUDA and GPGPU code (no graphics support)
• GeForce cards designed for graphics can be used for CUDA code as well
• Usually slower, with fewer cores or less RAM - but a great way to get started at low price points
• Development and testing can be done on almost any standard GeForce GPU and run on a Tesla system
![Page 11: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/11.jpg)
GeForce vs. Tesla
![Page 12: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/12.jpg)
GPU future
• More products coming: AMD Stream processor line of products, similar to NVIDIA’s Tesla
• Standard, portable programming via OpenCL
• OpenCL (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming. Create portable code for a diverse mix of multi-core CPUs, GPUs, Cell-type architectures and other parallel processors such as DSPs.
• More info: http://www.khronos.org/opencl/
![Page 13: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/13.jpg)
building GPU systems
• Building systems to house GPUs can be difficult:
• Requires lots of engineering and design work to power and cool them correctly
• GPUs were originally designed for visualization and gaming; size and form-factor were not as important
• When GPUs are used for computation, data-center space is limited and expensive - need to find a way to implement GPUs in existing infrastructure
![Page 14: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/14.jpg)
traditional GPU servers
• Large tower-style cases
• Rackmount servers 4U or larger
• Neither choice is an efficient use of limited data center space
![Page 15: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/15.jpg)
GPUs are large
[Figure labels: 1.5” deep, 10.5” long, 4.6” tall]
The size of the GPU has limited its application
![Page 16: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/16.jpg)
GPUs are power hungry
• GPU cards can use a lot of power - as much as 270W
• Lots of power equals lots of heat
• Difficult to put into a small space and cool effectively
![Page 17: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/17.jpg)
GPU system options
Advanced Clustering has two solutions to the power, heat, and density problems:
NVIDIA’s Tesla S1070
Advanced Clustering’s 15XGPU nodes
![Page 18: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/18.jpg)
NVIDIA’s tesla S1070
• The S1070 is an external 1U box that contains 4x Tesla C1060 GPUs
• The S1070 must be connected to one or two host servers to operate
• S1070 has one power supply and dedicated cooling for the 4x GPUs
• Only available with the C1060 GPU cards pre-installed
![Page 19: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/19.jpg)
tesla S1070 - front view
![Page 20: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/20.jpg)
tesla S1070 - rear view
![Page 21: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/21.jpg)
tesla S1070 - inside view
![Page 22: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/22.jpg)
host interface cards (HIC)
• The Host Interface Card (HIC) connects the Tesla S1070 to the server
• Every S1070 requires 2 HICs
• Each HIC bridges the server to two of the four GPUs inside of the S1070
• HICs can be installed in 2 separate servers, or 1 server
• HICs are available in PCI-e 8x and 16x widths
![Page 23: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/23.jpg)
tesla S1070 block diagram
Cables to HICs in Host System(s)
Tesla S1070
![Page 24: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/24.jpg)
connecting S1070 to 2 servers
Tesla S1070
Server#1
Server#2
Most servers do not have enough PCI-e bandwidth, so S1070 is designed to allow connecting to 2 separate machines.
![Page 25: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/25.jpg)
connecting S1070 to 1 server
Tesla S1070
Server
If the server has enough PCI-e lanes and expansion slots, one Tesla S1070 can be connected to one server.
![Page 26: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/26.jpg)
example cluster of S1070s
[Diagram labels: five pairs of connections, each marked HIC #1 and HIC #2]
• 10x 1U compute nodes with 2x CPUs each
• 5 Tesla S1070 with 4x GPUs each
• Balanced system of 20 CPUs and 20 GPUs
• All in 15U of rack space
![Page 27: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/27.jpg)
S1070s pros and cons
•Pros
• External enclosure to hold GPUs doesn’t require a special server design to hold the GPUs
• Easy to add GPUs to any existing system
• 4 GPUs in only 1U of space
• Multiple HIC card configurations including PCI-e 8x or 16x
• Thermally tested and validated by NVIDIA
•Cons
• Two GPUs share one PCI-e slot in the host server, limiting bandwidth to each GPU card
• Most 1U servers only have 1x PCI-e expansion slot which is occupied by the HIC - this limits ability to use interconnects like InfiniBand or 10 Gigabit Ethernet
• Limited configuration options, only Tesla cards, no GeForce or Quadro options
![Page 28: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/28.jpg)
S1070 - specifications
![Page 29: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/29.jpg)
advanced clustering GPU nodes
• The 15XGPU line of systems is a complete two-processor server and GPU in 1U
• Server fully configured with the latest quad-core Intel Xeon processors, RAM, hard drives, optical drive, networking, InfiniBand and GPU card
• Flexible to support various GPUs, including:
• Tesla C1060 card
• GeForce series
• Quadro series
![Page 30: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/30.jpg)
GPU node - front
![Page 31: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/31.jpg)
GPU node - rear
![Page 32: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/32.jpg)
GPU node - inside
![Page 33: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/33.jpg)
GPU node - block diagram
Advanced Clustering 15XGPU node
Simplified design: host server completely integrated with GPU, no external components to connect to.
![Page 34: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/34.jpg)
example cluster of GPU nodes
• 15x 1U compute nodes
• 2x CPUs each
• 1x GPU integrated in each node
• Entire system contains 30x CPUs and 15x GPUs
• All in 15U of rack space
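The two example clusters in this deck occupy the same 15U, so their density can be compared directly. The sketch below restates the figures from the slides (10 nodes plus 5 S1070 units versus 15 GPU nodes, each chassis taking 1U); the `RackLayout` class and its field names are hypothetical helpers, not part of any vendor tool.

```python
from dataclasses import dataclass


@dataclass
class RackLayout:
    name: str
    nodes: int          # number of 1U host servers
    cpus_per_node: int
    gpus: int           # total GPUs in the layout
    rack_units: int     # total rack space in U

    @property
    def cpus(self) -> int:
        return self.nodes * self.cpus_per_node

    @property
    def gpus_per_u(self) -> float:
        return self.gpus / self.rack_units


# 10x 1U nodes plus 5x 1U Tesla S1070 units holding 4 GPUs each.
s1070_cluster = RackLayout("S1070 cluster", nodes=10, cpus_per_node=2,
                           gpus=5 * 4, rack_units=10 + 5)
# 15x 1U nodes with one integrated GPU each.
gpu_node_cluster = RackLayout("GPU node cluster", nodes=15, cpus_per_node=2,
                              gpus=15 * 1, rack_units=15)

for layout in (s1070_cluster, gpu_node_cluster):
    print(f"{layout.name}: {layout.cpus} CPUs, {layout.gpus} GPUs, "
          f"{layout.rack_units}U, {layout.gpus_per_u:.2f} GPUs per U")
```

In the same rack space, the S1070 layout trades ten CPUs for five extra GPUs, which is the density difference the pros and cons slides refer to.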
![Page 35: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/35.jpg)
GPU nodes - thermals
•System carefully engineered to ensure all components will fit in the small form factor
•Detailed modeling and testing to make sure the system components (CPU and memory) and the GPU are adequately cooled
![Page 36: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/36.jpg)
GPU nodes pros and cons
•Pros
• Entire server and GPU all enclosed in a 1U package
• Flexibility in GPU choice: Tesla, GeForce, and Quadro supported
• Full PCI-e bandwidth to GPU
• Full-featured server with the latest quad-core Intel Xeon CPUs
• Can be used for more than computation, use the GPU for video output as well
•Cons
• Only 1x GPU per server
• Requires purchase of new servers, not an upgrade or add-on
• Not as dense a solution as the S1070 with its 4x GPUs
![Page 37: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/37.jpg)
GPU nodes
• The GPU node concept is unique to Advanced Clustering
• Only vendor shipping a 1U with integrated Tesla or high-end GeForce / Quadro card
• Available for order as the 15XGPU2
• Dual Quad-Core Intel Xeon 5500 series processors
• Choice of GPU
![Page 38: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/38.jpg)
15XGPU2 - specifications
• Processor
  • Two Intel Xeon 5500 Series processors
  • Next generation "Nehalem" microarchitecture
  • Integrated memory controller and 2x QPI chipset interconnects per processor
  • 45nm process technology
• Chipset
  • Intel 5500 I/O controller hub
• Memory
  • 800MHz, 1066MHz, or 1333MHz DDR3 memory
  • Twelve DIMM sockets supporting up to 144GB of memory
• GPU
  • PCI-e 2.0 16x double height expansion slot for GPU
  • Multiple options: Tesla, GeForce, or Quadro cards
• Storage
  • Two 3.5" SATA2 drive bays
  • Supports RAID levels 0 and 1 with Linux software RAID (with 2.5" drives)
  • DVD+RW slim-line optical drive
• Management
  • Integrated IPMI 2.0 module
  • Integrated management controller providing iKVM and remote disk emulation
  • Dedicated RJ45 LAN for management network
• I/O connections
  • Two independent 10/100/1000Base-T (Gigabit) RJ-45 Ethernet interfaces
  • Two USB 2.0 ports
  • One DB-9 serial port (RS-232)
  • One VGA port
  • Optional ConnectX DDR or QDR InfiniBand connector
• Electrical Requirements
  • High-efficiency power supply (greater than 80%)
  • Output Power: 560W
  • Universal input voltage 100V to 240V
  • Frequency: 50Hz to 60Hz, single phase
![Page 39: Gpu Systems](https://reader031.vdocuments.net/reader031/viewer/2022013105/54c8abd34a795999668b462a/html5/thumbnails/39.jpg)
availability
• Both the Tesla S1070 and 15XGPU GPU nodes are available and shipping now
• For price and custom configuration contact your Account Representative
• (866) 802-8222
• http://www.advancedclustering.com/go/gpu