CS 380 - GPU and GPGPU Programming
Lecture 2: Introduction; GPU Architecture 1
Markus Hadwiger, KAUST
Reading Assignment #2 (until Feb. 17)
Read (required):
• GLSL book, chapter 4 (The OpenGL Programmable Pipeline)
• GPU Gems 2 book, chapter 30 (The GeForce 6 Series GPU Architecture), available online:
  http://download.nvidia.com/developer/GPU_Gems_2/GPU_Gems2_ch30.pdf
Example: Fluid Simulation and Rendering
• Compute advection of fluid
  – (Incompressible) Navier-Stokes solvers
  – Lattice Boltzmann Method (LBM)
• Discretized domain; stored in 2D/3D textures
  – Velocity, pressure
  – Dye, smoke density, vorticity, …
• Updates in multi-passes
• Render current frame
Courtesy Mark Harris
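The advection and multi-pass points above can be illustrated with a minimal CPU sketch. It assumes a 1D grid and semi-Lagrangian advection, which is one common scheme and not necessarily the one used in the demo shown. The lists stand in for textures, and `sample_linear`, `advect_pass`, and `simulate` are hypothetical names chosen for this sketch.

```python
import math

def sample_linear(field, x):
    """Linearly interpolated lookup with clamp-to-edge addressing,
    roughly what a filtered texture fetch provides."""
    x = min(max(x, 0.0), len(field) - 1.0)
    i0 = int(math.floor(x))
    i1 = min(i0 + 1, len(field) - 1)
    t = x - i0
    return (1.0 - t) * field[i0] + t * field[i1]

def advect_pass(src, velocity, dt):
    """One pass: each output cell traces back along its velocity and
    reads the source field at that position."""
    return [sample_linear(src, i - dt * velocity[i]) for i in range(len(src))]

def simulate(dye, velocity, dt, steps):
    """Multi-pass update: the output of one pass is the input of the next,
    like ping-ponging between two textures."""
    read = list(dye)
    for _ in range(steps):
        write = advect_pass(read, velocity, dt)
        read = write  # swap roles of the two buffers
    return read
```

Every output cell depends only on the read buffer, so all cells of a pass can be computed independently. That is why this kind of update maps well to one fragment per cell.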
Example: Volumetric Special Effects
• NVIDIA Demos
  – Smoke, water
  – Collision detection with voxelized solid (Gargoyle)
• Ray-casting
  – Smoke: direct volume rendering
  – Water: level set / isosurface
Courtesy Keenan Crane
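For the direct volume rendering case, the core of a ray-caster is the compositing loop along each ray. The sketch below assumes front-to-back compositing of scalar intensities with non-premultiplied alpha; the function name and the 0.99 termination threshold are illustrative choices, not taken from the demos.

```python
def composite_ray(samples):
    """Front-to-back compositing of (color, alpha) samples along one ray.

    samples must be ordered from the eye into the volume; color is a
    scalar intensity here to keep the sketch short.
    """
    color_acc = 0.0
    alpha_acc = 0.0
    for color, alpha in samples:
        # Only the light not yet blocked by earlier samples contributes.
        weight = (1.0 - alpha_acc) * alpha
        color_acc += weight * color
        alpha_acc += weight
        if alpha_acc >= 0.99:  # early ray termination (threshold is an assumption)
            break
    return color_acc, alpha_acc
```

The isosurface case for water would replace the accumulation with a search for the first sample where the level-set value changes sign.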
Example: Level-Set Computations
• Implicit surface represented by distance field
• The level-set PDE is solved to update the distance field
• Basic framework with a variety of applications
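As a rough illustration of solving the level-set PDE on a distance field, the sketch below takes one explicit time step of phi_t + speed * |phi_x| = 0 in 1D. It assumes a non-negative speed, a first-order upwind gradient, and fixed boundary values; all of these are simplifications chosen for the example, and `level_set_step` is a hypothetical name.

```python
import math

def level_set_step(phi, speed, dt, h):
    """One explicit upwind step of phi_t + speed * |phi_x| = 0, speed >= 0.

    phi is a sampled signed distance field (negative inside), h is the
    grid spacing. Boundary samples are left unchanged (an assumption).
    """
    out = list(phi)
    for i in range(1, len(phi) - 1):
        d_minus = (phi[i] - phi[i - 1]) / h  # backward difference
        d_plus = (phi[i + 1] - phi[i]) / h   # forward difference
        # Upwind gradient magnitude for motion along the outward normal.
        grad = math.sqrt(max(d_minus, 0.0) ** 2 + min(d_plus, 0.0) ** 2)
        out[i] = phi[i] - dt * speed * grad
    return out
```

With a positive speed the zero crossing of phi moves outward, so the implicit surface grows. Other applications change only the speed term.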
Example: Diffusion Filtering
De-noising
• Original
• Linear isotropic
• Non-linear isotropic
• Non-linear anisotropic
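The difference between linear and non-linear isotropic diffusion can be sketched in 1D. The code below assumes an explicit scheme with unit grid spacing and reflecting boundaries, and uses a Perona-Malik style diffusivity as one common example of a non-linear filter; the slide does not say which diffusivity was used. The anisotropic case needs a diffusion tensor in 2D or 3D and is not covered here.

```python
def diffusion_step(u, dt, diffusivity):
    """One explicit diffusion step on a 1D signal.

    diffusivity(gradient) returns the conductance between two neighbors.
    dt <= 0.5 keeps the scheme stable when the diffusivity is at most 1.
    """
    n = len(u)
    out = []
    for i in range(n):
        left = u[max(i - 1, 0)]        # reflecting boundary
        right = u[min(i + 1, n - 1)]
        flux_right = diffusivity(right - u[i]) * (right - u[i])
        flux_left = diffusivity(u[i] - left) * (u[i] - left)
        out.append(u[i] + dt * (flux_right - flux_left))
    return out

def linear(_gradient):
    """Linear isotropic diffusion: constant diffusivity, blurs edges too."""
    return 1.0

def perona_malik(k):
    """Non-linear isotropic diffusion: diffusivity drops at large gradients."""
    def g(gradient):
        return 1.0 / (1.0 + (gradient / k) ** 2)
    return g
```

On a step edge such as `[0, 0, 0, 10, 10, 10]`, the linear filter smears the edge after one step, while the non-linear filter leaves it almost unchanged and still smooths small fluctuations.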
Example: Linear Algebra Operators
Vector and matrix representation and operators
• Early approach based on graphics primitives
• Now CUDA makes this much easier
• Linear systems solvers
Courtesy Krüger and Westermann
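To make the linear systems solver point concrete, here is a minimal sketch of Jacobi iteration. It is chosen only because every component of the new vector can be computed independently, which is the data-parallel pattern that suits a GPU; it is not presented as the solver used in the cited work. `jacobi_solve` is a hypothetical name, and convergence is assumed (for example, a diagonally dominant matrix).

```python
def jacobi_solve(a, b, iterations):
    """Approximate the solution of a x = b with Jacobi iteration.

    a is a dense square matrix as a list of rows, b is the right-hand side.
    Each new x[i] depends only on the previous x, so all components of
    one iteration could be computed in parallel.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
    return x
```

For `a = [[4, 1], [1, 3]]` and `b = [1, 2]` the iterates approach `(1/11, 7/11)`.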
Example: GPU Data Structures
Glift: Generic, Efficient, Random-Access GPU Data Structures
• “STL” for GPUs
• Virtual memory management
Courtesy Lefohn et al.
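The virtual memory idea can be sketched as a sparse array whose page table translates virtual addresses into slots of a dense physical pool. This is an illustration of the general concept only and does not reproduce the Glift API; the class `PagedArray` and its methods are hypothetical.

```python
class PagedArray:
    """Sparse 1D array backed by a page table and a dense page pool."""

    def __init__(self, page_size, default=0.0):
        self.page_size = page_size
        self.default = default
        self.page_table = {}  # virtual page index -> physical page index
        self.physical = []    # densely packed allocated pages

    def _translate(self, address, allocate):
        """Map a virtual address to (physical page, offset), or None."""
        page, offset = divmod(address, self.page_size)
        if page not in self.page_table:
            if not allocate:
                return None
            self.page_table[page] = len(self.physical)
            self.physical.append([self.default] * self.page_size)
        return self.page_table[page], offset

    def read(self, address):
        location = self._translate(address, allocate=False)
        if location is None:
            return self.default  # unmapped pages read as the default value
        slot, offset = location
        return self.physical[slot][offset]

    def write(self, address, value):
        slot, offset = self._translate(address, allocate=True)
        self.physical[slot][offset] = value
```

Only pages that have been written occupy storage, so a large, mostly empty domain costs memory proportional to its populated pages.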
What's in a GPU?
Lots of floating point processing power
• Stream processing cores (different names: stream processors, CUDA cores, ...)
• Was vector processing, now scalar cores!
Still lots of fixed graphics functionality
• Attribute interpolation (per-vertex -> per-fragment)
• Rasterization (turning triangles into fragments/pixels)
• Texture sampling and filtering
• Depth buffering (per-pixel visibility)
• Blending/compositing (semi-transparent geometry, ...)
• Frame buffers
What can the hardware do?
• Rasterization
  – Decomposition into fragments
  – Interpolation of color
• Texturing
  – Interpolation/filtering
• Fragment shading
• Fragment operations
  – Depth test (Z-test)
  – Alpha blending (compositing)
Graphics Pipeline
Scene Description -> Geometry Processing -> Rasterization -> Fragment Operations -> Raster Image
Data between stages: Vertices -> Primitives -> Fragments -> Pixels
Geometry Processing (Vertices -> Primitives)
• Transformation: multiplication with modelview and projection matrix
• Per-vertex lighting: per-vertex local illumination (Blinn/Phong)
• Primitive assembly: geometric primitives (points, lines, triangles)
• Clipping, perspective divide: clip space to screen space
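The transformation and perspective divide steps of geometry processing can be sketched for a single vertex. The example assumes OpenGL-style conventions (column vectors, camera looking down the negative z axis, normalized device coordinates in [-1, 1]) and a projection with a 90-degree field of view and aspect ratio 1. Lighting and clipping are left out, and all function names are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) with a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def perspective(near, far):
    """Projection matrix for a 90-degree field of view and aspect ratio 1."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def to_screen(vertex, modelview, projection, width, height):
    """Object space -> eye space -> clip space -> screen space."""
    eye = mat_vec(modelview, list(vertex) + [1.0])
    clip = mat_vec(projection, eye)
    # Perspective divide: clip space to normalized device coordinates.
    ndc = [clip[i] / clip[3] for i in range(3)]
    # Viewport transform: [-1, 1] to pixel coordinates.
    x = (ndc[0] * 0.5 + 0.5) * width
    y = (ndc[1] * 0.5 + 0.5) * height
    return x, y, ndc[2]
```

A real pipeline clips primitives against the view volume before the divide, which also avoids dividing by a zero or negative w.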
Rasterization (Primitives -> Fragments)
• Polygon rasterization: decomposition of primitives into fragments
• Texture fetch: interpolation of texture coordinates; filtering of texture color
• Texture application: combination of primary color with texture color
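Decomposing a triangle into fragments and interpolating a per-vertex attribute can be sketched with edge functions. The code assumes a counter-clockwise triangle in a y-up coordinate system, samples at pixel centers, and treats pixels exactly on an edge as covered; real hardware uses a fill rule to avoid drawing shared edges twice. The names `edge` and `rasterize` are hypothetical.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p); positive if p lies
    to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(v0, v1, v2, attrs, width, height):
    """Return (x, y, value) fragments for a counter-clockwise triangle.

    attrs holds one scalar attribute per vertex (for example one color
    channel or one texture coordinate), interpolated barycentrically.
    """
    fragments = []
    area = edge(v0, v1, v2)
    if area <= 0:
        return fragments  # degenerate or clockwise: nothing to draw here
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0 = edge(v1, v2, p)
            w1 = edge(v2, v0, p)
            w2 = edge(v0, v1, p)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                value = (w0 * attrs[0] + w1 * attrs[1] + w2 * attrs[2]) / area
                fragments.append((x, y, value))
    return fragments
```

The same weights interpolate texture coordinates, which a texture fetch then uses for a filtered lookup. Perspective-correct interpolation additionally divides by the interpolated 1/w, which this sketch omits.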
Fragment Operations (Fragments -> Pixels)
• Alpha test: discard all fragments within a certain alpha range
• Stencil test: discard a fragment if the stencil buffer is set
• Depth test: discard all occluded fragments
• Alpha blending (compositing)
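The chain of fragment operations can be sketched as a sequence of early-outs followed by a blend. The comparison functions are configurable on real hardware; this example assumes fixed choices (discard at or below an alpha threshold, discard where the stencil flag is set, "less than" depth comparison, standard "over" blending with a scalar color). The dictionary layout and the name `process_fragment` are hypothetical.

```python
def process_fragment(frag, pixel, alpha_threshold=0.0):
    """Apply alpha, stencil, and depth tests, then blend into the pixel.

    frag:  {'color', 'alpha', 'depth'} of the incoming fragment
    pixel: {'color', 'depth', 'stencil'} frame buffer state, updated in place
    Returns True if the fragment was written, False if it was discarded.
    """
    if frag["alpha"] <= alpha_threshold:  # alpha test
        return False
    if pixel["stencil"]:                  # stencil test
        return False
    if frag["depth"] >= pixel["depth"]:   # depth test: occluded fragment
        return False
    a = frag["alpha"]
    # Alpha blending: fragment composited over the existing pixel color.
    pixel["color"] = a * frag["color"] + (1.0 - a) * pixel["color"]
    pixel["depth"] = frag["depth"]
    return True
```

Each test only discards, so a fragment that fails early costs no further frame buffer work.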
Graphics Pipeline: Programmable Pipeline
Scene Description -> Geometry Processing -> Rasterization -> Fragment Operations -> Raster Image
Data between stages: Vertices -> Primitives -> Fragments -> Pixels
Programmable stages: Vertex Shader, Fragment Shader
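One way to picture the programmable pipeline is as a fixed framework that calls user-supplied functions at two points. The toy sketch below assumes point primitives only, so each vertex yields exactly one fragment. `run_pipeline` and the shader signatures are hypothetical and far simpler than a real API.

```python
def run_pipeline(vertices, vertex_shader, fragment_shader):
    """Toy pipeline for point primitives with two programmable stages.

    vertex_shader(vertex) -> (x, y, varying)  user code, runs per vertex
    fragment_shader(varying) -> color         user code, runs per fragment
    Everything else stands in for fixed-function hardware.
    """
    framebuffer = {}
    for vertex in vertices:
        x, y, varying = vertex_shader(vertex)
        pixel = (int(x), int(y))  # fixed-function "rasterization" of a point
        framebuffer[pixel] = fragment_shader(varying)
    return framebuffer
```

For example, a vertex shader that shifts positions by one pixel and a fragment shader that turns a brightness value into a gray color can be passed in as two small functions without changing the framework.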
Direct3D 10 Pipeline (~OpenGL 3.2)
New geometry shader stage:
• Vertex -> geometry -> pixel shaders
• Stream output after geometry shader
Courtesy David Blythe, Microsoft
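What distinguishes the geometry shader from a vertex shader is that it sees a whole primitive and may emit zero, one, or several primitives. The sketch below models this with a function that returns a list, and uses the collected result as a stand-in for a stream-output buffer. Expanding points into quads is just one illustrative use; the names are hypothetical.

```python
def run_geometry_stage(primitives, geometry_shader):
    """Run geometry_shader on each primitive and collect what it emits.

    The returned list plays the role of a stream-output buffer that a
    later pass could read back as input geometry.
    """
    stream_output = []
    for primitive in primitives:
        stream_output.extend(geometry_shader(primitive))
    return stream_output

def expand_point(point, half_size=0.5):
    """Example shader: turn one point into a quad made of two triangles."""
    x, y = point
    a = (x - half_size, y - half_size)
    b = (x + half_size, y - half_size)
    c = (x + half_size, y + half_size)
    d = (x - half_size, y + half_size)
    return [(a, b, c), (a, c, d)]
```

A shader that returns an empty list culls its input primitive, which a vertex shader cannot do.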
Direct3D 11 Pipeline (~OpenGL 4.0)
New tessellation stages
• Hull shader (OpenGL: tessellation control)
• Tessellator (OpenGL: tessellation primitive generator)
• Domain shader (OpenGL: tessellation evaluation)
• Trend of adding new stages likely to continue...
• ... or full flexibility such as in Intel MIC (Larrabee) architecture?
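The division of labor between the three tessellation stages can be sketched for a curve patch. The example assumes a quadratic Bezier curve with three control points and a fixed tessellation level; a real hull shader would usually choose the level per patch, for example from screen-space size. All function names mirror the stage names but are hypothetical.

```python
def hull_shader(patch):
    """Programmable: choose how finely to tessellate this patch.
    A fixed level is an assumption made for this sketch."""
    return 4

def tessellator(level):
    """Fixed function: generate parameter values in [0, 1]."""
    return [i / level for i in range(level + 1)]

def domain_shader(patch, t):
    """Programmable: evaluate the quadratic Bezier curve at parameter t."""
    p0, p1, p2 = patch
    s = 1.0 - t
    return tuple(
        s * s * a + 2.0 * s * t * b + t * t * c
        for a, b, c in zip(p0, p1, p2)
    )

def tessellate(patch):
    """Hull shader -> tessellator -> domain shader for one patch."""
    level = hull_shader(patch)
    return [domain_shader(patch, t) for t in tessellator(level)]
```

The tessellator knows nothing about the surface. It only produces parameter values, and the domain shader gives them geometric meaning.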