status – week 240
Status – Week 240
Victor Moya
Summary
- Post Geometry Pipeline.
- Rasterization.
  - Triangle Setup.
  - Triangle Traversal.
  - Interpolation.
- Current status.
Post Geometry Pipeline
- Divide by w?
- Clipping?
  - NVidia doesn’t seem to have geometric clipping.
    - Alpha kill in NV2x for user clip planes.
  - ATI seems to have geometric clipping.
    - Proper user clipping.
    - No support for transformed and lit vertex clipping.
- What do we do?
Post Geometry Pipeline
- Clipping:
  - 6 frustum clip planes.
  - At least 6 user clip planes.
- Hardware requirements:
  - Plane-edge intersection (?).
  - Generates new vertices (1 or 2 for triangles).
    - Interpolate output attributes at the new vertex.
  - Can generate new triangles (1 for triangles).
    - Affects primitive assembly.
- At least frustum clipping should be fast.
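The plane-edge intersection and the attribute interpolation at the new vertex can be sketched as follows. This is only an illustration, not the simulator's code: the names `plane_distance`, `clip_edge` and `clip_triangle` are hypothetical, positions are assumed to be homogeneous (x, y, z, w) tuples, and the per-plane loop is a Sutherland-Hodgman style pass chosen here for brevity.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (position, attributes)


def plane_distance(plane: Sequence[float], pos: Sequence[float]) -> float:
    """Dot product of the plane coefficients with a homogeneous position.

    The value is >= 0 when the position is on the visible side.
    """
    return sum(p * c for p, c in zip(plane, pos))


def clip_edge(plane, a: Vertex, b: Vertex) -> Vertex:
    """Return the new vertex where edge a-b crosses the plane.

    Assumes a and b lie on opposite sides of the plane.
    """
    da = plane_distance(plane, a[0])
    db = plane_distance(plane, b[0])
    t = da / (da - db)

    def lerp(u, v):
        return tuple(x + t * (y - x) for x, y in zip(u, v))

    # Position and output attributes are interpolated with the same factor.
    return lerp(a[0], b[0]), lerp(a[1], b[1])


def clip_triangle(plane, verts: List[Vertex]) -> List[Vertex]:
    """Clip a triangle against one plane.

    Returns 0, 3 or 4 vertices when no vertex lies exactly on the plane.
    """
    out: List[Vertex] = []
    for i, a in enumerate(verts):
        b = verts[(i + 1) % len(verts)]
        inside_a = plane_distance(plane, a[0]) >= 0.0
        inside_b = plane_distance(plane, b[0]) >= 0.0
        if inside_a:
            out.append(a)
        if inside_a != inside_b:
            out.append(clip_edge(plane, a, b))
    return out
```

With one vertex outside the plane the result has four vertices (two of them new), so it has to be split into two triangles, which is why clipping affects primitive assembly.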
Post Geometry Pipeline
- Viewport Transformation
  - Delay to the end of rasterization (at the conversion from fixed point to floating point fragment attributes).
  - Use fixed point device coordinates [-1, 1] for rasterization.
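A minimal sketch of the delayed viewport transformation, assuming a signed fixed point format with 16 fractional bits (the slides do not give a precision) and the usual mapping of [-1, 1] onto a viewport with origin (x0, y0) and size width x height. `to_fixed` and `viewport_transform` are hypothetical names.

```python
FRAC_BITS = 16          # assumed precision, not taken from the slides
ONE = 1 << FRAC_BITS    # fixed point representation of 1.0


def to_fixed(value: float) -> int:
    """Convert a device coordinate in [-1, 1] to fixed point."""
    return int(round(value * ONE))


def viewport_transform(xf: int, yf: int, x0: float, y0: float,
                       width: float, height: float):
    """Convert fixed point device coordinates to float window coordinates.

    This is the step that would run at the end of rasterization, when the
    fragment attributes are converted from fixed point to floating point.
    """
    x = xf / ONE
    y = yf / ONE
    return (x0 + (x + 1.0) * 0.5 * width,
            y0 + (y + 1.0) * 0.5 * height)
```

For a 640x480 viewport at the origin, the device coordinate (0, 0) maps to the window position (320, 240).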
Rasterization.
[Figure: block diagram of the pipeline boxes MC, StF, StL, Shader, StOC, StC, PA, TS, TT and Int. The connections are labelled 1, 2 or A*TL+L.]
Legend:
- MC: Memory Controller
- StF: Streamer Fetch
- StL: Streamer Loader
- Shader: Vertex Shader
- StOC: Streamer Output Cache
- StC: Streamer Commit
- PA: Primitive Assembly
- TS: Triangle Setup
- TT: Triangle Traversal
- Int: Interpolation
Rasterization
We can divide it into three phases:
- Setup.
  - Calculate linear equation coefficients, start values and slopes.
  - Perform area and face culling.
- Traversal.
  - Traverse the triangle generating fragments inside the triangle.
  - Clipping of fragments by frustum and user clip.
- Interpolation.
  - Interpolate all fragment attributes for the generated fragment.
[Figure: data flow diagram with the boxes Primitive Assembly, Triangle Setup, Triangle Traversal, Interpolate and Fragment FIFO. Labels: vertex from the Streamer Commit, vertex attributes, vertex position, edge equation values, start & offset.]
Triangle Setup
- Use 2DH rasterization setup.
- Create matrix (inverse or just adjoint matrix?) from the three vertex 2DH positions.
- Calculate determinant.
- Cull for sign (face culling) and zero (zero area).
- Send the edge equation coefficients and/or start and slope values to Triangle Traversal.
- Optional: send other equations (1/w, clip planes, interpolators …).
Triangle Setup
- Adjoint rasterization matrix adj(M):
  - First level: 18 muls.
  - Second level: 9 adds.
  a_0 = y_1 w_2 - y_2 w_1
  a_1 = y_2 w_0 - y_0 w_2
  a_2 = y_0 w_1 - y_1 w_0
  b_0 = x_2 w_1 - x_1 w_2
  b_1 = x_0 w_2 - x_2 w_0
  b_2 = x_1 w_0 - x_0 w_1
  c_0 = x_1 y_2 - x_2 y_1
  c_1 = x_2 y_0 - x_0 y_2
  c_2 = x_0 y_1 - x_1 y_0
Triangle Setup
- Matrix determinant det(M):
  - 1 DP3: {w_0, w_1, w_2} · {c_0, c_1, c_2}
- Inverse matrix M^-1 (not needed?):
  - First level: 1 reciprocal: 1/det(M).
  - Second level: 9 muls.
- Edge equations: M^-1 rows.
  E_0 = [a_0, b_0, c_0]
  E_1 = [a_1, b_1, c_1]
  E_2 = [a_2, b_2, c_2]
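The setup steps can be put together in a short sketch. It uses the adjoint rows directly as edge equations (without the division by det(M), which the slides mark as possibly not needed) and assumes that a negative determinant means a back-facing triangle. The function name `triangle_setup` and the `cull_back` flag are hypothetical.

```python
def triangle_setup(v0, v1, v2, cull_back=True):
    """2DH triangle setup for three vertices given as (x, y, w).

    Returns (edges, det), where edges[i] = (a_i, b_i, c_i) is a row of the
    adjoint matrix, or None when the triangle is culled.
    """
    (x0, y0, w0), (x1, y1, w1), (x2, y2, w2) = v0, v1, v2

    # Adjoint matrix: 18 multiplications followed by 9 subtractions.
    a = (y1 * w2 - y2 * w1, y2 * w0 - y0 * w2, y0 * w1 - y1 * w0)
    b = (x2 * w1 - x1 * w2, x0 * w2 - x2 * w0, x1 * w0 - x0 * w1)
    c = (x1 * y2 - x2 * y1, x2 * y0 - x0 * y2, x0 * y1 - x1 * y0)

    # Determinant: one DP3 of the w components with the c coefficients.
    det = w0 * c[0] + w1 * c[1] + w2 * c[2]

    if det == 0:
        return None  # zero area
    if cull_back and det < 0:
        return None  # face culling (assumed sign convention)

    edges = [(a[i], b[i], c[i]) for i in range(3)]
    return edges, det
```

For the triangle (0, 0, 1), (4, 0, 1), (0, 4, 1) the determinant is 16, and each edge equation evaluates to det(M) at its own vertex and to 0 at the other two.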
Triangle Setup
- 1/w equation:
  - Sum of rows (param vector {1, 1, 1}).
  - Can be calculated as the sum of the edge equations.
- Additional equations:
  - param vector {u_0, u_1, u_2} x M^-1: 3 DP3.
- Frustum/Viewport clip:
  D_0 = [1, 0, -x_0]
  D_1 = [-1, 0, x_0 + w]
  D_2 = [0, 1, -y_0]
  D_3 = [0, -1, y_0 + h]
[Figure: arithmetic datapath built from multipliers (*), an adder (+) and a DP3 unit.]
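As an illustration of the extra equations, the sketch below sums the three edge equations to obtain the (unnormalized) 1/w equation and builds the four viewport clip equations. It assumes that w and h in the D equations are the viewport width and height, and that a value >= 0 means "inside"; the function names are hypothetical.

```python
def sum_of_rows(edges):
    """1/w equation (scaled by det(M)): the sum of the three edge equations."""
    return tuple(sum(column) for column in zip(*edges))


def viewport_clip_equations(x0, y0, width, height):
    """The four equations D_0..D_3 for a viewport at (x0, y0)."""
    return [(1, 0, -x0),
            (-1, 0, x0 + width),
            (0, 1, -y0),
            (0, -1, y0 + height)]


def evaluate(equation, x, y):
    """Evaluate a linear equation [a, b, c] at the position (x, y)."""
    a, b, c = equation
    return a * x + b * y + c


def inside_viewport(equations, x, y):
    """A position is inside when no clip equation is negative (assumed)."""
    return all(evaluate(eq, x, y) >= 0 for eq in equations)
```

Because the clip equations have the same form as the edge equations, they can be tested with the same sign checks during traversal.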
Triangle Traversal
- Different algorithms:
  - I don’t know which is better.
  - Scanline.
  - Centerline (PixelVision).
  - Tiled (Neon, McCormack).
  - Incremental and Hierarchical Hilbert Order (McCool).
  - Others?
Triangle Traversal
- Traversal algorithm effects:
  - Can improve the texture access pattern (Neon, Hilbert).
  - Can improve framebuffer memory access (Neon).
- Traversal algorithm requirements:
  - Must produce at least 2x2 fragments per cycle or multiples (two 2x2, three 2x2, etc.).
  - Must be efficient and generate as few fragments as possible outside the triangle.
- Antialiasing?
Triangle Traversal
- Uses edge equation coefficients and/or start and slope values calculated from them to walk the triangle.
- One ‘step’ per cycle.
- Fixed point arithmetic: integer addition.
- Requires saving state (2 to 3 saved states) or must use walk back (spends cycles).
- Tests (sign) the edge equation values at n positions per cycle.
- May test frustum and znear/zfar clip at the same time.
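The incremental walk can be sketched with a plain bounding box scan in 2x2 stamps. This is not one of the algorithms listed in the slides (scanline, centerline, tiled, Hilbert); it is a deliberately simple stand-in that shows the integer additions, the saved row state and the sign tests. It assumes integer edge coefficients and treats a value >= 0 as inside, ignoring tie-breaking fill rules.

```python
def traverse(edges, xmin, ymin, xmax, ymax):
    """Walk a bounding box in 2x2 stamps and return the covered fragments.

    edges is a list of three (a, b, c) integer edge equations.
    """
    fragments = []
    # Saved state: edge values at the start of the current stamp row.
    row = [a * xmin + b * ymin + c for a, b, c in edges]
    for y in range(ymin, ymax + 1, 2):
        current = list(row)
        for x in range(xmin, xmax + 1, 2):
            # Sign test at the four positions of the 2x2 stamp. The offsets
            # are 0 or 1, so each value is the current one plus a slope.
            for dy in (0, 1):
                for dx in (0, 1):
                    values = [current[i] + dx * edges[i][0] + dy * edges[i][1]
                              for i in range(3)]
                    if all(v >= 0 for v in values):
                        fragments.append((x + dx, y + dy))
            # Step right by two pixels: one addition per edge equation.
            current = [current[i] + 2 * edges[i][0] for i in range(3)]
        # Restore the saved state and step down by two pixels.
        row = [row[i] + 2 * edges[i][1] for i in range(3)]
    return fragments
```

For the triangle with screen vertices (0, 0), (4, 0) and (0, 4), whose edge equations are (-4, -4, 16), (4, 0, 0) and (0, 4, 0), the walk over the box (0, 0) to (4, 4) yields the 15 integer positions with x + y <= 4.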
Triangle Traversal
- Hardware requirements:
  - Multiple fixed point adders.
  - Multiple sign testers.
  - Registers for current (at least 3 for each edge equation) and saved states.
  - Registers for edge slopes/increments (as many as fragments generated per cycle and edge equations?).
[Figure: traversal hardware diagram with a Traversal Algorithm box, three adders (+) and the boxes TE and ST.]
Interpolation.
- Using barycentric method:
  - Use the edge equation result (McCool):
    F_0(x,y) = E_0
    F_1(x,y) = E_1
    F_2(x,y) = E_2
  - Calculate sum of edge equations at the fragment:
    R’(x,y) = F_0(x,y) + F_1(x,y) + F_2(x,y)
  - Calculate reciprocal:
    r = 1/R’(x,y)
  - Interpolate attribute at the fragment:
    p_k(x,y) = p_{k0} r F_0(x,y) + p_{k1} r F_1(x,y) + p_{k2} r F_2(x,y)
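The barycentric formulas translate almost line by line into code. The sketch below assumes the edge equations are the adjoint rows [a_i, b_i, c_i] from Triangle Setup and uses floating point instead of fixed point; `interpolate_barycentric` is a hypothetical name.

```python
def interpolate_barycentric(edges, attrs, x, y):
    """Interpolate one attribute at the fragment (x, y).

    edges: three (a, b, c) edge equations.
    attrs: the attribute values (p_k0, p_k1, p_k2) at the three vertices.
    """
    # F_i(x, y): edge equation values at the fragment.
    f = [a * x + b * y + c for a, b, c in edges]
    # r = 1 / R'(x, y), with R' the sum of the edge equation values.
    r = 1.0 / sum(f)
    # p_k(x, y) = sum of p_ki * r * F_i(x, y).
    return sum(p * r * fi for p, fi in zip(attrs, f))
```

As a check, take the 2DH vertices (0, 0, 1), (8, 0, 2) and (0, 4, 1), whose edge equations are (-8, -8, 32), (4, 0, 0) and (0, 8, 0). At the screen position (2, 0), halfway between the first two vertices on screen, the attribute values (0, 3, 0) interpolate to 1 rather than 1.5, which is the perspective-correct result.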
Interpolation
- Alternative (Olano & Greer):
  - At setup:
    - Use 2DH method and calculate coefficients for all the attributes.
    - Calculate 1/w (sum of rows) coefficients.
    - Requires a vector matrix mul per attribute.
  - At traverse/interpolation:
    - Interpolate 1/w and attributes using fixed point incremental arithmetic.
    - Calculate reciprocal of 1/w.
    - Mul interpolated attribute by reciprocal of 1/w.
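A sketch of this alternative along one row of fragments, assuming the edge equations are the adjoint rows and det is det(M) from Triangle Setup. It uses floating point where the slides propose fixed point, and the names `setup_attribute` and `interpolate_span` are hypothetical.

```python
def setup_attribute(edges, det, u):
    """Setup: coefficients of u/w as a linear function of (x, y).

    This is the vector matrix multiplication {u_0, u_1, u_2} x M^-1 (3 DP3).
    """
    return tuple(sum(u[i] * edges[i][k] for i in range(3)) / det
                 for k in range(3))


def interpolate_span(edges, det, u, x_start, y, count):
    """Interpolate attribute u at count fragments of row y, from x_start."""
    ua, ub, uc = setup_attribute(edges, det, u)
    # The 1/w equation uses the param vector {1, 1, 1} (sum of rows).
    wa, wb, wc = setup_attribute(edges, det, (1.0, 1.0, 1.0))

    u_over_w = ua * x_start + ub * y + uc
    one_over_w = wa * x_start + wb * y + wc

    values = []
    for _ in range(count):
        # Per fragment: one reciprocal of 1/w and one mul per attribute.
        values.append(u_over_w * (1.0 / one_over_w))
        # Incremental step in x: one addition per interpolated value.
        u_over_w += ua
        one_over_w += wa
    return values
```

For the 2DH vertices (0, 0, 1), (8, 0, 2) and (0, 4, 1), with edge equations (-8, -8, 32), (4, 0, 0), (0, 8, 0) and det 32, the attribute values (0, 3, 0) along row 0 give about 1 at x = 2 and 3 at x = 4, which agrees with the barycentric method.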
Interpolation
- Barycentric coordinates (McCool):
  - no cost at setup.
  - store the parameter values at the three triangle edges.
  - fixed: 1 addition, 1 reciprocal and 3 muls.
  - per parameter: 1 DP3.
- Interpolation using Olano & Greer:
  - vector matrix mul at setup per parameter and 1/w: 3 DP3.
  - store current state and slope increment for all the parameters and 1/w.
  - fixed: 1 addition, 1 reciprocal.
  - per parameter: 1 addition, 1 mul.
Interpolation
- How many attributes/parameters can be interpolated per cycle?
- XBOX:
  - 5 interpolators?
  - general interpolator: color diffuse + color specular (shared).
  - Texture interpolators: 4?
  - Note: each of those interpolators is for a 4D vector.
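[Figure: interpolator datapath from VERTEX ATTRIBUTES to FRAGMENT ATTRIBUTES, with an adder (+), a reciprocal unit (1/x), six multipliers (*) and a final adder (+).]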
Current status
- Implemented Primitive Assembly box (with trivial degenerate triangle rejection).
- Added GPU_VERTEX_OUTPUT_ATTRIBUTE register.
  - Boolean vector of MAX_VERTEX_ATTRIBUTES that stores whether a vertex output register is written in the shader (and therefore must be transmitted).
- Now the transmission latency for a vertex between the Shader and Streamer Commit and between Streamer Commit and Primitive Assembly is determined by the number of output attributes.
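How the output attribute mask could drive the transmission latency is sketched below. The slides do not define the latency formula, so everything here is an assumption: the value of `MAX_VERTEX_ATTRIBUTES`, the parameters `cycles_per_attribute` and `base_latency`, and the linear cost model (active attributes times a per-attribute cost plus a fixed latency).

```python
MAX_VERTEX_ATTRIBUTES = 16  # assumed value, not taken from the slides


def transmission_latency(output_mask, cycles_per_attribute=1, base_latency=1):
    """Latency in cycles to transmit one vertex (hypothetical model).

    output_mask: boolean vector with one entry per vertex output register,
    true when the shader writes that register.
    """
    if len(output_mask) != MAX_VERTEX_ATTRIBUTES:
        raise ValueError("mask must have MAX_VERTEX_ATTRIBUTES entries")
    # Only the attributes written by the shader have to be transmitted.
    active = sum(1 for written in output_mask if written)
    return active * cycles_per_attribute + base_latency
```

With three written attributes and the default parameters the model gives 4 cycles per vertex, against 17 cycles if all 16 attributes were always sent.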
Current Status
- Started Triangle Setup box and support classes.
Current Status
- Comments:
  - Should the Streamer Loader to Shader transmission also have a transmission latency penalty?
  - Where are the vertex output attributes stored?
  - How many times must we pay the vertex transmission penalty?
Current Status
- Signal Analyzer:
  - Already works with large traces.
References
- Triangle Scan Conversion using 2D Homogeneous Coordinates, Marc Olano, Trey Greer.
- Tiled Polygon Traversal Using Half-Plane Edge Functions, Joel McCormack, Robert McNamara.
- Incremental and Hierarchical Hilbert Order Edge Equation Polygon Rasterization, Michael D. McCool, Chris Wales, Kevin Moule.
- A Parallel Algorithm for Polygon Rasterization, Juan Pineda.