status – week 272
Status – Week 272
Victor Moya
Vertex Shader
  VS 2.0+ (NV30) based Vertex Shader model.
  Multithreaded?? Implemented with a FP array (3DLabs P10).
  Dynamic branching.
  No texture/vertex buffer load.
  No vertex kill.
Vertex Shader
[Figure: Vertex Shader block diagram. Labels recoverable from the slide: VertexLoader; Instructions, IR, +1; VTId; Constants; OP1, OP2, OP3; Swizzle; Negate; ALU; CC (x4); TMP0-TMP3; VIN0-VIN3; Address0-Address3; VOT0-VOT3; PC (x4); STACK (x4); BRANCH; lastp; c. Inputs arrive from the vertex buffer and from the command processor.]
Shader Model
  Mono/Multithreaded Shader based on the NV30 instruction set.
  A Shader is a stream processor:
    Input Stream => Input Register Bank
      16 registers in Vertex Shader
      12 registers in Pixel Shader
    Output Stream => Output Register Bank
      ~16 registers in Vertex Shader
      ~4 registers in Pixel Shader
    Constant Memory/Register Bank
      Up to 256 in Vertex Shader
Shader Model
  Instruction Cache/Memory
    Up to 256 in Vertex Shader
    1024 in Pixel Shader
    Shared between different processors (?)
  Temporary and Auxiliary Registers
    16 (Vertex Shader), 32/64 (Pixel Shader)
    Address Registers
    Condition Code Register
    Boolean Register
    Loop counters
    etc.
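The register and memory sizes listed above could be collected in one place so the rest of the model can query them. The following is a minimal sketch only: the names ShaderKind, ShaderLimits, limitsFor and programFits are hypothetical, the Pixel Shader constant count is not given on the slides (stored as 0 here), and 32 is picked arbitrarily from the "32/64" temporaries quoted for the Pixel Shader.

```cpp
#include <cassert>

// Hypothetical names; the numbers are the ones quoted on the slides.
enum class ShaderKind { Vertex, Pixel };

struct ShaderLimits {
    int inputRegs;     // input register bank size
    int outputRegs;    // output register bank size (approximate on the slide)
    int constants;     // constant memory entries; 0 = not stated on the slide
    int instructions;  // instruction cache/memory entries
    int tempRegs;      // temporary registers
};

inline ShaderLimits limitsFor(ShaderKind kind) {
    if (kind == ShaderKind::Vertex) {
        return ShaderLimits{16, 16, 256, 256, 16};
    }
    // Pixel Shader: the slide says 32/64 temporaries; 32 is an assumption.
    return ShaderLimits{12, 4, 0, 1024, 32};
}

// True when a program of numInstructions fits in the instruction memory.
inline bool programFits(ShaderKind kind, int numInstructions) {
    assert(numInstructions >= 0);
    return numInstructions <= limitsFor(kind).instructions;
}
```

With these values, a 256-instruction program fits a Vertex Shader and a 257-instruction one does not.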
Shader Model
  Multithreaded:
    numThreads: Number of streams that the shader can store. Includes idle and loading/unloading threads. Structures affected: Input and Output register banks.
    numActiveThreads: Number of active (in execution) threads. Structures affected: temporary and auxiliary registers, PC table (in the Simulator Box).
    Constant/Parameter Memory and Instruction Cache/Memory shared between all the threads. It is also shared between different Shaders (but this isn't provided with the current model).
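To make the split between the two thread counts concrete, the sketch below sizes each structure from the parameter that the slide says affects it. It is an illustration under assumptions: ThreadConfig, StorageNeeds and storageFor are hypothetical names, and the per-thread register counts are passed in rather than fixed.

```cpp
#include <cassert>
#include <cstddef>

struct ThreadConfig {
    std::size_t numThreads;        // streams stored in the shader (idle, loading/unloading, active)
    std::size_t numActiveThreads;  // streams currently in execution
};

struct StorageNeeds {
    std::size_t inputRegs;   // scales with numThreads
    std::size_t outputRegs;  // scales with numThreads
    std::size_t tempRegs;    // scales with numActiveThreads
    std::size_t pcEntries;   // one PC table entry per active thread
};

inline StorageNeeds storageFor(const ThreadConfig& cfg,
                               std::size_t inputPerThread,
                               std::size_t outputPerThread,
                               std::size_t tempPerThread) {
    // Active threads are a subset of the stored threads.
    assert(cfg.numActiveThreads <= cfg.numThreads);
    return StorageNeeds{
        cfg.numThreads * inputPerThread,
        cfg.numThreads * outputPerThread,
        cfg.numActiveThreads * tempPerThread,
        cfg.numActiveThreads};
}
```

For example, 8 stored threads with 2 active and 16 registers per bank would need 128 input, 128 output and 32 temporary registers, plus 2 PC entries. Constant memory and instruction memory are shared, so they do not scale with either count.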
Test Model
  Four boxes:
    Loader: gets commands (input stream, new programs and parameters) from a file.
    Fetch: fetches instructions from the Shader program memory.
    Decode/Execute: decodes and executes instructions, taking dependencies into account.
    Writer: receives the output stream and writes it to a file.
Test Model
  Wires:
    Command: sends commands read from the input file to the Fetch box. Latency varies with the kind of command and the data size.
      New Shader Program: loads new instructions.
      New Shader Parameters: loads new parameters into constant memory.
      New Input: sends a new input (Vertex Input, 16 4D registers).
    Sync: for synchronization between Loader and Fetch (with the dynamic branch model, execution of a Shader Program depends on the Shader Output). Latency 1.
Test Model
  Wires:
    Instruction: Fetch sends new instructions to Decode/Execute. Instruction EXIT marks the end of the Shader Program (Decode/Execute sends the Output to Writer). Latency 1.
    NewPC: Fetch receives control flow changes from Decode/Execute. Latency 1.
    Execute: Drives execution latency for each instruction. Variable latency (1 – 5?).
    Output: Decode/Execute sends the Shader Program result for the current output to the logger box (Writer). Latency constant but greater than 1 (4 or 5?).
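One way to model wires with fixed or per-write latency is a small delay queue: a value written at cycle c becomes readable at cycle c + latency. This is a sketch under assumptions, not the simulator's actual signal class: the Wire template and its write/read methods are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>

template <typename T>
class Wire {
public:
    explicit Wire(std::uint64_t defaultLatency) : latency_(defaultLatency) {
        assert(defaultLatency >= 1);
    }

    // Write with the wire's default latency (e.g. 1 for Instruction or NewPC).
    void write(std::uint64_t cycle, T value) {
        write(cycle, std::move(value), latency_);
    }

    // Write with an explicit latency (e.g. the variable-latency Execute wire).
    void write(std::uint64_t cycle, T value, std::uint64_t latency) {
        assert(latency >= 1);
        inFlight_.emplace_back(cycle + latency, std::move(value));
    }

    // Pops the oldest written value that has arrived by `cycle`, if any.
    bool read(std::uint64_t cycle, T& out) {
        for (auto it = inFlight_.begin(); it != inFlight_.end(); ++it) {
            if (it->first <= cycle) {
                out = std::move(it->second);
                inFlight_.erase(it);
                return true;
            }
        }
        return false;
    }

private:
    std::uint64_t latency_;
    std::deque<std::pair<std::uint64_t, T>> inFlight_;  // (arrival cycle, value)
};
```

A box would call read once per cycle on each of its input wires. The scan over in-flight entries is needed because per-write latencies can make values arrive out of write order.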
Test Model
  Instruction Set:
    Encoding in 128 bits. See file.
  Emulation:
    Separate library: ShaderEmulator.
ShaderEmulator
  Performs the functional emulation of the shader:
    Instruction (static) management and execution.
    Keeps the shader state.
  Implementation:
    Support for different MODELS?: VS1, VS2, PS1, PS2.
    How to implement models? Different classes? Switch/case?
    Where to keep structures related to control flow? Ex: stack, PC table.
ShaderEmulator
  Interface:
    ShaderEmulator(numThreads, numActiveThreads, shaderModel)
    LoadShaderProgram(code)
    ResetShaderState(numThread)
    ReadShaderState(numThread, data)
    LoadShaderState(numThread, data)
    ExecuteShaderInstruction(numThread, PC)
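The interface above could translate into a C++ class along the following lines. Only the method names and parameter names come from the slide. The types ShaderModel, ShaderState and ShaderCode, the executedCount helper, and the placeholder body of ExecuteShaderInstruction are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ShaderModel { VS1, VS2, PS1, PS2 };

// Hypothetical per-thread state; the slides only say the emulator keeps it.
struct ShaderState {
    std::vector<float> regs;  // flattened register contents (layout assumed)
};

using ShaderCode = std::vector<std::uint32_t>;  // raw program words (assumed)

class ShaderEmulator {
public:
    ShaderEmulator(int numThreads, int numActiveThreads, ShaderModel shaderModel)
        : model_(shaderModel),
          numActiveThreads_(numActiveThreads),
          states_(static_cast<std::size_t>(numThreads)) {
        assert(numActiveThreads > 0 && numActiveThreads <= numThreads);
    }

    void LoadShaderProgram(const ShaderCode& code) { program_ = code; }
    void ResetShaderState(int numThread) { slot(numThread) = ShaderState{}; }
    void ReadShaderState(int numThread, ShaderState& data) { data = slot(numThread); }
    void LoadShaderState(int numThread, const ShaderState& data) { slot(numThread) = data; }

    // Placeholder: checks its arguments and counts the call; real decode
    // and execution are deliberately left out of this sketch.
    void ExecuteShaderInstruction(int numThread, std::size_t pc) {
        assert(pc < program_.size());
        (void)slot(numThread);
        ++executed_;
    }

    ShaderModel model() const { return model_; }
    int numActiveThreads() const { return numActiveThreads_; }
    std::size_t executedCount() const { return executed_; }  // hypothetical helper

private:
    // Bounds-checked access to one thread's state.
    ShaderState& slot(int numThread) {
        return states_.at(static_cast<std::size_t>(numThread));
    }

    ShaderModel model_;
    int numActiveThreads_;
    std::vector<ShaderState> states_;
    ShaderCode program_;
    std::size_t executed_ = 0;
};
```

Taking the model as a constructor argument matches the switch/case option raised on the previous slide; the "different classes" option would instead make ShaderEmulator a base class with one subclass per model.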
ShaderInstruction
  Decoded shader instruction.
  What to do with shader models? Invalid instructions in different models.
  Interface:
    ShaderInstruction(code)
    Different functions/attributes to get decoded information from the instruction (input registers, output registers, mask, swizzle, condition codes, etc.).
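As an illustration of how the decoded swizzle, negate and mask fields are typically applied to 4-component registers, here is a minimal sketch. The representation is an assumption (one selector per component, one mask bit per component) and is not taken from the 128-bit encoding, which the slides do not describe. Vec4, Swizzle, readOperand and writeResult are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<float, 4>;            // x, y, z, w
using Swizzle = std::array<std::uint8_t, 4>;  // source component (0..3) for each output component

// Source operand read: reorder components by the swizzle, then optionally negate.
inline Vec4 readOperand(const Vec4& reg, const Swizzle& sw, bool negate) {
    Vec4 out{};
    for (int i = 0; i < 4; ++i) {
        assert(sw[i] < 4);
        const float v = reg[sw[i]];
        out[i] = negate ? -v : v;
    }
    return out;
}

// Destination write: bit i of writeMask enables the write of component i.
inline void writeResult(Vec4& dst, const Vec4& result, std::uint8_t writeMask) {
    for (int i = 0; i < 4; ++i) {
        if (writeMask & (1u << i)) {
            dst[i] = result[i];
        }
    }
}
```

Reading (1, 2, 3, 4) with swizzle wzyx and negate gives (-4, -3, -2, -1); writing that with mask 0x5 updates only x and z of the destination.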
ShaderExecInstruction
  Stores an instance of an instruction that is being executed.
  Carries information about the execution:
    ShaderInstruction: decoded instruction.
    PC: instruction memory address.
    state: decode/execution/writeback/locked/…
    result: result of the instruction.
    startCycle: cycle in which the instruction was fetched.
    Other statistics?
ShaderExecInstruction
  Implementation:
    Avoid dynamic creation of objects.
    Static pool.
    Created at fetch, destroyed at decode/execute (writeback).
    Can it be managed by the ShaderExecInstruction class itself? (static).
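A fixed-size pool with a free list is one way to avoid dynamic allocation here. The sketch below is an assumption about how such a pool might look: ExecInstructionPool, acquire, release and available are hypothetical names, and ShaderExecInstruction is reduced to the PC and startCycle fields from the previous slide plus an inUse flag added for the sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reduced stand-in for the real class.
struct ShaderExecInstruction {
    std::uint32_t pc = 0;          // instruction memory address
    std::uint64_t startCycle = 0;  // cycle in which the instruction was fetched
    bool inUse = false;            // bookkeeping flag (assumption)
};

template <std::size_t N>
class ExecInstructionPool {
public:
    ExecInstructionPool() {
        for (std::size_t i = 0; i < N; ++i) freeList_[i] = i;
    }

    // Called at fetch. Returns nullptr when every slot is in flight.
    ShaderExecInstruction* acquire(std::uint32_t pc, std::uint64_t cycle) {
        if (freeCount_ == 0) return nullptr;
        ShaderExecInstruction& slot = slots_[freeList_[--freeCount_]];
        slot.pc = pc;
        slot.startCycle = cycle;
        slot.inUse = true;
        return &slot;
    }

    // Called at writeback. The slot becomes reusable; nothing is freed.
    void release(ShaderExecInstruction* inst) {
        assert(inst >= slots_.data() && inst < slots_.data() + N);
        assert(inst->inUse);
        inst->inUse = false;
        freeList_[freeCount_++] = static_cast<std::size_t>(inst - slots_.data());
    }

    std::size_t available() const { return freeCount_; }

private:
    std::array<ShaderExecInstruction, N> slots_{};
    std::array<std::size_t, N> freeList_{};  // indices of free slots
    std::size_t freeCount_ = N;
};
```

Making the pool a static member of ShaderExecInstruction, as the slide suggests, would only change where the object lives. A failed acquire maps naturally onto a fetch stall.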
Test Model
[Figure: Test Model block diagram. Boxes: LOADER, FETCH, DECODE/EXECUTE, WRITER. Wires: command, sync, instruction, newPC, execute, output. Files: InputFile, OutputFile.]
Code Management
  Directory structure:
    /emu (or /emulator): functional emulation classes and functions.
    /sim (or /simulator): simulation classes and functions.
    /support: support functions (IO, Types, etc.).