opengl [es] - microsoft€¦ · opengl [es] optimizations +shanee nishry @lunarsong. performance....

71
OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong

Upload: others

Post on 24-Sep-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

OpenGL [ES] Optimizations

+Shanee Nishry@Lunarsong

Page 2: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Performance. Why should I care?

Page 3: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Chart data source info here

Tim

e pe

r fra

me

(ms)

6

11

16

21

26

Performance. Why should I care?Dropped to 30 FPS due to

V-Sync

Page 4: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Performance. Why should I care?

Smooth Game Experience

Fast Loading TimesMore Features

Page 5: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Rule number #1 of optimizing…

Identify GPU

Download Profiler

Find Problems

Repeat

Page 6: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Random good news

Page 7: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Random good news

Page 8: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

What is Vulkan

Page 9: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

What is Vulkan• Low overhead graphics & compute API.

• Explicit control over command execution on the GPU. “Not a closed box”.

• Multi-threading friendly, allows better CPU usage.

• SPIR-V: Shaders represented in intermediate state, therefore fast compile & private.

• Vulkan is Coming to Android! (And so is OpenGL ES 3.2)

Page 10: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Back to topic…OpenGL [ES] Optimizations

Page 11: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Some basics…

Enable Face Culling

Generate Mip Maps

Reduce OpenGL Redundancy

Z Pre-pass / Sorted Draw

Interleaved Vertices

Disable Alpha Blend

Only for 3D games! or 2d with scaled sprites

Not needed on tile renderer such as PowerVR and Mali

Page 12: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Adaptive Scalable Texture Compression

Page 13: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

DOWNLOADINGADDITIONAL ASSETS

200MB / 2.7GB

Page 14: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

DOWNLOADINGADDITIONAL ASSETS

200MB / 2.7GB

I hope you areon WI-FI

Page 15: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Loading…

Please wait

Page 16: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Loading…

Please wait

NOW WHAT?

Page 17: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Decompress* (Except in cases of BMP)

GPU

Copy to GPU

CPU

Read from disc

PNG

JPG

WebP

BMP

Page 18: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

CPU

PNG

JPG

WebP

GPU

Decompress* (Except in cases of BMP)

Copy to GPURead from disc

BMP

Decompression is Expensive!

Page 19: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Use hardware accelerated textures…

ASTCDXT PVRTCS3TC

ASTC is awesome! But requires Extension / GLES 3.2…

Hardware Texture GPU

Still Compressed

Read from disc & copied to GPU

Page 20: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should
Page 21: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Original Bitmap: 1.1MB PNG: 299KB

4x4 ASTC Size: 262KB

Page 22: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Difference Map Difference Map

5x Enhanced

Page 23: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Original: Resolution: 512 x 512 Format: RGBA

Size on disk: Bitmap: 1.1MB PNG: 299KB

4x4 (8 bitrate) 262KB

6x6 (3.56 bit)119KB

8x8 (2 bitrate) 70KB

Difference Map

5x Enhanced

Mali Texture Compression Tool: http://goo.gl/2Rhnyl

Page 24: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Optimizing Shader Compilation (Skip if not using many shaders…Profile!)

Page 25: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

GLuint vertexShader = glCreateShader( GL_VERTEX_SHADER );glShaderSource( vertexShader, 1, &vertexSource, 0 );glCompileShader( vertexShader ); GLuint fragmentShader = glCreateShader( GL_FRAGMENT_SHADER );glShaderSource( fragmentShader, 1, &fragmentSource, 0 );glCompileShader( fragmentShader );

GLuint program = glCreateProgram();glAttachShader( program, vertexShader );glAttachShader( program, fragmentShader );glLinkProgram( program );

glDetachShader( program, vertexShader );glDetachShader( program, fragmentShader );

Shader Compilation

Page 26: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

GLuint vertexShader = glCreateShader( GL_VERTEX_SHADER );glShaderSource( vertexShader, 1, &vertexSource, 0 );glCompileShader( vertexShader ); GLuint fragmentShader = glCreateShader( GL_FRAGMENT_SHADER );glShaderSource( fragmentShader, 1, &fragmentSource, 0 );glCompileShader( fragmentShader );

GLuint program = glCreateProgram();glAttachShader( program, vertexShader );glAttachShader( program, fragmentShader );glLinkProgram( program );

glDetachShader( program, vertexShader );glDetachShader( program, fragmentShader );

Shader Compilation

Slow. Increased Loading Times.

Page 27: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Android Cache (ANDROID_blob_cache)

Binary Upload (OES_get_program_binary)

Shader Compilation: Fixed with…

Some EffortNo work needed! :-)

Not using many shader? Cache might be enough… Profile!

Page 28: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

GLuint program = glCreateProgram(); // ... Shader compiled as normal

// Retrieve shader length GLint binaryLength; glGetProgramiv( program, GL_PROGRAM_BINARY_LENGTH, &binaryLength );

// Create shader binary GLenum binaryFormat; void* binary = (void*)malloc( binaryLength ); glGetProgramBinary( program, binaryLength, NULL, &binaryFormat, binary );

// Save binary & binaryFormat to file…

Shader Compilation: Binary UploadCompile as usual

Get program binary length for creating a sufficient buffer

Retrieve program binary and save it to disk

Page 29: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

// Program variables GLint binaryLength; GLenum binaryFormat; void* binary;

// Read binary & format from file

GLuint program = glCreateProgram(); glProgramBinary( progObj, binaryFormat, binary, binaryLength ); free(binary);

// Error checking! :D Glint success; glGetProgramiv( progObj, GL_LINK_STATUS, &success );

Shader Compilation: Binary Upload

Read binary from file fill binary buffer and format

Create Shader Program and set it using the binary

Page 30: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Shader Binary caveats…

Binary varies across hardware DO NOT SHIP WITH APK

Can break with driver update

CHECK FOR ERRORS! (rebuild if needed)

Much faster loading! {~^-v-^~}

Page 31: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Reducing Driver Overhead

Page 32: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

DirectX 12

Reducing driver overhead…

Metal API

Vulkan

Page 33: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

DirectX 12

Reducing driver overhead…

Metal API

VulkanWhat can we do now?

Page 34: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Step OneEasy improvements

Page 35: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

GLuint shaderProgram; GLuint positionHandle;

positionHandle = glGetAttribLocation( shaderProgram, “u_vPosition”);

glVertexAttribPointer( positionHandle, 3, GL_FLOAT, GL_FALSE, 0, 0 );

glEnableVertexAttribArray( positionHandle );

// ...

Example 1: Layout Qualifiers

Page 36: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

GLuint shaderProgram; GLuint positionHandle;

positionHandle = glGetAttribLocation( shaderProgram, “u_vPosition”);

glVertexAttribPointer( positionHandle, 3, GL_FLOAT, GL_FALSE, 0, 0 );

glEnableVertexAttribArray( positionHandle );

// ...

Example 1: Layout Qualifiers

Slow, non-deterministic (can use glBindAttribLocation)

Page 37: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Example 2: Setting Uniform

Page 38: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

GLuint shaderProgram; float matWorldViewProjection[16];

// Get the uniform handle GLint handleWVP = glGetUniformLocation( shaderProgram, “g_matWorldViewProj” );

// Assign our uniform glUniformMatrix4fv( handleWVP, 1, false, matWorldViewProjection );

Example 2: Setting Uniform

Page 39: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

GLuint shaderProgram; float matWorldViewProjection[16];

// Get the uniform handle GLint handleWVP = glGetUniformLocation( shaderProgram, “g_matWorldViewProj” );

// Assign our uniform glUniformMatrix4fv( handleWVP, 1, false, matWorldViewProjection );

Example 2: Setting Uniform

Slow, non deterministic. Again.

Page 40: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

uniform mat4 g_matWorldViewProj; uniform vec4 g_vecCameraPos;

in vec4 u_vPosition; in vec2 u_vTexCoords;

// Insert awesome vertex shader here

Uniforms & Attributes Unknown uniform location

Unknown attribute location

Page 41: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

uniform mat4 g_matWorldViewProj; uniform vec4 g_vecCameraPos;

in vec4 u_vPosition; in vec2 u_vTexCoords;

// Insert awesome vertex shader here

layout(location = 0) uniform mat4 g_matWorldViewProj; layout(location = 1) uniform vec4 g_vecCameraPos;

layout(location = 0) in vec4 u_vPosition; layout(location = 1) in vec2 u_vTexCoords;

void main() {

Explicit Locations!

Uniforms & Attributes

Page 42: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

uniform mat4 g_matWorldViewProj; uniform vec4 g_vecCameraPos;

in vec4 u_vPosition; in vec2 u_vTexCoords;

// Insert awesome vertex shader here

layout(location = 0) uniform mat4 g_matWorldViewProj; layout(location = 1) uniform vec4 g_vecCameraPos;

layout(location = 0) in vec4 u_vPosition; layout(location = 1) in vec2 u_vTexCoords;

void main() {

Explicit Locations!

Uniforms & Attributes

More performant,deterministic

Page 43: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Example 3: Binding for Draw

Page 44: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

// drawglBindBuffer( GL_ARRAY_BUFFER, vertex_buffer_object );

glEnableVertexAttribArray( 0 ); glVertexAttribPointer( 0, 3, GL_FLOAT, GL_FALSE, 32, 0 );

glEnableVertexAttribArray( 1 ); glVertexAttribPointer( 1, 2, GL_FLOAT, GL_FALSE, 32, 12 );

glEnableVertexAttribArray( 2 ); glVertexAttribPointer( 2, 3, GL_FLOAT, GL_FALSE, 32, 20 );

// Draw elementsglDrawElements( GL_TRIANGLES, count, GL_UNSIGNED_SHORT, 0 );

Example 3: Binding for Draw

Page 45: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

// drawglBindBuffer( GL_ARRAY_BUFFER, vertex_buffer_object );

glEnableVertexAttribArray( 0 ); glVertexAttribPointer( 0, 3, GL_FLOAT, GL_FALSE, 32, 0 );

glEnableVertexAttribArray( 1 ); glVertexAttribPointer( 1, 2, GL_FLOAT, GL_FALSE, 32, 12 );

glEnableVertexAttribArray( 2 ); glVertexAttribPointer( 2, 3, GL_FLOAT, GL_FALSE, 32, 20 );

// Draw elementsglDrawElements( GL_TRIANGLES, count, GL_UNSIGNED_SHORT, 0 );

Example 3: Binding for Draw

Slow, in-effecientAnd just plain ugly…

Page 46: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

glBindBuffer( GL_ARRAY_BUFFER, vertex_buffer_object );

// Create VAO GLuint vao; glGenVertexArrays( 1, &vao ); glBindVertexArray( vao );

// Set the VAO State glEnableVertexAttribArray( 0 ); glVertexAttribPointer( 0, 3, GL_FLOAT, GL_FALSE, 32, 0 );

glEnableVertexAttribArray( 1 ); glVertexAttribPointer( 1, 2, GL_FLOAT, GL_FALSE, 32, 12 );

glEnableVertexAttribArray( 2 ); glVertexAttribPointer( 2, 3, GL_FLOAT, GL_FALSE, 32, 20 );

Example 3: Vertex Array Object (VAO)

Bind vertex buffer object for correct vertex attirbs

Generate Vertex Array Object and set the vertex attributes

Page 47: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

{ // ... // on draw

// Bind Vertex Array Object glBindVertexArray( vao );

// Draw elements glDrawElements( GL_TRIANGLES, count, GL_UNSIGNED_SHORT, 0 );

}

Example 3: Vertex Array Object (VAO)

Bind Vertex Array Object vertex attirbs are saved!

Profit!

Page 48: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Vertex Array Object (VAO): Caveats

Page 49: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Step TwoRendering all the things

Page 50: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Mesh Scene

Page 51: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Draw Call

Communicates with drivers / GPU

Draw Call

Communicates with drivers / GPU

Draw Call

Communicates with drivers / GPU

Page 52: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Draw Call

Communicates with drivers / GPU

Wasted CPU Cycles

Draw Call

Communicates with drivers / GPU

Wasted CPU Cycles

Draw Call

Communicates with drivers / GPU

Page 53: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Draw Call

Communicates with drivers / GPU

Draw Call

Communicates with drivers / GPU

Draw Call

Communicates with drivers / GPU

Wasted CPU Cycles Wasted CPU Cycles

Page 54: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Draw Call

Communicates with drivers / GPU

Draw Call

Communicates with drivers / GPU

Draw Call

Communicates with drivers / GPU

Wasted CPU Cycles Wasted CPU Cycles

Must Reduce Draw calls

Page 55: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Reducing Draw Calls

Draw Less? Geometry Instancing

Batch, batch, batch!

Huh, you say what?!

Beware of indexed access at low number of instances

(profile!)

Page 56: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Geometry Data (VBO1)

Position UV Normal

Instance Data (VBO2 / UBO / SSBO)

WorldViewProj Color

WorldViewProj Color

WorldViewProj Color

Fill Instance Buffer

Bind buffersgI Draw Elements

Instanced

Get the sample code NDK 64-bit target…/samples/MoreTeapots/

Geometry Instancing:

Page 57: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

{ // pre-draw for ( int i = 0; i < num_instances; ++i ) { buffer[i].matWorldViewProj = transforms[i] * matViewProj; }}

{ // draw glBindBuffer( GL_UNIFORM_BUFFER, ubo ); glBindVertexArray( vao );

glDrawElementsInstanced( GL_TRIANGLES, count, GL_UNSIGNED_SHORT, num_instances );}

Geometry InstancingCopy instance data to buffer

Bind objects and draw VAO and UBO instance data

Page 58: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

• glMultiDrawElements

• glMultiDrawElementsEXT

But what about multiple meshes?

Not available in OpenGL ES

Pack multiple meshes Sort by program Profit!

• glMultiDrawElementsIndirect

Page 59: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

• Texture Arrays

Ok, but what about textures?

• Bindless Textures

• Texture Atlas • Texture Atlas (not ideal…)

Retrieve Texture Handle

Make Texture Resident

Make Handle Known to Shader

Profit!

Page 60: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

GLuint* m_TextureIds[NUM_TEXTURES]; // the textures

// On init after creating the textures GLuint64EXT* m_TextureHandles = new GLuint64[ NUM_TEXTURES ];

for( i = 0; i < NUM_TEXTURES; ++i ) { m_TextureHandles[i] = glGetTextureHandleNV( m_TextureIds[i] ); glMakeTextureHandleResidentNV( m_TextureHandles[i] ); }

Bindless Textures

// On Bind GLuint handleSamplersLocation( shader->getUniformLocation("samplers") ); glUniform1ui64vNV( handleSamplersLocation, NUM_TEXTURES, m_TextureHandles );

Init:

Bind:

Make index known via instance data buffer

Page 61: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Step ThreeLove Your Cache

Page 62: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

char data[1000000]; unsigned int sum = 0;

for (int i=0; i<1000000; ++i) { sum += data[i*1]; }

#DataOrientedDesign

char data[16000000]; unsigned int sum = 0;

for (int i=0; i<1000000; ++i) { sum += data[i*16]; }

Method #1 Method #2

Page 63: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Mill

isec

onds

0.001

0.01

0.1

1

10

Iterations

1000 5000 10000 50000 100000 500000 1000000

+1 Increment +16 Increment

#DataOrientedDesign

Nexus 6 Average 5x slower!

Page 64: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Cache Miss

Cache Miss

0 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128

Cache Miss

Cache Miss

Cache Miss

Cache Miss

Cache Miss

Cache Miss

Cache Miss

Cache Miss

Cache Miss

#DataOrientedDesign

int i = 0; i += 1;

int i = 0; i += 8;

Example: CPU cache size 16 bytes

Page 65: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

#DataOrientedDesign

class PhysicsObject { public: void UpdatePosition( const vec3& velocity ); }

class Physics { public: void UpdatePositions( vec3* positions, const vec3* velocities, const uint32_t count ); }

Page 66: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

#DataOrientedDesign

class PhysicsObject { public: void UpdatePosition( const vec3& velocity ); }

class Physics { public: void UpdatePositions( vec3* positions, const vec3* velocities, const uint32_t count ); }

Design for the many NOT the single

Page 67: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Data Oriented Design

Premature Optimization?

Page 68: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Data Oriented Design

Premature Optimization?

Page 69: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

Data Oriented Design

Premature Optimization? Data Oriented Design: the RIGHT way to code

Page 70: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

How is this related to OpenGL?

Geometry Instancing

Faster Iteration

Reducing OpenGL

Redundancy

Populating Uniforms

Compute Shaders

Cleaner, strict code

CPU Bonus:

SIMD Vectorization

Page 71: OpenGL [ES] - Microsoft€¦ · OpenGL [ES] Optimizations +Shanee Nishry @Lunarsong. Performance. Why should I care? Chart data source info here) 6 11 16 21 26 Performance. Why should

+Shanee Nishry @Lunarsong

Thank you!

#GameDev #OpenGL