
Page 1

GDC March 1999 Scalability - R Huddy

Scalability

Advanced D3D Programming

Richard Huddy

RichardH@nvidia.com

Page 2

Basic Objectives

• To produce the best experience on every user's machine

• To exploit all of the resources available

• To cope with a broad spread of hardware

• To avoid ‘bottoming out’ during the shelf-life of the game / engine

Page 3

What is a high-end PC?

• A 125+ megatexel/s device

• A 125+ megapixel/s device

• A fast CPU (350 MHz or faster)

• An AGP 2X/4X bus

• Lots of system RAM (64 MB or more)

• Huge frame buffers (16 to 32 MB)

• Low-cost multi-texturing

Page 4

Power Trends

[Chart: CPU speed and fill rate (vertical axis 0 to 500) for four generations of graphics hardware: 1st Gen (Virge), 2nd Gen (Voodoo), 3rd Gen (TNT) and 4th Gen (???).]

Appreciate both the absolute values and the ratios.

Page 5

So what’s the problem?

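[Timing diagram: CPU work for successive frames (A, B, C) and the matching graphics work (a, b, c) plotted against time, with each frame delimited by BeginScene() and EndScene(). It is shown first for second-generation hardware and then for third-generation hardware, where the much faster graphics chip earns only the caption "Wow, 10% faster!"]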
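The diagram's point can be restated as arithmetic. The sketch below assumes the CPU and the graphics chip overlap perfectly, so each frame costs whichever side is slower. `frame_time_ms` and `fps` are illustrative helpers, and the millisecond figures in the note after the code are hypothetical rather than taken from the talk.

```c
#include <assert.h>

/* Idealised frame time when CPU and graphics work overlap perfectly:
   the slower of the two sets the pace. */
static double frame_time_ms(double cpu_ms, double gpu_ms)
{
    assert(cpu_ms >= 0.0 && gpu_ms >= 0.0);
    return cpu_ms > gpu_ms ? cpu_ms : gpu_ms;
}

/* Frames per second for a frame time given in milliseconds. */
static double fps(double frame_ms)
{
    assert(frame_ms > 0.0);
    return 1000.0 / frame_ms;
}
```

With 20 ms of CPU work per frame, cutting the graphics work from 22 ms to 8 ms (a chip almost three times faster) only moves the frame time from 22 ms to 20 ms: roughly 45 fps to 50 fps, about 10% faster.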

Page 6

What can you do to help?

Scalability is the key:

• Run at higher screen resolutions

• Run at higher color depths

• Use more complex rendering techniques on good hardware

• Ship multiple geometry models

• Protect your CPU

• Unlock the frame rate

Page 7

Higher Screen Resolutions

1) Include direct support for higher resolution modes (this uses lots of disk space).

2) Store high resolution art and filter it down to produce lower resolution art.

3) Store low resolution art and pixel-double it:

• If you have art at 512x384, use it for 1024x768.

• If you have art at 640x480, use it for 1280x1024 (but only use a 1280x960 viewport).
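Options 2 and 3 can be sketched in C as follows. This is a minimal illustration, assuming 16-bit pixels for the doubled art and a single 8-bit channel for the filtered art; the function names are hypothetical, a real tool would filter each color channel separately, and centring the 1280x960 viewport vertically is one choice among several.

```c
#include <assert.h>
#include <stdint.h>

/* Option 2: halve an 8-bit single-channel image with a 2x2 box filter.
   w and h must be even; dst holds (w/2) * (h/2) pixels. */
static void box_downsample(const uint8_t *src, int w, int h, uint8_t *dst)
{
    assert(src && dst && w % 2 == 0 && h % 2 == 0);
    for (int y = 0; y < h / 2; ++y)
        for (int x = 0; x < w / 2; ++x) {
            int sum = src[(2 * y) * w + 2 * x] + src[(2 * y) * w + 2 * x + 1]
                    + src[(2 * y + 1) * w + 2 * x] + src[(2 * y + 1) * w + 2 * x + 1];
            dst[y * (w / 2) + x] = (uint8_t)((sum + 2) / 4); /* rounded average */
        }
}

/* Option 3: pixel-double a 16-bit image (e.g. 512x384 -> 1024x768).
   dst holds (2*w) * (2*h) pixels; each source pixel becomes a 2x2 block. */
static void pixel_double(const uint16_t *src, int w, int h, uint16_t *dst)
{
    assert(src && dst && w > 0 && h > 0);
    for (int y = 0; y < 2 * h; ++y)
        for (int x = 0; x < 2 * w; ++x)
            dst[y * 2 * w + x] = src[(y / 2) * w + (x / 2)];
}

/* First row of a viewport centred vertically in a taller display mode. */
static int centre_offset(int screen_h, int image_h)
{
    assert(screen_h >= image_h);
    return (screen_h - image_h) / 2;
}
```

For the 640x480 case, `centre_offset(1024, 960)` gives 32, so the doubled image occupies rows 32 to 991.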

Page 8

Higher Color Depths

• Runs at much the same speed but gives the user a much richer experience

• Uses frame buffer memory constructively

• You can re-use the previous 16-bit assets

• The main performance loss in true color is often due to texture management

But beware the frame buffer + Z buffer depth constraint on the Riva TNT: the Z buffer depth must match the color depth (16-bit Z with 16-bit color, 24-bit Z plus 8-bit stencil with 32-bit color).
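To put numbers on "uses frame buffer memory constructively", here is a rough budget calculation. It assumes double buffering, one Z buffer whose depth matches the color depth (the Riva TNT pairing), and no pitch or alignment padding; `framebuffer_bytes` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for front + back colour buffers plus one Z (and stencil) buffer,
   assuming the Z depth matches the colour depth: 16-bit Z with 16-bit
   colour, 24-bit Z + 8-bit stencil with 32-bit colour. No padding. */
static size_t framebuffer_bytes(size_t width, size_t height, int bits_per_pixel)
{
    assert(bits_per_pixel == 16 || bits_per_pixel == 32);
    size_t bytes_pp = (size_t)bits_per_pixel / 8;
    size_t colour = 2 * width * height * bytes_pp; /* front + back buffers */
    size_t depth = width * height * bytes_pp;      /* matching Z/stencil buffer */
    return colour + depth;
}
```

At 1024x768 this gives 4,718,592 bytes (4.5 MB) in 16-bit color and 9,437,184 bytes (9 MB) in 32-bit color, leaving about 7 MB of a 16 MB card for textures in true color.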

Page 9

Complex Rendering Techniques - I

• Environment mapping
  – Beware of spending too much CPU time on this.

• Dual-texture lighting

• Bump mapping

• Use more alpha transparency
  – But see also “Alpha sort issues” later on…

Please try to use the extra fill rate!

Page 10

Complex Rendering Techniques - II

• Trilinear mipmapping for almost everything

• Use detail textures

• Large textures for extra realism

• 32-bit textures where they are a quality win

• Compressed textures, as long as quality is not compromised
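Large, 32-bit and mipmapped textures all compete for texture memory. A small calculator, assuming power-of-two sizes and a full mip chain down to 1x1; DXT1 (one of the compressed formats introduced with DirectX 6) is used as the compressed example, storing each 4x4 block of texels in 8 bytes. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { FMT_16BPP, FMT_32BPP, FMT_DXT1 } tex_format;

/* Bytes for a full mipmap chain (down to 1x1) of a w x h texture.
   DXT1 stores each 4x4 texel block in 8 bytes; levels smaller than
   4x4 still occupy one whole block. */
static size_t mip_chain_bytes(size_t w, size_t h, tex_format fmt)
{
    assert(w > 0 && h > 0);
    size_t total = 0;
    for (;;) {
        if (fmt == FMT_DXT1)
            total += ((w + 3) / 4) * ((h + 3) / 4) * 8;
        else
            total += w * h * (fmt == FMT_32BPP ? 4 : 2);
        if (w == 1 && h == 1)
            break;
        if (w > 1) w /= 2;   /* next mip level */
        if (h > 1) h /= 2;
    }
    return total;
}
```

For a 256x256 texture the full chain costs 349,524 bytes in 32-bit, 174,762 bytes in 16-bit and 43,704 bytes in DXT1: the mip levels add about a third to the top level, and DXT1 is roughly an 8:1 saving over 32-bit.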

Page 11

Protect your CPU

The big ones:

• __ftol and other ‘type conversion’ nightmares

• sqrt()
  – That’ll be seventy cycles, please...

• Reciprocal square root
  – One hundred and nine cycles through the FPU…

• Transform and lighting (more on that later)

Page 12

Removing __ftol

• Remember that the compiler doesn’t have a choice (C requires float-to-int conversion to truncate, which is why it calls __ftol), but you can check the output

• Write your own inline assembler conversion routine if…
  – You can accept differing rounding rules

This doesn’t break the optimiser!
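The slide's advice is an inline-assembler routine. As a portable alternative that makes the same trade (different rounding rules in exchange for skipping __ftol), here is the well-known "magic number" conversion in plain C. It is swapped in for illustration, is not the routine from the talk, and assumes IEEE-754 doubles evaluated at 64-bit precision. C99's `lrint()` is the standard-library route to the same round-to-nearest behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round-to-nearest double -> int32 without __ftol or FPU control-word
   changes. Adding 1.5 * 2^52 forces the integer part into the low mantissa
   bits. Halfway cases round to even (2.5 -> 2), unlike C's (int) cast,
   which truncates (2.7 -> 2). Valid for |d| < 2^31; may fail under x87
   extended precision or -ffast-math. */
static int32_t fast_round_to_int(double d)
{
    double shifted = d + 6755399441055744.0; /* 1.5 * 2^52 */
    uint64_t bits;
    memcpy(&bits, &shifted, sizeof bits);    /* reinterpret without aliasing UB */
    uint32_t lo = (uint32_t)bits;            /* two's-complement integer result */
    return lo <= INT32_MAX ? (int32_t)lo : -(int32_t)~lo - 1;
}
```

`fast_round_to_int(3.7)` returns 4 where `(int)3.7` gives 3; callers must be written to accept that difference.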

Page 13

Replacement for sqrt()

• Sqrt seems ‘natural’ if you are normalising vectors, calculating environment map coordinates or calculating distances - but it’s sloooow

• Sample code is available from the developer web site or from me directly and will be in future versions of the SDK.
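The NVIDIA sample code itself isn't reproduced here. As a stand-in, this sketch uses a widely known approximation: an initial guess built from the float's bit pattern (the constant 0x5f3759df), refined by one Newton-Raphson step. It assumes IEEE-754 32-bit floats and trades roughly 0.2% accuracy for avoiding both sqrt() and the divide; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { float x, y, z; } vec3;

/* Approximate 1/sqrt(x) for x > 0: bit-pattern initial estimate plus one
   Newton-Raphson iteration (relative error below about 0.2%). */
static float approx_rsqrt(float x)
{
    assert(x > 0.0f);
    uint32_t i;
    memcpy(&i, &x, sizeof i);
    i = 0x5f3759dfu - (i >> 1);            /* initial estimate from the bits */
    float y;
    memcpy(&y, &i, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);  /* one Newton-Raphson step */
}

/* Normalise a non-zero vector without calling sqrt(). */
static vec3 normalise_fast(vec3 v)
{
    float k = approx_rsqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    vec3 r = { v.x * k, v.y * k, v.z * k };
    return r;
}
```

Check that the error is acceptable for each use before substituting it for sqrt().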

Page 14

Saturation Arithmetic (C)

Limiting a floating point number to lie in the range 0.0 to 1.0 inclusive (traditional method):

if (f < 0.0f)
    f = 0.0f;
else if (f > 1.0f)
    f = 1.0f;

Page 15

Saturation Arithmetic (Pentium)

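#include <stdint.h>
#include <string.h>

int32_t i;
memcpy(&i, &f, sizeof i);   /* copy f's bits into a 32-bit int: no aliasing UB, no 64-bit long */
if (i < 0)
    f = 0.0f;               /* sign bit set: f is negative */
else if (i > 0x3f800000)    /* 0x3f800000 is the bit pattern of 1.0f */
    f = 1.0f;

• This is faster on a Pentium-class processor since the FPU is “non-optimal” (i.e. slow) and the integer unit is much faster.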

Page 16

Saturation Arithmetic (Pentium II)

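• Use the “cmov” instructions (Intel syntax; CMOV needs a register destination, cannot take an immediate operand, and the comparisons must be signed):

mov   eax, [f]          ; load the float's bit pattern
xor   ecx, ecx          ; ecx = bits of 0.0f
mov   edx, 3f800000h    ; edx = bits of 1.0f
cmp   eax, ecx
cmovl eax, ecx          ; negative -> 0.0f
cmp   eax, edx
cmovg eax, edx          ; above 1.0f -> 1.0f
mov   [f], eax

This is faster because unpredictable branches are the bottleneck here. CMOV is unavailable on a Pentium (it arrived with the P6 family, starting with the Pentium Pro).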
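Modern compilers can often produce CMOV from C. Below, the integer-compare clamp is rewritten as two conditional selects, next to a plain float reference to check it against. A sketch assuming IEEE-754 32-bit floats; whether CMOV is actually emitted depends on the compiler and target, so check the generated code. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference clamp to [0.0, 1.0] using float compares. */
static float clamp01_ref(float f)
{
    return f < 0.0f ? 0.0f : (f > 1.0f ? 1.0f : f);
}

/* Branch-free integer-compare clamp: conditional selects that a compiler
   targeting P6-class or later CPUs may turn into CMOV (not guaranteed). */
static float clamp01_select(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);
    i = i < 0 ? 0 : i;                    /* negative -> bits of 0.0f */
    i = i > 0x3f800000 ? 0x3f800000 : i;  /* above 1.0f -> bits of 1.0f */
    memcpy(&f, &i, sizeof f);
    return f;
}
```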

Page 17

Unlock the Frame Rate

• It’s essential that your physics model can run at high refresh rates
  – At least 100 fps

• 30 or 60 fps limits are not acceptable and lead to flat performance on high-end hardware
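One common way to meet this is a fixed physics step with an unlocked render loop. This is a sketch of that pattern (not a method given in the talk), assuming a 100 Hz step and integer microsecond timing; `physics_clock`, `physics_steps` and `render_alpha` are hypothetical names.

```c
#include <assert.h>

/* Accumulates real frame time and hands it to physics in fixed steps. */
typedef struct {
    long long accumulator_us; /* simulated time not yet consumed */
    long long step_us;        /* physics step, e.g. 10000 us = 100 Hz */
} physics_clock;

/* Returns how many fixed physics steps to run for a frame of frame_us. */
static int physics_steps(physics_clock *pc, long long frame_us)
{
    assert(pc && pc->step_us > 0 && frame_us >= 0);
    pc->accumulator_us += frame_us;
    int steps = 0;
    while (pc->accumulator_us >= pc->step_us) {
        pc->accumulator_us -= pc->step_us;
        ++steps;
    }
    return steps;
}

/* Fraction of a step left over, in [0, 1): blend factor between the
   previous and current physics states when rendering. */
static double render_alpha(const physics_clock *pc)
{
    return (double)pc->accumulator_us / (double)pc->step_us;
}
```

With a 10,000 µs step, a 25,000 µs frame runs 2 physics steps and leaves 5,000 µs (alpha 0.5) for interpolation; at 200 fps many frames run no physics step at all and simply interpolate. A production loop would also cap the steps per frame so a slow machine cannot fall ever further behind.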

Page 18

The Value of Batching

Case specifics:

• The average number of ‘Polys Per Call’ (PPC) to DrawPrimitive was 2.6, producing 40 fps

• Removing state changes to raise the average PPC to ~50 produced 58 fps
  – Most of the removed state changes were “reasonable”, i.e. not logically redundant
  – The changes did not reduce visual quality at all
  – A PPC of 200 is optimal
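Batching opaque geometry mostly means grouping work by state. A minimal sketch, assuming each draw item carries a single texture ID that stands in for its whole render state; `draw_item` and `count_batches` are hypothetical, and a real renderer would then issue one DrawPrimitive-style call per run.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical draw record: the texture (standing in for the full render
   state) and the number of polygons drawn with it. */
typedef struct {
    int texture_id;
    int poly_count;
} draw_item;

static int by_texture(const void *a, const void *b)
{
    int ta = ((const draw_item *)a)->texture_id;
    int tb = ((const draw_item *)b)->texture_id;
    return (ta > tb) - (ta < tb);
}

/* Sorts items by texture and counts the calls needed when each run of
   identical state becomes one call; writes the average polys per call to
   *ppc. Only valid for opaque geometry, whose draw order does not matter. */
static int count_batches(draw_item *items, size_t n, double *ppc)
{
    assert(items && n > 0 && ppc);
    qsort(items, n, sizeof *items, by_texture);
    int calls = 1;
    long polys = items[0].poly_count;
    for (size_t i = 1; i < n; ++i) {
        if (items[i].texture_id != items[i - 1].texture_id)
            ++calls;                  /* state change: start a new call */
        polys += items[i].poly_count;
    }
    *ppc = (double)polys / calls;
    return calls;
}
```

For items with textures 1, 2, 1, 2, 1 and 12 polys in total, submitting in the original order needs 5 calls (PPC 2.4); sorted, it needs 2 calls (PPC 6).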

Page 19

Alpha Sort Issues

The “standard” solution is…

1) Draw all non-alpha polys (sort by texture)

2) Draw all alpha polys in back-to-front order with Z compare enabled and Z update disabled. This copes with overlapping alpha polys, but you can’t sort by texture. (Intersecting polys require decimation.)
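Step 2 can be sketched as a back-to-front sort on each poly's view-space depth. The record layout is hypothetical, and sorting by one depth value per poly is an approximation that breaks down for intersecting polys. In DX6/DX7 terms, the Z settings correspond to leaving `D3DRENDERSTATE_ZENABLE` on and setting `D3DRENDERSTATE_ZWRITEENABLE` to FALSE.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical transparent poly record: view-space depth of its centroid
   (larger = farther from the viewer) and its texture. */
typedef struct {
    float depth;
    int texture_id;
} alpha_poly;

/* qsort comparator: farther polys first. */
static int farther_first(const void *a, const void *b)
{
    float da = ((const alpha_poly *)a)->depth;
    float db = ((const alpha_poly *)b)->depth;
    return (da < db) - (da > db);
}

/* Orders alpha polys back to front. Draw them with Z compare on and Z
   writes off, so opaque geometry hides them but they do not hide each other. */
static void sort_back_to_front(alpha_poly *polys, size_t n)
{
    assert(polys || n == 0);
    qsort(polys, n, sizeof *polys, farther_first);
}
```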

Page 20

Alpha Sort with Bounding Boxes

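When you are ready to draw your alpha polys, draw non-overlapping sets using the sort-by-texture technique as before.

[Diagram: screen-space bounding boxes A, B and C inside the viewport; A overlaps neither B nor C, while B and C overlap each other.]

Here, you can safely draw all of A before any of B or C…

B & C need sorting.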
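The non-overlap test can be sketched with screen-space rectangles. A minimal O(n²) version, assuming each transparent object already has a screen-space bounding rectangle; the names are hypothetical. Objects that overlap nothing can be drawn with the sort-by-texture approach, and the rest go through the back-to-front sort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Screen-space bounding rectangle of one transparent object. */
typedef struct { float x0, y0, x1, y1; } rect;

/* True if the rectangles share any area (touching edges do not count). */
static bool rects_overlap(rect a, rect b)
{
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

/* needs_sort[i] becomes true if object i's rectangle overlaps any other. */
static void classify_alpha_objects(const rect *r, size_t n, bool *needs_sort)
{
    assert((r && needs_sort) || n == 0);
    for (size_t i = 0; i < n; ++i) {
        needs_sort[i] = false;
        for (size_t j = 0; j < n; ++j)
            if (j != i && rects_overlap(r[i], r[j])) {
                needs_sort[i] = true;
                break;
            }
    }
}
```

With the slide's layout (A clear of B and C, B and C overlapping), A is flagged as free to draw by texture and B and C as needing sorting.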

Page 21

Geometry - Part 1

• Use the DX6 Transform and Clip engine - it’ll be nearly as fast as your best efforts

• It takes advantage of CPU specific optimisations done by Intel, AMD etc.

• It uses the guard band clipping region to enhance performance

• Use the DX7 interface ASAP
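The guard-band point can be illustrated by classifying screen-space triangles. In this sketch, triangles lying inside the guard band skip geometric clipping because the rasteriser scissors them to the viewport. The function and types are hypothetical, and the guard-band extents are assumed to come from the device's capabilities (in DX6, the `dvGuardBandLeft`/`Top`/`Right`/`Bottom` fields of `D3DDEVICEDESC`).

```c
#include <assert.h>

typedef struct { float x, y; } pt2;
typedef struct { float left, top, right, bottom; } region;
typedef enum { TRI_REJECT, TRI_ACCEPT, TRI_CLIP } clip_result;

/* Simplified classification: reject only when all three vertices lie beyond
   the same viewport edge; accept without clipping when all three lie inside
   the guard band; otherwise the triangle needs real clipping. */
static clip_result classify_triangle(const pt2 v[3], region vp, region gb)
{
    assert(v);
    int left = 0, right = 0, above = 0, below = 0, inside_gb = 0;
    for (int i = 0; i < 3; ++i) {
        left  += v[i].x < vp.left;
        right += v[i].x > vp.right;
        above += v[i].y < vp.top;
        below += v[i].y > vp.bottom;
        inside_gb += v[i].x >= gb.left && v[i].x <= gb.right &&
                     v[i].y >= gb.top  && v[i].y <= gb.bottom;
    }
    if (left == 3 || right == 3 || above == 3 || below == 3)
        return TRI_REJECT;   /* wholly off one side of the viewport */
    if (inside_gb == 3)
        return TRI_ACCEPT;   /* no clipping: the rasteriser scissors it */
    return TRI_CLIP;         /* crosses the guard band edge: clip it */
}
```

A triangle poking 100 pixels past the left edge of a 640x480 viewport is accepted without clipping when the guard band extends to ±2048; only one reaching x = -5000 needs clipping.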

Page 22

Geometry - Part 2

• This gets you ready for hardware which can do the job much faster than the CPU

• Tell the chip designers if you need anything non-standard

• If you think DX is too slow then use a run-time benchmark to select between DX and your own code
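A run-time benchmark can be as simple as timing both paths on the same workload and keeping the faster one. This is a sketch: `transform_fn` and the two stand-in workloads are hypothetical, and `clock()` is coarse, so a real test should run long enough and use representative data.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical signature shared by both code paths, e.g. "transform this
   vertex batch through Direct3D" and "transform it with my own code". */
typedef void (*transform_fn)(void *context);

/* Stand-in workloads for illustration only. */
static void cheap_path(void *ctx) { ++*(volatile long *)ctx; }
static void costly_path(void *ctx)
{
    for (int i = 0; i < 200; ++i)
        ++*(volatile long *)ctx;
}

/* Seconds of processor time for `iterations` calls of fn. */
static double time_fn(transform_fn fn, void *context, int iterations)
{
    clock_t start = clock();
    for (int i = 0; i < iterations; ++i)
        fn(context);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Returns the faster path; ties go to the API path, which benefits from
   future drivers and hardware. */
static transform_fn pick_faster(transform_fn api_path, transform_fn own_path,
                                void *context, int iterations)
{
    assert(api_path && own_path && iterations > 0);
    double t_api = time_fn(api_path, context, iterations);
    double t_own = time_fn(own_path, context, iterations);
    return t_own < t_api ? own_path : api_path;
}
```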

Page 23

Geometry - Part 3

• Use the DX pipeline (e.g. DIPVB(), i.e. DrawIndexedPrimitiveVB()) for geometry which may be rendered

• Use your own transform for bounding boxes, collisions, portals etc.

• Treat hardware T&L as
  – Write only
  – Not necessarily pixel identical to CPU T&L
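For the CPU-side bounding boxes mentioned here, an axis-aligned box can be transformed without touching its eight corners. This sketch uses Arvo's method (transform the centre, and accumulate the half-extents through the absolute values of the matrix); the types and names are hypothetical.

```c
#include <assert.h>

typedef struct { float c[3]; float e[3]; } aabb; /* centre and half-extents */

static float absf(float x) { return x < 0.0f ? -x : x; }

/* Tightest axis-aligned box around `in` after p' = m * p + t
   (m row-major 3x3). Result stays on the CPU for culling, collision
   and portal tests. */
static aabb transform_aabb(const float m[3][3], const float t[3], aabb in)
{
    assert(m && t);
    aabb out;
    for (int r = 0; r < 3; ++r) {
        out.c[r] = t[r];
        out.e[r] = 0.0f;
        for (int k = 0; k < 3; ++k) {
            out.c[r] += m[r][k] * in.c[k];
            out.e[r] += absf(m[r][k]) * in.e[k];
        }
    }
    return out;
}
```

Rotating a box centred at (1, 0, 0) with half-extents (1, 2, 3) by 90 degrees about z gives a box centred at (0, 1, 0) with half-extents (2, 1, 3).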

Page 24

Geometry - Part 4

• Consider choosing between models at game start-up time

• Higher-detail geometry should be several times more complex than the baseline model

• Introduce some LOD management

• Your artists are probably generating more complex models and then throwing them away
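Start-up model choice and per-frame LOD can share one mechanism. In this hypothetical sketch, the projected size of an object's bounding sphere picks a level, and a per-machine bias chosen at start-up shifts every object toward cheaper or richer models. The pixel thresholds are placeholders, not recommendations from the talk; `fov_scale` is 1 / (2 tan(fov / 2)) for a vertical field of view `fov`.

```c
#include <assert.h>

/* Returns a LOD index (0 = most detailed) for an object of bounding radius
   `radius` at view distance `distance`. machine_bias is chosen once at
   start-up: +1 on low-end machines, 0 or -1 on high-end ones. */
static int choose_lod(float radius, float distance, float screen_height_px,
                      float fov_scale, int lod_count, int machine_bias)
{
    assert(distance > 0.0f && lod_count > 0);
    /* Approximate on-screen diameter in pixels. */
    float px = 2.0f * radius / distance * fov_scale * screen_height_px;
    int lod;
    if (px > 200.0f)     lod = 0;   /* placeholder thresholds */
    else if (px > 50.0f) lod = 1;
    else if (px > 12.0f) lod = 2;
    else                 lod = 3;
    lod += machine_bias;
    if (lod < 0) lod = 0;
    if (lod > lod_count - 1) lod = lod_count - 1;
    return lod;
}
```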

Page 25

Lighting - Part 1

• If the DX lighting model is good enough, then there are people who want to help you

• Multi-texture shadow maps and light maps can be very fast now
  – Remember that (multi-pass != multi-texture)

• Tell the chip companies what you need
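For reference, this is the arithmetic a light-map modulate performs on each 8-bit color channel. Whether it happens in one multi-texture pass (for example with `D3DTOP_MODULATE` as the color operation of texture stage 1) or in a second rendering pass blended into the frame buffer, the result is the same; the cost is not, since multi-pass sets up and rasterises the geometry twice and reads back the frame buffer. The rounding shown is an assumption; hardware may differ slightly.

```c
#include <assert.h>
#include <stdint.h>

/* base * light per channel, treating 0..255 as 0..1, rounded to 8 bits. */
static uint8_t modulate8(uint8_t base, uint8_t light)
{
    return (uint8_t)((base * light + 127) / 255);
}
```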

Page 26

Lighting - Part 2

• Support more lights

• Use a richer set of light types

• Scale with available power

• If you have more complex geometry, you get better lighting quality (Direct3D lighting is computed per vertex, so more vertices give finer lighting detail)
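Scaling lighting with available power can be as simple as ranking the lights affecting an object and keeping a per-machine budget of them. A sketch with a hypothetical `light_ref` record whose `influence` value (for example intensity after distance attenuation) is assumed to be computed elsewhere.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical light record with a precomputed influence on one object. */
typedef struct {
    int id;
    float influence;
} light_ref;

/* qsort comparator: strongest influence first. */
static int stronger_first(const void *a, const void *b)
{
    float ia = ((const light_ref *)a)->influence;
    float ib = ((const light_ref *)b)->influence;
    return (ia < ib) - (ia > ib);
}

/* Keeps the `budget` most influential lights; they become the first
   entries of `lights`. Returns how many are kept. */
static size_t select_lights(light_ref *lights, size_t n, size_t budget)
{
    assert(lights || n == 0);
    qsort(lights, n, sizeof *lights, stronger_first);
    return n < budget ? n : budget;
}
```

A low-end profile might keep 2 lights per object and a high-end one 8; the numbers are illustrative.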

Page 27

Summary

• Use the D3D pipeline as much as possible

• ‘Use’ the CPU carefully; ‘abuse’ the fill rate

• Get on board with DX7

• Offer the richest experience possible

• You may have to treat the PC as two distinct platforms, ‘High-end’ and ‘Low-end’

Page 28

Questions

Richard Huddy

RichardH@nvidia.com

www.nvidia.com
