

Texture Virtualization for Terrain Rendering

Daniel Cornel∗

Vienna University of Technology

Figure 1: Demonstration of CUDA-accelerated virtual texturing by Hollemeersch et al. [Hollemeersch et al. 2010] using aerial photography of the Antelope Island State Park, Utah. Image retrieved from [Multimedia Lab 2011].

Abstract

Virtual texturing is a technique that allows the use of arbitrarily large textures within the limited physical video memory. Through a paging and streaming system, only the currently visible parts of a mipmap chain are stored in the video memory, while the rest of the data may reside in any other memory or storage device. Not only does this enable the use of unique and very detailed textures, it also makes high-resolution images such as satellite or aerial photography usable in real-time applications without further modification or downsampling.

This work sketches the virtual texturing pipeline and discusses its benefits and limitations. Due to the nature of terrains in real-time applications, the discussed methods are of particular importance for performant and photorealistic terrain rendering and are thus viewed with regard to these properties and needs. Special emphasis is devoted to recent developments in virtual texturing and possible future fields of application as well as acceleration techniques.

Keywords: terrain rendering, virtual texturing, clipmap

1 Introduction

Realistic real-time rendering of terrain surfaces has been subject to research for the past few years, because the unique characteristics of terrains restrict the application of conventional rendering methods.

∗e-mail: [email protected]

In contrast to regular objects placed in the scene and represented by a closed surface with finite extents, a terrain represents a very large surrounding that extends over the whole scene. Due to the large extents, it is most often partially, but never completely, visible, with some regions very close to the eye and some regions very far away. Thus, both microstructure details for the nearby and macrostructure details for the more distant surrounding are needed for the surface to appear as a landscape shaped by nature without artifacts or artificial repetitions.

Since the introduction of texture mapping, textures have been a very important way of adding visual detail to geometry. However, due to the size of terrains and the limitations of texture size in hardware, mapping a single texture containing the whole terrain surface to the geometry is not feasible. There have been several approaches to overcome this problem by synthesizing the terrain texture through either procedural texture generation or multi-texturing. The latter method has proven very suitable for real-time applications and has been used in game engines such as the CryENGINE 2 [Mittring 2008] and the Unreal Engine 3 [Epic Games 2012]. Sophisticated implementations such as procedural shader splatting [Andersson 2007] used in the Frostbite Engine procedurally generate several low- and high-frequency textures which are blended in the rendering step, leading to very few visible repetitive patterns due to the vast number of possible texture combinations. Texture streaming can be used to reduce the number of texture parts that need to be stored in the video memory; in naive implementations, however, this requires pre-generated information about which parts to stream for which part of the scene, based on the view point. A downside of multi-texturing approaches is that, depending on the number of textures used, the resulting image might either look washed out or require a lot of texture lookups. A second limitation is the dependence on a regular tessellation of the terrain geometry such that the texturing can be done for individual tiles of the terrain.

In the end, terrains can nowadays be textured quite aesthetically with texture synthesis approaches, but only with blended synthetic texture tiles. With an increasingly high degree of realism achievable in rendering and the ability to generate very high resolution images, e.g. by aerial photography, methods are needed to use these images as textures in photorealistic rendering applications. Besides the obvious requirement to work within limited video memory, such a method has to perform in real-time on common hardware and has to offer the same visual quality as common texture mapping. Furthermore, texturing should be independent from the mesh tessellation, without constraints on geometry or texture data. The first concept to accomplish this was the clipmap [Tanner et al. 1998]. Clipmaps make it possible to store only a needed subset of a mipmap chain in the video memory, in theory allowing arbitrarily large textures to be used. As this concept is a milestone in texture virtualization, the idea of clipmaps is sketched in Section 2.

The further development of clipmaps finally led to extensive adaptations of the content creation and rendering pipelines, summarized under the term virtual texturing. Most of these modifications had already been proposed in 2004 [Lefebvre et al. 2004]. However, it was not until the announcement of the video game Rage by id Software and its eventual release in 2011 that virtual texturing received wider attention. Thus, literature concerning the topic is sparse, which is why the MA thesis by Mayer [Mayer 2010] currently serves as the most comprehensive publication on the matter. Mayer also proposes standardized terms and a reference implementation with profound evaluations. This serves as reference for the general structure of a virtual texturing system, which is discussed in detail in Section 3. Another recent publication by Neu [Neu 2010] providing a similar reference implementation is also taken into account. The aim of this report is to give an insight into the concept of a state-of-the-art virtual texturing system as well as its open problems. As the contributions of Mayer and Neu date back to 2010, an overview of recent developments and applications of virtual texturing is given in Section 3.5.

2 Clipmaps

The clipmap introduced by Tanner et al. [Tanner et al. 1998] is a clipped mipmap that holds only the subset of the mipmap that is potentially visible in the current frame. The idea behind the clipmap is that with a very large texture, all data of a mipmap is never needed in one frame. As the mipmap level selection for texture sampling aims for a 1:1 ratio between pixels and texels, the window size directly limits the number of texels used from a mipmap level. Thus, the mipmap levels larger than the stack size (see Figure 2), i.e. the window size plus a constant margin, can be clipped and do not need to reside fully in the video memory.

In contrast to the constant window size, the view point will likely change during the execution of the application, so a clipmap stack has to be updated continuously to contain the needed data. A clip center is calculated directly from the view point and specifies a center of interest in texture space and, through the clipmap stack, rings of decreasing texture resolution around it. If the size of a mipmap level is greater than the stack size, a region of the stack size around the clip center is cut out of the mipmap level and stored in the clipmap stack. If the mipmap level is smaller than the stack size, it covers the whole texture at a low resolution and is stored in the clipmap pyramid. This way it can be used as a fallback lookup texture for every region of the texture if the desired level of detail of the region around the clip center is not yet available to the video card.
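The stack/pyramid split described above can be sketched as follows. This is an illustrative CPU-side sketch, not Tanner et al.'s implementation; all names (`texture_size`, `stack_size`, `clip_center`) are assumptions, and a power-of-two texture size is assumed.

```python
def clipmap_regions(texture_size, stack_size, clip_center):
    """For each mipmap level, return the region kept in video memory.

    Levels larger than the stack size are clipped to a stack_size window
    around the clip center (the clipmap stack); smaller levels are kept
    whole (the clipmap pyramid). clip_center is given in [0,1] texture space.
    """
    regions = []
    level = 0
    size = texture_size
    while size >= 1:
        # Clip center expressed in this level's texel coordinates.
        cx = clip_center[0] * size
        cy = clip_center[1] * size
        if size > stack_size:
            # Clipmap stack: a stack_size window centered on the clip
            # center, shifted so it stays inside the level.
            half = stack_size / 2
            x0 = min(max(cx - half, 0), size - stack_size)
            y0 = min(max(cy - half, 0), size - stack_size)
            regions.append((level, x0, y0, stack_size, stack_size))
        else:
            # Clipmap pyramid: the whole level fits and acts as fallback.
            regions.append((level, 0, 0, size, size))
        level += 1
        size //= 2
    return regions
```

For a 4096² texture with a 512-texel stack, `clipmap_regions(4096, 512, (0.5, 0.5))` yields three clipped stack levels and ten complete pyramid levels.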

This concept is analogous to the virtual memory management of operating systems, where contents of the main memory are outsourced to cheaper hard disk space. Together with a page table keeping track of all outsourced memory pages, this makes it possible to address a memory space much larger than the physical main memory. Instead of virtual memory, virtual textures are handled now, and their smallest units are uniformly sized texture tiles which are loaded from the hard disk first and then streamed from the main memory. This introduces several problems, such as determining which tiles to stream, scheduling the streaming, and addressing the tiles.

Figure 2: Illustration of a clipmap. Instead of the whole mipmap chain (blue), only the clipmap stack (green) and the clipmap pyramid are stored in the video memory. Image retrieved from [NVIDIA Corporation 2007].

It is obvious that the coherence of the potentially visible tiles between two frames can be used to significantly reduce streaming. As the view point moves, neighboring tiles of the currently visible ones are very likely to be needed soon, so they will be streamed when possible. If the new tiles are needed for rendering before streaming is complete, a lower resolution of the desired tile is used as fallback texture. Since this lower resolution is included in the next lower clipmap level, it is important to stream the required textures from the bottom up, so fallback tiles of all required tiles are accessible first. To access tiles of the complete mipmap in the clipmap system, Tanner et al. [Tanner et al. 1998] proposed a toroidal addressing scheme for clipmap levels and tiles such that after a change of the clip center, the new tiles can easily be appended to the ones residing in the video memory. With the proposed wraparound addressing, however, the limited precision and numeric range supported in hardware limit the maximum number of addressable mipmap levels and therefore the size of the whole texture. This is why the authors performed a second virtualization step on the clipmap itself, based on the observation that a single polygon usually only requires samples from very few clipmap levels, so it is sufficient to store a virtual clipmap with this limited range in the video memory. For this additional virtualization, new addressing hardware was proposed such that the clipmap system could be implemented in hardware.
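The core of the toroidal scheme can be illustrated with a small sketch: a texel's slot in the fixed-size stack is its level-space position modulo the stack size, so when the clip center shifts, only the newly exposed band of texels is overwritten and nothing else has to be moved. The names and the `columns_to_refresh` helper are illustrative assumptions, not the original addressing hardware.

```python
STACK_SIZE = 512  # assumed clipmap stack size in texels

def toroidal_address(u, v):
    """Map level-space texel coordinates to clipmap-stack coordinates."""
    return (u % STACK_SIZE, v % STACK_SIZE)

def columns_to_refresh(old_x0, new_x0):
    """Stack columns that need new data when the clip window's left edge
    moves from old_x0 to new_x0 along one axis (assumes the movement is
    smaller than the stack size)."""
    if new_x0 > old_x0:   # moved right: texels enter on the right edge
        fresh = range(old_x0 + STACK_SIZE, new_x0 + STACK_SIZE)
    else:                 # moved left: texels enter on the left edge
        fresh = range(new_x0, old_x0)
    return sorted({u % STACK_SIZE for u in fresh})
```

Moving the window by three texels thus touches only three columns of the stack, regardless of where the window sits in the full mipmap level.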

A core problem of clipmaps is the tile determination by the center of interest. The spatial relation between the clip center and the mapped tile does not take into account any occlusion and is not even necessarily unambiguous. As a result, a major part of the tiles stored in the video memory is not visible because the corresponding geometry is occluded or outside the view frustum. This and the hardware requirements for the complex addressing system dampened the impact of clipmaps on terrain rendering for games, but the guiding idea of virtualization has been refined ever since. A noteworthy implementation of modified clipmaps called MegaTextures, however, is Splash Damage's Enemy Territory: Quake Wars, released in 2007 [Kalra and van Waveren 2008]. Clipmaps have proven very suitable for geometry virtualization as geometry clipmaps, where vertex buffers rather than textures are used for storage. With geometry clipmaps, terrain deformation [Crause et al. 2011] and adaptive level-of-detail systems [Losasso and Hoppe 2004; Asirvatham and Hoppe 2005] can be designed very efficiently.


Figure 3: The virtual texturing pipeline. The IDs of all needed tiles are stored in the needbuffer. For rendering, the indirection texture is used to translate virtual to physical texture coordinates. With these, the tiles stored in the video memory can be accessed for texturing. Image retrieved from [Hollemeersch et al. 2010].

3 Virtual Texturing

Clipmaps were introduced at a time when hardware was not programmable and streaming of tiles was expensive due to CPU-GPU bandwidth constraints. This is why the virtualization of textures through clipmaps could not be used for general rendering applications back then. Although the video memory size and bandwidth have increased in today's hardware, this is no solution to the problem, as the texture size is still by far too large and tends to keep growing. Instead, the solution is to reduce the streaming cost through refined, adaptive tile determination and management processes to fit existing bandwidth constraints. The fundamental steps to achieve this have been sketched by Lefebvre et al. [Lefebvre et al. 2004] and Barrett [Barrett 2008], who provide efficient and complete virtual texturing systems to solve the streaming problem. After splitting the physical texture including its whole mipmap chain into tiles of uniform size, the untextured geometry is rendered once to output the visible texture coordinates of the geometry. These coordinates are stored in a buffer that is read back to the CPU to determine the visible texture tiles, which are then uploaded to the video memory. In a second, final rendering step, the virtual texture coordinates of the geometry are translated to physical texture coordinates pointing to the uploaded tiles in the video memory. For the translation, an indirection texture is maintained that stores all physical texture coordinates at the positions of their virtual texture coordinates for a fast lookup.

So, the major differences to clipmaps are an improved screen-space determination of visible tiles and the use of an indirection texture analogous to a page table rather than toroidal addressing based on the mipmap level. It is important to see that, despite the name, virtual texturing is a whole system rather than just a texturing step. The whole pipeline in its current form can be separated into different stages (see Figure 3), which are outlined in this section.

3.1 Texture Creation

Virtual texturing requires a complete mipmap chain divided into uniformly sized tiles that are stored in inexpensive memory. For correct filtering at the tile borders, additional data has to be stored for each tile. Although a new tool chain for splitting and merging is needed to create these tiles, it offers several advantages compared to the texturing methods mentioned before. Artists or users have the possibility to work with the texture as a whole or just with parts of it. This also allows the use of already existing textures without modification and, in contrast to clipmaps, is not limited to large-scale textures. Current virtual texturing systems are not updated using spatial information in texture space but visibility information, so any textures can be merged together into a huge texture atlas, avoiding frequent texture changes when rendering. With this, unique texturing as used in Rage (see Section 3.5) is possible, meaning that every surface of a scene can be textured uniquely with a single texture.

Mayer [Mayer 2010] devotes a lot of attention to further topics of texture creation which cannot be covered in this work. These include efficient UV unwrapping for unique textures, packing of tiles in a texture atlas, optimal tile sizes, data reduction and texture compression.

3.2 Determination of Visible Tiles

To determine which tiles have to be stored in the video memory, the first step of the system is to analyze which tiles are (potentially) visible in the current frame. Also, a prediction of which tiles will be needed in the succeeding frames is desirable to efficiently pre-cache them. To get the visibility information of the scene, it is rendered to a target that can be read back to the CPU in a first pass. The read back is usually a bottleneck of the system, which is why acceleration techniques are crucial for high performance.

3.2.1 Generation and Analysis of Visibility Information

For the determination of the visible texture regions, Lefebvre et al. [Lefebvre et al. 2004] propose a two-pass system where the geometry is rendered to a texture load map in texture space in the first pass. This is a map of the texture space that contains visibility information for all tiles. Tiles are then likely in use if a part of the geometry was written to the corresponding pixel of the texture load map. To determine the tiles exactly, numerous subsequent level-of-detail calculations are needed to address the right tile or tile area. This approach only allows a conservative tile determination, because occlusions are not taken into account. In practice, it is no longer used for tile determination because of the high amount of necessary calculations and the less-than-ideal results, and is thus not discussed in detail.

A more natural and exact two-pass approach proposed by Barrett [Barrett 2008] is to render the visible geometry in screen space as usual to the needbuffer [Neu 2010] in the first pass. Instead of applying the real texture to the geometry, however, the texture coordinates as well as the estimated mipmap level are written to the needbuffer (see the leftmost image of Figure 3 for an example). This information, the tile ID, unambiguously identifies each tile of the virtual texture. Since a tile is usually required by several fragments (especially for linearly mapped terrains), the result can be stored in a buffer smaller than the actual display size, thus reducing both rendering and read-back effort. This matters because the needbuffer has to be read back to the main memory to analyze the data on the CPU, which induces additional data traffic through the whole system. In this analysis, a list of all required tiles is created and then handed to the management process. An important point stated by Mayer [Mayer 2010] is that the manual mipmap estimation in the shader, done with the dFdx and dFdy functions, does not necessarily produce the same result as the mipmap selection by OpenGL, since the latter is not precisely specified. For cases like this, the OpenGL extension ARB_texture_query_lod can be used, which returns the result of the OpenGL mipmap selection as if a texture lookup had been performed. The new shader function provided at fragment level is not only more accurate, but also faster than the manual mipmap level calculation.
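The manual LOD estimate and the resulting tile ID can be sketched on the CPU as follows. This is a minimal sketch of the standard derivative-based mipmap formula a shader would compute from dFdx/dFdy, not Barrett's or Mayer's actual shader code; all names and the tile ID layout are assumptions.

```python
import math

def estimate_mip_level(dudx, dvdx, dudy, dvdy, max_level):
    """LOD estimate from texture-coordinate gradients (in texels per
    pixel): log2 of the longer screen-space gradient, clamped to the
    available mip levels."""
    rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    lod = math.log2(rho) if rho > 0 else 0.0
    return min(max(int(lod), 0), max_level)

def tile_id(u, v, level, tiles_at_level0):
    """Pack (level, tile x, tile y) as written to the needbuffer; the
    tile grid halves with each coarser mip level."""
    tiles = max(tiles_at_level0 >> level, 1)
    return (level,
            min(int(u * tiles), tiles - 1),
            min(int(v * tiles), tiles - 1))
```

A fragment whose texture coordinates change by four texels per screen pixel, for instance, selects mip level 2.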

A drawback of both two-pass solutions is that all geometry has to be processed twice, which causes considerable additional effort. Although the needbuffer resolution can be smaller than the display size and low-detail geometry can be used for the first pass, it remains computationally expensive. With multiple render targets, doing both the tile determination and the final rendering in a single pass is possible. Then, instead of doing the tile determination for the current frame before rendering it, the set of tiles calculated in the last frame is used. This might not be exact, but in practice, the deviations will be imperceptible because of the temporal coherence between the needbuffers of two succeeding frames. Even more importantly, the streaming process cannot provide freshly requested tiles immediately but only with a few frames' delay, so even the two-pass approach is lagging behind. A more concerning problem is that with multiple render targets, the needbuffer has to have the same size as the main render target, leading to significantly increased data transfer between the GPU and CPU.

3.2.2 GPU-Accelerated Needbuffer Processing

To reduce the read-back delay, Hollemeersch et al. [Hollemeersch et al. 2010] propose a GPGPU method for data reduction with CUDA before downloading the buffer. Although their implementation relies on multiple render targets, this method can be used likewise in the two-pass approach. Since the needbuffer tends to contain a lot of redundant information because of the spatial coherence between neighboring pixels, the idea is to generate a buffer that only contains unique tile IDs. Usually, this buffer is much smaller than the original needbuffer, so the read back is faster. One result of this implementation is depicted in Figure 1, using large-scale aerial photography to reconstruct an island in real-time.
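The effect of the reduction is easy to show with a CPU stand-in for the GPGPU kernel: collapse the per-pixel needbuffer into unique tile IDs, optionally keeping the request count per tile, which can later serve as a priority hint. This sketch is illustrative only; the actual method runs on the GPU before the read back.

```python
from collections import Counter

def reduce_needbuffer(needbuffer):
    """Collapse a per-pixel list of tile IDs into unique IDs with their
    request counts, most-requested first."""
    counts = Counter(needbuffer)
    return sorted(counts.items(), key=lambda kv: -kv[1])
```

A needbuffer of hundreds of thousands of pixels typically collapses to a few hundred unique tile IDs, which is what makes the read back cheap.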

It has to be mentioned that evaluations done by Hollemeersch et al. [Hollemeersch et al. 2010] do not show any difference in performance when using the full or a reduced needbuffer resolution. In contrast, the OpenCL implementation of this method by Mayer [Mayer 2010] performs significantly better with a lower resolution. This can have several causes, including internal API differences between CUDA and OpenCL as well as hardware generation and vendor limitations. Hollemeersch et al. point out that depending on the hardware, CUDA might either just lock the data of the needbuffer for use or copy it, which of course would increase the effort. Finally, it is noteworthy that the implementation of virtual texturing by Neu [Neu 2010] also uses multiple render targets and therefore the full-sized needbuffer, but does not process the data before the read back. Yet, the presented results suggest an overall high visual quality of the method, which has been evaluated for various existing indoor and outdoor game levels to prove its suitability for video games.

3.3 Tile Management and Streaming

The tile management stage is the main component of the texture virtualization pipeline, because this is where access to the virtual texture is provided. It receives a list containing the IDs of all tiles needed for the current frame from the previous tile determination stage and ensures their availability. For this, three tasks are performed, namely maintenance of an indirection texture, loading of tiles, and streaming of tiles.

3.3.1 Maintenance of the Indirection Texture

The indirection texture, comparable to a page table, is a structure that translates virtual texture coordinates into physical ones, which is necessary because the tiles in the physical video memory are stored in a tile cache. This tile cache is a large texture subdivided into addressable frames of the same size as tiles. Since the tile cache is not large enough to store all tiles of the virtual texture, its contents will change constantly depending on the currently visible surfaces. It is the task of the indirection texture to keep track of all these changes. Therefore, each tile of the virtual texture including its mipmap chain is represented in the indirection texture by one entry. For the rendering step, an entry corresponds to a texel, because a physical texture or a texture array [Mittring 2008] is generated. However, as the structure is needed in both the management stage and the rendering, it might also be represented by a quad-tree on the CPU side. The quad-tree also has to store state information for each tile, such as whether it is already loaded. The position of a tile in the virtual texture corresponds to its position in the indirection texture, which is why the tile ID stored in the needbuffer can be used to directly look up the value of the indirection texture for this tile.
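The page-table analogy can be made concrete with a minimal sketch: one entry per virtual tile and mip level, each storing the tile-cache frame the tile currently occupies. The class and its interface are assumptions for illustration, not any of the cited implementations.

```python
class IndirectionTexture:
    """Page-table sketch for virtual texturing: entry (level, tx, ty)
    holds the tile-cache frame of that virtual tile, or None if the
    tile is not resident."""

    def __init__(self, tiles_at_level0, levels):
        # One grid per mip level; the grid halves with each level.
        self.levels = [
            [[None] * max(tiles_at_level0 >> l, 1)
             for _ in range(max(tiles_at_level0 >> l, 1))]
            for l in range(levels)
        ]

    def update(self, level, tx, ty, cache_frame):
        """Record that a virtual tile now resides in a cache frame."""
        self.levels[level][ty][tx] = cache_frame

    def lookup(self, level, tx, ty):
        """Translate a virtual tile address to a physical cache frame."""
        return self.levels[level][ty][tx]
```

In the real system this table lives in a texture so the fragment shader can perform the lookup; the CPU-side mirror additionally carries per-tile state as described above.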

For each tile requested by the tile determination stage, it is checked whether it already resides in the tile cache. If this is not the case and it is not currently being loaded or streamed either, its streaming has to be initiated. All tiles to be streamed are stored in a priority queue which is constantly updated according to specified criteria such as the distance to the view point or the frequency of requests for this tile. The tile priority can be seen as a measure of the visual impact of this tile or its absence on the scene. Generally, tiles of lower resolution have a higher priority because they can be used as fallback textures, which leads to a progressive increase of detail all over the scene at runtime. If a conservative tile determination or a tile prediction is implemented, these additional tiles can be pre-cached as well if there are spare capacities. Neu [Neu 2010] devotes much attention to various page priority heuristics and offers detailed evaluations of the visual quality using Structural Similarity as a measure. Additionally, he introduces the concept of a Look-Ahead Camera used to track the current camera motion and predict the tiles needed in the next frames.
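A minimal priority queue following the rule above (coarser levels first, then more frequently requested tiles) could look like this; the heuristic and all names are illustrative assumptions, not Neu's evaluated heuristics.

```python
import heapq

class StreamingQueue:
    """Priority queue sketch for tiles awaiting streaming: lower mip
    resolutions (usable as fallbacks) come first, then tiles with more
    requests. Tile IDs are (level, tx, ty) with level 0 = finest."""

    def __init__(self, coarsest_level):
        self._heap, self._queued = [], set()
        self._coarsest = coarsest_level

    def request(self, tile, request_count):
        if tile not in self._queued:
            self._queued.add(tile)
            # heapq is a min-heap: coarser tiles (higher level index)
            # and higher request counts get smaller sort keys.
            key = (self._coarsest - tile[0], -request_count)
            heapq.heappush(self._heap, (key, tile))

    def pop(self):
        """Next tile to stream, or None if the queue is empty."""
        if not self._heap:
            return None
        _, tile = heapq.heappop(self._heap)
        self._queued.discard(tile)
        return tile
```

The `_queued` set also implements the "not already being loaded or streamed" check from the paragraph above.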

3.3.2 Loading of a Requested Tile

Before the tile of highest priority can be streamed, it has to be loaded from secondary storage. Doing this in near real-time is a non-trivial task due to the variety of hardware configurations a developer has to consider as target systems. In a simple scenario, all tiles residing on the hard disk drive are loaded into the main memory and are then ready to be streamed. Even in this case, the diversity of memory interface, size and speed as well as storage device type (HDD, solid-state drive) and interface is huge. A minor improvement would be to create a second caching level in the main memory to pre-load neighboring tiles of recently loaded ones in advance to lessen the impact of a slower hard disk on the system.
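Such a second caching level is essentially a least-recently-used cache in main memory; a minimal sketch, with all names and the LRU policy as assumptions, could look like this:

```python
from collections import OrderedDict

class MemoryTileCache:
    """Second caching level in main memory: recently loaded tiles (and,
    in a fuller system, pre-fetched neighbors) stay resident so the
    slower disk is hit less often."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._tiles = OrderedDict()  # insertion order = LRU order

    def get(self, tile_id, load_from_disk):
        if tile_id in self._tiles:
            self._tiles.move_to_end(tile_id)       # mark as recently used
        else:
            self._tiles[tile_id] = load_from_disk(tile_id)
            if len(self._tiles) > self.capacity:   # evict least recently used
                self._tiles.popitem(last=False)
        return self._tiles[tile_id]
```

Pre-loading neighbors would simply call `get` for the adjacent tile IDs whenever there is spare bandwidth.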

Tiles might also be stored on portable or optical storage devices. In the case of video games for consoles, tiles might have to be loaded directly from a DVD, which performs quite poorly. Added to this is the limited storage capacity of DVDs, which made it necessary to store the game Rage and its heavily compressed texture data on three dual-layer DVDs that have to be swapped at runtime. To optimize reading from optical devices, larger chunks have to be read sequentially, thus influencing the choice of the tile size [Mittring 2008]. A deeper insight into loading terrain textures from slow storage devices is given by van Waveren [van Waveren 2008].

Yet another and particularly promising way of data acquisition is over the Internet. Mittring already presented the idea of streaming whole game levels for multiplayer games as an alternative to optical devices. An actual realization of streaming over the Internet has recently been provided by Andersson and Goransson [Andersson and Goransson 2012] for a WebGL implementation of virtual texturing. It is imaginable that for portable devices, virtual texturing with network streaming will be used for tasks such as displaying satellite photos in GPS navigation services. In such cases, near real-time streaming to the video memory is usually not a priority, so delays due to the network transfer of tiles and the limited computational power to process them are bearable.

3.3.3 Compression and Encoding of a Tile

After the tile has been loaded into the main memory, depending on its compression (e.g. JPEG, PNG), it has to be decompressed. Benchmarks by Mayer [Mayer 2010] show significant performance differences between different compression formats and decompression libraries which have to be taken into account. Generally, JPEG offers both higher (lossy) compression in the secondary storage and faster decompression compared to PNG. After decompressing the tile, it is usable by the GPU, but relatively large in size. The streaming delay is usually the bottleneck of the virtual texturing pipeline, so a texture compression that is supported by the GPU such as DXT is suggested [Hollemeersch et al. 2010; Mayer 2010]. With DXT, a significant reduction of the upload bandwidth and video memory consumption can be achieved. A promising addition to the GPGPU utilization for virtual texturing suggested in the contributions of both Hollemeersch et al. and Mayer is texture transcoding to DXT on the GPU. Instead of decompressing the tiles on the CPU, then recompressing them to DXT and streaming them, the compressed (JPEG) tiles can be transcoded from their current encoding to DXT on the GPU. Since JPEG offers a much better compression ratio than DXT, the upload bandwidth can be further reduced. The GPU transcode, however, recently fell out of favor when used in Rage to accelerate streaming, at which it occasionally failed (see Section 3.5).

3.3.4 Streaming of a Tile

Streaming itself is done asynchronously in a separate thread that continuously checks for loaded and possibly CPU-transcoded tiles. If such a tile exists, a frame in the physical tile cache has to be found. In the simple case, the tile cache is not yet full and the tile can be uploaded to one of the cache's empty frames. Once the cache is full, it has to be determined whether to replace one of the existing tiles in the cache or to discard the already loaded tile. The latter should happen when all cached tiles are currently needed and each of them has a higher priority than the recently loaded one. To keep track of visibility information and, e.g., the last query of a tile for least-recently-used scheduling, the additional tile attributes can be stored in the CPU-side representation of the indirection texture.
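The CPU-side bookkeeping just described can be sketched as follows. All names are illustrative, and the eviction policy is simplified: the sketch always considers the least recently used tile as the victim and discards the new tile if that victim has equal or higher priority, whereas a real system would search for the lowest-priority evictable tile:

```python
from collections import OrderedDict

class TileCache:
    """Minimal sketch of CPU-side bookkeeping for the physical tile cache:
    frames are reused least-recently-used first, and a freshly loaded tile
    is discarded if the eviction candidate has higher priority."""

    def __init__(self, num_frames):
        self.free_frames = list(range(num_frames))
        self.lru = OrderedDict()  # tile_id -> (frame, priority), oldest first

    def touch(self, tile_id):
        """Mark a tile as used this frame (maintains LRU ordering)."""
        if tile_id in self.lru:
            self.lru.move_to_end(tile_id)

    def insert(self, tile_id, priority):
        """Return the frame the tile was placed in, or None if discarded."""
        if self.free_frames:                          # cache not yet full
            frame = self.free_frames.pop()
        else:
            victim_id, (frame, victim_prio) = next(iter(self.lru.items()))
            if victim_prio >= priority:               # victim more important:
                return None                           # discard the loaded tile
            del self.lru[victim_id]                   # evict the LRU tile
        self.lru[tile_id] = (frame, priority)
        return frame
```

With a two-frame cache, inserting a third tile of lower priority is rejected, while a higher-priority tile evicts the least recently used entry.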

If the tile passes this last test, it is finally uploaded to the physical memory. Then, the indirection texture has to be updated accordingly. The entry representing the new tile has to be set such that it stores the position of the tile cache frame the tile resides in as well as its original mipmap level, which is already contained in the information retrieved from the needbuffer. If an existing tile has been replaced by the new one, all entries of the indirection texture pointing at this tile cache frame have to be changed to point to the frame of the most appropriate fallback tile for the replaced one. This is the tile of the next lower virtual mipmap level that covers the area of the previous tile and resides in the cache. Thus, uploading the lowest mipmap level, which covers the whole virtual texture in a single tile, guarantees a fallback tile for each tile at all times. The last step of the management stage is to upload the changes of the CPU-side indirection texture to the GPU. Figure 3 shows an example of an indirection texture. It can be seen that the outer, currently not visible regions of the virtual texture are mapped to a few low-resolution tiles, apparent through the large homogeneous areas.
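The quad-tree propagation into the indirection texture can be sketched on a CPU-side table. The layout is an assumption for illustration: one entry per finest-level tile, with level 0 denoting the coarsest, single-tile level and higher numbers denoting finer levels:

```python
def write_tile(indir, num_levels, level, tx, ty, cache_pos):
    """Write a tile's cache position into a CPU-side indirection table.

    `indir` is a square 2D list sized to the finest mip level, one entry per
    finest-level tile; each entry is (cache_pos, level) or None. Writing a
    tile of mip `level` at tile coordinates (tx, ty) fills every finest-level
    entry it covers, but never overwrites an entry that already points to a
    finer-resolution tile. Illustrative sketch, not a concrete format.
    """
    # Number of finest-level tiles covered per axis by this tile.
    scale = 2 ** (num_levels - 1 - level)
    for y in range(ty * scale, (ty + 1) * scale):
        for x in range(tx * scale, (tx + 1) * scale):
            if indir[y][x] is None or indir[y][x][1] < level:
                indir[y][x] = (cache_pos, level)
```

Writing the single coarsest tile first fills the whole table, so every virtual coordinate always resolves to some cached tile; finer tiles then refine only the entries they cover.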

3.4 Rendering

In the last stage of the pipeline, the generated data, namely the tile cache texture and the indirection texture, has to be used to texture the visible geometry correctly. As the per-vertex texture coordinates of the geometry correspond to locations in the virtual texture, a translation step comparable to the address translation of clipmaps is needed. Instead of toroidal addressing, a lookup into the indirection texture is used, which is depicted in Figure 4.

Figure 4: Translation from virtual to physical texture coordinates. Image retrieved from [van Waveren and Hart 2010].

3.4.1 Address Translation with the Indirection Texture

The correct texture coordinates for the physical texture depend on the position of the tile in the tile cache that covers the relevant region as well as the internal offset within this tile. Since each texel of the indirection texture corresponds to a tile of the virtual texture, an unfiltered lookup can be performed with the interpolated per-fragment texture coordinates at the fragment stage. This fetch already returns the tile position in the cache, because this is exactly what the indirection texture stores. Now it should also be clear why the fallback tiles are propagated down to all children in the quad-tree representation at indirection texture updates: it guarantees that for each virtual texture coordinate, a tile in the cache can be found, even if it is of low resolution.

Calculating the internal offset is a bit more difficult, as it is relative to the virtual tile position and thus dependent on the original mipmap level of the tile. This is why the mipmap level has already been written to the indirection texture when streaming the tile. The mipmap level can be stored as the third component of the indirection texture and is fetched together with the physical tile position. The virtual texture coordinates then have to be biased and scaled according to the mipmap level. An exemplary derivation of the solution is presented by van Waveren and Hart [van Waveren and Hart 2010] (see Figure 4). However, shader implementations with constraints such as square power-of-two textures can use arithmetic tricks to perform the translation with very few instructions [Barrett 2008; Mittring 2008; Hollemeersch et al. 2010].
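The scale-and-bias translation can be sketched in scalar code. The indirection layout matches the earlier sketch (one entry per finest-level tile, level 0 being the coarsest) and all sizes are assumptions; a shader would express the same arithmetic per fragment, ignoring tile borders for simplicity:

```python
def translate(virt_u, virt_v, indir, indir_size, cache_size_tiles):
    """Translate virtual texture coordinates in [0, 1) to physical
    tile-cache coordinates in [0, 1).

    `indir` maps finest-level tile coordinates to (cache_x, cache_y, level),
    with level 0 the coarsest level (2**level tiles per axis at each level).
    `cache_size_tiles` is the cache width in tiles. Illustrative sketch.
    """
    # Unfiltered lookup: which finest-level tile does the coordinate hit?
    tx = min(int(virt_u * indir_size), indir_size - 1)
    ty = min(int(virt_v * indir_size), indir_size - 1)
    cache_x, cache_y, level = indir[ty][tx]

    # Fractional position inside the (possibly coarser) resident tile:
    # scale by the tile count at the tile's own level, keep the fraction.
    tiles_at_level = 2 ** level
    frac_u = virt_u * tiles_at_level % 1.0
    frac_v = virt_v * tiles_at_level % 1.0

    # Bias into the cache frame the tile resides in and rescale to [0, 1).
    phys_u = (cache_x + frac_u) / cache_size_tiles
    phys_v = (cache_y + frac_v) / cache_size_tiles
    return phys_u, phys_v
```

If the entire indirection texture points at the single coarsest tile stored in cache frame (0, 0), every virtual coordinate simply maps into that one frame.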

With the physical texture coordinate retrieved, a second lookup into the tile cache texture can be performed to get the texture information. From then on, the rest of the rendering is done as usual. At this point it should be noted that the format and content of the tiles can be arbitrary. Thus, all types of data such as normal, specular, displacement, light and shadow maps and - although of questionable practicability - even indirection textures [Mayer 2010] can be virtualized and used in the shader. If the tile cache is represented by a texture array rather than a 2D texture, it is easy to store tiles of different attributes for the same surface at the same x and y coordinates in the texture array, but on different layers. Then, the texture coordinate translation has to be done only once for a lookup in all desired layers.

3.4.2 Texture Mapping with Filtering

When sampling a texture, filtering is of course desirable. Relying on the hardware filtering, however, leads to artifacts at the tile borders. Residing in the tile cache, a tile is surrounded by arbitrary tiles, so all information needed for bilinear or anisotropic filtering has to be included in the tile itself. Doing the filtering manually in the shader is possible, but requires translating the texture coordinates for each sample, leading to an infeasible overhead [Neu 2010]. This is why it has already been stated in Section 3.1 that the tile creation tools need to append borders for correct filtering, as the neighborhood of a tile border has to be taken into account. However, this either reduces the usable area of the tile or increases the tile size, both leading to potentially higher upload bandwidth for the sake of visual quality. When using DXT-encoded tiles, it is necessary to append full 4 x 4 texel blocks to the border for correct filtering [Mittring 2008].

Trilinear filtering generally requires a second mipmap level to sample from. Barrett [Barrett 2008] proposes the use of a physical tile cache with one additional mipmap level that stores the cache contents at half size, which is sufficient because virtual texturing by design already samples from the best matching level. This enables the usage of the graphics API's trilinear filtering, but is, according to Mittring [Mittring 2008], an unnecessary increase of video memory consumption (by a quarter) that does not pay off. The reason for this is that the half-resolution data created for the mipmap level already exists somewhere in the system, maybe even in the tile cache. Thus, Neu [Neu 2010] suggests relying on the API's bilinear filtering and doing the trilinear filtering manually in the fragment shader. For this, twice the number of texture lookups into the indirection texture and tile cache are needed. Additionally, the tile management has to ensure that the required tile of the next lower resolution is cached, too, so a slightly modified tile determination algorithm requests the direct parent tile of each visible tile as well. This increases the streaming effort and the number of tiles needed in the tile cache, but a portion of the additional tiles will already be cached as fallback textures.
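The manual blend reduces to two bilinear samples weighted by the fractional level of detail. In the following sketch, the two per-level lookups (indirection texture plus tile cache) are abstracted into a hypothetical `sample_bilinear(u, v, level)` callback:

```python
import math

def trilinear(sample_bilinear, u, v, lod):
    """Manual trilinear filtering: blend two bilinear lookups, one at the
    computed mip level and one at the next lower-resolution level, by the
    fractional part of the level of detail. `sample_bilinear` stands in for
    the indirection-texture lookup plus tile-cache fetch (illustrative)."""
    lo = int(math.floor(lod))
    frac = lod - lo                      # blend weight between the two levels
    c0 = sample_bilinear(u, v, lo)       # best matching level
    c1 = sample_bilinear(u, v, lo + 1)   # next lower-resolution level
    return tuple(a * (1.0 - frac) + b * frac for a, b in zip(c0, c1))
```

The doubled lookup cost is the price of avoiding the extra physical mipmap level in video memory.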

3.4.3 Reduction of Pop-In Artifacts

Although at this point the virtual texturing system should behave as if the large-scale texture were used with common texture mapping, one deviation from the virtual ground truth is hard to eliminate. The whole pipeline is designed for asynchronous updates such that it never stalls on uploads or read-backs. Thus, whenever a task is not yet finished and the rendering starts, it has to handle those unfinished jobs, which is what the fallback tiles are used for. The faster the pipeline performs, the sooner all textures are available to the shader for the desired output. A last question is how to behave when the streaming of a new tile has finished. If the new tile is used for texturing straight away in the next frame, the higher detail of the tile will pop in suddenly in a disturbing manner. This is similar to the pop-in of level-of-detail algorithms, which is dealt with by forcing a smooth transition between the levels at runtime, e.g. by blending the results of both levels.

In virtual texturing, blending can be implemented easily if trilinear filtering is used, since this already performs gradual blending of textures. Depending on the time since the arrival of the new tile, the lower-resolution tile is weighted manually to force a smooth transition to the higher detail [Mayer 2010]. For blending with bilinear filtering only, van Waveren and Hart [van Waveren and Hart 2010] propose dithered updating of the tile cache. This means that each frame, only a part of the new tile is uploaded and combined with the lower-resolution tile. For this, the region of the lower-level tile that is covered by the new tile is upsampled and stored in the tile cache. Then, this tile is gradually updated with portions of the higher-resolution tile each frame until the new contents have completely replaced the old ones. This, of course, again leads to more streaming and delayed streaming of other tiles, so a good compromise has to be found.
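The time-based weighting attributed to Mayer above amounts to a clamped ramp over the frames since the tile arrived. A minimal sketch, with the fade duration being an assumed parameter:

```python
def popin_weight(time_since_arrival, fade_duration=0.5):
    """Blend weight for a newly arrived higher-resolution tile: 0.0 right
    when it arrives (only the fallback tile is visible) ramping linearly to
    1.0 after `fade_duration` seconds. The duration is an assumed value."""
    return min(max(time_since_arrival / fade_duration, 0.0), 1.0)
```

In the shader, this weight would replace the fractional-LOD factor of the trilinear blend, so the new tile fades in instead of popping.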

3.5 Recent Applications

In the preceding sections, the complete design of a state-of-the-art virtual texturing system including acceleration techniques has been presented. The structure is mainly based on the contributions of Mayer [Mayer 2010] and Neu [Neu 2010], which both date back to 2010. Since then, virtual texturing has become a valuable tool for terrain rendering in large-scale applications, including but not limited to video games [van Waveren and Hart 2010; Widmark 2012] such as the popular video game Rage. This section is devoted to these recent developments. The aim is to give an overview of specific applications of the presented system in the already wide field of texture virtualization. A particularly interesting topic is the virtualization of general data such as geometry or volume data.

3.5.1 id Software’s Rage

Since its announcement in 2007, the video game Rage by id Software has been eagerly anticipated because of its use of texture virtualization. In contrast to the clipmap-like MegaTexture implementation used for terrain texture virtualization in the games Enemy Territory: Quake Wars and BRINK by Splash Damage, MegaTextures have been extended to a complete virtual texturing system for Rage [van Waveren 2009]. Right after the release in 2011, however, a lot of PC customers had severe troubles with the rendering and especially the virtual texturing in the game. Besides tearing and glitches that showed wrong tiles scattered all over the screen, the main causes of artifacts were tile popping and the generally slow streaming of tiles [Burnes 2011]. In many cases, only fallback tiles were shown and obviously never replaced.

For this report, the game has been tested at a screen resolution of 1920 x 1200 pixels on a system with a GeForce GTX 480 video card, an Intel i7-950 CPU, 12 GB of main memory and a solid-state drive. With this, a 180 degree turn in-game took approximately five seconds to replace all low-resolution fallback textures. Additionally, even the tiles of higher resolution looked blurry. To fix these problems, a game patch as well as a reference to console commands that change the virtual texturing behavior, such as the tile cache size, were published. After applying both patch and configuration changes, the visual quality of the rendering increased drastically to the point where no more popping artifacts or fallback textures were visible on the test system.

Figure 5: Screenshot of the virtually textured terrain in Rage as presented at SIGGRAPH '09. Image retrieved from [van Waveren 2009].

According to Burnes [Burnes 2011], the cause of the initial performance issues is a biased adaptive quality regulation for PC systems. On the console platforms the game ran fine from the start - despite the higher loading latency from DVDs stated in Section 3.3 - because the hardware is uniformly specified and the system could be tweaked to fit the hardware requirements. In contrast, the variety of PC hardware combinations made it a necessity to dynamically adapt parts of the pipeline to the hardware at runtime, up to a predefined upper level, to prevent side effects. This also includes GPU transcoding. As mentioned before, the purpose of GPU transcoding is to take work load off the CPU by decompressing loaded textures on the GPU. Obviously, this has a huge impact on the overall performance in Rage, so the patch provides a possibility to limit the number of GPU transcodings per frame or to disable them completely. Rage seems to use the GPU to capacity already, so that the hardware acceleration proposed by Hollemeersch et al. [Hollemeersch et al. 2010] in fact slows down the system. This might be due to the fact that JPEG-like decompression is hard to parallelize and thus cannot benefit substantially from a GPU implementation.

Rage can be considered a milestone in texture virtualization not only because of its general extent and performance, but because it also virtualizes the small-scale textures for unique texturing. The ambitious aim to use a unique texture for each piece of geometry in the scene can partially be blamed for the game's bumpy start, since the amount of tiles that have to be streamed each frame is massive. It is all the more surprising that id Software finally managed to implement it at an interactive frame rate and overall impressive quality (see Figure 5). Yet, while this allows all textures to be treated in the same way, the visual quality in Rage does not benefit very much from it compared to recent games with conventional texture mapping for smaller objects. The benefit of texture virtualization for the obviously large terrain textures with regard to the high level of detail compared to multi-texturing approaches, on the other hand, is beyond doubt.

3.5.2 Virtual Texturing for WebGL

Besides the gaming segment, virtual texturing will certainly be of increasing importance for geographical data visualization. Especially online and mobile services for mapping, positioning and navigation could greatly benefit from the accelerated display of aerial images. That the implementation of virtual texturing is generally possible on mobile platforms has been proven by Andersson and Goransson [Andersson and Goransson 2012] with their WebGL implementation. An interactive frame rate, however, could not always be achieved in this implementation. The predominant issues of WebGL and OpenGL ES 2.0 implementations tend to be the restrictive feature set and unsatisfactory browser support. As mentioned by the authors, the loading of DXT-compressed images is still not possible, and with the lack of pixel buffer objects, a read-back of the needbuffer or upload of a tile stalls the whole thread until completion. However, it should only be a matter of time before these features are supported. Currently, GPGPU acceleration is also impossible through WebGL. The corresponding API, WebCL, is already under development though, so some of the tweaks presented might be applicable in WebGL soon.

3.5.3 General Data Virtualization

The rise of virtual texturing linked to Rage also triggered research and development of acceleration methods. Thus, increasing attention is devoted to data virtualization in general. As with clipmaps, the concept of virtual texturing can be extended to geometry virtualization. One recent contribution by Sugden and Iwanicki [Sugden and Iwanicki 2011] is the Mega Meshes system - in analogy to MegaTextures - which combines sculpting tools and virtual texturing in game development. During content creation, enormous amounts of geometry data are stored in a hierarchical structure like a quad-tree with several subdivision levels that can be streamed. At runtime, texture data matching the subdivision level, such as normals and ambient occlusion data, is used in a virtual texturing system.

Virtualizing volume data, which is commonly represented by 3D textures, has been introduced by Crassin et al. [Crassin et al. 2008] under the term GigaVoxels. In 2011, Crassin proposed a complete system for volume virtualization usable for, e.g., biomedical applications and voxel-based game engines [Crassin 2011]. This is particularly interesting because voxels allow overcoming the (artificial) separation of geometry and surface texture. Thus, geometry and texture streaming could be unified, although content creation would get much more expensive for such an engine.

A contribution by Mayer et al. [Mayer et al. 2011] presents the combination of virtual texturing and point cloud rendering for cultural heritage preservation. The point cloud, however, is not virtualized in this application, but relies on another proven method of out-of-core rendering. Still, this shows the direction in which real-time rendering is heading.

3.5.4 Hardware-Implemented Virtual Texturing

As a result of this surge of interest, partial hardware support for the virtualization pipeline seems to be usable in the near future. Video cards using AMD's Graphics Core Next architecture, such as the Radeon HD 7970, offer a redesigned fragment shader stage as well as support for partially resident textures [Bilodeau et al. 2012] through the AMD sparse texture extension. With these, it is possible to define a large-scale texture with only partial allocation. For the use in virtual texturing, they could make the indirection texture and tile cache obsolete, as all visible tiles can be stored right in the sparse texture. If the virtual texture coordinates can be used to access the partially resident texture through an internal texture coordinate translation, the whole virtualization step can be done by the hardware. For tile determination, shaders have been modified such that lookups into a partially resident texture fetch the stored data - if present - and additionally return a fetch state. If no data is stored at the lookup location, a fail state is returned that can be read back to the CPU, indicating that the corresponding tile is not yet available and needs to be streamed. A second state is the LOD warning, which, according to a predefined level-of-detail limit for a texture, indicates that a higher-resolution representation of the current tile might be needed soon, thus offering hardware-supported tile prediction. Even tasks like the filtering are simplified again with this extension. Future implementations of virtual texturing using partially resident textures will show whether the extension holds up to its promises.
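The fetch-state behavior described above can be illustrated with a conceptual mock. This is emphatically not the actual AMD extension API; the function, state names and LOD-limit semantics are assumptions chosen to mirror the control flow the text describes:

```python
from enum import Enum

class FetchState(Enum):
    OK = 0           # data resident, no action needed
    FAIL = 1         # tile not resident -> CPU should stream it
    LOD_WARNING = 2  # a finer-resolution tile might be needed soon

def prt_fetch(resident, tile, lod_limit, current_lod):
    """Conceptual mock of a partially-resident-texture lookup: return the
    stored data (or None) together with a fetch state that the application
    can read back for tile determination and prediction. Illustrative only;
    not the real hardware interface."""
    if tile not in resident:
        return None, FetchState.FAIL
    if current_lod < lod_limit:
        return resident[tile], FetchState.LOD_WARNING
    return resident[tile], FetchState.OK
```

Reading the fail states back each frame would replace the software needbuffer pass, while the LOD warnings take over the role of tile prediction.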

4 Conclusion and Outlook

Due to their properties, terrains can benefit greatly from current aerial and satellite image acquisition techniques. They extend over the whole scene, so a huge amount of data is required to texture them in a satisfactory way. This exceeds the capabilities of current hardware in both texture size and total memory consumption. Based on the observation that only a fraction of the texture data and its mipmaps is needed to texture one frame at runtime, different approaches have been suggested to detect and upload only this information. Drawing on the virtual memory management of operating systems, the use of clipmaps for texture data virtualization has been introduced, which evolved into the more flexible and efficient virtual texturing. The virtual texturing pipeline as presented in this report relies on a multi-threaded system that is divided into three independent stages. A main benefit of this system compared to previous methods is the tile determination in screen space, providing an exact solution to the tile visibility problem.

Several possibilities for quality improvement and acceleration at different stages of virtual texturing in both software and hardware have been discussed, although their benefit for general applications is yet to be evaluated. Especially the hardware support for partially resident textures in AMD video cards gives hope that virtualization in general and virtual texturing in particular will be made applicable for the masses. Until now, implementing virtual texturing has been a complex task that only pays off for large-scale applications. What is even harder is to get the single components of the pipeline to work together in an efficient manner. The difficult start of the virtual texturing flagship Rage has shown that the interaction of these components requires a lot of fine-tuning and leaves room for more robust solutions in the future. A point to consider for each game is whether unique texturing of small objects is useful and pays off with respect to the streaming overhead compared to static textures. The insight that GPGPU acceleration of texture decompression can have a negative impact on performance should lead to a more careful work load distribution over all components in such dynamic systems. One additional deficiency that has to be pointed out is that WebGL and OpenGL ES do not yet seem to be fit for real-time virtual texturing. Lacking features already common in OpenGL, virtual texturing on mobile devices is one or even two steps behind. For the targeted applications on these devices, however, an interactive frame rate might not be needed.

Considering the quality achievable with unique large-scale textures that can be composed by artists as a whole, virtual texturing outperforms current terrain texturing methods. With its exact tile determination and the possibility to minimize artifacts through suitable tile priority calculation, it is also superior to other current texture streaming methods. This is why it is safe to assume that the virtualization of all kinds of data will continue and be adopted in future rendering systems.

References

ANDERSSON, S., AND GORANSSON, J. 2012. Virtual Texturing with WebGL. Master's thesis, Department of Computer Science & Engineering, Chalmers University of Technology, Gothenburg.

ANDERSSON, J. 2007. Terrain Rendering in Frostbite Using Procedural Shader Splatting. In Proceedings of ACM SIGGRAPH '07 course notes, course 28, Advanced Real-Time Rendering in 3D Graphics and Games, ACM, 38–58.

ASIRVATHAM, A., AND HOPPE, H. 2005. Terrain Rendering Using GPU-based Geometry Clipmaps. In GPU Gems 2, M. Pharr, Ed. Addison Wesley, 27–45.

BARRETT, S., 2008. Sparse Virtual Textures. http://silverspaceship.com/src/svt/, retrieved Apr 14, 2012.

BILODEAU, B., SELLERS, G., AND HILLESLAND, K. 2012. Partially Resident Textures on Next-Generation GPUs. Game Developers Conference.

BURNES, A., 2011. How To Unlock Rage's High Resolution Textures With A Few Simple Tweaks. http://www.geforce.com/whats-new/articles/how-to-unlock-rages-high-resolution-textures-with-a-few-simple-tweaks/, retrieved Apr 15, 2012.

CRASSIN, C., NEYRET, F., AND LEFEBVRE, S. 2008. Interactive GigaVoxels. Tech. Rep. RR-6567, INRIA.

CRASSIN, C. 2011. GigaVoxels: A Voxel-Based Rendering Pipeline For Efficient Exploration Of Large And Detailed Scenes. PhD thesis, University of Grenoble.

CRAUSE, J., FLOWER, A., AND MARAIS, P. 2011. A System for Real-Time Deformable Terrain. In Proceedings of the SAICSIT '11, ACM, 77–86.

EPIC GAMES, 2012. Terrain Advanced Textures. http://udn.epicgames.com/Three/TerrainAdvancedTextures.html, retrieved May 13, 2012.

HOLLEMEERSCH, C.-F., PIETERS, B., LAMBERT, P., AND VAN DE WALLE, R. 2010. Accelerating Virtual Texturing Using CUDA. In GPU Pro - Advanced Rendering Techniques, W. Engel, Ed. AK Peters, 623–642.


KALRA, A., AND VAN WAVEREN, J. 2008. Threading Game Engines: QUAKE 4 & Enemy Territory QUAKE Wars. Game Developers Conference.

LEFEBVRE, S., DARBON, J., AND NEYRET, F. 2004. Unified Texture Management for Arbitrary Meshes. Tech. Rep. RR-5210, INRIA.

LOSASSO, F., AND HOPPE, H. 2004. Geometry Clipmaps: Terrain Rendering Using Nested Regular Grids. In Proceedings of ACM SIGGRAPH '04, ACM, 769–776.

MAYER, I., SCHEIBLAUER, C., AND MAYER, A. J. 2011. Virtual Texturing in the Documentation of Cultural Heritage. Geoinformatics FCE CTU.

MAYER, A. J. 2010. Virtual Texturing. Master's thesis, Institute of Computer Graphics and Algorithms, Vienna University of Technology.

MITTRING, M. 2008. Advanced Virtual Texture Topics. In ACM SIGGRAPH '08 classes, ACM, 23–51.

MULTIMEDIA LAB, 2011. Demos - Multimedia Lab. Ghent University. http://multimedialab.elis.ugent.be/demonstrations, retrieved Apr 1, 2012.

NEU, A. 2010. Virtual Texturing. CoRR abs/1005.3163.

NVIDIA CORPORATION, 2007. Clipmaps whitepaper. http://developer.download.nvidia.com/SDK/10/direct3d/Source/Clipmaps/doc/Clipmaps.pdf, retrieved Apr 10, 2012.

SUGDEN, B., AND IWANICKI, M. 2011. Mega Meshes - Modelling, rendering and lighting a world made of 100 billion polygons. Game Developers Conference.

TANNER, C. C., MIGDAL, C. J., AND JONES, M. T. 1998. The Clipmap: A Virtual Mipmap. In Proceedings of ACM SIGGRAPH '98, ACM, 151–158.

VAN WAVEREN, J., AND HART, E. 2010. Using Virtual Texturing to Handle Massive Texture Data. NVIDIA GPU Technology Conference.

VAN WAVEREN, J., 2008. Geospatial Texture Streaming From Slow Storage Devices. http://software.intel.com/en-us/articles/geospatial-texture-streaming-from-slow-storage-devices/, retrieved Apr 15, 2012.

VAN WAVEREN, J. 2009. id tech 5 Challenges - From Texture Virtualization to Massive Parallelization. ACM SIGGRAPH '09 - Beyond Programmable Shading Course.

WIDMARK, M. 2012. Terrain in Battlefield 3: A modern, complete and scalable system. Game Developers Conference.