The Framebuffer

February 6, 2003

“A Configurable Pixel Cache for Fast Image Generation”, Gorris et al.

Problem: Processor speeds have increased to the point that the frame buffer is now the bottleneck.

Solution: Cache memories

Overview

Three considerations in frame buffer design:
- Input mechanism to the frame buffer
- RAM used to store the image
- Output mechanism used by the frame buffer to refresh the display

Traditional approach:
- Sequential memory locations lie along a scan line
- Sequential pixels are provided at a high rate to refresh the display
- Display refresh requires a significant percentage of RAM bandwidth
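
A minimal sketch of the traditional layout, assuming a row-major frame buffer where pixel (x, y) lives at address y * width + x. The refresh estimate uses assumed values (1280x1024, 1 byte per pixel, 60 Hz) only to show why refresh consumes RAM bandwidth; these figures are not taken from the paper.

```python
def linear_address(x: int, y: int, width: int) -> int:
    """Row-major address: pixels along a scan line occupy consecutive locations."""
    return y * width + x


def refresh_bytes_per_second(width: int, height: int,
                             bytes_per_pixel: int, refresh_hz: int) -> int:
    """Bytes the display refresh must read from the frame buffer each second."""
    return width * height * bytes_per_pixel * refresh_hz


# Assumed example values: 1280x1024, 1 byte per pixel, 60 Hz refresh.
print(linear_address(5, 2, 1280))                    # 2565
print(refresh_bytes_per_second(1280, 1024, 1, 60))   # 78643200
```

Under these assumptions, roughly 79 MB/s of reads go to refresh alone, before the scan converter writes a single pixel.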

Pixel-Cache Approach

- The pixel cache holds a tile of frame buffer pixels
- A high-speed scan converter calculates intensity one pixel at a time
- The scan converter writes pixels serially into the pixel cache
- Once the bounds of the tile are exceeded, the contents are transferred in parallel to the frame buffer
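
The write-then-flush behavior above can be modeled in a few lines. This is a toy software sketch, not the chip's design: the class name, the dictionary-backed frame buffer, and the flush counter are all illustrative assumptions. It counts how many tile transfers reach the frame buffer.

```python
from typing import Dict, Optional, Tuple

Pixel = Tuple[int, int]


class PixelCache:
    """Toy model of a one-tile write cache in front of a frame buffer.

    Pixels arrive serially from the scan converter; when a pixel falls outside
    the cached tile, the tile is flushed to the frame buffer in one transfer.
    """

    def __init__(self, tile_w: int, tile_h: int) -> None:
        self.tile_w, self.tile_h = tile_w, tile_h
        self.frame_buffer: Dict[Pixel, int] = {}
        self.tile: Optional[Tuple[int, int]] = None  # (tile_x, tile_y) being cached
        self.pending: Dict[Pixel, int] = {}          # pixels written since last flush
        self.flushes = 0                             # tile transfers to the frame buffer

    def write(self, x: int, y: int, value: int) -> None:
        tile = (x // self.tile_w, y // self.tile_h)
        if tile != self.tile:          # tile bounds exceeded: write back first
            self.flush()
            self.tile = tile
        self.pending[(x, y)] = value

    def flush(self) -> None:
        if self.pending:
            self.frame_buffer.update(self.pending)   # models one parallel tile write
            self.flushes += 1
            self.pending = {}


cache = PixelCache(4, 4)
for x in range(8):                     # an 8-pixel horizontal run
    cache.write(x, 0, 255)
cache.flush()                          # write back the last tile
print(cache.flushes)                   # 2
```

Eight serial pixel writes cost two frame buffer transfers instead of eight.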

Assumptions

- Tiles are non-overlapping and aligned on boundaries that are integer multiples of their height and width
- Why? Quick mapping between scan converter addresses and tile/bit addresses:

  $F_{\text{scan}}(x, y) = \text{MSBs}_{\text{tile}}(x, y) + \text{LSBs}_{\text{bit}}(x, y)$

- The tile address is used to access a tile of pixels in the frame buffer
- The bit address is used to access individual pixels within the cache
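
Alignment is what makes the split cheap: with power-of-two tile sizes (true for 4x4 and 16x1), the tile address is just the high-order bits of x and y and the bit address is the low-order bits. A minimal sketch, assuming a row-major pixel index inside the tile (the function name and that ordering are illustrative assumptions):

```python
from typing import Tuple


def split_address(x: int, y: int, tile_w: int, tile_h: int) -> Tuple[Tuple[int, int], int]:
    """Split a scan-converter (x, y) address into (tile address, bit address).

    Assumes power-of-two tile sizes so the split is pure bit slicing:
    high-order bits pick the tile, low-order bits pick the pixel in the cache.
    """
    if tile_w & (tile_w - 1) or tile_h & (tile_h - 1):
        raise ValueError("tile dimensions must be powers of two")
    shift_x = tile_w.bit_length() - 1            # log2(tile_w)
    shift_y = tile_h.bit_length() - 1            # log2(tile_h)
    tile_addr = (x >> shift_x, y >> shift_y)     # MSBs: which tile
    bit_addr = (y & (tile_h - 1)) * tile_w + (x & (tile_w - 1))  # LSBs: pixel in tile
    return tile_addr, bit_addr


print(split_address(13, 6, 4, 4))    # ((3, 1), 9)
print(split_address(13, 6, 16, 1))   # ((0, 6), 13)
```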

Implementation

Pixel cache:
- Single 48-pin integrated circuit (IC)
- 1.6-micron CMOS
- 3,700 gates
- 16.6 MHz

Frame buffer:
- 256K-bit VRAM
- 4x4 or 16x1 tiles

Main Components

Data cache: stores pixel intensity data from the scan converter
- data port is bidirectional for fast reads and writes

Source register: holds the current tile being written to the frame buffer
- frame buffer writes are overlapped with data cache input

Replacement rule register/logic: used for Boolean operations on the frame buffer
- data from the scan converter can be combined with old frame buffer data as it is written back to the frame buffer
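
To make the replacement rule concrete, the sketch below combines a cached tile (source) with the existing frame buffer tile (destination) using a bitwise rule. The rule names and the per-pixel "written" mask are hypothetical choices for illustration; the paper's actual rule encoding is not reproduced here.

```python
from typing import Callable, Dict, List

# Hypothetical rule table: each rule combines a source value (scan converter
# data) with a destination value (existing frame buffer data), bitwise.
RULES: Dict[str, Callable[[int, int], int]] = {
    "replace": lambda src, dst: src,
    "and": lambda src, dst: src & dst,
    "or": lambda src, dst: src | dst,
    "xor": lambda src, dst: src ^ dst,
}


def write_back(src: List[int], dst: List[int], written: List[bool], rule: str) -> List[int]:
    """Tile to write back: the rule is applied where the scan converter wrote a
    pixel; untouched pixels keep their old frame buffer value."""
    op = RULES[rule]
    return [op(s, d) if w else d for s, d, w in zip(src, dst, written)]


print(write_back([0b1100, 0b1010, 0, 0], [0b1010, 0b1010, 7, 7],
                 [True, True, False, False], "xor"))   # [6, 0, 7, 7]
```

XOR is the classic example: drawing the same pixels twice restores the original image.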

Components – cont.

Destination register: holds the existing contents of the frame buffer
- used by the replacement rule logic to perform Boolean operations on incoming pixel data and committed frame buffer data
- operation can be overlapped with data cache writes and source register transfers

Pattern register: holds frame buffer data to be blended with scan converter data
- similar to the destination register
- useful for generating repeating patterns

Z cache and Z pipeline register: used to buffer depth information
- equivalent to the data cache and source register, but for z data
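
A rough sketch of how a repeating pattern might be applied to a tile. Two details are assumptions, not taken from the paper: the pattern is combined with the scan converter data using a bitwise AND, and the pattern is indexed by screen position modulo its size, so it repeats seamlessly across tiles.

```python
from typing import List


def apply_pattern(tile: List[int], x0: int, y0: int, tile_w: int,
                  pattern: List[List[int]]) -> List[int]:
    """AND a row-major tile whose top-left pixel is (x0, y0) with a small
    pattern that repeats across the whole screen."""
    ph, pw = len(pattern), len(pattern[0])
    out = []
    for i, value in enumerate(tile):
        x, y = x0 + i % tile_w, y0 + i // tile_w     # screen position of pixel i
        out.append(value & pattern[y % ph][x % pw])   # pattern wraps every pw x ph
    return out


checker = [[0xFF, 0x00],
           [0x00, 0xFF]]
print(apply_pattern([0x7F] * 4, 0, 0, 4, checker))   # [127, 0, 127, 0]
print(apply_pattern([0x7F] * 4, 0, 1, 4, checker))   # [0, 127, 0, 127]
```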

Tile Organization

Larger tiles give higher performance
- more pixel updates per frame buffer memory cycle
- but they increase the size and cost of the pixel cache

The number of pixels updated is a function of the tile organization and the operation performed
- randomly oriented vectors favor square tiles
- horizontal vectors favor a linear horizontal tile structure
- See Figures 4 and 5

No silver bullet
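
The trade-off can be illustrated by counting tile writes for two 32-pixel lines under each organization. This sketch assumes a single-tile cache that flushes whenever the next pixel leaves the current tile, and uses standard Bresenham rasterization; the function names are illustrative.

```python
from typing import Iterable, Iterator, Tuple


def bresenham(x0: int, y0: int, x1: int, y1: int) -> Iterator[Tuple[int, int]]:
    """Integer line rasterization (all octants), in drawing order."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        yield x0, y0
        if x0 == x1 and y0 == y1:
            return
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def tile_writes(points: Iterable[Tuple[int, int]], tile_w: int, tile_h: int) -> int:
    """Frame buffer tile writes if the cache flushes whenever the next pixel
    falls outside the current tile."""
    writes, current = 0, None
    for x, y in points:
        tile = (x // tile_w, y // tile_h)
        if tile != current:
            writes += 1
            current = tile
    return writes


for name, line in [("horizontal", (0, 0, 31, 0)), ("diagonal", (0, 0, 31, 31))]:
    for tw, th in [(4, 4), (16, 1)]:
        n = tile_writes(bresenham(*line), tw, th)
        print(f"{name:10s} {tw}x{th}: {n} tile writes for 32 pixels")
```

The horizontal line needs 8 writes with 4x4 tiles but only 2 with 16x1 tiles, while the diagonal needs 8 with 4x4 tiles and 32 with 16x1 tiles, which is why neither organization wins everywhere.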

Z-Buffer

Requires storing the z value for each displayed pixel
- both the pixel intensity and the z value are either updated or left alone

Idea: read several z values in parallel, and overlap the compare and update with writing previously updated values to memory
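
The per-tile logic can be sketched as follows. This sequential version only shows the compare-and-update step on a tile of z values read together; the overlap with writing the previous tile back happens in hardware and is not modeled. The "smaller z is closer" convention and the use of None for uncovered pixels are assumptions.

```python
from typing import List, Optional, Tuple


def z_test_tile(colors: List[int], depths: List[float],
                frag_colors: List[int], frag_depths: List[Optional[float]]
                ) -> Tuple[List[int], List[float]]:
    """Depth-test one tile of fragments against z values read in parallel.

    Assumes smaller z is closer; None marks pixels with no fragment. Color and
    z are updated together or both left alone. Inputs are not modified.
    """
    new_colors, new_depths = list(colors), list(depths)
    for i, (c, z) in enumerate(zip(frag_colors, frag_depths)):
        if z is not None and z < new_depths[i]:
            new_colors[i], new_depths[i] = c, z
    return new_colors, new_depths


print(z_test_tile([0, 0, 0, 0], [1.0, 1.0, 1.0, 1.0],
                  [5, 6, 7, 8], [0.5, None, 2.0, 0.25]))
# ([5, 0, 0, 8], [0.5, 1.0, 1.0, 0.25])
```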

Requirements

Take advantage of unused video RAM
- 1280x1024 with 256K VRAM = 728x1024 free

Reconfigurable frame buffer
- 8 to 32 planes in multiples of 8 planes

Allow z-buffering on all configurations

Off-screen frame buffer memory should not limit z resolution

Z-Buffer Cache

See the paper.

Performance

Dependent upon:
- the rate at which pixels are stored in the cache
- the cache hit rate
- the time to write a tile to the frame buffer

Line drawing (random 30-pixel vectors, 4x4 tiles):
- 9 Mpixels/sec, or 300,000 vectors/sec, with the cache
- 1/3 that performance without the cache

Polygons (random 30x30-pixel squares, 16x1 tiles):
- 15 Mpixels/sec, or 16,000 polygons/sec, with the cache
- 1/5 that performance without the cache
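
A quick consistency check on these figures (plain arithmetic on the numbers above, no new data):

$$300{,}000\ \tfrac{\text{vectors}}{\text{s}} \times 30\ \tfrac{\text{pixels}}{\text{vector}} = 9 \times 10^{6}\ \tfrac{\text{pixels}}{\text{s}}$$

$$16{,}000\ \tfrac{\text{polygons}}{\text{s}} \times (30 \times 30)\ \tfrac{\text{pixels}}{\text{polygon}} = 1.44 \times 10^{7} \approx 15 \times 10^{6}\ \tfrac{\text{pixels}}{\text{s}}$$

The same arithmetic puts both uncached rates at roughly 3 Mpixels/sec (9/3 and 15/5).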

“Aliasing and Anti-Aliasing”, Moller and Haines

Anti-aliasing is the process of removing visual artifacts, more specifically “jaggies”

Need sampling and filtering:
- rendering an image is a sampling task
- texels need to be resampled to give good results in texture mapping
- a sequence of images for animation needs to be sampled at uniform time intervals

Sampling

We want to represent information (signals) digitally to reduce the amount of information
- note: too little information can cause aliasing

To reconstruct the original signal, the sampling frequency needs to be more than 2x the maximum frequency of the sampled signal (Nyquist)
- this assumes the signal is bandlimited

In 3D graphics, the signals captured by point samples (e.g., polygon edges) are not bandlimited, BUT textures are!
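
A small numeric illustration of the Nyquist limit: a cosine with 9 cycles per unit, sampled 10 times per unit (well below the required rate of more than 18), produces exactly the same samples as a 1-cycle cosine, so the high frequency masquerades as a low one. The function name and the chosen frequencies are illustrative.

```python
import math


def alias_frequency(f: float, fs: float) -> float:
    """Apparent frequency of a sinusoid of frequency f after sampling at rate fs."""
    return abs(f - fs * round(f / fs))


fs = 10.0                                                          # samples per unit time
high = [math.cos(2 * math.pi * 9 * n / fs) for n in range(10)]     # 9 > fs / 2
low = [math.cos(2 * math.pi * 1 * n / fs) for n in range(10)]      # 1 < fs / 2
print(alias_frequency(9, fs))                                      # 1.0
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, low)))  # True
```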

Reconstruction

How do we recreate the original signal?

Box filter (nearest neighbor)
- worst filter to use (discontinuous)
- but simple

Tent filter (aka triangle filter)
- better than the box filter (continuous)

Ideal lowpass filter (sinc filter)
- perfect reconstruction
- but impractical: the filter has infinite width and takes negative values
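
The three filters can be compared on the same unit-spaced samples by summing each sample times the kernel centered on it. A minimal 1D sketch; note that the sum runs only over the samples available, so the sinc filter here is necessarily truncated, which is exactly the impracticality mentioned above.

```python
import math
from typing import Callable, Sequence


def box(x: float) -> float:
    """Nearest neighbor: discontinuous at +/- 0.5."""
    return 1.0 if -0.5 <= x < 0.5 else 0.0


def tent(x: float) -> float:
    """Triangle filter: continuous; linear interpolation between samples."""
    return max(0.0, 1.0 - abs(x))


def sinc(x: float) -> float:
    """Ideal lowpass kernel: infinite support with negative lobes."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples: Sequence[float], t: float,
                kernel: Callable[[float], float]) -> float:
    """Evaluate the reconstructed signal at t from samples at t = 0, 1, 2, ..."""
    return sum(s * kernel(t - i) for i, s in enumerate(samples))


samples = [0.0, 2.0, 4.0]
print(reconstruct(samples, 0.5, box))    # 2.0  (snaps to a neighbor)
print(reconstruct(samples, 0.5, tent))   # 1.0  (halfway between 0 and 2)
```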

Resampling

Used to magnify or minify a sampled signal

Magnification is the simpler case
- we have a continuous signal (remember the sinc filter)
- just resample at the desired intervals

Minification
- the original signal contains frequencies too high for the new, lower sampling rate, so resampling it directly causes aliasing
- need to refilter the signal (see the paper)
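
A 1D sketch of both cases. Magnification evaluates a tent (linear) reconstruction at finer spacing; minification by an integer factor is shown with and without a simple box prefilter. The box average stands in for the refiltering step and is a simplifying assumption, not the specific filter from the reading.

```python
from typing import List, Sequence


def magnify_linear(samples: Sequence[float], factor: int) -> List[float]:
    """Resample a tent reconstruction at 1/factor spacing (needs >= 2 samples)."""
    out = []
    for j in range((len(samples) - 1) * factor + 1):
        t = j / factor
        i = min(int(t), len(samples) - 2)    # left neighbor, clamped at the end
        frac = t - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out


def decimate_naive(samples: Sequence[float], factor: int) -> List[float]:
    """Keep every factor-th sample with no prefilter: aliases."""
    return list(samples[::factor])


def decimate_box(samples: Sequence[float], factor: int) -> List[float]:
    """Average each group of factor samples (box prefilter), then keep one."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]


stripes = [1, 0] * 4                 # highest frequency the samples can hold
print(decimate_naive(stripes, 2))    # [1, 1, 1, 1]  stripes turn solid white
print(decimate_box(stripes, 2))      # [0.5, 0.5, 0.5, 0.5]  correct average gray
print(magnify_linear([0.0, 2.0], 2)) # [0.0, 1.0, 2.0]
```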

Screen-Based Anti-Aliasing

Screen-based anti-aliasing has no knowledge of the objects being rendered

General strategy: use a sampling pattern for the screen, then weight and sum the samples to produce a pixel color
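
One way to write this general strategy as a formula (notation chosen here for illustration): with $n$ samples per pixel, $c(i, x, y)$ the color of sample $i$ for pixel $(x, y)$, and $w_i$ its weight,

$$p(x, y) = \sum_{i=1}^{n} w_i \, c(i, x, y), \qquad \sum_{i=1}^{n} w_i = 1.$$

The schemes that follow differ mainly in where the samples are placed and how the weights $w_i$ are chosen.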

Supersampling – FSAA

Supersampling: an anti-aliasing algorithm that takes more than one sample per pixel

Full-scene anti-aliasing (FSAA):
- renders the scene at a higher resolution
- averages neighboring samples to create the image
- common in consumer hardware
- costly but simple
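
The averaging step of FSAA amounts to a box filter over each block of high-resolution samples. A minimal sketch, assuming an integer supersampling factor and images stored as lists of rows:

```python
from typing import List


def downsample(image: List[List[float]], factor: int) -> List[List[float]]:
    """Box-average each factor x factor block of a supersampled image."""
    h, w = len(image) // factor, len(image[0]) // factor
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            block = [image[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A diagonal edge rendered at 2x resolution in each direction (4x4 samples).
hi_res = [[1, 1, 1, 0],
          [1, 1, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 0]]
print(downsample(hi_res, 2))   # [[1.0, 0.25], [0.25, 0.0]]
```

The partially covered pixels come out as intermediate values instead of a hard stair step, at the cost of rendering four times as many samples.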

Supersampling – Accumulation Buffer

Accumulation buffer: a buffer with the same resolution as the desired image but more bits of color
- the view is moved half a pixel in the screen x- or y-direction as needed; the images are summed in the accumulation buffer and, after rendering, averaged
- part of OpenGL
- costly for real-time rendering
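
A software sketch of the accumulation loop: render the whole scene once per half-pixel view offset, sum into a float buffer (standing in for the extra color bits), then divide by the number of passes. The `render_edge` scene, a single vertical edge sampled once per pixel, is a hypothetical stand-in for a real renderer.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]


def render_edge(width: int, height: int, dx: float, dy: float, edge_x: float) -> Image:
    """Hypothetical one-sample-per-pixel renderer: a pixel is lit (1.0) if its
    sample point, shifted by dx from the pixel corner, lies left of edge_x.
    The edge is vertical, so dy has no effect here."""
    return [[1.0 if x + dx < edge_x else 0.0 for x in range(width)]
            for _ in range(height)]


def accumulate(width: int, height: int, offsets: Sequence[Tuple[float, float]],
               render: Callable[[float, float], Image]) -> Image:
    """Sum one full render per view offset into a float buffer, then average."""
    acc = [[0.0] * width for _ in range(height)]
    for dx, dy in offsets:
        image = render(dx, dy)
        for y in range(height):
            for x in range(width):
                acc[y][x] += image[y][x]
    n = len(offsets)
    return [[v / n for v in row] for row in acc]


offsets = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]   # half-pixel shifts
result = accumulate(3, 1, offsets, lambda dx, dy: render_edge(3, 1, dx, dy, 1.25))
print(result)   # [[1.0, 0.5, 0.0]]
```

Four full rendering passes per frame is what makes this approach costly for real-time use.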

Supersampling – T-buffer

T-buffer: a variant of the accumulation buffer
- $2^n$ image and z-buffers used for rendering
- some logic determines which buffer gets what data and how the buffers are combined (averaged)
- data can be sent to all buffers simultaneously
- screen offsets (x and y) can be set per buffer

Benefits for anti-aliasing?
- work can be done in parallel
- no programming needed to support anti-aliasing (single pass)

Multisampling – “A-buffer”

A-buffer:
- computes a polygon's approximate coverage of each grid cell
- takes more than one sample per pixel in a single pass
- shares computations among the samples for a grid cell

Commonly used in software to generate high-quality renderings, but not in real time

Focused on edge anti-aliasing and transparency
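
Coverage is typically stored as a small bitmask with one bit per subpixel cell. A minimal sketch, assuming a 4x4 mask, testing each cell's center, and representing the polygon by a single half-plane `inside(x, y)` predicate (a simplification for illustration):

```python
from typing import Callable


def coverage_mask(px: int, py: int, inside: Callable[[float, float], bool],
                  n: int = 4) -> int:
    """n x n subpixel coverage bitmask for pixel (px, py); bit j*n + i is set
    when the center of subcell (i, j) is inside the polygon."""
    mask = 0
    for j in range(n):
        for i in range(n):
            sx = px + (i + 0.5) / n
            sy = py + (j + 0.5) / n
            if inside(sx, sy):
                mask |= 1 << (j * n + i)
    return mask


def coverage(mask: int, n: int = 4) -> float:
    """Fraction of the pixel covered: set bits / total subcells."""
    return bin(mask).count("1") / (n * n)


left_half = lambda x, y: x < 0.5           # hypothetical edge through the pixel
m = coverage_mask(0, 0, left_half)
print(hex(m), coverage(m))                 # 0x3333 0.5
```

Because coverage is a bitmask, masks from different fragments in the same pixel can be combined with bitwise operations.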

Limitations

Size of the coverage mask
- even at 8x8, aliasing is still visible

Box filter often used for simplicity
- the worst filter

Gaussian Filters

Benefit: allows samples to affect more than one pixel

Approximation of the sinc function with limits (removes the infinite width and the negative values)

Generically referred to as Gaussian filters because they are based on the Gaussian bell-curve equation
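
A sketch of discrete Gaussian filter weights: truncating at a chosen radius gives the finite width, and the weights are never negative. The radius and sigma values are arbitrary choices for illustration.

```python
import math
from typing import List


def gaussian_weights(radius: int, sigma: float) -> List[float]:
    """Normalized 1-D Gaussian weights for sample offsets -radius..radius."""
    w = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


w = gaussian_weights(2, 1.0)
print(len(w), round(sum(w), 6))   # 5 1.0
```

Since the Gaussian is separable, a 2D weight can be formed as the product w[i] * w[j] of two 1D weights.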

Quincunx (NVIDIA)

Real-time anti-aliasing scheme with samples that affect more than one pixel

“Quincunx” means an arrangement of five objects, four of them in a square and the fifth in the center

Pattern approximates a 2D tent filter

Uses a weighted average of the samples:
- center sample: 1/2 weight
- corner samples: 1/8 weight each

Superior to FSAA but can introduce error
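
A sketch of the resolve step using the weights above. The storage layout is an assumption for illustration: one center sample per pixel, plus a grid of corner samples one larger in each dimension, so each corner is shared by up to four neighboring pixels.

```python
from typing import List


def quincunx_resolve(centers: List[List[float]], corners: List[List[float]]) -> List[List[float]]:
    """centers[y][x]: center sample of pixel (x, y), weight 1/2.
    corners: (H+1) x (W+1) grid of corner samples, weight 1/8 each."""
    h, w = len(centers), len(centers[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            corner_sum = (corners[y][x] + corners[y][x + 1]
                          + corners[y + 1][x] + corners[y + 1][x + 1])
            row.append(0.5 * centers[y][x] + 0.125 * corner_sum)
        out.append(row)
    return out


print(quincunx_resolve([[1.0]], [[0.0, 0.0], [0.0, 0.0]]))   # [[0.5]]
```

With corners shared, this layout stores about two samples per pixel while each pixel still draws on five. The sharing is also why a sample can affect more than one pixel, which can blur fine detail.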

So far so good…

Not every object on the screen can be perfectly sampled
- e.g., arbitrarily small objects

Regular sampling patterns will always exhibit some form of aliasing

Solution? Distribute samples randomly over a pixel and use a different sampling pattern per pixel

Stochastic Sampling

Why does it work? Randomization tends to replace repetitive aliasing with noise, which our visual system is more tolerant of

Jittering: the most common stochastic sampling method
- assume N samples per pixel
- divide the pixel area into N regions of equal area
- place a sample randomly in each region
- the final pixel color is computed by averaging the samples

Notable: 3Dlabs’ SuperScene anti-aliasing hardware scheme uses jittering
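
The jittering steps map directly onto a few lines of code. This sketch assumes N = k*k and a square k x k grid of equal-area regions (one valid partition among many). Passing a `random.Random` instance makes results reproducible, and using a fresh draw per pixel gives each pixel its own pattern.

```python
import random
from typing import Callable, List, Tuple


def jittered_samples(k: int, rng: random.Random) -> List[Tuple[float, float]]:
    """N = k*k positions in the unit pixel: one uniform random point per cell."""
    return [((i + rng.random()) / k, (j + rng.random()) / k)
            for j in range(k) for i in range(k)]


def shade_pixel(shade: Callable[[float, float], float], px: int, py: int,
                k: int, rng: random.Random) -> float:
    """Average shade() over jittered samples of pixel (px, py)."""
    pts = jittered_samples(k, rng)
    return sum(shade(px + sx, py + sy) for sx, sy in pts) / len(pts)


rng = random.Random(1)
pts = jittered_samples(4, rng)
print(len(pts))   # 16
```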

Other Sampling

Interleaved sampling
- ATI SMOOTHVISION
- AT&T Pixel Machines
- SGI VGX

Poisson disk sampling
- pattern in which nonuniformly distributed points are separated by a minimum distance

Molnar’s scheme
- adaptive refinement
- useful in interactive applications
- sampling rate is kept low while the scene changes and is increased as the scene becomes static
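
A Poisson disk pattern can be generated by simple dart throwing: draw random candidates and keep only those at least a minimum distance r from every point already accepted. This O(n^2) version is for illustration only; the radius, attempt count, and seed are arbitrary.

```python
import math
import random
from typing import List, Tuple


def poisson_disk(r: float, attempts: int, rng: random.Random) -> List[Tuple[float, float]]:
    """Dart-throwing Poisson disk sampling in the unit square."""
    points: List[Tuple[float, float]] = []
    for _ in range(attempts):
        p = (rng.random(), rng.random())
        if all(math.dist(p, q) >= r for q in points):   # reject darts too close
            points.append(p)
    return points


pts = poisson_disk(0.2, 500, random.Random(42))
print(len(pts))
```

The points are random and therefore nonuniform, yet no two lie closer than r, which avoids both the regularity of a grid and the clumping of purely random sampling.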