An Economic High Performance Pixel Memory with 3-D Solids Capability

F.R. Belch*

Abstract

For economic raster graphics equipment capable of displaying high resolution dynamic pictures of 3-D solids, 3-D wire frames and 2-D wire frames (in decreasing order of stringency), the design of the pixel memory system is a key issue because of the high pixel writing speeds required. This paper discusses some new architectural possibilities for such a memory system.

To give some idea of the scale of the problem, present investigations indicate that pixel generation rates of up to 25 MHz (40 ns per pixel) may be possible from an economic tiling circuit. However, with contemporary dynamic RAMs a memory read-modify-write cycle, including computation on the data read from the memory (for example masking in the new pixel data or z-depth-buffer arithmetic), may take 400 ns, i.e. a speed difference of about 10:1 exists.

The matching of these differing data rates demands a departure from traditional pixel memory architecture. Here a possible philosophy of approach is discussed together with the results of a simulation study. The proposed architecture is able to keep costs, bulk and power consumption down by using cheap, industry standard, high density memory components. Its use is additionally shown to be compatible with the generation of dynamic views of solids using depth buffer methods.

1. Introduction

High performance raster graphic displays are expensive, one reason being the expense of the pixel memory system. This comes from the need to use large numbers of costly, low density, high performance static RAMs in order to obtain the fast pixel read and write times needed for high pixel processing rates. These devices draw high currents and so result in substantial bulk and power consumption.

*Ferranti Computer Systems Ltd., Wythenshawe Division, Simonsway, Wythenshawe, Manchester M22 5LA, England

A prime objective of this study has been to investigate the use of relatively cheap, much higher density but much slower dynamic RAMs to keep costs, bulk and power consumption down, and yet still obtain a high performance architecture.

It is noteworthy that the more recent video dynamic RAMs, whilst addressing the problems of high resolution screen support with simultaneous memory update, still have the same basic read-modify-write access times for random pixel writing as the conventional dynamic RAMs from which they are derived, and do not directly provide a solution to the problem.

There has been a desire to avoid specially developed logic-enhanced memory devices of the kind others have used, for example Fuchs et al. [1], on the grounds of development cost, timescale and unit cost. Only industry standard memory devices have been contemplated.

2. Controller Architecture

Figure 1 outlines a generic display controller for generating dynamic 3-D or 2-D pictures. Equipments vary in functionality; the paper discusses pixel memory system design only, and the diagram is just to indicate a possible operational context. The drawing or tiling circuits write pixels into the memory system, bear on its design, and deserve mention.

They are a set of similar circuits able together to perform multi-dimensional linear interpolation. When drawing simple 2-D vectors, scan conversion interpolates integer-rounded y for integer x, or vice versa, depending on whether the vector angle is less or greater than 45 degrees. Depth cued vectors with z-proportional endpoint intensities need z-proportional interpolation too.
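The interpolation described above can be sketched in software. This is an illustrative sketch only, not the tiling hardware: the function name `interpolate_vector` and the round-half-up rule are assumptions, and only the case within 45 degrees of the horizontal (integer x steps) is handled.

```python
def interpolate_vector(x0, y0, z0, x1, y1, z1):
    """Return (x, y, z) for every integer x between the endpoints.

    y and z are linearly interpolated and rounded to integers, as a
    software stand-in for two parallel interpolation circuits.
    """
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    if abs(dy) > abs(dx):
        raise ValueError("steeper than 45 degrees: step in y instead")
    n = abs(dx)
    if n == 0:
        return [(x0, y0, z0)]
    step = 1 if dx > 0 else -1
    pixels = []
    for i in range(n + 1):
        # Integer arithmetic: floor(i * d / n + 1/2), i.e. round half up.
        y = y0 + (2 * i * dy + n) // (2 * n)
        z = z0 + (2 * i * dz + n) // (2 * n)
        pixels.append((x0 + i * step, y, z))
    return pixels


for pixel in interpolate_vector(0, 0, 0, 4, 2, 8):
    print(pixel)
```

Each further interpolated quantity (intensity, colour components) would add one more line of the same form, mirroring the way further circuits are added in hardware.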

Triangular polygon fill of 2-D solid areas uses hatching vectors with endpoint values established by interpolation along two edges. 3-D trihedra are triangular polygons with vertex z interpolation added to give, in effect, depth cued hatching vectors.

Monochrome intensities at trihedron vertices for Gouraud shading [2] add a further hatching vector interpolation circuit, multiplied by 3 for colour Gouraud or monochrome Phong shading [3], and by 3 again for colour Phong shading.

North-Holland Computer Graphics Forum 6 (1987) 13-25

[Figure 1: block diagram running from the host interface through the supervisory processor, display file, drawing or tiling circuits and pixel memory system to the video circuits, with keyboard, data tablet and hard copy peripherals.]

Figure 1. Outline of generic display controller

The greater the number of quantities to be interpolated, the greater the number of circuits. It is hoped to present a future paper on this topic.

Summarising, drawing or tiling circuits generate and deliver to the memory system a stream of pixels at up to 25 MHz, i.e. 40 ns per pixel. Each pixel's data comprises, in the case of 3-D or 2-D wire frame or 2-D solid area applications, a set of x, y and pixel colour values or, in the case of 3-D solids requiring hidden surface removal, a set of x, y, z and pixel colour values. All values are binary integers. Colour and z are destined to be written into multiplane pixel memories at locations defined by x and y.

Figure 2 shows a memory configuration to give dynamic pictures for the first type of application, where z data is absent. The address look-up circuit converts a screen x,y co-ordinate into a pixel memory word and bit address. Prior to this stage the x,y co-ordinate is independent of pixel memory module organisation, allowing variation of this provided that look-up arrangements are modified. Further discussion of address look-up is deferred.

The principle of operation is that one memory module is written with a picture and then used to support the screen whilst the second is reset to an initial state prior to receiving a new picture resulting from file retraversal. In this configuration video RAMs are not required. Flicker is never present since one module always supports the screen whilst the other receives the results of new file traversal. This takes place at rates up to frame rate, i.e. down to 16 millisecond intervals. More static applications need only one memory module, generally using video RAMs so that screen support and picture redraw or edit may take place simultaneously.

Figure 3 shows a maximum configuration to give dynamic pictures of 3-D solids using 3 memory modules. These are variants of the same design differing only in the number of planes mounted and in characterisation links. Each module typically provides at least 12 planes of memory. Modules 1 and 2 alternately provide screen support as before; additionally they share the use of a common z-depth-buffer. This is reset to an initial state at the completion of the generation of a new view, and subsequently used with the pixel memory receiving the next view, to which it is able to send a write inhibit signal. The pixel data paths to modules 1 and 2 include conceptual equal delays D, comparable to a pixel clock period, linkable in or out on the modules themselves.

[Figure 2: address look-up circuit feeding memory modules 1 and 2, which alternate with each other for screen support.]

Figure 2. Memory system for 2D and 3D wire frame

[Figure 3: tiling circuit and address look-up circuit feeding pixel memory modules 1 and 2 through delays D, with a common z-depth-buffer supplying a write inhibit signal to the modules.]

Figure 3. Memory system for solids

Z-depth-buffer operation is along traditional lines described in Newman and Sproull [4] and Foley and Van Dam [5]. The buffer is initialised to the maximum z value, and operation when a pixel is received is shown by the figure 4 flowchart. These operations are carried out by the memory module itself at the full pixel rate. Figure 4 gives basic hidden surface removal, but the module provides more sophisticated arithmetic operations, as described by Shen [6], to allow for the additional possibilities of a contour view (z = a specified value enables pixel write) or sectioning (a z limit is imposed). The reason for the digital delays in the pixel data paths of memory modules 1 and 2 is to allow for earlier reading of the z-depth-buffer and the deduction of the write enable/disable signal before either module 1 or 2 is written to.

Is z of current pixel < z value in depth buffer?

Yes: Enable pixel write into pixel memory system 1 or 2. Write z of current pixel into z-depth-buffer.

No: Disable pixel write into pixel memory system 1 or 2.

Figure 4. Z-depth-buffer computation
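The figure 4 decision can be expressed as a short software sketch. It is illustrative only: the names `make_buffers` and `write_pixel` and the 16 bit depth range are assumptions, and the contour and sectioning variants are not covered.

```python
Z_MAX = 0xFFFF  # assumed depth range; the paper does not state the z width


def make_buffers(width, height, background=0):
    """Depth buffer initialised to the maximum z value, plus a colour memory."""
    depth = [[Z_MAX] * width for _ in range(height)]
    colour = [[background] * width for _ in range(height)]
    return depth, colour


def write_pixel(depth, colour, x, y, z, c):
    """Apply the figure 4 test; return True if the pixel write was enabled."""
    if z < depth[y][x]:
        colour[y][x] = c   # enable pixel write into the pixel memory
        depth[y][x] = z    # write z of current pixel into the depth buffer
        return True
    return False           # pixel write disabled (write inhibit)


depth, colour = make_buffers(4, 4)
print(write_pixel(depth, colour, 1, 2, z=100, c=7))  # nearer than initial z
print(write_pixel(depth, colour, 1, 2, z=200, c=9))  # hidden behind z = 100
```

In the hardware the test result is available before the pixel memory is written, which is the purpose of the delays D in the data paths to modules 1 and 2.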

3. Pixel Memory Architecture

3.1. Traditional Pixel Memory Organisation

The pixel memory architecture is new. To understand its operation a traditional organisation is first considered, so that the contrasting aspects can be appreciated.

[Figure 5: multiple pixel memory planes, with the direction of screen spot scan marked.]

Figure 5. Organisation of multiple pixel memory planes

Figure 6. Detail of a single pixel memory plane

Such an organisation for a multiplane pixel memory is shown in figure 5. Numbers of bits per word on typical systems are 8, 16, 20 or 32. Here a 20 bit wide word memory is shown. Further detail of bit organisation within a word is shown in figure 6. Historical reasons for word organised pixel memories are:

a) to interface to word/byte read/write operations of CPUs and VLSI graphics controllers.

b) to allow several bits representing consecutive pixels along the line of screen spot scan to be read simultaneously from memory, to overcome the limiting factor of memory chip access time.

c) once a number of memory chips of given size have been assembled to form a plane of given size they implicitly define word size.

For serial pixel data of the type discussed, the first reason for having a word organised memory disappears because the interface is at bit level. The second reason remains, since even if video RAMs are used, some degree of organised parallel access is needed to match video and memory access rates.

[Figure 7: 20 memory chips sharing the address(es) of location(s), together giving a 20 bit wide memory word read or write.]

Figure 7. Memory chip organisation

For double buffered pixel memories the case for the use of video RAMs is less strong than for single memory systems, since a memory is never called upon to simultaneously support a picture and receive picture update data, but only one or the other. This may be significant in keeping down system physical size and cost, since video RAM is more expensive than conventional dynamic RAM for the same storage.

Figure 7 shows access circuitry for a single plane of memory using 20 chips. When a word is accessed the same address is simultaneously applied to all chips, each originating one bit, so that a 20 bit word is obtained. The time taken to read, write or modify a word is given by the read, write or read-modify-write memory chip access times. A typical chip contains 64k, 256k or 1024k bits, so the 20 chip array contains this number of 20 bit words. Usually, word addresses are consecutive along a screen line, and follow in sequence for subsequent lines.
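The conventional mapping from screen position to word and bit can be written down directly. This is a minimal sketch under stated assumptions: the function name `conventional_address` is hypothetical, and the 1280 pixel line width is borrowed from the resolution discussed later in the paper.

```python
WORD_BITS = 20  # one bit per memory chip in the 20 chip plane


def conventional_address(x, y, line_width=1280):
    """Return (word_address, bit) for a conventionally organised plane.

    Word addresses run consecutively along a screen line and continue
    in sequence on the next line; the bit number selects the chip.
    """
    words_per_line = line_width // WORD_BITS
    return y * words_per_line + x // WORD_BITS, x % WORD_BITS


# Every pixel of a vertical vector lands in the same chip (same bit number).
print({conventional_address(100, y)[1] for y in range(50)})
```

The single bit number returned for a whole vertical vector is the reason vertical writing is limited by the chip cycle time in this organisation.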

3.2. New Pixel Memory Architecture

Figure 8 shows a pixel memory module using the new architecture with conventional RAMs. Its organisation is contrasted with that of figure 7. Pixel data flows in from the tiling circuit through the input buffer on the left hand side. When the memory is being used for screen support the pixel data flows out through the output buffer on the right hand side. If video RAMs are used the circuit differs marginally.

Unlike the previous arrangement with all chips cycling at once, each chip or set of chips (for a multi-plane memory) associated with a word bit is cycled individually by its own memory control circuit, individual cycles being out of phase with each other by multiples of the pixel generation period. Here, each circuit is implemented using a custom gate array. With varying economics, discrete components such as PALs may be used.

All gate arrays contain identical logic and their basic operation is as follows. Each memory controller is capable of executing a memory cycle independently. With a 25 MHz pixel generation rate from the tiler, a read-modify-write cycle including computation with contemporary dynamic RAMs typically takes 10 pixel generation clocks or 400 ns. The system is run synchronously with respect to the pixel generation rate from the tiler, i.e. with a 25 MHz clock.

[Figure 8: input buffer carrying bit address, word address and z or colour data to the memory control circuits, with an output buffer for screen support.]

Figure 8. Block diagram of a pixel memory module

A controller recognises on the bus the word bit address of a pixel residing within a chip set under its control. If the chip set is not being cycled then the controller staticises the word address and the z or colour data from the bus, a memory cycle is initiated and the controller becomes busy. If a chip set is already being cycled when the next pixel is presented, then a busy signal is returned to the tiling circuit, which stops generating pixels until the chip set concerned is ready to receive new data.

This means that there is a large degree of overlap between a particular memory cycle completing and the processing of other pixels not concerned with that chip set.
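The accept-or-busy behaviour of one controller can be modelled in a few lines. This is a sketch under assumptions, not the gate array logic: the class name `ChipController`, the method `offer` and the use of an absolute clock count are all hypothetical.

```python
class ChipController:
    """Toy model of one memory control circuit and its chip set."""

    def __init__(self, busy_clocks=10):
        self.busy_clocks = busy_clocks  # read-modify-write time in pixel clocks
        self.free_at = 0                # clock at which the chip set is idle again
        self.latched = None             # last (word address, data) staticised

    def offer(self, now, word_address, data):
        """Present a pixel at clock `now`; return False to signal busy."""
        if now < self.free_at:
            return False                # tiler must stall and retry
        self.latched = (word_address, data)
        self.free_at = now + self.busy_clocks
        return True


ctrl = ChipController()
print(ctrl.offer(0, word_address=5, data=1))   # accepted, cycle starts
print(ctrl.offer(3, word_address=6, data=2))   # same chip set still cycling
print(ctrl.offer(10, word_address=6, data=2))  # accepted once the cycle ends
```

Pixels addressed to other controllers during clocks 1 to 9 would be accepted without delay, which is the overlap the text describes.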

As well as the basic logic described each gate array contains:

a) logic for masking in pixels to provide plane enable/disable facilities.

b) logic for the z-depth-buffer operations of figure 4 concerned with hidden surface removal, contouring and sectioning.

c) logic for memory scanning for screen support.

d) logic for accepting broadcast pixels sent to all devices simultaneously.

e) logic for carrying out simple write, rather than read-modify-write, cycles for high speed memory preset.

f) logic for accepting parametric data along the same buses which are normally used for pixel data, for example defining the area of memory to be scanned for screen support, and the screen resolution.

g) logic for generating the delay required in the pixel memory modules with reference to the common depth buffer.

A detailed consideration of the logic contained within each controlling gate array lies outside the scope of the present discussion.

3.3. New Pixel Memory Organisation

If the memory is conventionally organised as in figure 6 then, although horizontal writing speed is high, performance falls off rapidly with increasing angle until in the vertical direction it drops to a rate determined by the read-modify-write cycle time. To improve performance more sophisticated bit mapping arrangements are used, as in figure 9, which shows which memory chip out of 1-20 contains the bit at each screen position. Although shown for 20 bits the same principles can be applied to any word length.

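 1  2  3  4  5  6 ... 15 16 17 18 19 20
17 18 19 20  1  2 ... 11 12 13 14 15 16
13 14 15 16 17 18 ...  7  8  9 10 11 12
 9 10 11 12 13 14 ...  3  4  5  6  7  8
 5  6  7  8  9 10 ... 19 20  1  2  3  4
 1  2  3  4  5  6 ... 15 16 17 18 19 20

Figure 9. Alternative pixel memory plane organisation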

The first line of words has exactly the same arrangement as in figure 7. In the second line of words all bits in each word are circularly shifted to the left by 4 places. For the second screen line, the 20 consecutive bits numbered 1-20 are no longer held in the 20 corresponding memory chips numbered 1-20; instead, bit 1 in any 20 bit group is now held in chip 17, bit 2 in chip 18, bit 3 in chip 19 and bit 4 in chip 20. Because the shift of each word is cyclic, bit 5 of any 20 bit group is now held in chip 1 and so on up to bit 20, which is held in chip 16.

With the third line of words, all bits in each word are circularly shifted to the left by 8 places. Similar considerations apply here as to the second line, except that bit 1 in any 20 bit group is now held in chip 13 and so on up to bit 8, which is held in chip 20. Then bit 9 is held in chip 1 and so on up to bit 20, which is held in chip 12.

This pattern is repeated every 5 lines.
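The mapping described above reduces to one line of modular arithmetic. The sketch below is illustrative: the function name `chip_for_pixel` and its parameter names are assumptions, with x and y counted from 0 and chips numbered 1-20 as in the text.

```python
def chip_for_pixel(x, y, chips=20, shift=4, repeat=5):
    """Chip (1-based) holding the pixel at screen position (x, y).

    The bit position within the word is x mod chips; each successive
    line of the repeat moves the same bit down by `shift` chips, cyclically.
    """
    bit = x % chips
    line = y % repeat
    return (bit - shift * line) % chips + 1


# Second screen line (y = 1): bits 1-4 are held in chips 17-20.
print([chip_for_pixel(x, 1) for x in range(4)])
# Third screen line (y = 2): bit 1 is held in chip 13.
print(chip_for_pixel(0, 2))
```

With y = 5 the function returns the same chips as for y = 0, matching the 5 line repeat.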

3.4. Result of Using Pattern

From figure 9, the results of using this pattern are seen. The same memory chip is not involved with every pixel along a vertical vector as was the case previously; no fewer than 5 different chips are cycled. This gives about a 5:1 improvement in vertical speed over the previous arrangement. If the read-modify-write time is 400 ns, then a vertical pixel write rate of about 400/5 = 80 ns per pixel is achievable. The horizontal write rate remains as high as it was previously, because 20 different chips are still cycled, more than enough to keep up with the tiler output rate.

For vector drawing in other directions speed varies in ways which are not obvious. For the pattern shown the poorest writing rate occurs at 14 degrees to the horizontal.

Here is a rectangular 4 x 5 patch taken at random from memory.

14 15 16 17
10 11 12 13
 6  7  8  9
 2  3  4  5
18 19 20  1

Every chip is represented, but none appears more than once, a characteristic of this type of pattern. If the current pixel write position of a vector lies on a patch corner with the vector heading into the patch then, irrespective of vector direction, different independently cyclable chips are encountered and write rate is enhanced. This has been termed an 'enfiladed' memory since the design intention is that, whatever the vector direction, the maximum number of different chips are cyclically encountered. Due to area coherence the architecture is also substantially effective for character and symbol data.
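The patch property can be checked exhaustively in software. This is a small sketch, assuming the 20 chip, 4 place shift, 5 line repeat pattern and a hypothetical helper `chip_at` with 0-based x and y.

```python
def chip_at(x, y):
    """Chip (1-20) for position (x, y) in the 5 line repeat pattern."""
    return (x - 4 * (y % 5)) % 20 + 1


def patch_chips(x0, y0, width=4, height=5):
    """Chips met in the width x height patch whose corner is (x0, y0)."""
    return [chip_at(x0 + dx, y0 + dy)
            for dy in range(height) for dx in range(width)]


# Every 4 x 5 patch, wherever it is placed, uses all 20 chips exactly once.
all_once = all(sorted(patch_chips(x0, y0)) == list(range(1, 21))
               for x0 in range(40) for y0 in range(10))
print(all_once)
```

Because the pattern repeats every 20 pixels horizontally and every 5 lines vertically, checking corners over one such period is enough to cover every placement.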

Figure 10 shows process timing for a vector drawn close to the vertical (76 degrees). Because of overlapping of memory cycles the vector is drawn at maximum pixel rate even though the chip read-modify-write cycle takes 10 times as long. This is true for most other directions.

There are many pattern variants and an objective of the study has been to experiment with these using different numbers of chips and arrangements to find the most efficient ones to use. Complete freedom of choice is not possible since complicated patterns make word serialisation for screen support difficult.

4. Simulation Program

To study the effects of different patterns and collect statistics on minimum, maximum and average pixel rates a simulation program has been written.

For specified patterns the program performs calculations on fans of vectors as shown in figure 11 of a given constant radius (in terms of pixels actually written) from a given x,y co-ordinate start point, starting angle and finishing angle, with given number of degrees step. Vector draw times are calculated by evaluating individual pixel write times using routines simulating the hardware for the pattern being investigated.

[Figure 11: fan of vectors of variable radius drawn between a start angle and a finish angle about a centre with variable x,y co-ordinates.]

Figure 11. Fan of vectors used for simulation

All parameters are variable to determine the characteristics and effectiveness of a given pattern.

Bresenham's algorithm [7] is used to calculate pixel co-ordinates since this is characteristic of the generating hardware in practice.

An array of busy timers is held in the program, one for each memory chip. When a chip becomes active the busy timer location is set to a value equal to the chip delay, and as simulation time passes this is counted down. In practice, to make the program run more quickly, simulation time is advanced in a single step by the minimum outstanding delay and all timers are decremented by that amount.
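A much reduced version of such a simulation is sketched below. It is not the author's program: the function names are hypothetical, each chip records the clock at which it becomes free rather than counting a timer down, and the tiler is assumed to present one pixel per clock unless stalled.

```python
def bresenham(x0, y0, x1, y1):
    """Integer line from (x0, y0) to (x1, y1), both endpoints included."""
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x1 >= x0 else -1
    sy = 1 if y1 >= y0 else -1
    err = dx - dy
    x, y = x0, y0
    points = []
    while True:
        points.append((x, y))
        if x == x1 and y == y1:
            return points
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x += sx
        if e2 < dx:
            err += dx
            y += sy


def conventional(x, y):
    return x % 20 + 1                      # chip depends on x only


def shifted(x, y):
    return (x - 4 * (y % 5)) % 20 + 1      # 5 line repeat, 4 place shift


def draw_time(points, chip_of, chips=20, busy=10):
    """Clock periods needed to write all points, stalling on busy chips."""
    free_at = [0] * (chips + 1)            # per-chip clock at which it is idle
    clock = 0
    for x, y in points:
        chip = chip_of(x, y)
        start = max(clock, free_at[chip])  # wait if this chip is still cycling
        free_at[chip] = start + busy
        clock = start + 1                  # next pixel is offered a clock later
    return clock


vertical = bresenham(0, 0, 0, 99)
for name, pattern in (("conventional", conventional), ("shifted", shifted)):
    print(name, draw_time(vertical, pattern) / len(vertical))
```

For a 100 pixel vertical vector this simple model gives 9.91 clocks per pixel for the conventional pattern and 1.95 for the shifted one; other angles and radii can be explored by changing the endpoints passed to `bresenham`.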

[Figure 10: timing diagram in clock periods showing successive gate arrays (GA) being cycled in turn, each performing a 10 clock read-modify-write followed by a 4 clock wait.]

Figure 10. Overlapping memory cycles (76 degrees vector)

5. Discussion of Measurements

5.1. Comparison Between Conventional and Shifted Pattern

All time measurements are in tiler pixel generation clock periods. In practice these may be of the order of 40 ns. Maximum and average write times/pixel are calculated.

[Figure 12: polar plots for the conventional and shifted chip patterns; marked values include 2.48 (at 14 degrees), 1.95, 2.44 and 10.0.]

Figure 12. Polar plot of pixel write time vs direction

For vectors of constant length drawn in different directions the number of pixels actually written using Bresenham's and similar algorithms is proportional to cos(theta) in each 45 degree octant. At 45 degrees, for example, only 0.707 as many pixels need to be written in comparison with a horizontal or vertical vector. If pixel write rate is constant, then linear write rate increases as 45 degrees is approached, being 1.414 times as fast at this angle. Line density is undesirably reduced. For 2-D diagrams the linear write rate, reflected in the time/radius unit, may be easier to use than the pixel write rate since draw time estimation is possible using average linear write rate. However, traditionally equipment manufacturers have always specified pixel write rate and this convention is maintained here although it is a generally conservative measure.
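The cos(theta) relationship is easy to tabulate. The sketch below is illustrative only; `relative_pixel_count` is a hypothetical helper that measures theta from the nearest axis, so it folds every direction into the first octant.

```python
import math


def relative_pixel_count(angle_deg):
    """Pixels written per unit of vector length at the given direction.

    A Bresenham-style line writes one pixel per step along its major
    axis, so the count is the larger of |cos| and |sin| of the angle.
    """
    a = math.radians(angle_deg)
    return max(abs(math.cos(a)), abs(math.sin(a)))


for angle in (0, 14, 30, 45, 76, 90):
    pixels = relative_pixel_count(angle)
    # At a constant pixel write rate the linear write rate is the reciprocal.
    print(angle, round(pixels, 3), round(1 / pixels, 3))
```

The 45 degree row reproduces the 0.707 and 1.414 figures quoted above.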

The polar plots of figure 12 compare write times for all directions firstly for 20 bit wide memory using a conventional pattern but individually cycling chips and then using the shifted pattern with a 5 line repeat as discussed previously. A chip busy time of 10 clocks is assumed.

Although the conventional pattern performs well for vectors within 45 degrees of the horizontal, write time increases rapidly beyond this, up to the chip busy time of 10 clock periods in the vertical direction.

The plot for the shifted pattern has a somewhat spiky appearance with maxima occurring at unexpected angles. It exhibits 180 degree symmetry, as one would expect with vectors drawn in opposite directions at the same rate. The greatest maximum (2.48 clocks at 14 degrees) is a factor of 4.03 better than the worst with the conventional pattern. The vertical time of 1.95 is a factor of 5.13 better.

The average pixel write time is proportional to the square root of the area enclosed by each plot. 3.61 for the conventional pattern is quite a respectable figure, being about 144 ns per pixel, but 1.36 for the shifted pattern is a factor of 2.65 better at 54 ns per pixel.
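The ratios and nanosecond figures quoted above follow from the measured clock counts and the 40 ns clock period. The short check below simply redoes that arithmetic; the variable names are illustrative.

```python
CLOCK_NS = 40             # one pixel generation clock at 25 MHz

conventional_worst = 10.0  # vertical, conventional pattern (clocks)
shifted_worst = 2.48       # at 14 degrees, shifted pattern
shifted_vertical = 1.95
conventional_avg = 3.61
shifted_avg = 1.36

print(round(conventional_worst / shifted_worst, 2))     # worst case ratio
print(round(conventional_worst / shifted_vertical, 2))  # vertical ratio
print(round(conventional_avg / shifted_avg, 2))         # average ratio
print(round(conventional_avg * CLOCK_NS), round(shifted_avg * CLOCK_NS))
```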

To put the comparisons into context, if independent chip cycling were not used, and 20 bit words containing single pixels were always written (a characteristic of many VLSI graphics controller chips), then the plot would be a circle of radius equal to the chip busy time, in this case 10 clock periods, giving a constant write time of 400 ns per pixel.

[Figure 13: plot of pixel write time against chip busy time in clock periods, with curves for the maximum (at 14 degrees) and the average.]

Figure 13. Pixel write time vs chip busy time

Drawing time is insensitive to vector x,y origin, a fact which is not apparent from the plots.

5.2. Timing Variation With Chip Busy Time

Figure 13 shows timing variation with chip busy time; this includes the memory chip read-modify-write cycle time plus computation time on the data accessed. All timings are for the 5 line repeat pattern as before. Maximum and average values are plotted.

For a 5:1 variation in chip busy time the average time/pixel varies by only 1.79:1, although the maximum time/pixel changes more (3.72:1), reflecting an increased variance. This is undesirable since pictures may be encountered having a predominance of vectors in worst case directions.

Contemporary dynamic RAMs yield operation in the 7-10 clock periods region, giving average times/pixel of 1.15-1.36 clock periods or 46-54 ns.

5.3. Timing Variation With Chip Numbers And Pattern

The figures in this section are for different chip numbers and pattern from the one so far discussed. A 10 clock period chip busy time is always assumed.

5.3.1. Sixteen Chips in a 4 X 4 Arrangement.

This uses 16 chips as may be required for a lower resolution display, using a 4 line repeat, and a 4 bit x shift on successive lines.

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
 5  6  7  8  9 10 11 12 13 14 15 16  1  2  3  4
 9 10 11 12 13 14 15 16  1  2  3  4  5  6  7  8
13 14 15 16  1  2  3  4  5  6  7  8  9 10 11 12

Average time/pixel 1.48
Maximum time/pixel 2.48 (at 14 degrees)

5.3.2. Twenty Chips in a 4 Line Repeat.

This is a variation of the 20 chip pattern using a 4 line repeat and a 5 bit x shift on successive lines. Although these figures are similar, the 5 line repeat pattern gives better vertical vector performance (1.95 cf 2.44), which in view of applications mentioned later makes it more worthwhile.

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 6  7  8  9 10 11 12 13 14 15 16 17 18 19 20  1  2  3  4  5
11 12 13 14 15 16 17 18 19 20  1  2  3  4  5  6  7  8  9 10
16 17 18 19 20  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15


5.3.3. Thirty Two Chips in an 8 Line Repeat.

This is a 32 chip pattern as may be required for a higher resolution display, using an 8 line repeat and a 4 bit x shift on successive lines.

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 ... 32
 5  6  7  8  9 10 11 12 13 14 15 ... 32  1  2  3  4
 9 10 11 12 13 14 15 ... 32  1  2  3  4  5  6  7  8
13 14 15 ... 32  1  2  3  4  5  6  7  8  9 10 11 12
17 ... 32  1  2  3  4  5  6  7  8  9 10 11 12 ... 16
21 22 23 24 25 26 27 28 29 30 31 32  1  2 ... 19 20
25 26 27 28 29 30 31 32  1  2  3  4  5 ... 22 23 24
29 30 31 32  1  2  3  4  5  6  7  8 ... 25 26 27 28
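The three patterns of this section share one rule and differ only in their parameters, which the following sketch makes explicit. The function name `pattern_row` and its parameter names are assumptions; rows are counted from 0.

```python
def pattern_row(line, chips, shift, repeat):
    """Chip numbers (1-based) along one word on the given screen line."""
    offset = (shift * (line % repeat)) % chips
    return [(bit + offset) % chips + 1 for bit in range(chips)]


# (chips, shift per line, lines per repeat) for sections 5.3.1 to 5.3.3.
for chips, shift, repeat in ((16, 4, 4), (20, 5, 4), (32, 4, 8)):
    first_chips = [pattern_row(line, chips, shift, repeat)[0]
                   for line in range(repeat)]
    print(chips, first_chips)
```

In the same notation the 5 line repeat pattern discussed earlier corresponds to `chips=20, shift=-4, repeat=5`, since there each successive line begins 4 chips lower (1, 17, 13, 9, 5).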


The performance improvements obtained by increasing the number of chips from 16 upward are not dramatic. From 16 to 20 the average time/pixel improves from 1.48 to 1.36 clocks, about 8% speed improvement for a 25% increase in the number of chips. From 20 to 32 the time improves from 1.36 to 1.19, about 13% improvement for a 60% increase in the number of chips.

However, the number of chips to be used is most likely to be governed by memory size and shape and chip size, rather than performance. If the required resolution is 1280 X 1024 and 64k bit chips are to be used then the use of 20 bit words is largely forced.

5.4. Short Vector Timing Variation With Radius

Figure 14 shows, for short vectors (up to 16 pixels long), the variation of average write time with radius, using 20 chips and the 5 line repeat pattern.

Very short vectors (up to 5 pixels long) are written at the highest possible rate because of about 4 pixels' worth of chip buffering present in the pattern even at the worst case angle (14 degrees).

At 5, 9 and 15 pixels chip buffering runs out and performance lapses until a further block of chips becomes available.

The short vector average of 1.16 remains generally lower than the long vector average. Worst case angle figures show similar characteristics.


[Figure 14: plot of pixel write time against vector radius in pixels, for radii 1 to 16.]

Figure 14. Pixel write time vs short vector radius

5.5. Consecutive Short Vectors

Consecutive short vector performance is less constrained than might appear. Again a rectangular 4 x 5 patch is taken at random from memory.

19 20  1  2
15 16 17 18
11 12 13 14
 7  8  9 10
 3  4  5  6

If a short vector terminates on a pixel near the patch centre, the worst case is for the next vector to retrace it or take a path angularly close to it. A chip busy delay is incurred whilst a recently cycled chip becomes non-busy. All other directions lead to the cycling of different chips. An important requirement of vector drawing is that it should not redraw common endpoint pixels shared by consecutive vectors.

6. Criteria for Pattern Selection

6.1. Importance of High Horizontal Rate

Optimal horizontal writing rate may be important for polyhedral fill if the tiler fills solid areas with horizontally generated data. In simple 2-D systems, provision for simultaneous writing of several pixels in a word, i.e. a word or partial word write, is useful to improve filling speed. This is less useful for 3-D solids because consecutive horizontal pixels are by no means certain to have the same colour/intensity value if Gouraud or Phong shading is active. Z data suffers from similar problems.

6.2. Horizontal and Vertical Rates

In many process control and network monitoring applications high horizontal and vertical rates are important since there may be a predominance of vectors in these directions.

7. Address Look-up Circuit

This converts a screen x,y co-ordinate into a memory word and bit address. With patterns of the type described, bit addresses are cycled by a constant dependent on the shift on successive lines. The circuit design depends on the pattern used.

Figure 15 is a circuit suitable for memory dimensions of 1280 x 1024 using 20 bit words with a 4 bit cyclic shift on successive lines, up to 5 lines.

The word address is formed by dividing the x co-ordinate by 20 to form the least significant 6 bits, which are then concatenated with the 10 bit y co-ordinate. Economic arithmetic logic is too slow, so the division by 20 is carried out by PROM look-up using the 9 most significant bits of x (the least significant 2 bits are not required).

[Figure 15: the 10 bits of y form the most significant part and a PROM output the 6 least significant bits of the 16 bit word address; a 512 x 3 PROM and the 2 least significant bits of x form the 5 bit uncycled bit address.]

Figure 15. Address look-up circuit detail

The uncycled bit address, given by the remainder of the x co-ordinate divided by 20, can be obtained by concatenating the least significant 2 bits of the x co-ordinate with a 3 bit divide by 5 remainder operation, carried out again by PROM look-up from the 9 most significant bits of x.

Line by line cycling is applied to the 5 bit result by forming the 3 bit remainder of y/5, by applying the 10 bit y co-ordinate to a 1k x 3 PROM look-up table. The 3 bit output, indicating which line of 5 is involved, is then combined with the 5 bit uncycled bit address to give an 8 bit value applied to a 256 x 5 PROM look-up table from which the cycled bit address is obtained.
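The look-up chain can be mimicked with Python lists standing in for the PROMs. This is a sketch under assumptions: the table names, the ordering of the 8 bit index into the last table and the 0-based bit numbering are hypothetical, and the cycling direction follows the figure 9 pattern.

```python
# PROM contents, each indexed exactly as a hardware table would be.
DIV20_PROM = [m // 5 for m in range(512)]    # x // 20 from the 9 m.s. bits of x
MOD5_X_PROM = [m % 5 for m in range(512)]    # 512 x 3: (x // 4) mod 5
MOD5_Y_PROM = [y % 5 for y in range(1024)]   # 1k x 3: line within the repeat

CYCLE_PROM = [0] * 256                       # 256 x 5: cycled bit address
for line in range(5):
    for bit in range(20):
        # Assumed index layout: 3 bit line number above the 5 bit address.
        CYCLE_PROM[(line << 5) | bit] = (bit - 4 * line) % 20


def look_up(x, y):
    """Return (16 bit word address, cycled bit address counted from 0)."""
    msb9 = x >> 2                                     # 9 m.s. bits of 11 bit x
    word = (y << 6) | DIV20_PROM[msb9]                # 10 bit y, 6 bit x // 20
    uncycled = (MOD5_X_PROM[msb9] << 2) | (x & 3)     # equals x mod 20
    bit = CYCLE_PROM[(MOD5_Y_PROM[y] << 5) | uncycled]
    return word, bit


print(look_up(0, 0), look_up(0, 1), look_up(1279, 1023))
```

The concatenation works because x mod 20 equals 4 times ((x div 4) mod 5) plus the two least significant bits of x, so no arithmetic beyond table look-up is needed.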

Address look-up for 8, 16 or 32 rather than 20 bit words is trivial and the circuit virtually disappears.

8. Pixel Serialisation for Screen Support

Pixel serialisation for screen support needs explanation. Using conventional rather than video RAMs, the data paths for writing pixel data into memory can also largely be used to read data out for screen support. In a double buffered system a memory module is not called upon to write pixels and simultaneously support the screen.

The gate arrays provide a natural serialisation system. Figure 16 adds detail to figure 8, showing an organisation for partial serialisation for a 12 plane system using 20 bit wide words. The figure shows bit and word addresses buffered and distributed along a common highway to all arrays, whilst z or colour data is buffered and distributed along 4 separate highways having output buffers for serialised data at the far end. Arrays are then connected in ascending order cyclically to the 4 highways, which then correspond to groups of 4 consecutive bits cyclically shifted on successive display lines.

Figure 16. Pixel memory module pixel serialisation detail

In display support mode, memory scanning logic within each successive group of 4 arrays initiates a memory read cycle and outputs data for 4 consecutive pixels onto the 4 highways. This subsequently passes through the output buffers to the final high speed ECL serialisation stages. A daisy chain connecting arrays is used to signal that a group must start its cycle. At the beginning of a line the group beginning the sequence corresponds to the cyclic shift introduced for that line. This implies that the arrays contain 1-5 line counters and take appropriate action on the count value.

To support a 60 hz non-interlaced 1280 X 1024 display requires a pixel rate of about 100 Mhz. To generate this, groups of 4 pixels are produced at 25 Mhz or every 40 nanosecods. There are 5 groups of 4 gate arrays so that the cycle for any single group of arrays completes in 5 X 40 = 200 nanoseconds. At present this read cycle time is achievable by 100 nanosecond selection 64k X 4 bit dynamic RAM’S.
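The timing budget in this paragraph can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are hypothetical, and the remark that the gap between the active pixel rate and 100 MHz is taken up by blanking is an assumption, since the text only quotes the rounded figure.

```python
def group_period_ns(pixel_rate_mhz: float = 100, pixels_per_group: int = 4) -> float:
    """Time between successive 4-pixel groups on the highways, in nanoseconds."""
    return pixels_per_group * 1000 / pixel_rate_mhz


def read_cycle_budget_ns(groups: int = 5, pixel_rate_mhz: float = 100,
                         pixels_per_group: int = 4) -> float:
    """Time available to one group of arrays before it must deliver data again."""
    return groups * group_period_ns(pixel_rate_mhz, pixels_per_group)


def active_pixel_rate_mhz(width: int = 1280, height: int = 1024,
                          refresh_hz: int = 60) -> float:
    """Rate counting visible pixels only.

    ASSUMPTION: the quoted 'about 100 MHz' also covers blanking intervals.
    """
    return width * height * refresh_hz / 1e6


print(group_period_ns())        # 40.0 ns per group of 4 pixels (25 MHz)
print(read_cycle_budget_ns())   # 200.0 ns read cycle per group of arrays
print(active_pixel_rate_mhz())  # visible-pixel rate in MHz, below the quoted 100 MHz
```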

With video RAM’s a slightly different circuit is needed and longer chip cycle times may be used, but the principles remain largely the same.

9. System Performance Estimation

9.1. Variable Resolution

Since the number of screen pixels is proportional to the square of the linear resolution, a reduction of resolution by some factor improves picture write time performance by the square of that factor.
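A small sketch of this scaling follows. The maximum resolution of 1280 × 1024 is taken from the text; the two reduced resolutions are illustrative choices rather than figures confirmed by the paper, and the function names are hypothetical.

```python
MAX_WIDTH, MAX_HEIGHT = 1280, 1024   # maximum resolution quoted in the text


def pixel_fraction(width: int, height: int) -> float:
    """Number of screen pixels as a fraction of the maximum."""
    return (width * height) / (MAX_WIDTH * MAX_HEIGHT)


def performance_multiplier(width: int, height: int) -> float:
    """Write-time improvement: the inverse of the pixel fraction."""
    return 1.0 / pixel_fraction(width, height)


# Illustrative reduced resolutions (assumed, same aspect ratio as the maximum).
for w, h in [(960, 768), (640, 512)]:
    print(w, h, f"{pixel_fraction(w, h):.0%}", f"{performance_multiplier(w, h):.1f}:1")
```

Halving the linear resolution quarters the pixel count, so the same tiler and memory can write a complete picture in a quarter of the time.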

Figure 17 shows use of the same memory module at reduced resolutions. The screen origin is located in the lower left hand corner of the memory, which always retains the same ‘shape’ irrespective of the selected resolution. The same address translation circuit can therefore be used at every resolution; there is just a restriction on the maximum x,y co-ordinate value. The figure gives resolution examples and numbers of screen pixels expressed as a percentage of the maximum, together with the inverse of this, the performance multiplier. Resolution parameters can be conveyed to arrays along the pixel bus as previously mentioned. Externally a change in scanning frequency and monitor is necessary.

Figure 17. Organisation of memory for variable resolution (examples shown include 737,280 pixels, 56% of maximum, about 1.8:1, and 25% of maximum, 4:1)

9.2. Memory Preset

Prior to writing new picture data the memory region used must be preset. This region varies from a window up to the full screen. Because preset sets a whole block of memory to the same value, it can use higher write rates than the general pixel write mechanism. Preset time is subtracted from the total available for picture generation whilst maintaining a given picture repetition rate.

To improve write time all arrays are cycled simultaneously, since all write the same data. With dynamic RAM’s the briefer write cycle, rather than the read-modify-write cycle, or page mode or nibble mode cycles can be used.

Comparative figures for 100 nanosecond access dynamic RAM’s are [8]:

Normal write cycle: 200 nanoseconds
Page mode write cycle: 110 nanoseconds
Nibble mode write cycle: 50 nanoseconds

At present one manufacturer [9] produces video RAM’s with a fast preset, presetting 256 RAM locations in a single cycle and reducing the preset time to negligible proportions. It is a matter of weighing whether the increased cost and real estate are worth the performance improvement.

9.3. Preset Time

Page mode write cycles used with a 20 bit word yield 110/20 = 5.5 nanoseconds per pixel. For a display of 1280 × 1024 this gives a whole screen preset time of 7.2 milliseconds.

Expressed as a percentage of 40 milliseconds for 25 pictures/second this is 18%, giving some indication of the performance reduction caused by preset.
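The preset figures can be reproduced as follows. This is a sketch of the arithmetic only, with hypothetical function names; the normal and nibble mode results are extrapolations using the same formula with the quoted cycle times, not values stated in the text.

```python
def preset_time_ms(cycle_ns: float, word_bits: int = 20,
                   width: int = 1280, height: int = 1024) -> float:
    """Whole-screen preset time when each write cycle presets one full word."""
    ns_per_pixel = cycle_ns / word_bits          # e.g. 110 / 20 = 5.5 ns per pixel
    return width * height * ns_per_pixel / 1e6


def preset_share(preset_ms: float, frame_ms: float = 40.0) -> float:
    """Fraction of the picture period consumed by preset (40 ms = 25 pictures/second)."""
    return preset_ms / frame_ms


page = preset_time_ms(110)
print(f"page mode:   {page:.1f} ms, {preset_share(page):.0%} of a 40 ms picture period")

# Extrapolated with the same formula (not quoted in the text):
print(f"normal write: {preset_time_ms(200):.1f} ms")
print(f"nibble mode:  {preset_time_ms(50):.1f} ms")
```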

This may be a worst case figure, since software awareness that dynamic parts of the picture occupy certain screen regions allows presetting of these alone, saving time. On systems without depth buffers this saving may be offset if, for the painter’s algorithm, multiple overlapping pages need presetting, so that the same screen area is preset more than once.


10. Full Screen Write Time

The time to write every screen pixel with a distinct colour is at best (assuming 46 ns average write time):

1280 × 1024 × 46 ns = 60.3 milliseconds

60.3 milliseconds + 7.2 milliseconds preset = 67.5 milliseconds, corresponding to approximately 15 new pictures/second.

It is possible to estimate the percentage of screen writable at 25 Hz, i.e. allowing 40 milliseconds:

Draw time available after screen preset = 40 - 7.2 = 32.8 milliseconds. Percentage of screen = 32.8/60.3 = 54%.

For solids there will be a tendency to write front and rear surfaces once each so the number of pictures/second and percentage of screen writable will tend to be half of the above figures namely 7 and 27% respectively.

These figures assume the tiler is constantly active generating pixels, which will not happen in practice. They do, however, assume maximum resolution.
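The estimates in this section follow from three inputs quoted in the text: 46 ns average write time, 7.2 milliseconds preset, and a 40 millisecond picture period. The sketch below reproduces them; the function names are hypothetical, and halving for solids simply mirrors the rule of thumb given above.

```python
PIXELS = 1280 * 1024
WRITE_NS = 46        # average write time per pixel assumed in the text
PRESET_MS = 7.2      # whole-screen preset time from section 9.3
FRAME_MS = 40.0      # picture period at 25 pictures/second


def full_screen_write_ms(pixels: int = PIXELS, write_ns: float = WRITE_NS) -> float:
    """Time to write every screen pixel once."""
    return pixels * write_ns / 1e6


def pictures_per_second(write_ms: float, preset_ms: float = PRESET_MS) -> float:
    """Picture rate when each picture needs one preset plus one full write."""
    return 1000.0 / (write_ms + preset_ms)


def screen_fraction(write_ms: float, frame_ms: float = FRAME_MS,
                    preset_ms: float = PRESET_MS) -> float:
    """Fraction of the screen writable in the time left after preset."""
    return (frame_ms - preset_ms) / write_ms


write_ms = full_screen_write_ms()
rate = pictures_per_second(write_ms)
fraction = screen_fraction(write_ms)
print(f"full write {write_ms:.1f} ms, {rate:.1f} pictures/s, {fraction:.0%} of screen at 25 Hz")
# For solids, front and rear surfaces are each written once, so both figures halve.
print(f"solids: {rate / 2:.1f} pictures/s, {fraction / 2:.0%} of screen")
```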

11. Cost Implications

Until recently the development and unit costs of the memory control gate array would have been prohibitive. However, development cost for a design on a CAD system such as DAISY [10] has reached an acceptable level, and for moderate quantities unit cost is now of the same order as that of the memory chips themselves. Alternatively, the use of discrete logic and/or PAL’s offers a feasible solution for simpler 2-D and 3-D wire frame systems with limited numbers of planes, where as few as five or six 20 pin PAL’s may be sufficient for each memory control circuit.

References

1. H. Fuchs, J. Goldfeather, J.P. Hultquist, S. Spach, J.D. Austin, F.P. Brooks, J.G. Eyles, and J. Poulton, “Fast spheres, shadows, textures, transparencies, and image enhancements in pixel-planes,” Computer Graphics 19(3), pp. 111-120 (July, 1985).

2. H. Gouraud, “Computer display of curved surfaces,” NTIS AD-762 018, University of Utah, Department of Computer Science (June, 1971).

3. Phong Bui-Tong, “Illumination for computer-generated pictures,” CACM 18(6), pp. 311-317 (June, 1975).

4. W.M. Newman and R.F. Sproull, Principles of interactive computer graphics, McGraw-Hill (1979).

5. J.D. Foley and A. van Dam, Fundamentals of interactive computer graphics, Addison-Wesley (1982).

6. Tsu Y Shen, Three dimensional display system, U.S. Patent No. 4475104 (October, 1984).

7. J.E. Bresenham, “Algorithm for computer control of a digital plotter,” IBM Syst. J. 4(1), pp. 25-30 (1965).

8. MOS, Memory data book, Texas Instruments Inc.

9. Am90C644, CMOS dual memory array, Advanced Micro Devices Inc., Sunnyvale CA.

10. Daisy Systems Corp., Sunnyvale CA.
