

Digital Video

Introduction to Digital Video

Video is the electronic signal or data that, when rendered through a display, appears as moving images. Video is captured by a video camera, whose sophistication ranges from those used in a studio to the cameras in a mobile phone. The principle of operation is the same: an image is captured as light from the scene passes through a lens and is brought into focus on the imaging device.

Typically the imaging device is made up of thousands of individual sensors, each capturing one pixel of the image. To capture colour images, the light hitting the sensor is split into Red, Green and Blue (RGB) using suitable filters, with each component having its own sensor, i.e. each pixel effectively has three sensors. Any colour can be represented as a combination of red, green and blue. When the sensors are exposed to light from the scene, by opening a shutter, the intensity is recorded as an analogue electronic voltage. The amount of light can be adjusted by changing the aperture size (how big the hole is that lets light in from the outside world) and/or the shutter speed (how long the hole is open for). In low light conditions a large aperture and/or slow shutter speed will be necessary. The image, captured as an analogue voltage with RGB values for each pixel, is converted to a digital one by digitising the analogue voltages for each pixel using an analogue-to-digital converter, typically with eight bits per value.

Up to this point the basic process is the same for a (digital) still camera and a video camera. The big difference comes because the video camera has to take tens of pictures per second to capture anything moving in the scene, so that it will look like natural movement when played back on the display device. For television, 25 pictures per second (30 in the US system) has traditionally been used, which is similar to the 24 used in cinema. Some low-end devices use 12 pictures per second. When filming fast-moving scenes, such as sports, rates up to 60 pictures per second may be used.

The video in standard definition television using the PAL system, adopted in the UK and much of Europe, has a resolution of 720 x 576 pixels, matching the resolution of the original analogue system. Given this, the video bit-rate being generated by the camera is:

720 x 576 pixels x 3 colours x 8 bits x 25 pictures per second = 248.8 Mbit/s

That's without any audio or the extra bits we'll need to indicate the beginning and end of each picture.

On a camera we need to store the video using whatever media is available: tape, DVD, memory card, hard drive. As an example, at the bit-rate above a 16 GByte memory card would be full in about eight and a half minutes; no feature-length movies here.
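The arithmetic is easy to check; here is a short Python sketch reproducing the figures used in the text:

    # Raw (uncompressed) PAL SD bit-rate, as in the calculation above.
    width, height = 720, 576          # pixels per picture
    colours, bits = 3, 8              # R, G, B samples at 8 bits each
    pictures_per_second = 25

    bitrate = width * height * colours * bits * pictures_per_second
    print(f"{bitrate / 1e6:.1f} Mbit/s")                  # 248.8 Mbit/s

    # Time to fill a 16 GByte memory card at that rate.
    card_bits = 16e9 * 8
    print(f"{card_bits / bitrate / 60:.1f} minutes")      # about 8.6 minutes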

What if we want to communicate the video, as television for example? We need only note that our TV transmission systems, such as terrestrial, satellite and cable, are limited to 50 Mbit/s at most (assuming we only need to carry one programme) to see that we have a problem. What about transmitting it using broadband? Even with an exceptionally fast connection we still won't have enough bits per second to carry our video.

Of course we know that video is transmitted via satellite, terrestrial and broadband very successfully, so what happens to enable this? At first glance we can see that reducing the resolution (that is, reducing the number of pixels) will lower the bit-rate (half the resolution => half the bit-rate). We could also reduce the picture rate (e.g. use 10 frames per second instead of 25), or we could reduce the number of bits used in our analogue-to-digital converter. However, all of these would affect what we see on the screen and how we perceive the video quality. We need to reduce the bit-rate somehow without us noticing when we come to display the video.

Fortunately there are techniques whose effects we notice far less in the viewed video. They rely on how we perceive the video image.


The first trick is to notice that we are less able to see detail in colour than in brightness. If we have a series of lines where just the colour changes between them (no brightness change), the lines need to be further apart than if they are black and white (just a brightness change) in order to see distinct lines rather than a continuous colour (or grey). In more technical terms, we can see greater resolution in luminance (brightness) than we can in colour. To make use of this in our treatment of video, the RGB values can be transformed into YCRCB, or Luma, Chroma Red and Chroma Blue*. The Luma value carries the black-and-whiteness, or brightness (luminance), and the Chroma values carry the colour information (this is just a transformation of the values; we can always transform them back, and we do when it comes to displaying the video). The advantage of this transformation is that we can now treat the Luma and Chroma values separately. We find that, provided we keep the same number of pixels carrying the brightness (the Luma values), we can have fewer pixels carrying the coloured (Chroma) part. Typically we can get away with half as many chroma pixels as luma, both horizontally and vertically, and not notice the difference when we convert back to the full set of RGB values that a display requires. This way we can halve the number of bits representing each picture, and therefore the bit-rate, without really noticing.
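A sketch of the idea in Python (the ITU-R BT.601 coefficients used here are one standard RGB-to-luma/chroma transform; the random frame is purely for counting samples):

    import numpy as np

    h, w = 576, 720
    rgb = np.random.randint(0, 256, (h, w, 3)).astype(float)

    # Transform RGB into luma (Y) and two chroma components (Cr, Cb).
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cr = 128 + 0.713 * (r - y)
    cb = 128 + 0.564 * (b - y)

    # Keep every luma pixel, but only one chroma sample per 2x2 block
    # (half the chroma resolution horizontally and vertically).
    cr_sub, cb_sub = cr[::2, ::2], cb[::2, ::2]

    before = 3 * h * w
    after  = h * w + cr_sub.size + cb_sub.size
    print(f"{after / before:.2f} of the original samples")   # 0.50

Since each chroma plane shrinks to a quarter of its size, the total sample count falls from 3 per pixel to 1.5, giving exactly the halving described above.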

Another step we can take is to lose spatial detail in each picture that we don't really notice. It turns out that to resolve fine detail we need high contrast, so fine detail that doesn't have large luma or chroma changes can be lost without us noticing. This is the principle of JPEG still-image compression, which is also used in video. Teasing out the fine detail involves transforming our pixel values into a form that identifies it. Fortunately there is a mathematical tool that does just this: the Discrete Cosine Transform. Once transformed, approximations are made to the values representing fine detail, which often turn out to be zero and can then be coded very efficiently. How close the approximations are is set when the degree of compression, or video quality, is chosen. We can typically halve the number of bits per picture, and hence the bit-rate, using these techniques without noticeable degradation in picture quality.
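A sketch of the transform-and-approximate step on a single 8 x 8 block, using SciPy's DCT (the block values and the single quantisation step are invented; real codecs use a full quantisation matrix per block):

    import numpy as np
    from scipy.fft import dctn, idctn

    # An arbitrary 8x8 block of pixel values; real encoders tile each
    # picture into such blocks.
    block = np.linspace(50, 120, 64).reshape(8, 8)

    # The 2-D Discrete Cosine Transform gathers coarse detail in the
    # top-left coefficients and fine detail towards the bottom-right.
    coeffs = dctn(block - 128, norm="ortho")

    # Approximate: divide by a step size and round. A coarser step means
    # more zero coefficients, fewer bits, and lower picture quality.
    step = 16.0
    quantised = np.round(coeffs / step)
    print(np.count_nonzero(quantised == 0), "of 64 coefficients are zero")

    # The decoder rescales and inverts the transform to recover an
    # approximation of the original block.
    approx = idctn(quantised * step, norm="ortho") + 128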

Video picture sequences are usually made up of consecutive pictures that have a lot of similarity. Only a small amount of the image changes from one picture to the next. Parts of the image that have moved will typically have the same pixel values as the corresponding pixels in the previous and following pictures, just in a different position in the image. Where blocks of pixels are the same, it is not necessary to store or transmit the values again; instead a position vector pointing to the reference pixels can be coded. Typically blocks of 16 x 16 pixels are treated in this way, giving the potential for enormous bit savings. This yields three types of picture in the compressed video: Intra pictures (keyframes in editing), which make no reference to adjacent pictures and use just the still-image techniques described above; Predicted pictures, which use past pictures as reference; and Bidirectionally predicted pictures, which use past and future pictures as reference.
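A toy block matcher showing the principle (a full search over a +/-8 pixel window with sum-of-absolute-differences matching; the frames are synthetic and the function name is ours):

    import numpy as np

    def best_match(prev, block, bx, by, search=8):
        # Try every offset within +/-search pixels of (bx, by) in the
        # previous picture; keep the offset most similar to `block`.
        best = (float("inf"), (0, 0))
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                x, y = bx + dx, by + dy
                if 0 <= x <= prev.shape[1] - 16 and 0 <= y <= prev.shape[0] - 16:
                    sad = np.abs(prev[y:y+16, x:x+16] - block).sum()
                    best = min(best, (sad, (dx, dy)))
        return best

    # The "current" picture is the previous one shifted 3 pixels right,
    # so the block at (16, 16) matches perfectly at offset (-3, 0).
    prev = np.random.randint(0, 256, (64, 64)).astype(float)
    curr = np.roll(prev, 3, axis=1)
    sad, vector = best_match(prev, curr[16:32, 16:32], 16, 16)
    print(vector, sad)    # (-3, 0) 0.0: only the vector need be coded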

Once all this has been done there is still scope for further reduction of the bit-rate, by noting that some bit sequences are more likely than others and coding accordingly.
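One classic technique of this kind is Huffman coding; a compact sketch (the symbol counts are invented, with zero dominating as it typically does after quantisation):

    import heapq
    from collections import Counter

    def huffman(symbols):
        # Repeatedly merge the two least likely nodes; frequent symbols
        # end up with short bit-strings, rare symbols with long ones.
        heap = [[n, i, {s: ""}] for i, (s, n)
                in enumerate(Counter(symbols).items())]
        heapq.heapify(heap)
        i = len(heap)
        while len(heap) > 1:
            lo, hi = heapq.heappop(heap), heapq.heappop(heap)
            codes = {s: "0" + c for s, c in lo[2].items()}
            codes.update({s: "1" + c for s, c in hi[2].items()})
            heapq.heappush(heap, [lo[0] + hi[0], i, codes])
            i += 1
        return heap[0][2]

    data = [0] * 80 + [1] * 12 + [2] * 5 + [3] * 3
    codes = huffman(data)
    used = sum(len(codes[s]) for s in data)
    print(used, "bits vs", len(data) * 2, "with a fixed 2-bit code")

With these counts the variable-length code needs 128 bits where a fixed-length code needs 200, purely because the likely symbol gets the shortest code.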

These techniques combined enable up to a 100-fold reduction in bit-rate whilst maintaining a broadcast-quality image: our 248.8 Mbit/s of raw camera output comes down to around 2.5 Mbit/s, which fits comfortably within the transmission channels discussed earlier.

This discussion started with a standard definition picture. The trend towards bigger TVs and watching video up close (e.g. on a monitor) has meant there is a drive towards higher resolution images (commonly known as High Definition), and even consumer camcorders and phones now have HD recording capability. So-called full HD in widescreen format has 1920 x 1080 pixels, five times the pixel count of standard definition. This drives the need for video compression even harder. Actual implementations typically employ refinements of the principles discussed above, for example in MPEG-4. Until recently, mobile phone screen resolution was at most about 480 x 320 pixels (e.g. iPhone 2 and 3), with most screens somewhat below this; that is a pixel count nearly three times lower than standard definition. In terms of video delivery this works in our favour.

* This trick was actually used with analogue TV transmissions to allow colour transmissions simultaneously with the original monochrome. The Y component was used by monochrome receivers. Colour receivers could also decode the low-bandwidth chroma signals.

So through compression we can reduce the bit-rate required to store or transmit video to a level that is commensurate with what is available from existing communication channels and the capacity of storage media. On the other hand, it does make things substantially more complicated. Another consequence is that we now require a very reliable (virtually error-free) transmission or storage channel, to prevent the bare-bones video data from being corrupted and hence the displayed image distorted. Whilst the compression techniques used for broadcast TV are standardised (so that all TV and set-top-box manufacturers can build a working decoder; they actually use MPEG-2), that is not the case when it comes to computer-based video and IP-based distribution.

In the world of the PC there are a number of different compression techniques (codecs) and an even greater number of ways of wrapping up the compressed data and mixing (multiplexing) it with other streams such as audio and textual information; the so-called container format. Fortunately most media players are able to handle the most common of these.

The Internet, and IP networks generally, are a relatively new medium for distributing video and are significantly different to the broadcast transmission systems traditionally used for TV. In the latter, dedicated, fixed bandwidth is available for a particular channel and the image quality is usually very high. The programmes themselves are usually of a high production quality and censorship of the content is implicit. Internet video can vary greatly in quality, in terms of both image and production.

Video can be delivered over IP networks in three main ways:

Download and Play: Here a video file must be downloaded in its entirety before it can be played. A copy of the video file is stored locally to the media player. The video quality available depends only on how the video was captured and encoded to create the file.

Progressive Download: Here the video file is opened by the media player whilst it is still being downloaded. However, the play-rate (fixed by the compressed video bit-rate) and the download rate (what is available from the communication path) are independent; a toy model of this trade-off follows these three descriptions. The file has to exist on the server before this can happen, so it is not suitable for streaming a live source.

Live Streaming: Here the video is transmitted at the same rate as it is rendered; there is no local storage (apart from a small amount of buffering). The communication bandwidth must be at least as big as the video bit-rate at all times. This technique usually requires special protocols and is necessary if the video stream is from a live feed*.
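The gap between play-rate and download rate is what makes progressive download stall and live streaming demanding. A toy one-second-tick model (both rates are invented figures):

    # Bits arrive at the download rate and are consumed at the video's
    # fixed play-rate; when the buffer runs dry, playback stalls.
    play_rate, download_rate = 2.5e6, 2.0e6     # bit/s (assumed figures)
    buffered, stalls = 0.0, 0
    for second in range(60):
        buffered += download_rate
        if buffered >= play_rate:
            buffered -= play_rate               # one second rendered
        else:
            stalls += 1                         # buffer empty: picture freezes
    print(stalls, "stalled seconds out of 60")  # 12 with these rates

With download and play rates swapped the buffer only ever grows, which is why progressive download works fine whenever the connection is faster than the video bit-rate.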

Whereas in broadcast video all receivers receive the same signal (i.e. a single video stream is transmitted to all receivers), an IP network is usually used in a unicast fashion: one stream per receiver, even if the stream is the same video. The network bandwidth required therefore increases in step with the number of receivers. One way round this is to use multicasting, where a packet is only duplicated where necessary within the network (the routers have to support multicasting). Alternatively, a content delivery network may be used, where the video content is duplicated at the edges of the network, close to the end user.
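A rough sense of the scale involved, with invented figures:

    # Server-side bandwidth needed to reach N viewers with one video.
    stream_rate = 2.5e6           # bit/s per compressed stream (assumed)
    viewers = 10_000
    print(f"unicast:   {stream_rate * viewers / 1e9:.1f} Gbit/s")  # a copy per viewer
    print(f"multicast: {stream_rate / 1e6:.1f} Mbit/s")  # one copy, duplicated in-network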

* Apple has recently introduced an alternative protocol called HTTP Live Streaming that is required to stream video to iPhones. In this, a file or stream (in MPEG-TS format) is broken up into small files by a segmenter. An index file lists the segment files so that, as each one is received and rendered, the next can be requested. Streams of different quality may be available, which can be chosen depending on the available bandwidth. Apple reference:
http://developer.apple.com/library/ios/#documentation/NetworkingInternet/Conceptual/StreamingMediaGuide/Introduction/Introduction.html#//apple_ref/doc/uid/TP40008332-CH1-DontLinkElementID_29
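For illustration, a minimal index (playlist) file of the kind the segmenter produces might look like this; the segment names and durations are invented:

    #EXTM3U
    #EXT-X-TARGETDURATION:10
    #EXT-X-MEDIA-SEQUENCE:0
    #EXTINF:10.0,
    segment0.ts
    #EXTINF:10.0,
    segment1.ts
    #EXTINF:10.0,
    segment2.ts
    #EXT-X-ENDLIST

For a live feed the end marker is omitted and new segment entries are appended as they are produced.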


Communication Technology Notes, R Germon, 2010