Posted on 30-Dec-2015
MULTIMEDIA SIGNAL PROCESSING ALGORITHMS
PART II – MINIMIZATION OF THE AMOUNT OF INFORMATION TO BE PROCESSED AND BASIC ALGORITHMS
THE SECOND PRINCIPLE OF BIOLOGICAL PROCESSING SEEMS TO BE:
MINIMIZATION OF THE AMOUNT OF
INFORMATION TO BE PROCESSED
THAT IS, THE PROCESSING SYSTEM ELIMINATES AS MUCH INFORMATION AS POSSIBLE AND USES ONLY THE ABSOLUTELY NECESSARY MINIMUM TO ACHIEVE ITS TASKS.
Why is this principle reasonable? Minimizing the information to be processed saves energy, increases speed and reduces effort, so it is the logical thing to do. This applies not only to biology but also to technical systems.
IN PREVIOUS LECTURES THIS PRINCIPLE
WAS EVIDENT SEVERAL TIMES:
WE ARE ABLE TO RECOGNIZE OBJECTS BASED ON VERY MINIMAL INFORMATION. THIS MEANS THAT THE PROCESSING SYSTEM IS ABLE TO REDUCE INFORMATION TO A MINIMUM, OR IN OTHER WORDS, TO EXTRACT THE NECESSARY MINIMUM.
SO WE CAN STATE THE MAIN PRINCIPLE FOR THIS
COURSE: FOR EFFECTIVE MULTIMEDIA SIGNAL
PROCESSING ONE HAS TO MINIMIZE THE AMOUNT
OF INFORMATION PROCESSED AND EXTRACT
THE ABSOLUTELY NECESSARY MINIMUM
FOR THE PROCESSING TASK. HOW TO DO THIS IS
NOT ALWAYS CLEAR OR EASY, SO WE NEED TO
STUDY IT.
The second principle, as indicated before, can be statistical
processing, producing results matched to the most likely
signals occurring in the real world. But this principle also
has to be applied correctly.
NOW LET US GO TO TECHNOLOGY. ASSUME WE HAVE A COMPUTER SYSTEM:
A COMPUTER WITH A CAMERA AND A DIGITIZER CARD, AND WE WOULD LIKE TO EXTRACT VISUAL INFORMATION ABOUT THE ENVIRONMENT LIKE OUR EYES DO (OR WE HAVE MICROPHONES AND WE WOULD LIKE TO EXTRACT ACOUSTIC INFORMATION LIKE OUR EARS DO).
HOW SHOULD WE PROGRAM THE COMPUTER?
Let’s think about a typical example which is already
becoming popular in cameras:
we would like to implement algorithms which mark
faces in pictures and recognize familiar faces. This may of
course be extended to other objects and complete scenes; for
example, a camera could recognize whether a picture was taken of a
familiar building or landscape. The problem is not easy,
since objects can be seen from different viewpoints, under different
lighting and at different times.
But the input we have for the algorithm is a digitized picture.
• WHAT IS THE PICTURE AFTER DIGITIZATION?
IT IS A MATRIX OF NUMBERS. THE MATRIX SIZE CAN BE, E.G., 256X256 OR
720X576 – TELEVISION PICTURE
1024X768 – COMPUTER MONITOR
1920X1080 – HIGH DEFINITION TELEVISION
PICTURE MATRIX ELEMENTS ARE USUALLY 8-BIT NUMBERS; THIS CORRESPONDS TO 256 LEVELS OF LIGHT, WHICH IS ENOUGH.
COLOR PICTURES ARE DESCRIBED BY THREE SUCH MATRICES, ONE FOR EACH BASIC COLOR.
HERE IS A PICTURE FROM A MARS LANDER AND PART OF THE MATRIX NEAR THE OBJECT.
WHAT WILL HAPPEN WHEN THE PICTURE RESOLUTION IS TOO SMALL?
THE PICTURE WILL BE IMPAIRED: FEWER DETAILS ARE VISIBLE.
HERE WE SEE WHAT HAPPENS WHEN THE RESOLUTION IS REDUCED FROM 512X512 TO 32X32.
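The effect of reducing resolution can be sketched by averaging non-overlapping blocks of the picture matrix. This is a minimal sketch assuming NumPy; the helper names and the synthetic striped picture (standing in for the Mars lander picture) are hypothetical.

```python
import numpy as np

def reduce_resolution(img, factor):
    """Average non-overlapping factor x factor blocks (sides must be multiples of factor)."""
    h, w = img.shape
    blocks = img.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def enlarge(img, factor):
    """Repeat each pixel factor x factor times so a small picture can be shown at full size."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

# Synthetic 512x512 picture with 1-pixel-wide black/white stripes (fine detail).
stripes = np.tile(np.array([0.0, 255.0]), (512, 256))
small = reduce_resolution(stripes, 16)   # 512x512 -> 32x32
shown = enlarge(small, 16)               # back to 512x512 for display
print(small.shape, shown.shape)          # (32, 32) (512, 512)
print(np.unique(small))                  # [127.5]: the stripes are gone, only grey is left
```

Block averaging is only one way to reduce resolution; simply keeping every 16th pixel would instead pick only the black stripes here, an example of aliasing.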
WHAT IS THE SIZE OF
ONE TV PICTURE IN BITS?
720 x 576 x 3 x 8 BITS = ABOUT 10 MBIT
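The arithmetic above as a small function; the function and parameter names are illustrative.

```python
def raw_picture_bits(width, height, components=3, bits_per_element=8):
    """Bits needed to store one picture with no compression at all."""
    return width * height * components * bits_per_element

bits = raw_picture_bits(720, 576)   # TV picture: R, G, B matrices with 8-bit elements
print(bits)                         # 9953280
print(round(bits / 1e6, 2))         # 9.95 (Mbit), i.e. about 10 Mbit
```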
• TOPIC: COLOR PROCESSING
IMAGES ARE REGISTERED IN THREE BASIC COLOR COMPONENTS: RGB = RED, GREEN, BLUE
MIXTURES OF THESE COLORS PROVIDE OTHER COLORS
WE HAVE TO USE THREE IMAGE MATRICES TO REPRESENT ONE COLOR PICTURE
THE RGB REPRESENTATION IS USED FOR DISPLAY, E.G. COMPUTER MONITORS OR TELEVISION PANELS ARE DRIVEN BY R, G, B SIGNALS
• COLOR IMAGE AND RGB COMPONENTS
• WE OFTEN PERFORM CONVERSION TO MORE SUITABLE COLOR SPACE
TWO SUCH SPACES ARE VERY USEFUL:
YUV SPACE AND HSI (HSV) SPACE
YUV SPACE :
Y – INTENSITY OF (WHITE) LIGHT
U, V – COLOR CHROMINANCES
TO OBTAIN THE YUV REPRESENTATION
WE TAKE THE R, G, B COLOR MATRICES
OF A PICTURE AND CONVERT THEM AS FOLLOWS:
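• RGB->YUV TRANSFORMATION
$$
\begin{bmatrix} Y \\ U \\ V \end{bmatrix} =
\begin{bmatrix}
0.299 & 0.587 & 0.114 \\
-0.148 & -0.289 & 0.437 \\
0.615 & -0.515 & -0.100
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
$$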
NOTE: Y IS THE BLACK-AND-WHITE COMPONENT, THAT IS, A MIXTURE OF R, G, B WHICH GIVES GRADATIONS OF WHITE, FROM BLACK THROUGH GREY TO WHITE.
U AND V ARE COLOR COMPONENTS (CHROMINANCES); THEY DO NOT HAVE A DIRECT PHYSICAL MEANING. THUS HERE THE INTENSITY OF LIGHT IS SEPARATED FROM THE COLOR INFORMATION.
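A minimal sketch of the transformation and its inverse, assuming NumPy and the slide's coefficients; the inverse matrix is computed numerically here rather than taken from any standard, and the function names are illustrative.

```python
import numpy as np

# RGB -> YUV coefficients as given on the slide.
RGB_TO_YUV = np.array([
    [0.299, 0.587, 0.114],     # Y: black-and-white (intensity) component
    [-0.148, -0.289, 0.437],   # U: chrominance
    [0.615, -0.515, -0.100],   # V: chrominance
])
YUV_TO_RGB = np.linalg.inv(RGB_TO_YUV)   # the transformation is invertible

def rgb_to_yuv(rgb):
    """rgb: array of shape (..., 3) holding R, G, B; returns Y, U, V in the same layout."""
    return rgb @ RGB_TO_YUV.T

def yuv_to_rgb(yuv):
    """Inverse transformation back to R, G, B."""
    return yuv @ YUV_TO_RGB.T

grey = np.array([100.0, 100.0, 100.0])
print(rgb_to_yuv(grey))   # approximately [100, 0, 0]: a grey pixel carries no colour
```

Because U and V rows sum to zero, every grey level maps to U = V = 0, which is what makes Y alone usable as a black-and-white picture.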
• AFTER THIS TRANSFORMATION,
INSTEAD OF THREE R, G, B MATRICES
WE GET THREE MATRICES Y, U, V.
THE TRANSFORMATION IS INVERTIBLE, SO ALL INFORMATION IS PRESERVED.
BUT NOW WE CAN PLAY A TRICK:
HUMAN VISUAL PROCESSING IS MUCH LESS SENSITIVE TO COLOR INFORMATION THAN TO
BLACK-AND-WHITE LIGHT INTENSITY
INFORMATION.
THUS, MATRICES U, V CAN BE REDUCED IN
SIZE.
• SUBSAMPLING OF MATRICES U AND V
FOR EVERY 4 ELEMENTS OF Y, ONLY ONE ELEMENT
OF U AND ONE ELEMENT OF V ARE KEPT:
Y1 Y2    U U    V V
Y3 Y4    U U    V V
THE KEPT ELEMENTS U AND V CAN BE, E.G., THE
AVERAGE VALUE OF THE ORIGINAL 4 ELEMENTS OF U AND V.
THUS MATRICES U, V CAN BE REDUCED
BY A FACTOR OF 4 IN SIZE.
RETURNING BACK TO RGB FORM WILL
NOT VISIBLY CHANGE THE PICTURE.
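A sketch of the 2x2 averaging described above for one chrominance matrix, assuming NumPy; the helper names and the sample values are hypothetical, and the Y matrix would stay at full size.

```python
import numpy as np

def subsample_chroma(c):
    """Keep one value per 2x2 block of a U or V matrix: the average of the 4 elements."""
    h, w = c.shape
    return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample_chroma(c_small):
    """Return to full size by repeating each kept value over its 2x2 block."""
    return np.repeat(np.repeat(c_small, 2, axis=0), 2, axis=1)

u = np.array([[10.0, 12.0, 40.0, 40.0],
              [14.0, 16.0, 40.0, 40.0]])
u_small = subsample_chroma(u)
print(u_small)                         # [[13. 40.]]: 4 times fewer elements
print(upsample_chroma(u_small).shape)  # (2, 4)
```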
• THE RGB->YUV TRANSFORMATION
DIRECTLY USES A PROPERTY OF HUMAN
VISION WHICH ALLOWS US:
- TO REDUCE THE SIZE OF COLOR IMAGES
(IMPORTANT FOR COMPRESSION)
- TO USE ONLY LIGHT INTENSITY WITHOUT COLOR INFORMATION (E.G. FOR RECOGNITION OF OBJECTS)
• ANOTHER TRANSFORMATION IS HSI
HSI IS MORE CLOSELY RELATED TO HUMAN PERCEPTION, IN WHICH WE CAN SEE THE SATURATION OF COLORS, THAT IS, WE CAN TELL THE 'REDNESS', 'BLUENESS' OF COLORS AND SO ON. TO GET THE HSI REPRESENTATION WE MAP RGB INTO H – HUE (COLOR), S – SATURATION (AMOUNT OF WHITE MIXED WITH THE COLOR), I – INTENSITY (AMOUNT OF GREY LEVEL).
EQUATIONS FOR HSI FROM RGB AND VICE VERSA:
BASIC ASPECTS OF THE HSI REPRESENTATION:
ON THE RGB CUBE THERE ARE SOME
OTHER 'BASIC' COLORS
APART FROM R, G, B; THE MAIN
DIAGONAL GIVES THE AMOUNT
OF WHITE.
ON THE DIAMOND WE SEE THE
COLORS AROUND A HEXAGON;
HEIGHT IS THE AMOUNT OF
WHITE, SATURATION IS ALONG THE X-AXIS. NOTE WHERE THE I (V) AXIS, THE S AXIS AND THE HUE ANGLE ARE.
• THE HSI TRANSFORMATION IS USEFUL SINCE WE GET A REPRESENTATION IN A COLOR SPACE WHICH CORRESPONDS TO PROPERTIES OF HUMAN VISION: THE INTENSITY LEVEL, THE COLOR SATURATION AND THE COLOR ITSELF CAN BE ESTIMATED SEPARATELY.
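The HSI equations of the slide are not reproduced in this text. As an assumption, the sketch below uses a widely used textbook formulation (I as the mean of R, G, B; S = 1 - min(R, G, B)/I; H as an angle measured from red) for one pixel with values in [0, 1]; the function name is illustrative.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel with values in [0, 1] to (H in degrees, S, I)."""
    i = (r + g + b) / 3.0                    # intensity: average grey level
    if i == 0.0:
        return 0.0, 0.0, 0.0                 # black: hue and saturation undefined
    s = 1.0 - min(r, g, b) / i               # saturation: 0 for any grey
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0.0:
        return 0.0, s, i                     # grey: hue undefined, report 0
    cos_theta = max(-1.0, min(1.0, num / den))   # guard against rounding outside [-1, 1]
    theta = math.degrees(math.acos(cos_theta))
    h = theta if b <= g else 360.0 - theta   # hue angle measured from red
    return h, s, i

for name, rgb in [("red", (1, 0, 0)), ("green", (0, 1, 0)), ("blue", (0, 0, 1))]:
    h, s, i = rgb_to_hsi(*rgb)
    print(name, round(h), round(s, 3), round(i, 3))
# red 0 1.0 0.333
# green 120 1.0 0.333
# blue 240 1.0 0.333
```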
DIGRESSION ON COLOR SENSORS
ASSUME YOU BUY A DIGITAL CAMERA WITH, E.G.,
5 MEGAPIXELS.
WHAT DOES THIS MEAN?
IT TURNS OUT THAT THE PIXEL DEFINITION IS
DIFFERENT FOR DIFFERENT APPLICATIONS.
TRADITIONALLY,
1 PIXEL = ONE COMBINATION OF R, G, B COLORS,
SO WE NEED 3 COLOR SENSORS PER PIXEL IN A
CAMERA OR
3 COLOR ELEMENTS PER PIXEL IN A DISPLAY.
FOR EXAMPLE:
AN LCD COMPUTER MONITOR WITH A RESOLUTION OF
1280X1024 PIXELS
HAS 1280X1024 ELEMENTS FOR EACH OF THE R, G, B COLORS,
THAT IS, IT HAS 1280X1024X3 DISPLAY ELEMENTS. THE DISPLAY ELEMENTS ARE CALLED
SUBPIXELS; ONE PIXEL IS COMPOSED OF THREE
SUBPIXELS R, G, B.
IN DIGITAL CAMERAS THIS IS DIFFERENT. THE SENSOR IN A DIGITAL CAMERA LOOKS LIKE THIS:
IN DIGITAL CAMERAS EVERY COLOR SUBPIXEL COUNTS AS A 'PIXEL'. THE PIXELS ARE ARRANGED IN A MATRIX CALLED A BAYER SENSOR. EACH 'CAMERA' PIXEL IS MADE OF 4 COLOR PIXELS: 1 RED, 2 GREEN, 1 BLUE (REMEMBER THAT MOST OF THE VISIBLE LIGHT IS GREEN).
WE CAN NOTICE THAT A 'FULL' COLOR PIXEL CAN BE MADE FROM OVERLAPPING SQUARES SHIFTED BY HALF A SQUARE (PIXEL 1 AND PIXEL 2 IN THE FIGURE).
SO THE, E.G., 5 MILLION PIXELS IN A DIGITAL CAMERA ARE NOT EXACTLY 5 MILLION PIXELS IN THE DISPLAY SENSE. THE NUMBER SHOULD BE DIVIDED BY 4, OR BY 2 IF WE TAKE INTERPOLATION INTO ACCOUNT.
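A sketch of the RGGB Bayer layout and of the pixel counting above, assuming NumPy; the function name is illustrative and the 2592 x 1944 sensor size is an assumed example of a 5-megapixel camera.

```python
import numpy as np

def bayer_pattern(rows, cols):
    """Colour filter of each sensor element in an RGGB Bayer mosaic."""
    pattern = np.empty((rows, cols), dtype="<U1")
    pattern[0::2, 0::2] = "R"   # even rows: R G R G ...
    pattern[0::2, 1::2] = "G"
    pattern[1::2, 0::2] = "G"   # odd rows:  G B G B ...
    pattern[1::2, 1::2] = "B"
    return pattern

print(bayer_pattern(2, 4))
# Hypothetical "5 megapixel" sensor of 2592 x 1944 colour elements.
p = bayer_pattern(1944, 2592)
counts = {c: int((p == c).sum()) for c in "RGB"}
print(p.size, counts)   # 5038848 {'R': 1259712, 'G': 2519424, 'B': 1259712}
print(p.size // 4)      # 1259712 complete groups of 1 red, 2 green, 1 blue
```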
BUT THERE ARE TWO EXCEPTIONS:
THERE ARE VIDEO CAMERAS WHICH HAVE 3 CCD SENSORS, ONE SEPARATELY FOR EACH OF THE R, G, B COLORS.
IN 3-CCD VIDEO CAMERAS AN OPTICAL SYSTEM SPLITS THE LIGHT ONTO 3 SENSORS WHICH PICK UP THE R, G, B COLORS. THE TOTAL NUMBER OF PIXELS CORRESPONDS TO THE NUMBER OF PIXELS IN A DISPLAY.
ANOTHER EXCEPTION IS THE FOVEON SENSOR.
IN FOVEON THERE IS ONE SENSOR, BUT IT MEASURES ALL 3 RGB COLORS IN ONE AREA. THIS IS BASED ON THE FACT THAT PHOTONS PENETRATE TO DIFFERENT DEPTHS IN THE SEMICONDUCTOR DEPENDING ON THEIR WAVELENGTHS. www.foveon.com
COMPARISON:
WE CAN SEE THAT SINGLE-SENSOR DEVICES HAVE LOWER RESOLUTION THAN 3-SENSOR DEVICES OR FOVEON.
BUT THEY ARE THE EASIEST TO PRODUCE,
SO THE NUMBER OF THEIR COLOR PIXELS IS INCREASING ALL THE TIME AND THE RESOLUTION PROBLEM IS BEING SOLVED.
• The elimination of information based on color
is an example of a much more general principle:
Input signal -> Elimination of information -> Output signal: a representation of the input signal which is "just good enough" for the specific task
How to produce the "good enough" representation is the essential problem to solve.
Next we will show an example of representation by edges.
• EDGE DETECTION
LINEAR FILTERING: THE AREA AROUND EVERY POINT IN THE IMAGE MATRIX (E.G. THE 3X3 NEIGHBOURHOOD OF POINT x) IS MULTIPLIED, ELEMENT BY ELEMENT, BY THE VALUES FROM
ANOTHER MATRIX AND THE RESULT IS SUMMED UP.
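A direct, unoptimised sketch of this linear filtering for a 3x3 mask, assuming NumPy; the function name, the border handling (border points left at 0) and the test image are assumptions.

```python
import numpy as np

def filter3x3(img, mask):
    """Linear filtering: multiply the 3x3 area around each point by mask and sum.

    The mask is not flipped (correlation). Border points are left at 0.
    """
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            area = img[i - 1:i + 2, j - 1:j + 2]
            out[i, j] = np.sum(area * mask)
    return out

img = np.array([[0, 0, 0, 9, 9],
                [0, 0, 0, 9, 9],
                [0, 0, 0, 9, 9],
                [0, 0, 0, 9, 9]], dtype=float)
average = np.full((3, 3), 1 / 9)      # example mask: local average
print(filter3x3(img, average)[1:-1, 1:-1])   # the sharp 0 -> 9 step becomes 0, 3, 6
```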
• DEPENDING ON THE MATRIX BY WHICH
WE MULTIPLY, WE HAVE SEVERAL TYPES
OF FILTERS:
LOW PASS – THE SUM OF FILTER COEFFICIENTS
IS ONE
BANDPASS – THE SUM OF FILTER COEFFICIENTS
IS ZERO
HIGHPASS – THE SUM IS BETWEEN ZERO AND
ONE
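Why the coefficient sum matters can be checked on an area of constant light: the output there is the light level times the sum of the coefficients. The masks below are assumed examples (an averaging mask and a Laplacian-type mask), not matrices from the slides, and the helper name is hypothetical.

```python
import numpy as np

def response_to_flat_area(mask, level):
    """Output of a 3x3 linear filter on an area of constant light level."""
    return float(np.sum(np.full((3, 3), level) * mask))

low_pass = np.full((3, 3), 1 / 9)                  # coefficients sum to 1
band_pass = np.array([[0, -1, 0],
                      [-1, 4, -1],
                      [0, -1, 0]], dtype=float)    # coefficients sum to 0

for name, mask in [("low pass", low_pass), ("band pass", band_pass)]:
    print(name, round(float(mask.sum()), 6), round(response_to_flat_area(mask, 100.0), 6))
# low pass 1.0 100.0   (constant areas pass unchanged)
# band pass 0.0 0.0    (constant areas give no output; only changes respond)
```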
• WE SAID THAT IN THE HUMAN VISUAL SYSTEM,
IN THE RETINA, PROCESSING ELEMENTS
ARE SENSITIVE TO CHANGES IN LIGHT
LEVEL.
THIS IS EQUIVALENT TO BANDPASS
FILTERING.
A SPECIAL CLASS OF BANDPASS FILTERS
ARE CALLED EDGE DETECTORS, SINCE THEY
ARE DESIGNED TO DETECT SHARP CHANGES IN
IMAGE LIGHT INTENSITY.
• LET US CONSIDER THE FOLLOWING
SITUATION – A WHITE BAR ON A BLACK
BACKGROUND OR THE OPPOSITE.
OUR VISUAL SYSTEM, AND WE HERE, ARE INTERESTED MOSTLY IN AREAS WHERE LIGHT IS CHANGING ITS VALUE. SHARP CHANGES IN LIGHT VALUE ARE CALLED EDGES.
HOWEVER, THERE IS A PROBLEM HERE: WHAT EXACTLY IS A SHARP CHANGE IN INTENSITY? THIS IS NOT WELL DEFINED. ON THE RIGHT WE SEE SOME EXAMPLES OF LIGHT CHANGE:
RAMP EDGE – LIGHT INCREASING GRADUALLY
STEP EDGE – SHARP TRANSITION
NARROW LINE
ROOF EDGE
THERE COULD BE MANY MORE SUCH EXAMPLES!
• EDGE DETECTION IS EQUIVALENT
TO DIFFERENTIATION IN THE
CONTINUOUS FUNCTION DOMAIN:
$$\frac{\partial F(x,y)}{\partial x} = 0 \quad \text{if } F(x,y) = \text{const}$$
BUT IN IMAGES WE HAVE A LIMITED NUMBER OF PIXELS, SO WE CAN PERFORM ONLY APPROXIMATE DIFFERENCING.
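A sketch of the approximate differencing, assuming NumPy: the derivative is replaced by the difference of neighbouring pixels. The function names and the ramp-edge row are hypothetical.

```python
import numpy as np

def diff_x(img):
    """Approximate dF/dx by differences of horizontally neighbouring pixels."""
    return img[:, 1:] - img[:, :-1]

def diff_y(img):
    """Approximate dF/dy by differences of vertically neighbouring pixels."""
    return img[1:, :] - img[:-1, :]

row = np.array([[10, 10, 10, 50, 90, 90]])   # a ramp edge in one image row
print(diff_x(row))                  # [[ 0  0 40 40  0]]
print(diff_y(np.full((3, 3), 7)))   # constant picture: all differences are 0
```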
• EDGE DETECTORS
HERE WE HAVE TWO MATRICES
OF FILTERS FOR DIFFERENCING.
NOTE THAT THE FIRST ONE WILL PROVIDE ZERO OUTPUT
WHEN THE VALUES ARE CONSTANT
IN THE VERTICAL DIRECTION,
AND THE SECOND ONE WHEN THEY
ARE CONSTANT IN THE HORIZONTAL
DIRECTION.
• NOW LET'S TAKE THE OUTPUTS H AND V OF
BOTH FILTERS AND COMBINE THEM
TOGETHER, FOR EXAMPLE BY
$$Z = \sqrt{H^2 + V^2}$$
THE OUTPUT WILL NOW BE QUITE INDEPENDENT OF THE DIRECTION OF EDGES.
NOTE THAT THE RATIO GC/GR OF THE TWO FILTER OUTPUTS DETERMINES THE DIRECTION OF AN EDGE.
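A sketch of this combination, assuming NumPy and calling the two filter outputs h and v (assumed names; the slide's ratio GC/GR plays the role of v/h here). The direction is obtained as the arctangent of that ratio.

```python
import numpy as np

def combine(h, v):
    """Combine the outputs h and v of the two differencing filters.

    Returns the edge strength Z = sqrt(h**2 + v**2) and the gradient angle in
    degrees; the edge itself runs perpendicular to that angle.
    """
    z = np.hypot(h, v)
    angle = np.degrees(np.arctan2(v, h))
    return z, angle

# The same edge contrast in three orientations gives the same strength Z.
for h, v in [(10.0, 0.0), (0.0, 10.0), (10 / np.sqrt(2), 10 / np.sqrt(2))]:
    z, angle = combine(h, v)
    print(round(float(z), 6), round(float(angle), 1))
# 10.0 0.0
# 10.0 90.0
# 10.0 45.0
```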
• HERE WE HAVE AN EXAMPLE OF RESULTS:
- ORIGINAL PICTURE
- HORIZONTAL DETECTOR
- VERTICAL DETECTOR
- BOTH COMBINED
AS WE CAN SEE, THE COMBINED OUTPUT GIVES THE BORDERS OF OBJECTS, SO WE CAN RECOGNIZE AN OBJECT EVEN IF THERE IS LITTLE INFORMATION. THIS MAY CORRESPOND IN SOME WAY TO HOW THE HUMAN SYSTEM WORKS.
• WHY DID WE USE JUST SUCH A MATRIX FOR
EDGE DETECTION?
MANY SUCH MATRICES CAN BE USED; SOME OF THEM ARE SHOWN HERE,
AND MANY OTHERS ARE KNOWN.
THEY DIFFER IN THEIR PROPERTIES AND IN THEIR OPERATION IN NOISE;
E.G. PREWITT AND SOBEL ARE GOOD.
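A sketch applying the standard Prewitt and Sobel 3x3 masks to a vertical step edge, assuming NumPy; the helper name and the test image are hypothetical. The x-direction masks are given and the y-direction masks are their transposes.

```python
import numpy as np

PREWITT_X = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]])
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

def gradient_magnitude(img, mask_x):
    """Edge strength sqrt(gx**2 + gy**2) using mask_x and its transpose (borders left at 0)."""
    mask_y = mask_x.T
    h, w = img.shape
    z = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            area = img[i - 1:i + 2, j - 1:j + 2]
            z[i, j] = np.hypot(np.sum(area * mask_x), np.sum(area * mask_y))
    return z

img = np.zeros((5, 5))
img[:, 3:] = 100.0                     # vertical step edge between columns 2 and 3
z_sobel = gradient_magnitude(img, SOBEL_X)
z_prewitt = gradient_magnitude(img, PREWITT_X)
print(z_sobel[2])     # strong response only next to the edge
print(z_prewitt[2])
```

Both masks sum to zero, so they are band-pass in the sense used above; the Sobel mask weights the centre row more, which gives somewhat more smoothing of noise along the edge.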
• IF WE TALK ABOUT OPERATION IN NOISY
IMAGES, THRESHOLDING IS IMPORTANT.
AFTER RUNNING A DETECTOR WE GET AN
OUTPUT SIGNAL. UNFORTUNATELY THIS
CAN BE CAUSED BY NOISE, NOT BY AN EDGE:
EDGE DETECTORS CAN BE SENSITIVE TO
NOISE.
WE THRESHOLD THE OUTPUT SIGNAL:
IF IT IS > SOME VALUE T,
IT IS CLASSIFIED AS AN EDGE.
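A minimal sketch of the thresholding rule, assuming NumPy; the function name is illustrative and the detector output values are made up to show a noise spike next to a real edge.

```python
import numpy as np

def classify_edges(z, t):
    """A point is classified as an edge when the detector output is > t."""
    return z > t

# Made-up detector output along one row: a real edge at positions 2-3 and a
# noise spike at position 1.
z = np.array([2.0, 70.0, 400.0, 400.0, 5.0, 0.0])
print(classify_edges(z, 50).tolist())    # [False, True, True, True, False, False]
print(classify_edges(z, 100).tolist())   # [False, False, True, True, False, False]
```

Raising T removes the noise spike here, but with real noise a high T also starts removing weak edge points, so T is a trade-off.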
HERE THE OPERATION OF AN EDGE DETECTOR WITH THRESHOLDING IN NOISY CONDITIONS IS SHOWN:
AT A LOW NOISE LEVEL IT IS GOOD.
AT A HIGHER NOISE LEVEL, SOME NOISE POINTS ARE CLASSIFIED AS EDGES AND SOME EDGE POINTS ARE MISSING (BUT WE STILL SEE A GOOD EDGE).
AT A VERY HIGH NOISE LEVEL, THE DETECTOR OPERATION BREAKS DOWN COMPLETELY AND NO EDGE IS DETECTED. NOTE THAT WE CAN STILL SEE SOME EDGE IN THIS PICTURE.
SO IN NOISY CONDITIONS THERE ARE PROBLEMS
WITH EDGE DETECTORS, BUT SOMEHOW IN HUMAN
VISION EDGE DETECTION WORKS VERY WELL. HOW?
RESEARCHERS MOTIVATED BY HUMAN VISION
NOTICED THAT THE FILTERING ELEMENTS IN THE HUMAN
RETINA AT THE BACK OF THE EYE ARE MORE
COMPLICATED THAN THE SIMPLE DETECTORS SHOWN HERE.
• MOTIVATED BY OBSERVATION OF THE HUMAN SYSTEM AND BY SOME CONSIDERATIONS OF OPTIMAL NOISE ATTENUATION, A ZERO-CROSSING, OR LAPLACIAN-OF-GAUSSIAN, DETECTOR WAS DESIGNED.
THIS DETECTOR IS OBTAINED BY TAKING THE SECOND DERIVATIVE OF A GAUSSIAN CURVE:
$$h(x,y) = \frac{1}{\pi s^4}\left[1 - \frac{x^2 + y^2}{2s^2}\right] e^{-(x^2 + y^2)/(2s^2)}$$
The resulting curve has a
characteristic 'Mexican hat' shape.
NOW, IF WE TAKE THE SECOND DERIVATIVE OF THE SIGNAL, WE NOTICE THAT AN EDGE IS WHERE THE OUTPUT CROSSES ZERO!
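A 1-D sketch of a zero-crossing detector, assuming NumPy. In one dimension the negative second derivative of a Gaussian is proportional to (1 - x^2/s^2) e^{-x^2/(2s^2)}; the kernel is sampled and shifted to sum to zero, and zero crossings are accepted only where the output changes strongly. The function names and the relative slope threshold `rel_slope` (added to ignore tiny numerical sign changes) are assumptions.

```python
import numpy as np

def mexican_hat(s, half_width):
    """Sampled negative second derivative of a 1-D Gaussian, forced to sum to zero."""
    x = np.arange(-half_width, half_width + 1, dtype=float)
    h = (1 - x ** 2 / s ** 2) * np.exp(-x ** 2 / (2 * s ** 2))
    return h - h.mean()          # constant areas then give (almost) zero output

def zero_crossings(signal, s=2.0, half_width=8, rel_slope=0.1):
    """Indices i where the filtered signal changes sign between i and i + 1."""
    h = mexican_hat(s, half_width)
    padded = np.pad(signal, half_width, mode="edge")    # avoid false edges at the ends
    out = np.convolve(padded, h, mode="valid")          # same length as signal
    sign_change = out[:-1] * out[1:] < 0
    strong = np.abs(out[1:] - out[:-1]) > rel_slope * np.abs(out).max()
    return np.nonzero(sign_change & strong)[0]

step = np.concatenate([np.zeros(20), np.full(20, 100.0)])   # step edge between 19 and 20
print(zero_crossings(step).tolist())                        # [19]
```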
• A ZERO-CROSSING EDGE DETECTOR WILL
BE BETTER IN NOISY CONDITIONS, BUT IT
IS MORE COMPLICATED SINCE IT
REQUIRES MANY MORE OPERATIONS FOR
CALCULATION.
Assuming that we have such a detector, the next problem is how to build a representation based on edges; this is shown next.
• LINKING EDGE POINTS TO FORM CONTOURS OF OBJECTS:
WE LINK OUTPUT POINTS FROM THE EDGE DETECTOR WHEN THEIR VALUES ARE SIMILAR.
SIMILARITY MEANS:
- THE AMPLITUDE DIFFERENCE IS SMALLER
THAN SOME THRESHOLD
- THE ANGULAR DIRECTION IS SIMILAR
LINKED EDGES ARE ASSUMED TO BELONG TO THE
SAME OBJECT.
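A sketch of edge linking by growing contours over 8-connected neighbours, assuming NumPy; the function name and the three thresholds are hypothetical, and the direction difference is measured on a circle in degrees.

```python
import numpy as np

def link_edges(mag, angle, t_edge, t_amp, t_angle):
    """Group edge points into contours (hypothetical helper).

    A point is an edge point if mag > t_edge. Two 8-connected edge points are
    linked when their amplitudes differ by less than t_amp and their directions
    (in degrees) by less than t_angle. Returns labels: 0 = no edge, 1, 2, ... = contour.
    """
    h, w = mag.shape
    labels = np.zeros((h, w), dtype=int)
    current = 0
    for i0 in range(h):
        for j0 in range(w):
            if mag[i0, j0] <= t_edge or labels[i0, j0]:
                continue
            current += 1                      # start a new contour here
            labels[i0, j0] = current
            stack = [(i0, j0)]
            while stack:
                i, j = stack.pop()
                for ni in range(max(i - 1, 0), min(i + 2, h)):
                    for nj in range(max(j - 1, 0), min(j + 2, w)):
                        if labels[ni, nj] or mag[ni, nj] <= t_edge:
                            continue
                        d_ang = abs(angle[i, j] - angle[ni, nj]) % 360
                        d_ang = min(d_ang, 360 - d_ang)
                        if abs(mag[i, j] - mag[ni, nj]) < t_amp and d_ang < t_angle:
                            labels[ni, nj] = current
                            stack.append((ni, nj))
    return labels

mag = np.array([[100.0, 100.0, 100.0, 100.0]])
angle = np.array([[0.0, 5.0, 90.0, 95.0]])
print(link_edges(mag, angle, t_edge=50, t_amp=20, t_angle=30).tolist())   # [[1, 1, 2, 2]]
```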
• EXAMPLE:
- ORIGINAL PICTURE
- HORIZONTAL DETECTOR
- VERTICAL DETECTOR
- RESULT OF EDGE LINKING
• SEGMENTATION
HOW TO EXTRACT OBJECTS FROM PICTURES? THIS CAN BE DONE BASED ON
FEATURES SUCH AS INTENSITY OR COLOR
• WE CAN GROUP AREAS WITH SPECIFIC
FEATURES BY LINKING THEM TOGETHER
IF TWO AREAS HAVE THE SAME FEATURE
WE LINK THEM TOGETHER
SEGMENTATION ALGORITHM:
START WITH SOME AREA AND DIVIDE IT
INTO FOUR PARTS; CONTINUE THE DIVISION UNTIL ONLY PARTS
WITH THE SPECIFIC FEATURE ARE KEPT.
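A sketch of the splitting step, assuming NumPy and a square picture whose side is a power of two; the function name and the uniformity test (max - min <= tol) are assumptions. Linking neighbouring parts with the same feature (the merging step) is not shown.

```python
import numpy as np

def split(img, top, left, size, tol, regions):
    """Divide a square area into four parts until every part is uniform.

    'Uniform' is assumed to mean max - min <= tol. Each uniform square is
    appended to regions as (top, left, size). size must be a power of two.
    """
    area = img[top:top + size, left:left + size]
    if size == 1 or area.max() - area.min() <= tol:
        regions.append((top, left, size))
        return
    half = size // 2
    for dt, dl in ((0, 0), (0, half), (half, 0), (half, half)):
        split(img, top + dt, left + dl, half, tol, regions)

img = np.zeros((4, 4))
img[:2, 2:] = 200.0                      # bright object in the top-right quarter
regions = []
split(img, 0, 0, 4, tol=10, regions=regions)
print(regions)        # [(0, 0, 2), (0, 2, 2), (2, 0, 2), (2, 2, 2)]
object_parts = [r for r in regions if img[r[0], r[1]] > 100]
print(object_parts)   # [(0, 2, 2)]
```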
• THRESHOLDING
WE NEED TO DIFFERENTIATE BETWEEN THE
'USEFUL' DATA AND THE 'NON-USEFUL' DATA.
THRESHOLDING WORKS ON THE PRINCIPLE
THAT THE USEFUL SIGNAL IS STRONGER:
IF SIGNAL < T, WE SET IT TO ZERO.
HOW TO SELECT T?
IF THE THRESHOLD IS SELECTED HERE, WE CAN SEPARATE BACKGROUND AND OBJECT.
FOR THRESHOLDING, THE HISTOGRAM CAN BE USED, SINCE IT OFTEN SHOWS HOW OBJECT AND BACKGROUND CAN BE SEPARATED.
HOWEVER, FULLY AUTOMATIC THRESHOLDING IS DIFFICULT, SINCE NOISE AND OBJECT LIGHT INTENSITIES MAY NOT BE COMPLETELY SEPARATED.
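The slide does not fix a method for choosing T from the histogram. As one common automatic choice, the sketch below uses Otsu's method, which picks the T that maximises the between-class variance of the two groups of the histogram; this is a swapped-in technique, not necessarily the one used in the lecture. NumPy and 8-bit values are assumed, and the test picture is made up.

```python
import numpy as np

def otsu_threshold(img):
    """Choose T from the grey-level histogram (values < T vs values >= T)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()          # weights of the two groups
        if w0 == 0 or w1 == 0:
            continue
        m0 = (levels[:t] * p[:t]).sum() / w0       # mean of the background group
        m1 = (levels[t:] * p[t:]).sum() / w1       # mean of the object group
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

img = np.array([[50, 52, 48, 200],
                [51, 49, 198, 202]], dtype=np.uint8)   # dark background, bright object
t = otsu_threshold(img)
mask = img >= t
kept = np.where(mask, img, 0)     # the slide's rule: signal < T is set to zero
print(t)                          # 53
print(kept)
```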
• FEATURE DETECTION
FEATURES ARE SMALL PARTS OF OBJECTS
WHICH ARE CRITICAL FOR RECOGNITION
AND REPRESENTATION.
MMSP Irek Defée
• HOW TO DETECT FEATURES? THIS IS QUITE A DIFFICULT PROBLEM.
FEATURES ARE OFTEN COMPOSED OF SHORT
LINE SEGMENTS, E.G. CORNERS: THIS CORNER IS COMPOSED OF TWO LINES.
WE CAN THINK OF APPLYING AN EDGE DETECTOR AND THRESHOLDING FOR FINDING FEATURES (FIGURE: CORNER, EDGE).
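A crude sketch of the idea above, assuming NumPy: a corner candidate is a point where both the horizontal and the vertical central differences are strong, i.e. where two edge directions meet. The function name and the threshold are hypothetical; dedicated corner detectors (e.g. the Harris detector) are considerably more robust to noise.

```python
import numpy as np

def corner_candidates(img, t):
    """Points where both |horizontal| and |vertical| central differences exceed t."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # change along each row
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # change along each column
    return np.argwhere((np.abs(gx) > t) & (np.abs(gy) > t))

img = np.zeros((8, 8))
img[2:6, 2:6] = 100.0          # bright square object
print(corner_candidates(img, 50).tolist())   # [[2, 2], [2, 5], [5, 2], [5, 5]]
```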
• FOR A COMPACT REPRESENTATION WE HAVE TO ELIMINATE ALL NONRELEVANT
SIGNAL ELEMENTS. THIS TASK IS SIMILAR TO MEDIA COMPRESSION. MEDIA COMPRESSION HAS THE GOAL TO
MINIMIZE THE DESCRIPTION OF MEDIA WHILE PRESERVING PERCEPTUAL QUALITY.
THIS IS ALSO IMPORTANT FOR GENERAL MULTIMEDIA SIGNAL PROCESSING, SINCE IT
MINIMIZES THE AMOUNT OF INFORMATION TO BE PROCESSED.
A MEDIA SIGNAL IS A STREAM OF BITS.
HOW TO REDUCE THE NUMBER OF BITS NEEDED FOR THE DESCRIPTION?
THIS CAN BE DONE IN 2 WAYS:
- MORE EFFICIENT DESCRIPTION OF THE BITSTREAM
- ELIMINATING PERCEPTUALLY INSIGNIFICANT INFORMATION
Technically this is called compression of information.
COMPRESSION CAN BE DONE ON
BIT LEVEL -> BIT STREAM
BLOCK-LEVEL -> SMALL BLOCKS
OBJECT-LEVEL -> OBJECTS IN PICTURES
PICTURE-LEVEL -> SAME PICTURE IN
DIFFERENT SIZES IS VERY SIMILAR
COMPRESSION IS ALSO RELATED TO THE REPRESENTATION OF VISUAL INFORMATION. LET'S TAKE THE FOLLOWING EXAMPLE:
a b c
d e f
g h i
This is a matrix of 3x3 points taken from a picture. Each point represents a number from 0 to 255, that is, an 8-bit number.
How many different signal matrices can be constructed out of these numbers? $(2^8)^9 = 2^{72}$, which is a huge number.
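The count can be checked directly in Python; the variable name is illustrative.

```python
n_blocks = (2 ** 8) ** 9        # 256 possible values for each of the 9 points
print(n_blocks == 2 ** 72)      # True
print(f"{n_blocks:.2e}")        # 4.72e+21
```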
ONLY MEANINGFUL INFORMATION FROM THESE MATRICES MUST BE EXTRACTED. BUT WHAT IS THIS INFORMATION? IT IS ABOUT SPECIFIC SIGNAL CHANGES...
What, then, are those changes in small areas of pictures
which might be of interest?
1. Until now we were talking about edges.
We also mentioned that there can be different types of
edges in pictures.
2. There can also be other types of information in these
small areas (e.g. lines).
3. The question is how to account for this information.
Let's see some examples: what is there?
Dark line? Dark line? Edge? Plus grey dots? Roof edge? Edge with white dot? Plus black dots?
We see here that the interpretation of small areas of pictures is ambiguous: several interpretations are possible. Sometimes a feature looks nonideal or contaminated by another feature.
Dots? Line? Diagonal edge?
So how should such real signals be interpreted? There has to be a very efficient extraction mechanism allowing for:
- extraction of multiple features
- dealing with imperfect features
What seems to be very important is that features are made by grouping pixels which are touching and have similar values.
Second, features may sometimes be imperfect. Thus, we have to try to assign each pixel to where it might belong: to some feature(s) or to none.
We take the center pixel and try to find a group of pixels to which it belongs. A pixel belongs to a group if it has the same value or a similar value, or if its value can be INTERPOLATED from neighbouring pixels.
Where does the center pixel belong?
It belongs to the vertical grey line because the pixel values are the same; it belongs to the diagonal edge if its value can be interpolated from neighbouring pixels, that is, if the pixel values change in a linear way.
(Figure: pixel intensity values; the center pixel value is the average of the other two.)
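A sketch of this assignment rule for three pixels along one direction (left, centre, right); the helper name, the tolerance `tol` and the returned labels are hypothetical.

```python
def where_does_center_belong(left, center, right, tol=2):
    """Classify the centre of three pixels taken along one direction."""
    if abs(center - left) <= tol and abs(center - right) <= tol:
        return "same values: part of a line or flat area"
    if abs(center - (left + right) / 2) <= tol:
        return "interpolated: values change linearly, e.g. an edge"
    return "not assigned"

print(where_does_center_belong(128, 128, 129))   # same values
print(where_does_center_belong(60, 110, 160))    # centre is the average of its neighbours
print(where_does_center_belong(0, 255, 0))       # isolated bright dot
```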
So we can try to assign a pixel to neighbouring
pixels. But there will be a problem if we look
at a larger area: pixels may belong to many
different areas.
It would be good to detect regularity in the areas.
When areas are irregular, they may be random and thus not interesting.
How to find regularity?
By transforming an area of a picture using a periodic
(orthogonal) basis, e.g. the Fourier transform.
But the Fourier transform has complex values, which is
not the most efficient (each coefficient needs 2 real numbers).
In practice, two other transforms are used:
the Discrete Cosine Transform (DCT) and a hierarchical
4x4 transform related to it.
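DCT TRANSFORMATION
• DCT: Discrete Cosine Transform
• Reduction of spatial redundancy
• Transform block size: 8 x 8 in our case
$$F(u,v) = \frac{1}{4} c_u c_v \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\cos\left[\frac{(2x+1)u\pi}{16}\right]\cos\left[\frac{(2y+1)v\pi}{16}\right]$$
$$f(x,y) = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} c_u c_v F(u,v)\cos\left[\frac{(2x+1)u\pi}{16}\right]\cos\left[\frac{(2y+1)v\pi}{16}\right]$$
where $u, v, x, y = 0, 1, \ldots, 7$ and $c_k = 1/\sqrt{2}$ for $k = 0$, $c_k = 1$ for $k \neq 0$.
For color pictures we take blocks from an area of 16 pixels x 16 lines: blocks 1-4 are Y (black-and-white) blocks, and blocks 5 and 6 are the color blocks Cb and Cr.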
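DCT in the matrix form
One dimension:
$$H(k,n) = \sqrt{\frac{2}{N}}\, c_k \cos\left[\frac{\pi k\,(n + \frac{1}{2})}{N}\right]$$
Two dimensions:
$$H(k,l;m,n) = \frac{2}{N}\, c_k c_l \cos\left[\frac{\pi k\,(m + \frac{1}{2})}{N}\right]\cos\left[\frac{\pi l\,(n + \frac{1}{2})}{N}\right]$$
with $k, l, m, n = 0, 1, \ldots, N-1$, $c_0 = 1/\sqrt{2}$ and $c_k = 1$ for $k > 0$.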
• DCT BASIS VECTORS FOR N = 8 AND FOR N = 4 (FIGURE)
Basis vectors are obtained by multiplying vertical and horizontal cosine functions.
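A sketch that builds the one-dimensional DCT matrix H(k, n) from its formula and forms the 2-D basis vectors as products of a vertical and a horizontal cosine, assuming NumPy; the function names are illustrative.

```python
import numpy as np

def dct_matrix(size):
    """Orthonormal DCT matrix: H[k, n] = sqrt(2/N) c_k cos(pi k (n + 1/2) / N)."""
    k = np.arange(size).reshape(-1, 1)
    n = np.arange(size).reshape(1, -1)
    h = np.sqrt(2.0 / size) * np.cos(np.pi * k * (n + 0.5) / size)
    h[0, :] /= np.sqrt(2.0)          # c_0 = 1/sqrt(2)
    return h

def basis_image(size, k, l):
    """2-D basis vector: product of a vertical and a horizontal cosine."""
    h = dct_matrix(size)
    return np.outer(h[k], h[l])

H4 = dct_matrix(4)
print(np.round(H4, 3))
print(np.allclose(H4 @ H4.T, np.eye(4)))   # True: the rows are orthonormal
```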
• Example of DCT calculation: for the input matrix, the 1-D DCT is first calculated for the columns of the input matrix and then for the rows of the previous result.
(Figure: enlarged picture with a selected block; the block values and their DCT values.)
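The two-pass calculation can be sketched with the DCT matrix H: multiplying by H from the left transforms every column, and multiplying the result by H^T from the right transforms every row. A direct double-sum version is included only to check that both give the same coefficients. NumPy is assumed; the function names and the random test block are illustrative.

```python
import numpy as np

def dct_matrix(size):
    """Orthonormal DCT matrix: H[k, n] = sqrt(2/N) c_k cos(pi k (n + 1/2) / N)."""
    k = np.arange(size).reshape(-1, 1)
    n = np.arange(size).reshape(1, -1)
    h = np.sqrt(2.0 / size) * np.cos(np.pi * k * (n + 0.5) / size)
    h[0, :] /= np.sqrt(2.0)
    return h

def dct2(block):
    """2-D DCT in two passes: columns first, then the rows of that result."""
    h = dct_matrix(block.shape[0])
    columns_done = h @ block
    return columns_done @ h.T

def idct2(coeffs):
    """Inverse 2-D DCT (H is orthonormal, so its inverse is its transpose)."""
    h = dct_matrix(coeffs.shape[0])
    return h.T @ coeffs @ h

def dct2_direct(f):
    """Direct double-sum formula, used only to check the two-pass version."""
    n = f.shape[0]
    c = np.where(np.arange(n) == 0, 1 / np.sqrt(2), 1.0)
    x = np.arange(n)
    out = np.zeros((n, n))
    for u in range(n):
        for v in range(n):
            cu = np.cos((2 * x + 1) * u * np.pi / (2 * n))
            cv = np.cos((2 * x + 1) * v * np.pi / (2 * n))
            out[u, v] = (2 / n) * c[u] * c[v] * np.sum(f * np.outer(cu, cv))
    return out

rng = np.random.default_rng(0)
block = rng.uniform(0, 255, (8, 8))
print(np.allclose(dct2(block), dct2_direct(block)))   # True
print(np.allclose(idct2(dct2(block)), block))         # True
```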
THE DCT TRANSFORM IS A MAPPING
FROM A PICTURE BLOCK INTO THE
FREQUENCY DOMAIN.
SINCE NORMALLY THERE ARE FEW HIGH
FREQUENCIES IN A BLOCK, THERE
WILL BE MANY ZEROS OR SMALL
NUMBERS IN THE DCT MATRIX.
• EXAMPLE OF THE DCT CALCULATION
ORIGINAL PICTURE BLOCK:
140 144 147 140 140 155 179 179
144 152 140 147 140 148 167 179
152 155 136 167 163 162 152 172
168 145 156 160 152 155 136 160
162 148 156 148 140 136 147 162
147 167 140 155 155 140 136 162
136 156 123 167 162 144 140 147
148 155 136 155 152 147 147 136
SHIFTED BLOCK:
12 16 19 12 11 27 51 47
16 24 12 19 12 20 39 51
24 27 8 39 35 34 24 44
40 17 28 32 24 27 8 32
34 20 28 20 12 8 19 34
19 39 12 27 27 12 8 34
8 28 -5 39 34 16 12 19
20 27 8 27 24 19 19 8
IN PRACTICE, SINCE PICTURE VALUES ARE IN THE RANGE (0, 255),
WE SHIFT THEM TO (-128, 127) BY SUBTRACTING 128.
• BLOCK AFTER DCT (MANY SMALL NUMBERS):
185 -17 14 -8 23 -9 -13 -8
20 -34 26 -9 -10 10 13 6
-10 -23 -1 6 -18 3 -20 0
-8 -5 14 -14 -8 -2 -3 8
-3 9 7 1 -11 17 18 15
8 0 -2 3 -1 -7 -1 -1
0 -7 -2 1 1 4 -6 0
The DCT values allow us to detect and evaluate
periodic structures in small areas.
Sometimes this may be very useful.
The DCT has some drawbacks: it requires real
numbers (cosine functions) and high-precision
calculations.
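Another transform was introduced recently to improve on the DCT. This transform is obtained by rounding the scaled coefficients of the DCT matrix:
$$H = \mathrm{round}\{\alpha H_{DCT}\}$$
When $\alpha = 2.5$ the following transform is obtained:
$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$
This transform has extremely simple coefficients: no multiplications are involved (a multiplication by 2 is just a bit shift).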
This transformation matrix is very simple.
We can see that the rows of the matrix
correspond to calculations detecting:
- the average value of four signal samples
- a periodic function with period 1
- a periodic function with period 2 (row 4)
Thus we get a signal decomposition into
periodic functions.
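A sketch that checks the rounding construction (alpha = 2.5 applied to the orthonormal 4-point DCT matrix) and shows the transform computed with additions, subtractions and shifts only, assuming NumPy; the helper names are hypothetical.

```python
import numpy as np

def dct_matrix(size):
    """Orthonormal DCT matrix: H[k, n] = sqrt(2/N) c_k cos(pi k (n + 1/2) / N)."""
    k = np.arange(size).reshape(-1, 1)
    n = np.arange(size).reshape(1, -1)
    h = np.sqrt(2.0 / size) * np.cos(np.pi * k * (n + 0.5) / size)
    h[0, :] /= np.sqrt(2.0)
    return h

alpha = 2.5
H = np.round(alpha * dct_matrix(4)).astype(int)
print(H)

def forward4(x0, x1, x2, x3):
    """The same transform using only additions, subtractions and shifts (<< 1 doubles)."""
    return (x0 + x1 + x2 + x3,
            (x0 << 1) + x1 - x2 - (x3 << 1),
            x0 - x1 - x2 + x3,
            x0 - (x1 << 1) + (x2 << 1) - x3)

samples = [10, 20, 30, 40]
print(forward4(*samples))                  # (100, -70, 0, -10)
print((H @ np.array(samples)).tolist())    # [100, -70, 0, -10]
```

The rows are orthogonal but not all of the same length (H H^T has 4 and 10 on the diagonal), so the scaling has to be compensated elsewhere, e.g. in the quantization step.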
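ENERGY IN THE DCT DOMAIN
(Figure: a block goes through the DCT, the coefficients are coded, and the inverse DCT restores the block. The DCT coefficients run from the lowest frequency (DC, large entropy) to the highest frequency (small entropy). Equal bit allocation: average 8 bit/pel. Unequal bit allocation, with fewer bits for higher frequencies (e.g. 10, 8, 4, 2 bits): average 3.2 bit/pel, which gives compression.)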
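The concentration of energy in the low frequencies can be illustrated on a 1-D ramp of eight samples, assuming NumPy and the orthonormal DCT matrix H(k, n): the DCT preserves the total energy, and almost all of it ends up in the first two coefficients. The ramp is a made-up example of a smooth picture row.

```python
import numpy as np

def dct_matrix(size):
    """Orthonormal DCT matrix: H[k, n] = sqrt(2/N) c_k cos(pi k (n + 1/2) / N)."""
    k = np.arange(size).reshape(-1, 1)
    n = np.arange(size).reshape(1, -1)
    h = np.sqrt(2.0 / size) * np.cos(np.pi * k * (n + 0.5) / size)
    h[0, :] /= np.sqrt(2.0)
    return h

ramp = np.arange(8, dtype=float)        # smoothly increasing light level
coeffs = dct_matrix(8) @ ramp
energy = coeffs ** 2
fraction = float(energy[:2].sum() / energy.sum())
print(np.round(coeffs, 2))
print(np.isclose(energy.sum(), np.sum(ramp ** 2)))   # True: the DCT preserves energy
print(round(fraction, 3))                            # 0.996 of the energy in 2 of 8 coefficients
```

This is why unequal bit allocation works: coefficients carrying little energy can be given few or no bits.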
QUANTIZATION
Quantization means removing information which is
not relevant.
Example: rounding of numbers,
round(4.076756) = 4
It turns out that high-frequency information is not
very relevant for human vision. It can thus be
removed.
QUANTIZATION
High frequencies in the DCT can be removed by
quantizing. Let K be a value; we perform the
operation:
n x round(K/n)
This rounds K to the nearest multiple of n, which always lies in the interval
from K - n/2 to K + n/2.
We can round numbers in such intervals:
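The operation n x round(K/n) in Python; the function name is illustrative. Note that Python's built-in `round` rounds exact halves to the nearest even integer.

```python
def quantize(k, n):
    """Replace k by the nearest multiple of the step n: n * round(k / n)."""
    return n * round(k / n)

for k in (23, -17, 4.076756):
    q = quantize(k, 5)
    print(k, q, abs(k - q) <= 5 / 2)
# 23 25 True
# -17 -15 True
# 4.076756 5 True
print(quantize(4.076756, 1))   # 4: with n = 1 this is plain rounding
```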
QUANTIZATION INTERVALS
(Figure: quantizer output $\hat{f}$ versus input $f$ for a uniform symmetric midtread quantizer and a uniform symmetric midrise quantizer.)
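A sketch of the two quantizer characteristics for one value f and step size `step`; the function names are illustrative. In the midtread quantizer 0 is an output level; in the midrise quantizer 0 is a decision boundary and the outputs are odd multiples of step/2.

```python
import math

def midtread(f, step):
    """Uniform symmetric midtread quantizer: 0 is one of the output levels."""
    level = math.floor(abs(f) / step + 0.5)        # nearest level, halves away from 0
    return math.copysign(level * step, f) + 0.0    # + 0.0 turns -0.0 into 0.0

def midrise(f, step):
    """Uniform symmetric midrise quantizer: outputs are odd multiples of step / 2."""
    return step * (math.floor(f / step) + 0.5)

for f in (-1.3, -0.2, 0.2, 1.3):
    print(f, midtread(f, 1.0), midrise(f, 1.0))
# -1.3 -1.0 -1.5
# -0.2 0.0 -0.5
# 0.2 0.0 0.5
# 1.3 1.0 1.5
```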
QUANTIZATION MATRICES FOR DCT
For luminance Y:
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
For chrominance U, V:
17 18 24 47 99 99 99 99
18 21 26 66 99 99 99 99
24 26 56 99 99 99 99 99
47 66 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
99 99 99 99 99 99 99 99
Each number in the DCT matrix is quantized (divided and rounded) by the corresponding number in the quantization matrix above. Notice that high frequencies have much higher quantization values.
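A sketch of quantizing and dequantizing one 8x8 DCT block with the luminance table above, assuming NumPy; the helper names and the two-coefficient test block are hypothetical. `np.round` rounds exact halves to the nearest even integer.

```python
import numpy as np

Q_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def quantize_block(dct_block, q_table):
    """Divide each DCT coefficient by its table entry and round."""
    return np.round(dct_block / q_table).astype(int)

def dequantize_block(levels, q_table):
    """Decoder side: multiply back; the rounding error is not recovered."""
    return levels * q_table

dct_block = np.zeros((8, 8))
dct_block[0, 0] = 185.0    # large low-frequency (DC) value
dct_block[7, 7] = 15.0     # small high-frequency value
levels = quantize_block(dct_block, Q_LUMA)
print(levels[0, 0], levels[7, 7])                 # 12 0
print(dequantize_block(levels, Q_LUMA)[0, 0])     # 192
```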
• QUANTIZATION: THE DCT VALUES ARE DIVIDED BY SPECIAL CONSTANTS AND ROUNDED
QUANTIZATION TABLE:
3 5 7 9 11 13 15 17
5 7 9 11 13 15 17 19
7 9 11 13 15 17 19 21
9 11 13 15 17 19 21 23
11 13 15 17 19 21 23 25
13 15 17 19 21 23 25 27
15 17 19 21 23 25 27 29
17 19 21 23 25 27 29 31
AFTER QUANTIZATION OF THE BLOCK AFTER DCT SHOWN EARLIER:
61 -3 2 0 2 0 0 -1
4 -4 2 0 0 0 0 0
-1 -2 0 0 -1 0 -1 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 -1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Another example: reconstruction of a block from quantized DCT coefficients.
We see that the approximation is better when more coefficients are taken.
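A sketch of reconstruction from a reduced set of DCT coefficients, assuming NumPy and using the original picture block from the DCT example. For simplicity it keeps a keep x keep square of the lowest frequencies; this square selection is an assumption (codecs usually order coefficients in a zig-zag scan). The helper names are illustrative.

```python
import numpy as np

def dct_matrix(size):
    """Orthonormal DCT matrix: H[k, n] = sqrt(2/N) c_k cos(pi k (n + 1/2) / N)."""
    k = np.arange(size).reshape(-1, 1)
    n = np.arange(size).reshape(1, -1)
    h = np.sqrt(2.0 / size) * np.cos(np.pi * k * (n + 0.5) / size)
    h[0, :] /= np.sqrt(2.0)
    return h

def reconstruct(block, keep):
    """Keep only the keep x keep lowest-frequency DCT coefficients, then invert."""
    h = dct_matrix(block.shape[0])
    coeffs = h @ (block - 128.0) @ h.T          # level shift, then 2-D DCT
    mask = np.zeros_like(coeffs)
    mask[:keep, :keep] = 1.0
    return h.T @ (coeffs * mask) @ h + 128.0

block = np.array([
    [140, 144, 147, 140, 140, 155, 179, 179],
    [144, 152, 140, 147, 140, 148, 167, 179],
    [152, 155, 136, 167, 163, 162, 152, 172],
    [168, 145, 156, 160, 152, 155, 136, 160],
    [162, 148, 156, 148, 140, 136, 147, 162],
    [147, 167, 140, 155, 155, 140, 136, 162],
    [136, 156, 123, 167, 162, 144, 140, 147],
    [148, 155, 136, 155, 152, 147, 147, 136],
], dtype=float)

errors = [float(np.sqrt(np.mean((block - reconstruct(block, k)) ** 2))) for k in range(1, 9)]
print([round(e, 2) for e in errors])   # RMS error for 1, 4, 9, ..., 64 kept coefficients
```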
THE ROLE OF DCT AND QUANTIZATION
Quantized DCT coefficients preserve the content of small picture blocks very effectively: relevant perceptual information is well preserved and nonrelevant information is eliminated.
The DCT is thus very good at representing image features with minimized information. This is confirmed in practice, since the DCT is used in the image and video compression standards called JPEG and MPEG. These standards are used in digital cameras, digital television, DVD discs and internet media players.
• Minimization of information in video
Video is composed of picture sequences,
25-30 pictures per second.
One can observe that video is composed
of 'shots' or 'scenes'. These are short segments
which have the same content. In a single shot
the difference between two subsequent pictures
(taken at a 40 ms interval) is very small.
Information representing a video scene can be minimized as follows:
- Pick and compress the first picture
- Calculate the motion-compensated difference between the second picture and the first one
- Calculate the motion-compensated difference between the restored second picture and the third one
- Continue for all pictures in the scene
So we only need information about the first (compressed) picture and the differences between the other pictures to preserve the initial information from all pictures. This results in a huge saving of information.
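A sketch of the coding scheme above without the motion compensation step (pictures are compared pixel by pixel), assuming NumPy; all names, the quantizer step and the synthetic scene are hypothetical. The difference is taken from the restored previous picture, as described above, so quantization errors do not accumulate from picture to picture.

```python
import numpy as np

def quantize(x, step):
    return step * np.round(x / step)

def encode_scene(frames, step):
    """First picture quantized directly; each later picture coded as the quantized
    difference from the restored previous picture (no motion compensation here)."""
    restored = quantize(frames[0], step)
    coded = [restored]
    for frame in frames[1:]:
        diff = quantize(frame - restored, step)
        coded.append(diff)
        restored = restored + diff        # exactly what the decoder will have
    return coded

def decode_scene(coded):
    restored = [coded[0]]
    for diff in coded[1:]:
        restored.append(restored[-1] + diff)
    return restored

rng = np.random.default_rng(1)
base = rng.uniform(0, 255, (4, 4))
frames = [base + 3.0 * t for t in range(5)]       # scene slowly getting brighter
coded = encode_scene(frames, step=4.0)
restored = decode_scene(coded)
max_err = max(float(np.abs(f - r).max()) for f, r in zip(frames, restored))
print(max_err <= 4.0 / 2 + 1e-9)   # True: error never exceeds half a step (up to float rounding)
```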
• Example
The difference is mostly caused by motion of objects
• Movement of objects: there is a problem with
object borders; to avoid it, we consider movements of small picture blocks and try to
detect whether they moved.
• The difference between two pictures can be
reduced if the motion vector of objects is found
and motion is compensated, that is, an object
which moved in the second picture is moved
back by its motion vector.
(Figure: motion compensation with 16x16, 8x8 and 4x4 blocks; the error is lower when the blocks are smaller.)
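A sketch of full-search block matching with the sum of absolute differences (SAD) as the matching error, assuming NumPy; the function name, block size and search range are hypothetical. The returned vector points from the block in the current picture back to its position in the previous picture.

```python
import numpy as np

def find_motion_vector(prev, curr, top, left, size, search):
    """Full-search block matching with the sum of absolute differences (SAD).

    Returns (dy, dx) such that the size x size block of curr at (top, left)
    best matches the block of prev at (top + dy, left + dx), and its SAD.
    """
    block = curr[top:top + size, left:left + size]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > prev.shape[0] or x + size > prev.shape[1]:
                continue                      # candidate block outside the picture
            sad = np.abs(block - prev[y:y + size, x:x + size]).sum()
            if sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best, best_sad

rng = np.random.default_rng(2)
prev = rng.uniform(0, 255, (16, 16))
curr = np.roll(prev, shift=(1, 2), axis=(0, 1))   # content moved 1 down, 2 right
mv, sad = find_motion_vector(prev, curr, top=6, left=6, size=4, search=3)
plain_sad = np.abs(curr[6:10, 6:10] - prev[6:10, 6:10]).sum()
print(mv, sad)        # (-1, -2) 0.0
print(plain_sad > 0)  # True: without compensation the difference is not zero
```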
• It is also possible to detect movements of blocks
with an accuracy better than 1 pixel, by
interpolation between pixels.
(Figure: half-pixel interpolation and quarter-pixel interpolation; the difference images become smaller.)
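A sketch of half-pixel interpolation by bilinear averaging, assuming NumPy; block matching can then be run on this denser grid. The function name is illustrative, and bilinear averaging is an assumption made for brevity; standards such as H.264 use longer interpolation filters for half-pixel samples.

```python
import numpy as np

def half_pixel_grid(img):
    """Grid with samples at whole and half-pixel positions, by bilinear averaging.

    out[2i, 2j] are the original pixels; the other positions are averages of
    the 2 or 4 nearest original pixels.
    """
    h, w = img.shape
    out = np.zeros((2 * h - 1, 2 * w - 1))
    out[::2, ::2] = img
    out[1::2, ::2] = (img[:-1, :] + img[1:, :]) / 2       # between rows
    out[::2, 1::2] = (img[:, :-1] + img[:, 1:]) / 2       # between columns
    out[1::2, 1::2] = (img[:-1, :-1] + img[:-1, 1:] + img[1:, :-1] + img[1:, 1:]) / 4
    return out

img = np.array([[0.0, 2.0],
                [4.0, 6.0]])
print(half_pixel_grid(img))
```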
Video information reduction
• Instead of having information about all pictures, it is enough to have:
1. The first picture
2. Motion-compensated differences between subsequent pictures
3. Motion vectors representing movements of picture blocks
This is a very significant reduction of information, and it also provides information about the movement of objects, which is very important.