TRANSCRIPT
-
Objective Assessment of Image and Video Quality
Vladimir Petrović, KTIOS, FTN, UNS
-
Aims
Define motivation for image and video quality evaluation
Introduce the problem domain of IQA and VQA
Introduce methods and approaches used to tackle the problem
Look in detail at representative metrics and various steps in the process of quality evaluation
-
Problem
Images and videos are subject to a wide variety of distortions during acquisition, processing, compression, transmission and reproduction.
-
Motivation (1)
There is intense interest in being able to determine image and video quality for a variety of reasons, e.g. monitoring quality degradations (QoS).
-
Motivation (2)
Your life may depend on knowing what is real and what isn’t.
[Figure: original stream from the sensor vs. compressed for a low bandwidth channel]
-
Motivation (3)
Benchmarking and optimising a variety of processing methods.
[Figure: unprocessed vs. processed]
-
Evaluating Quality
Humans effortlessly determine the quality of what they are seeing
Objective evaluation of perceived quality turns out to be challenging
-
Benchmark
Human Visual System: the "mark 1 eyeball"
Subjective trials in relevant conditions can tell us exactly the perceived level of quality of any signal
But they are complex to organise if they are to be statistically relevant
They require a lot of time, equipment and effort to produce results
-
Definition
Automatically determine the perceived image or video quality of a displayed signal.
Automatically = computationally, objectively
Essentially, predict how a representative cohort of observers would rate the presented image/video.
[Figure: objective score plotted against subjective score]
-
Quality is Multi-Faceted
Depending on the context it can include effects such as:
Aesthetic quality, as a subjective impression
Utility of the signal for a particular task/purpose
Fidelity to the original information
Usually measured as a subjective impression expressed on a numerical scale
[Figure: example images with DMOS = 0.55, DMOS = 0.41 and MOS = 0.59]
-
Objective Quality Metrics
Algorithms that process an image/video and return a quality score, usually a single scalar
Apart from the test signal, their inputs can include the original (Reference) signal and other relevant information
[Figure: example images with Q_obj = 0.780, 0.803 and 0.921]
-
Practical Problem Domain
Metrics are categorised by the practical availability of a reference
Full Reference (FR) evaluation
The pristine original (Reference) is available to the metric
Evaluation is performed through a direct comparison with the degraded signal
Reduced Reference (RR)
A small fraction of, usually abstracted, information from the original is available
Evaluation is performed by comparing against this information
No-Reference (NR)
The original is not available, only the degraded/received test signal
-
Classification of Methods
Based on application scope
General approaches: attempting evaluation of generic "quality" of the signal
Application specific: evaluating specific aspects of quality (sharpness, noise, contrast, ...) or a specific type of information (dim targets, broadcast, ...)
-
Methodology
A variety of methods has been devised over the past 40 years:
Error based methods
Physiological and psychological vision models
Structural similarity methods
Natural scene statistics and information theoretic metrics
Machine learning methods
General purpose viewing quality evaluation performance is close to the theoretical maximum, at least on generic publicly available datasets
-
Past and Present
The most obvious metric is to measure the difference between the reference and the test signals
Local differences can then be summed up into a global score
Mean squared error (MSE, L2) and the derived PSNR are the best known examples
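As a minimal sketch of these two error measures (NumPy, assuming 8-bit greyscale images; names are illustrative):

import numpy as np

def mse(reference, test):
    # Mean squared error between two equally sized images
    ref = reference.astype(np.float64)
    tst = test.astype(np.float64)
    return float(np.mean((ref - tst) ** 2))

def psnr(reference, test, peak=255.0):
    # Peak signal-to-noise ratio in dB, derived directly from the MSE
    err = mse(reference, test)
    if err == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / err)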
-
Limitations of MSE
Fails on general, varied evaluation
All of the degraded images below have approximately the same MSE
-
Degradation Analysis
It is the images with local structural changes that exhibit a loss of perceived quality.
[Figure: original | no structural change | local structural change]
-
Delving Further
Furthermore, the more local structure degradation there is, the worse the quality.
⇒ There is an evident relationship between structure loss and quality.
-
Structural Similarity Quality Evaluation
We can extract local structure from the reference and test images
... and compare them directly using a similarity/distance model
Repeat this systematically across the scene
[Diagram: Reference and Test images → similarity/distance model → similarity score]
-
Structural Similarity Maps
Showing structural similarity at each location in the scene
Depending on the similarity model, usually scaled 0 to 1, where 0 is complete disagreement and 1 is identical structures
[Figure: Reference | Test | structural similarity map]
-
Structural Similarity Evaluation
Similarity maps are essentially local measures of quality
They can be integrated into global similarity/quality scores
[Diagram: similarity/distance model → pooling → global similarity/quality score]
-
Similarity/Distance Models
Evaluate the distance between two structures
Similarity is essentially 1 − Distance
Tightly tied to the structure extraction/representation method
Many models have been proposed in the literature:
Window MSE
Normalised correlation
SSIM, the Structural Similarity Index
QAB, a gradient structure distance
...
-
Structural Similarity Index
Probably the best known objective IQ metric
A three-term evaluation, where μ is the mean and σ the standard deviation of the signal evaluated over a local window
[Wang and Bovik ’02, Wang, Bovik, Sheikh, Simoncelli ‘04]
Distance in illumination: $l(A,B) = \dfrac{2\mu_A\mu_B + C_1}{\mu_A^2 + \mu_B^2 + C_1}$
Distance in contrast: $c(A,B) = \dfrac{2\sigma_A\sigma_B + C_2}{\sigma_A^2 + \sigma_B^2 + C_2}$
Distance in structure: $s(A,B) = \dfrac{\sigma_{AB} + C_3}{\sigma_A\sigma_B + C_3}$
Abbreviated form (with $C_3 = C_2/2$): $\mathrm{SSIM}(A,B) = \dfrac{(2\mu_A\mu_B + C_1)(2\sigma_{AB} + C_2)}{(\mu_A^2 + \mu_B^2 + C_1)(\sigma_A^2 + \sigma_B^2 + C_2)}$
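A simplified sliding-window sketch of SSIM in NumPy/SciPy; the published metric uses an 11x11 Gaussian-weighted window, while the uniform window and the window size below are illustrative assumptions:

import numpy as np
from scipy.ndimage import uniform_filter

def ssim_map(a, b, data_range=255.0, win=8):
    # Local statistics over a uniform window (Gaussian weighting in the original)
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a = uniform_filter(a, win)
    mu_b = uniform_filter(b, win)
    var_a = uniform_filter(a * a, win) - mu_a ** 2
    var_b = uniform_filter(b * b, win) - mu_b ** 2
    cov = uniform_filter(a * b, win) - mu_a * mu_b
    # Abbreviated form: luminance, contrast and structure terms collapsed
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

def ssim(a, b, **kw):
    # Mean pooling of the local SSIM map into a single score
    return float(np.mean(ssim_map(a, b, **kw)))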
-
Structural Similarity as a Quality Measure
Comparing SSIM scores to subjective scores reveals a monotonic relationship
SSIM can therefore be used directly as a measure of quality
Assessment performance also depends on the pooling method
-
SSIM Results
SSIM provides a more realistic quality estimate than MSE
-
Gradient Structural Similarity Model
Structure is represented through local image gradients
Extracted using gradient operators, e.g. Sobel
Compared between the Reference and Test images
Used to determine the relative, perceptual importance of various locations across the scene
[Diagram: Reference and Test → local gradient extraction → gradient distance model (structural similarity model) and perceptual importance estimation → pooling]
-
Structural Similarity = 1 – Gradient Loss
[Diagram: true and degraded visual information → gradient change estimation → perceptual loss estimation → information preservation estimates]
We’re not interested in absolute gradient loss, only the perceived loss
-
Gradient Distance
Given gradient components s_x and s_y at each location n,m, we first evaluate the gradient magnitude and orientation
Then we measure the distance in gradient magnitude and orientation between the test and reference images (A and B):
$$g_A(n,m) = \frac{\sqrt{s_x^A(n,m)^2 + s_y^A(n,m)^2}}{g_{\max}}, \qquad \alpha_A(n,m) = \arctan\frac{s_y^A(n,m)}{s_x^A(n,m)}$$

$$\Delta_g^{AB}(n,m) = \begin{cases}\dfrac{g_B(n,m) + C}{g_A(n,m) + C}, & g_A(n,m) > g_B(n,m)\\[6pt]\dfrac{g_A(n,m) + C}{g_B(n,m) + C}, & g_A(n,m) \le g_B(n,m)\end{cases}$$

$$\Delta_\alpha^{AB}(n,m) = \frac{\pi - \left|\alpha_A(n,m) - \alpha_B(n,m)\right|}{\pi}$$

This gives linear structural distance estimates at each n,m: a similarity map.
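A sketch of these per-pixel quantities in NumPy/SciPy. Sobel operators, arctan2 (instead of a plain arctan) and the wrapping of the angular difference into [0, π] are assumptions made so the sketch runs robustly; C is a small illustrative constant:

import numpy as np
from scipy.ndimage import sobel

def gradient_field(img):
    # Gradient components via Sobel, then per-pixel magnitude and orientation
    f = img.astype(np.float64)
    sx = sobel(f, axis=1)
    sy = sobel(f, axis=0)
    g = np.hypot(sx, sy)
    g = g / (g.max() + 1e-12)          # normalise by g_max
    alpha = np.arctan2(sy, sx)         # numerically safe variant of arctan(sy/sx)
    return g, alpha

def gradient_distance(g_a, alpha_a, g_b, alpha_b, C=1e-3):
    # Magnitude distance: ratio of the weaker to the stronger gradient
    mag = np.where(g_a > g_b, (g_b + C) / (g_a + C), (g_a + C) / (g_b + C))
    # Orientation distance: angular difference wrapped into [0, pi], scaled to (0, 1]
    d = np.abs(alpha_a - alpha_b)
    d = np.minimum(d, 2.0 * np.pi - d)
    ori = (np.pi - d) / np.pi
    return mag, ori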
-
Perceptual Similarity
The response of biological systems, including humans, is non-linear
We quantify perceived information similarity with a non-linear mapping of the gradient distance (magnitude and orientation), giving local perceptual similarity scores

$$Q = \frac{\Gamma}{1 + e^{\,k(\Delta - \sigma)}}$$

[Figure: sigmoid mapping from 1 − gradient distance to perceptual similarity Q]
-
Aggregating Scores
First combine similarity in gradient orientation and magnitude
Then sum across the entire scene
Q^{AB} is a global quality score between 0 (complete perceived loss of information from the input) and 1 (perfect representation, no perceptible quality degradation)
$$Q^{AB}(n,m) = Q_g^{AB}(n,m)\cdot Q_\alpha^{AB}(n,m)$$

$$Q^{AB} = \frac{\sum_{\forall n,m} w(n,m)\,Q^{AB}(n,m)}{\sum_{\forall n,m} w(n,m)}$$
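Continuing the sketch from the gradient-distance slide, a hedged implementation of the sigmoidal mapping and the weighted pooling; the constants gamma, k and sigma are placeholders rather than the published values:

import numpy as np

def perceptual_similarity(delta, gamma=1.0, k=-15.0, sigma=0.5):
    # Q = Gamma / (1 + exp(k * (Delta - sigma))); k < 0 so similar structures score high
    return gamma / (1.0 + np.exp(k * (delta - sigma)))

def aggregate(q_g, q_alpha, weights):
    # Combine magnitude and orientation similarity, then weight and sum over the scene
    q = q_g * q_alpha
    return float(np.sum(weights * q) / np.sum(weights))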
-
Metric Performance
Again a good, monotonic relationship with subjective scores
The relationship is far more linear than for SSIM
-
Pooling
Integrates local quality scores into a single global score over:
Space: the field of view of the scene
Time: all the frames in the video sequence
The simplest pooling model is the mean of the local scores, which assumes all locations/frames are equally important
Weighted summation gives us more freedom
[Figure: pooling example, Q = 0.487]
-
Perceptual Importance
Not all areas of the scene are equally important to observers
Perceptual importance can guide the pooling process to obtain more relevant quality scores
-
Visual Attention
Perceptual importance is inherent to all of us
Its manifestation in the HVS is attention
We can use attention models to derive perceptual importance
Attention is driven by a host of factors:
Context
Motion
Visibility/Contrast
Structure
Familiarity
Most are difficult to model without higher cognition
-
Contrast Based Importance
A good approximation of perceptual importance is local contrast
Measured, say, through the local gradient magnitude
Our attention is drawn to areas of high contrast

$$w(n,m) = g(n,m) = \sqrt{s_x(n,m)^2 + s_y(n,m)^2}$$
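In NumPy/SciPy this weighting is just the Sobel gradient magnitude of the image, a minimal sketch:

import numpy as np
from scipy.ndimage import sobel

def contrast_importance(img):
    # w(n, m) = g(n, m): local gradient magnitude as a perceptual importance map
    f = img.astype(np.float64)
    return np.hypot(sobel(f, axis=1), sobel(f, axis=0))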
-
Complete Structural Similarity Evaluation
[Diagram: Reference and Test images → structural similarity map and perceptual importance map → weighted pooling (Σ) → global score, e.g. Q = 0.56]
-
Quality Based Pooling
An interesting observation from quality research: people devote more attention to areas of poor quality
So the perceptual importance of poor quality regions is higher
Local quality can therefore be used to determine perceptual importance
The same is true in video, where the worst frames determine quality
[Figure: UoM, LIVE]
-
Video Quality Evaluation
Obviously harder than image quality, but closely related
The simplest approach is to apply IQ metrics on each frame, but that may ignore some important temporal information
Many IQ models can be adapted to work on dynamic information
-
Structural Similarity Video Quality
Measure how well the true scene information is represented in the degraded video, as a proxy for subjective impression
1. Estimate local structural similarity across space and time, between all locations and frames of the Reference and Test videos
2. Quantify perceived information loss at each location and time
3. Pool scores spatially across the entire field of view into frame quality scores
4. Pool frame scores into a single video quality score
-
Gradient Preservation Video Quality
[Diagram: the input and output sequences pass through spatial, temporal and chromaticity information extraction; the corresponding spatial, temporal and chromaticity information loss models, weighted by a spatio-temporal perceptual importance evaluation, feed the visual information loss/preservation model, which produces the video quality performance score]
-
Information Extraction: Colour and Space
Transform the RGB video to a more suitable space, say HSV
Use gradient operators to extract spatial structure from the intensity (value) channel
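One way to sketch this step, assuming scikit-image for the colour conversion and Sobel operators for the spatial gradients:

import numpy as np
from scipy.ndimage import sobel
from skimage.color import rgb2hsv

def spatial_information(rgb_frame):
    # Convert to HSV and extract spatial gradients from the value (intensity) channel
    value = rgb2hsv(rgb_frame)[..., 2]
    sx = sobel(value, axis=1)
    sy = sobel(value, axis=0)
    return np.hypot(sx, sy), np.arctan2(sy, sx)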
-
Information Extraction: Time
Use a temporal operator analogous to Sobel, evaluated over 3 subsequent frames at all locations
The broader time base adds robustness to noise
[Figure: temporal gradient g_t vs. inter-frame difference]
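A hedged sketch of a temporal gradient over three consecutive frames; the simple central-difference kernel below stands in for the temporal operator mentioned on the slide:

import numpy as np

def temporal_gradient(frames):
    # frames: three consecutive greyscale frames stacked as an array of shape (3, H, W)
    # [-1, 0, 1]/2 derivative across time, evaluated at the middle frame
    f = np.asarray(frames, dtype=np.float64)
    return 0.5 * (f[2] - f[0])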
-
Information Loss Models
Analogous to the IQ metric, at each location compare between the videos:
Spatial gradient magnitude and orientation
Temporal gradient magnitude
Colour vectors (2D), giving a colour distance δC_{n,m,t}
Process with perceptual non-linearities
[Figure: spatial and temporal perceptual response curves]
-
Multi-dimensional Local Quality Maps
Once again local quality maps are produced, this time in three different dimensions: space, time and colour
[Figure: spatial and temporal structure similarity maps for a reference and a compressed test sequence]
-
Combining the Quality Dimensions
Q_s, Q_t and Q_c define the preservation of spatial, temporal and chromatic information at each location
They can be combined linearly using a weighted summation
But which weights to use? Determine them through optimisation against subjectively annotated videos
$$VQ^{AB} = k_s Q_s^{AB} + k_t Q_t^{AB} + k_c Q_c^{AB}$$

Optimal $[k_s, k_t, k_c] = [0.8, 0.15, 0.05]$
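As a one-line sketch of this combination, with the optimal weights quoted above as defaults:

import numpy as np

def combined_quality(q_s, q_t, q_c, k=(0.8, 0.15, 0.05)):
    # Weighted linear combination of spatial, temporal and chromatic preservation maps
    k_s, k_t, k_c = k
    return k_s * np.asarray(q_s) + k_t * np.asarray(q_t) + k_c * np.asarray(q_c)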
-
Pooling into Frame Scores
Pool local quality estimates for each pixel in each frame
Define perceptual importance at each pixel; we now have not only spatial but temporal contrast too
Use these weights to pool local information preservation estimates into a frame quality estimate
$$w_{n,m,T} = g_{n,m,T} + gt_{n,m,T}$$

$$Q^{AB}_T = \frac{\sum_{\forall n,m} w(n,m,T)\,Q^{AB}(n,m,T)}{\sum_{\forall n,m} w(n,m,T)}$$
-
Video Sequence Quality
We pool frame quality scores into a global quality score
We use the p% worst frames in the sequence to determine the global objective quality estimate
A compromise estimate for p is 20%
$$VQ^{AB} = \frac{1}{N_p}\sum_{T\,\in\,\text{worst }p\%\text{ frames}} VQ^{AB}_T$$
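A minimal sketch of this worst-p% pooling rule in NumPy, with frame_scores being the per-frame quality estimates from the previous step:

import numpy as np

def sequence_quality(frame_scores, p=20.0):
    # Average the worst p% of the frame scores (p = 20 is the compromise from the slide)
    scores = np.sort(np.asarray(frame_scores, dtype=np.float64))
    n_worst = max(1, int(round(len(scores) * p / 100.0)))
    return float(np.mean(scores[:n_worst]))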
-
Video Quality over Time
Video quality is a function of time, i.e. it can change during a sequence
-
Localised Quality Scores
Local spatial, temporal and colour quality estimates provide further insight into video quality
[Figure: spatial, temporal and colour quality maps]
-
How accurate is it?
Extremely high compression, low resolution data
5 scenarios, 4 resolutions, 4 codecs
[Figure: objective score plotted against subjective score]
-
No Reference Quality Evaluation
A much tougher proposition: where to begin? what to measure?
Again, it is done naturally by people
A growing number of methods is available that essentially mimic how we do it:
Natural statistics methods
Machine learning methods
-
Natural Statistics Metrics
We don't have a reference to compare to, but we can learn what natural, good quality images look like
Quality degradations change certain image properties
Learn the statistics of these properties from pristine and distorted images: natural scene statistics
[Saad et al. '12]
MSCN: mean subtracted contrast normalized coefficients
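A common sketch of the MSCN computation (local mean and deviation via Gaussian filtering, as used in BRISQUE-style metrics); the window parameters below are typical values, not necessarily those of the cited work:

import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(img, sigma=7.0 / 6.0, eps=1.0):
    # Mean subtracted contrast normalized coefficients:
    # subtract a local Gaussian-weighted mean and divide by the local deviation
    x = img.astype(np.float64)
    mu = gaussian_filter(x, sigma)
    var = gaussian_filter(x * x, sigma) - mu ** 2
    sd = np.sqrt(np.maximum(var, 0.0))
    return (x - mu) / (sd + eps)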
-
Natural Statistics Approaches
Several approaches are available, using mainly local structural properties throughout the scene:
DCT coefficient statistics
Mean subtracted contrast normalized coefficients
Discrete Wavelet transform coefficients
Gabor filter responses
Measure the distance between the natural and test image distributions
[Saad et al. '12, Mittal et al. '12, Su et al. '13, ...]