TRANSCRIPT
-
Objective Assessment of Image and Video Quality
Vladimir Petrović, KTIOS, FTN, UNS
-
Aims
Define motivation for image and video quality evaluation
Introduce the problem domain of IQA and VQA
Introduce methods and approaches used to tackle the problem
Look in detail at representative metrics and various steps in the process of quality evaluation
-
Problem
Images and videos are subject to a wide variety of distortions during acquisition, processing, compression, transmission and reproduction.
-
Motivation (1)
There is intense interest in being able to determine image and video quality for a variety of reasons, e.g. monitoring quality degradations (QoS).
-
Motivation (2)
Your life may depend on knowing what is real and what isn’t.
[Figure: original stream from the sensor vs. compressed for a low bandwidth channel]
-
Motivation (3)
Benchmarking and optimising a variety of processing methods.
[Figure: unprocessed vs. processed]
-
Evaluating Quality
Humans effortlessly determine the quality of what they are seeing
Objective evaluation of perceived quality turns out to be challenging
-
Benchmark
Human Visual System: the "mark 1 eyeball"
Subjective trials in relevant conditions can tell us exactly the perceived level of quality of any signal
But they are complex to organise if they are to be statistically relevant
They require a lot of time, equipment and effort to produce results
-
Definition
Automatically determine the perceived image or video quality of a displayed signal.
Automatically = computationally, objectively
Essentially, predict how a representative cohort of observers would rate the presented image/video.
[Figure: objective score plotted against subjective score]
-
Quality is Multi-Faceted
Depending on the context it can include effects such as:
Aesthetic quality, as a subjective impression
Utility of the signal for a particular task/purpose
Fidelity to the original information
Usually measured as a subjective impression expressed on a numerical scale
[Figure: example images with DMOS = 0.55, DMOS = 0.41 and MOS = 0.59]
-
Objective Quality Metrics
Algorithms that process an image/video and return a quality score, usually a single scalar
Apart from the test signal, their inputs can include the original (Reference) signal and other relevant information
[Figure: example images with Q_obj = 0.780, 0.803 and 0.921]
-
Practical Problem Domain
Metrics are categorised by the practical availability of a reference
Full Reference (FR) evaluation
The pristine original (Reference) is available to the metric
Evaluation is performed through a direct comparison with the degraded signal
Reduced Reference (RR)
A small fraction of, usually abstracted, information from the original is available
Evaluation is performed by comparing against this information
No-Reference (NR)
The original is not available, only the degraded/received test signal
-
Classification of Methods
Based on application scope
General approaches: attempting evaluation of generic "quality" of the signal
Application specific: evaluating specific aspects of quality (sharpness, noise, contrast, ...) or a specific type of information (dim targets, broadcast, ...)
-
Methodology
A variety of methods has been devised over the past 40 years:
Error based methods
Physiological and psychological vision models
Structural similarity methods
Natural scene statistics and information theoretic metrics
Machine learning methods
General purpose viewing quality evaluation performance is close to the theoretical maximum, at least on generic publicly available datasets
-
Past and Present
The most obvious metric is to measure the difference between the reference and the test signals
Local differences can then be summed up into a global score
Mean squared error (MSE, L2) and the derived PSNR are the best known examples
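As a minimal sketch of these two error measures (NumPy, assuming 8-bit greyscale images; names are illustrative):

import numpy as np

def mse(reference, test):
    # Mean squared error between two equally sized images
    ref = reference.astype(np.float64)
    tst = test.astype(np.float64)
    return float(np.mean((ref - tst) ** 2))

def psnr(reference, test, peak=255.0):
    # Peak signal-to-noise ratio in dB, derived directly from the MSE
    err = mse(reference, test)
    if err == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / err)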
-
Limitations of MSE
Fails on general, varied evaluation
All of the degraded images below have approximately the same MSE
-
Degradation Analysis
It is the images with local structural changes that exhibit a loss of perceived quality.
[Figure: original | no structural change | local structural change]
-
Delving Further
Furthermore, the more local structure degradation there is, the worse the quality.
⇒ There is an evident relationship between structure loss and quality.
-
Structural Similarity Quality Evaluation
We can extract local structure from the reference and test images
... and compare them directly using a similarity/distance model
Repeat this systematically across the scene
[Diagram: Reference and Test images → similarity/distance model → similarity score]
-
Structural Similarity Maps
Showing structural similarity at each location in the scene
Depending on the similarity model, usually scaled 0 to 1, where 0 is complete disagreement and 1 is identical structures
[Figure: Reference | Test | structural similarity map]
-
Structural Similarity Evaluation
Similarity maps are essentially local measures of quality
They can be integrated into global similarity/quality scores
[Diagram: similarity/distance model → pooling → global similarity/quality score]
-
Similarity/Distance Models
Evaluate the distance between two structures
Similarity is essentially 1 − Distance
Tightly tied to the structure extraction/representation method
Many models have been proposed in the literature:
Window MSE
Normalised correlation
SSIM, the Structural Similarity Index
QAB, a gradient structure distance
...
-
Structural Similarity Index
Probably the best known objective IQ metric
A three-term evaluation, where μ is the mean and σ the standard deviation of the signal evaluated over a local window
[Wang and Bovik ’02, Wang, Bovik, Sheikh, Simoncelli ‘04]
Distance in illumination: $l(A,B) = \dfrac{2\mu_A\mu_B + C_1}{\mu_A^2 + \mu_B^2 + C_1}$
Distance in contrast: $c(A,B) = \dfrac{2\sigma_A\sigma_B + C_2}{\sigma_A^2 + \sigma_B^2 + C_2}$
Distance in structure: $s(A,B) = \dfrac{\sigma_{AB} + C_3}{\sigma_A\sigma_B + C_3}$
Abbreviated form (with $C_3 = C_2/2$): $\mathrm{SSIM}(A,B) = \dfrac{(2\mu_A\mu_B + C_1)(2\sigma_{AB} + C_2)}{(\mu_A^2 + \mu_B^2 + C_1)(\sigma_A^2 + \sigma_B^2 + C_2)}$
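A simplified sliding-window sketch of SSIM in NumPy/SciPy; the published metric uses an 11x11 Gaussian-weighted window, while the uniform window and the window size below are illustrative assumptions:

import numpy as np
from scipy.ndimage import uniform_filter

def ssim_map(a, b, data_range=255.0, win=8):
    # Local statistics over a uniform window (Gaussian weighting in the original)
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a = uniform_filter(a, win)
    mu_b = uniform_filter(b, win)
    var_a = uniform_filter(a * a, win) - mu_a ** 2
    var_b = uniform_filter(b * b, win) - mu_b ** 2
    cov = uniform_filter(a * b, win) - mu_a * mu_b
    # Abbreviated form: luminance, contrast and structure terms collapsed
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

def ssim(a, b, **kw):
    # Mean pooling of the local SSIM map into a single score
    return float(np.mean(ssim_map(a, b, **kw)))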
-
Structural Similarity as a Quality Measure
Comparing SSIM scores to subjective scores reveals a monotonic relationship
SSIM can therefore be used directly as a measure of quality
Assessment performance also depends on the pooling method
-
SSIM Results
SSIM provides a more realistic quality estimate than MSE
-
Gradient Structural Similarity Model
Structure is represented through local image gradients
Extracted using gradient operators, e.g. Sobel
Compared between the Reference and Test images
Used to determine the relative, perceptual importance of various locations across the scene
[Diagram: Reference and Test → local gradient extraction → gradient distance model (structural similarity model) and perceptual importance estimation → pooling]
-
Structural Similarity = 1 – Gradient Loss
[Diagram: true and degraded visual information → gradient change estimation → perceptual loss estimation → information preservation estimates]
We’re not interested in absolute gradient loss, only the perceived loss
-
Gradient Distance
Given gradient components s_x and s_y at each location n,m, we first evaluate the gradient magnitude and orientation
Then we measure the distance in gradient magnitude and orientation between the test and reference images (A and B):
$$g_A(n,m) = \frac{\sqrt{s_x^A(n,m)^2 + s_y^A(n,m)^2}}{g_{\max}}, \qquad \alpha_A(n,m) = \arctan\frac{s_y^A(n,m)}{s_x^A(n,m)}$$

$$\Delta_g^{AB}(n,m) = \begin{cases}\dfrac{g_B(n,m) + C}{g_A(n,m) + C}, & g_A(n,m) > g_B(n,m)\\[6pt]\dfrac{g_A(n,m) + C}{g_B(n,m) + C}, & g_A(n,m) \le g_B(n,m)\end{cases}$$

$$\Delta_\alpha^{AB}(n,m) = \frac{\pi - \left|\alpha_A(n,m) - \alpha_B(n,m)\right|}{\pi}$$

This gives linear structural distance estimates at each n,m: a similarity map.
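A sketch of these per-pixel quantities in NumPy/SciPy. Sobel operators, arctan2 (instead of a plain arctan) and the wrapping of the angular difference into [0, π] are assumptions made so the sketch runs robustly; C is a small illustrative constant:

import numpy as np
from scipy.ndimage import sobel

def gradient_field(img):
    # Gradient components via Sobel, then per-pixel magnitude and orientation
    f = img.astype(np.float64)
    sx = sobel(f, axis=1)
    sy = sobel(f, axis=0)
    g = np.hypot(sx, sy)
    g = g / (g.max() + 1e-12)          # normalise by g_max
    alpha = np.arctan2(sy, sx)         # numerically safe variant of arctan(sy/sx)
    return g, alpha

def gradient_distance(g_a, alpha_a, g_b, alpha_b, C=1e-3):
    # Magnitude distance: ratio of the weaker to the stronger gradient
    mag = np.where(g_a > g_b, (g_b + C) / (g_a + C), (g_a + C) / (g_b + C))
    # Orientation distance: angular difference wrapped into [0, pi], scaled to (0, 1]
    d = np.abs(alpha_a - alpha_b)
    d = np.minimum(d, 2.0 * np.pi - d)
    ori = (np.pi - d) / np.pi
    return mag, ori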
-
Perceptual Similarity
The response of biological systems, including humans, is non-linear
We quantify perceived information similarity with a non-linear mapping of the gradient distance (magnitude and orientation), giving local perceptual similarity scores

$$Q = \frac{\Gamma}{1 + e^{\,k(\Delta - \sigma)}}$$

[Figure: sigmoid mapping from 1 − gradient distance to perceptual similarity Q]
-
Aggregating Scores
First combine similarity in gradient orientation and magnitude
Then sum across the entire scene
Q^{AB} is a global quality score between 0 (complete perceived loss of information from the input) and 1 (perfect representation, no perceptible quality degradation)
$$Q^{AB}(n,m) = Q_g^{AB}(n,m)\cdot Q_\alpha^{AB}(n,m)$$

$$Q^{AB} = \frac{\sum_{\forall n,m} w(n,m)\,Q^{AB}(n,m)}{\sum_{\forall n,m} w(n,m)}$$
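Continuing the sketch from the gradient-distance slide, a hedged implementation of the sigmoidal mapping and the weighted pooling; the constants gamma, k and sigma are placeholders rather than the published values:

import numpy as np

def perceptual_similarity(delta, gamma=1.0, k=-15.0, sigma=0.5):
    # Q = Gamma / (1 + exp(k * (Delta - sigma))); k < 0 so similar structures score high
    return gamma / (1.0 + np.exp(k * (delta - sigma)))

def aggregate(q_g, q_alpha, weights):
    # Combine magnitude and orientation similarity, then weight and sum over the scene
    q = q_g * q_alpha
    return float(np.sum(weights * q) / np.sum(weights))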
-
Metric Performance
Again a good, monotonic relationship with subjective scores
The relationship is far more linear than for SSIM
-
Pooling
Integrates local quality scores into a single global score over:
Space: the field of view of the scene
Time: all the frames in the video sequence
The simplest pooling model is the mean of the local scores, which assumes all locations/frames are equally important
Weighted summation gives us more freedom
[Figure: pooling example, Q = 0.487]
-
Perceptual Importance
Not all areas of the scene are equally important to observers
Perceptual importance can guide the pooling process to obtain more relevant quality scores
-
Visual Attention
Perceptual importance is inherent to all of us
Its manifestation in the HVS is attention
We can use attention models to derive perceptual importance
Attention is driven by a host of factors:
Context
Motion
Visibility/Contrast
Structure
Familiarity
Most are difficult to model without higher cognition
-
Contrast Based Importance
A good approximation of perceptual importance is local contrast
Measured, say, through the local gradient magnitude
Our attention is drawn to areas of high contrast

$$w(n,m) = g(n,m) = \sqrt{s_x(n,m)^2 + s_y(n,m)^2}$$
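In NumPy/SciPy this weighting is just the Sobel gradient magnitude of the image, a minimal sketch:

import numpy as np
from scipy.ndimage import sobel

def contrast_importance(img):
    # w(n, m) = g(n, m): local gradient magnitude as a perceptual importance map
    f = img.astype(np.float64)
    return np.hypot(sobel(f, axis=1), sobel(f, axis=0))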
-
Complete Structural Similarity Evaluation
[Diagram: Reference and Test images → structural similarity map and perceptual importance map → weighted pooling (Σ) → global score, e.g. Q = 0.56]
-
Quality Based Pooling
An interesting observation from quality research: people devote more attention to areas of poor quality
So the perceptual importance of poor quality regions is higher
Local quality can therefore be used to determine perceptual importance
The same is true in video, where the worst frames determine quality
[Figure: UoM, LIVE]
-
Video Quality Evaluation
Obviously harder than image quality, but closely related
The simplest approach is to apply IQ metrics on each frame, but that may ignore some important temporal information
Many IQ models can be adapted to work on dynamic information
-
Structural Similarity Video Quality
Measure how well the true scene information is represented in the degraded video, as a proxy for subjective impression
1. Estimate local structural similarity across space and time, between all locations and frames of the Reference and Test videos
2. Quantify perceived information loss at each location and time
3. Pool scores spatially across the entire field of view into frame quality scores
4. Pool frame scores into a single video quality score
-
Gradient Preservation Video Quality
[Diagram: the input and output sequences pass through spatial, temporal and chromaticity information extraction; the corresponding spatial, temporal and chromaticity information loss models, weighted by a spatio-temporal perceptual importance evaluation, feed the visual information loss/preservation model, which produces the video quality performance score]
-
Information Extraction: Colour and Space
Transform the RGB video to a more suitable space, say HSV
Use gradient operators to extract spatial structure from the intensity (value) channel
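One way to sketch this step, assuming scikit-image for the colour conversion and Sobel operators for the spatial gradients:

import numpy as np
from scipy.ndimage import sobel
from skimage.color import rgb2hsv

def spatial_information(rgb_frame):
    # Convert to HSV and extract spatial gradients from the value (intensity) channel
    value = rgb2hsv(rgb_frame)[..., 2]
    sx = sobel(value, axis=1)
    sy = sobel(value, axis=0)
    return np.hypot(sx, sy), np.arctan2(sy, sx)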
-
Information Extraction: Time
Use a temporal operator analogous to Sobel, evaluated over 3 subsequent frames at all locations
The broader time base adds robustness to noise
[Figure: temporal gradient g_t vs. inter-frame difference]
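A hedged sketch of a temporal gradient over three consecutive frames; the simple central-difference kernel below stands in for the temporal operator mentioned on the slide:

import numpy as np

def temporal_gradient(frames):
    # frames: three consecutive greyscale frames stacked as an array of shape (3, H, W)
    # [-1, 0, 1]/2 derivative across time, evaluated at the middle frame
    f = np.asarray(frames, dtype=np.float64)
    return 0.5 * (f[2] - f[0])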
-
Information Loss Models
Analogous to the IQ metric, at each location compare between the videos:
Spatial gradient magnitude and orientation
Temporal gradient magnitude
Colour vectors (2D), giving a colour distance δC_{n,m,t}
Process with perceptual non-linearities
[Figure: spatial and temporal perceptual response curves]
-
Multi-dimensional Local Quality Maps
Once again local quality maps are produced, this time in three different dimensions: space, time and colour
[Figure: spatial and temporal structure similarity maps for a reference and a compressed test sequence]
-
Combining the Quality Dimensions
Q_s, Q_t and Q_c define the preservation of spatial, temporal and chromatic information at each location
They can be combined linearly using a weighted summation
But which weights to use? Determine them through optimisation against subjectively annotated videos
$$VQ^{AB} = k_s Q_s^{AB} + k_t Q_t^{AB} + k_c Q_c^{AB}$$

Optimal $[k_s, k_t, k_c] = [0.8, 0.15, 0.05]$
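As a one-line sketch of this combination, with the optimal weights quoted above as defaults:

import numpy as np

def combined_quality(q_s, q_t, q_c, k=(0.8, 0.15, 0.05)):
    # Weighted linear combination of spatial, temporal and chromatic preservation maps
    k_s, k_t, k_c = k
    return k_s * np.asarray(q_s) + k_t * np.asarray(q_t) + k_c * np.asarray(q_c)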
-
Pooling into Frame Scores
Pool local quality estimates for each pixel in each frame
Define perceptual importance at each pixel; we now have not only spatial but temporal contrast too
Use these weights to pool local information preservation estimates into a frame quality estimate
$$w_{n,m,T} = g_{n,m,T} + gt_{n,m,T}$$

$$Q^{AB}_T = \frac{\sum_{\forall n,m} w(n,m,T)\,Q^{AB}(n,m,T)}{\sum_{\forall n,m} w(n,m,T)}$$
-
Video Sequence Quality
We pool frame quality scores into a global quality score
We use the p% worst frames in the sequence to determine the global objective quality estimate
A compromise estimate for p is 20%
$$VQ^{AB} = \frac{1}{N_p}\sum_{T\,\in\,\text{worst }p\%\text{ frames}} VQ^{AB}_T$$
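A minimal sketch of this worst-p% pooling rule in NumPy, with frame_scores being the per-frame quality estimates from the previous step:

import numpy as np

def sequence_quality(frame_scores, p=20.0):
    # Average the worst p% of the frame scores (p = 20 is the compromise from the slide)
    scores = np.sort(np.asarray(frame_scores, dtype=np.float64))
    n_worst = max(1, int(round(len(scores) * p / 100.0)))
    return float(np.mean(scores[:n_worst]))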
-
Video Quality over Time
Video quality is a function of time, i.e. it can change during a sequence
-
Localised Quality Scores
Local spatial, temporal and colour quality estimates provide further insight into video quality
[Figure: spatial, temporal and colour quality maps]
-
How accurate is it?
Extremely high compression, low resolution data
5 scenarios, 4 resolutions, 4 codecs
[Figure: objective score plotted against subjective score]
-
No Reference Quality Evaluation
A much tougher proposition: where to begin? what to measure?
Again, it is done naturally by people
A growing number of methods is available that essentially mimic how we do it:
Natural statistics methods
Machine learning methods
-
Natural Statistics Metrics
We don't have a reference to compare to, but we can learn what natural, good quality images look like
Quality degradations change certain image properties
Learn the statistics of these properties from pristine and distorted images: natural scene statistics
[Saad et al. '12]
MSCN: mean subtracted contrast normalized coefficients
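A common sketch of the MSCN computation (local mean and deviation via Gaussian filtering, as used in BRISQUE-style metrics); the window parameters below are typical values, not necessarily those of the cited work:

import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(img, sigma=7.0 / 6.0, eps=1.0):
    # Mean subtracted contrast normalized coefficients:
    # subtract a local Gaussian-weighted mean and divide by the local deviation
    x = img.astype(np.float64)
    mu = gaussian_filter(x, sigma)
    var = gaussian_filter(x * x, sigma) - mu ** 2
    sd = np.sqrt(np.maximum(var, 0.0))
    return (x - mu) / (sd + eps)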
-
Natural Statistics Approaches
Several approaches are available, using mainly local structural properties throughout the scene:
DCT coefficient statistics
Mean subtracted contrast normalized coefficients
Discrete Wavelet transform coefficients
Gabor filter responses
Measure the distance between the natural and test image distributions
[Saad et al. '12, Mittal et al. '12, Su et al. '13, ...]