Why Visual Quality Assessment? - Lina Karam, lina.faculty.asu.edu/eee508/lectures/eee508_vqa.pdf
TRANSCRIPT
-
Sample image- and video-based applications
• Entertainment
• Communications
• Medical imaging
• Security
• Monitoring
• Visual sensing and control
• Art
Why Visual Quality Assessment?
Copyright 2010 by Lina J. Karam
-
What is Quality?
• Fidelity
• Satisfaction
• Performance
• Aesthetic
• Diagnostic
• Other
Some uses of Quality Assessment
• Monitoring & improving the quality of service (QoS) and quality of experience (QoE)
• Performance evaluation
• Improved operation
• Perceptually improved design
• Authentication
-
Quality affected by
• Sensing, capturing devices
• Display, printing, reproduction
• Attacks and Protection
• Compression
• Transmission
• Environment
• Human vision
• Viewing position
-
Basic Imaging System
[Diagram: Imaged Scene → Imaging Device → Digitizer (sampling + quantization) → Storage → Processing (enhancement, restoration, compression for transmission)]
Quality of captured image depends on:
• Imaging optics, sensors, and electronics
• “Color” filter characteristics
• Digitization
• Processing
• Compression
-
Basic Imaging System
• Different storage and transmission media depending on application
• Multimedia applications over wireless portable devices gaining popularity: limited bandwidth and storage
- Video over IP
- Portable devices: power issues, in addition to shared bandwidth and an error-prone environment, result in much lower data-rate transfer
- Harsh environments and security: operation under very low power and very low bandwidth, below 20 kbit/s
• Data storage devices (CDs and DVDs): data throughput (read and write rates, a few megabits per second) is much lower than storage capacity (a few gigabytes)
- 1x Blu-ray: 32 Mbps
-
Compression Artifacts
Image and video coding standards
• Transform based
• Block-based DCT coding: JPEG, MPEGx, H.26x
• Wavelet-based coding: JPEG 2000
• Motion compensation for video
• Quantization
-
EUVIP 2010
Common Compression Artifacts
• Blocking artifacts in block-based DCT codecs
• Ringing artifacts in wavelet-based codecs
• Blurriness – loss of detail and sharpness due to removal of high frequency transform coefficients
• Graininess – due to quantization of retained transform coefficients
• Contouring
• Color bleeding
• Mosquito noise in video
• Motion jerkiness in video
• Ghosting
• Flickering
-
Compression Artifacts
Degradations due to block-based DCT transform coding
-
Compression Artifacts
http://www.elecard.com/products/j2kwavelet.php
JPEG: 10,696 bytes, 757×507 Butterfly
JPEG 2000: 10,436 bytes, 757×507 Butterfly
-
Common Compression Artifacts
Ringing
Mosquito Noise
-
Compression Artifacts
-
Compression Artifacts
-
Human Vision and Perception
Quality affected by the human visual system
• Characteristics and limitations of the human visual system
• Some distortions are introduced
• Some distortions are masked
Saliency – visual attention
• Faces in images, eyes, mouth
• High-contrast objects
• Motion
• Snakes….
-
Objective Visual Quality Models and Metrics
Goal: automatically and “reliably” estimate the quality of visual media
Subjective assessments are expensive and not practical for real-time implementations
Subjective tests are important for evaluating the performance of objective visual quality metrics
Subjective tests need to follow strict and repeatable evaluation conditions
ITU-T recommendations: www.itu.int/ITU-T/Publications/recs.html
Video Quality Experts Group (VQEG) reports: www.vqeg.org
-
EEE 508
Visual Quality Assessment
• Image/Video fidelity criteria
– Useful for
• rating performance of image/video processing techniques
• measuring image/video quality and user satisfaction
– Issues:
• Viewing distance
• Subjective versus objective measures in evaluating image/video quality
-
Image Quality Assessment – Subjective criteria:
• Use rating scales – goodness scales (rate image quality)

  Overall, global scale      Group scale
  Excellent (5)              Best (7)
  Good (4)                   Well above average (6)
  Fair (3)                   Slightly above average (5)
  Poor (2)                   Average (4)
  Unsatisfactory (1)         Slightly below average (3)
                             Well below average (2)
                             Worst (1)
– Impairment scales (rates an image based on level of degradation present in image compared to ideal image; useful in applications such as image coding and compression)
Not noticeable (1)
Just noticeable (2)
Definitely noticeable but only slight impairment (3)
Impairment not objectionable (4)
Somewhat objectionable (5)
Definitely objectionable (6)
Extremely objectionable (7)
• MOS (Mean Opinion Score) calculates average rating of observers
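The MOS averaging described above can be sketched in a few lines; the z-score normalization step (often used in subjective studies to compensate for differences between observers' use of the scale) is included as a common companion step, not something this slide prescribes:

```python
import statistics

def mean_opinion_score(ratings):
    """MOS: average of the raw ratings given by the observers for one image."""
    return statistics.mean(ratings)

def z_scores(ratings):
    """Normalize one observer's ratings to zero mean and unit variance."""
    mu = statistics.mean(ratings)
    sigma = statistics.stdev(ratings)
    return [(r - mu) / sigma for r in ratings]

# Five observers rate one image on the 5-point goodness scale
print(mean_opinion_score([5, 4, 4, 3, 5]))  # 4.2
```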
-
Visual Quality Assessment – Traditional Quantitative criteria:
• The most common traditional quantitative criteria are based on the mean square error (MSE) norm.
• In most applications, the mean square error is expressed in terms of a Signal-to-Noise Ratio (SNR), defined in decibels (dB):

  SNR (dB) = 10 log10( σ² / σ²_mse )

where σ² is the original image variance and σ²_mse is the error variance (MSE):

  σ²_mse = E[ (I_o(i,j) − I_p(i,j))² ]

often approximated by the average least-squares error:

  σ²_mse ≈ (1/(MN)) Σ_{i=1..M} Σ_{j=1..N} ( I_o(i,j) − I_p(i,j) )²

with I_o the original image and I_p the processed image.
-
Visual Quality Assessment – Traditional Quantitative criteria:
• Other types of SNR used in image coding applications:
- Peak-to-Peak SNR (dB) = PPSNR:

  PPSNR (dB) = 10 log10( (peak-to-peak value of reference image)² / σ²_e )

- Peak SNR (dB) = PSNR (more commonly used):

  PSNR (dB) = 10 log10( (peak value of reference image)² / σ²_e )

• PSNR generally results in values 12 to 15 dB above the SNR value
• SNR and PSNR are widely used measures of quality; they usually correlate well with perceptual quality in image coding applications at high or very low bit rates, but they may not correlate well at intermediate low bit rates
• Commonly used because of mathematical tractability (easy to compute and handle when developing image processing algorithms)
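A minimal sketch of the three measures above, assuming 8-bit grayscale images held as NumPy arrays (the function names are illustrative):

```python
import numpy as np

def mse(ref, proc):
    """Average least-squares error between original and processed images."""
    diff = np.asarray(ref, dtype=np.float64) - np.asarray(proc, dtype=np.float64)
    return float(np.mean(diff ** 2))

def snr_db(ref, proc):
    """SNR in dB: original image variance over the error variance (MSE)."""
    return 10.0 * np.log10(np.var(np.asarray(ref, dtype=np.float64)) / mse(ref, proc))

def psnr_db(ref, proc, peak=255.0):
    """Peak SNR in dB; peak = 255 for 8-bit images."""
    return 10.0 * np.log10(peak ** 2 / mse(ref, proc))
```

Because PSNR divides by a fixed peak value rather than the image variance, it sits above SNR by a content-dependent offset; the 12-15 dB gap quoted above is what is typically observed for natural images.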
-
Image Quality Assessment
[Figure: two processed images, RMSE = 8.5 and RMSE = 9.0]
-
Design and Evaluation of Quality Metrics
[Block diagram: raw content from a visual content database is processed into test content; subjective testing on reference and test content yields raw scores or Z scores, giving the Mean Opinion Score (MOS) or DMOS; the objective visual quality metric M produces a predicted MOS (MOSp), optionally mapped through a nonlinear logistic function; statistical analysis of subjective MOS versus predicted MOS gives the performance assessment.]
[1] LIVE Database, http://live.ece.utexas.edu/research/quality/
-
Performance Evaluation of Quality Metrics
Popular performance evaluation measures
• Pearson Correlation Coefficient (PCC): measures prediction accuracy, i.e., the ability of the metric to predict the subjective MOS with low error
• Spearman Rank Order Correlation Coefficient (SROCC): measures prediction monotonicity, i.e., whether an increase (decrease) in one variable results in an increase (decrease) in the other, independent of the magnitude of the change
• Outlier Ratio (OR): measures consistency, i.e., the degree to which the metric maintains its prediction accuracy; defined as the percentage of predictions falling outside ±2 standard deviations of the subjective results
Other measures
• RMSE and MAE of objective scores
• Hypothesis testing and F statistics
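The three measures above can be sketched with NumPy alone; the rank function below ignores ties, which is a simplification (standard implementations average tied ranks):

```python
import numpy as np

def pearson(x, y):
    """PCC: prediction accuracy (linear agreement with subjective MOS)."""
    return float(np.corrcoef(x, y)[0, 1])

def _ranks(x):
    """Rank values 1..n (no tie handling, for illustration only)."""
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(1, len(x) + 1)
    return r

def spearman(x, y):
    """SROCC: prediction monotonicity (PCC computed on the ranks)."""
    return pearson(_ranks(np.asarray(x)), _ranks(np.asarray(y)))

def outlier_ratio(mos, predicted, mos_std):
    """OR: fraction of predictions outside MOS +/- 2 standard deviations."""
    mos, predicted, mos_std = (np.asarray(a, dtype=np.float64)
                               for a in (mos, predicted, mos_std))
    return float(np.mean(np.abs(predicted - mos) > 2.0 * mos_std))
```

A monotone but nonlinear predictor illustrates the difference: `spearman([1,2,3,4], [1,4,9,16])` is exactly 1, while the Pearson coefficient of the same data falls below 1.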
-
Visual Quality Databases
What is a visual quality database?
- A set of images/videos (typically with varying content)
- Subjective assessment scores
Why are visual quality databases needed?
- To assess the performance of objective (automatic) quality assessment methods and compare their performance
- To understand human visual perceptual properties
-
Existing Image Quality Databases
LIVE Image (Release 2)
• JPEG compressed images (169 images)
• JPEG2000 compressed images (175 images)
• Gaussian blur (145 images)
• White noise (145 images)
• Bit errors in JPEG2000 bit stream (145 images)
Tampere Image Database 2008 (TID 2008)
• 25 reference images x 17 types of distortions x 4 levels of distortions
IRCCyN/IVC Database
10 original images, 235 distorted images generated from 4 different distortion types (JPEG, JPEG 2000, Rayleigh fading, blurring)
Toyama Database
14 original images, 168 distorted images generated from 2 distortion types (JPEG, JPEG 2000)
-
Existing Video Quality Databases
VQEG
• H.263 compression
• MPEG-2 compression
LIVE Video
• MPEG-2 compression
• H.264 compression
• Simulated transmission of H.264 compressed bitstreams through error-prone IP networks and through error-prone wireless networks
-
Objective Visual Quality Models and Metrics
• Full Reference (FR): reference + test → FR objective metric → quality
• Reduced Reference (RR): features extracted from the reference + test → RR objective metric → quality
• No Reference (NR): test only → NR objective metric → quality
-
Objective Visual Quality Models and Metrics
• Full Reference (FR): reference + test → FR objective metric → quality
Example application: camera calibration/tuning
-
Objective Visual Quality Models and Metrics
• Reduced Reference (RR): sample features from the reference + test → RR objective metric → quality
-
Objective Visual Quality Models and Metrics
• No Reference (NR): test only → NR objective metric → quality
-
Objective Visual Quality Models and Metrics
Approaches: Full Reference, Reduced Reference, No Reference. Each can be based on:
• Perceptual (HVS) models: frequency domain, pixel domain, or hybrid
• Visual media characteristics: natural scene statistics, visual features, or hybrid
• Hybrid combinations of the two
-
Full Reference Perceptual-based Model
[Pipeline: the reference and test each undergo a multi-channel decomposition; locally adaptive detection thresholds (JNDs) are computed at each location in each channel; the difference at each location in each channel is computed and normalized by the local JNDs; the normalized differences are pooled over foveal regions, and all foveal differences are then pooled over the entire image/video into a distortion value D, with quality Q = 1/D.]
Basis of several metrics:
- Watson’s Spatial Standard Observer (SSO) metric
- Watson’s Video Standard Observer (VSO) metric
- Liu, Karam, & Watson JPEG2000 compression distortion quantification and control
- Watson’s DCTune
- Hontsch & Karam DCT-based JPEG compression distortion and control
- Hontsch & Karam perceptually lossless compression
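The normalize-and-pool stage of the pipeline above can be sketched as below. Minkowski (power-law) summation is a common pooling choice; the single pooling stage and the exponent beta = 4 are simplifying assumptions (the models listed pool first over foveal regions, then over the whole image, and each chooses its own exponent), and the channel decompositions and JND maps are assumed to have been computed already:

```python
import numpy as np

def pooled_distortion(ref_channels, test_channels, jnd_maps, beta=4.0):
    """Normalize per-channel differences by the local JNDs, then
    Minkowski-pool them into a single distortion D; quality Q = 1/D."""
    total = 0.0
    for ref, test, jnd in zip(ref_channels, test_channels, jnd_maps):
        normalized = np.abs(np.asarray(ref, float) - np.asarray(test, float)) / jnd
        total += float(np.sum(normalized ** beta))
    d = total ** (1.0 / beta)
    return d, (1.0 / d if d > 0 else float("inf"))
```

With this normalization, a difference of exactly one JND contributes a unit term, so D roughly counts how far the test image sits above the visibility thresholds.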
-
Perceptually lossless compression
[Figure: original image at 8 bits per pixel vs. processed image at 0.35 bits per pixel]
-
Perceptual Quality-based JPEG2K compression
Original image, 8 bits per pixel
-
Perceptual Quality-based JPEG2K compression
Conventional JPEG2K, 0.586 bits per pixel
-
Perceptual Quality-based JPEG2K compression
Perceptual JPEG2K, 0.586 bits per pixel
-
Other FR Metrics based on contrast detection thresholds
• Visual SNR, VSNR (Chandler & Hemami, ITIP, 2007)
• Weighted SNR, WSNR (Mitsa & Varkur, 1993)
• Noise Quality Measure, NQM (Damera-Venkata et al., ITIP, 2000)
-
Quality Metrics based on Natural Scene Statistics
Basic assumption: distortions are not natural in terms of Natural Scene Statistics (NSS).
-
Objective Visual Quality Models and Metrics
Structural SIMilarity (SSIM) Index
The SSIM metric is calculated on various patches of an image. The measure between two patches x and y of size N×N is:

  SSIM(x, y) = [ (2 μ_x μ_y + C1)(2 σ_xy + C2) ] / [ (μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2) ]

where μ_x, μ_y are the means of x and y, σ_x², σ_y² are their variances, σ_xy is the covariance of x and y, and C1, C2 are constants that stabilize the division.
A multi-scale extension gives the Multi-Scale Structural SIMilarity (MS-SSIM) Index.
-
Popular SSIM (Structural SIMilarity) FR Metric (Wang et al., ITIP, 2004)
• The SSIM between two subimages x and y is given by

  SSIM(x, y) = [ (2 μ_x μ_y + C1)(2 σ_xy + C2) ] / [ (μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2) ]

- μ_x and μ_y are the means of x and y; σ_x² and σ_y² are the variances
- σ_xy is the covariance of x and y; C1 and C2 are constants used to stabilize the division
• The SSIM index for an image is the average of the SSIM indices over all subimages
• Extensions: MS-SSIM, CWSSIM, VSSIM, …
• Other FR NSS metrics:
- Universal Quality Index (Wang & Bovik, ISPL, 2002) – earlier SSIM
- Image Fidelity Criterion (Sheikh et al., ITIP, 2005) – GSM in the wavelet domain
- Visual Information Fidelity (Sheikh et al., ITIP, 2006) – adds HVS
• RR NSS metric: Reduced Reference Image Quality Assessment (Wang & Simoncelli, 2005)
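A single-patch sketch of the SSIM formula: the stabilizing constants C1 = (K1·L)² and C2 = (K2·L)², with K1 = 0.01, K2 = 0.03, and L the dynamic range, follow the Wang et al. paper cited below; a full implementation slides a local window over the image and averages the per-window values:

```python
import numpy as np

def ssim_patch(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM between two same-size patches x and y."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * dynamic_range) ** 2  # stabilizes the mean (luminance) term
    c2 = (k2 * dynamic_range) ** 2  # stabilizes the variance/covariance term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = float(((x - mu_x) * (y - mu_y)).mean())
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

Identical patches score 1; structurally dissimilar patches score much lower, and anti-correlated patches can score below zero because the covariance term goes negative.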
-
Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004. http://www.ece.uwaterloo.ca/~z70wang/research/ssim/
Other Sources of Information
-
No Reference Blur Metric: Just-Noticeable Blur and Probability of Detection
Just-Noticeable Blur (JNB) concept: “the minimum amount of perceived blurriness around an edge, given a contrast higher than the Just Noticeable Difference (JND)”.
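The JNB/CPBD idea can be sketched as follows. The psychometric form P = 1 − exp(−(w / w_JNB)^β) and the slope β ≈ 3.6 follow the Ferzli-Karam JNB work cited later in these slides, while the edge detection and the contrast-dependent JNB widths are assumed to have been computed already:

```python
import math

BETA = 3.6  # psychometric-function slope reported in the JNB work (approximate)

def prob_blur_detection(edge_width, jnb_width):
    """Probability of detecting blur at one edge, from its measured width
    and the just-noticeable blur width for the local contrast."""
    return 1.0 - math.exp(-((edge_width / jnb_width) ** BETA))

def cpbd(edge_widths, jnb_widths, p_jnb=0.63):
    """CPBD: fraction of edges whose blur-detection probability stays at
    or below the just-noticeable level (higher CPBD = sharper image)."""
    probs = [prob_blur_detection(w, j) for w, j in zip(edge_widths, jnb_widths)]
    return sum(p <= p_jnb for p in probs) / len(probs)
```

Note that an edge whose width equals its JNB width has detection probability 1 − e⁻¹ ≈ 0.63, which is where the P_JNB = 0.63 threshold comes from.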
-
No Reference Blur Metric: Just-Noticeable Blur and Probability of Detection
CPBD (Cumulative Probability of Blur Detection) Metric
-
No Reference Blur Metric: Just-Noticeable Blur and Probability of Detection
CPBD (Cumulative Probability of Blur Detection) Metric
-
No Reference Blur Metric: Just-Noticeable Blur and Probability of Detection
Performance evaluation of CPBD using the LIVE Database:
Set 1: all 174 Gaussian-blurred images in LIVE
Set 2: 30 Gaussian-blurred images with varying foreground and background blur amounts
Set 3: all 227 JPEG 2000 compressed images in LIVE
-
No Reference Blur Metric: Just-Noticeable Blur and Probability of Detection
Performance evaluation of CPBD using TID 2008 Database
-
No Reference Blur Metric: Just-Noticeable Blur and Probability of Detection
Performance evaluation of CPBD using IVC Database
Performance evaluation of CPBD using Toyama Database
-
Other Sources of Information
• R. Ferzli and L. J. Karam, “A No-Reference Objective Image Sharpness Metric Based on the Notion of Just Noticeable Blur (JNB),” IEEE Transactions on Image Processing, vol. 18, no. 4, pp. 717-728, Apr. 2009.
• N. D. Narvekar and L. J. Karam, “A No-Reference Image Blur Metric Based on the Cumulative Probability of Blur Detection (CPBD),” IEEE Transactions on Image Processing, vol. 20, no. 9, pp. 2678-2683, Sept. 2011.
• http://ivulab.asu.edu/Quality
-
• Existing still-image quality assessment metrics can be applied to video frame by frame, pooling the per-frame scores over frames
• PVQM (Swisscom/KPN): leader in the VQEG Phase 1 study; uses a linear combination of three distortion indicators, namely edginess, temporal decorrelation, and color error, to measure perceptual quality (visual-feature based, with weighted combinations of distortion indicators related to these features)
• VQM (NTIA): leader in the VQEG Phase 2 study and standardized by ITU-T and ISO; provides several quality models, such as the Television Model, the General Model, and the Video Conferencing Model, with several calibration options prior to feature extraction (visual-feature based, with weighted combinations of distortion indicators related to the features); the main impairments considered in the General Model include blurring, block distortion, jerky/unnatural motion, noise, and error blocks
Competitive FR Video Quality Metrics
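The frame-by-frame approach in the first bullet above can be sketched generically; the choice of per-frame metric and the use of simple mean pooling over time are assumptions for illustration, since many temporal pooling strategies exist:

```python
import numpy as np

def pooled_video_quality(ref_frames, test_frames, frame_metric):
    """Apply a still-image quality metric to each (reference, test) frame
    pair and pool the per-frame scores by averaging over time."""
    scores = [frame_metric(r, t) for r, t in zip(ref_frames, test_frames)]
    return float(np.mean(scores))
```

Any per-frame callable works as `frame_metric`, e.g. a PSNR or SSIM function; mean pooling ignores temporal effects such as flicker and jerkiness, which is exactly the limitation dedicated video metrics try to address.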
-
• PEVQ (Opticom): leader in the VQEG Multimedia Phase 1 study; builds upon PVQM; became part of ITU-T Recommendation J.247 (FR multimedia video, 2008)
• MOVIE index (Seshadrinathan & Bovik, ITIP, 2009): spatio-temporal multi-channel decomposition, visual masking, and temporal quality assessed along computed motion trajectories; builds on principles from SSIM and VIF
-
• Issue with current video quality metrics: the results of existing still-image quality assessment metrics applied to video are very competitive with state-of-the-art video quality metrics
• Better video quality models are needed.
Performance on LIVE Video Database