
Draft VQEG Hybrid Testplan Version 1.4

Hybrid Perceptual/Bitstream Group TEST PLAN

Draft Version 1.7, Jan. 25, 2010

Contacts:
Jens Berger (Co-Chair) Tel: +41 32 685 0830 Email: [email protected]
Chulhee Lee (Co-Chair) Tel: +82 2 2123 2779 Email: [email protected]
David Hands (Editor) Tel: +44 (0)1473 648184 Email: [email protected]
Nicolas Staelens (Editor) Tel: +32 9 331 49 75 Email: [email protected]
Yves Dhondt (Editor) Tel: +32 9 331 49 85 Email: [email protected]
Margaret Pinson (Editor) Tel: +1 303 497 3579 Email: [email protected]


Editor's Note: unresolved issues or missing data are annotated by the string <<XXX>>


Editorial History

Version Date Nature of the modification

1.0 May 9, 2007 Initial Draft, edited by A. Webster (from Multimedia Testplan 1.6)

1.1 Revised First Draft, edited by David Hands and Nicolas Staelens

1.1a September 13, 2007 Edits approved at the VQEG meeting in Ottawa.

1.2 July 14, 2008 Revised by Chulhee Lee and Nicolas Staelens using some of the outputs of the Kyoto VQEG meeting

1.3 Jan. 4, 2009 Revised by Chulhee Lee, Nicolas Staelens and Yves Dhondt using some of the outputs of the Ghent VQEG meeting

1.4 June 10, 2009 Revised by Chulhee Lee using some of the outputs of the San Jose VQEG meeting

1.5 June 23, 2009 The previous decisions are incorporated.

1.6 June 24, 2009 Additional changes are made.

1.7 Jan. 25, 2010 Revised by Chulhee Lee using the outputs of the Berlin VQEG meeting

1.8 Jan. 28, 2010 Revised by Chulhee Lee using the outputs of the Boulder VQEG meeting

1.9 Jun. 30, 2010 Revised by Chulhee Lee during the Krakow VQEG meeting

2.0 Oct. 25, 2010 Revised by Margaret Pinson


Contents

1. Introduction 6

2. List of Definitions 7

3. List of Acronyms 8

4. Overview: ILG, Proponents, Tasks and Schedule 9

4.1 Division of Labor 9
4.1.1 Independent Laboratory Group (ILG) 9
4.1.2 Proponent Laboratories 10
4.1.3 VQEG 11

4.2 Overview 11
4.2.1 Compatibility Test Phase: Training Data 11
4.2.2 Testplan Design 11
4.2.3 Evaluation Phase 11
4.2.4 Common Set 12

4.3 Publication of Subjective Data, Objective Data, and Video Sequences 12

4.4 Test Schedule 12

4.5 Advice to Proponents on Pre-Model Submission Checking 13

6. SRC Video Restrictions and Video File Format 15

6.1 Source Sequence Processing Overview and Restrictions 15

6.2 SRC Resolution, Frame Rate and Duration 15

6.3 Source Test Material Requirements: Quality, Camera, Use Restrictions. 16

6.4 Source Conversion 16
6.4.1 Software Tools 16
6.4.2 Colour Space Conversion 16
6.4.3 De-Interlacing 17
6.4.4 Cropping & Rescaling 17

6.5 Video File Format: Uncompressed AVI in UYVY 18

6.6 Source Test Video Sequence Documentation 19

6.7 Test Materials and Selection Criteria 19

7. HRC Creation and Sequence Processing 21

7.1 Reference Encoder, Decoder, Capture, and Stream Generator 21


7.2 Video Bit-Rates (examples) <<XXX>> 21

7.3 Frame Rates <<XXX>> 22

7.4 Pre-Processing 22

7.5 Post-Processing 22

7.6 Coding Schemes 22

7.7 Rebuffering 23

7.8 Transmission Errors 23
7.8.1 Simulated Transmission Errors 23
7.8.2 Live Network Conditions 25

7.9 PVS Editing 25

8. Calibration and Registration 27

9. Experiment Design 30

9.1 Video Sequence Naming Convention <<XXX>> 30

9.2 Check on Bit-stream Validity 31

10. Subjective Evaluation Procedure 32

10.1 The ACR Method with Hidden Reference 32
10.1.1 General Description 32
10.1.2 Viewing Distance, Number of Viewers per Monitor, and Viewer Position 33

10.2 Display Specification and Set-up 33
10.2.1 QVGA and WVGA Requirements 33
10.2.2 HD Monitor Requirements 34
10.2.3 Viewing Conditions 35

10.3 Subjective Test Video Playback 36

10.4 Evaluators (Viewers) 36
10.4.2 Subjective Experiment Sessions 37
10.4.3 Randomization 38
10.4.4 Test Data Collection 38

10.5 Results Data Format 38

11. Objective Quality Models 40

11.1 Model Type and Model Requirements 41

11.2 Model Input and Output Data Format 42
11.2.1 No-Reference Hybrid Perceptual Bit-Stream Models and No-Reference Models 42
11.2.2 Full reference hybrid perceptual bit-stream models 42
11.2.3 Reduced-reference Hybrid Perceptual Bit-stream Models 43
11.2.4 Output File Format – All Models 44

11.3 Submission of Executable Model 44


11.4 Registration 45

12. Objective Quality Model Evaluation Criteria <<XXX>> 46

12.1 Post Subjective Testing Elimination of SRC or PVS 46

12.2 Evaluation Procedure 46

12.3 PSNR <<XXX>> 47

12.4 Data Processing 47
12.4.1 Video Clips and Scores Used in Analysis 47
12.4.2 Inspection of Viewer Data 48
12.4.3 Calculating DMOS Values 48
12.4.4 Mapping to the Subjective Scale 48
12.4.5 Averaging Process 49
12.4.6 Aggregation Procedure 49

12.5 Evaluation Metrics 49
12.5.1 Pearson Correlation Coefficient 50
12.5.2 Root Mean Square Error 50
12.5.3 Outlier Ratio 51

12.6 Statistical Significance of the Results 52
12.6.1 Significance of the Difference between the Correlation Coefficients 52
12.6.2 Significance of the Difference between the Root Mean Square Errors 52
12.6.3 Significance of the Difference between the Outlier Ratios 53

13. Recommendation 54

14. Bibliography 55

ANNEX I Instructions to the Evaluators 56

ANNEX II Background and Guidelines on Transmission Errors 58

ANNEX III Fee and Conditions for receiving datasets 61

ANNEX IV Method for Post-Experiment Screening of Evaluators 62

ANNEX V. Encrypted Source Code Submitted to VQEG 64

ANNEX VI. Definition and Calculating Gain and Offset in PVSs 65

APPENDIX I. Terms of Reference of Hybrid Models (Scope As Agreed in June, 2009) 66




ILGs and p 41

22 roponents run their subject test & submits results to the ILG. (Model submission + 8 month). 42

23 Proponents submit their objective data. (Model submission + 8 month) 43

24 Verification of submitted models by ILG (Model submission + 9 month) 44

25 ILG distribute subjective and objective data to the proponents and other ILG (Model submission + 9 month) 45

26 Statistical analysis (Model submission + 10 month) 46

27 Draft final report (Model submission + 12 month) 47

28 Approval of final report (Model submission + 12 month 48

6 Sequence Processing and Data Formats 49

6.1 Sequence Processing Overview 496.1.1 Duration of Source Sequences <<XXX>> 506.1.2 Camera and Source Test Material Requirements 506.1.3 Software Tools 506.1.4 Colour Space Conversion 506.1.5 De-Interlacing 516.1.6 Cropping & Rescaling 516.1.7 Rescaling 526.1.8 File Format 526.1.9 Source Test Video Sequence Documentation 53

6.2 Test Materials <<XXX>> 536.2.1 Selection of Test Material (SRC) 54

6.3 Hypothetical Reference Circuits (HRC) <<XXX>> 546.3.1 Video Bit-rates (exemplary) <<XXX>> 556.3.2 Simulated Transmission Errors 556.3.3 Live Network Conditions 576.3.4 Pausing with Skipping and Pausing without Skipping (freezing) for 10s and 15s PVSs 586.3.5 Frame Rates <<XXX>> 596.3.6 Pre-Processing 596.3.7 Post-Processing 596.3.8 Coding Schemes 606.3.9 Processing and Editing Sequences 60

6.4 Reference Decoder <<XXX>> 60

7 Objective Quality Models 63

Hybrid Testplan DRAFT version 1.0. 9 May 200741/223

Draft VQEG Hybrid Testplan Version 1.4

7.1 Model Type 66

7.2 Model Input and Output Data Format 66

7.3 Submission of Executable Model 70

7.4 Registration 70

8 Objective Quality Model Evaluation Criteria <<XXX>> 72

8.1 Evaluation Procedure 72

8.2 PSNR <<XXX>> 72

8.3 Data Processing <<XXX>> 738.3.1 Calculating DMOS Values 738.3.2 Mapping to the Subjective Scale 738.3.3 Averaging Process 748.3.4 Aggregation Procedure 74

8.4 Evaluation Metrics 748.4.1 Pearson Correlation Coefficient 758.4.2 Root Mean Square Error 758.4.3 Outlier Ratio 76

8.5 Statistical Significance of the Results 778.5.1 Significance of the Difference between the Correlation Coefficients 778.5.2 Significance of the Difference between the Root Mean Square Errors 778.5.3 Significance of the Difference between the Outlier Ratios 78

9 Recommendation 79

10 Bibliography 80

Annex I <<XXX>> Instructions to the Evaluators 81

Annex II Example EXCEL Spreadsheet 83

Annex III Background and Guidelines on Transmission Errors 84

ANNEX IV Fee and Conditions for receiving datasets 87

ANNEX V <<XXX>> Subjective Test Software Package and Required Computer Specification 88

ANNEX VI Method for Post-Experiment Screening of Evaluators 91

ANNEX VIII Definition and Calculating Gain and Offset in PVSs 100

1 Overview 102

2 Terms of Reference – Hybrid models 103

Hybrid Testplan DRAFT version 1.0. 9 May 200742/223

Draft VQEG Hybrid Testplan Version 1.4

2.1 Objectives and Application Areas 1032.1.1 Model Types NR/RR/FR 1032.1.2 Target Resolutions 1032.1.3 Target Distortions 1042.1.4 Model Input 1042.1.5 Results 1042.1.6 Model Validation 1042.1.7 Model Disclosure 1042.1.8 Time Frame 104

2.2 Relation to other Standardization Activities <<XXX>> 105

APPENDIX II <<XXX>> EXAMPLE INSTRUCTIONS TO THE SUBJECTS 106

Appendix I.1. Introduction 6

Appendix I.2. List of Definitions 7

Appendix I.3. List of Acronyms 9

Appendix I.4. Subjective Evaluation Procedure 11

Appendix I-4.1. The ACR Method with Hidden Reference 11Appendix I-4.1.1. General Description 11Appendix I-4.1.2. Application across Different Video Formats and Displays 12Appendix I-4.1.3. Display Specification and Set-up 13Appendix I-4.1.7. Test Method 16Appendix I-4.1.8. Length of Sessions 17Appendix I-4.1.9. Evalulators (Viewers) 17Appendix I-4.1.11. Viewing Conditions 18Appendix I-4.1.12. Experiment design 18Appendix I-4.1.13. Randomization 19Appendix I-4.1.14. Test Data Collection <<XXX>> 20

Appendix I-4.2. Data Format 20Appendix I-4.2.1. Results Data Format 20Appendix I-4.2.2. Subjective Data Analysis 21

Appendix I.5. Overview of Test Laboratories, Responsibilities and Schedule 22

Appendix I-5.1. Independent Laboratory Group (ILG) 22

Appendix I-5.2. Proponent Laboratories 22

Appendix I-5.3. Training Data 23

Appendix I-5.4. Test Schedule 23

Appendix I.6. Sequence Processing and Data Formats 25

Appendix I-6.1. Sequence Processing Overview 25Appendix I-6.1.1. Duration of Source Sequences <<XXX>> 26Appendix I-6.1.2. Camera and Source Test Material Requirements 26Appendix I-6.1.3. Software Tools 26Appendix I-6.1.4. Colour Space Conversion 26Appendix I-6.1.5. De-Interlacing 27

Hybrid Testplan DRAFT version 1.0. 9 May 200743/223

Draft VQEG Hybrid Testplan Version 1.4

Appendix I-6.1.6. Cropping & Rescaling 27Appendix I-6.1.7. Rescaling 28Appendix I-6.1.8. File Format 28Appendix I-6.1.9. Source Test Video Sequence Documentation 29

Appendix I-6.2. Test Materials <<XXX>> 29Appendix I-6.2.1. Selection of Test Material (SRC) 30

Appendix I-6.3. Hypothetical Reference Circuits (HRC) <<XXX>> 30Appendix I-6.3.1. Video Bit-rates (exemplary) <<XXX>> 31Appendix I-6.3.2. Simulated Transmission Errors 31Appendix I-6.3.3. Live Network Conditions 33Appendix I-6.3.4. Pausing with Skipping and Pausing without Skipping (freezing) for 10s and 15s PVSs 34Appendix I-6.3.5. Frame Rates <<XXX>> 35Appendix I-6.3.6. Pre-Processing 35Appendix I-6.3.7. Post-Processing 35Appendix I-6.3.8. Coding Schemes 36Appendix I-6.3.9. Processing and Editing Sequences 36

Appendix I-6.4. Reference Decoder <<XXX>> 36

Appendix I.7. Objective Quality Models 39

Appendix I-7.1. Model Type 42

Appendix I-7.2. Model Input and Output Data Format 42

Appendix I-7.3. Submission of Executable Model 46

Appendix I-7.4. Registration 46

Appendix I.8. Objective Quality Model Evaluation Criteria <<XXX>> 48

Appendix I-8.1. Evaluation Procedure 48

Appendix I-8.2. PSNR <<XXX>> 48

Appendix I-8.3. Data Processing <<XXX>> 49Appendix I-8.3.1. Calculating DMOS Values 49Appendix I-8.3.2. Mapping to the Subjective Scale 49Appendix I-8.3.3. Averaging Process 50Appendix I-8.3.4. Aggregation Procedure 50

Appendix I-8.4. Evaluation Metrics 51Appendix I-8.4.1. Pearson Correlation Coefficient 51Appendix I-8.4.2. Root Mean Square Error 51Appendix I-8.4.3. Outlier Ratio 52

Appendix I-8.5. Statistical Significance of the Results 53Appendix I-8.5.1. Significance of the Difference between the Correlation Coefficients 53Appendix I-8.5.2. Significance of the Difference between the Root Mean Square Errors 53Appendix I-8.5.3. Significance of the Difference between the Outlier Ratios 54

Appendix I.9. Recommendation 55

Appendix I.10. Bibliography 56

Annex I <<XXX>> Instructions to the Evaluators 57

Hybrid Testplan DRAFT version 1.0. 9 May 200744/223

Draft VQEG Hybrid Testplan Version 1.4

Annex II Example EXCEL Spreadsheet 59

Annex III Background and Guidelines on Transmission Errors 60

ANNEX IV Fee and Conditions for receiving datasets 63

ANNEX V <<XXX>> Subjective Test Software Package and Required Computer Specification 64

ANNEX VI Method for Post-Experiment Screening of Evaluators 67

ANNEX VIII Definition and Calculating Gain and Offset in PVSs 76

Appendix I.1. Overview 78

Appendix I.2. Terms of Reference – Hybrid models 79

Appendix I-2.1. Objectives and Application Areas 79Appendix I-2.1.1. Model Types NR/RR/FR 79Appendix I-2.1.2. Target Resolutions 79Appendix I-2.1.3. Target Distortions 80Appendix I-2.1.4. Model Input 80Appendix I-2.1.5. Results 80Appendix I-2.1.6. Model Validation 80Appendix I-2.1.7. Model Disclosure 80Appendix I-2.1.8. Time Frame 80

Appendix I-2.2. Relation to other Standardization Activities <<XXX>> 81

APPENDIX II <<XXX>> EXAMPLE INSTRUCTIONS TO THE SUBJECTS 82

Appendix I.1. Introduction 6

Appendix I.2. List of Definitions 7

Appendix I.3. List of Acronyms 9

Appendix I.4. Subjective Evaluation Procedure 11

Appendix I-4.1. The ACR Method with Hidden Reference 11Appendix I-4.1.1. General Description 11Appendix I-4.1.2. Application across Different Video Formats and Displays 12Appendix I-4.1.3. Display Specification and Set-up 13Appendix I-4.1.7. Test Method 16Appendix I-4.1.8. Length of Sessions 17Appendix I-4.1.9. Evalulators (Viewers) 17Appendix I-4.1.11. Viewing Conditions 18Appendix I-4.1.12. Experiment design 18Appendix I-4.1.13. Randomization 19Appendix I-4.1.14. Test Data Collection <<XXX>> 20

Appendix I-4.2. Data Format 20

Hybrid Testplan DRAFT version 1.0. 9 May 200745/223

Draft VQEG Hybrid Testplan Version 1.4

Appendix I-4.2.1. Results Data Format 20Appendix I-4.2.2. Subjective Data Analysis 21

Appendix I.5. Overview of Test Laboratories, Responsibilities and Schedule 22

Appendix I-5.1. Independent Laboratory Group (ILG) 22

Appendix I-5.2. Proponent Laboratories 22

Appendix I-5.3. Training Data 23

Appendix I-5.4. Test Schedule 23

Appendix I.6. Sequence Processing and Data Formats 25

Appendix I-6.1. Sequence Processing Overview 25Appendix I-6.1.1. Duration of Source Sequences <<XXX>> 26Appendix I-6.1.2. Camera and Source Test Material Requirements 26Appendix I-6.1.3. Software Tools 26Appendix I-6.1.4. Colour Space Conversion 26Appendix I-6.1.5. De-Interlacing 27Appendix I-6.1.6. Cropping & Rescaling 27Appendix I-6.1.7. Rescaling 28Appendix I-6.1.8. File Format 28Appendix I-6.1.9. Source Test Video Sequence Documentation 29

Appendix I-6.2. Test Materials <<XXX>> 29Appendix I-6.2.1. Selection of Test Material (SRC) 30

Appendix I-6.3. Hypothetical Reference Circuits (HRC) <<XXX>> 30Appendix I-6.3.1. Video Bit-rates (exemplary) <<XXX>> 31Appendix I-6.3.2. Simulated Transmission Errors 31Appendix I-6.3.3. Live Network Conditions 33Appendix I-6.3.4. Pausing with Skipping and Pausing without Skipping (freezing) for 10s and 15s PVSs 34Appendix I-6.3.5. Frame Rates <<XXX>> 35Appendix I-6.3.6. Pre-Processing 35Appendix I-6.3.7. Post-Processing 35Appendix I-6.3.8. Coding Schemes 36Appendix I-6.3.9. Processing and Editing Sequences 36

Appendix I-6.4. Reference Decoder <<XXX>> 36

Appendix I.7. Objective Quality Models 39

Appendix I-7.1. Model Type 42

Appendix I-7.2. Model Input and Output Data Format 42

Appendix I-7.3. Submission of Executable Model 46

Appendix I-7.4. Registration 46

Appendix I.8. Objective Quality Model Evaluation Criteria <<XXX>> 48

Appendix I-8.1. Evaluation Procedure 48

Appendix I-8.2. PSNR <<XXX>> 48

Hybrid Testplan DRAFT version 1.0. 9 May 200746/223

Draft VQEG Hybrid Testplan Version 1.4

Appendix I-8.3. Data Processing <<XXX>> 49Appendix I-8.3.1. Calculating DMOS Values 49Appendix I-8.3.2. Mapping to the Subjective Scale 49Appendix I-8.3.3. Averaging Process 50Appendix I-8.3.4. Aggregation Procedure 50

Appendix I-8.4. Evaluation Metrics 51Appendix I-8.4.1. Pearson Correlation Coefficient 51Appendix I-8.4.2. Root Mean Square Error 51Appendix I-8.4.3. Outlier Ratio 52

Appendix I-8.5. Statistical Significance of the Results 53Appendix I-8.5.1. Significance of the Difference between the Correlation Coefficients 53Appendix I-8.5.2. Significance of the Difference between the Root Mean Square Errors 53Appendix I-8.5.3. Significance of the Difference between the Outlier Ratios 54

Appendix I.9. Recommendation 55

Appendix I.10. Bibliography 56

Annex I <<XXX>> Instructions to the Evaluators 57

Annex II Example EXCEL Spreadsheet 59

Annex III Background and Guidelines on Transmission Errors 60

ANNEX IV Fee and Conditions for receiving datasets 63

ANNEX V <<XXX>> Subjective Test Software Package and Required Computer Specification 64

Installation and preparation 64Running the program 64Setup-file parameters 65Example of a setup-file 66

ANNEX VI Method for Post-Experiment Screening of Evaluators 68

ANNEX VIII Definition and Calculating Gain and Offset in PVSs 77

Appendix I.1. Overview 79

Appendix I.2. Terms of Reference – Hybrid models 80

Appendix I-2.1. Objectives and Application Areas 80Appendix I-2.1.1. Model Types NR/RR/FR 80Appendix I-2.1.2. Target Resolutions 80Appendix I-2.1.3. Target Distortions 81Appendix I-2.1.4. Model Input 81Appendix I-2.1.5. Results 81Appendix I-2.1.6. Model Validation 81Appendix I-2.1.7. Model Disclosure 81Appendix I-2.1.8. Time Frame 81

Hybrid Testplan DRAFT version 1.0. 9 May 200747/223

Draft VQEG Hybrid Testplan Version 1.4

Appendix I-2.2. Relation to other Standardization Activities <<XXX>> 82

APPENDIX II <<XXX>> EXAMPLE INSTRUCTIONS TO THE SUBJECTS 83

1. Introduction 6

2. List of Definitions 7

3. List of Acronyms 9

4. Subjective Evaluation Procedure 11

4.1. The ACR Method with Hidden Reference 114.1.1. General Description 114.1.2. Application across Different Video Formats and Displays 124.1.3. Display Specification and Set-up 134.1.7. Test Method 164.1.8. Length of Sessions 174.1.9. Evalulators (Viewers) 174.1.11. Viewing Conditions 184.1.12. Experiment design 184.1.13. Randomization 194.1.14. Test Data Collection <<XXX>> 20

4.2. Data Format 204.2.1. Results Data Format 204.2.2. Subjective Data Analysis 21

5. Overview of Test Laboratories, Responsibilities and Schedule 22

5.1. Independent Laboratory Group (ILG) 22

5.2. Proponent Laboratories 22

5.3. Training Data 23

5.4. Test Schedule 23

6. Sequence Processing and Data Formats 25

6.1. Sequence Processing Overview 256.1.1. Duration of Source Sequences <<XXX>> 266.1.2. Camera and Source Test Material Requirements 266.1.3. Software Tools 266.1.4. Colour Space Conversion 266.1.5. De-Interlacing 276.1.6. Cropping & Rescaling 276.1.7. Rescaling 286.1.8. File Format 286.1.9. Source Test Video Sequence Documentation 29

6.2. Test Materials <<XXX>> 296.2.1. Selection of Test Material (SRC) 30

6.3. Hypothetical Reference Circuits (HRC) <<XXX>> 306.3.1. Video Bit-rates (exemplary) <<XXX>> 31

Hybrid Testplan DRAFT version 1.0. 9 May 200748/223

Draft VQEG Hybrid Testplan Version 1.4

6.3.2. Simulated Transmission Errors 316.3.3. Live Network Conditions 336.3.4. Pausing with Skipping and Pausing without Skipping (freezing) for 10s and 15s PVSs 346.3.5. Frame Rates <<XXX>> 356.3.6. Pre-Processing 356.3.7. Post-Processing 356.3.8. Coding Schemes 366.3.9. Processing and Editing Sequences 36

6.4. Reference Decoder <<XXX>> 36

7. Objective Quality Models 39

7.1. Model Type 42

7.2. Model Input and Output Data Format 42

7.3. Submission of Executable Model 46

7.4. Registration 46

8. Objective Quality Model Evaluation Criteria <<XXX>> 48

8.1. Evaluation Procedure 48

8.2. PSNR <<XXX>> 48

8.3. Data Processing <<XXX>> 498.3.1. Calculating DMOS Values 498.3.2. Mapping to the Subjective Scale 498.3.3. Averaging Process 508.3.4. Aggregation Procedure 50

8.4. Evaluation Metrics 508.4.1. Pearson Correlation Coefficient 518.4.2. Root Mean Square Error 518.4.3. Outlier Ratio 52

8.5. Statistical Significance of the Results 538.5.1. Significance of the Difference between the Correlation Coefficients 538.5.2. Significance of the Difference between the Root Mean Square Errors 538.5.3. Significance of the Difference between the Outlier Ratios 54

9. Recommendation 55

10. Bibliography 56

Annex I <<XXX>> Instructions to the Evaluators 57

Annex II Example EXCEL Spreadsheet 59

Annex III Background and Guidelines on Transmission Errors 60

ANNEX IV Fee and Conditions for receiving datasets 63

Hybrid Testplan DRAFT version 1.0. 9 May 200749/223

Draft VQEG Hybrid Testplan Version 1.4

ANNEX V <<XXX>> Subjective Test Software Package and Required Computer Specification 64

Installation and preparation 64Running the program 64Setup-file parameters 65Example of a setup-file 66

ANNEX VI Method for Post-Experiment Screening of Evaluators 68

ANNEX VIII Definition and Calculating Gain and Offset in PVSs 77

11. Overview 79

12. Terms of Reference – Hybrid models 80

12.1. Objectives and Application Areas 8012.1.1. Model Types NR/RR/FR 8012.1.2. Target Resolutions 8012.1.3. Target Distortions 8112.1.4. Model Input 8112.1.5. Results 8112.1.6. Model Validation 8112.1.7. Model Disclosure 8112.1.8. Time Frame 81

12.2. Relation to other Standardization Activities <<XXX>> 82

APPENDIX II <<XXX>> EXAMPLE INSTRUCTIONS TO THE SUBJECTS 83

1. Introduction 6

2. List of Definitions 7

3. List of Acronyms 9

4. Subjective Evaluation Procedure 11

4.1. The ACR Method with Hidden Reference 114.1.1. General Description 114.1.2. Application across Different Video Formats and Displays 124.1.3. Display Specification and Set-up 134.1.7. Test Method 164.1.8. Length of Sessions 174.1.9. Evalulators (Viewers) 174.1.11. Viewing Conditions 184.1.12. Experiment design 184.1.13. Randomization 194.1.14. Test Data Collection <<XXX>> 20

4.2. Data Format 204.2.1. Results Data Format 204.2.2. Subjective Data Analysis 21

5. Overview of Test Laboratories, Responsibilities and Schedule 22

Hybrid Testplan DRAFT version 1.0. 9 May 200750/223

Draft VQEG Hybrid Testplan Version 1.4

5.1. Independent Laboratory Group (ILG) 22

5.2. Proponent Laboratories 22

5.3. Training Data 23

5.4. Test Schedule 23

6. Sequence Processing and Data Formats 25

6.1. Sequence Processing Overview 256.1.1. Duration of Source Sequences <<XXX>> 266.1.2. Camera and Source Test Material Requirements 266.1.3. Software Tools 266.1.4. Colour Space Conversion 266.1.5. De-Interlacing 276.1.6. Cropping & Rescaling 276.1.7. Rescaling 286.1.8. File Format 286.1.9. Source Test Video Sequence Documentation 29

6.2. Test Materials <<XXX>> 296.2.1. Selection of Test Material (SRC) 30

6.3. Hypothetical Reference Circuits (HRC) <<XXX>> 306.3.1. Video Bit-rates (exemplary) <<XXX>> 316.3.2. Simulated Transmission Errors 316.3.3. Live Network Conditions 336.3.4. Pausing with Skipping and Pausing without Skipping (freezing) for 10s and 15s PVSs 346.3.5. Frame Rates <<XXX>> 356.3.6. Pre-Processing 356.3.7. Post-Processing 356.3.8. Coding Schemes 366.3.9. Processing and Editing Sequences 36

6.4. Reference Decoder <<XXX>> 36

7. Objective Quality Models 39

7.1. Model Type 42

7.2. Model Input and Output Data Format 42

7.3. Submission of Executable Model 46

7.4. Registration 46

8. Objective Quality Model Evaluation Criteria <<XXX>> 48

8.1. Evaluation Procedure 48

8.2. PSNR <<XXX>> 48

8.3. Data Processing <<XXX>> 498.3.1. Calculating DMOS Values 498.3.2. Mapping to the Subjective Scale 498.3.3. Averaging Process 508.3.4. Aggregation Procedure 50

Hybrid Testplan DRAFT version 1.0. 9 May 200751/223

Draft VQEG Hybrid Testplan Version 1.4

8.4. Evaluation Metrics 508.4.1. Pearson Correlation Coefficient 518.4.2. Root Mean Square Error 518.4.3. Outlier Ratio 52

8.5. Statistical Significance of the Results 538.5.1. Significance of the Difference between the Correlation Coefficients 538.5.2. Significance of the Difference between the Root Mean Square Errors 538.5.3. Significance of the Difference between the Outlier Ratios 54

9. Recommendation 55

10. Bibliography 56

Annex I <<XXX>> Instructions to the Evaluators 57

Annex II Example EXCEL Spreadsheet 59

Annex III Background and Guidelines on Transmission Errors 60Introduction60Packet switched radio network 60Wireline Internet 60Circuit switched radio network 61Summary of transmission error simulators 61References 62

ANNEX IV Fee and Conditions for receiving datasets 63

ANNEX V <<XXX>> Subjective Test Software Package and Required Computer Specification 64

Installation and preparation 64Running the program 64Setup-file parameters 65Example of a setup-file 66

ANNEX VI Method for Post-Experiment Screening of Evaluators 68

Notes 76

ANNEX VIII Definition and Calculating Gain and Offset in PVSs 77

11. Overview 79

12. Terms of Reference – Hybrid models 80

12.1. Objectives and Application Areas 8012.1.1. Model Types NR/RR/FR 8012.1.2. Target Resolutions 8012.1.3. Target Distortions 8112.1.4. Model Input 8112.1.5. Results 8112.1.6. Model Validation 8112.1.7. Model Disclosure 81

Hybrid Testplan DRAFT version 1.0. 9 May 200752/223

Draft VQEG Hybrid Testplan Version 1.4

12.1.8. Time Frame 81

12.2. Relation to other Standardization Activities <<XXX>> 82

APPENDIX II <<XXX>> EXAMPLE INSTRUCTIONS TO THE SUBJECTS 83

1. Introduction 6

2. List of Definitions 7

3. List of Acronyms 9

4. Subjective Evaluation Procedure 11

4.1. The ACR Method with Hidden Reference 114.1.1. General Description 114.1.2. Application across Different Video Formats and Displays 124.1.3. Display Specification and Set-up 134.1.7. Test Method 164.1.8. Length of Sessions 174.1.9. Evalulators (Viewers) 174.1.11. Viewing Conditions 184.1.12. Experiment design 184.1.13. Randomization 194.1.14. Test Data Collection <<XXX>> 20

4.2. Data Format 204.2.1. Results Data Format 204.2.2. Subjective Data Analysis 21

5. Overview of Test Laboratories, Responsibilities and Schedule 22

5.1. Independent Laboratory Group (ILG) 22

5.2. Proponent Laboratories 22

5.3. Training Data 23

5.4. Test Schedule 23

6. Sequence Processing and Data Formats 25

6.1. Sequence Processing Overview 256.1.1. Duration of Source Sequences <<XXX>> 266.1.2. Camera and Source Test Material Requirements 266.1.3. Software Tools 266.1.4. Colour Space Conversion 266.1.5. De-Interlacing 276.1.6. Cropping & Rescaling 276.1.7. Rescaling 286.1.8. File Format 286.1.9. Source Test Video Sequence Documentation 29

6.2. Test Materials <<XXX>> 296.2.1. Selection of Test Material (SRC) 30

Hybrid Testplan DRAFT version 1.0. 9 May 200753/223

Draft VQEG Hybrid Testplan Version 1.4

6.3. Hypothetical Reference Circuits (HRC) <<XXX>> 306.3.1. Video Bit-rates (exemplary) <<XXX>> 316.3.2. Simulated Transmission Errors 316.3.3. Live Network Conditions 336.3.4. Pausing with Skipping and Pausing without Skipping (freezing) for 10s and 15s PVSs 346.3.5. Frame Rates <<XXX>> 356.3.6. Pre-Processing 356.3.7. Post-Processing 356.3.8. Coding Schemes 366.3.9. Processing and Editing Sequences 36

6.4. Reference Decoder <<XXX>> 36

7. Objective Quality Models 39

7.1. Model Type 42

7.2. Model Input and Output Data Format 42

7.3. Submission of Executable Model 46

7.4. Registration 46

8. Objective Quality Model Evaluation Criteria <<XXX>> 48

8.1. Evaluation Procedure 48

8.2. PSNR <<XXX>> 48

8.3. Data Processing <<XXX>> 498.3.1. Calculating DMOS Values 498.3.2. Mapping to the Subjective Scale 498.3.3. Averaging Process 508.3.4. Aggregation Procedure 50

8.4. Evaluation Metrics 508.4.1. Pearson Correlation Coefficient 518.4.2. Root Mean Square Error 518.4.3. Outlier Ratio 52

8.5. Statistical Significance of the Results 538.5.1. Significance of the Difference between the Correlation Coefficients 538.5.2. Significance of the Difference between the Root Mean Square Errors 538.5.3. Significance of the Difference between the Outlier Ratios 54

9. Recommendation 55

10. Bibliography 56Introduction60Packet switched radio network 60Wireline Internet 60Circuit switched radio network 61Summary of transmission error simulators 61References 62Installation and preparation 64Running the program 64Setup-file parameters 65Example of a setup-file 66

Hybrid Testplan DRAFT version 1.0. 9 May 200754/223

Draft VQEG Hybrid Testplan Version 1.4

Notes 76

11. Overview 79

12. Terms of Reference – Hybrid models 80

12.1. Objectives and Application Areas 8012.1.1. Model Types NR/RR/FR 8012.1.2. Target Resolutions 8012.1.3. Target Distortions 8112.1.4. Model Input 8112.1.5. Results 8112.1.6. Model Validation 8112.1.7. Model Disclosure 8112.1.8. Time Frame 81

12.2. Relation to other Standardization Activities <<XXX>> 82

1. Introduction 6

2. List of Definitions 7

3. List of Acronyms 9

4. Subjective Evaluation Procedure 11

4.1. The ACR Method with Hidden Reference 114.1.1. General Description 114.1.2. Application across Different Video Formats and Displays 124.1.3. Display Specification and Set-up 134.1.7. Test Method 164.1.8. Length of Sessions 174.1.9. Evalulators (Viewers) 174.1.11. Viewing Conditions 184.1.12. Experiment design 184.1.13. Randomization 194.1.14. Test Data Collection <<XXX>> 20

4.2. Data Format 204.2.1. Results Data Format 204.2.2. Subjective Data Analysis 21

5. Overview of Test Laboratories, Responsibilities and Schedule 22

5.1. Independent Laboratory Group (ILG) 22

5.2. Proponent Laboratories 22

5.3. Training Data 23

5.4. Test Schedule 23

6. Sequence Processing and Data Formats 25

6.1. Sequence Processing Overview 25

Hybrid Testplan DRAFT version 1.0. 9 May 200755/223

Draft VQEG Hybrid Testplan Version 1.4

6.1.1. Duration of Source Sequences <<XXX>> 266.1.2. Camera and Source Test Material Requirements 266.1.3. Software Tools 266.1.4. Colour Space Conversion 266.1.5. De-Interlacing 276.1.6. Cropping & Rescaling 276.1.7. Rescaling 286.1.8. File Format 286.1.9. Source Test Video Sequence Documentation 29

6.2. Test Materials <<XXX>> 296.2.1. Selection of Test Material (SRC) 30

6.3. Hypothetical Reference Circuits (HRC) <<XXX>> 306.3.1. Video Bit-rates (exemplary) <<XXX>> 316.3.2. Simulated Transmission Errors 316.3.3. Live Network Conditions 336.3.4. Pausing with Skipping and Pausing without Skipping (freezing) for 10s and 15s PVSs 346.3.5. Frame Rates <<XXX>> 356.3.6. Pre-Processing 356.3.7. Post-Processing 356.3.8. Coding Schemes 366.3.9. Processing and Editing Sequences 36

6.4. Reference Decoder <<XXX>> 36

7. Objective Quality Models 39

7.1. Model Type 42

7.2. Model Input and Output Data Format 42

7.3. Submission of Executable Model 46

7.4. Registration 46

8. Objective Quality Model Evaluation Criteria <<XXX>> 48

8.1. Evaluation Procedure 48

8.2. PSNR <<XXX>> 48

8.3. Data Processing <<XXX>> 498.3.1. Calculating DMOS Values 498.3.2. Mapping to the Subjective Scale 498.3.3. Averaging Process 508.3.4. Aggregation Procedure 50

8.4. Evaluation Metrics 508.4.1. Pearson Correlation Coefficient 518.4.2. Root Mean Square Error 518.4.3. Outlier Ratio 52

8.5. Statistical Significance of the Results 538.5.1. Significance of the Difference between the Correlation Coefficients 538.5.2. Significance of the Difference between the Root Mean Square Errors 538.5.3. Significance of the Difference between the Outlier Ratios 54

9. Recommendation 55

Hybrid Testplan DRAFT version 1.0. 9 May 200756/223

Draft VQEG Hybrid Testplan Version 1.4

10. Bibliography 56Introduction60Packet switched radio network 60Wireline Internet 60Circuit switched radio network 61Summary of transmission error simulators 61References 62Installation and preparation 64Running the program 64Setup-file parameters 65Example of a setup-file 66

Notes 76

11. Overview 79

12. Terms of Reference – Hybrid models 80

12.1. Objectives and Application Areas 8012.1.1. Model Types NR/RR/FR 8012.1.2. Target Resolutions 8012.1.3. Target Distortions 8112.1.4. Model Input 8112.1.5. Results 8112.1.6. Model Validation 81

12.1.7. MODEL DISCLOSURE 8112.1.8. Time Frame 81

12.2. Relation to other Standardization Activities <<XXX>> 82

1. Introduction 6

2. List of Definitions 7

3. List of Acronyms 9

4. Subjective Evaluation Procedure 11

4.1. The ACR Method with Hidden Reference 114.1.1. General Description 114.1.2. Application across Different Video Formats and Displays 124.1.3. Display Specification and Set-up 134.1.7. Test Method 164.1.8. Length of Sessions 174.1.9. Evalulators (Viewers) 174.1.11. Viewing Conditions 184.1.12. Experiment design 184.1.13. Randomization 194.1.14. Test Data Collection <<XXX>> 20

4.2. Data Format 204.2.1. Results Data Format 20

Hybrid Testplan DRAFT version 1.0. 9 May 200757/223

Draft VQEG Hybrid Testplan Version 1.4

4.2.2. Subjective Data Analysis 21

5. Overview of Test Laboratories, Responsibilities and Schedule 22

5.1. Independent Laboratory Group (ILG) 22

5.2. Proponent Laboratories 22

5.3. Training Data 23

5.4. Test Schedule 23

6. Sequence Processing and Data Formats 25

6.1. Sequence Processing Overview 256.1.1. Duration of Source Sequences <<XXX>> 266.1.2. Camera and Source Test Material Requirements 266.1.3. Software Tools 266.1.4. Colour Space Conversion 266.1.5. De-Interlacing 276.1.6. Cropping & Rescaling 276.1.7. Rescaling 286.1.8. File Format 286.1.9. Source Test Video Sequence Documentation 29

6.2. Test Materials <<XXX>> 296.2.1. Selection of Test Material (SRC) 30

6.3. Hypothetical Reference Circuits (HRC) <<XXX>> 306.3.1. Video Bit-rates (exemplary) <<XXX>> 316.3.2. Simulated Transmission Errors 316.3.3. Live Network Conditions 336.3.4. Pausing with Skipping and Pausing without Skipping (freezing) for 10s and 15s PVSs 346.3.5. Frame Rates <<XXX>> 356.3.6. Pre-Processing 356.3.7. Post-Processing 356.3.8. Coding Schemes 366.3.9. Processing and Editing Sequences 36

6.4. Reference Decoder <<XXX>> 36

7. Objective Quality Models 39

7.1. Model Type 42

7.2. Model Input and Output Data Format 42

7.3. Submission of Executable Model 46

7.4. Registration 46

8. Objective Quality Model Evaluation Criteria <<XXX>> 48

8.1. Evaluation Procedure 48

8.2. PSNR <<XXX>> 48

8.3. Data Processing <<XXX>> 49

Hybrid Testplan DRAFT version 1.0. 9 May 200758/223

Draft VQEG Hybrid Testplan Version 1.4

8.3.1. Calculating DMOS Values 498.3.2. Mapping to the Subjective Scale 498.3.3. Averaging Process 508.3.4. Aggregation Procedure 50

8.4. Evaluation Metrics 508.4.1. Pearson Correlation Coefficient 518.4.2. Root Mean Square Error 518.4.3. Outlier Ratio 52

8.5. Statistical Significance of the Results 538.5.1. Significance of the Difference between the Correlation Coefficients 538.5.2. Significance of the Difference between the Root Mean Square Errors 538.5.3. Significance of the Difference between the Outlier Ratios 54

9. Recommendation 55

10. Bibliography 56Introduction60Packet switched radio network 60Wireline Internet 60Circuit switched radio network 61Summary of transmission error simulators 61References 62Installation and preparation 64Running the program 64Setup-file parameters 65Example of a setup-file 66

Notes 76

11. OVERVIEW 79

12. Terms of Reference – Hybrid models 80

12.1. Objectives and Application Areas 8012.1.1. Model Types NR/RR/FR 8012.1.2. Target Resolutions 8012.1.3. Target Distortions 8112.1.4. Model Input 8112.1.5. Results 8112.1.6. Model Validation 8112.1.7. Model Disclosure 8112.1.8. Time Frame 81

12.2. Relation to other Standardization Activities <<XXX>> 82

1. Introduction 6

2. List of Definitions 7

3. List of Acronyms 9

Hybrid Testplan DRAFT version 1.0. 9 May 200759/223

Draft VQEG Hybrid Testplan Version 1.4

4. Subjective Evaluation Procedure 11

4.1. The ACR Method with Hidden Reference 114.1.1. General Description 114.1.2. Application across Different Video Formats and Displays 124.1.3. Display Specification and Set-up 134.1.7. Test Method 164.1.8. Length of Sessions 174.1.9. Evalulators (Viewers) 174.1.11. Viewing Conditions 184.1.12. Experiment design 184.1.13. Randomization 194.1.14. Test Data Collection <<XXX>> 20

4.2. Data Format 204.2.1. Results Data Format 204.2.2. Subjective Data Analysis 21

5. Overview of Test Laboratories, Responsibilities and Schedule 22

5.1. Independent Laboratory Group (ILG) 22

5.2. Proponent Laboratories 22

5.3. Training Data 23

5.4. Test Schedule 23

6. Sequence Processing and Data Formats 25

6.1. Sequence Processing Overview 256.1.1. Duration of Source Sequences <<XXX>> 266.1.2. Camera and Source Test Material Requirements 266.1.3. Software Tools 266.1.4. Colour Space Conversion 266.1.5. De-Interlacing 276.1.6. Cropping & Rescaling 276.1.7. Rescaling 286.1.8. File Format 286.1.9. Source Test Video Sequence Documentation 29

6.2. Test Materials <<XXX>> 296.2.1. Selection of Test Material (SRC) 30

6.3. Hypothetical Reference Circuits (HRC) <<XXX>> 306.3.1. Video Bit-rates (exemplary) <<XXX>> 316.3.2. Simulated Transmission Errors 316.3.3. Live Network Conditions 336.3.4. Pausing with Skipping and Pausing without Skipping (freezing) for 10s and 15s PVSs 346.3.5. Frame Rates <<XXX>> 356.3.6. Pre-Processing 356.3.7. Post-Processing 356.3.8. Coding Schemes 366.3.9. Processing and Editing Sequences 36

6.4. Reference Decoder <<XXX>> 36

7. Objective Quality Models 39

Hybrid Testplan DRAFT version 1.0. 9 May 200760/223

Draft VQEG Hybrid Testplan Version 1.4

7.1. Model Type 42

7.2. Model Input and Output Data Format 42

7.3. Submission of Executable Model 46

7.4. Registration 46

8. Objective Quality Model Evaluation Criteria <<XXX>> 48

8.1. Evaluation Procedure 48

8.2. PSNR <<XXX>> 48

8.3. Data Processing <<XXX>> 498.3.1. Calculating DMOS Values 498.3.2. Mapping to the Subjective Scale 498.3.3. Averaging Process 508.3.4. Aggregation Procedure 50

8.4. Evaluation Metrics 508.4.1. Pearson Correlation Coefficient 518.4.2. Root Mean Square Error 518.4.3. Outlier Ratio 52

8.5. Statistical Significance of the Results 538.5.1. Significance of the Difference between the Correlation Coefficients 538.5.2. Significance of the Difference between the Root Mean Square Errors 538.5.3. Significance of the Difference between the Outlier Ratios 54

9. Recommendation 55

10. Bibliography 56Introduction60Packet switched radio network 60Wireline Internet 60Circuit switched radio network 61Summary of transmission error simulators 61References 62Installation and preparation 64Running the program 64Setup-file parameters 65Example of a setup-file 66

Transformation of source test sequences to UYVY AVI files 71

AviSynth Scripts for the common transformations <<XXX>> 72

UYVY Raw to UYVY AVI 73

UYVY Raw to RGB AVI 73

RGB AVI to UYVY AVI 74

Processing and Editing Sequences 74

Calibration 75

UYVY Decoder to UYVY Raw / UYVY AVI 75

Hybrid Testplan DRAFT version 1.0. 9 May 200761/223

Draft VQEG Hybrid Testplan Version 1.4

Notes 76

11. Overview 79

12. Terms of Reference – Hybrid models 80

12.1. Objectives and Application Areas 8012.1.1. Model Types NR/RR/FR 8012.1.2. Target Resolutions 8012.1.3. Target Distortions 8112.1.4. Model Input 8112.1.5. Results 8112.1.6. Model Validation 8112.1.7. Model Disclosure 8112.1.8. Time Frame 81

12.2. Relation to other Standardization Activities <<XXX>> 82

1. Introduction 6

2. List of Definitions 7

3. List of Acronyms 9

4. Subjective Evaluation Procedure 11

4.1. The ACR Method with Hidden Reference 114.1.1. General Description 114.1.2. Application across Different Video Formats and Displays 124.1.3. Display Specification and Set-up 134.1.7. Test Method 164.1.8. Length of Sessions 174.1.9. Evalulators (Viewers) 174.1.11. Viewing Conditions 184.1.12. Experiment design 184.1.13. Randomization 194.1.14. Test Data Collection <<XXX>> 20

4.2. Data Format 204.2.1. Results Data Format 204.2.2. Subjective Data Analysis 21

5. Overview of Test Laboratories, Responsibilities and Schedule 22

5.1. Independent Laboratory Group (ILG) 22

5.2. Proponent Laboratories 22

5.3. Training Data 23

5.4. Test Schedule 23

6. Sequence Processing and Data Formats 25

6.1. Sequence Processing Overview 25

Hybrid Testplan DRAFT version 1.0. 9 May 200762/223

Draft VQEG Hybrid Testplan Version 1.4

6.1.1. Duration of Source Sequences <<XXX>> 266.1.2. Camera and Source Test Material Requirements 266.1.3. Software Tools 266.1.4. Colour Space Conversion 266.1.5. De-Interlacing 276.1.6. Cropping & Rescaling 276.1.7. Rescaling 286.1.8. File Format 286.1.9. Source Test Video Sequence Documentation 29

6.2. Test Materials <<XXX>> 296.2.1. Selection of Test Material (SRC) 30

6.3. Hypothetical Reference Circuits (HRC) <<XXX>> 306.3.1. Video Bit-rates (exemplary) <<XXX>> 316.3.2. Simulated Transmission Errors 316.3.3. Live Network Conditions 336.3.4. Pausing with Skipping and Pausing without Skipping (freezing) for 10s and 15s PVSs 346.3.5. Frame Rates <<XXX>> 356.3.6. Pre-Processing 356.3.7. Post-Processing 356.3.8. Coding Schemes 366.3.9. Processing and Editing Sequences 36

6.4. Reference Decoder <<XXX>> 36

7. Objective Quality Models 39

7.1. Model Type 42

7.2. Model Input and Output Data Format 42

7.3. Submission of Executable Model 46

7.4. Registration 46

8. Objective Quality Model Evaluation Criteria <<XXX>> 48

8.1. Evaluation Procedure 48

8.2. PSNR <<XXX>> 48

8.3. Data Processing <<XXX>> 498.3.1. Calculating DMOS Values 498.3.2. Mapping to the Subjective Scale 498.3.3. Averaging Process 508.3.4. Aggregation Procedure 50

8.4. Evaluation Metrics 508.4.1. Pearson Correlation Coefficient 518.4.2. Root Mean Square Error 518.4.3. Outlier Ratio 52

8.5. Statistical Significance of the Results 538.5.1. Significance of the Difference between the Correlation Coefficients 538.5.2. Significance of the Difference between the Root Mean Square Errors 538.5.3. Significance of the Difference between the Outlier Ratios 54

9. Recommendation 55

Hybrid Testplan DRAFT version 1.0. 9 May 200763/223

Draft VQEG Hybrid Testplan Version 1.4

10. Bibliography 56Introduction60Packet switched radio network 60Wireline Internet 60Circuit switched radio network 61Summary of transmission error simulators 61References 62Installation and preparation 64Running the program 64Setup-file parameters 65Example of a setup-file 66

Transformation of source test sequences to UYVY AVI files 71

AviSynth Scripts for the common transformations <<XXX>> 72

UYVY Raw to UYVY AVI 73

UYVY Raw to RGB AVI 73

RGB AVI to UYVY AVI 74

Processing and Editing Sequences 74

Calibration 75

UYVY Decoder to UYVY Raw / UYVY AVI 75

Notes 76

11. Overview 79

12. Terms of Reference – Hybrid models 80

12.1. Objectives and Application Areas 8012.1.1. Model Types NR/RR/FR 8012.1.2. Target Resolutions 8012.1.3. Target Distortions 8112.1.4. Model Input 8112.1.5. Results 8112.1.6. Model Validation 8112.1.7. Model Disclosure 8112.1.8. Time Frame 81

12.2. Relation to other Standardization Activities <<XXX>> 82

Table of Contents

Hybrid Testplan DRAFT version 1.0. 9 May 200764/223

Draft VQEG Hybrid Testplan Version 1.4

1. .....................................................................................................1

Hybrid Testplan DRAFT version 1.0. 9 May 200765/223

Draft VQEG Hybrid Testplan Version 1.4

2. .....................................................................................................2

Hybrid Testplan DRAFT version 1.0. 9 May 200766/223

Draft VQEG Hybrid Testplan Version 1.4

3. .....................................................................................................3

Hybrid Testplan DRAFT version 1.0. 9 May 200767/223

Draft VQEG Hybrid Testplan Version 1.4

4. .....................................................................................................4

Hybrid Testplan DRAFT version 1.0. 9 May 200768/223

Draft VQEG Hybrid Testplan Version 1.4

5. .....................................................................................................5

Hybrid Testplan DRAFT version 1.0. 9 May 200769/223

Draft VQEG Hybrid Testplan Version 1.4

6. 6

Hybrid Testplan DRAFT version 1.0. 9 May 200770/223

Draft VQEG Hybrid Testplan Version 1.4

7.

1. Introduction 6

2. List of Definitions 7

3. List of Acronyms 9

4. Subjective Evaluation Procedure 11

4.1. The ACR Method with Hidden Reference 114.1.1. General Description 114.1.2. Application across Different Video Formats and Displays 124.1.3. Display Specification and Set-up 13

4.1.4. QVGA Requirements 134.1.5. SD Requirements 134.1.6. HD Monitor Requirements 15

4.1.7. Test Method 164.1.8. Length of Sessions 174.1.9. Evalulators (Viewers) 17

4.1.10. Instructions for Evaluators and Selection of Valid Evaluators 17

4.1.11. Viewing Conditions 184.1.12. Experiment design 184.1.13. Randomization 194.1.14. Test Data Collection <<XXX>> 20

4.2. Data Format 204.2.1. Results Data Format 204.2.2. Subjective Data Analysis 21

5. Overview of Test Laboratories, Responsibilities and Schedule 22

5.1. Independent Laboratory Group (ILG) 22

5.2. Proponent Laboratories 22

5.3. Training Data 23

5.4. Test Schedule 23

6. Sequence Processing and Data Formats 25

6.1. Sequence Processing Overview 256.1.1. Duration of Source Sequences <<XXX>> 266.1.2. Camera and Source Test Material Requirements 266.1.3. Software Tools 266.1.4. Colour Space Conversion 266.1.5. De-Interlacing 27

Hybrid Testplan DRAFT version 1.0. 9 May 200771/223

Draft VQEG Hybrid Testplan Version 1.4

6.1.6. Cropping & Rescaling 276.1.7. Rescaling 286.1.8. File Format 286.1.9. Source Test Video Sequence Documentation 29

6.2. Test Materials <<XXX>> 296.2.1. Selection of Test Material (SRC) 30

6.3. Hypothetical Reference Circuits (HRC) <<XXX>> 306.3.1. Video Bit-rates (exemplary) <<XXX>> 316.3.2. Simulated Transmission Errors 316.3.3. Live Network Conditions 336.3.4. Pausing with Skipping and Pausing without Skipping (freezing) for 10s and 15s PVSs 346.3.5. Frame Rates <<XXX>> 356.3.6. Pre-Processing 356.3.7. Post-Processing 356.3.8. Coding Schemes 366.3.9. Processing and Editing Sequences 36

6.4. Reference Decoder <<XXX>> 36

7. Objective Quality Models 39

7.1. Model Type 42

7.2. Model Input and Output Data Format 42

7.3. Submission of Executable Model 46

7.4. Registration 46

8. Objective Quality Model Evaluation Criteria <<XXX>> 48

8.1. Evaluation Procedure 48

8.2. PSNR <<XXX>> 48

8.3. Data Processing <<XXX>> 498.3.1. Calculating DMOS Values 498.3.2. Mapping to the Subjective Scale 498.3.3. Averaging Process 508.3.4. Aggregation Procedure 50

8.4. Evaluation Metrics 508.4.1. Pearson Correlation Coefficient 518.4.2. Root Mean Square Error 518.4.3. Outlier Ratio 52

8.5. Statistical Significance of the Results 538.5.1. Significance of the Difference between the Correlation Coefficients 538.5.2. Significance of the Difference between the Root Mean Square Errors 538.5.3. Significance of the Difference between the Outlier Ratios 54

9. Recommendation 55

10. Bibliography 56Introduction60Packet switched radio network 60Wireline Internet 60

Hybrid Testplan DRAFT version 1.0. 9 May 200772/223

Draft VQEG Hybrid Testplan Version 1.4

Circuit switched radio network 61Summary of transmission error simulators 61References 62Installation and preparation 64Running the program 64Setup-file parameters 65Example of a setup-file 66

Transformation of source test sequences to UYVY AVI files 71

AviSynth Scripts for the common transformations <<XXX>> 72

UYVY Raw to UYVY AVI 73

UYVY Raw to RGB AVI 73

RGB AVI to UYVY AVI 74

Processing and Editing Sequences 74

Calibration 75

UYVY Decoder to UYVY Raw / UYVY AVI 75

Notes 76

11. Overview 79

12. Terms of Reference – Hybrid models 80

12.1. Objectives and Application Areas 8012.1.1. Model Types NR/RR/FR 8012.1.2. Target Resolutions 8012.1.3. Target Distortions 8112.1.4. Model Input 8112.1.5. Results 8112.1.6. Model Validation 8112.1.7. Model Disclosure 8112.1.8. Time Frame 81

12.2. Relation to other Standardization Activities <<XXX>> 82

1. Introduction 6

2. List of Definitions 7

3. List of Acronyms 9

4. Subjective Evaluation Procedure 11

4.1. The ACR Method with Hidden Reference 114.1.1. General Description 114.1.2. Application across Different Video Formats and Displays 124.1.3. Display Specification and Set-up 13

4.1.4. QVGA Requirements 13

Hybrid Testplan DRAFT version 1.0. 9 May 200773/223

Draft VQEG Hybrid Testplan Version 1.4

4.1.5. SD Requirements 134.1.6. HD Monitor Requirements 15

4.1.7. Test Method 164.1.8. Length of Sessions 174.1.9. Evalulators (Viewers) 17

4.1.10. Instructions for Evaluators and Selection of Valid Evaluators 17

4.1.11. Viewing Conditions 184.1.12. Experiment design 184.1.13. Randomization 194.1.14. Test Data Collection 20

4.2. Data Format 204.2.1. Results Data Format 204.2.2. Subjective Data Analysis 21

5. Test Laboratories and Schedule 22

5.1. Independent Laboratory Group (ILG) 22

5.2. Proponent Laboratories 22

5.3. Test Schedule 23

6. Sequence Processing and Data Formats 25

6.1. Sequence Processing Overview 256.1.1. Duration of Source Sequences 266.1.2. Camera and Source Test Material Requirements 266.1.3. Software Tools 266.1.4. Colour Space Conversion 266.1.5. De-Interlacing 276.1.6. Cropping & Rescaling 276.1.7. Rescaling 286.1.8. File Format 286.1.9. Source Test Video Sequence Documentation 29

6.2. Test Materials <<XXX>> 296.2.1. Selection of Test Material (SRC) 30

6.3. Hypothetical Reference Circuits (HRC) <<XXX>> 306.3.1. Video Bit-rates (exemplary) <<XXX>> 316.3.2. Simulated Transmission Errors 316.3.3. Live Network Conditions 336.3.4. Pausing with Skipping and Pausing without Skipping (freezing) for 10s and 15s PVSs 346.3.5. Frame Rates <<XXX>> 356.3.6. Pre-Processing 356.3.7. Post-Processing 356.3.8. Coding Schemes 366.3.9. Processing and Editing Sequences 36

6.4. Reference Decoder <<XXX>> 36

7. Objective Quality Models 39

7.1. Model Type 42

7.2. Model Input and Output Data Format 42

Hybrid Testplan DRAFT version 1.0. 9 May 200774/223

Draft VQEG Hybrid Testplan Version 1.4

7.3. Submission of Executable Model 46

7.4. Registration 46

8. Objective Quality Model Evaluation Criteria <<XXX>> 48

8.1. Evaluation Procedure 48

8.2. PSNR <<XXX>> 48

8.3. Data Processing <<XXX>> 498.3.1. Calculating DMOS Values 498.3.2. Mapping to the Subjective Scale 498.3.3. Averaging Process 508.3.4. Aggregation Procedure 50

8.4. Evaluation Metrics 508.4.1. Pearson Correlation Coefficient 518.4.2. Root Mean Square Error 518.4.3. Outlier Ratio 52

8.5. Statistical Significance of the Results 538.5.1. Significance of the Difference between the Correlation Coefficients 538.5.2. Significance of the Difference between the Root Mean Square Errors 538.5.3. Significance of the Difference between the Outlier Ratios 54

9. Recommendation 55

10. Bibliography 56Introduction59Packet switched radio network 59Wireline Internet 60Circuit switched radio network 60Summary of transmission error simulators 60References 62Installation and preparation 64Running the program 64Setup-file parameters 65Example of a setup-file 66

Transformation of source test sequences to UYVY AVI files 71

AviSynth Scripts for the common transformations <<XXX>> 72

UYVY Raw to UYVY AVI 73

UYVY Raw to RGB AVI 73

RGB AVI to UYVY AVI 74

Processing and Editing Sequences 74

Calibration 75

UYVY Decoder to UYVY Raw / UYVY AVI 75

Notes 76

Hybrid Testplan DRAFT version 1.0. 9 May 200775/223

Draft VQEG Hybrid Testplan Version 1.4

11. Overview 79

12. Terms of Reference – Hybrid models 80

12.1. Objectives and Application Areas 8012.1.1. Model Types NR/RR/FR 8012.1.2. Target Resolutions 8012.1.3. Target Distortions 8112.1.4. Model Input 8112.1.5. Results 8112.1.6. Model Validation 8112.1.7. Model Disclosure 8112.1.8. Time Frame 81

12.2. Relation to other Standardization Activities <<XXX>> 82

1. Introduction 6

2. List of Definitions 7

3. List of Acronyms 9

4. Subjective Evaluation Procedure 11

4.1. The ACR Method with Hidden Reference 114.1.1. General Description 114.1.2. Application across Different Video Formats and Displays 124.1.3. Display Specification and Set-up 13

4.1.4. QVGA Requirements 134.1.5. SD Requirements 134.1.6. HD Monitor Requirements 15

4.1.7. Test Method 164.1.8. Length of Sessions 174.1.9. Evalulators (Viewers) 17

4.1.10. Instructions for Evaluators and Selection of Valid Evaluators 17

4.1.11. Viewing Conditions 184.1.12. Experiment design 184.1.13. Randomization 194.1.14. Test Data Collection 20

4.2. Data Format 204.2.1. Results Data Format 204.2.2. Subjective Data Analysis 21

5. Test Laboratories and Schedule 22

5.1. Independent Laboratory Group (ILG) 22

5.2. Proponent Laboratories 22

5.3. Test Schedule 23

Hybrid Testplan DRAFT version 1.0. 9 May 200776/223

Draft VQEG Hybrid Testplan Version 1.4

6. Sequence Processing and Data Formats 26

6.1. Sequence Processing Overview 266.1.1. Duration of Source Sequences 276.1.2. Camera and Source Test Material Requirements 276.1.3. Software Tools 276.1.4. Colour Space Conversion 276.1.5. De-Interlacing 286.1.6. Cropping & Rescaling 286.1.7. Rescaling 296.1.8. File Format 296.1.9. Source Test Video Sequence Documentation 30

6.2. Test Materials??TBD 306.2.1. Selection of Test Material (SRC) 31

6.3. Hypothetical Reference Circuits (HRC) [??TBD] 316.3.1. Video Bit-rates (exemplary) 326.3.2. Simulated Transmission Errors 326.3.3. Live Network Conditions 346.3.4. Pausing with Skipping and Pausing without Skipping (freezing) for 10s and 15s PVSs 356.3.5. Frame Rates 356.3.6. Pre-Processing 366.3.7. Post-Processing 366.3.8. Coding Schemes 366.3.9. Processing and Editing Sequences 36

6.4. Reference Decoder 37

7. Objective Quality Models 40

7.1. Model Type 43

7.2. Model Input and Output Data Format 43

7.3. Submission of Executable Model 47

7.4. Registration 47

8. Objective Quality Model Evaluation Criteria 49

8.1. Evaluation Procedure 49

8.2. PSNR 49

8.3. Data Processing 508.3.1. Calculating DMOS Values 508.3.2. Mapping to the Subjective Scale 508.3.3. Averaging Process 518.3.4. Aggregation Procedure 51

8.4. Evaluation Metrics 518.4.1. Pearson Correlation Coefficient 518.4.2. Root Mean Square Error 528.4.3. Outlier Ratio 53

8.5. Statistical Significance of the Results 538.5.1. Significance of the Difference between the Correlation Coefficients 538.5.2. Significance of the Difference between the Root Mean Square Errors 548.5.3. Significance of the Difference between the Outlier Ratios 54

Hybrid Testplan DRAFT version 1.0. 9 May 200777/223

Draft VQEG Hybrid Testplan Version 1.4

9. Recommendation 56

10. Bibliography 57Introduction60Packet switched radio network 60Wireline Internet 61Circuit switched radio network 61Summary of transmission error simulators 61References 63Installation and preparation 65Running the program 65Setup-file parameters 66Example of a setup-file 67

Transformation of source test sequences to UYVY AVI files 72

AviSynth Scripts for the common transformations 73

UYVY Raw to UYVY AVI 74

UYVY Raw to RGB AVI 74

RGB AVI to UYVY AVI 75

Processing and Editing Sequences 75

Calibration 76

UYVY Decoder to UYVY Raw / UYVY AVI 76

Notes 77

11. Overview 80

12. Terms of Reference – Hybrid models 81

12.1. Objectives and Application Areas 8112.1.1. Model Types NR/RR/FR 8112.1.2. Target Resolutions 8112.1.3. Target Distortions 8212.1.4. Model Input 8212.1.5. Results 8212.1.6. Model Validation 8212.1.7. Model Disclosure 8212.1.8. Time Frame 82

12.2. Relation to other Standardization Activities [??TBD, scope] 83

1. Introduction 6

2. List of Definitions 7

3. List of Acronyms 9

4. Subjective Evaluation Procedure 10

Hybrid Testplan DRAFT version 1.0. 9 May 200778/223

Draft VQEG Hybrid Testplan Version 1.4

4.1. The ACR Method with Hidden Reference Removal 204.1.1. General Description 204.1.2. Application across Different Video Formats and Displays 204.1.3. Display Specification and Set-up 204.1.4. Test Method 204.1.5. Evaluators 오류! 책갈피가 정의되어 있지 않습니다.4.1.6. Viewing Conditions 204.1.7. Experiment design 204.1.8. Randomization 204.1.9. Test Data Collection 20

4.2. Data Format 204.2.1. Results Data Format 204.2.2. Subjective Data Analysis 20

5. Test Laboratories and Schedule 21

5.1. Independent Laboratory Group (ILG) 21

5.2. Proponent Laboratories 21

5.3. Test procedure and schedule 22

1.1. 22

6. Sequence Processing and Data Formats 24

6.1. Sequence Processing Overview 246.1.1. Duration of Source Sequences 246.1.2. Camera and Source Test Material Requirements 266.1.3. Software Tools 266.1.4. Colour Space Conversion 266.1.5. De-Interlacing 276.1.6. Cropping & Rescaling 276.1.7. Rescaling 286.1.8. File Format 286.1.9. Source Test Video Sequence Documentation 29

6.2. Test Materials 296.2.1. Selection of Test Material (SRC) 30

6.3. Hypothetical Reference Circuits (HRC) 306.3.1. Video Bit-rates 316.3.2. Simulated Transmission Errors 316.3.3. Live Network Conditions 336.3.4. Pausing with Skipping and Pausing without Skipping 336.3.5. Frame Rates 346.3.6. Pre-Processing 356.3.7. Post-Processing 356.3.8. Coding Schemes 356.3.9. Processing and Editing Sequences 35

7. Objective Quality Models 36

7.1. Model Type 39

7.2. Model Input and Output Data Format 39

7.3. Submission of Executable Model 42

Hybrid Testplan DRAFT version 1.0. 9 May 200779/223

Draft VQEG Hybrid Testplan Version 1.4

7.4. Registration 43

8. Objective Quality Model Evaluation Criteria 45

8.1. Evaluation Procedure 45

8.2. PSNR 45

8.3. Data Processing 468.3.1. Calculating DMOS Values 468.3.2. Mapping to the Subjective Scale 468.3.3. Averaging Process 478.3.4. Aggregation Procedure 47

8.4. Evaluation Metrics 478.4.1. Pearson Correlation Coefficient 478.4.2. Root Mean Square Error 48

8.5. Statistical Significance of the Results 498.5.1. Significance of the Difference between the Correlation Coefficients 498.5.2. Significance of the Difference between the Root Mean Square Errors 508.5.3. Significance of the Difference between the Outlier Ratios 50

9. Recommendation 52

10. Bibliography 53Introduction56Packet switched radio network 56Wireline Internet 57Circuit switched radio network 57Summary of transmission error simulators 57References 59Installation and preparation 61Running the program 61Setup-file parameters 62Example of a setup-file 63

Transformation of source test sequences to UYVY AVI files 68

AviSynth Scripts for the common transformations 69

UYVY Raw to UYVY AVI 70

UYVY Raw to RGB AVI 70

RGB AVI to UYVY AVI 71

Processing and Editing Sequences 71

Calibration 72

UYVY Decoder to UYVY Raw / UYVY AVI 72

Notes 73

Hybrid Testplan DRAFT version 1.0. 9 May 200780/223

1. Introduction

This document defines the procedure for evaluating the performance of objective perceptual quality models submitted to the Video Quality Experts Group (VQEG) formed from experts of ITU-T Study Groups 9 and 12 and ITU-R Study Group 6. It is based on discussions from various meetings of the VQEG Hybrid perceptual bit-stream working group (HBS) recorded in the Editorial History section at the beginning of this document.

The goal of the VQEG HBS group is to evaluate perceptual quality models suitable for digital video quality measurement in video and multimedia services delivered over an IP network. The scope of the testplan covers a range of applications including IPTV, internet streaming and mobile video. The primary point of use for the measurement tools evaluated by the HBS group is considered to be operational environments (as defined in Figures 11.1 through 11.3), although they may be used for performance testing in the laboratory.

For the HBS testing, audio-video test sequences will be presented to evaluators (viewers). Evaluators will provide three quality ratings for each test sequence: a video quality rating (MOSV), an audio quality rating (MOSA) and an overall quality rating (MOSAV). Models may predict the quality of the video only or provide all three measures for each test sequence. Within this test plan, the hybrid project will test video only. If enough audio (with video) subjective data becomes available, models for audio and audio/video quality will also be validated.

The performance of objective models will be based on the comparison of the MOS obtained from controlled subjective tests and the MOS predicted by the submitted models. This testplan defines the test method, selection of source test material (termed SRCs) and processed test conditions (termed HRCs), and evaluation metrics to examine the predictive performance of competing objective hybrid/bit-stream quality models.

A final report will be produced after the analysis of test results.


9. List of Definitions

Hypothetical Reference Circuit (HRC) is one test case (e.g., an encoder, transmission path with perhaps errors, and a decoder, all with fixed settings).

Intended frame rate is defined as the number of video frames per second physically stored for some representation of a video sequence. The intended frame rate may be constant or may change with time. Two examples of constant intended frame rates are a BetacamSP tape containing 25 fps and a VQEG FR-TV Phase I compliant 625-line YUV file containing 25 fps; these both have an intended frame rate of 25 fps. One example of a variable intended frame rate is a computer file containing only new frames; in this case the intended frame rate exactly matches the effective frame rate. The content of video frames is not considered when determining intended frame rate.

Anomalous frame repetition is defined as an event where the HRC outputs a single frame repeatedly in response to an unusual or out of the ordinary event. Anomalous frame repetition includes but is not limited to the following types of events: an error in the transmission channel, a change in the delay through the transmission channel, limited computer resources impacting the decoder’s performance, and limited computer resources impacting the display of the video signal.

Constant frame skipping is defined as an event where the HRC outputs frames with updated content at an effective frame rate that is fixed and less than the source frame rate.

Effective frame rate is defined as the number of unique frames (i.e., total frames – repeated frames) per second.

Frame rate is the number of (progressive) frames displayed per second (fps).

Live Network Conditions are defined as errors imposed upon the digital video bit stream as a result of live network conditions. Examples of error sources include packet loss due to heavy network traffic, increased delay due to transmission route changes, multi-path on a broadcast signal, and fingerprints on a DVD. Live network conditions tend to be unpredictable and unrepeatable.

Pausing with skipping (formerly known as frame skipping) is defined as events where the video pauses for some period of time and then restarts with some loss of video information. In pausing with skipping, the temporal delay through the system will vary about an average system delay, sometimes increasing and sometimes decreasing. One example of pausing with skipping is a pair of IP Videophones, where heavy network traffic causes the IP Videophone display to freeze briefly; when the IP Videophone display continues, some content has been lost. Another example is a videoconferencing system that performs constant frame skipping or variable frame skipping. Constant frame skipping and variable frame skipping are subsets of pausing with skipping. A processed video sequence containing pausing with skipping will be approximately the same duration as the associated original video sequence.

Pausing without skipping (formerly known as frame freeze) is defined as any event where the video pauses for some period of time and then restarts without losing any video information. Hence, the temporal delay through the system must increase. One example of pausing without skipping is a computer simultaneously downloading and playing an AVI file, where heavy network traffic causes the player to pause briefly and then continue playing. A processed video sequence containing pausing without skipping events will always be longer in duration than the associated original video sequence.

Rebuffering is defined as a pausing without skipping (aka frame freeze) event that lasts more than 0.5 seconds. <<XXX>>

Refresh rate is defined as the rate at which the computer monitor is updated.


[Comment, 11/04/10: This definition must be reviewed by the Hybrid Group. Critical portions of this document depend upon the definition of rebuffering.]

Simulated transmission errors are defined as errors imposed upon the digital video bit stream in a highly controlled environment. Examples include simulated packet loss rates and simulated bit errors. Parameters used to control simulated transmission errors are well defined.

Source frame rate (SFR) is the intended frame rate of the original source video sequences. The source frame rate is constant. For the Hybrid testplan the SFR may be either 25 fps or 30 fps.

Transmission errors are defined as any error imposed on the video transmission. Example types of errors include simulated transmission errors and live network conditions.

Variable frame skipping is defined as an event where the HRC outputs frames with updated content at an effective frame rate that changes with time. The temporal delay through the system will increase and decrease with time, varying about an average system delay. A processed video sequence containing variable frame skipping will be approximately the same duration as the associated original video sequence.


10. List of Acronyms

ACR-HRR Absolute Category Rating with Hidden Reference Removal

ANOVA ANalysis Of VAriance

ASCII American Standard Code for Information Interchange

CCIR Comite Consultatif International des Radiocommunications

CIF Common Intermediate Format (352 x 288 pixels)

CODEC COder-DECoder

CRC Communications Research Centre (Canada)

DVB-C Digital Video Broadcasting-Cable

DMOS Difference Mean Opinion Score

FR Full Reference

GOP Group Of Pictures

HRC Hypothetical Reference Circuit

HSDPA High-Speed Downlink Packet Access

ILG Independent Laboratory Group

ITU International Telecommunication Union

LSB Least Significant Bit

MM MultiMedia

MOS Mean Opinion Score

MOSp Mean Opinion Score, predicted

MPEG Moving Picture Experts Group

NR No (or Zero) Reference

NTSC National Television System Committee (60 Hz TV)

PAL Phase Alternating Line standard (50 Hz TV)

PLR Packet Loss Ratio

PS Program Segment

PVS Processed Video Sequence

QAM Quadrature Amplitude Modulation

QCIF Quarter Common Intermediate Format (176 x 144 pixels)

QPSK Quadrature Phase Shift Keying

VQR Video Quality Rating (as predicted by an objective model)

RR Reduced Reference

SMPTE Society of Motion Picture and Television Engineers

SRC Source Reference Channel or Circuit

VGA Video Graphics Array (640 x 480 pixels)

VQEG Video Quality Experts Group

VTR Video Tape Recorder


WCDMA Wideband Code Division Multiple Access


11. Overview: ILG, Proponents, Tasks and Schedule

11.1 Division of Labor

Given the scope of the HBS testing, both independent test laboratories and proponent laboratories will be given subjective test responsibilities. Before conducting subjective tests, all laboratories will inform VQEG (via the HBS Reflector [[email protected]]) of the test environment and equipment they plan to use.

11.1.1 Independent Laboratory Group (ILG)

The independent laboratory group is currently composed of IRCCyN (France), CRC (Canada), INTEL (USA), Acreo (Sweden), FUB (Italy), NTIA (USA), Ghent (Belgium) and AGH (Poland). Other ILG may be added. The ILG indicating a willingness to participate as test laboratories are as follows. This is a tentative list.

Acreo 1 (QVGA, SD625)

AGH 1

CRC 1

FUB 1+ (QVGA, SD625, HD50i, HD25p) as needed

Ghent 1

INTEL 0 (QVGA, HD60i, HD30p)


IRCCyN 1


NTIA 0

Verizon (QVGA, SD525, HD60i, HD30p)

NOTE: ATIS option may be considered.

Total: 6

The ILG are responsible for the following:

1. Collect model submissions and validate basic model operation

2. Select SRC for each proponent subjective experiment

3. Review proponents’ subjective experiment test plans

4. Determine the test conditions for each experiment (i.e., modify and change proponent test plans). A number of PVSs from each type of error will be tested per image resolution; different types of error conditions can be mixed between experiments to ensure a balance in the design of each individual experiment.

5. Conduct ILG subjective tests

6. Check that all PVSs created by the ILG fall within the calibration and registration limits specified in section 8.


7. Examination of SRC with MOS < 4.0, conducted prior to data analysis.

8. All decisions on the discard of SRC and PVS

9. Data Analysis

Note: ATIS option may be considered

11.1.2 Proponent Laboratories

A number of proponents also have significant expertise in and facilities for subjective quality testing. Proponents can conduct subjective tests under ILG guidance. Proponents indicating a willingness to participate as test laboratories are as follows (tentative list); other proponents may also participate in the Hybrid test:

[??Editor’s Note] The resolution is tentative.

BT 0 (QVGA, SD625, HD50i, HD25p)

Ericsson 1 (QVGA)

DT 1 (HD25p)

Ghent Univ. 1 (QVGA, HD30p, HD25p)

KDDI 1 (QVGA, SD525, HD60i, HD30p)

Lancaster Univ. (unknown) (QVGA); not present at Boulder meeting

VQLINK 0; not present at Boulder meeting

NTT 1 (SD525, HD60i)

Opticom 1 (QVGA)

Psytechnics 1

Symmetricom 1 (SD525, HD60i, HD30p)

SwissQual 1

Tektronix (unknown); not present at Boulder meeting

Yonsei 1-3 (QVGA, SD525, HD60i, HD30p)

Total: 9 to 11 (QVGA, WVGA, HD)

Proponents are responsible for the following:

1. Timely payment of ILG fee

2. Submit model executable to ILG, allowing time for validation that model runs on ILG computer

3. Optionally submit encrypted model code to ILG

4. Write draft subjective experiment test plan(s)

5. Conduct one or more subjective validation experiment

6. Check that all PVSs fall within the calibration and registration limits specified in section 8.

7. Double-check that all PVSs fall within these calibration and registration limits. <<XXX>>

8. Redistribution of PVSs to other proponents and ILG. <<XXX>>


[Comment, 11/04/10: This is a proposal, only. The Hybrid Group should consider who is responsible for this data redistribution, and how this will be done.]

[Comment, 11/04/10: This idea was in the previous version of the test plan, but it is not obvious whether this was agreed to or not. It is also not obvious who will do this double-check. Putting it on the proponent list is a suggestion, only.]

[Comment, 11/04/10: This item is implied elsewhere, but may not have been formally agreed.]

CRC 1

INTEL 0 (QVGA, HD60i, HD30p)

Acreo 0 or possibly 1 (QVGA, SD625)

IRCCyN 1 (?)

Nortel 0

FUB 1 (QVGA, SD625, HD50i, HD25p); not present at Boulder meeting

NTIA 0

Verizon 1 (?) (QVGA, SD525, HD60i, HD30p); not present at Boulder meeting

NOTE: ATIS option may be considered.

Total: 4 or 5

It is clearly important to ensure all test data is derived in accordance with this testplan. Critically, proponent testing must be free from charges of advantage to one of their models or disadvantage to competing models.

The maximum number of subjective experiments run by any one proponent laboratory is 3 times the lowest non-zero number run by any other proponent laboratory, per image size.

The maximum number of non-secret PVSs included in overall test by any single proponent laboratory is 20%.

For each proponent subjective test, no more than 50% of test sequences may be derived from a single proponent. This does not apply to PVSs created by the ILG or to common sequences.

Fees and conditions for proponents participating in the VQEG HBS tests will be determined by the ILG after approval of the Hybrid test plan; see Annex IV for details.

11.1.3 VQEG

1. The VQEG Hybrid group will be responsible for deciding upon precise HRCs to be used in the testing.

2. Raise concerns about or objections to an ILG or Proponent's monitor specifications within 2 weeks after the specifications are posted to the Hybrid Reflector.

3. Review subjective test plans for imbalances and other problems (after ILG adjustments)

4. Before conducting subjective tests, all laboratories must post to the reflector what monitor they plan to use. VQEG members have 2 weeks to object.

11.2 Overview

The proposed Hybrid Perceptual/Bitstream Validation (HBS) test will examine the performance of objective perceptual quality models for different video formats (HD, WVGA, and QVGA). Section 11.7 defines these video formats in detail. Video applications targeted in this test include the suite of IPTV services, internet video, mobile video, video telephony, and streaming video.


Separate subjective tests will be performed for different video sizes: <<XXX>>

QVGA (320 x 240) at 25fps and 30fps

WVGA (<<XXX>>) at 25fps and 30fps

HD (1080i 50fps, 1080i 59.94fps, 1080p 29.97fps, and 1080p 25fps)


11.2.1 Compatibility Test Phase: Training Data

The compatibility test phase is mainly for testing the compatibility of the candidate models with the PVSs and bit-streams created by different processing labs. It is a subset of the conditions that might be used in the evaluation phase later on. It is not desired to include all implementations of one codec or all variations of bit-rate and error patterns in the test phase. The test phase should just consist of typical examples.

Any source material used in the test / training phase must not be used in the evaluation phase. It might be sufficient to use only a few sources in the training phase, while a wide variability of sources is desired in the evaluation phase.

Models must be prepared for all kinds of bit streams generated by other proponents and ILG. The training data is intended to provide proponents with a clear understanding of what kinds of impairments they should expect. A limited number of SRC will be used to generate a variety of PVSs and bit stream data. These will be redistributed to all proponents.

The compatibility test phase will occur prior to model submission (see the Test Schedule in Section 11.4).

<<XXX>>

11.2.2 Testplan Design

The HRCs used in the subjective tests should cover the scope of the hybrid model. As a first step, proposals for test conditions and topics should be collected; these define the scope of the model. Main conditions will be defined and should already be included in the Training Phase.

11.2.3 Evaluation Phase

Based on the list of conditions, the design and conduct of the subjective tests will be done jointly by the proponents and the ILGs. Each interested party proposes a set of HRCs that are of interest, fit the scope of the model and the party, and can be processed by that party.

This total set of HRCs is then jointly subdivided and assigned to the individual subjective tests under constraints of formats and resolutions. It is proposed to allow so-called focus tests, where the focus is set to compression or to transmission errors.

In that way the Draft Testplans are created. These Draft Testplans will be reviewed by the ILGs for imbalances and to spot 'white areas' that are not covered. The ILGs can re-assign HRCs among the tests and request HRCs that should be included. The Final Testplans are subject to agreement by all parties.


[Comment, 11/04/10: Most of this section was written at the Krakow meeting, but the parts were written separately. This should be re-read by the Hybrid group to make sure the original intention is retained.]

After processing according to the Final Testplans, a visual review by all parties is allowed to discover weaknesses and processing errors. Observed problems have to be reported. The ILGs will make the final decision about solving reported problems.

11.2.4 Common Set

A common set of 24 video sequences will be included in every experiment. This common set will evenly span the full range of quality described in this test plan (i.e., including the best and worst quality expected). After the PVS have been created, the SRC and PVS will be format and frame-rate converted as appropriate for inclusion into each experiment. The ILG will visually examine the common set after frame rate conversion and ensure that all versions of each common set sequence are visually similar.<<XXX>>

The common set of PVSs will include the secret PVSs and secret source. The number of PVSs of the common set is 24.

11.3 Publication of Subjective Data, Objective Data, and Video Sequences

All subjective data for all clips will appear in the Final Report.

The objective data for all models that appear in the Final Report must be published in the Final Report. The objective data for withdrawn models will not be released.

Video data and bit stream data will be published provided that:

1. Such publication is not disallowed by the source content NDA or copyright

2. All participating labs that generated PVSs or performed the subjective tests agree to publish the PVSs along with the bit-stream data.

The video data and bit-stream data for the common set will be published.

Video data may be released when<<XXX>>

<<XXX>>

11.4 Test Schedule

1. Finalization of the candidate working systems which include reference encoder, container, server, packet capturer, extractor and reference decoder (June 2010).

2. Finalization of the working systems (Oct 2010)

3. Source video sequences are collected and sent to the point of contact (as soon as possible). There is a strong need for European HD material.

4. NDA for SRC video distribution (July 2010)


6. Approval of the test plan (next VQEG meeting, November 2010, after step 1. Steps 1 and 2 may happen at the same VQEG meeting).


[Comment, 11/04/10: The ATIS option was to be considered (i.e., ILG keep some datasets private).]

[Comment, 11/04/10: The Hybrid group must decide upon a release date. I suggest "when the final report is published".]

[Comment, 11/04/10: This paragraph is a proposal, only. The Hybrid group needs to examine this and approve or modify. Very little was written down about the common set.]

7. Finalization of the working systems which include reference encoder, container, server, packet capturer, extractor and reference decoder at the next VQEG meeting.

8. Declaration of intent to participate and the number of models to submit (Approval of testplan + 1 month)

9. Fee payment if applicable (Approval of testplan + 2 month)

10. Training data exchange: (Approval of testplan + 3 month).

<<XXX>> Source video sequences are collected & sent to point of contact. (as soon as possible)

NDA for SRC video distribution (March 2010)

All SRC video, except for the secret SRC, will be sent to the requesting organization after the NDA is signed. The requesting organization will have to pay for the cost. The point of contact should send the source pool within two weeks after it receives the request. Alternatively, the SRC video can be distributed at the next VQEG meeting.

Secret content should be sent to the ILG directly. Proponents are not allowed to provide secret content.

Test period (5 months after Step 1). Three or four test sessions' worth of data will be distributed to the proponents and ILG within three months of Step 1. Any proponents or ILG may contribute test data. Proponents can report any problems during the following two months so that any potential issues can be addressed.

VQEG compiles a list of HRCs that are of interest to the HYBRID test. Proponents will send details of proposed HRCs and example PVSs to the points of contact, and indicate which ones they can create. (Step 1 + 2 months)

Each organization that will perform subjective testing creates a proposed list of HRCs that they plan to use in a subjective test. This list will include exactly the number of HRCs needed. (Step 1 + 4 months)

The proposed lists of HRCs for each experiment are examined by VQEG for problems (e.g., one organization creating too many HRCs, overlap between experiments, using NTT guidelines). (Step 1 + 5 month)

11. Proponents submit their models (executable and, only if desired, encrypted source code). Procedures for making changes after submission will be outlined in a separate document, to be approved prior to submission of models. (Step 1 + 6 months)


13. Test design by ILGs and proponents: (Model submission + 2 month).

14. Test design review by ILGs and proponents: (Model submission + 3 month).

[??STEPS BELOW: ??TBD] The following schedule is tentative.

15. ILG selects SRC sequences for each experiment and sends them only to the organization running that experiment. ILG will send exactly the number of SRCs required. (Model submission + 4 months)


[Comment, 11/04/10: The Hybrid Group must specify the procedure for making changes. A new Annex is recommended, instead of a separate document.]

16. ILG creates a set of secret SRCs and secret HRCs (the common set) and sends them to the ILG/Proponents. The ILG inserts these into every proponent's experiments. (Model submission + 4 months)

17. The relevant organizations running the experiments will generate the PVSs using the scenes that were sent to them, and send all the PVSs to a common point of contact, who will distribute them to ILGs and proponents. (Model submission + 5 months)

18. Proponents check calibration of all PVSs and identify potential problems. They may ask the ILG to review the selection of test material and replace if necessary. (Model submission + 6 month)

Proponents check the calibration and registration of the PVSs in their experiment. (Model submission + 4 month)

If a proponent or ILG test lab believes that any experiment is unbalanced in terms of quality or has calibration problems, they may ask the ILG and the proponent group to review the selection of test material. If a majority of the ILG agrees, the selection of PVSs will be amended by the ILG. An even distribution of qualities from excellent to bad is desirable. (Model submission + 6 months)

All SRCs and PVSs are distributed to all the proponents and ILGs. (Model submission + 6 months)


Proponents run their subjective tests and submit results to the ILG. (Model submission + 8 months)

19. Proponents submit their objective data. (Model submission + 8 months)

20. Verification of submitted models by ILG (Model submission + 9 month)

21. ILG distribute subjective and objective data to the proponents and other ILG (Model submission + 9 month) <<XXX>>

22. Statistical analysis (Model submission + 10 months)

23. Draft final report (Model submission + 12 months)

24. Approval of final report (Model submission + 12 months)

11.5 Advice to Proponents on Pre-Model Submission Checking

Prior to the official model submission date, the ILG will verify that the submitted models (1) run on the ILG’s computers and (2) yield the correct output values when run on the test video sequences. Due to their limited resources, the ILG may encounter difficulties verifying executables submitted too close to the model submission deadline. Therefore, proponents are strongly encouraged to submit a prototype model to the ILG well before the verification deadline, to work out platform compatibility problems well ahead of the final verification date. Proponents are also strongly encouraged to submit their final model executable 14 days prior to the verification deadline date, giving the ILG two weeks to resolve problems arising from the verification procedure.

The ILG requests that proponents kindly estimate the run-speed of their executables on a test video sequence and to provide this information to the ILG.


[Comment, 11/04/10: Consider not redistributing objective data to other proponents until after the first draft of the ILG statistical analysis. This allows proponents to withdraw without their model performance being widely redistributed. This was requested of the ILG during the VQEG HDTV test.]

6. SRC Video Restrictions and Video File Format

Separate subjective tests will be performed for different video sizes:

QVGA (320 x 240)

SD (525/60 or 625/50 line formats)

HD (1080i50, 1080i60, 1080p30, and 1080p25)

In the case of Rec. 601 video source, aspect ratio correction will be performed on the video sequences prior to writing the AVI files (SRC) or processing the PVS.

Note that in all subjective tests 1 pixel of video will be displayed as 1 pixel on the native display. No upsampling or downsampling of the video is allowed at the player.

Presently, VQEG has access to a set of video test sequences. For audio-video tests this database needs to be extended to include new source material containing both audio and video.

11.6 Source Sequence Processing Overview and Restrictions

The test material will be selected from a common pool of video sequences.

The source video can only be used in the testing if an expert in the field considers the quality to be good or excellent on an ACR-scale. The source video should have no visible coding artifacts. The final decision whether a source video sequence is admissible will be made by ILGs.

For QVGA and WVGA, all source material should be 25 or 30 frames per second progressive, and there should be no more than one version of each source sequence for each resolution. If the test sequences are in an interlaced format, then agreed de-interlacing methods will be applied to transform the test sequence to a progressive format for QVGA. The de-interlacing algorithm will de-interlace Rec. 601 (or other, e.g., HDTV) formatted video into a progressive format (i.e., QVGA). Algorithms will be proposed on the VQEG reflector and approved before processing takes place. This document contains algorithms already approved.

1080i footage may be de-interlaced and then used as SRC in a 1080p experiment. 1080p enlarged from 720p, or 1080i enlarged from 1366x768 or similar, are valid sources for the Hybrid test. 1080p 24fps film footage can be converted and used in any 1080i or 1080p experiment. Otherwise, the frame rate of the unconverted source must be at least as high as the target SRC (e.g., 720p 50fps can be converted and used in a 1080i 50fps experiment, but 720p 29.97fps cannot be converted and used in a 1080i 59.94fps experiment).

Uncompressed AVI files will be used for subjective and objective tests. Tools are being sought to convert from the various coding schemes to uncompressed AVI (see Annex VIII for a description of the tools used for conversion). The progressive test sequences used in the subjective tests should also be used by the models to produce objective scores.

It is important to minimize the processing of video source sequences. Hence, we will endeavor to find methods that minimize this processing (e.g., to perform de-interlacing and resizing in one step).


11.7 SRC Resolution, Frame Rate and Duration

Separate subjective tests will be performed for the following video sizes:

Resolution   Pixels      Scanning and Frame Rate             Name

QVGA         320 x 240   Progressive, 25fps and 30fps        QVGA25fps, QVGA30fps

WVGA         <<XXX>>     Progressive, 25fps and 30fps        WVGA25fps, WVGA30fps

HD           1920x1080   Interlaced, 50fps and 59.94fps      1080i50fps, 1080i59.94fps

HD           1920x1080   Progressive, 25fps and 29.97fps     1080p25fps, 1080p29.97fps

11.8 Duration of Source Sequences

The length of the source sequence depends upon the resolution, as follows. Note that the duration of QVGA source sequences depends upon whether or not rebuffering will be considered in the experiment: <<XXX>>

Resolution Raw SRC Edited SRC

QVGA, no rebuffering 14 seconds 10 seconds

QVGA, with rebuffering 20 seconds 16 seconds

WVGA 19 seconds 15 seconds

HD 19 seconds 15 seconds

Source content may be obtained from content stored on tape or on hard drive, provided it meets the quality requirements outlined in Section 11.10.

The SRC/PVS length and rebuffering condition are as follows:

SD/HD

SRC/PVS length: 15 seconds

Rebuffering is not allowed.

The original source (before editing) should be at least 19 seconds and include an extra 2 seconds at the beginning and the end.


QVGA

SRC/PVS length: 10 seconds with rebuffering disallowed

The original source should be at least 14 seconds, allowing extra 2 seconds at the beginning and the end.

SRC/PVS length: SRC is 15 seconds with rebuffering allowed. PVS can be up to 24s. The maximum time limit for freezing or rebuffering is 8 seconds.

The original source should be at least 19 seconds, allowing extra 2 seconds at the beginning and the end.


[Comment, 11/04/10: Other 20-sec lengths were changed to 19-sec, due to greater content availability.]

[Comment, 11/04/10: Agreement has not been reached on whether all of these formats will be considered.]

[Comment, 11/04/10: 854x480 was proposed for WVGA, but no decision has been made.]

11.10 Source Test Material Requirements: Quality, Camera, Use Restrictions

The standard definition source test material should be in Rec. 601, DigiBeta, Betacam SP, or DV25 (3-chip camera) format or better. Note that this requirement does not apply to Categories 4 and 8 (Section 11.14), where the best available quality reference will be used. HD source test material should be taken from a professional grade HD camera (e.g., Sony HDR-FX1) or better. Original HD video sequences that have been compressed should show no impairments after being re-sampled to QVGA.

The VQEG hybrid project expresses a preference for all test material to be open source. At a minimum, source material must be available within the VQEG hybrid project to both proponents and ILG for testing (e.g., under non-disclosure agreement if necessary).

Source content may be obtained from content stored on tape or on hard drive, provided it meets the quality requirements outlined in this document.

Note: The source video will only be used in the testing if an expert in the field considers the quality to be good or excellent on an ACR-scale.

11.11 Source Conversion

This section describes approved methods for converting source video from one format to another used in this experiment. These tools are known to operate correctly.

11.11.1 Software Tools

Transformation of the source test sequences (e.g., from 720p to QVGA) shall be performed using Avisynth 2.5.5 or later, VirtualDub 1.6.11 or later, and ffdshow 20050303. Within VirtualDub, video sequences will be saved to AVI files by specifying the appropriate color space for both read and write (Video Color Depth), then selecting the Video Compression option (Video -> Compressor) "ffdshow Video Codec", configured to use the "Uncompressed RGB/YCbCr" decoder and the UYVY color space. For the Colour Depth (Video -> Color Depth), the setting "4:2:2 YCbCr (UYVY)" is used as the output format. The processing mode (Video menu) is set to "Full processing mode". <<XXX>>

11.11.2 Colour Space Conversion

In the absence of known color transformation matrices (e.g., such as what might be used by a video display adapter), the following algorithms will be used to transform between ITU-R Recommendation BT.601 Y'CB'CR' video and R'G'B' video that is in the range [0, 255]. The reference for these color transformation equations is pages 15-16 of ColorFAQ.pdf, which can be downloaded from:

http://www.poynton.com/PDFs/ColorFAQ.pdf

Transforming R'G'B' to Y'CB'CR'

1. Compute the matrix transformation:
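[The matrix itself was not reproduced in this copy of the test plan. For convenience, the following equations give the R'G'B' (range [0, 255]) to Y'CB'CR' transformation stated on pages 15-16 of the cited ColorFAQ reference:]

Y' = 16 + ( 65.738 R' + 129.057 G' + 25.064 B') / 256
CB = 128 + (-37.945 R' - 74.494 G' + 112.439 B') / 256
CR = 128 + (112.439 R' - 94.154 G' - 18.285 B') / 256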


[Comment, 11/04/10: This paragraph is a proposal. It needs to be examined by the Hybrid group for agreement. No previous agreement on this issue appears to exist.]

2. Round to the nearest integer.

3. Clamp all three components to the range 1 through 254 inclusive (0 and 255 are reserved for synchronization signals in ITU-R Recommendation BT.601).

Transforming Y'CB'CR' to R'G'B'

1. Compute the matrix transformation:
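[As above, the matrix was not reproduced in this copy. The following equations give the Y'CB'CR' to R'G'B' (range [0, 255]) transformation from pages 15-16 of the cited ColorFAQ reference:]

R' = (298.082 Y' + 408.583 CR) / 256 - 222.921
G' = (298.082 Y' - 100.291 CB - 208.120 CR) / 256 + 135.576
B' = (298.082 Y' + 516.412 CB) / 256 - 276.836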

2. Round to the nearest integer.

3. Clamp all three components to the range 0 through 255 inclusive.

11.11.3 De-Interlacing

De-interlacing will be performed when original material is interlaced and requires de-interlacing, using the de-interlacing function "KernelDeint" in Avisynth. If the de-interlacing using KernelDeint results in a source sequence that has serious artifacts, the Blendfield or Autodeint may be used as alternative methods for de-interlacing. Proprietary algorithms and/or hardware de-interlacing may be used if the above three methods prove unsatisfactory.

To check for de-interlacing problems (e.g., serious artifacts introduced by the de-interlacing process), the ILG will examine the source content played back at normal speed, with the option to inspect possible problems at reduced speed. <<XXX>>

11.11.4 Cropping & Rescaling

Table 2 lists recommended values for the regions of interest to be used for transforming images. These source regions should be centered vertically and horizontally. These source regions are intended to be applied prior to rescaling and avoid use of overscan video in most cases. These regions are known to correctly produce square pixels in the target video sequence. Other regions may be used, provided that the target video sequence contains the correct aspect ratio.

The source region selection must not include overscan — i.e. black borders from the overscan are not allowed. When the conversion recommended in Table 2 produces black borders, the crop should be changed, while maintaining the same ratio of horizontal pixels to vertical lines.

In the case of Rec. 601 video source, aspect ratio correction will be performed on the video sequences prior to creating the SRC to be used in the Hybrid experiment.

Video sequences will be resized using Avisynth’s ‘LanczosResize’ function.


[Comment, 11/04/10: Change needing agreement: the task of checking de-interlacing quality was assigned to the ILG, but this does not appear to be a decision from a meeting. The Hybrid Group should decide whether it is okay for proponents to help with the de-interlacing.]

TABLE 2. Recommended Source Regions for Video Transformation


From -> To: Avisynth Code

1080i 1920x1080 -> QVGA 320x240 square pixel:
KernelDeint(order=1)
crop(240,0,1440,1080)
LanczosResize(640,480)

720p 1280x720, 60fps or 59.94fps -> QVGA 320x240 square pixel: <<XXX>>
AssumeFPS(60)
ConvertFPS(30,zone=0)
crop(160,0,960,720)
LanczosResize(640,480)

720p 1280x720, 50fps -> QVGA 320x240 square pixel: <<XXX>>
ConvertFPS(25,zone=0)
crop(160,0,960,720)
LanczosResize(640,480)

525-line 720x486 Rec. 601 -> QVGA 320x240 square pixel:
KernelDeint(order=0)
crop(8,3,704,480)
LanczosResize(320,240)

625-line 720x576 Rec. 601 -> QVGA 320x240 square pixel:
KernelDeint(order=1)
crop(38,0,644,576)
LanczosResize(320,240)

1080i 1920x1080 -> WVGA <<XXX>>:
<<XXX>>

720p 1280x720 -> WVGA <<XXX>>:
<<XXX>>


[Comment, 11/04/10: This code has been checked, as has the box below. However, there are multiple ways to perform this conversion. This code drops every other frame. This transformation needs to be approved.]

11.11.5 Rescaling

Video sequences will be resized using Avisynth’s ‘LanczosResize’ function.

11.12 Video File Format: Uncompressed AVI in UYVY

All source and processed video sequences will be stored in uncompressed AVI in UYVY.

Source material with a source frame rate of 29.97 fps will be manually assigned a source frame rate of 30 fps prior to being inserted into the common pool of QVGA or WVGA video sequences.

AVI is essentially a container format that consists of hierarchical chunks – which have their equivalent in C data structures – all preceded by a so-called fourcc, a "four character code", which indicates the type of chunk following. Some of the chunks are compulsory and describe the structure of the file, while some are optional and others contain the real video or audio data. The AVI container format used for the exchange of files in the VQEG hybrid project was originally defined by Microsoft as part of the RIFF file specification: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wcedshow/html/_dxce_dshow_avi_riff_file_reference.asp

Other descriptions can be found in: http://www.opennet.ru/docs/formats/avi.txt and http://www.the-labs.com/Video/odmlff2-avidef.pdf

These last two can be found on the mmpretest ftp server. All these links describe the AVI format in detail as far as the container itself is concerned. Since the multitude of chunks is quite confusing, an example C code that reads and writes AVI files down to this level is also included in the archive on the mmpretest reflector (files avilib.c and avilib.h). Please note that the provided C code falls under the GNU Public License. Please refer to the license statements in the files themselves. The provided C code is very simple to use and should serve all needs of VQEG. Please note that the C code allows opening the data chunk with the UYVY data, but it does not decode this data. In fact, avilib does not know how to interpret these data. All it returns is a pointer to the data and some additional information like image sizes and frame rate. Interpretation of these data is up to the user and described in the following paragraphs.

A description of the UYVY chunk format which is to be used inside the AVI container can be found in http://www.fourcc.org/index.php?http%3A//www.fourcc.org/fccyvrgb.php and below.

UYVY is a YUV 4:2:2 format. The effective bits per pixel are 16. In the AVI main header (after the fourcc "avih"), a positive height parameter implies a top-down image (top line first). Two image pixels form one macro pixel and are stored in one 32bit word with the following byte ordering:

(lowest byte) U0 Y0 V0 Y1 (highest byte)
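As an illustration only (not part of the required tools), the following C fragment shows how the byte ordering above maps one UYVY macro pixel onto two decoded pixels; the type and function names are placeholders and are not part of avilib:

#include <stdint.h>

/* One decoded pixel: its luma sample plus the chroma pair shared by the macro pixel. */
typedef struct { uint8_t y, cb, cr; } yuv_pixel;

/* Unpack one 4-byte UYVY macro pixel (lowest byte first: U0 Y0 V0 Y1)
   into two pixels. Both pixels share the U0/V0 chroma samples. */
void unpack_uyvy_macro_pixel(const uint8_t m[4], yuv_pixel *p0, yuv_pixel *p1)
{
    p0->y = m[1];  p0->cb = m[0];  p0->cr = m[2];   /* Y0 with shared U0, V0 */
    p1->y = m[3];  p1->cb = m[0];  p1->cr = m[2];   /* Y1 with shared U0, V0 */
}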

11.13 Source Test Video Sequence Documentation

Preferably, each source video sequence should be documented. The exact process used to create each source video sequence should be documented, listing the following information:

Camera specifications
Source region of interest (if the default values were not used)
Use restrictions (e.g., "open source")
De-interlacing method


This documentation is desirable but not required.

11.14 Test Materials and Selection Criteria [??TBD]

The test material will be representative of a range of content and applications. The list below identifies the type of test material that forms the basis for selection of sequences.

The SRCS used in each experiment must cover a variety of content categories from this list. At least 6 categories of content must be included in each experiment. <<XXX>>

1) video conferencing:

available for research purposes only, NTIA (Rec 601 60Hz); BT (Rec 601 50Hz), Yonsei (QVGA and SD), FT (Rec 601 50Hz, D1), NTT (Rec 601 60Hz, D1)

Currently available: NTIA, NTT, FT

2) movies, movie trailers:

(VQEG Phase II), Opticom, IRCCyN, (trailer equivalent, restricted within VQEG)

Currently available: Psytechnics, SVT, Opticom,

3) sports

available: 15-20 mins from Yonsei (Comcast), KDDI (7 min D1 and D2, other scenes also available), NTIA (Comcast), IRCCyN

Currently available: Yonsei, SVT, Psytechnics, Opticom

4) music video

(Intel), IRCCyN

Currently available: NTIA

5) advertisement:

Currently available: Psytechnics, Opticom

6) animation:

(graphics Phase I, cartoon Phase II; Opticom will send material to Yonsei), IRCCyN

Currently available: Opticom, NTIA

7) broadcasting news

(head and shoulders and outside broadcasting; available: Yonsei, possibly Comcast), IRCCyN

Currently available: KBS, Opticom

8) home video

(FUB possibly, BT possibly, INTEL, NTIA). Must be captured with a DV camera or better.

Currently available: NTIA, SwissQual, Yonsei

There will be no completely still video scenes in the test.

All test material should be sent to the content point of contact (Chulhee Lee, Yonsei) first and then it will be put on the ftp server by NTIA. Ideally the material should be converted before being sent to Chulhee Lee.

The source video will only be used in the testing if an expert in the field considers the quality to be good or excellent on an ACR-scale.


[Comment, 11/04/10: Is there any other guidance wanted? What if the 6-content restriction cannot be met? The Hybrid Group should address this issue.]

11.14.1 Selection of Test Material (SRC)

The ILG is responsible for selecting SRC material to be used in each subjective quality test. The VQEG Hybrid group will be responsible for deciding upon precise HRCs to be used in the testing. Section 5.3 provides basic guidelines on the process for selecting SRCs and HRCs together with a procedure for the distribution of test content.


12. HRC Creation and Sequence Processing [??TBD]

The subjective tests will be performed to investigate a range of Hypothetical Reference Circuit (HRC) error conditions. These error conditions may include, but will not be limited to, the following:

Compression errors (such as those introduced by varying bit-rate, codec type, frame rate and so on)

Transmission errors

Post-processing effects

Live network conditions

Interlacing problems <<XXX>>

The overall selection of the HRCs will be done such that most, but not necessarily all, of the following conditions are represented.

The following constraints must be met by every PVS. These constraints were chosen to be easily checked by the ILG, and to provide proponents with feedback on their model's intended search range. It is recommended that those who generate PVSs use the recommended maximum limits; then it is very unlikely that the PVSs will violate the required maximum limits and have to be replaced.

Maximum allowable deviation in luminance gain is +/- 20% (recommended maximum deviation in luminance gain is +/- 10% when generating PVSs).
Maximum allowable deviation in luminance offset is +/- 50 (recommended maximum deviation in luminance offset is +/- 20 when generating PVSs).
Maximum allowable horizontal shift is +/- 5 pixels (recommended maximum horizontal shift is +/- 3 pixels when generating PVSs).
Maximum allowable vertical shift is +/- 5 lines (recommended maximum vertical shift is +/- 3 lines when generating PVSs).

No PVS may have visibly obvious scaling.
The color space must appear to be correct (e.g., a red apple should not be mistakenly rendered "blue" due to a swap of the Cb and Cr color planes).
No more than 1/2 of a PVS may consist of frozen frames or pure uni-color frames (e.g., from over-the-air broadcast lack of delivery).
When creating PVSs, an SRC with 2 seconds of extra content before and after should be used. All of the content visible in the PVS should be contained in the SRC.
Pure uni-color frames (e.g., from over-the-air broadcast lack of delivery) must not occur in the first 2 seconds or the last 2 seconds of any PVS. The reason for this constraint is that viewers may be confused and mistake the uni-color frames for the end of the sequence.
It is recommended that the first half second and the last half second not contain any noticeable freezing, so that the evaluators are not confused about whether the freezing comes from impairments or from the player.
The field order must not be swapped (e.g., field one moved forward in time into field two, field two moved back in time into field one).
The intent of this test plan is that all PVSs will contain realistic impairments that could be encountered in real delivery of HDTV (e.g., over-the-air broadcast, satellite, cable, IPTV). If a PVS appears to be completely unrealistic, proponents or ILGs may request to remove or replace it. ILGs will make the final decision regarding the removal or replacement.
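For illustration only, the following C fragment sketches the kind of simple check implied by the calibration limits listed above. The function and parameter names are assumptions, not part of any agreed tool, and the limit values should be taken from the final agreed text:

#include <math.h>

/* Returns 1 if the estimated PVS calibration values are within the
   required maximum limits listed above, 0 otherwise. */
int pvs_within_required_limits(double luminance_gain_deviation_percent,
                               double luminance_offset,
                               double horizontal_shift_pixels,
                               double vertical_shift_lines)
{
    return fabs(luminance_gain_deviation_percent) <= 20.0 &&  /* +/- 20% gain  */
           fabs(luminance_offset)                 <= 50.0 &&  /* +/- 50 offset */
           fabs(horizontal_shift_pixels)          <= 5.0  &&  /* +/- 5 pixels  */
           fabs(vertical_shift_lines)             <= 5.0;     /* +/- 5 lines   */
}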


[Comment, 11/04/10: Is this true? What interlacing problems will be examined?]

12.1 Reference Encoder, Decoder, Capture, and Stream Generator

For hybrid models, multiple decoders/players can be used to generate PVSs as long as the decoders can handle the bit-stream data which the reference decoder can decode. Bit-streams data can be generated by any encoder as long as the reference decoder can decode the bit stream data. Number of reference decoders (for compatibility check): 1 reference decoder per codec.

Number of encoders: any encoders compatible with the reference decoder. It is preferred that more than one encoder is used.

Number of decoders (for subjective tests and inputs to hybrid models): any decoders compatible with the reference decoder. It is preferred that more than one decoder is used.

<<XXX>>

The reference decoder is JM16.1, as modified by Ghent for the Joint Effort Group.

The reference encoder & server is Xstreamer, as written by Ghent for the Joint Effort Group. This is open source and available at http://xstreamer.atlantis.ugent.be.

The capturer (for capturing and removing headers) is Xstreamer.

The H.264 StreamGenerator (traceplay) will be used to receive pcap files, remove headers, and generate the PCAP bit stream, which can be decoded by the reference decoder.

12.2 Video Bit-Rates (examples) <<XXX>>

QVGA: 64 kbps to 704 kbps (e.g. 64, 128, 192, 320, 448, 704)

WVGA: 128 kbps to 6 Mbit/s (e.g. 128, 256, 320, 448, 704, ~1M, ~1.5M, ~2M, 3M, ~4M)

HDTV: 1Mbit/s to 30Mbit/s

12.3 Frame Rates <<XXX>>

For those codecs that only offer an automatically set frame rate, this rate will be decided by the codec. Some codecs will have options to set the frame rate either automatically or manually. For those codecs that have options for manually setting the frame rate (and we choose to set it for the particular case), 5 fps will be considered the minimum frame rate for VGA and CIF, and 2.5 fps for PDA/Mobile.

Manually set frame rates (constant frame rate) may include: QVGA: 30, 25, 15, 12.5, 10, 8, 5 fps

WVGA: 25 or 30 fps; no manual reduction of frame rate is allowed

HDTV: 1080i 60 Hz (30 fps), 1080i 50 Hz (25 fps), 1080p (25 fps), 1080p (30 fps); no manual reduction of frame rate is allowed

Variable frame rates are acceptable for the HRCs. <<XXX>> The first 1s and last 1s of each QCIF PVS must contain at least two unique frames, provided the source content is not still for those two seconds. The first 1s and last 1s of each QVGA, SD and HD PVS must contain at least four unique frames, provided the source content is not still for those two seconds.


[Comment, 11/04/10: Is this for QVGA only? Or also WVGA and HD? The Hybrid group should decide and specify.]

[Comment, 11/04/10: Values in this section are copies from the MM and HD test plans and require agreement by the Hybrid Group.]

[Comment, 11/04/10: The bit rates are copied from the MM and HD test plans. They should be officially agreed upon by the Hybrid Group.]

[Comment, 11/04/10: The reference decoder, etc., need to be finalized by the Hybrid group at the November meeting. This information is currently tentative.]

Care must be taken when creating test sequences for display on a PC monitor. The refresh rate can influence the reproduction quality of the video, and VQEG Hybrid requires that the sampling rate and the display output rate are compatible. For example, given a source frame rate of 30 fps, the sampling rate is 30/X (e.g. 30/2 gives a sampling rate of 15 fps); this is called the frame rate. Frames are then upsampled (repeated) from the sampling rate of 15 fps to obtain 30 fps for display output.
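As a small illustration of the frame repetition described above (an example only; the function name is not part of the test plan):

/* For a 30 fps source subsampled by a factor X (e.g. X = 2 gives a 15 fps
   frame rate), the display still runs at 30 fps by repeating each sampled
   frame X times. This returns the source frame index shown at a given
   30 fps display frame index. */
int displayed_source_frame(int display_frame_index, int X)
{
    return (display_frame_index / X) * X;
}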

The intended frame rate of the source and the PVS must be identical.

12.4 Pre-Processing

The HRC processing may include, typically prior to the encoding, one or more of the following:

Filtering

Simulation of non-ideal cameras (e.g. mobile) <<XXX>>

Colour space conversion (e.g. from 4:2:2 to 4:2:0)

Interlacing of previously de-interlaced source. <<XXX>>

Down- and up-sampling <<XXX>>

This processing will be considered part of the HRC.

12.5 Post-Processing

The following post-processing effects may be used in the preparation of test material:

Colour space conversion

De-blocking

Decoder jitter

Down- and up-sampling <<XXX>>

De-interlacing of codec output including when it has been interlaced prior to codec input. <<XXX>>

12.6 Coding Schemes

Only the following coding schemes will be used: <<XXX>>

H.264 (MPEG-4 Part 10): QVGA, WVGA, HD

MPEG-2: HD only <<XXX>>

The following profiles are suggested: <<XXX>>

QVGA – H.264 baseline profile

WVGA – H.264 baseline or main profile

HD – H.264 High profile provided that the reference decoder can handle this

HD – MPEG-2 main and high profile

12.7 Rebuffering

Rebuffering is only allowed within QVGA experiments.


[Comment, 11/04/10: These profiles are tentative, pending whether the working system can handle these profiles.]

[Comment, 11/04/10: This needs to be confirmed by the Hybrid group.]

[Comment, 11/04/10: Is this HRC of interest to the Hybrid Group? Only applies to HD experiments.]

[Comment, 11/04/10: Is this HRC of interest to the Hybrid Group? Or was this copied unintentionally from the HDTV test plan?]

[Comment, 11/04/10: Is this HRC of interest to the Hybrid Group? Or was this copied unintentionally from the HDTV test plan?]

[Comment, 11/04/10: Is this HRC of interest to the Hybrid Group? Only applies to HD experiments.]

[Comment, 11/04/10: Is this HRC of interest to the Hybrid Group? If so, a specification of how this is to be done should be inserted.]

12.8 Transmission Errors

Any transmission errors will be allowed as long as the corresponding PVSs meet the calibration limits.

The “Simulated Transmission Errors” and “Live Network Conditions” sub-sections provide guidance on transmission error HRC creation.


12.8.1 Simulated Transmission Errors

A set of test conditions (HRC) will include error profiles and levels representative of video transmission over different types of transport bearers:

Packet-switched transport (e.g., 2G or 3G mobile video streaming, PC-based wireline video streaming)

Circuit-switched transport (e.g., mobile video-telephony)

It is important that when creating HRCs using a simulator, documentation is produced detailing simulator settings (for circuit switched HRCs the error pattern for each PVS should also be produced).

Annex II provides guidelines on the procedures for creating and documenting transmission error conditions.

Packet-switched transmission

HRCs will include packet loss with a range of packet loss ratios (PLR) representative of typical real-life scenarios.

In mobile video streaming, we consider the following scenarios:

1. Arrival of packets is delayed due to re-transmission over the air. Re-transmission is requested either because packets are corrupted when being transmitted over the air, or because of network congestion on the fixed IP part. Video will play until the buffer empties if no new (error-checked/corrected) packet is received. If the video buffer empties, the video will pause until a sufficient number of packets are buffered again. This means that in the case of heavy network congestion or bad radio conditions, video will pause without skipping during re-buffering, and no video frames will be lost.

2. Arrival of packets is delayed, and the delay is too large: These packets are discarded by the video client.

Note: A radio link normally has in-order delivery, which means that if one packet is delayed the following packets will also be delayed.

Note: If the packet delay is too long, the radio network might drop the packet.

3. Very bad radio conditions: Massive packet loss occurs.

4. Handovers: Packet loss can be caused by handovers. Packets are lost in bursts and cause image artifacts.


Note: This is valid only for certain radio networks and radio links, like GSM or HSDPA in WCDMA. A dedicated radio channel in WCDMA uses soft handover, which will not cause any packet loss.

Typical radio network error conditions are:

Packet delays between 100 ms and 5 seconds.

In PC-based wireline video streaming, network congestion causes packet loss during IP transmission.

In order to cover different scenarios, we consider the following models of packet loss:

1. Bursty packet loss. The packet loss pattern can be generated by a link simulator or by a bit or block error model, such as the Gilbert-Elliott model (an illustrative sketch of such a model follows this list).

2. Random packet loss

3. Periodic packet loss.

Note: The bursty loss model is probably the most common scenario in a ‘normal’ network operation. However, periodic or random packet loss can be caused by a faulty piece of equipment in the network. Bursty, random, and periodic packet loss models are available in commercially-available packet network emulators.
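The following C fragment is a minimal sketch of a two-state Gilbert-Elliott loss model of the kind mentioned in item 1 above. It is illustrative only; the parameter names and example values are assumptions, not agreed test conditions:

#include <stdio.h>
#include <stdlib.h>

/* Two-state Gilbert-Elliott model: p = P(Good -> Bad), r = P(Bad -> Good),
   k = loss probability in the Good state, h = loss probability in the Bad state. */
typedef struct { double p, r, k, h; int in_bad_state; } ge_model;

static double uniform01(void) { return rand() / (double)RAND_MAX; }

/* Returns 1 if the next packet is lost, 0 if it is delivered. */
int ge_next_packet_lost(ge_model *m)
{
    /* State transition first, then the loss decision in the new state. */
    if (m->in_bad_state) { if (uniform01() < m->r) m->in_bad_state = 0; }
    else                 { if (uniform01() < m->p) m->in_bad_state = 1; }
    return (uniform01() < (m->in_bad_state ? m->h : m->k)) ? 1 : 0;
}

int main(void)
{
    ge_model m = { 0.02, 0.25, 0.0, 0.5, 0 };   /* example parameter values only */
    for (int i = 0; i < 40; i++)
        printf("%d", ge_next_packet_lost(&m));
    printf("\n");
    return 0;
}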

Choice of a specific PLR is not sufficient to characterize packet loss effects, as perceived quality will also be dependent on codecs, content, packet loss distribution (profiles) and which types of video frames were hit by the loss of packets. For our tests, we will select different levels of loss ratio with different distribution profiles in order to produce test material that spreads over a wide range of video quality. To confirm that test files do cover a wide range of quality, the generated test files (i.e., decoded video after simulation of transmission error) will be:

1. Viewed by video experts to ensure that the visual degradations resulting from the simulated transmission error are spread over a range of video quality over different content;

2. Checked to ensure that degradations remain within the limits stated by the test plan (e.g., in the case where packet loss causes loss of complete frames, we will check that temporal misalignment remains with the limits stated by the test plan).

Circuit-switched transmission

HRCs will include bit errors and/or block errors with a range of bit error rates (BER) and/or block error rates (BLER) [1] representative of typical real-world scenarios. In circuit-switched transmission, e.g., video-telephony, no re-transmission is used. Bit or block errors occur in bursts.

In order to cover different scenarios, the following error levels can be considered:

Air interface block error rates: Normal uplink and downlink: 0.3%, normally not lower. High value uplink: 0.5%, high downlink: 1.0%. To make sure the proponents’ algorithms will handle really bad conditions up to 2%-3% block errors on the downlink can be used.

Bit stream errors: Block errors over the air will cause bits to not be received correctly over the air. A video telephony (H.223) bit stream will experience CRC errors and chunks of the bit stream will be lost.

Tools are currently being sought to simulate the types of error transmission described in this section.

Proponents are asked to provide examples of level of error conditions and profiles that are relevant to the industry. These examples will be viewed and/or examined after electronic distribution (only open source video is allowed for this).

1 Note that the term ‘block’ does not refer to a visual degradation such as blocking errors (or blockiness) but refers to errors in the transport stream (transport blocks).


12.8.3 Live Network Conditions

Simulated errors are an excellent means to test the behavior of a system under well-defined conditions and to observe the effects of isolated distortions. In real live networks, however, a multitude of effects usually happen simultaneously when signals are transmitted, especially when radio interfaces are involved. Some effects, e.g. handovers, can only be observed in live networks.

The term "live network" specifies conditions which make use of a real network for the signal transmission. This network is not exclusively used by the test setup. It does not mean that the recorded data themselves are taken from live traffic in the sense of passive network monitoring. The recordings may be generated by traditional intrusive test tools, but the network itself must not be simulated.

Live network conditions of interest include radio transmission (e.g., mobile applications) and fixed IP transmission (e.g., PC-based video streaming, PC to PC video-conferencing, best-effort IP-network with ADSL-access). Live network testing conditions are of particular value for conditions that cannot confidently be generated by network simulated transmission errors (see section 15). Live network conditions should exhibit distortions representative of real-world situations that remain within the limits stated elsewhere in this test plan.

Normally most live network samples are of very good or best quality. To get a good proportion of sample quality levels, an even distribution of samples from high to low quality should be saved after a live network session.

Note: Keep in mind the characteristics of the radio network used in the test. Some networks will be able to keep a very good radio link quality until it suddenly drops; others will make the quality degrade slowly.

Samples with perfect quality do not need to be taken from live network conditions. They can instead be recorded from simulation tests.

Live network conditions as opposed to simulated errors are typically very uncontrolled by their nature. The distortion types that may appear are generally very unpredictable. However, they represent the most realistic conditions as observed by users of e.g. 3G networks.

Recording PVSs under live network conditions is generally a challenging task since a real hardware test setup is required. Ideally, the capture method should not introduce any further degradation. The only requirement on the capture method is that the captured sequences conform to the video file requirements in sections 11.12 and 58.1.

For applications including radio transmissions, one possibility is to use a laptop with e.g. a built-in 3G network card and to download streams from a server through a radio network. Another possibility is to use drive test tools and to simulate a video phone call while the car is driving. In order to simulate very bad radio coverage, the antenna may be wrapped with some aluminum foil (Editor's note: this is strictly a simulation again, but for the sake of simplicity it can be accepted since the simulated bad coverage is overlaid with the effects from the live network).

In order to prepare the PVSs the same rules apply as for simulated network conditions. The only difference is the network used for the transmission.

12.9 PVS Editing

The edited PVS must have the following durations:

Video Resolution / Duration of Edited SRC and PVS:

QVGA, no rebuffering: All edited SRC and PVS must be exactly 10 seconds.

QVGA with rebuffering: All edited SRC must be exactly 16 seconds. Edited PVS must be between 16 and 24 seconds in duration; an average duration of 20 seconds is recommended.

WVGA, no rebuffering: All edited SRC and PVS must be exactly 15 seconds.

HD, no rebuffering: All edited SRC and PVS must be exactly 15 seconds.


Any transmission errors will be allowed as long as the corresponding PVSs meet the calibration limits.

Pausing with skipping and pausing without skipping (freezing) for 10s and 15s PVSs:

Recommended maximum freezing: 3s (mandatory limit: 5s)

Recommended maximum skipping: 3s (mandatory limit: 5s)

Recommended maximum total frame loss (maximum total frame loss = maximum frame loss at start + maximum frame loss at end): 1s (mandatory limit: 2s)

Recommended maximum total extra frames (including beginning and end): 1s (mandatory requirement: the entire PVS must be contained in the SRC used for encoding)

Anything can happen in-between (freezing with/without skipping, skipping, fast forward) as long as these conditions are met. The video should not play backwards, because this is an unnatural impairment. However, the video may jump backwards in time in response to a transmission error, or display a portion of a previous frame along with the current frame.

Frame Rates

For those codecs that only offer an automatically set frame rate, this rate will be decided by the codec. Some codecs will have options to set the frame rate either automatically or manually. For those codecs that have options for manually setting the frame rate (and we choose to set it for the particular case), 5 fps will be considered the minimum frame rate for VGA and CIF, and 2.5 fps for PDA/Mobile.

Manually set frame rates (constant frame rate) may include:

QVGA: 30, 25, 15, 12.5, 10, 8, 5 fps

SDTV: 30, 25 fps (interlaced)

HDTV: 1080i 60 Hz (30 fps), 1080i 50 Hz (25 fps), 1080p (25 fps), 1080p (30 fps)

Variable frame rates are acceptable for the HRCs. The first 1s and last 1s of each QCIF PVS must contain at least two unique frames, provided the source content is not still for those two seconds. The first 1s and last 1s of each QVGA, SD and HD PVS must contain at least four unique frames, provided the source content is not still for those two seconds.

Care must be taken when creating test sequences for display on a PC monitor. The refresh rate can influence the reproduction quality of the video, and VQEG Hybrid requires that the sampling rate and display output rate are compatible. For example: given a source frame rate of 30 fps, the sampling rate is 30/X (e.g., 30/2 gives a sampling rate of 15 fps). This is called the frame rate. The video is then upsampled, by repeating frames, from the sampling rate of 15 fps to obtain 30 fps for display output.
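For illustration only (a minimal sketch in Python, not part of the test plan; the function name and example values are assumptions), the frame-repetition scheme described above can be expressed as follows:

# Illustrative sketch: sample a 30 fps source at 30/X fps, then repeat frames so
# that the display output remains 30 fps.
def display_frame_indices(num_display_frames, x):
    # Return, for each 30 fps display frame, the index of the source frame shown.
    return [(i // x) * x for i in range(num_display_frames)]

# X = 2 gives a 15 fps frame rate; each retained frame is displayed twice.
print(display_frame_indices(10, 2))  # [0, 0, 2, 2, 4, 4, 6, 6, 8, 8]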

The intended frame rate of the source and the PVS must be identical.

Pre-Processing

The HRC processing may include, typically prior to the encoding, one or more of the following:

Filtering

Simulation of non-ideal cameras (e.g. mobile)

Colour space conversion (e.g. from 4:2:2 to 4:2:0)

Interlacing of previously de-interlaced source

Down- and up-sampling

This processing will be considered part of the HRC.

Post-Processing

The following post-processing effects may be used in the preparation of test material:

Colour space conversion

De-blocking

Decoder jitter

Down- and up-sampling

De-interlacing of codec output, including when it has been interlaced prior to codec input

Coding Schemes

Coding schemes are as follows:

H.264 (MPEG-4 Part 10): QVGA, SD, HD

MPEG-2: SD

56. Calibration and Registration

The following constraints must be met by every PVS. These constraints were chosen to be easily checked by the ILG, and to provide proponents with feedback on their models' intended calibration search range.

Factor / Limitation / Other Details:

Luminance Gain: Maximum ± 20% (recommend ≤ ± 10%).

Luminance Offset: Maximum ± 50 (recommend ≤ ± 20).

Horizontal Shift: Maximum ± 8 pixels for QVGA; maximum ± 16 pixels for WVGA, SD and HD.

Vertical Shift: Maximum ± 5 lines (recommend ≤ ± 1 line).

Spatial Scaling: No visibly obvious scaling.

Color Space: Must appear correct. For example, a red apple should not be rendered "blue" due to a swap of the Cb and Cr color planes.

Frozen Frames & Pure Uni-Color Frames: No more than ½ of a PVS (for example, from over-the-air broadcast lack of delivery).

First 2-sec and last 2-sec of edited PVS: May not contain pure uni-color frames. The reason for this constraint is that the viewers may be confused and mistake the uni-color frames for the end of the sequence.

Field Order: Field order must not be swapped (for example, field one moved forward in time into field two, field two moved back in time into field one).

SRC Video Pre-Roll: When creating PVSs, a SRC with ± 2 seconds of extra content before and after should be used. This ± 2 sec pre-roll will typically not be visible within the edited PVS. The intention is that the PVS matches the SRC without this ± 2 sec pre-roll.

Total Extra Frames: All of the content visible in the edited PVS must be contained within the SRC plus the ± 2 sec pre-roll. Recommend ≤ 1 second; this includes both the beginning and the end.

Total Frame Loss: Maximum 2 seconds; recommend ≤ 1 second. This includes both the beginning and the end; thus, total frame loss = maximum frame loss at start + maximum frame loss at end.

Each Freeze Event (pausing without skipping): Maximum 5 seconds duration; recommend ≤ 3 seconds.

Each Skipping Event: Maximum 5 seconds skipped; recommend ≤ 3 seconds.

First 1-sec and last 1-sec of edited PVS: Must contain at least four unique frames, provided the source content is not still for those two seconds.

Note that “Total Frame Loss” and “Total Extra Frames” refer to the duration of the edited PVS. Anything can happen in-between (freezing with/without skipping, skipping, fast forward) as long as they meet the aforementioned conditions. The video should not play backwards, because this is an unnatural impairment.   However, the video may jump backwards in time in response to a transmission error, or display a portion of a previous frame along with the current frame.

Figure 8.1 shows three examples of total lost frames for a QVGA test with no rebuffering. The edited SRC and PVS are 10 sec in duration. The SRC is shown with the 2 sec pre-roll before and after (i.e., beyond the dotted line). The arrows indicate the time alignment of the first and last frame of the PVS, with matching colors indicating where the PVS content matches the SRC content. In the top example, frames are lost at the end of the edited PVS; in the middle example, frames are lost at the beginning of the edited PVS; and in the bottom example, frames are lost from both the beginning and the end.

Similarly, Figure 8.2 shows three examples of total extra frames for a QVGA test with no rebuffering.

The loss or extra frames at both the beginning and end of the PVS must be considered (e.g., the bottom examples of Figures 8.1 and 8.2).

Figure 8.1. Total frame loss, shown for 10sec QVGA SRC and PVS without rebuffering.


Figure 8.2. Total extra frames, shown for 10sec QVGA SRC and PVS without rebuffering.


The intent of this test plan is that all PVSs will contain realistic impairments that could be encountered in real delivery of HDTV (e.g., over-the-air broadcast, satellite, cable, IPTV). If a PVS appears to be completely unrealistic, proponents or ILGs may request to remove or replace it. ILGs will make the final decision regarding the removal or replacement.

Measurements will only be performed on the portions of PVSs that are not anomalously severely distorted (e.g. in the case of transmission errors or codec errors due to malfunction).

Models must include calibration and registration if required to handle the following technical criteria (Note: Deviation and shifts are defined as between a source sequence and its associated PVSs. Measurements of gain and offset will be made on the first and last seconds of the sequences. If the first and last seconds are anomalously severely distorted, then another 2 second portion of the sequence will be used.):

maximum allowable deviation in offset is ±20

maximum allowable deviation in gain is ±0.1

maximum allowable Horizontal Shift is +/- 1 pixel

maximum allowable Vertical Shift is +/- 1 pixel

maximum allowable Horizontal Cropping is 30 pixels for HD, 12 pixels for SD, 6 pixels for VGA, and 3 pixels for QCIF (for each side).

maximum allowable Vertical Cropping is 20 pixels for HD, 12 pixels for SD, 6 pixels for QVGA, and 3 pixels for QCIF (for each side).

no Spatial Rotation or Vertical or Horizontal Re-scaling is allowed

no Spatial Picture Jitter is allowed. Spatial picture jitter is defined as a temporally varying horizontal and/or vertical shift. Calibration checks will only be performed on the portions of PVSs that are not anomalously severely distorted (e.g. in the case of transmission errors or codec errors due to malfunction).

For a description of offset and gain in the context of this testplan see Annex IX. This Annex also includes the method for calculating offset and gain in PVSs.
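For illustration only, the following Python sketch estimates luminance gain and offset by a least-squares fit of PVS luma against SRC luma over time-aligned frames. This is not the normative procedure (the calculation method to be used is the one defined in Annex IX), and the function name and data layout are assumptions:

# Illustrative sketch: estimate luminance gain and offset of a PVS relative to its
# SRC via a first-order least-squares fit pvs = gain * src + offset.
import numpy as np

def estimate_gain_offset(src_luma, pvs_luma):
    # src_luma, pvs_luma: arrays of luma samples from spatially and temporally aligned frames.
    x = np.asarray(src_luma, dtype=float).ravel()
    y = np.asarray(pvs_luma, dtype=float).ravel()
    gain, offset = np.polyfit(x, y, 1)   # degree-1 fit: y = gain * x + offset
    return gain, offset

# Example: a synthetic PVS with 10% extra gain and an offset of 5 grey levels.
src = np.random.randint(16, 236, size=10000)
gain, offset = estimate_gain_offset(src, 1.1 * src + 5)
print(round(gain, 3), round(offset, 1))  # approximately 1.1 and 5.0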

Experiment Design


The ILG will determine the test conditions and experiment design. The ILG will decide whether or not experiments are full matrix.

The maximum number of non-secret PVSs included in the overall test by any single proponent laboratory is 20%.

For each proponent subjective test, no more than 50% of test sequences may be derived from a single proponent. This does not apply to PVSs created by the ILG or to common sequences.

The ILG will ensure that a similar number of PVSs from each type of error will be tested per image resolution. Different types of error conditions can be mixed between experiments to ensure a balance in the design of each individual experiment. <<XXX>>

The number of PVSs in each experiment depends upon the video resolution and whether rebuffering is included, as follows:

Video Resolution Rebuffering Number of PVSs per Session

QVGA No 160

QVGA Yes 90

WVGA No 120

HD No 120

The above numbers do not include the common set sequences. The above numbers [do / do not] <<XXX>> include the SRC. Note that the SRC must be shown and rated.

Note: see the definition of rebuffering in Section 2.

It is not allowed to mix different length SRC sequences in a single experiment (e.g., QVGA 10sec and 16-24sec SRC/PVS in the same session). <<XXX>> Editor's note: Further study on 16-24s SRC/PVS (e.g., single evaluation values for 24 sec, user response to various lengths of PVSs). May propose a special test for rebuffering, including coding and transmission error impairments.

56.1 Video Sequence Naming Convention

The edited SRC and PVS (as seen by subjects) must be named according to the following naming convention:

<resolution><test>_srcXX_hrcYYY.avi

where <resolution> is either 'h' for HD, 'wv' for WVGA, or 'qv' for QVGA; <test> indicates the experiment number; XX indicates the source sequence number; and YYY represents the PVS number. The leading characters (h, wv, qv) and all extensions ("avi" and "dat") should be in lower case. YYY should be '00' for the original video. Here are some examples:

h01_src02_hrc00.avi HD test #1, SRC #2, original video edited
wv02_src04_hrc03.avi WVGA test #2, SRC #4, HRC #3
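For illustration only (not part of the test plan), the following Python sketch builds and parses file names that follow this convention; the two-digit HRC numbers mirror the examples above, and the function names are assumptions:

# Illustrative sketch: build and parse names of the form <resolution><test>_srcXX_hrcYYY.avi.
import re

NAME_RE = re.compile(r'^(h|wv|qv)(\d{2})_src(\d{2})_hrc(\d{2,3})\.avi$')

def make_name(resolution, test, src, hrc):
    # resolution is 'h', 'wv' or 'qv'; hrc 0 denotes the edited original (hidden reference).
    return '%s%02d_src%02d_hrc%02d.avi' % (resolution, test, src, hrc)

def parse_name(filename):
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError('file name does not follow the naming convention: ' + filename)
    resolution, test, src, hrc = m.groups()
    return resolution, int(test), int(src), int(hrc)

print(make_name('wv', 2, 4, 3))           # wv02_src04_hrc03.avi
print(parse_name('h01_src02_hrc00.avi'))  # ('h', 1, 2, 0)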



56.2 Check on Bit-stream Validity

In order to make sure that all models will understand the bit-stream data with/without transmission errors, an open-source reference decoder and a reference IP analyzer will be used to check the admissibility of bit-stream data (Figures 9.1 and 9.2). <<XXX>>

Figure 9.1. Data compliance test for bit-stream data without transmission errors (bit-stream data without transmission errors is fed to a reference decoder).

Figure 9.2. Data compliance test for bit-stream data with transmission errors (bit-stream data with transmission errors is fed to a reference IP analyzer and decoder).


Reduced Reference Models must include temporal registration if the model needs it. Temporal misalignment of no more than +/-0.25s is allowed. Please note that in subjective tests, the start frame of both the reference and its associated HRCs are matched as closely as possible. Spatial offsets are expected to be very rare. It is expected that no post-impairments are introduced to the outputs of the encoder before transmission. Spatial registration will be assumed to be within (1) pixel. Gain, offset, and spatial registration will be corrected, if necessary, to satisfy the calibration requirements specified in this test plan.

The organizations responsible for creating the PVSs shall check that they fall within the specified calibration and registration limits. The PVSs will be double-checked by one other organization. After testing has been completed, any PVS found to be outside the calibration limits shall be removed from the data analyses. The ILG will decide if a suspect PVS is outside the limits.

Program / Web address:

Encoder: Elecard Converter Studio Version 3.1 (MPEG-2, H.264; AVI -> TS), http://www.elecard.com

Server: VLC media player Version 0.9.9 (TCP, UDP, RTP), http://www.videolan.org/vlc/streaming.html

Packet capture: Wireshark Version 1.2.0 (TS -> PCAP), http://www.wireshark.org/

Remove PCAP header: PCAPtoTS.exe, home-made program (PCAP -> TS, .pcap -> .ts)

Decoder: JM 12.4 (TS -> .264 -> YUV), http://iphome.hhi.de/suehring/tml/


57. Subjective Evaluation Procedure

57.1 The ACR Method with Hidden Reference

This section describes the test method according to which the VQEG Hybrid Perceptual Bitstream Project's subjective tests will be performed. We will use the absolute category rating (ACR) scale [ITU-T Rec. P.910rev] for collecting subjective judgments of video samples. ACR is a single-stimulus method in which a processed video segment is presented alone, without being paired with its unprocessed ("reference") version. The present test procedure includes a reference version of each video segment, not as part of a pair, but as a freestanding stimulus for rating like any other. During the data analysis the ACR scores will be subtracted from the corresponding reference scores to obtain DMOSs. This procedure is known as "hidden reference removal."

57.1.1 General Description

The VQEG Hybrid subjective tests will be performed using the Absolute Category Rating with Hidden Reference (ACR-HR) method.

The selected test methodology, Absolute Category Rating with Hidden Reference (ACR-HR), is derived from the standard Absolute Category Rating (ACR) method [ITU-T Recommendation P.910, 1999]. The 5-point ACR scale will be used.

Hidden Reference has been added to the method more recently to address a disadvantage of ACR for use in studies in which objective models must predict the subjective data: If the original video material (SRC) is of poor quality, or if the content is simply unappealing to viewers, such a PVS could be rated low by humans and yet not appear to be degraded to an objective video quality model, especially a full-reference model. In the HR addition to ACR, the original version of each SRC is presented for rating somewhere in the test, without identifying it as the original. Viewers rate the original as they rate any other PVS. The rating score for any PVS is computed as the difference in rating between the processed version and the original of the given SRC. Effects due to esthetic quality of the scene or to original filming quality are “differenced” out of the final PVS subjective ratings.

In the ACR-HR test method, each test condition is presented once for subjective assessment. The test presentation order is randomized according to standard procedures (e.g., Latin or Graeco-Latin square or via computer). Subjective ratings are reported on the five-point scale:

5 Excellent

4 Good

3 Fair

2 Poor

1 Bad.

Figure 10.1 is borrowed from ITU-T P.910 (1999). It shows the ACR basic test cell: the sequences under test (Ai, Bj, Ck, i.e. sequences A, B and C under test conditions i, j and k) are each presented for approximately 10 s and are each followed by a grey screen during which voting takes place.


Figure 10.1 – ACR basic test cell, as specified by ITU-T P.910.

Viewers will see each scene once and will not have the option of re-playing a scene.

An example of instructions is given in Annex I.

The SRC/PVS length and rebuffering condition are as follows: <<XXX>>

SD/HD

SRC/PVS length: 15 seconds

Rebuffering is not allowed.

QVGA

SRC/PVS length: 10 seconds with rebuffering disallowed

SRC/PVS length: SRC is 16 seconds with rebuffering allowed. PVS can be up to 24s. The maximum time limit for freezing or rebuffering is 8 seconds.

It is not allowed to mix 10s and 16-24s SRC/PVS in the same session. <<XXX>> Further study on 16-24s SRC/PVS (e.g., single evaluation values for 24 sec, user response to various length of PVSs). May propose a special test for rebuffering, including coding and transmission error impairments.

Note: Rebuffering is freezing longer than 0.5s without skipping
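For illustration only (a minimal Python sketch, not part of the test plan; the function name and example values are assumptions), the definition above can be expressed as a simple classification of freeze events:

# Illustrative sketch: classify a freeze event using the definition above
# (rebuffering = freezing longer than 0.5 s without skipping).
def is_rebuffering(freeze_duration_seconds, frames_skipped):
    return freeze_duration_seconds > 0.5 and frames_skipped == 0

print(is_rebuffering(1.2, 0))   # True: pausing without skipping, longer than 0.5 s
print(is_rebuffering(1.2, 30))  # False: pausing with skipping
print(is_rebuffering(0.3, 0))   # False: too short to count as rebuffering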

Instructions to the evaluators provide a more detailed description of the ACR procedure. The instruction script appears in Annex I.

Viewing Distance, Number of Viewers per Monitor, and Viewer Position

The proposed Hybrid Perceptual/Bitstream Validation (HBS) test will examine the performance of objective perceptual quality models for different video formats (HD, SD, and QVGA). Section <<XXX>> defines format and display types in detail. Video applications targeted in this test include the suite of IPTV services, internet video, mobile video, video telephony, and streaming video.

The test instructions request evaluators to maintain a specified viewing distance from the display device. The viewing distance is as follows:

QVGA: 4-6H and let the viewer choose within physical limits

WVGA: 6H (to be consistent with the table on page 4 of ITU-R Rec. BT.500-11)

HD: 3H

H=Picture Heights (picture is defined as the size of the video window)

Preferably, each test viewer will have his/her own video display. For WVGA and QVGA, it is required that each test viewer will have his/her own video display. The test room will conform to ITU-R Rec. BT.500-11 requirements. The test cabinet will conform to ITU-T Rec. P.910 requirements.

It is recommended that viewers be seated facing the center of the video display at the specified viewing distance. That means that viewer's eyes are positioned opposite to the video display's center (i.e. if possible, centered both vertically and horizontally). If two or three viewers are run simultaneously using a single display, then the viewer’s eyes, if possible, are centered vertically, and viewers should be centered evenly in front of the monitor.


Editor's note (11/04/10): Both BT.500 and P.910 are specified (in different locations within this document). One of these needs to be deleted.

57.2 Display Specification and Set-up

The subjective tests will cover two display categories: television (SD/HD) and multimedia (WVGA, QVGA). For multimedia, LCD displays will be used. For SD/HD television, LCD or CRT (professional) displays will be used. The display requirements for each category are now provided.

Note that in all subjective tests 1 pixel of video will be displayed as 1 native display pixel. No upsampling or downsampling of the video is allowed at the player.

Labs must post to the reflector what monitor they plan to use. VQEG members have 2 weeks to object.

57.2.1 QVGA and WVGA Requirements

For QVGA and WVGA resolution content, this Test Plan requires that subjective tests use LCD displays that meet the following specifications:

Monitor Feature / Specification:

Diagonal Size: 17-24 inches
Dot pitch: < 0.30
Resolution: Native resolution (no scaling allowed)
Gray to Gray Response Time (if specified by manufacturer, otherwise assume the reported response time is white-black): < 30 ms (< 10 ms if based on white-black)
Color Temperature: 6500 K
Calibration: Yes
Calibration Method: Eye One / Video Essentials DVD
Bit Depth: 8 bits/colour
Refresh Rate: >= 60 Hz
Standalone/laptop: Standalone
Label: TCO '06 or later

The LCD shall be set-up using the following procedure:

Use the autosetting to set the default values for luminance, contrast and colour shade of white.

Adjust the brightness according to Rec. ITU-T P.910, but do not adjust the contrast (it might change the balance of the colour temperature).

Set the gamma to 2.2.

Set the colour temperature to 6500 K (default value on most LCDs).

The scan rate of the PC monitor must be at least 60 Hz.

The LCD display shall be a high-quality monitor. Annex V contains a list of preferred LCD monitors for use in the subjective tests.

Video sequences will be displayed using a black border frame (0) on a grey background (128). The black border frame will be of the following size:

18 lines/pixels QVGA


<<XXX>> lines/pixels WVGA

The black border frame will be on all four sides.

57.2.2 SD Requirements


Viewing conditions should comply with those described in International Telecommunications Union Recommendation ITU-R BT.500-11. An example schematic of a viewing room is shown in Figure 1. Specific viewing conditions for subjective assessments in a laboratory environment are:

Ratio of luminance of inactive screen to peak luminance: 0.02

Ratio of the luminance of the screen, when displaying only black level in a completely dark room, to that corresponding to peak white: 0.01

Display brightness and contrast: set up via PLUGE (see Recommendations ITU-R BT.814 and ITU-R BT.815)

Maximum observation angle relative to the normal: 30°

Ratio of luminance of background behind picture monitor to peak luminance of picture: 0.15

Chromaticity of background: D65

Other room illumination: low

The monitor to be used in the subjective assessments is a 19 in. (minimum) professional-grade monitor, for example a Sony BVM-20F1U or equivalent.

The viewing distance of 6H selected by VQEG falls in the range of 4 to 6 H, i.e. four to six times the height of the picture tube, compliant with Recommendation ITU-R BT.500-10.

Soundtrack will not be included.

If a HD LCD monitor is used for SDTV testing, the picture area should be centered and the non-picture area should be black or mid-level gray (e.g., 128).


Figure 1. Example of viewing room.

57.2.3 HD Monitor Requirements

All subjective experiments will use LCD monitors or professional CRT monitors. Only high-end consumer TV (Full HD) or professional grade monitors should be used. LCD PC monitors may be used, provided that the monitor meets the other specifications (below) and is color calibrated for video.

Given that the subjective tests will use different HD display technologies, it is necessary to ensure that each test laboratory selects an appropriate display and common set-up techniques are employed. Due to the fact that most consumer grade displays employ some kind of display processing that will be difficult to account for in the models, all subjective facilities doing testing for HDTV shall use a full resolution display.

All labs that will run viewers must post to the HDTV reflector information about the model to be used. If a proponent or ILG has serious technical objections to the monitor, the proponent or ILG should post the objection with detailed explanation within two weeks. The decision to use the monitor will be decided by a majority vote among proponents and ILGs.

Input requirements

HDMI (player) to HDMI (display); or DVI (player) to DVI (display)

HD-SDI (player) to HD-SDI (display)

Conversion (HDMI to HD-SDI or vice versa) should be transparent

If possible, a professional HDTV LCD monitor should be used. The monitor should have as little post-processing as possible. Preferably, the monitor should make available a description of the post-processing performed.


If the native display of the monitor is progressive and thus performs de-interlacing, then if 1080i SRC are used, the monitor will do the de-interlacing. Any artifacts resulting from the monitor’s de-interlacing are expected to have a negligible impact on the subjective quality ratings, especially in the presence of other degradations.

The smallest monitor that can be used is a 24” LCD.

A valid HDTV monitor should support the full-HD resolution (1920 by 1080). In other words, when the HDTV monitor is used as a PC monitor, its native resolution should be 1920 by 1080. On the other hand, most TV monitors support overscan. Consequently, the HDTV monitor may crop boundaries (e.g., 3-5% from the top, bottom and two sides) and display enlarged pictures (see Figure 10.2). Thus, it is possible that the HDTV monitor may not display whole pictures, which is allowed.

The HDTV monitor should be an LCD type. The HDTV monitor should be a high-end product, which provides adequate motion blur reduction techniques and post-processing, including deinterlacing.

Labs must post to the reflector what monitor they plan to use; VQEG members have 2 weeks to object.

Figure 10.2. An Example of Overscan

57.2.4 Viewing Conditions

Viewing conditions should comply with those described in International Telecommunications Union Recommendation ITU-R BT.500-11. An example schematic of a viewing room is shown in Figure 7.2. Specific viewing conditions for subjective assessments in a laboratory environment are:


Ratio of luminance of inactive screen to peak luminance: 0.02

Ratio of the luminance of the screen, when displaying only black level in a completely dark room, to that corresponding to peak white: 0.01

Display brightness and contrast: set up via PLUGE (see Recommendations ITU-R BT.814 and ITU-R BT.815)

Maximum observation angle relative to the normal: 30°

Ratio of luminance of background behind picture monitor to peak luminance of picture: 0.15

Chromaticity of background: D65

Other room illumination: low

The monitor to be used in the subjective assessments is a 19 in. (minimum) professional-grade monitor, for example a Sony BVM-20F1U or equivalent. The viewing distance of 6H selected by VQEG falls in the range of 4 to 6 H, i.e. four to six times the height of the picture tube, compliant with Recommendation ITU-R BT.500-10. Soundtrack will not be included. If a HD LCD monitor is used for SDTV testing, the picture area should be centered and the non-picture area should be black or mid-level gray (e.g., 128).

Figure 7.2. Example of viewing room.

57.3 Subjective Test Video Playback

All subjective tests for QVGA will where possible be run using the same software package, provided by Acreo. The software package will include the following components:

Entry system for evaluator details (e.g. name, age, gender)

Test screens (prompts to users, grey panel, ACR scale, response input, data capture, data storage)

Timing control

Correct video play-out check

Video player


Annex V describes the test method to be used in the VQEG Multimedia testing. Annex V also provides minimum computer specifications (including required OS) required when using this subjective test software package.

For the HD/SD testing, the video will use the full screen dimensions and no background panel or black border will be present. If a HD LCD monitor is used for SDTV testing, the picture area should be centered and the non-picture area should be black or mid-level gray.

Length of Sessions

57.4 Evaluators (Viewers)

Exactly 24 valid viewers per experiment will be used for data analysis.

Different subjective experiments will be conducted by several test laboratories. A valid viewer means a viewer whose ratings are accepted after post-experiment results screening. Post-experiment results screening is necessary to discard viewers who are suspected to have voted randomly. The rejection criteria verify the level of consistency of the scores of one viewer according to the mean score of all observers over the entire experiment. The method for post-experiment results screening is described in Annex VI. Only scores from valid viewers will be reported in the results spreadsheets as described in Section 4.22. In Section 4.1.10 a procedure is described to obtain ratings for 24 valid observers.

It is preferred that each viewer be given a different randomized order of video sequences where possible. Otherwise, the viewers will be assigned to sub-groups, which will see the test sessions in different randomized orders. A maximum of 6 viewers may be presented with the same ordering of test sequences per subjective test. For QVGA and WVGA, a different ordering is required for each viewer. For more information on the randomization process, see Section 57.4.4 (Randomization).

Each viewer can only participate in 1 experiment (i.e. one experiment at one image resolution).

Only non-expert viewers will participate. The term non-expert is used in the sense that the viewers' work does not involve video picture quality and they are not experienced assessors. They must not have participated in a subjective quality test during the previous six months.

Prior to a session, the observers should usually be screened for normal visual acuity or corrected-to-normal acuity and for normal color vision. Acuity will be checked according to the method specified in ITU-T P.910 or ITU-R Rec. 500, which is as follows. Concerning acuity, no errors on the 20/30 line of a standard eye chart3 should be made. The chart should be scaled for the test viewing distance and the acuity test performed at the same location where the video images will be viewed (i.e. lean the eye chart up against the monitor) and have the evaluators seated. Ishihara or Pseudo Isochromatic plates may be used for colour screening. When using either colour test, please refer to the usage guidelines when determining whether evaluators have passed (e.g. the standard definition of normal colour vision in the Ishihara test is considered to be 17 plates correct out of a 38 plate test; ITU-T Rec. P.910 states that no more than 2 plates may be failed in a 12 plate test). Evaluators should also have sufficient familiarity with the language to comprehend instructions and to provide valid responses using the semantic judgment terms expressed in that language.

57.4.1.1 Instructions for Evaluators and Selection of Valid Evaluators

For many labs, obtaining a reasonably representative sample of evaluators is difficult. Therefore, obtaining and retaining a valid data set from each evaluator is important. The following procedures are highly recommended to ensure valid subjective data:

2 Test laboratories can keep data from invalid viewers if they consider this to be of valuable information to them but they must not include them in the VQEG data.

3 Grahm-Field Catalogue Number 13-1240.


Write out a set of instructions that the experimenter will read to each test viewer. The instructions should clearly explain why the test is being run, what the evaluator will see, and what the evaluator should do. Pre-test the instructions with non-experts to make sure they are clear; revise as necessary.

Explain that it is important for evaluators to pay attention to the video on each trial.

There are no “correct” ratings. The instructions should not suggest that there is a correct rating or provide any feedback as to the “correctness” of any response. The instructions should emphasize that the test is being conducted to learn viewers’ judgments of the quality of the samples, and that it is the viewer’s opinion that determines the appropriate rating.

If it is suspected that an evaluator is not responding to the video stimuli or is responding in a manner contrary to the instructions, their data may be discarded and a replacement evaluator can be tested. The experimenter will report the number of evaluators’ datasets discarded and the criteria for doing so. Example criteria for discarding subjective data sets are:

The same rating is used for all or most of the PVSs.

The evaluator’s ratings correlate poorly with the average ratings from the other evaluators (see Annex II).

Different subjective experiments will be conducted by several test laboratories. Exactly 24 valid viewers per experiment will be used for data analysis. A valid viewer means a viewer whose ratings are accepted after post-experiment results screening. Post-experiment results screening is necessary to discard viewers who are suspected to have voted randomly. The rejection criteria verify the level of consistency of the scores of one viewer according to the mean score of all observers over the entire experiment. The method for post-experiment results screening is described in Annex VI. Only scores from valid viewers will be reported.

The following procedure is suggested to obtain ratings for 24 valid observers:

1. Conduct the experiment with 24 viewers

2. Apply post-experiment screening to eventually discard viewers who are suspected to have voted randomly (see Annex VI).

3. If n viewers are rejected, run n additional evaluators.

4. Go back to step 2 and step 3 until valid results for 24 viewers are obtained.
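For illustration only (not part of the test plan), the iterative procedure above can be sketched as follows in Python; run_session() and screen_viewers() are hypothetical stand-ins for running one viewer and for the Annex VI post-experiment screening:

# Illustrative sketch of the four-step procedure for obtaining 24 valid viewers.
def collect_valid_viewers(run_session, screen_viewers, target=24):
    ratings = [run_session() for _ in range(target)]          # step 1: run 24 viewers
    while True:
        rejected = screen_viewers(ratings)                    # step 2: post-experiment screening
        if not rejected:
            return ratings                                     # 24 valid viewers obtained
        for index in sorted(rejected, reverse=True):           # step 3: discard rejected viewers
            del ratings[index]
        ratings.extend(run_session() for _ in range(target - len(ratings)))  # run replacements
        # step 4: loop back to screening until 24 valid viewers remain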

57.4.2 Viewing Conditions

For the QVGA testing, each test session will involve only one evaluator per display assessing the test material. Evaluators will be seated directly in line with the center of the video display at a specified viewing distance (see Section 4.1.2). The test cabinet will conform to ITU-T Rec. P.910 requirements.

57.4.3 Subjective Experiment Sessions

Each subjective experiment will include the same number of PVSs4 for the same type of experiment. The PVSs include both the common set of PVSs inserted in each experiment and the hidden reference (hidden SRCs) sequences, i.e. each hidden SRC is one PVS. The common set of PVSs will include the secret PVSs and secret source. The number of PVSs of the common set is 24.

In this scenario, an experiment will include the following steps:

4 This will allow conducting an ACR experiment within about 1 hour, including practice clips and a comfortable break during the experiment.


1. Introduction and instructions to viewer
2. Practice clips: these test clips allow the viewer to familiarize with the assessment procedure and software. They must represent the range of distortions in the experiment but with different contents than those used in the experiment. A number of 6 practice clips is suggested. Ratings given to practice clips are not used for data analysis.
3. Assessment of PVSs
4. Short break
5. Practice clips (this step is optional but advised to regain the viewer's concentration after the break)
6. Assessment of PVSs

The SRCs used in each experiment must cover a variety of content categories as defined in Section 6.2. At least 6 categories of content must be included in each experiment.

A similar number of PVSs from each type of error will be tested per image resolution. The image resolutions are defined in Section 4.1.2. The different types of error conditions are defined in Section 6.1.3. However different types of error conditions can be mixed between experiments to ensure a balance in the design of each individual experiment.

57.4.4 Randomization

It is preferred that each evaluator be given a different randomized order of video sequences where possible. If this is not possible, the viewers will be assigned to sub-groups, which will see the test sessions in different randomized orders. A maximum of 6 evaluators may be presented with the same ordering of test sequences per subjective test.

For each subjective test, a randomization process will be used to generate orders of presentation (playlists) of video sequences. Playlists can be pre-generated offline (e.g. using separate piece of code or software) or generated by the subjective test software itself. In generating random presentation order playlists the same scene content may not be presented in two successive trials.

Randomization refers to a random permutation of the set of PVSs used in that test. Shifting is not permitted, e.g.:

Subject1 = [PVS4 PVS2 PVS1 PVS3]
Subject2 = [PVS2 PVS1 PVS3 PVS4]
Subject3 = [PVS1 PVS3 PVS4 PVS2]
…

If a random number generator is used (as stated in section 4.1.1), it is necessary to use a different starting seed for different tests.

Example script in Matlab that generates playlists (i.e. randomized orders of presentation) is given below:

rand('state',sum(100*clock)); % generates a random starting seed
Npvs=200;   % number of PVSs in the test
Nsubj=24;   % number of evaluators in the test
playlists=zeros(Npvs,Nsubj);
for i=1:Nsubj
    playlists(:,i)=randperm(Npvs);
end


57.4.5 Test Data Collection

The responsibility for the collection and organization of the data files containing the votes will be shared by the ILG Co-Chairs and the proponents. The collection of data will be supervised by the ILG and distributed to test participants for verification.

57.5 Results Data Format

57.5.1 Results Data Format

The following format is designed to facilitate data analysis of the subjective data results file.

The subjective data will be stored in a Microsoft Excel 97-2003 (i.e., *.xls) spreadsheet. Each spreadsheet will contain all of the data for one experiment. The top row of this file will be a header. Each row below the header will contain one video sequence, with the following columns in the following order: experiment number, SRC number, HRC number, file name, subject #1's ACR score, subject #2's ACR score, …, subject #24's ACR score.

Missing ACR values will be left blank.

Figure 10.3 contains an example, showing 12 of the 24 subjects’ scores, and only six PVS.

Experiment  SRC Num  HRC Num  File                   Subjects' results
1           1        1        hybrid1_s01_hrc01.avi  2 3 1 2 2 1 3 1 3 2 2 3
1           1        2        hybrid1_s01_hrc02.avi  2 2 1 2 1 2 3 2 3 3 1 2
1           1        3        hybrid1_s01_hrc03.avi  1 1 1 1 1 2 2 1 3 1 1 1
1           1        4        hybrid1_s01_hrc04.avi  1 1 1 1 1 1 3 1 1 1 1 1
1           1        5        hybrid1_s01_hrc05.avi  2 2 2 2 2 1 3 2 3 2 1 1

Figure 10.3. Format for subjective data spreadsheet.

Each Excel spreadsheet cell will contain either a number or a name. All names (e.g., test, lab, scene, hrc) must be ASCII strings containing no white space (e.g., space, tab). Where exact text strings are to be used, the text strings will be identified below in single quotes (e.g., ‘original’). In the Excel sheet, only data from valid viewers (i.e., viewers who pass the visual acuity and color tests) will be forwarded to the ILG and other proponents.

Below are definitions for the Excel spreadsheet columns:

Lab: Name of laboratory’s organization (e.g., CRC, Intel, NTIA, NTT, etc.). This abbreviation must be a single word with no white space (e.g., space, tab).

Test: Name of the test. Each test must have a unique name.
Type: Name of the test category. [Note: exact text strings will be specified after individual test categories have been finalized.]
Distance: Viewing distance (e.g. 6to10H, 6to8H, 4to6H, 3to6H).
Evaluator #: Integer indicating the evaluator number. Each laboratory will start numbering viewers at a different point, to ensure that all viewers receive unique numbering. Starting points will be separated by 1000 (e.g., lab1 starts numbering at 1000, lab2 starts numbering at 2000, etc). Evaluators' names will not be collected or recorded.
Month: Integer indicating month [1..12]
Day: Integer indicating day [1..31]
Year: Integer indicating year [2004..2006]
Session: Integer indicating viewing session. Editor's note: do we need a new column to indicate number of viewers in a test session [1..3]?
Resolution: One of the following three strings: 'hd', 'sd', or 'qvga'.
Rate: A number indicating the frames per second (fps) of the original video sequence.
Age: Integer number that indicates the viewer's age.
Gender: 'f' for female, 'm' for male
Order: An integer indicating the order in which the evaluator viewed the video sequences [or trial number, if scenes are ordered randomly].
Scene: Name of the scene. All scenes from all tests must have unique names. If a single scene is used in multiple tests (i.e., digitally identical files), then the same scene name must be used. Names shall be eight characters or fewer.
HRC: Name of the HRC. For reference video sequences, the exact text 'reference' must be used. All processed HRCs from all tests must have unique names. If a single HRC is used in multiple tests, then the same HRC name must be used.
ACR Score: Integer indicating the viewer's ACR score (1, 2, 3, 4, or 5).

See for an example

57.5.2 Subjective Data Analysis

Difference scores will be calculated for each processed video sequence (PVS). A PVS is defined as a SRCxHRC combination. The difference scores, known as Difference Mean Opinion Scores (DMOS), will be produced for each PVS by subtracting its score from the corresponding hidden reference score for the SRC used to produce the PVS. Subtraction will be done per viewer. Difference scores will be used to assess the performance of each full reference and reduced reference proponent model, applying the metrics defined in Section 59.

For evaluation of no-reference proponent models, the absolute (raw) subjective score will be used. Thus, for each test sequence, only the absolute rating for the SRC and PVS will be calculated. Based on each viewer’s absolute rating for the test presentations, an absolute mean opinion score will be produced for each test condition. These MOS will then be used to evaluate the performance of NR proponent models using the metrics specified in Section 59.
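For illustration only (a minimal Python sketch, not part of the test plan; the data layout and function names are assumptions), hidden reference removal and the absolute MOS calculation can be expressed as follows:

# Illustrative sketch: scores[(src, hrc)] is a list of per-viewer ACR scores, with
# position i always referring to the same viewer; hrc 'reference' denotes the hidden reference.
def dmos(scores, src, hrc):
    # Per-viewer difference scores (hidden reference minus PVS), averaged over viewers.
    pvs = scores[(src, hrc)]
    ref = scores[(src, 'reference')]
    diffs = [r - p for r, p in zip(ref, pvs)]
    return sum(diffs) / float(len(diffs))

def mos(scores, src, hrc):
    # Absolute mean opinion score, used for evaluating no-reference models.
    ratings = scores[(src, hrc)]
    return sum(ratings) / float(len(ratings))

example = {('src01', 'reference'): [5, 4, 5, 4], ('src01', 'hrc03'): [3, 3, 4, 2]}
print(dmos(example, 'src01', 'hrc03'))  # 1.5
print(mos(example, 'src01', 'hrc03'))   # 3.0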


58. Objective Quality Models

Figures 11.1 to 11.3 show input parameters for FR, RR and NR hybrid perceptual bit-stream models. Figure 11.4 illustrates the input parameters for the various models. Figure 11.5 shows input parameters for P.NAMS and P.NBAMS. Figure 11.6 illustrates how bit-stream data and PVSs are captured.

Figure 11.1. Input parameters for FR hybrid perceptual bit-stream models.

Figure 11.2. Input parameters for RR hybrid perceptual bit-stream models.

Figure 11.3. Input parameters for NR hybrid perceptual bit-stream models.

Figure 11.4. Input parameters for various models. [??TBD] What is the difference between the circle and the triangle in the last column?


Figure 11.5. Inputs for P.NAMS and P.NBAMS: bit-stream data (trace dump with arrival time), passed through a header extractor in the case of P.NAMS.

Figure 11.6. Bit-stream capture and video capture procedure.

[Editor’s note: the following paragraph has to be decided].

In order to make sure that all models will understand the bit-stream data with/without transmission errors, open-source reference decoder and reference IP analyzer will be used to check the admissibility of bit-stream data (Figs. 7.7-8).

Fig. 7.7. Data compliance test for bit-stream data without transmission errors (bit-stream data without transmission errors is fed to a reference decoder) [??TBD].


Fig. 7.8. Data compliance test for bit-stream data with transmission errors (bit-stream data with transmission errors is fed to a reference IP analyzer and decoder) [??TBD].

Model Type and Model Requirements

VQEG Hybrid has agreed that the following types of models may be submitted for evaluation [??TBD: P.NAMS, P.NBAMS?]:

Full Reference hybrid perceptual bit-stream

Reduced Reference hybrid perceptual bit-stream

No Reference hybrid perceptual bit-stream

No Reference.

Decoded signals (PVS) along with bit-stream data will be inputs to the hybrid models. Models which do not make use of these decoded signals (PVS) will not be considered as Hybrid Models. This test plan is not intended to evaluate P.NAMS and P.NBAMS models.

The side-channels allowable for the RR hybrid perceptual bit-stream models are:

QVGA: (10kbps, 64kbps)

WVGA: (15kbps, 56kbps, 128kbps, 256kbps)

HD (H.264): (15kbps, 56kbps, 128kbps, 256kbps)

SD (MPEG2): (10kbps, 64kbps) <<XXX>> [Editor's note (11/04/10): The bandwidths for each resolution need to be confirmed by the Hybrid Group, including whether the bit-rate target depends upon the video resolution.]

Note that for each side-channel condition the limits defined here represent the maximum allowable side-channel data rate. For example, where the side-channel is limited to 10 kbps, then valid side-channels are those that use a data rate of <= 10 kbps, and any data rates above 10 kbps are invalid. It is noted that 1 kbps represents 1024 bits per second.
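For illustration only (a minimal Python sketch, not part of the test plan; the function name is an assumption), the maximum amount of reference data allowed by a side-channel follows directly from this definition:

# Illustrative sketch: maximum reference-data size for an RR side-channel,
# using 1 kbps = 1024 bits per second as defined above.
def max_side_channel_bytes(kbps, duration_seconds):
    return kbps * 1024 * duration_seconds / 8.0

print(max_side_channel_bytes(10, 15))  # 19200.0 bytes for a 10 kbps channel and a 15 s PVS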

If there are proponents for NR models, such models will be also validated.

Proponents may submit one model of each type for all image size conditions. Note that where multiple models are submitted, additional model submission fees may apply. A model does not need to handle all video resolutions. For example, a model may be submitted that only handles QVGA.

Note: HD models must be able to handle both codecs (i.e., H.264 and MPEG-2).

58.1 Model Input and Output Data Format

Video will be full frame, full frame rate. Both progressive and interlaced video formats will be used in the test. The lengths of SRCs and PVSs are given below.

58.1.1 No-Reference Hybrid Perceptual Bit-Stream Models and No-Reference Models

The NR Hybrid and NR models will take as input an ASCII file listing the processed video sequence files and the bit-stream data in PCAP files. Each line of this file has the following format:



<processed-file> <pcap-file>

where <processed-file> is the name of a processed video sequence file, and <pcap-file> contains the bit-stream data. File names may include a path. For example:

wv02_src04_hrc03.avi wv02_src04_hrc03.dat

or if paths are specified:

D:\video\wv02_src04_hrc03.avi D:\video\wv02_src04_hrc03.dat

SRC and PVS length for SD/HD: 15 sec (fixed)

SRC and PVS length for QVGA: 10 sec when the length is fixed (no rebuffering)

SRC and PVS length for QVGA: 16 seconds for SRCs and up to 24 seconds for PVSs when rebuffering is allowed.

Rebuffering is defined as freezing longer than 0.5 second without skipping.

7.2.1 P.NAMS

Models in accordance with P.NAMS will receive TS, RTP, UDP, and IP headers. They may also optionally receive long freezing [proposal by NTT, ??TBD]. Furthermore, models may receive encapsulated bit-streams [proposal by BT, TBD].

7.2.2 P.NBAMS

Models in accordance with P.NBAMS will receive TS, RTP, UDP, IP headers, and ES information. They may also optionally receive long freezing [proposal by NTT, ??TBD]. Furthermore, models may receive encapsulated bit-streams [proposal by BT, ??TBD].

7.2.3 Full reference hybrid perceptual bit-stream models

In addition to the input parameters received by P.NBAMS, the FR hybrid perceptual bit-stream model will take as input an ASCII file listing pairs of video sequence files to be processed and the associated bit-stream data in PCAP files. Each line of this file has the following format:

<source-file> <processed-file> <pcap-file>

where <source-file> is the name of a source video sequence file and <processed-file> is the name of a processed video sequence file, whose format is specified in section 11.12 of this document. File names may include a path. For example, an input file should adhere to the following naming convention:

wv02_src04_hrc00.avi wv02_src04_hrc03.avi wv02_src04_hrc03.dat

or if paths are specified:

D:\video\wv02_src04_hrc00.avi D:\video\wv02_src04_hrc03.avi D:\video\wv02_src04_hrc03.dat

/video/hXX_YYY.avi /video/hXX_YYY.avi (Unix) or
\video\hXX_YYY.avi \video\hXX_YYY.avi (Windows)

/video/sXX_YYY.avi /video/sXX_YYY.avi (Unix) or
\video\sXX_YYY.avi \video\sXX_YYY.avi (Windows)

/video/qvXX_YYY.avi /video/qvXX_YYY.avi (Unix) or
\video\qvXX_YYY.avi \video\qvXX_YYY.avi (Windows)

/video/qcXX_YYY.avi /video/qcXX_YYY.avi (Unix) or
\video\qcXX_YYY.avi \video\qcXX_YYY.avi (Windows)

where 'h' represents HD, 's' represents SD, 'qv' represents QVGA, 'qc' represents QCIF, XX indicates the test number and YYY represents the video sequence index. The leading characters (h, s, qv, qc) and all extensions ("avi" and "dat") should be in lower case.

The output file is an ASCII file created by the model program, listing the name of each processed sequence and the resulting Video Quality Rating (VQR) of the model. The contents of the output file should be flushed after each sequence is processed, to allow the testing laboratories the option of halting a processing run at any time. Each line of the ASCII output file has the following format:

<source-file> <processed-file> VQR

where <source-file> is the name of the source file run through this model, without any path information; <processed-file> is the name of the processed sequence run through this model, without any path information; and VQR is the Video Quality Rating produced by the objective model. For the input file example, this file contains the following:

h01_ref.avi h01_001.avi 0.150
h01_ref.avi h01_002.avi 1.304
h01_ref.avi h01_003.avi 0.102
h01_ref.avi h01_004.avi 2.989

Each proponent is also allowed to output a file containing Model Output Values (MOVs) that the proponents consider to be important. The format of this file will be

h01_001.avi 0.150 MOV1 MOV2,… MOVN

h01_002.avi 1.304 MOV1 MOV2,… MOVN

h01_003.avi 0.102 MOV1 MOV2,… MOVN

h01_004.avi 2.989 MOV1 MOV2,… MOVN
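For illustration only, a minimal Python sketch of this input/output convention is given below. The function run_fr_hybrid_model is a hypothetical placeholder for a proponent's model (not part of this test plan), and the flush after each line follows the requirement stated above:

import os
import sys

def run_fr_hybrid_model(src_path, pvs_path, pcap_path):
    # Hypothetical placeholder for the proponent's model; returns a single VQR value.
    return 0.0

def process_input_list(input_list, output_list):
    with open(input_list) as fin, open(output_list, "w") as fout:
        for line in fin:
            if not line.strip():
                continue
            src, pvs, pcap = line.split()
            vqr = run_fr_hybrid_model(src, pvs, pcap)
            # Report file names without path information, one PVS per line (path handling simplified).
            fout.write("%s %s %.3f\n" % (os.path.basename(src), os.path.basename(pvs), vqr))
            fout.flush()  # flush after each sequence so a run can be halted at any time

if __name__ == "__main__":
    process_input_list(sys.argv[1], "vqr_output.txt")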

7.2.4 Reduced-reference Hybrid Perceptual Bit-stream Models

In an effort to limit the number of variations, and in agreement with all proponents attending the VQEG meeting, consensus was reached to allow only downstream video quality models.

7.2.4.1 Downstream Model – Original Video Processing

The software (model) for the original video side will be given the original test sequence in the final file format and produce a reference data file. The amount of reference information in this data file will be evaluated in order to estimate the bit rate of the reference data and consequently assign the class of the method (Section 7.1). The input file format of the full-reference model will be used for the RR model for the original video side. Deterministic RR models for the original video side may ignore the processed video file name which is the second argument. For example, given an input file:

/video/wv02_src04_hrc00.avi /video/wv02_src04_hrc01.avi wv02_src04_hrc01.pcap (Unix example; for Windows OS the path conforms to Windows format)


Then, the model should produce reference data files whose file names are made in the following way:

/video/hXX_YYY_BBB.dat (deterministic models) or
/video/hXX_YYY_ZZZ_BBB.dat (deterministic and non-deterministic models)

where BBB indicates the side-channel bandwidth in kbps. For example, for an HD RR model with the 10 kbps side channel, the output file names should be as follows:

hXX_001_10.dat hXX_002_10.dat

or

hXX_001_023_10.dat hXX_002_100_10.dat.

The model should save the output files in the current directory. The ILG should make sure that PVS files are not available for the software for the original video side.

7.2.4.2 Downstream Model – Processed Video Processing

In addition to the input parameters received by P.NBAMS, the software (model) for the processed video side will be given the processed test sequence in the final file format, a PCAP file and a reference data file that contains the reduced-reference information (see Downstream Model – Original Video Processing). The input file format of the full-reference model will be used for the model for the processed video side. The format of this file will be

/video/hXX_YYY.avi /video/hXX_ZZZ.avi

where h indicates HD resolution, XX indicates the test number, YYY represents the source video sequence index and ZZZ represents the processed video sequence index. Then, the model should make reference data file names as follows:

/video/hXX_YYY_BBB.dat (deterministic models) OR

/video/hXX_YYY_ZZZ_BBB.dat (deterministic and non-deterministic models)

where BBB indicates side-channel bandwidth in kbps.

Finally, the model should use the processed video file and the reference data file, and produce a VQR for the processed video sequence. The ILG should make sure that SRC files are not available for the software for the processed video side.


The output file format of the RR model will be identical to that of the FR model (Section 7.2.3).

7.2.4.3 Optional Input Parameters for RR Hybrid Perceptual Bit-Stream Models

For some RR models, the same software may generate and process reference data files at various side-channel bandwidths. In this case, the software needs information on the side-channel bandwidth. In order to provide this information, the software (model) for the original video side will be given two arguments as follows:

CompanyName_hRRsrc.exe hXX.txt BBB

where hXX.txt is the input file name, XX indicates the test number and BBB indicates side-channel bandwidth in kbps.

The software (model) for the processed video side will be given two arguments as follows:

CompanyName_hRRpvs.exe hXX.txt BBB
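As an illustration, a minimal Python sketch of how such a model executable might read these two arguments is given below; the generate_reference_data function is a hypothetical placeholder, not part of this test plan:

import sys

def generate_reference_data(src_file, bandwidth_kbps):
    # Hypothetical placeholder: write the hXX_YYY_BBB.dat reference data file,
    # keeping the reference information within bandwidth_kbps.
    pass

def main():
    # CompanyName_hRRsrc.exe hXX.txt BBB  ->  argv[1] = input file, argv[2] = bandwidth in kbps
    input_file = sys.argv[1]
    bandwidth_kbps = int(sys.argv[2])
    for line in open(input_file):
        if line.strip():
            src_file = line.split()[0]
            generate_reference_data(src_file, bandwidth_kbps)

if __name__ == "__main__":
    main()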

58.1.2 Output File Format – All Models

The output file format for all models is a white-space delimited ASCII file created by the model program. This output file must list only the name of each processed sequence and the resulting Video Quality Rating (VQR) of the model. The contents of the output file should be flushed after each sequence is processed, to allow the testing laboratories the option of halting a processing run at any time. Each line of the ASCII output file has the following format:

<processed-file> VQR

where <processed-file> is the name of the processed sequence run through this model, without any path information, and VQR is the Video Quality Rating produced by the objective model. For the input file example, this file contains the following:

wv02_src04_hrc01.avi 0.150
wv02_src04_hrc02.avi 1.304
wv02_src04_hrc03.avi 0.102
wv02_src04_hrc04.avi 2.989

Each proponent is also allowed to output a file containing Model Output Values (MOVs) that the proponents consider to be important.


7.2.5 No-Reference Hybrid Perceptual Bit-Stream Models

In addition to the input parameters received by P.NBAMS, the NR model will also be given an ASCII file listing only processed video sequence files. Each line of this file has the following format:

<processed-file>

where <processed-file> is the name of a processed video sequence file, whose format is specified in section 11.12 of this document. File names may include a path. For example, an input file should adhere to the following naming convention:

/video/hXX_001.avi

/video/hXX_002.avi

The output file format of the NR model will take the form

<processed-file> VQR

where <processed-file> is the name of the processed sequence run through this model, without any path information. VQR is the Video Quality Rating produced by the objective model.

NR models will be required to predict the perceptual quality of both the source and processed video files used in subjective quality tests.


58.2 Submission of Executable Model

For each video format (QVGA, SD/WVGA, and HD), a set of two source and processed video sequence pairs will be used as test vectors. They will be available for downloading on the VQEG web site http://www.vqeg.org/. Each proponent will send an executable of the model and the test vector outputs to the ILG by the date specified in the schedule of Section 5.3 for the action item “Proponents submit their models (executable and, only if desired, encrypted source code)”. The executable version of the model must run correctly on one of the following computing environments:

SUN SPARC workstation running the Solaris 2.3 UNIX operating system (SUN OS 5.5). [Ed. Note: The use of SUN workstations should be agreed.]

Windows 2000, Windows XP, Windows Vista, or Windows 7 workstation

Any operating system if a computer is provided by the proponent. <<XXX>>

The use of other platforms will have to be agreed upon with the independent laboratories prior to the submission of the model.

[Ed. Note, 11/04/10: The previously listed operating systems are obsolete. The ILG must specify new ones, and the Hybrid Group must agree.]

The ILG will verify that the software produces the same results as the proponent with a maximum error of plus or minus 0.0001% of the proponent’s reported value. See Annex X for requirements on non-deterministic models. A maximum of 5 randomly selected files will be used for verification. If greater errors are found, the independent and proponent laboratories will work together to correct them. If the errors cannot be corrected, then the ILG will review the results and recommend further action.

58.3 Registration

Measurements will only be performed on the portions of PVSs that are not anomalously severely distorted (e.g. in the case of transmission errors or codec errors due to malfunction).

FR and RR Hybrid models must include calibration and registration if required to handle all of the following calibration (registration) limitations identified in the HRC section. (Note: Deviations and shifts are defined between a source sequence and its associated PVSs. Measurements of gain and offset will be made on the first and last seconds of the sequences. If the first and last seconds are anomalously severely distorted, then another 2-second portion of the sequence will be used.):

maximum allowable deviation in offset is ±20

maximum allowable deviation in gain is ±0.1

maximum allowable Horizontal Shift is +/- 1 pixel

maximum allowable Vertical Shift is +/- 1 pixel

maximum allowable Horizontal Cropping is 30 pixels for HD, 12 pixels for SD, 6 pixels for QVGA, and 3 pixels for QCIF (for each side).

maximum allowable Vertical Cropping is 20 pixels for HD, 12 pixels for SD, 6 pixels for QVGA, and 3 pixels for QCIF (for each side).

no Spatial Rotation or Vertical or Horizontal Re-scaling is allowed

no Spatial Picture Jitter is allowed. Spatial picture jitter is defined as a temporally varying horizontal and/or vertical shift.

For a description of offset and gain in the context of this testplan see Annex IX. This Annex also includes the method for calculating offset and gain in PVSs.


No Reference Models should not need calibration.

Reduced Reference Models must include temporal registration if the model needs it. Temporal misalignment of no more than +/-0.25s is allowed. Please note that in subjective tests, the start frame of both the reference and its associated HRCs are matched as closely as possible. Spatial offsets are expected to be very rare. It is expected that no post-impairments are introduced to the outputs of the encoder before transmission. Spatial registration will be assumed to be within (1) pixel. Gain, offset, and spatial registration will be corrected, if necessary, to satisfy the calibration requirements specified in this test plan.

The organizations responsible for creating the PVSs shall check that they fall within the specified calibration and registration limits. The PVSs will be double-checked by one other organization. After testing has been completed, any PVS found to be outside the calibration limits shall be removed from the data analysis. The ILG will decide if a suspect PVS is outside the limits.


59. Objective Quality Model Evaluation Criteria <<XXX>>

<<XXX>> This entire section is tentative. A proposal is pending to replace this section with the HDTV Test Plan’s objective quality model evaluation criteria. <<XXX>>

This chapter describes the evaluation metrics and procedure used to assess the performance of an objective video quality model as an estimator of video picture quality in a variety of applications.

59.1 Post Subjective Testing Elimination of SRC or PVS

We recognize that there could be potential errors and misunderstandings implementing this Hybrid test plan. No test plan is perfect. Where something is not written or written ambiguously, this fault must be shared among all participants. We recognize that ILG or Proponents who make a good faith effort to have their subjective test conform to all aspects of this test plan may unintentionally have a few PVSs that do not conform (or may not conform, depending upon interpretation).

After model and dataset submission, an SRC, HRC, or PVS can be discarded if and only if:

The discard is proposed at least one week prior to a face-to-face meeting and there is no objection from any VQEG participant present at the face-to-face meeting (note: if a face-to-face meeting cannot be scheduled fast enough, then proposed discards will be discussed during a carefully scheduled audio call); or

The discard concerns an SRC no longer available for purchase, and the discard is approved by the ILG; or

The discard concerns an HRC or PVS which is unambiguously prohibited by Section 7 ‘HRC Creation and Sequence Processing’, and the discard is approved by the ILG.

The discard concerns an SRC and, in the opinion of the ILG, the poor MOS values for that source sequence are due to inferior quality; in that case it shall be removed and not included in the subsequent data analysis.

Objective models may encounter a rare PVS that is slightly outside the proponent’s understanding of the test plan constraints. <<XXX>>

Prior to any data analysis, the ILG will perform an inspection of the subjective test data. Any source sequences presented in the test with a MOS rating of <4 will be identified and the file will be examined. If, in the opinion of the ILG, the poor MOS values for these source sequences are due to inferior quality, then they shall be removed and not included in the subsequent data analysis. This data inspection will be completed prior to proponents submitting their objective data to the ILG.

After testing has been completed, any PVS found to be outside the calibration limits shall be removed from the data analysis. The ILG will decide if a suspect PVS is outside the limits.

59.2 Evaluation Procedure

The performance of an objective quality model is characterized by three prediction attributes: accuracy, monotonicity and consistency.

The statistical metrics root mean square error (RMSE), Pearson correlation, and outlier ratio (OR) together characterize the accuracy, monotonicity and consistency of a model’s performance. These statistical metrics are referred to as evaluation metrics in the following. The calculation of each statistical metric is performed along with its 95% confidence interval. To test for statistically significant differences among the performance of various models, the F-test will be used for each evaluation metric.

[Ed. Note, 11/04/10: This is a proposal. Text taken from the HDTV Test Plan. A procedure should be agreed to, whether it is this one or another. The Hybrid Group needs to examine this text. The SRC action was previously in this document, and appears to have been approved.]

[Ed. Note, 11/04/10: Evaluation criteria for monitoring applications? Metrics of the HBS test are used for monitoring video quality, so a ‘good’ metric could be a metric that detects severe visual degradations instead of estimating MOS scores? Jens discussed this during breakfast at the Kyoto meeting. To be discussed.]

The evaluation metrics are calculated using the objective model outputs and the results from viewers’ subjective ratings of the test video clips. The objective model provides a single number (figure of merit) for every tested video clip. Each tested video clip also gets a single subjective figure of merit. The subjective figure of merit for a video clip represents the average value of the scores provided by all evaluators viewing the video clip.

Objective models cannot be expected to account for (potential) differences in the subjective scores for different viewers or labs. Such differences, if any, will be measured, but will not be used to evaluate a model’s performance. “Perfect” performance of a model will be defined so as to exclude the residual variance due to within-viewer, between-viewer, and between-lab effects.

The evaluation analysis is based on DMOS scores for the FR and RR models, and on MOS scores for the NR model. Discussion below regarding the DMOS scores should be applied identically to MOS scores. For simplicity, only DMOS scores are mentioned for the rest of the chapter.

The objective quality model evaluation will be performed in three steps. The first step is a monotonic rescaling of the objective data to better match the subjective data. The second calculates the performance metrics for the model and their confidence intervals. The third tests for differences between the performances of different models using the F-test.

59.3 PSNR <<XXX>>

PSNR will be calculated to provide a performance benchmark. Proponents are encouraged to calculate PSNR. Ideally, PSNR should be calculated with a spatial registration accuracy of 0.1 pixel. If this is not possible, then a maximum registration tolerance of 0.5 pixel spatial accuracy is required. The Pearson correlation evaluation metric defined in this section will be applied to determine the predictive performance of PSNR, and this will be reported in the final report.

Due to the complexity of some HRCs, spatial or temporal misalignments and gain/offset variations may be present. Ideally, PSNR should be calculated after compensation of these effects. Therefore, a modified version of the standard PSNR is considered in this test plan. The details of its computation are as follows:

o The first step is a spatial filtering (3x3 median filter) applied to both the source and processed video in order to remove the effects of noise in the capturing process.

o The second step consists of performing a first global spatial and temporal alignment, searching for the temporal and spatial shifts that provide the maximum SNR using the first second of the PVS under test. These shifts are then used to estimate the gain/offset alignment that provides the maximum SNR using the same first second of the PVS. Spatial and gain/offset alignments are considered to be constant for the overall sequence. The PVS is therefore realigned using the estimated values before the next step.

o The third step is a secondary temporal alignment process done locally (e.g., frame by frame and using a search window of several seconds). Each frame of the PVS is associated with one frame of the SRC.

o Finally, the usual PSNR is calculated on the realigned data, considering only the Y component.

[Ed. Note, 11/04/10: Taken from the final MM report.]

[Ed. Note, 11/04/10: This section appears not to have been agreed upon – see comment below. Consider using ITU-T J.340, which was used by MM, RRNR-TV and HD.]
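As an illustration, a minimal numpy sketch of the final step (Y-only PSNR) is given below; it assumes the spatial, temporal and gain/offset alignment steps above have already been applied, and an 8-bit luma range:

import numpy as np

def psnr_y(src_y, pvs_y, peak=255.0):
    # src_y, pvs_y: realigned luma (Y) frames of identical shape
    diff = src_y.astype(np.float64) - pvs_y.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)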

59.4 Data Processing

59.4.1 Video Clips and Scores Used in Analysis

Difference scores will be calculated for each processed video sequence (PVS). A PVS is defined as a SRCxHRC combination. The difference scores, known as Difference Mean Opinion Scores (DMOS), will be produced for each PVS by subtracting the score of the corresponding hidden reference SRC from the score of the PVS. Subtraction will be done per viewer. Difference scores will be used to assess the performance of each full reference and reduced reference proponent model, applying the evaluation metrics defined in Section 59.

For evaluation of full-reference hybrid and reduced-reference hybrid models, DMOS will be used.

For evaluation of no-reference proponent models and no-reference hybrid models, the absolute (raw) subjective score will be used. Thus, for each test sequence, only the absolute rating for the SRC and PVS will be calculated. Based on each viewer’s absolute rating for the test presentations, an absolute mean opinion score (MOS) will be produced for each test condition. These MOS will then be used to evaluate the performance of NR proponent models using the evaluation metrics specified in Section 59.

NR models will be required to predict the perceptual quality of both the source and processed video files used in subjective quality tests. <<XXX>> [Ed. Note, 11/04/10: SRC bit-stream doesn’t exist!?!]

59.4.2 Inspection of Viewer Data

Prior to any data analysis, the ILG will perform an inspection of the subjective test data. Any source sequences presented in the test with a MOS rating of <4 will be identified and the files will be examined. If, in the opinion of the ILG, the poor MOS values for these source sequences are due to inferior quality, then they shall be removed and not included in the subsequent data analysis. This data inspection will be completed prior to proponents submitting their objective data to the ILG. <<XXX>> [Ed. Note, 11/04/10: This section may need to be re-written depending upon other decisions on data discards.]

59.4.3 Calculating DMOS Values

The data analysis will be performed using the difference mean opinion score (DMOS). DMOS values are calculated on a per evaluator per PVS basis. The appropriate hidden reference (SRC) is used to calculate the DMOS value for each PVS. DMOS values will be calculated using the following formula:

DMOS = MOS (PVS) – MOS (SRC) + 5

In using this formula, a DMOS of 5 indicates ‘Excellent’ quality and a DMOS of 1 indicates ‘Bad’ quality. Any DMOS values greater than 5 (i.e. where the processed sequence is rated better quality than its associated hidden reference sequence) will be considered valid and included in the data analysis.
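A minimal Python sketch of this per-viewer calculation is given below (the data layout is hypothetical; the formula is the one given above):

def dmos_per_viewer(pvs_scores, src_scores):
    # pvs_scores / src_scores: dicts mapping viewer id -> raw ACR score for a PVS
    # and for its associated hidden reference SRC (hypothetical data layout).
    return {viewer: pvs_scores[viewer] - src_scores[viewer] + 5
            for viewer in pvs_scores}

# Example: a viewer who rates the PVS 3 and the hidden reference 5 yields DMOS = 3.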

59.4.4 Mapping to the Subjective Scale

Subjective rating data often are compressed at the ends of the rating scales. It is not reasonable for objective models of video quality to mimic this weakness of subjective data. Therefore, in previous video quality projects VQEG has applied a non-linear mapping step before computing any of the performance metrics. A non-linear mapping function that has been found to perform well empirically is the cubic polynomial given in (1)

DMOS_p = a \cdot VQR^3 + b \cdot VQR^2 + c \cdot VQR + d        (1)

Where DMOSp is the predicted DMOS, and VQR is the model’s computed value for a clip-HRC combination. The weightings a, b and c and the constant d are obtained by fitting the function to the data [DMOS, VQR]. The mapping function will maximize correlation between DMOSp and DMOS:

DMOS_p = k\,(a' \cdot VQR^3 + b' \cdot VQR^2 + c' \cdot VQR) + d

with constants k = 1 and d = 0.

This function must be constrained to be monotonic within the range of possible values for our purposes.


Then root mean squared error is minimized over k and d.

a = k*a’

b = k*b’

c = k*c’

This non-linear mapping procedure will be applied to each model’s outputs before the evaluation metrics are computed. The same procedure will be used for MOS with NR models and NR Hybrid models.

Proponents, in addition to the ILG, may compute the coefficients of the mapping functions for their models and submit the coefficients to ILGs. The proponent who submits the coefficients should also submit his mapping tool (executable) to ILGs so that ILGs can use the mapping tool for other models. It is desirable that the proponent also submit the coefficients of the mapping functions for all the other proponents’ models. If a proponent chooses not to exercise this option to compute the coefficients of the mapping functions, the ILG will compute the coefficients of the mapping functions. The ILG will use the coefficients of the fitting function that produce the best correlation coefficient provided that it is a monotonic fit.

Any and all mapping algorithms used for the official data analysis must be referenced.
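As an illustration, a minimal Python/scipy sketch of fitting the cubic mapping of equation (1) and checking monotonicity over the range of observed VQRs is given below; the exact fitting procedure used for the official analysis (including the two-step fit over k and d) may differ:

import numpy as np
from scipy.optimize import curve_fit

def cubic(vqr, a, b, c, d):
    return a * vqr**3 + b * vqr**2 + c * vqr + d

def fit_mapping(vqr, dmos):
    vqr, dmos = np.asarray(vqr, float), np.asarray(dmos, float)
    params, _ = curve_fit(cubic, vqr, dmos)           # least-squares fit of equation (1)
    grid = np.linspace(vqr.min(), vqr.max(), 1000)
    mapped = cubic(grid, *params)
    # The fit is acceptable only if it is monotonic over the range of possible values.
    monotonic = np.all(np.diff(mapped) >= 0) or np.all(np.diff(mapped) <= 0)
    return params, monotonic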

59.4.5 Averaging Process

Primary analysis of model performance will be calculated per processed video sequence.

Secondary analysis of model performance may be calculated and reported on (1) averaged data, obtained by averaging all SRCs associated with each HRC (DMOSH), and (2) averaged data, obtained by averaging all HRCs associated with each SRC (DMOSS).

The common sequences (i.e., those included in every experiment at one resolution) will not be used for HRC analysis. This is in contrast to the primary data analysis, where the PVSs for each individual test and the common sequences are analyzed together. This secondary analysis uses the same mapping as the primary analysis (i.e., computed on a per-PVS basis).

Please note that secondary analysis is not an invitation to perform a specific data fitting to the objective models. It is a reporting option where the model’s performance is considered per HRC; this may also be reported per SRC (collapsing across HRCs) if desired. But the fitting is performed per test as described above.

59.4.6 Aggregation Procedure

The evaluation of the objective metrics is performed in two steps. In the first step, the objective metrics are evaluated per experiment. In this case, the evaluation/statistical metrics are calculated for all tested objective metrics. A comparison analysis is then performed based on significance tests. In the second step, an aggregation of the performance results is considered. The aggregation will be performed by taking the average values for all three evaluation metrics for all experiments (see section 8.3).

The ILG has the option to use some of the secret tests to replicate ILG experiments (i.e., run viewers through another lab’s experiment). Models’ performance evaluation will follow the procedures laid out above. The data collected will not be considered an additional experiment for the purposes of model comparison (i.e., no double weight for any single experiment). The first experiment run will provide the data to be used for model performance evaluation. The replication data will be used for other analyses. This data will be shared with the proponents.

<<XXX>>

[Ed. Note, 11/04/10: Do you want to allow aggregation using the common set video sequences, to create a super-set? This was done for VQEG HDTV, and the results from the super-set seemed useful. The Hybrid Group must decide.]

59.5 Evaluation Metrics

Once the mapping has been applied to the objective data, the three evaluation metrics (root mean square error, Pearson correlation coefficient and outlier ratio) are determined. The calculation of each evaluation metric is performed along with its 95% confidence interval. <<XXX>> [Ed. Note, 11/04/10: Which is the primary metric for analysis? For example, HDTV chose RMSE. The Hybrid Group must decide.]

59.5.1 Pearson Correlation Coefficient

The Pearson correlation coefficient R (see Equation 2) measures the linear relationship between a model’s performance and the subjective data. Its great virtue is that it is on a standard, comprehensible scale of -1 to 1 and it has been used frequently in similar testing.

R = \frac{\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N}(X_i - \bar{X})^2 \ \sum_{i=1}^{N}(Y_i - \bar{Y})^2}}        (2)

Xi denotes the subjective DMOS score and Yi the corresponding objective DMOSp score. N represents the total number of video samples considered in the analysis.

It is known [1] that the statistic z (3) is approximately normally distributed and its standard deviation is defined by (4). Equation (3) is called Fisher-z transformation.

z = \frac{1}{2}\ln\left(\frac{1+R}{1-R}\right)        (3)

\sigma_z = \frac{1}{\sqrt{N-3}}        (4)

The 95% confidence interval (CI) for the correlation coefficient is determined using the Gaussian distribution, which characterizes the variable z and it is given by (5)

CI = z \pm 1.96\,\sigma_z        (5)

NOTE: If fewer than 30 samples are used (N < 30), then the Gaussian distribution needs to be replaced by the two-tailed Student’s t distribution with t = 1.96 [1].

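A minimal Python sketch of equations (2) to (5) is given below; for convenience the confidence limits are also transformed back from the Fisher-z domain to the correlation scale (an extra step not spelled out above):

import numpy as np

def pearson_with_ci(dmos, dmosp):
    x, y = np.asarray(dmos, float), np.asarray(dmosp, float)
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]                      # Pearson correlation, equation (2)
    z = 0.5 * np.log((1 + r) / (1 - r))              # Fisher-z transform, equation (3)
    sigma_z = 1.0 / np.sqrt(n - 3)                   # equation (4)
    z_lo, z_hi = z - 1.96 * sigma_z, z + 1.96 * sigma_z   # equation (5), assuming N >= 30
    return r, (np.tanh(z_lo), np.tanh(z_hi))         # CI mapped back to the correlation scale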

59.5.2 Root Mean Square Error

The accuracy of the objective metric is evaluated using the root mean square error (rmse) evaluation metric.

The difference between measured and predicted DMOS is defined as the absolute prediction error Perror (6)

Perror(i) = DMOS(i) - DMOS_p(i)        (6)

where the index i denotes the video sample.

The root-mean-square error of the absolute prediction error Perror is calculated with the formula (7)


rmse = \sqrt{\frac{1}{N-d}\sum_{i=1}^{N} Perror(i)^2}        (7)

where N denotes the number of samples and d the number of degrees of freedom of the mapping function (1). The root mean square error is approximately characterized by a \chi^2(n) distribution [1] [Ed. Note: a page number or equation should be given here], where n represents the degrees of freedom and is defined by (8):

n = N - d        (8)

where N represents the total number of samples. Using the \chi^2(n) distribution, the 95% confidence interval for the rmse is given by (9) [1]:

CI = \left[\, rmse\,\sqrt{\frac{n}{\chi^2_{0.025}(n)}},\ \ rmse\,\sqrt{\frac{n}{\chi^2_{0.975}(n)}} \,\right]        (9)

where \chi^2_{p}(n) denotes the critical value of the \chi^2 distribution with n degrees of freedom that is exceeded with probability p.
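A minimal Python/scipy sketch of equations (6) to (9) is given below; d = 4 is assumed here for the four-parameter cubic mapping of equation (1):

import numpy as np
from scipy.stats import chi2

def rmse_with_ci(dmos, dmosp, d=4):
    perror = np.asarray(dmos, float) - np.asarray(dmosp, float)   # equation (6)
    n = len(perror) - d                                           # degrees of freedom, equation (8)
    rmse = np.sqrt(np.sum(perror ** 2) / n)                       # equation (7)
    lo = rmse * np.sqrt(n / chi2.ppf(0.975, n))                   # 95% CI, equation (9)
    hi = rmse * np.sqrt(n / chi2.ppf(0.025, n))
    return rmse, (lo, hi)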

59.5.3 Outlier Ratio

The consistency attribute of the objective metric is evaluated by the outlier ratio OR, which represents the ratio of the number of “outlier points” to the total number of points N.

OR = \frac{TotalNumberOfOutliers}{N}        (10)

where an outlier is a point for which

|Perror(i)| > 2\,\sigma(DMOS(i))        (11)

where σ(DMOS(i)) represents the standard deviation of the individual scores associated with video clip i. The individual scores are approximately normally distributed and therefore twice the σ value represents the 95% confidence interval. Thus, 2·σ(DMOS(i)) represents a good threshold for defining an outlier point.

The outlier ratio represents the proportion of outliers in N number of samples. Thus, the binomial distribution could be used to characterize the outlier ratio. The outlier ratio is represented by a distribution of proportions [1] characterized by the mean (12) and standard deviation (13)

\mu_p = p        (12)

\sigma_p = \sqrt{\frac{p(1-p)}{N}}        (13)

For N > 30, the binomial distribution, which characterizes the proportion p, can be approximated by a Gaussian distribution. Therefore, the 95% confidence interval (CI) of the outlier ratio is given by (14):

CI = p \pm 1.96\,\sigma_p        (14)

NOTE: If fewer than 30 samples are used (N < 30), then the Student’s t distribution with t = 1.96 [1] can be used instead.
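A minimal Python sketch of equations (10) to (14) is given below; dmos_std holds the per-PVS standard deviation of the individual subjective scores:

import numpy as np

def outlier_ratio_with_ci(dmos, dmosp, dmos_std):
    perror = np.abs(np.asarray(dmos, float) - np.asarray(dmosp, float))
    outliers = perror > 2.0 * np.asarray(dmos_std, float)   # outlier condition, equation (11)
    n = len(perror)
    p = np.count_nonzero(outliers) / float(n)                # outlier ratio, equations (10) and (12)
    sigma_p = np.sqrt(p * (1 - p) / n)                       # equation (13)
    return p, (p - 1.96 * sigma_p, p + 1.96 * sigma_p)       # 95% CI, equation (14), assuming N > 30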


59.6 Statistical Significance of the Results

<<XXX>>

59.6.1 Significance of the Difference between the Correlation Coefficients

The test is based on the assumption that the normal distribution is a good fit for the video quality scores’ populations. The statistical significance test for the difference between the correlation coefficients uses the H0 hypothesis that there is no significant difference between the correlation coefficients. The H1 hypothesis considers that the difference is significant, although it does not specify better or worse.

The test uses the Fisher-z transformation (3) [1]. The normally distributed statistic (15) is determined for each comparison and evaluated against the 95% two-tailed Student’s t value, which is the tabulated value t(0.05) = 1.96.

Z_N = \frac{(z_1 - z_2) - \mu_{z_1 - z_2}}{\sigma_{z_1 - z_2}}        (15)

where \mu_{z_1 - z_2} = 0        (16)

and \sigma_{z_1 - z_2} = \sqrt{\sigma_{z_1}^2 + \sigma_{z_2}^2}        (17)

σz1 and σz2 represent the standard deviation of the Fisher-z statistic for each of the compared correlation coefficients. The mean (16) is set to zero due to the H0 hypothesis and the standard deviation of the difference metric z1-z2 is defined by (17). The standard deviation of the Fisher-z statistic is given by (18):

\sigma_z = \frac{1}{\sqrt{N-3}}        (18)

where N represents the total number of samples used for the calculation of each of the two correlation coefficients.
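A minimal Python sketch of this test (equations (15) to (18)) is given below; it returns True when H0 is rejected at the 95% level:

import numpy as np

def correlations_differ(r1, n1, r2, n2):
    z1 = 0.5 * np.log((1 + r1) / (1 - r1))            # Fisher-z of each correlation, equation (3)
    z2 = 0.5 * np.log((1 + r2) / (1 - r2))
    sigma = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # equations (17) and (18)
    zn = (z1 - z2) / sigma                            # equation (15), with mean 0 from equation (16)
    return abs(zn) > 1.96                             # two-tailed test at the 95% level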

59.6.2 Significance of the Difference between the Root Mean Square Errors

Considering the same assumption that the two populations are normally distributed, the comparison procedure is similar to the one used for the correlation coefficients. The H0 hypothesis considers that there is no difference between the rmse values. The alternative H1 hypothesis assumes that the lower prediction error value is statistically significantly lower. The statistic defined by (19) has an F-distribution with n1 and n2 degrees of freedom [1].

\zeta = \frac{rmse_{max}^2}{rmse_{min}^2}        (19)

rmse_max is the highest rmse and rmse_min is the lowest rmse involved in the comparison. The ζ statistic is evaluated against the tabulated value F(0.05, n1, n2), which ensures a 95% significance level. The degrees of freedom n1 and n2 are given by N1-1 and N2-1, respectively, with N1 and N2 representing the total number of samples for the compared average prediction errors.
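A minimal Python/scipy sketch of this test (equation (19)) is given below; it returns True when the lower RMSE is significantly lower at the 95% level:

from scipy.stats import f as f_dist

def rmse_differ(rmse1, N1, rmse2, N2):
    # Order the values so that the larger RMSE is in the numerator (equation 19).
    (rmse_max, dof_num), (rmse_min, dof_den) = sorted(
        [(rmse1, N1 - 1), (rmse2, N2 - 1)], reverse=True)
    zeta = rmse_max ** 2 / rmse_min ** 2
    return zeta > f_dist.ppf(0.95, dof_num, dof_den)   # compare against F(0.05, n1, n2)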

[Ed. Note (Yves Dhondt), 11/04/10: This is the copied equation. As far as I can tell, this formula is correct. But as the original remark asked for D. Hands to comment on it, we should still do that.]

[Ed. Note, 11/04/10: We calculated all three of these for MM, and the results were very confusing. Margaret Pinson recommends eliminating OR and Correlation from this section. RMSE appears to have the greatest ability to resolve between models.]

59.6.3 Significance of the Difference between the Outlier Ratios

As mentioned in paragraph 59.5.3, the outlier ratio can be described by a binomial distribution with parameters (p, 1-p), where p is defined by (12). In this case, p is equivalent to the probability of success of the binomial distribution.

The distribution of differences of proportions from two binomially distributed populations with parameters (p1, 1-p1) and (p2, 1-p2) (where p1 and p2 correspond to the two compared outlier ratios) is approximated by a normal distribution for N1, N2 >30, with the mean:

\mu_{p_1 - p_2} = p_1 - p_2        (20)

and standard deviation:

\sigma_{p_1 - p_2} = \sqrt{\frac{p_1(1-p_1)}{N_1} + \frac{p_2(1-p_2)}{N_2}}        (21)

The null hypothesis in this case considers that there is no difference between the population parameters p1 and p2, i.e., p1 = p2. Therefore, the mean (20) is zero and the standard deviation (21) becomes equation (22):

\sigma_{p_1 - p_2} = \sqrt{p(1-p)\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}        (22)

where N1 and N2 represent the total number of samples of the compared outlier ratios p1 versus p2. The variable p is defined by (23)

p = \frac{p_1 N_1 + p_2 N_2}{N_1 + N_2}        (23)
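A minimal Python sketch of this test (equations (20) to (23)) is given below; it returns True when the two outlier ratios differ significantly at the 95% level:

import numpy as np

def outlier_ratios_differ(p1, N1, p2, N2):
    p = (p1 * N1 + p2 * N2) / float(N1 + N2)                 # pooled proportion, equation (23)
    sigma = np.sqrt(p * (1 - p) * (1.0 / N1 + 1.0 / N2))     # equation (22)
    if sigma == 0:
        return False                                         # degenerate case: both ratios identical
    z = (p1 - p2) / sigma
    return abs(z) > 1.96                                     # two-tailed test at the 95% level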

References

[1] M. Spiegel, Theory and Problems of Statistics, McGraw-Hill, 1998.


60. Recommendation

The VQEG will recommend methods of objective video quality assessment based on the primary evaluation metrics defined in Section 59. The Study Groups involved (ITU-T SG 12, ITU-T SG 9, and ITU-R SG 6) will make the final decision(s) on ITU Recommendations.


61. Bibliography

- VQEG Phase I final report.

- VQEG Phase I Objective Test Plan.

- VQEG Phase I Subjective Test Plan.

- VQEG FR-TV Phase II Test Plan.

- Vector quantization and signal compression, by A. Gersho and R. M. Gray. Kluwer Academic Publisher, SECS159, 0-7923-9181-0.

- Recommendation ITU-R BT.500-10.

- document 10-11Q/TEMP/28-R1.

- RR/NR-TV Test Plan


ANNEX I INSTRUCTIONS TO THE EVALUATORS

Notes: The items in parentheses are generic sections for an Evaluator Instructions Template. They would be removed from the final text. Also, the instructions are written so they would be read by the experimenter to the participant(s).

(greeting) Thanks for coming in today to participate in our study. The study’s about the quality of video images; it’s being sponsored and conducted by companies that are building the next generation of video transmission and display systems. These companies are interested in what looks good to you, the potential user of next-generation devices.

(vision tests) Before we get started, we’d like to check your vision in two tests, one for acuity and one for color vision. (These tests will probably differ for the different labs, so one common set of instructions is not possible.)

(overview of task: watch, then rate) What we’re going to ask you to do is to watch a number of short video sequences to judge each of them for “quality” -- we’ll say more in a minute about what we mean by “quality.” These videos have been processed by different systems, so they may or may not look different to you. We’ll ask you to rate the quality of each one after you’ve seen it.

(physical setup) When we get started with the study, we’d like you to sit here (point) and the videos will be displayed on the screen there. You can move around some to stay comfortable, but we’d like you to keep your head reasonably close to this position indicated by this mark (point to mark on table, floor, wall, etc.). This is because the videos might look a little different from different positions, and we’d like everyone to judge the videos from about the same position. I (the experimenter) will be over there (point).

(room & lighting explanation, if necessary) The room we show the videos in, and the lighting, may seem unusual. They’re built to satisfy international standards for testing video systems.

(presentation timing and order; number of trials, blocks) Each video will be (insert number) seconds (minutes) long. You will then have a short time to make your judgment of the video’s quality and indicate your rating. At first, the time for making your rating may seem too short, but soon you will get used to the pace and it will seem more comfortable. (insert number) video sequences will be presented for your rating, then we’ll have a break. Then there will be another similar session. All our judges make it through these sessions just fine.

(what you do: judging -- what to look for) Your task is to judge the quality of each image -- not the content of the image, but how well the system displays that content for you. The images come in three different sizes; how you judge image quality for the different sizes is up to you. There is no right answer in this task; just rely on your own taste and judgment.

(what you do: rating scale; how to respond, assuming presentation on a PC) After judging the quality of an image, please rate the quality of the image. Here is the rating scale we’d like you to use (also have a printed version, either hardcopy or electronic):

5 Excellent
4 Good
3 Fair
2 Poor
1 Bad

??TBD Please indicate your rating by pushing the appropriate numeric key on the keyboard (button on the screen). If you missed the scene and have to see it again, press the XXX key. If you push the wrong key and need to


change your answer, press the YYY key to erase the rating; then enter your new rating. [Note, this assumes that a program exists to put a graphical user interface (GUI) on the computer screen between video presentations. It should feed back the most recent rating that the evaluator had input, should have a “next video” button and an “erase rating” button. It should also show how far along in the sequence of videos the session is at present. The program that randomly chooses videos for presentation, records the data, and contains the GUI, should be written in a language that is compatible with the most commonly used computers.]

(practice trials: these should include the different size formats and should cover the range of likely quality) Now we will present a few practice videos so you can get a feel for the setup and how to make your ratings. Also, you’ll get a sense of what the videos are going to be like, and what the pace of the experiment is like; it may seem a little fast at first, but you get used to it.

(questions) Do you have any questions before we begin?

(evaluator consent form, if applicable; following is an example) The Multimedia Hybrid Quality Experiment is being conducted at the (name of your lab) lab. The purpose, procedure, and risks of participating in the Multimedia Hybrid Quality Experiment have been explained to me. I voluntarily agree to participate in this experiment. I understand that I may ask questions, and that I have the right to withdraw from the experiment at any time. I also understand that (name of lab) lab may exclude me from the experiment at any time. I understand that any data I contribute to this experiment will not be identified with me personally, but will only be reported as a statistical average.

Signature of participant			Signature of experimenter
Name of participant		Date		Name of experimenter

ANNEX II EXAMPLE EXCEL SPREADSHEET

lab | test | type | evaluator # | month | day | year | session | resolution | rate | age | gender | order | scene | hrc | acr score
ntia | mm1 | compression | 1000 | 10 | 3 | 2005 | 1 | vga | 30 | 47 | m | 1 | susie | hrc1 | 4
ntia | mm1 | compression | 1000 | 10 | 3 | 2005 | 1 | vga | 30 | 47 | m | 1 | susie | hrc2 | 2
ntia | mm1 | compression | 1000 | 10 | 3 | 2005 | 1 | vga | 30 | 47 | m | 1 | susie | hrc3 | 1
ntia | mm1 | compression | 1000 | 10 | 3 | 2005 | 1 | vga | 30 | 47 | m | 1 | susie | reference | 5
ntt | mm2 | robust | 2003 | 10 | 18 | 2005 | 2 | cif | 25 | 38 | f | 2 | calmob | pktloss1 | 1
ntt | mm2 | robust | 2003 | 10 | 18 | 2005 | 2 | cif | 25 | 38 | f | 2 | calmob | pktloss2 | 2
ntt | mm2 | robust | 2003 | 10 | 18 | 2005 | 2 | cif | 25 | 38 | f | 2 | calmob | biterror1 | 1
ntt | mm2 | robust | 2003 | 10 | 18 | 2005 | 2 | cif | 25 | 38 | f | 2 | calmob | biterror2 | 3
ntt | mm2 | robust | 2003 | 10 | 18 | 2005 | 2 | cif | 25 | 38 | f | 2 | calmob | reference | 4
yonsei | mm3 | livenetwork | 3018 | 10 | 21 | 2005 | 1 | qcif | 30 | 27 | m | 1 | football | ip1 | 4
yonsei | mm3 | livenetwork | 3018 | 10 | 21 | 2005 | 1 | qcif | 30 | 27 | m | 1 | football | ip2 | 3
yonsei | mm3 | livenetwork | 3018 | 10 | 21 | 2005 | 1 | qcif | 30 | 27 | m | 1 | football | reference | 5


ANNEX III Background and Guidelines on Transmission Errors

Introduction

Transmission errors should be created to emulate a real video service to ensure that the proponents’ models are trained and tested with realistic video material. There are three major types of transmissions used for video services today:

Packet switched radio network

This kind of transmission is typical for video services in so-called 3G mobile networks. Examples of services are video streaming services, such as streaming news and sports video clips to a mobile phone, mobile TV, and video shared in parallel with a normal speech call. The transmission errors are characterized by packet delays, which can be in the range of 10 ms to several seconds, and packet losses that could be massive (ranging from no losses to 50%). The packet delay might cause packets to be dropped by the video client because they are received too late, or cause the buffer in the client to run empty. If the buffer runs empty, it causes frame freezing (not currently included in the test plan). Packet losses will cause image artifacts in the video and possibly video frame jitter.

Transport errors should be created by running a video streaming service over a real-time link simulator, where packets can be delayed with a delay pattern as in a typical mobile radio network. The link simulator should also be able to drop packets. Typically, packets are dropped when a buffer somewhere in the network is full, and new packets arriving at the buffer are dropped. This situation can occur when the link to the mobile has a lower bandwidth than required by the video stream.

Packet losses are normally bursty, causing the video quality to vary a lot. A short video streaming sequence might even be played with best possible quality, even if the bandwidth is limited. Therefore, video streaming sequences should be longer than 8 to 10 seconds. An 8 to 10 seconds video clip can be cut out from the longer video sequence, from the part where the transmission errors have caused the desired video quality degradation. Note also that the packet size is related to video quality degradation for a certain packet loss ratio.

Wireline Internet

Typical service is video streaming to a PC with fixed Internet connection. Network congestion causes packet losses in the network switches. Random and periodic packet loss can occur due to faulty equipment. However, bursty packet losses are the most common loss type. Packet loss ratio is in the range from 0% to 50%. Packets are delayed with delay ranging from 2 ms to several seconds.

Transmission errors should be created with a bursty packet loss model, as expected for Internet bottlenecks.

Circuit switched radio network

A typical service is video telephony. The transmission errors are characterized by bursty loss of data. Chunks of data (packets are not used in circuit switched transmission) are lost. Block (radio blocks) error rates are typically ranging from 0.2% to 5% when averaged over a couple of seconds. Momentarily the error rate can be 100%.

Transport errors should be created by applying error masks to a bit stream. Errors in the mask should have a bursty pattern to mimic a radio interface, such as a WCDMA 64 kbps circuit switched radio bearer. Note that the size of the blocks over the simulated transport link is correlated to video quality. Within limits, the larger the block size, the better the quality for a given block error rate. The block size can, for example, be 160 or 300 bytes.

Summary of transmission error simulators


Transport link type: Packet switched radio network
Model: Link simulator delaying and dropping packets. Delay is based on bit/block errors over a radio link; drops are based on overflow in a network buffer due to low bandwidth. The packet delay should be introduced as in a real radio network. Typical target networks are GSM, WCDMA or CDMA radio networks.
Typical error rates: Packet delay in the range from 10 ms to 5 s. Bursty packet loss in the range 0% to 50% (for an average over one or a few seconds).

Transport link type: Wireline Internet
Model: Link simulator dropping packets, as expected when the buffer in an Internet switch overruns. As described in the literature, packet losses can be modeled with a Markov chain with two states representing no loss/loss. See for example [2] below for an example of a link model.
Typical error rates: Packet delay in the range 2 ms to 5 s (high values when, for example, a satellite link is used). Bursty packet losses in the range from 0% to 50%.

Transport link type: Circuit switched radio network
Model: Link simulator dropping chunks of data. An alternative is to apply an error mask to a bit stream. The error mask should have been made by simulating a radio link. The bit stream should be an H.223 bit stream, which is used for video telephony. See reference [1] below.
Typical error rates: Typical block error rates (over a radio link) range from 0.2% to 5% (average over a couple of seconds).

Table 1 Summary of transmission error simulators

Note: A video service might use multiple transport links. Thus, it is possible to use a combination of simulators to get realistic transport errors. A combination of wireline and wireless IP link simulators can be used to simulate a service, such as video streaming over the Internet and a radio link.
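For illustration, a minimal Python sketch of the two-state Markov (Gilbert) loss model referred to in Table 1 is given below; the transition probabilities are arbitrary placeholders, not values prescribed by this test plan:

import random

def gilbert_loss_pattern(n_packets, p_good_to_bad=0.01, p_bad_to_good=0.3, seed=None):
    # Two-state Markov chain: packets sent while in the 'bad' state are dropped,
    # which produces the bursty loss behaviour described above.
    rng = random.Random(seed)
    lost, bad = [], False
    for _ in range(n_packets):
        if bad:
            bad = rng.random() >= p_bad_to_good    # remain in the bad (loss) state?
        else:
            bad = rng.random() < p_good_to_bad     # enter the bad (loss) state?
        lost.append(bad)                           # True = packet dropped
    return lost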

Logging parameters

Table 2 below describes the parameters to be logged when introducing transmission errors with a simulator. All parameters are required, except those explicitly described as “optional”.

Logging category: Simulator description
Logging details: Type of simulator (packet simulator, circuit switched simulator); simulated network (GSM/WCDMA/CDMA/wireline Internet); version of simulator; hardware/system it was run on; general description of how transport errors are introduced.

Logging category: Input parameters to simulator (depends on the type of simulator; only examples are given here)
Logging details: Bandwidth limit; system buffer size; block or bit error rates; latency.

Logging category: Output parameters from simulator
Logging details: Packet simulator (wireline and wireless): average packet loss ratio in percent; length of the window used to calculate packet loss ratio; total number of packets; average packet delay in ms; sequence numbers of lost packets (optional); distribution of packet delay (optional); packet size distribution (optional). Circuit switched simulator: average block and/or bit error rate (BLER/BER); block size over the transport link; maximum block error rate.

Logging category: Decoder
Logging details: General description of decoder (name, vendor); version of decoder; post filter used (if known); error concealment used (if known).

Table 2 Parameters to be logged when introducing transmission errors


References

[1] ITU-T Recommendation H.223, Multiplexing protocol for low bit rate multimedia communication.

[2] B. Girod, K. Stuhlmüller, M. Link and U. Horn, “Packet Loss Resilient Internet Video Streaming”, SPIE Visual Communications and Image Processing ’99, January 1999, San Jose, CA.


ANNEX IV FEE AND CONDITIONS FOR RECEIVING DATASETS

Fee and Conditions

VQEG intends to enable everybody who is interested in contributing to the work as a proponent to participate in the assessment of video quality metrics, and to do so even if the proponent is not able to finance more than the regular participation fee as laid out in this Annex (see below for details of the fees). On the other hand, VQEG will produce video databases which are extremely valuable to those developing video metrics. An organization cannot get access to these databases without, at a minimum, substantially participating in the VQEG work. VQEG has decided that all proponents must provide at least one database (or a comparable contribution) which fulfils the requirements laid out in this test plan in order to gain access to the subjective databases produced in the Hybrid tests. A comparable contribution should be agreed by the other proponents and could include such things as providing test sequences and/or running HRCs. If an organization has no facilities to create such a database by itself, it may contract a recognized subjective test facility to do so on its behalf. If an organization is lacking the financial resources to fulfil this obligation, it can ask other proponents or the ILG to run its model on the VQEG databases. In this case the party will not be granted direct access to the video databases, but the party is still able to participate in the assessment of its models after paying the regular participation fee to the Independent Lab Group (ILG).

Some of the video data and bit-stream data might be published. See section 4.3 for details.


ANNEX V ??TBD

(OWNER: KJELL BRUNNSTROM)

SUBJECTIVE TEST SOFTWARE PACKAGE AND REQUIRED COMPUTER SPECIFICATION

INTRODUCTION

THE PROGRAM ACRVQWIN-BETA2 IS A PROGRAM TO RUN SUBJECTIVE EXPERIMENTS FOR VIDEO QUALITY IN WINDOWS ENVIRONMENT, USING

THE ACR-METHOD. IT IS DESIGNED TO PRESENT THE VIDEO ON A COMPUTER SCREEN, USING THE PICTURE SIZES QCIF, CIF AND VGA,

SYNCHRONIZED WITH THE REFRESH OF THE DISPLAY. THIS IS IMPLEMENTED USING DIRECTX.

SYSTEM REQUIREMENTS

THE PROGRAM USES A MULTITHREAD FUNCTION, SO IT REQUIRES AN INTEL-BASED (AMD HAS NOT BEEN TESTED) COMPUTER WITH TWO

PROCESSORS OR A PROCESSOR THAT SUPPORTS MULTITHREADING. THE CPU SPEED SHOULD BE AT LEAST 2 GHZ. THE GRAPHICS CARD

SHOULD SUPPORT YUV TO RGB CONVERSION AND HAVE AT LEAST 256


MBYTE OF VIDEO MEMORY. THE SYSTEM SHOULD HAVE PRIMARY MEMORY OF AT LEAST 512 MB. THE PROGRAM CAN BE RUN ON

WINDOWS 2000 AND WINDOWS XP.

USING THE PROGRAM

INSTALLATION AND PREPARATION

AFTER DOWNLOADING AND EXTRACTING THE FILES, THE FILE ACRVQWIN-BETA2.EXE HAS TO BE PLACED IN THE SAME LIBRARY AS

THE FIVE DLL FILES: LIBMATLB, LIBMCC, LIBMX, LIBMAT AND LIBUT (SHOULD BE OBTAINED TOGETHER, I.E. IN THE SAME

DISTRIBUTION, WITH THE EXECUTABLE FILE, SO VERSIONS OF THE DLLS MATCH THE EXECUTABLE). THE SETUP-FILE (SETUP.SEQ) CAN BE

PLACED ANYWHERE AS THE PROGRAM WILL LET YOU SPECIFY THE PATH AT THE START OF THE EXPERIMENT. TO BE ABLE TO RUN THE

EXPERIMENT PROGRAM A SETUP-FILE, SEE ALSO BELOW, HAS TO BE USED IN ORDER TO SET SOME OF THE PARAMETERS IN THE PROGRAM

AND ALSO SPECIFY THE PATH AND NAME OF THE AVI-FILES THAT SHOULD BE USED DURING THE EXPERIMENT. TO SPECIFY YOUR AVI-FILES YOU WRITE THE FILENAMES JUST BELOW THE PATH TO THE DIRECTORY THEY ARE IN. EACH FILENAME MUST BE WRITTEN ON A

SEPARATE ROW, AS THE PROGRAM WILL TREAT EACH LINE OF TEXT AS A FILENAME.

RUNNING THE PROGRAM


THE PROGRAM IS STARTED IN THE NORMAL WAY IN WINDOWS E.G DOUBLE-CLICKING ON THE ICON. THE PROGRAM WILL INITIALLY

PRESENT A DIALOG BOX FOR SELECTING SETUP-FILE (THIS MAKES IT POSSIBLE TO HAVE DIFFERENT SETUP-FILES FOR VARIOUS EXPERIMENT STORED AT THE SAME TIME) TO USE FOR THE

EXPERIMENT.

AFTER CHOOSING THE SETUP-FILE THE PROGRAM WILL ENTER THE EXPERIMENT ENVIRONMENT, THE SCREEN WILL HAVE THE SAME

GREY-LEVEL AS HAS BEEN SPECIFIED WITH THE PARAMETER BACKGROUND. TO START THE FIRST SEQUENCE YOU CAN EITHER

PRESS THE S BUTTON ON THE KEYBOARD OR THE LEFT BUTTON ON THE MOUSE AND THE VIDEO SEQUENCE WILL BE DISPLAYED AT THE

CENTRE OF THE SCREEN. WHEN THE SEQUENCE IS FINISHED A DIALOG BOX WILL APPEAR IN ORDER TO GET THE EVALUATORS RATING OF THE PERCEIVED VIDEO QUALITY. THIS PROCEDURE WILL CONTINUE

UNTIL THE EXPERIMENT IS DONE AND THE PROGRAM WILL AUTOMATICALLY EXIT. YOU CAN AT ANY TIME QUIT THE CURRENT

EXPERIMENT BY PRESSING THE ESC KEY.

THE ORDER OF THE SEQUENCES DOES NOT MATCH THE ORDER OF THE FILENAMES IN THE SETUP-FILE. THE PROGRAM RANDOMISES THE

ORDER AT THE START OF THE PROGRAM. THE RANDOM NUMBERS ARE DRAWN FROM A UNIFORM DISTRIBUTION. THE RESULTS ARE STORED

IN TWO OUTPUT-FILES THAT HAVE THE SAME NAME AS THE PARAMETER SUBJECTID WITH AN EXTENSION OF EITHER LOG OR DAT.

THE LOG-FILE SHOWS THE RESULTS IN THE EXACT ORDER AS THE SEQUENCES HAVE BEEN SHOWN. THE DAT-FILE SORTS THE DATA

REGARDLESS OF THE RANDOMISATION THAT HAS BEEN DONE BY THE PROGRAM, IN ORDER TO MAKE IT EASIER TO ANALYSE THE RESULTS FROM A STATISTICAL POINT OF VIEW. THE DATA IN THE DAT-FILE ARE

STORED IN EXACTLY THE SAME ORDER AS IN THE SETUP-FILE.

IF THE ESC KEY IS BEING PRESSED DURING AN EXPERIMENT THE PROGRAM WILL STORE THE NECESSARY DATA IN A .SVD FILE TO BE ABLE TO RESTORE THE EXPERIMENT LATER. THE .SVD FILE WILL BE CREATED IN THE FOLDER SPECIFIED BY THE PARAMETER PATH AND


HAVE THE FILENAME SPECIFIED BY THE PARAMETER SUBJECTID. NEXT TIME THE PROGRAM STARTS IT WILL ASK YOU IF YOU WANT TO

RESTORE THE ABORTED EXPERIMENT AND IF SELECTING YES THE EXPERIMENT WILL CONTINUE FROM WHERE IT WAS ABORTED. IF YOU DON’T WANT THE PROGRAM TO KEEP ASKING YOU IF YOU WANT TO

RESTORE AN EXPERIMENT, SIMPLY DELETE THE .SVD FILE.

FOR MIXING 25-FPS AND 30-FPS SEQUENCES, THE PARAMETER SCREENREFRESHFREQ MUST BE SET TO 60 HZ. THE PROGRAM WILL

THEN RUN THE 30-FPS SEQUENCES AS NORMAL AND TRY TO RUN THE 25-FPS SEQUENCES AS CLOSE TO 25 FPS AS POSSIBLE. AS 60 IS EVENLY DIVIDED BY 30, EACH FRAME WILL BE DISPLAYED TWO

SCREEN REFRESH CYCLES AND THE TIMING BETWEEN THE FRAMES WILL BE CONSTANT IN THE 30-FPS SEQUENCES. THIS CAN’T BE DONE WITH THE 25-FPS SEQUENCES AND THE SOLUTION APPLIED IN THIS

PROGRAM IS TO ALTERNATE BETWEEN DISPLAYING A FRAME FOR 2 OR 3 SCREEN REFRESH CYCLES. THE PROGRAM USES THE SERIES: 2, 3, 2, 3,

2. IN ORDER TO DISTINGUISH BETWEEN THE DIFFERENCES IN FRAMERATE OF THE SEQUENCES THE FRAMERATE PARAMETER IN

THE AVI-FILE INFOHEADER MUST BE VALID.

SETUP-FILE PARAMETERS

PIXELDEPTH: THE ONLY SUPPORTED MODE FOR THIS VERSION IS 16 BIT WITH THE FORMAT YUV 4:2:2

SCREENHEIGHT: THE HEIGHT OF THE SCREEN MEASURED IN PIXELS.


SCREENWIDTH: THE WIDTH OF THE SCREEN MEASURED IN PIXELS.

SCREENREFRESHFREQ: THE UPDATE FREQUENCY OF THE SCREEN.

VIDEOFRAMERATE: THE NUMBER OF FRAMES SHOWN PER SECOND.

BACKGROUND: THE GREY-LEVEL OF THE BACKGROUND IN THE EXPERIMENT ENVIRONMENT, WHERE 0 IS BLACK AND 255 IS WHITE.

NUMBEROFSERIES: THIS PARAMETER IS USED TO SET HOW MANY TIMES EACH SEQUENCE WILL BE PLAYED.


USEIMAGERATINGS: DECIDES WHETHER TO USE AN IMAGE OR TEXT TO DESCRIBE THE DIFFERENT RATINGS FOR THE VIEWER.

IMAGEPATHRATINGS: SPECIFIES THE PATH TO THE IMAGE IF THE PARAMETER USEIMAGERATINGS IS SET TO TRUE.

IMAGEPATHNORATING: SPECIFIES THE PATH TO THE IMAGE TO BE SHOWN IF USEIMAGERATINGS IS SET TO TRUE AND THE EVALUATOR

HAS MISSED TO RATE THE SEQUENCE.

RATING5-RATING1: THE TEXT USED TO DESCRIBE THE DIFFERENT RATINGS IF USEIMAGERATINGS IS SET TO FALSE.


NORATING: THIS TEXT IS SHOWN IF THE EVALUATOR FAILS TO RATE THE SEQUENCE AND USEIMAGERATINGS IS SET TO FALSE.

SUBJECTID: SPECIFIES THE FILENAME FOR THE .DAT AND .LOG FILE.

NOOFIMAGESINSEQ: THE NUMBER OF FRAMES THAT WILL BE DISPLAYED FROM THE AVI-FILE. THIS PARAMETER MAKES IT POSSIBLE TO DECIDE THE LENGTH OF THE SEQUENCES FROM THE SETUP-FILE.

PATH: SPECIFIES THE PATH TO THE DIRECTORY HOLDING THE AVI-FILES TO BE USED. THE .DAT AND .LOG FILE WILL BE PLACED HERE.


EXAMPLE OF A SETUP-FILE

[SEQUENCE]

PIXELDEPTH = 16

SCREENHEIGHT = 1024

SCREENWIDTH = 1280

SCREENREFRESHFREQ = 60

VIDEOFRAMERATE = 30

BACKGROUND = 128

NUMBEROFSERIES =3

USEIMAGERATINGS = FALSE


IMAGEPATHRATINGS = "C:\MYPATH\MYBITMAP1.BMP"

IMAGEPATHNORATING = "C:\MYPATH\MYBITMAP2.BMP"

RATING5 = "EXCELLENT"

RATING4 = "GOOD"

RATING3 = "FAIR"

RATING2 = "POOR"

RATING1 = "BAD"

NORATING = "THE SEQUENCE HAS NOT BEEN RATED"

SUBJECTID = "VIEWER"

NOOFIMAGESINSEQUENCE = 300


PATH = "C:\MYPATH"

MYAVI1.AVI

MYAVI2.AVI

MYAVI3.AVI

NOTE: EXPERIMENTER IS REQUIRED TO CHANGE THE ”SUBJECTID” FOR EACH EVALUATOR IN THE MM TEST.

TERM OF USAGE


THIS SOFTWARE IS PROVIDED AS IS. NO WARRANTY OR SUPPORT IS GIVEN. IT MAY BE USED WITHIN THE VQEG FOR RESEARCH PURPOSES.

COPYRIGHT 2005 ACREO AB, SWEDEN.

LIST OF PREFERRED LCD MONITORS FOR USE IN SUBJECTIVE TESTS:

TCO ’06:

BENQ FP241W, FP241WZ, FP241VW Q24W5

EIZO FLEXSCAN S2110W COLOREDGE CE210W S2110W

SAMSUNG 215TW DP21**

TCO ‘03


ANNEX VI METHOD FOR POST-EXPERIMENT SCREENING OF EVALUATORS

(Owner: Quan Huynh-Thu)

Method for Post-Experiment Screening of Evaluators

Method

The rejection criterion verifies the level of consistency of the raw scores of one viewer with the corresponding average raw scores over all viewers. The decision is made using correlation coefficients. Analysis per PVS and per HRC is performed for the decision.

Linear Pearson correlation coefficient per PVS for one viewer vs. all viewers:

r1 = Σ (xi − x̄)(yi − ȳ) / sqrt( Σ (xi − x̄)² · Σ (yi − ȳ)² )

where x̄ and ȳ are the mean values of xi and yi over the n PVSs, and

xi = MOS of all viewers per PVS

yi = individual score of one viewer for the corresponding PVS

n = number of PVSs

i = PVS index.

Linear Pearson correlation coefficient per HRC for one viewer vs. all viewers:

r2 = Σ (xi − x̄)(yi − ȳ) / sqrt( Σ (xi − x̄)² · Σ (yi − ȳ)² )

where x̄ and ȳ are the mean values of xi and yi over the n HRCs, and

xi = condition MOS of all viewers per HRC, i.e. condition MOS is the average value across all PVSs from the same HRC

yi = individual condition MOS of one viewer for the corresponding HRC

n = number of HRCs

i = HRC index

Rejection criteria

1. Calculate r1 and r2 for each viewer

2. Exclude a viewer if (r1<0.75 AND r2 <0.8) for that viewer

Note: The reason for using the analysis per HRC (r2) is that an evaluator can have an individual content preference that differs from other viewers, causing r1 to decrease even though this evaluator may have voted consistently. The analysis per HRC averages out an individual’s content preference and checks consistency across error conditions.
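A minimal sketch of the rejection criterion above is given below, assuming viewer scores are available as a viewers-by-PVSs array and that each PVS index can be mapped to its HRC; the function name, data layout and NumPy usage are assumptions, not part of this test plan.

# Hedged sketch of the post-experiment screening described above.
# scores[v][p] is the raw score of viewer v for PVS p; hrc_of[p] maps each PVS
# index to its HRC identifier.
import numpy as np

def screen_viewers(scores, hrc_of, r1_thresh=0.75, r2_thresh=0.8):
    scores = np.asarray(scores, dtype=float)      # shape: (viewers, PVSs)
    mos_pvs = scores.mean(axis=0)                 # MOS of all viewers per PVS
    hrcs = sorted(set(hrc_of))
    idx = {h: [p for p, hh in enumerate(hrc_of) if hh == h] for h in hrcs}
    mos_hrc = [scores[:, idx[h]].mean() for h in hrcs]   # condition MOS per HRC
    rejected = []
    for v in range(scores.shape[0]):
        r1 = np.corrcoef(mos_pvs, scores[v])[0, 1]
        viewer_hrc = [scores[v, idx[h]].mean() for h in hrcs]
        r2 = np.corrcoef(mos_hrc, viewer_hrc)[0, 1]
        if r1 < r1_thresh and r2 < r2_thresh:     # exclude if (r1 < 0.75 AND r2 < 0.8)
            rejected.append(v)
    return rejected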

For the Spearman rank order correlation between one observer and all observers, the following definitions apply:

xi = mean score of all observers for the PVS

yi = individual score of one observer for the corresponding PVS

n = number of PVSs

i = PVS index

R(xi) and R(yi) are the rank orders of xi and yi, respectively.

Final rejection criteria for discarding an observer of a test

The Spearman rank and Pearson correlations are carried out to discard observer(s) according to the following conditions:

ANNEX V. ENCRYPTED SOURCE CODE SUBMITTED TO VQEG

Proponents are entitled to submit a file with encrypted source code along with their model’s object code. This submission is not required but is offered in case there is a bug in the software that can be fixed without changing the algorithm. Normally, there would be no software updates possible after the submission of the object code.

In order for this option to be exercised, the proponent must encrypt the source code with a readily available encryption program (see below for a freeware example) and send the password-protected file to two ILG labs (CRC and Acreo). If the proponent determines that a bug is present in the software, then the proponent must discuss the situation with the ILG Co-Chairs. If the Co-Chairs agree that a bug fix should be tried, then a procedure must be agreed upon for the proponent to make the change to the code in the presence of an ILG member. This could be done in person or perhaps by telephone.

The proponent would make the change and the ILG member would verify that it was not an algorithm change. The code would be recompiled and tested in the presence of the ILG member. The revised code should be re-encrypted with a different password.

The encrypted file can be transported electronically or physically. It needs to be sent to both ILG contacts below:

ILG contacts:

Kjell Brunnstrom

Acreo

Stockholm, Sweden

+4686327732

[email protected]

Filippo Speranza

CRC

Ottawa, Canada

+1 613-998-7822

[email protected]

A good freeware encryption program:

Blowfish Advanced CS 2.57

http://www.hotpixel.net/software.html (click on Blowfish Advanced CS – Installer)

This software offers several encryption algorithms. The one that allows the largest key (448 bits) is Blowfish. The program is available in both German and English.

Source files should be zipped and then encrypted.

Other encryption programs can be used but if they are not free then the proponent is responsible for purchasing the program for the ILG if necessary.

Notes

VirtualDub is also capable of saving UYVY AVI files without using ffdshow. The following settings are used to get correct results.

The following procedure is not approved in the MM test plan:

WITHIN VIRTUALDUB, VIDEO SEQUENCES WILL BE SAVED TO AVI FILES USING THE VIDEO COMPRESSION OPTION (VIDEO->COMPRESSOR) "UNCOMPRESSED RGB/YCBCR". FOR THE COLOR DEPTH (VIDEO->COLOR DEPTH), THE SETTING "4:2:2 YCBCR (UYVY)" IS USED AS OUTPUT FORMAT. THE PROCESSING MODE (VIDEO->) IS SET TO "FULL PROCESSING MODE".

ANNEX VI. DEFINITION AND CALCULATING GAIN AND OFFSET IN PVSS

Definition and Calculating Gain and Offset in PVSs

Before computing luma (Y) gain and level offset, the original and processed video sequences should be temporally aligned. One delay for the entire video sequence may be sufficient for these purposes. Once the video sequences have been temporally aligned, perform the following steps.

Horizontally and vertically cropped pixels should be discarded from both the original and processed video sequences.

The Y planes will be spatially sub-sampled both vertically and horizontally by the following factors: 16 for HD and WVGA, 8 for CIF and QVGA, and 4 for QCIF. This spatial sub-sampling is computed by averaging the Y samples for each block of video (e.g., for WVGA one Y sample is computed for each 16 x 16 block of video). Spatial sub-sampling should minimize the impact of distortions and small spatial shifts (e.g., 1 pixel) on the Y gain and level offset calculations.
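The block-averaging step can be illustrated by the following sketch; the helper name and NumPy usage are assumptions, not a normative implementation.

# Minimal sketch (assumed helper): block-average a Y plane by a factor of
# 16 (HD/WVGA), 8 (CIF/QVGA) or 4 (QCIF), discarding rows and columns that do
# not fill a complete block.
import numpy as np

def block_average(y_plane, factor):
    h, w = y_plane.shape
    h, w = h - h % factor, w - w % factor            # crop to full blocks
    blocks = y_plane[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))                  # one Y sample per block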

The gain (g) and level offset (l) are computed according to the following model:

P = g O + l		(1)

where O is a column vector containing values from the sub-sampled original Y video sequence, P is a column vector containing values from the sub-sampled processed Y video sequence, and equation (1) may either be solved simultaneously using all frames, or individually for each frame using least squares estimation. If the latter case is chosen, the individual frame results should be sorted and the median values will be used as the final estimates of gain and level offset.

Least squares fitting is calculated according to the following formulas:

g = ( ROP − RO RP ) / ( ROO − RO RO ), and		(2)

l = RP − g RO		(3)

where ROP, ROO, RO and RP are computed over the N sub-sampled values, indexed by i:

ROP = (1/N) Σ O(i) P(i)		(4)

ROO = (1/N) Σ [O(i)]²		(5)

RO = (1/N) Σ O(i)		(6)

RP = (1/N) Σ P(i)		(7)
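As a non-normative illustration of equations (2)–(7), the following sketch computes per-frame gain and offset by least squares and takes the median over frames; variable names follow the annex, but the code itself is only an example, not the required implementation.

# Hedged sketch of equations (2)-(7): per-frame least-squares gain/offset on
# sub-sampled Y planes, with the median over frames as the final estimate.
import numpy as np

def gain_offset(orig_frames, proc_frames):
    gains, offsets = [], []
    for O, P in zip(orig_frames, proc_frames):       # sub-sampled Y planes
        O, P = O.ravel(), P.ravel()
        R_OP, R_OO = np.mean(O * P), np.mean(O * O)
        R_O, R_P = np.mean(O), np.mean(P)
        g = (R_OP - R_O * R_P) / (R_OO - R_O * R_O)  # equation (2)
        l = R_P - g * R_O                            # equation (3)
        gains.append(g)
        offsets.append(l)
    return np.median(gains), np.median(offsets)      # median over all frames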

APPENDIX I. TERMS OF REFERENCE OF HYBRID MODELS (SCOPE AS AGREED IN JUNE, 2009)

Note: This appendix does not contain any instructions to ILG or Proponents. When implementing the Hybrid Test Plan, the contents of this appendix must be ignored.

This appendix contains the conclusions reached during the June, 2009, VQEG meeting in Ghent. This document contains the agreements reached at that meeting regarding the appropriate scope of the Hybrid test, and how the Hybrid test differs from the P.NAMS and P.NBAMS testing being performed concurrently by ITU-T SG12. Some technical details changed between the writing of this appendix and approval of the Hybrid Test Plan (e.g., change from 11-point scale to 5-point scale for MOS).

Editorial History

Version Date Nature of the modification

0.0 June 24, 2009 Initial Draft, edited by C. Schmidmer

1.0 June 24, 2009 Approved at the Berlin meeting

2.0 October 27, 2010 Edited by M. Pinson

Updated section numbering, and inserted clarifications above.

Appendix I.1 Overview

This document defines the terms of reference for hybrid video quality estimation models to be evaluated by VQEG. The document describes the targeted application areas and the basic operational principle of such hybrid models, and clarifies the relation to other ongoing standardization activities within the ITU.

The intention of this document is to give a brief overview of the project, while the details will be covered in a separate testplan. In case of doubt, the specifications in the testplan supersede those in this document.

Appendix I.2 Terms of Reference – Hybrid models

Appendix I.2.1 Objectives and Application Areas

The objective of the hybrid project is to evaluate models that estimate the perceived video quality of short video sequences. The estimation shall be based on information taken from IP headers, bitstreams and the decoded video signal. Additionally, source video information may be used for some models. The bitstream demultiplexers are not part of the tested models. Decoded signals (PVS) along with bit-stream data will be inputs to the hybrid models. Models which do not make use of these decoded signals (PVS) will not be considered Hybrid Models.

The idea is that such models can be implemented in set top boxes, where all these parameters are available.

The tested models shall be applicable for troubleshooting and network monitoring at the client side as well as in the middle of a network, provided that a separate decoder provides decoded signals.

Typical applications may include IPTV and Mobile Video Streaming.

[Figure: bit-stream data and the decoded signal (PVS) from the STB/terminal (decoder) feed a Hybrid NR model, which uses signals, codec parameters and transmission information to produce a VQM; the decoder output also drives the display.]

Figure I-1. Model application at STB (set top box) or mobile terminal.

Appendix I.2.2 Model Types NR/RR/FR

Model types submitted for evaluation may comprise no-reference (NR), reduced reference (RR) as well as full reference (FR) methods.

Appendix I.2.3 Target Resolutions

Video resolutions under study will be QVGA and SD/HDTV. A model for SDTV must also handle HDTV and a model for HDTV must also handle SDTV. A proponent may submit different models for QVGA and SD/HDTV or a model for either QVGA or SD/HDTV.

Appendix I.2.4 Target Distortions

The models shall be capable of handling a wide range of distortions, from coding artifacts to transmission errors such as packet loss. Coding schemes which are currently discussed for use in this study are MPEG2 (SD) and H.264 (QVGA, SD, HD). The packet loss ratio ranges from ?? to ?? [TBD].

Appendix I.2.5 Model Input

Input to the models will be:

The source video sequence (Hybrid FR and Hybrid RR (headend) models only) Edit note:?? Clarification will be needed

Bitstreams (may be encrypted??TBD) which include, but are not limited to:

o transport header information

o Payload information

The decoded video sequence (PVS)

A reference decoder will be provided, which will be used to determine the admissibility of bit-stream data. The model should be able to handle the bit-stream data which can be decoded by the reference decoder.

Multiple decoders/players can be used to generate PVSs as long as the decoders can handle the bit-stream data which the reference decoder can decode. Bit-stream data can be generated by any encoder as long as the reference decoder can decode the bit stream data.

Appendix I.2.6 Results

Models submitted for this benchmark shall make use of an 11-point MOS scale (0=worst, 10=best). This definition is to avoid numerical problems only. A mapping function will be used for each model to map its results to each subjective database separately.
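As a purely illustrative sketch of such a per-database mapping (the validation procedure may prescribe a different fitting function), a simple linear least-squares fit could look like the following; all names are hypothetical.

# Illustrative only: map model scores to one subjective database with a
# linear least-squares fit; the test plan may define a different mapping.
import numpy as np

def map_to_database(model_scores, subjective_mos):
    a, b = np.polyfit(model_scores, subjective_mos, 1)   # fit MOS ≈ a*score + b
    return a * np.asarray(model_scores) + b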

Appendix I.2.7 Model Validation

The scores produced by the models will be compared to MOS scores achieved by subjective tests. The validation data will only be available to the proponents after the models have been submitted.

Appendix I.2.8 Model Disclosure

One clear objective of VQEG is that the benchmark shall lead to the standardization of one or more of the tested models by standardization organizations (e.g. ITU). This may involve the need for each proponent to fully disclose its model when it is accepted for standardization.

Appendix I.2.9 Time Frame

The benchmark is expected to be conducted in …..

Appendix I.3 Relation to other Standardization Activities [??TBD, scope]

It is known that the ITU groups conduct work in a similar field with the standardization activities for P.NAMS and P.NBAMS. The VQEG Hybrid project does not intend to compete with projects in ITU-T SG9, ITU-T SG12, and ITU-R WP6C and does not intend to duplicate their work. ??The distinction from these two recommendations is that the Hybrid project makes use of the same information as the ITU-T SG12 projects, but additionally uses the decoded video sequence.

In fact, parts of the P.NAMS and P.NBAMS models may optionally form part of a proposed hybrid model.

ANNEX III. EXAMPLE INSTRUCTIONS TO THE SUBJECTS

NOTES: THE ITEMS IN PARENTHESES ARE GENERIC SECTIONS FOR A SUBJECT INSTRUCTIONS TEMPLATE. THEY WOULD BE REMOVED FROM THE FINAL TEXT. ALSO, THE INSTRUCTIONS ARE WRITTEN SO THEY WOULD BE READ BY THE EXPERIMENTER TO THE PARTICIPANT(S).

(GREETING) THANKS FOR COMING IN TODAY TO PARTICIPATE IN OUR STUDY. THE STUDY’S ABOUT THE QUALITY OF VIDEO IMAGES; IT’S BEING SPONSORED AND CONDUCTED BY COMPANIES THAT ARE BUILDING THE NEXT GENERATION OF VIDEO TRANSMISSION AND DISPLAY SYSTEMS. THESE COMPANIES ARE INTERESTED IN WHAT LOOKS GOOD TO YOU, THE POTENTIAL USER OF NEXT-GENERATION DEVICES.

(VISION TESTS) BEFORE WE GET STARTED, WE’D LIKE TO CHECK YOUR VISION IN TWO TESTS, ONE FOR ACUITY AND ONE FOR COLOR VISION. (THESE TESTS WILL PROBABLY DIFFER FOR THE DIFFERENT LABS, SO ONE COMMON SET OF INSTRUCTIONS IS NOT POSSIBLE.)

(OVERVIEW OF TASK: WATCH, THEN RATE) WHAT WE’RE GOING TO ASK YOU TO DO IS TO WATCH A NUMBER OF SHORT VIDEO SEQUENCES TO JUDGE EACH OF THEM FOR “QUALITY” -- WE’LL SAY MORE IN A MINUTE ABOUT WHAT WE MEAN BY “QUALITY.” THESE VIDEOS HAVE BEEN PROCESSED BY DIFFERENT SYSTEMS, SO THEY MAY OR MAY NOT LOOK DIFFERENT TO YOU. WE’LL ASK YOU TO RATE THE QUALITY OF EACH ONE AFTER YOU’VE SEEN IT.

(PHYSICAL SETUP) WHEN WE GET STARTED WITH THE STUDY, WE’D LIKE YOU TO SIT HERE (POINT) AND THE VIDEOS WILL BE DISPLAYED ON THE SCREEN THERE. YOU CAN MOVE AROUND SOME TO STAY COMFORTABLE, BUT WE’D LIKE YOU TO KEEP YOUR HEAD REASONABLY CLOSE TO THE POSITION INDICATED BY THIS MARK (POINT TO MARK ON TABLE, FLOOR, WALL, ETC.). THIS IS BECAUSE THE VIDEOS MIGHT LOOK A LITTLE DIFFERENT FROM DIFFERENT POSITIONS, AND WE’D LIKE EVERYONE TO JUDGE THE VIDEOS FROM ABOUT THE SAME POSITION. I (THE EXPERIMENTER) WILL BE OVER THERE (POINT).

(ROOM & LIGHTING EXPLANATION, IF NECESSARY) THE ROOM WE SHOW THE VIDEOS IN, AND THE LIGHTING, MAY SEEM UNUSUAL. THEY’RE BUILT TO SATISFY INTERNATIONAL STANDARDS FOR TESTING VIDEO SYSTEMS.

(PRESENTATION TIMING AND ORDER; NUMBER OF TRIALS, BLOCKS) EACH VIDEO WILL BE (INSERT NUMBER) SECONDS (MINUTES) LONG. YOU WILL THEN HAVE A SHORT TIME TO MAKE YOUR JUDGMENT OF THE VIDEO’S QUALITY AND INDICATE YOUR RATING. AT FIRST, THE TIME FOR MAKING YOUR RATING MAY SEEM TOO SHORT, BUT SOON YOU WILL GET USED TO THE PACE AND IT WILL SEEM MORE COMFORTABLE. (INSERT NUMBER) VIDEO SEQUENCES WILL BE PRESENTED FOR YOUR RATING, THEN WE’LL HAVE A BREAK. THEN THERE WILL BE ANOTHER SIMILAR SESSION. ALL OUR JUDGES MAKE IT THROUGH THESE SESSIONS JUST FINE.

(WHAT YOU DO: JUDGING -- WHAT TO LOOK FOR) YOUR TASK IS TO JUDGE THE QUALITY OF EACH IMAGE -- NOT THE CONTENT OF THE IMAGE, BUT HOW WELL THE SYSTEM DISPLAYS THAT CONTENT FOR YOU. THERE IS NO RIGHT ANSWER IN THIS TASK; JUST RELY ON YOUR OWN TASTE AND JUDGMENT.

(WHAT YOU DO: RATING SCALE; HOW TO RESPOND, ASSUMING PRESENTATION ON A PC) AFTER JUDGING THE QUALITY OF AN IMAGE, PLEASE RATE THE QUALITY OF THE IMAGE. HERE IS THE RATING SCALE WE’D LIKE YOU TO USE (ALSO HAVE A PRINTED VERSION, EITHER HARDCOPY OR ELECTRONIC):

PLEASE INDICATE YOUR RATING BY ADJUSTING THE CURSOR ON THE SCALE ACCORDINGLY.

(PRACTICE TRIALS: THESE SHOULD INCLUDE THE DIFFERENT SIZE FORMATS AND SHOULD COVER THE RANGE OF LIKELY QUALITY) NOW WE WILL PRESENT A FEW PRACTICE VIDEOS SO YOU CAN GET A FEEL FOR THE SETUP AND HOW TO MAKE YOUR RATINGS. ALSO, YOU’LL GET A SENSE OF WHAT THE VIDEOS ARE GOING TO BE LIKE, AND WHAT THE PACE OF THE EXPERIMENT IS LIKE; IT MAY SEEM A LITTLE FAST AT FIRST, BUT YOU GET USED TO IT.

(QUESTIONS) DO YOU HAVE ANY QUESTIONS BEFORE WE BEGIN?

(SUBJECT CONSENT FORM, IF APPLICABLE; FOLLOWING IS AN EXAMPLE)

THE HDTV QUALITY EXPERIMENT IS BEING CONDUCTED AT THE (NAME OF YOUR LAB) LAB. THE PURPOSE, PROCEDURE, AND RISKS OF PARTICIPATING IN THE HDTV QUALITY EXPERIMENT HAVE BEEN EXPLAINED TO ME. I VOLUNTARILY AGREE TO PARTICIPATE IN THIS EXPERIMENT. I UNDERSTAND THAT I MAY ASK QUESTIONS, AND THAT I HAVE THE RIGHT TO WITHDRAW FROM THE EXPERIMENT AT ANY TIME. I ALSO UNDERSTAND THAT (NAME OF LAB) LAB MAY EXCLUDE ME FROM THE EXPERIMENT AT ANY TIME. I UNDERSTAND THAT ANY DATA I CONTRIBUTE TO THIS EXPERIMENT WILL NOT BE IDENTIFIED WITH ME PERSONALLY, BUT WILL ONLY BE REPORTED AS A STATISTICAL AVERAGE.

SIGNATURE OF PARTICIPANT SIGNATURE OF EXPERIMENTER

NAME OF PARTICIPANT DATE NAME OF EXPERIMENTER
