

Low Complexity Quantization in High Efficiency Video Coding

WANG, Meng; FANG, Xiaohan; TAN, Songchao; ZHANG, Xinfeng; ZHANG, Li

Published in: IEEE Access

Published: 01/01/2020

Document Version: Final Published version, also known as Publisher's PDF, Publisher's Final version or Version of Record

License: CC BY

Publication record in CityU Scholars: Go to record

Published version (DOI): 10.1109/ACCESS.2020.3012145

Publication details: WANG, M., FANG, X., TAN, S., ZHANG, X., & ZHANG, L. (2020). Low Complexity Quantization in High Efficiency Video Coding. IEEE Access, 8, 145159-145170. [9149906]. https://doi.org/10.1109/ACCESS.2020.3012145

Citing this paper: Please note that where the full-text provided on CityU Scholars is the Post-print version (also known as Accepted Author Manuscript, Peer-reviewed or Author Final version), it may differ from the Final Published version. When citing, ensure that you check and use the publisher's definitive version for pagination and other details.

General rights: Copyright for the publications made accessible via the CityU Scholars portal is retained by the author(s) and/or other copyright owners, and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights. Users may not further distribute the material or use it for any profit-making activity or commercial gain.

Publisher permission: Permission for previously published items is in accordance with publishers' copyright policies sourced from the SHERPA RoMEO database. Links to full-text versions (either Published or Post-print) are only available if the corresponding publishers allow open access.

Take down policy: Contact [email protected] if you believe that this document breaches copyright and provide us with details. We will remove access to the work immediately and investigate your claim.

Download date: 21/02/2022


Received July 1, 2020, accepted July 22, 2020, date of publication July 27, 2020, date of current version August 19, 2020.

Digital Object Identifier 10.1109/ACCESS.2020.3012145

Low Complexity Quantization in High Efficiency Video Coding

MENG WANG 1 (Graduate Student Member, IEEE), XIAOHAN FANG 2, SONGCHAO TAN 3, XINFENG ZHANG 4, AND LI ZHANG 5 (Member, IEEE)

1 Department of Computer Science, City University of Hong Kong, Hong Kong
2 Dalian University of Technology, Dalian 116024, China
3 Institute of Digital Media, Peking University, Beijing 100871, China
4 School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
5 Bytedance Inc., San Diego, CA 91945, USA

Corresponding author: Songchao Tan ([email protected])

This work was supported by the National Natural Science Foundation of China under Grant 61902006.

ABSTRACT The rate-distortion optimized quantization (RDOQ) provides an excellent trade-off between rate and distortion in High Efficiency Video Coding (HEVC), leading to inspiring improvement in terms of rate-distortion performance. However, its heavy use imposes high complexity on the encoder in real-world video compression applications. In this paper, we provide a comprehensive review of low complexity quantization techniques in HEVC, including both fast RDOQ and all-zero block detection. In particular, fast RDOQ relies on rate and distortion models for rate-distortion cost estimation, such that the most appropriate quantized coefficient is selected in a low complexity way. All-zero block detection infers the all-zero block by skipping transform and quantization, in an effort to further reduce the complexity. The relationship between the two techniques is also discussed, and moreover, we envision the future design of low complexity quantization in the upcoming Versatile Video Coding (VVC) standard.

INDEX TERMS High efficiency video coding, low complexity video coding, rate-distortion optimized quantization.

I. INTRODUCTION
With the fast development of network technology and acquisition devices, videos play an increasingly critical role in numerous applications, ranging from industrial production to consumer entertainment. The explosive growth of video data creates an urgent demand for more efficient compression technologies to promote video coding efficiency. Video coding technologies have evolved over the past few decades. After the standardization of the most prevalent standard H.264/AVC [1] in 2003, the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) collaborated to develop the High Efficiency Video Coding (HEVC) standard [2], which reduces bit-rates by around 50% under the same perceptual quality compared to H.264/AVC. HEVC was finalized in 2013, which is a milestone in video coding.

The associate editor coordinating the review of this manuscript and approving it for publication was Jenny Mahoney.

Compared with its predecessor H.264/AVC, more efficient coding tools were explored and adopted by the HEVC standard. More specifically, HEVC employs flexible block tree structures, such as the coding unit (CU), prediction unit (PU) and transform unit (TU), to better adapt to the local characteristics of the video [3]. Moreover, the number of intra prediction modes is expanded from 9 to 35, with the goal of better interpreting complicated texture directions [4]. Meanwhile, a series of advanced inter prediction techniques have been adopted to remove temporal redundancies [5]. Regarding quantization, hard-decision quantization has gradually evolved into soft-decision quantization strategies [6], and rate-distortion optimization (RDO) is introduced into the quantization process, leading to rate-distortion optimized quantization (RDOQ) [7], by which the most efficient quantization level for each individual coefficient can be determined in the sense of RDO. Regarding the removal of statistical redundancy, context-adaptive binary arithmetic coding (CABAC) [8] is utilized, which combines context modeling with binary arithmetic coding and removes the

VOLUME 8, 2020, 145159. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/


statistical redundancies in a lossless manner. Recently, the continuous development of video coding technologies has led to the next generation video coding standard, Versatile Video Coding (VVC) [9], which was launched in 2018.

In the hybrid video coding framework, almost all the modules have been enhanced during the development of VVC, such as more flexible coding partitions [10], advanced intra/inter predictions [11]–[16], as well as advanced transform cores [17] for signal energy compaction. Moreover, along with the development of video coding standards, a series of video coding techniques have also been proposed, including extended quad-tree (EQT) [18], [19], history-based motion vector prediction [16], [20] and cross-component linear model prediction [21], [22]. These techniques have also been shown to significantly improve the coding performance. Regarding quantization, which serves as one of the core stages in the hybrid video coding framework, the trellis-coded quantization scheme has been introduced in [23], where the quantization candidates are elaborately mapped into a trellis graph in collaboration with a state transferring mechanism.

Generally speaking, quantization schemes evolve from hard-decision quantization, which relies only on the input transform coefficient and the quantization step, toward soft-decision quantization, which optimizes the quantization process based on RDO. Given a quantization parameter (QP), uniform hard-decision quantization straightforwardly maps a transform coefficient to the corresponding quantized level, and it has been widely adopted in early video codecs. Moreover, uniform hard-decision quantization with a dead-zone was adopted in H.264/AVC, wherein the rounding offset is determined by the distribution of the residual coefficients [24]. With soft-decision quantization, the inter-dependencies among the quantized residuals within one transform block (TB) are also taken into account during the determination of the quantization level. In particular, the RD costs of the quantization candidates are elaborately evaluated, such that the derived quantization levels, which are determined in a soft manner, strike an excellent trade-off between the coding bits of the residuals and the quantization distortion. It was reported that soft-decision quantization brings 6% to 8% bit-rate savings at the cost of high computational complexity compared with conventional hard-decision quantization with a dead-zone [25]. However, as the residual coding bits have to be calculated synchronously through entropy coding, the high complexity of soft-decision quantization hinders its application.

In the literature, numerous schemes have been developed to achieve soft-decision quantization in video coding. It was implemented with trellis searching in H.263+ and H.264/AVC [26], wherein the transform coefficients and the context states are deployed in a trellis graph that delicately represents the combinations of the available quantization candidates. Moreover, the associated RD cost of each quantization candidate is attached to the corresponding trellis branch, such that

FIGURE 1. Taxonomy of the low complexity quantization schemes.

the optimal path can be decided by dynamic programming or Viterbi search. However, it is acknowledged that executing a full trellis search in quantization involves extremely high computational complexity. In view of this, trellis searching is simplified into the RDOQ, by which sub-optimal quantization can be achieved. In particular, RDOQ has been widely employed in the H.264/AVC, HEVC and AVS2 [27] encoders; it examines a limited number of quantization candidates, and finally the one with the minimum RD cost for the current transform coefficient is retained.

In this paper, we focus on quantization in video coding, which is of prominent importance in controlling the distortion level and coding bitrate by reproducing the residuals with different quantization levels. The advanced quantization techniques adopted in HEVC are first reviewed, following which the low complexity quantization techniques are introduced. The aim of the developed fast quantization techniques is to infer the best quantized coefficient in the most efficient way, and numerous approaches have been proposed toward this goal from different perspectives, as illustrated in Fig. 1. More specifically, the systematic review is conducted based on the categories of the low complexity quantization techniques. In particular, we divide them into two categories: fast RDOQ and all-zero block (AZB) detection. For fast RDOQ, we introduce the statistics-based and RD model based methods. For AZB detection, we review the genuine AZB and pseudo AZB detection methods. As such, all aspects that could lead to low complexity quantization in the literature have been considered. Finally, we discuss future quantization optimization techniques in the upcoming VVC standard, in which more advanced quantization techniques have been adopted. Overall, the aim of this paper is not limited to only providing a review on the low complexity implementation of



quantization in HEVC, but it is also highly anticipated that it could shed light on developing low complexity quantization optimization schemes for VVC in a principled way.

II. QUANTIZATION IN HEVC
In this section, we revisit the quantization in the HEVC standard. In principle, the RDOQ serves as an optimization tool that further improves upon the hard-decision quantization strategy without introducing any change to the decoder. Given the transform coefficient $C_{i,j}$ and quantization step size $Q_{step}$, the quantization level $l_{i,j}$ with hard-decision quantization can be formulated as follows,

$$l_{i,j} = \mathrm{sign}(C_{i,j}) \cdot \left\lfloor \frac{|C_{i,j}|}{Q_{step}} + f \right\rfloor, \quad (1)$$

where f represents the rounding offset, which is usually set according to the slice type [24].
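As a concrete illustration, the scalar mapping of Eqn. (1) can be sketched in a few lines of Python; the function name and example values are ours, not from the standard text:

```python
import math

def hard_decision_quantize(coeff, q_step, f=0.5):
    """Hard-decision quantization of Eqn. (1): map a transform
    coefficient to its quantized level with rounding offset f."""
    if coeff == 0:
        return 0
    level = math.floor(abs(coeff) / q_step + f)
    return level if coeff > 0 else -level

# With Qstep = 10 and an inter-slice style offset f = 1/6,
# a coefficient of 23 maps to level floor(2.3 + 1/6) = 2.
```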

In RDOQ, the RDO strategy is embedded to pursue the optimal quantized coefficient. More specifically, the target of the RDO is to minimize the distortion D under the constraint of the coding bits budget $\hat{R}$, which can be expressed as follows,

$$\min_{l_{i,j} \in L_{i,j}} D, \quad \text{s.t. } R \leq \hat{R}, \quad (2)$$

where $L_{i,j}$ denotes the set of the quantization candidates of the transform coefficient at position (i, j). To convert such a constrained problem into an unconstrained one, the Lagrangian multiplier λ is introduced, leading to the following optimization problem,

$$\min_{l_{i,j} \in L_{i,j}} J, \quad \text{where } J = D + \lambda \cdot R. \quad (3)$$

The RDOQ selects the optimal quantization levels according to Eqn. (3). More specifically, there are two main procedures in RDOQ. First, a pre-quantization is conducted for a transform coefficient $C_{i,j}$ following the reverse scan order (diagonal, or vertical/horizontal allowed for certain blocks). The quantization candidates $l^{ceil}_{i,j}$ and $l^{floor}_{i,j}$ can be derived as follows,

$$l^{ceil}_{i,j} = \left\lfloor \frac{|C_{i,j}|}{Q_{step}} + \frac{1}{2} \right\rfloor, \quad (4)$$

$$l^{floor}_{i,j} = \begin{cases} 0, & l^{ceil}_{i,j} = 0 \\ 0,\ 1, & l^{ceil}_{i,j} = 2 \\ l^{ceil}_{i,j} - 1, & \text{otherwise.} \end{cases} \quad (5)$$

As such, the optimal quantization level can be selected accordingly,

$$\hat{l}_{i,j} = \arg\min_{L_{i,j}} (D + \lambda \cdot R), \quad (6)$$

where

$$L_{i,j} = \{0,\ l^{floor}_{i,j},\ l^{ceil}_{i,j}\}. \quad (7)$$
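The pre-quantization of Eqns. (4), (5) and (7) can be sketched as follows; this is a minimal Python illustration of the candidate-set construction, not the HM implementation:

```python
import math

def rdoq_candidate_set(coeff, q_step):
    """Candidate levels examined by RDOQ for one coefficient:
    l_ceil from Eqn. (4), l_floor from Eqn. (5), and the set
    L = {0, l_floor, l_ceil} of Eqn. (7) (sign handled separately)."""
    l_ceil = math.floor(abs(coeff) / q_step + 0.5)   # Eqn. (4)
    if l_ceil == 0:                                  # Eqn. (5)
        l_floor = {0}
    elif l_ceil == 2:
        l_floor = {0, 1}
    else:
        l_floor = {l_ceil - 1}
    return {0, l_ceil} | l_floor                     # Eqn. (7)
```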

RDOQ and residual entropy coding are applied based on the coefficient group (CG) [28], which is defined as a 4 × 4 sub-block of coefficients within one TB. The second step of

RDOQ aims to determine whether the current CG can be quantized to an all-zero CG based on RD examination, wherein the RD costs of the original quantized CG and the all-zero CG are respectively calculated. Meanwhile, the position of the last non-zero coefficient is checked in the sense of RDO following the traversing order.

In RDOQ, the distortion is typically defined as the sum of squared errors (SSE) between the original transform coefficients and the dequantized coefficients, and the coding bits are obtained through CABAC coding. As such, RDOQ involves a considerable number of RD cost calculations. The RD calculation and checking procedure has to be iteratively applied to each individual transform coefficient within a CG, and to each individual CG within one TB. Each transform coefficient is associated with at least two quantization candidates. Furthermore, the context model updating in CABAC further increases the computational complexity. Therefore, it is highly desirable to investigate fast RDOQ schemes to facilitate the application of RDOQ. In the literature, numerous works have been devoted to accelerating the RDOQ. One typical category concentrates on simplifying the quantization procedure, and another attempts to detect all-zero blocks in advance to bypass the tedious processes including transform, quantization, entropy coding, inverse quantization and inverse transform.

III. FAST RDOQ
RDOQ aims to determine the set of quantization results that achieves the lowest RD cost for a TB. Such RD-based determination undoubtedly brings compression performance gains while in turn increasing the computational complexity. Experimental results in [29] on the HEVC test platform reported that RDOQ achieves around 3% to 5% BD-Rate [30] savings along with a 12% to 25% encoding time increase for HEVC. In the literature, there are two main strategies to achieve low complexity RDOQ: the statistics-based methods and the RD model based methods.

A. STATISTICS-BASED FAST RDOQ
The statistics-based approaches aim to empirically skip the RDOQ according to statistical analyses, avoiding unnecessary computations in the recursive RDO process of the encoder.

In [31], a special block type named All Quantized Zero Block except DC Coefficient (AQZB-DC) is detected, which occupies around 20% of the non-zero blocks. Moreover, statistical results reveal that over 30% of the DC coefficients in AQZB-DC blocks retain $l^{ceil}_{i,j}$. As such, a prediction model is investigated for AQZB-DC blocks, which adaptively regulates the quantization level of the DC coefficient. In this way, the RDOQ procedure can be bypassed for this type of TB.

The residual quad-tree structure implicitly aggravates the computational burden of quantization [32], since the RDOQ is repetitively invoked under the recursive TB division structure. Moreover, statistical results show that RDOQ has a negligible influence on the TB size determination,


and the TB partitioning accuracy still reaches 95% if hard-decision quantization is employed. As such, the authors in [32] proposed to directly apply hard-decision quantization in the TB decision rounds, and to employ the RDOQ after obtaining the best TB sizes. This method reduces the quantization complexity by 27% with a BD-Rate loss of 0.25% under the low delay P configuration.

In [33], an RDOQ bypass scheme is proposed based on the statistics of the transform coefficients. In particular, even though RDOQ achieves considerable performance improvements for HEVC, it does not always change the quantized level compared to hard-decision quantization. One particular example is when the current TB is an all-zero block after pre-quantization, where the RDOQ is not necessary. Moreover, if the quantization outcomes of hard-decision quantization are identical to those of RDOQ, calculating the RD cost is rather wasteful. According to statistical experiments, it is found that when all the transform coefficients within one TB are smaller than a threshold that is governed by the quantization step size, the current TB can be directly determined to be a zero TB without RDOQ. Moreover, based on the statistical result that when the sum of the absolute quantized coefficients in one TB is smaller than a given threshold, the non-zero coefficients occupy only a small fraction, RDOQ is bypassed and hard-decision quantization is invoked to economize the encoding time.
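The two statistical bypass tests of [33] can be sketched as below; the threshold values here are illustrative placeholders, as the tuned thresholds of [33] are not reproduced in this review:

```python
def rdoq_bypass_decision(coeffs, pre_levels, q_step,
                         coeff_scale=0.5, level_sum_thr=4):
    """Statistical RDOQ bypass in the spirit of [33].

    coeffs: transform coefficients of one TB;
    pre_levels: pre-quantized (hard-decision) levels of the TB;
    coeff_scale and level_sum_thr are hypothetical thresholds.
    """
    # Test 1: every coefficient below a Qstep-governed threshold,
    # so the TB is declared an all-zero TB without running RDOQ.
    if all(abs(c) < coeff_scale * q_step for c in coeffs):
        return "all_zero_tb"
    # Test 2: the sum of absolute quantized levels is small, i.e.
    # non-zero coefficients occupy a small fraction; fall back to
    # hard-decision quantization instead of RDOQ.
    if sum(abs(l) for l in pre_levels) < level_sum_thr:
        return "hard_decision"
    return "run_rdoq"
```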

In [34], the authors proposed to simplify the selection of quantization candidates and the search for the last non-zero coefficient in RDOQ. In particular, the conditional probability $P(\hat{l}_{i,j} \neq l^{ceil}_{i,j} \mid l^{ceil}_{i,j} = L)$ is evaluated, where $\hat{l}_{i,j}$ denotes the quantization level selected by RDOQ for position (i, j), as defined in Eqn. (6). L could be 1, 2, 3 or larger than 3. Statistical results under the random access configuration with varied QPs reveal that when L is larger than 3, $P(\hat{l}_{i,j} \neq l^{ceil}_{i,j} \mid l^{ceil}_{i,j} = L)$ is 4% on average, indicating that the quantization level is prone to remain unchanged from the result of the pre-quantization, such that RDOQ can be skipped. Furthermore, a simplification of the last non-zero coefficient search is investigated for 4 × 4 TBs, for which the search scale is shrunk to the first four non-zero coefficients.

In [35] and [29], based on the observation that RDOQ tends to adjust the quantization level ''1'' to ''0'' for coefficients located in the high frequency region of larger TBs, an early quantization level decision scheme is proposed, which forces the quantization level to be zero without the RDOQ process [29], [35],

$$\hat{l}_{i,j} = \begin{cases} 0, & \text{if } l^{ceil}_{i,j} = 1,\ W \geq 16,\ \text{and } p_{CG} \leq \frac{N_{CG}}{2} \\ l^{ceil}_{i,j}, & \text{otherwise,} \end{cases} \quad (8)$$

where $p_{CG}$ corresponds to the explicit position of the current CG following the scanning order, $N_{CG}$ denotes the total number of CGs within one TB, and W represents the TB width. The proposed scheme brings 12.84% time savings for quantization with a BD-Rate [30] loss of 0.21% under the all intra configuration. In addition, considering that for some sequences the probability of the adjustment case (i.e., $P(\hat{l}_{i,j} = 0 \mid l^{ceil}_{i,j} = 1)$) is less than 70%, the authors propose to employ an adaptive rounding offset for calculating $l^{ceil}_{i,j}$ during the pre-quantization stage. In particular, the rounding offset f is adjusted as follows [35],

$$f = \begin{cases} \frac{1}{3}, & W \geq 16,\ p_{CG} \leq \frac{N_{CG}}{2},\ \text{for I slices} \\ \frac{1}{6}, & W \geq 16,\ p_{CG} \leq \frac{N_{CG}}{2},\ \text{for P and B slices} \\ \frac{1}{2}, & \text{otherwise.} \end{cases} \quad (9)$$

The proposed adaptive rounding offset achieves 15.29% quantization time savings with a negligible BD-Rate loss (0.01% on average) under the all intra configuration.
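The early level decision of Eqn. (8) and the adaptive rounding offset of Eqn. (9) can be sketched together; this is our illustrative Python rendering of the rules in [29], [35]:

```python
def early_level_and_offset(l_ceil, tb_width, p_cg, n_cg, slice_type):
    """Early level decision of Eqn. (8) and adaptive rounding offset
    of Eqn. (9). p_cg is the CG position in scanning order and n_cg
    the total number of CGs in the TB."""
    eligible = tb_width >= 16 and p_cg <= n_cg / 2
    # Eqn. (8): force level-1 coefficients in large TBs to zero.
    level = 0 if (l_ceil == 1 and eligible) else l_ceil
    # Eqn. (9): pick the rounding offset for pre-quantization.
    if eligible:
        f = 1 / 3 if slice_type == "I" else 1 / 6
    else:
        f = 1 / 2
    return level, f
```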

B. RATE DISTORTION MODELS FOR FAST RDOQ
To obtain the RD cost with respect to each quantization candidate, actual entropy coding together with context modeling and updating has to be carried out, which is considered to be the major component that aggravates the computational burden. Therefore, the key to achieving low complexity RDOQ is to establish an accurate RD model to estimate the RD cost, instead of performing actual encoding. In the literature, various rate and distortion models have been investigated [36]–[39], generally with the aim of efficient bit allocation, rate control, and fast RDO decisions. However, only a few RD modeling studies targeting the acceleration of quantization have been conducted [33], [40]–[42].

Considering the fact that RDOQ adjusts quantization levels by comparing the RD costs of $l^{ceil}_{i,j}$ and $l^{floor}_{i,j}$, the RD cost difference with respect to different quantization candidates is derived. In particular, Lee et al. [33] formulated a simplified level adjustment method with the following ΔJ estimation model [33],

$$\Delta J = J(l^{ceil}_{i,j}) - J(l^{floor}_{i,j}) = \Delta D + \lambda \cdot \Delta R, \quad (10)$$

wherein ΔD and ΔR denote the differences of the distortions and rates between $l^{ceil}_{i,j}$ and $l^{floor}_{i,j}$ for the coefficient at

position (i, j). By involving the float expression $l^{float}_{i,j}$ of the pre-quantization result, ΔD is given by [33],

$$\Delta D = \begin{cases} (-2b - 1) \cdot Q^2_{step}, & b < 0.5 \\ (-2b + 1) \cdot Q^2_{step}, & b \geq 0.5, \end{cases} \quad (11)$$

where b is the decimal part of $l^{float}_{i,j}$. In this way, the dequantization process can be safely removed. Regarding the rate estimation, a series of syntax elements, such as ''significant_flag'', ''greater_than_one'', ''greater_than_two'' and ''remaining_level'', are involved in the coding of the quantized coefficient, such that the coding bit difference between $l^{ceil}_{i,j}$ and $l^{floor}_{i,j}$ can be represented as follows [33],

$$\Delta R = \Delta R_{sig} + \Delta R_{gt1} + \Delta R_{gt2} + \Delta R_{rem}. \quad (12)$$
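A sketch of the ΔJ estimate of Eqns. (10)–(12): the distortion difference comes directly from the decimal part b of the float level (Eqn. (11)), so no dequantization is needed, while the rate difference is assumed to be supplied externally (e.g., from the Table 1 look-up):

```python
def delta_j_estimate(l_float, q_step, lam, delta_r_bits):
    """Delta-J of Eqn. (10), with Delta-D from Eqn. (11).

    l_float: un-rounded quantization result |C|/Qstep (non-negative);
    delta_r_bits: estimated rate difference of Eqn. (12).
    """
    b = l_float - int(l_float)                 # decimal part of l_float
    if b < 0.5:
        delta_d = (-2 * b - 1) * q_step ** 2   # Eqn. (11), b < 0.5
    else:
        delta_d = (-2 * b + 1) * q_step ** 2   # Eqn. (11), b >= 0.5
    return delta_d + lam * delta_r_bits        # Eqn. (10)
```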


TABLE 1. The estimation of ΔR with three syntax elements by referencing the value of $l^{ceil}_{i,j}$ [33].

The rate difference of the first three syntax elements can be deduced according to the value of $l^{ceil}_{i,j}$, as illustrated in Table 1. The explicit values of these four syntax elements can be obtained through a look-up table defined in the HEVC test model.

A ΔJ model has also been established in [40] for low complexity RDOQ. Typically, the coding bits of the sign flag and the bits for representing the position of the last significant coefficient are additionally involved in the ΔR estimation. Subsequently, by setting the ΔJ in Eqn. (10) to zero, a threshold $T_{\Delta l_{i,j}}$ can be derived as follows [40],

$$T_{\Delta l_{i,j}} = -(0.5 + 0.2362 \cdot \beta), \quad (13)$$

where

$$\Delta l_{i,j} = l^{ceil}_{i,j} - l^{float}_{i,j}. \quad (14)$$

Herein, β is a scaling factor in the transition between λ and QP that is defined in the HM platform as follows [43],

$$\lambda = \beta \cdot 2^{\frac{QP - 12}{3}}. \quad (15)$$

In this manner, the optimal quantization level $\hat{l}_{i,j}$ can be determined [40],

$$\hat{l}_{i,j} = \begin{cases} l^{ceil}_{i,j}, & \text{if } \Delta l_{i,j} \geq T_{\Delta l_{i,j}} \\ l^{floor}_{i,j}, & \text{if } \Delta l_{i,j} < T_{\Delta l_{i,j}}. \end{cases} \quad (16)$$

In [41], the philosophy of comparing the rate-distortion costs based on ΔJ is employed again, wherein the coding bits of the residual coding syntax elements are inferred from statistical probability and information entropy. Besides, based on the rate estimation in [41], a parallel RDOQ scheme for the HEVC open-source encoder x265 [44] is proposed [45], with which the RDO procedures can be executed in parallel on a GPU, leading to real-time encoding for 4K sequences. Moreover, Yin et al. [25] proposed a soft-decision quantization scheme which reveals great benefits in enhancing the throughput of hardware implementations.

To establish the rate and distortion models for RDOQ, Cui et al. [42] proposed to model the transform coefficients with hybrid Laplacian distributions at the TB level. In HEVC, residual quad-tree partitioning is employed, leading to varied TB sizes from 4 × 4 to 32 × 32, such that the behaviors of the coefficient distributions are distinct. The proposed hybrid Laplacian distributions contain a succession of models with different parameters, with the goal of better accommodating the characteristics of the varied TB sizes. Moreover, the transform types of the 4 × 4 TUs, such as DCT-II, DST-VII and transform skip, are taken into account in the modeling. The hybrid Laplacian distribution can be formulated as follows [42],

$$f_{HL_k}(C_{i,j}) = \omega_k \cdot \frac{\lambda_k}{2} e^{-\lambda_k \cdot |C_{i,j}|}, \quad (17)$$

where $\lambda_k$ denotes the Laplacian parameter and $\omega_k$ is a weighting factor. k denotes the TB layer index. Layer 0 to layer 3 correspond to the TUs with sizes of 32 × 32 to 4 × 4. Layer 4 and layer 5 indicate the 4 × 4 TBs that employ DST-VII and transform skip, respectively. An online updating strategy is involved for the parameter refinement. After obtaining the model parameters, the cumulative probability with respect to different quantization levels can be derived, such that the coding bits can be acquired by integrating the self-information of the quantized symbols. Furthermore, the estimated coding bits $\hat{R}_k$ are derived according to a linear mapping from the self-information $\hat{r}_k$, wherein the linear model parameters ξ and γ are initialized and updated by least squares regression. In terms of the quantization distortion modeling, the quantization level originating from hard-decision quantization is employed, with which the SSE between the dequantized coefficients and the original transform coefficients is regarded as the quantization distortion. In this manner, the RD cost for each quantization candidate can be derived. Finally, an estimated optimal quantization result can be derived in an analytical way by minimizing the RD cost over the quantization candidates as follows [42],

$$\hat{l}_{i,j} = \left\lfloor \frac{C_{i,j}}{Q_{step}} - \frac{\log_2 e \cdot \alpha \cdot \xi \cdot \lambda \cdot \lambda_k}{2 \cdot Q_{step}} + \frac{1}{2} \right\rfloor, \quad (18)$$

where α is an off-line trained parameter used for adjusting the model accuracy. All-zero CGs can also be effectively determined according to the following threshold [42],

$$C_{i,j} < \frac{Q_{step} + \log_2 e \cdot \xi \cdot \alpha \cdot \lambda \cdot \lambda_k}{2}. \quad (19)$$

If all the transform coefficients within one CG satisfy Eqn. (19), the CG can be determined as an all-zero CG.
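The closed-form decision of Eqn. (18) and the AZB test of Eqn. (19) can be sketched as below; this is a Python illustration in which the model parameters α, ξ, λ and λ_k are assumed to be supplied by the training and updating procedures of [42]:

```python
import math

LOG2E = math.log2(math.e)

def analytic_level(coeff, q_step, alpha, xi, lam, lam_k):
    """Closed-form quantization level of Eqn. (18)."""
    bias = (LOG2E * alpha * xi * lam * lam_k) / (2 * q_step)
    return max(math.floor(coeff / q_step - bias + 0.5), 0)

def coeff_below_azb_threshold(coeff, q_step, alpha, xi, lam, lam_k):
    """Per-coefficient test of Eqn. (19); a CG in which every
    coefficient passes is declared an all-zero CG."""
    return coeff < (q_step + LOG2E * xi * alpha * lam * lam_k) / 2
```

Note that a coefficient satisfying Eqn. (19) is exactly one whose Eqn. (18) level rounds to zero, which is how the two expressions are consistent.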


IV. ALL ZERO BLOCK DETECTION
An all-zero block (AZB), for which the prediction signals directly serve as the reconstructed pixels, is commonly observed, especially in low bit-rate coding scenarios. The quantized coefficient levels within an AZB are all zeros, such that detecting an AZB early, before transform or quantization, is beneficial to economize the encoding computational resources. In this manner, encoding procedures such as transform, quantization, residual coding, inverse quantization and inverse transform can be straightforwardly skipped.

There have been numerous works focusing on the forecast of AZBs [31], [32], [46]–[53]. In particular, Wang and Kwong [48] proposed to detect the zero quantized coefficients with a hybrid model, which is specifically designed for 4 × 4 blocks with the integer DCT in H.264/AVC. The spatial domain residuals are modeled with a Gaussian distribution, and subsequently, multiple levels of determination thresholds with respect to the sum of absolute differences (SAD) are derived for the detection of the zero coefficients. To accommodate the Hadamard transform invoked by H.264/AVC, the hybrid model is adjusted accordingly [49], where the sum of absolute transformed differences (SATD) replaces the SAD in [48].

Compared with H.264/AVC, HEVC adopts a series of advanced prediction technologies, leaving more room for the improvement of AZB detection. Moreover, considering the RDOQ, which quantizes the coefficients in a soft manner in HEVC, more AZBs are generated, since the all-zero cases may achieve superior RD performance. In addition, HEVC introduces larger TB sizes (i.e., 16 × 16, 32 × 32), making zero block detection more challenging, as the larger TUs involve more coefficients with distinct properties. As such, the AZB detection methods investigated for H.264/AVC may not be applicable to HEVC.

To better collaborate with the RDOQ process in HEVC, several investigations concentrate on the detection of two types of AZBs: the genuine AZB (G-AZB) and the pseudo AZB (P-AZB) [50]–[52]. In particular, G-AZB denotes the TBs that can be quantized to AZBs through hard-decision quantization. P-AZB represents those that could potentially be set to AZBs through RDOQ. For a clearer explanation, the hard-decision quantization in Eqn. (1) is equivalently interpreted as follows,

$$l_{i,j} = \mathrm{sign}(C_{i,j}) \cdot \left( (|C_{i,j}| \cdot M + \text{offset}) \gg Q_{sh} \right), \quad (20)$$

where M is a multiplication factor relevant to the QP, offset denotes the scaled rounding offset relying on the slice type, and $Q_{sh}$ depends on the QP, the TB size and the coding bit-depth [2]. For a G-AZB, the absolute value of $l_{i,j}$ should be less than 1, such that given the TB size W and quantization parameter QP, the detection threshold for an individual DCT coefficient $C_{i,j}$ can be described as follows [31], [33], [50]–[52],

$$T = \frac{2^{Q_{sh}} - \text{offset}}{M}. \quad (21)$$

The threshold T has been widely employed for the detectionof the AZBs.
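Given the integer quantization constants M, offset and Q_sh of Eqn. (20), the G-AZB test based on Eqn. (21) reduces to a per-coefficient comparison. A minimal sketch follows, with toy constants rather than the actual HEVC quantization tables:

```python
def g_azb_threshold(m, offset, q_sh):
    """Single-coefficient detection threshold of Eqn. (21):
    any |C| below T quantizes to level 0 under Eqn. (20)."""
    return (2 ** q_sh - offset) / m

def is_genuine_azb(coeffs, m, offset, q_sh):
    """A TB is a G-AZB when every coefficient magnitude is below T."""
    t = g_azb_threshold(m, offset, q_sh)
    return all(abs(c) < t for c in coeffs)

# Toy example with m = 4, offset = 2, q_sh = 4: T = (16 - 2) / 4 = 3.5,
# and indeed (|4| * 4 + 2) >> 4 = 1, so a coefficient of 4 is not zeroed.
```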

Cui et al. [50] proposed a hybrid AZB detection method for HEVC. Initially, the Walsh-ordered Hadamard transform is employed for 4 × 4 and 8 × 8 TUs in place of the DCT, in an effort to reduce the computational complexity, while the DCT is retained for 16 × 16 and 32 × 32 TUs. The associated SATD* for each TB size is extracted and normalized. Two G-AZB detection thresholds are proposed, wherein the first threshold T^{(1)}_{SATD*} is derived by summing the single-coefficient threshold in Eqn. (21) as follows [50],

$$T^{(1)}_{SATD^*} = W^2 \cdot T, \tag{22}$$

where W denotes the size of the TB. Subsequently, by modeling the prediction residuals with the Laplacian distribution, another threshold for G-AZB detection is obtained as follows [50],

$$T^{(2)}_{SATD^*} = \frac{W^3 \cdot T}{3\sqrt{2}\,\mu\gamma}, \tag{23}$$

where

$$\gamma = \max_{i,j \in [0, W-1]} \sqrt{\sigma_{ij}}, \quad \mu = \frac{1}{2W^2} \sum_{i=0}^{W-1} \sum_{j=0}^{W-1} \sqrt{\sigma_{ij}}, \quad \sigma_{ij} = \left[ A H_\omega R A^T H_\omega^T \right]_{i,i} \left[ A H_\omega R A^T H_\omega^T \right]_{j,j}. \tag{24}$$

Herein, R is a relevance matrix, A denotes the sparse matrix, and H_ω represents the core of the Walsh-ordered Hadamard transform defined in [53]. As such, the G-AZB threshold with respect to SATD* can be obtained as follows [50],

$$T_{SATD^*} = \min\left\{ T^{(1)}_{SATD^*},\; T^{(2)}_{SATD^*} \right\}. \tag{25}$$

In [51], the authors proposed to modify the Hadamard-transform-based all-zero block detection [54] to better adapt to the transform and quantization characteristics of HEVC. In particular, the G-AZB detection thresholds for different TB sizes are defined as follows [51],

$$T_H = m \cdot T, \tag{26}$$

where

$$m = \begin{cases} 8, & W = 32 \\ 2, & W = 16 \\ 1/2, & W = 8 \\ 1/8, & W = 4. \end{cases} \tag{27}$$

Since the Hadamard transform is performed by default for 4 × 4 and 8 × 8 TUs on the HM platform, employing Hadamard-based AZB detection for smaller TUs introduces no additional computational cost. However, for larger TUs such as 16 × 16 and 32 × 32, the computational burden of the Hadamard transform is much heavier. Therefore, the uniformity of the Hadamard coefficients

145164 VOLUME 8, 2020

Page 8: Low Complexity Quantization in High Efficiency Video Coding

M. Wang et al.: Low Complexity Quantization in HEVC

within larger TUs is evaluated. First, the 16 × 16 and 32 × 32 TUs are divided into 8 × 8 sub-blocks, on which the 8 × 8 Hadamard transform is conducted. Subsequently, the top-left DC coefficient within each 8 × 8 sub-block is extracted, forming 2 × 2 and 4 × 4 DC blocks. The Hadamard transform is then performed again on the DC blocks, termed the DC Hadamard transform. It should be noted that the DC coefficient within each 8 × 8 sub-block can be efficiently obtained by summing the residuals in the spatial domain. If all the coefficients are smaller than T_H, the TB is determined to be an AZB.
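The shortcut of obtaining the DC coefficients by summing spatial residuals follows from the all-ones first row of the (unnormalized) Hadamard matrix; a small self-contained check:

```python
def hadamard(n):
    """Unnormalized Sylvester-Hadamard matrix of order n (power of two), as nested lists."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# A deterministic 8x8 "residual" sub-block (arbitrary integers for illustration).
res = [[3 * i - 2 * j + (i * j) % 5 - 4 for j in range(8)] for i in range(8)]

h8 = hadamard(8)
ht = [list(col) for col in zip(*h8)]        # transpose of H
coeffs = matmul(matmul(h8, res), ht)        # 2-D Hadamard transform H R H^T

# The top-left (DC) coefficient equals the sum of all spatial residuals,
# so the DC blocks can be formed without running the full transform.
assert coeffs[0][0] == sum(sum(row) for row in res)
```

This is why extracting the DC of each 8 × 8 sub-block and re-transforming only the small DC block is so much cheaper than applying the full 16 × 16 or 32 × 32 Hadamard transform.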

Based on the G-AZB detection threshold in [51], Fan et al. [52] additionally introduced a lower and a higher SAD threshold to classify all-zero and non-all-zero blocks. The lower threshold T^{low}_{SAD} is set as (d/100) · T, where d denotes the TB depth. If the SAD of the TB is smaller than the lower threshold, it is determined to be a G-AZB. Moreover, the higher threshold T^{high}_{SAD} is defined as follows [52],

$$T^{high}_{SAD} = \sqrt{\varepsilon \cdot \sum_{i=0}^{W-1} \sum_{j=0}^{W-1} \left| C_{i,j} - C'_{i,j} \right|^2}, \tag{28}$$

where |C_{i,j} − C'_{i,j}| denotes the quantization error, which is bounded by the threshold T derived in Eqn. (21) for the given QP and TB size. ε is empirically set as follows [52],

$$\varepsilon = \begin{cases} 580/1024, & W = 32 \\ 148/256, & W = 16 \\ 37/64, & W = 8 \\ 17/16, & W = 4. \end{cases} \tag{29}$$
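A sketch of how the two SAD thresholds could partition blocks; the upper bound below is a worst-case simplification of Eqn. (28) that assumes every quantization error reaches its maximum T, and the three-way labels are illustrative names, not those of [52]:

```python
import math

# Epsilon table of Eqn. (29), keyed by TB size W.
EPSILON = {32: 580 / 1024, 16: 148 / 256, 8: 37 / 64, 4: 17 / 16}

def sad_thresholds(w, depth, t):
    """Lower/upper SAD thresholds in the spirit of [52]; the upper bound
    replaces the per-coefficient errors of Eqn. (28) by their maximum T."""
    t_low = depth / 100 * t
    t_high = math.sqrt(EPSILON[w] * (w * w) * t * t)  # = W * T * sqrt(eps)
    return t_low, t_high

def classify(sad, t_low, t_high):
    """Three-way decision (labels are illustrative)."""
    if sad < t_low:
        return "G-AZB"          # safely all-zero
    if sad > t_high:
        return "NZB"            # safely non-zero
    return "undetermined"       # needs further (e.g. RD-based) checking

t_low, t_high = sad_thresholds(8, depth=2, t=26.0)
assert classify(0.1, t_low, t_high) == "G-AZB"
assert classify(1e6, t_low, t_high) == "NZB"
assert t_low < t_high
```

Blocks falling between the two thresholds are exactly the ones that warrant the more expensive P-AZB analysis described next.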

The P-AZB detection is performed on the non-G-AZBs. In [51], the threshold for deducing a P-AZB is enlarged to twice T_H in Eqn. (26). Moreover, to prevent false determinations, the conditions for P-AZB detection are tightened. Essentially, the key to the P-AZB determination is whether the RD cost of the AZB is lower than that of the non-all-zero block (NZB). The associated RD costs can be formulated as follows,

$$J_{AZB} = D_{AZB} + \lambda \cdot R_{AZB}, \qquad J_{NZB} = D_{NZB} + \lambda \cdot R_{NZB}. \tag{30}$$

In particular, R_{AZB} denotes the coding bits of an AZB, which is approximated as 1. Moreover, D_{AZB} can be estimated as the squared sum of the residual coefficients r_{i,j} in the spatial domain. As such, J_{AZB} can be described as follows [52],

$$J_{AZB} = \sum_{i=0}^{W-1} \sum_{j=0}^{W-1} r_{i,j}^2 + \lambda. \tag{31}$$
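Eqn. (31) amounts to the spatial residual energy plus a single lambda term for the approximated one-bin rate; a direct transcription:

```python
def j_azb(residuals, lam):
    """RD cost of coding the block as all-zero, Eqn. (31): the distortion is
    the spatial-domain residual energy, and the rate is approximated as 1
    (hence a single lambda term)."""
    return sum(r * r for row in residuals for r in row) + lam

# Tiny worked example: residual energy 1 + 4 + 0 + 9 = 14, plus lambda = 10.
res = [[1, -2], [0, 3]]
assert j_azb(res, lam=10.0) == 24.0
```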

Regarding D_{NZB}, it can be formulated as follows [52],

$$D_{NZB} = \frac{\sum_{i=0}^{W-1} \sum_{j=0}^{W-1} \left| \mathit{offset} - dis_{i,j} \right|^2}{e \cdot M^2}, \tag{32}$$

where

$$dis_{i,j} = \left( C_{i,j} \cdot M + \mathit{offset} \right) \,\&\, \left[ (1 \ll Q_{sh}) - 1 \right], \tag{33}$$

$$e = \begin{cases} 580, & W = 32 \\ 148, & W = 16 \\ 37, & W = 8 \\ 17, & W = 4. \end{cases} \tag{34}$$

By exploiting the rate estimation scheme in [55], R_{NZB} in [52] is estimated with the self-information. Considering different TB sizes, the transform coefficients are modeled with the generalized Gaussian distribution.
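A sketch of Eqns. (32)–(34): dis_{i,j} is simply the low-order bits discarded by the quantization shift, and the parameter values used in the assertions are illustrative rather than taken from the HEVC tables:

```python
E_TABLE = {32: 580, 16: 148, 8: 37, 4: 17}  # Eqn. (34)

def quant_remainder(c, m, offset, q_sh):
    """dis_{i,j} of Eqn. (33): the bits discarded by the right shift in Eqn. (20)."""
    return (c * m + offset) & ((1 << q_sh) - 1)

def d_nzb(coeffs, w, m, offset, q_sh):
    """Distortion estimate of Eqn. (32) for a non-all-zero block."""
    num = sum((offset - quant_remainder(c, m, offset, q_sh)) ** 2
              for row in coeffs for c in row)
    return num / (E_TABLE[w] * m * m)

# Illustrative parameters (hypothetical, not from the HEVC specification).
M, OFFSET, Q_SH = 26214, (1 << 20) // 3, 20
# For a zero coefficient the remainder equals offset, so the estimate vanishes.
assert d_nzb([[0, 0], [0, 0]], 4, M, OFFSET, Q_SH) == 0.0
# The remainder is always within the shifted-out range.
assert 0 <= quant_remainder(7, M, OFFSET, Q_SH) < (1 << Q_SH)
```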

Regarding the P-AZB detection in [50], to better adapt to the characteristics of the larger TUs, the TB is divided into a high-frequency region and a low-frequency region according to the QP and the maximum allowed QP value QP_{max}. The size of the low-frequency region is derived as follows [50],

$$W_L = \max\left\{ \mathrm{round}\left( W \cdot \left( 1 - \frac{QP}{QP_{max}} \right) \right),\; 1 \right\}. \tag{35}$$

The P-AZB detection is specifically performed on the low-frequency region. In particular, the TB is regarded as a P-AZB if the maximum transform coefficient within the low-frequency region satisfies the following threshold [50],

$$\max_{i,j \in [0, W_L]} \left\{ |C(i,j)| \right\} < \frac{\xi \cdot 2^{Q_{sh}} - \mathit{offset}}{M}, \tag{36}$$

where ξ is empirically set to 2.2. To further capture the P-AZBs incurred by RDOQ, RD comparisons between J_{AZB} and J_{NZB} are conducted. In particular, J_{AZB} [50] is derived as follows,

$$J_{AZB} = \frac{\left( \mu \cdot SATD^* \right)^2}{e} + \lambda, \tag{37}$$

where e is defined in Eqn. (34). Moreover, D_{NZB} can be obtained by off-line training of the quantization distortions |C_{i,j} − C'_{i,j}|². R_{NZB} is estimated with a linear combination of the SATD* and the number of Hadamard transform coefficients that are larger than a threshold. Consequently, two determination ranges are derived to forecast the P-AZB.
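The low-frequency P-AZB check of Eqns. (35) and (36) can be sketched as follows, taking Eqn. (35) in the form W_L = max{round(W · (1 − QP/QP_max)), 1}; the parameter values in the assertions are illustrative:

```python
def low_freq_size(w, qp, qp_max):
    """W_L of Eqn. (35): the inspected low-frequency corner shrinks as QP grows."""
    return max(round(w * (1 - qp / qp_max)), 1)

def is_p_azb_candidate(coeffs, w, qp, qp_max, m, offset, q_sh, xi=2.2):
    """Low-frequency check of Eqn. (36): every coefficient in the W_L x W_L
    top-left corner must stay below the scaled threshold."""
    wl = low_freq_size(w, qp, qp_max)
    thr = (xi * 2 ** q_sh - offset) / m
    return all(abs(coeffs[i][j]) < thr for i in range(wl) for j in range(wl))

# At QP = 22 with QP_max = 51, a 32x32 TB keeps an 18x18 low-frequency region.
assert low_freq_size(32, 22, 51) == 18
# At the maximum QP, the region collapses to a single DC coefficient.
assert low_freq_size(32, 51, 51) == 1
# An all-zero coefficient block trivially passes the candidate check.
zeros = [[0] * 32 for _ in range(32)]
assert is_p_azb_candidate(zeros, 32, 22, 51, 26214, (1 << 20) // 3, 20)
```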

V. RATE-DISTORTION PERFORMANCE AND CODING COMPLEXITY

The performance of the existing fast RDOQ and AZB detection algorithms in terms of RD performance and encoding complexity is presented and discussed in this section. In particular, the RD performance is measured by the BD-Rate [30] of the luma component, where a positive BD-Rate indicates a loss of compression performance. For the fast RDOQ algorithms, the quantization time saving ΔT_Q is used to evaluate the computational efficiency,

$$\Delta T_Q = \frac{T^{anc}_Q - T^{pro}_Q}{T^{anc}_Q} \times 100\%, \tag{38}$$


TABLE 2. Coding performance and quantization time savings of fast RDOQ methods under AI configuration.

TABLE 3. Coding performance and quantization time savings of fast RDOQ methods under RA configuration.

where T^{anc}_Q and T^{pro}_Q denote the quantization time of the anchor encoder and that of the encoder with the proposed fast RDOQ scheme, respectively. Moreover, for the AZB detection algorithms, the time saving ΔT_{TQ} over the transform, quantization, inverse quantization, and inverse transform is measured as follows,

$$\Delta T_{TQ} = \frac{T^{anc}_{TQ} - T^{pro}_{TQ}}{T^{anc}_{TQ}} \times 100\%, \tag{39}$$

where T^{pro}_{TQ} and T^{anc}_{TQ} denote the time consumed by the transform, quantization, and associated inverse processes with and without AZB detection, respectively. The fast RDOQ schemes and AZB detection schemes are all implemented on the HEVC test platform (different versions), wherein the fast RDOQ schemes are evaluated under the all intra (AI), random access (RA), and low delay (LD) configurations. Since AZBs are rare under the AI configuration, the performance of


TABLE 4. Coding performance and quantization time savings of fast RDOQ methods under LD configuration.

AZB detection algorithms is mainly validated under the RA and LD configurations.
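The time-saving measures of Eqns. (38) and (39) are plain relative reductions; for instance, an anchor quantization stage of 100 time units reduced to 30 corresponds to a 70% saving:

```python
def time_saving(t_anchor, t_proposed):
    """Relative time saving of Eqns. (38)/(39), in percent."""
    return (t_anchor - t_proposed) / t_anchor * 100.0

assert time_saving(100.0, 30.0) == 70.0   # 70% of the anchor time is saved
```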

The performance of several fast RDOQ schemes, including Cui et al. [42], He et al. [41], Lee et al. [33], Xu et al. [29], Wang et al. [40], Zhang et al. [32], and Wang et al. [31], is presented in Table 2, Table 3, and Table 4. The versions of the test platforms, as well as the performance on individual sequences, are all presented. The hybrid Laplacian-based fast RDOQ scheme [42] strikes an excellent trade-off between coding performance and computational complexity, where around 70% of the quantization time can be saved compared to the conventional RDOQ, with only a 0.3% BD-Rate increase. Moreover, He et al.'s method [41] achieves competitive acceleration, whereas the coding performance loss is slightly higher. The performance of the RDOQ bypass method combined with the ΔJ estimation model proposed by Lee et al. [33] is relatively conservative, as it introduces negligible performance loss with moderate speedup. The statistics-based fast RDOQ schemes proposed in [29], [32], [40], and [31] bring 30% to 40% quantization time reductions.

Furthermore, the performance of the AZB detection schemes proposed by Cui et al. [50], Fan et al. [52], and Lee et al. [53] is tabulated in Table 5. Fan et al.'s method achieves the highest time savings (over 40%) in terms of transform and quantization, while the BD-Rate loss is around 0.5%. In addition, Cui et al.'s method [50] predicts the AZBs, including G-AZBs and P-AZBs, more precisely, such that more than 20% time savings are achieved with only 0.06% performance loss. It is observed that Cui et al.'s method [50] performs extremely well on the 4K sequences in Class A1, where even performance gains can be noticed with 20% to 26% time savings, providing enlightenment for the optimization of 4K video application scenarios.

VI. DISCUSSIONS

Fast RDOQ and all-zero block detection have interesting connections. Both aim to alleviate the complicated rate-distortion cost calculations in deducing the optimal coefficient level. However, fast RDOQ focuses on inference at the coefficient level, while all-zero block detection works at the block level. As such, they can seamlessly work together towards low complexity quantization optimization. In particular, traditional all-zero block detection methods rely on thresholds derived from statistical data combined with the traits of the hard-decision quantization. With the cooperation of RDOQ, the RD models can be refined, as the level with the minimal RD cost is finally selected. As such, a collaborative design based on fast RDOQ and all-zero block detection is beneficial, since eliminating the all-zero blocks before conducting the fast RDOQ yields reasonable optimization results in reducing the overall computational complexity.

In VVC, a trellis-based quantization scheme is adopted, wherein two quantizers are employed synchronously, cooperating with four transition states under the control of the parity of the quantization levels. Meanwhile, the number of quantization candidates is doubled, and the associated RD costs are arranged in a trellis graph. As such, the computational complexity of trellis-based quantization is elevated compared with RDOQ. Inspired by the existing


TABLE 5. Coding performance and quantization time savings of AZB detection methods under RA and LD configurations.

works on fast RDOQ and all-zero block detection for HEVC, a promising low complexity quantization scheme for VVC should enjoy a series of desired properties. First, the coefficient speculation should be of extremely low complexity and friendly for real applications. Second, the low complexity quantization optimization algorithm for VVC should well accommodate the trellis structure. Third, since the entropy coding scheme for residuals in VVC has been advanced accordingly to better adapt to the quantization behaviors, the RD model should be re-established. Last but not least, the multiple transform selection strategy introduces new types of transform cores, which should also be taken into account in the RD modeling.

With the surge of deep learning, quantization has also found applications in other scenarios such as feature compression [56]–[58], end-to-end compression [59], [60], and deep neural network compression [61]–[63]. Though the majority of state-of-the-art quantization methods in these tasks still rely on hard-decision scalar quantization, it is highly expected that soft-decision and vector quantization techniques can further improve the performance, along with the advanced quality assessment techniques developed for the corresponding modalities. To enable these techniques in real applications, similar methodologies motivated by the low complexity quantization in video coding are also indispensable. Compared to existing methods, better rate-distortion performance with marginally higher complexity is expected to be achieved. Moreover, such low complexity quantization design philosophies in video coding could have interesting connections with recently developed compression techniques such as network pruning. A learning-based quantization that directly maps the to-be-quantized signals to the optimal representation is also highly desirable in these compression tasks.

VII. CONCLUSIONS

Quantization plays a critical role in balancing rate and distortion in the current video coding standards. In this paper, we review the low complexity quantization techniques in HEVC and envision the future development of quantization optimization in the VVC standard. In essence, fast quantization techniques rely on deriving the best quantized index without going through the tedious transform, quantization, and entropy coding process. In the future, it is also anticipated that the quantized coefficients can be intelligently determined in a low complexity and high efficiency way based on recent advances in machine learning, leading to better performance in terms of the rate-distortion-complexity cost.

REFERENCES

[1] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, ‘‘Overview of the H.264/AVC video coding standard,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.

[2] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, ‘‘Overview of the high efficiency video coding (HEVC) standard,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012.

[3] I.-K. Kim, J. Min, T. Lee, W.-J. Han, and J. Park, ‘‘Block partitioning structure in the HEVC standard,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1697–1706, Dec. 2012.

[4] J. Lainema, F. Bossen, W.-J. Han, J. Min, and K. Ugur, ‘‘Intra coding of the HEVC standard,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1792–1801, Dec. 2012.


[5] P. Helle, S. Oudin, B. Bross, D. Marpe, M. O. Bici, K. Ugur, J. Jung, G. Clare, and T. Wiegand, ‘‘Block merging for quadtree-based partitioning in HEVC,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1720–1731, Dec. 2012.

[6] E.-H. Yang and X. Yu, ‘‘Soft decision quantization for H.264 with main profile compatibility,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 1, pp. 122–127, Jan. 2009.

[7] M. Karczewicz, Y. Ye, and I. Cheong, Rate Distortion Optimized Quantization, document VCEG-AH21, Jan. 2008.

[8] V. Sze and M. Budagavi, ‘‘High throughput CABAC entropy coding in HEVC,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1778–1791, Dec. 2012.

[9] B. Bross, J. Chen, and S. Liu, Versatile Video Coding (Draft 4), document JVET-M1001, 2019.

[10] X. Li, H.-C. Chuang, J. Chen, M. Karczewicz, L. Zhang, X. Zhao, and A. Said, Multi-Type-Tree, document JVET-D0117, Joint Video Exploration Team, 2016.

[11] S. De-Luxan-Hernandez, V. George, J. Ma, T. Nguyen, H. Schwarz, D. Marpe, and T. Wiegand, ‘‘An intra subpartition coding mode for VVC,’’ in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2019, pp. 1203–1207.

[12] L. Zhao, X. Zhao, S. Liu, X. Li, J. Lainema, G. Rath, F. Urban, and F. Racape, ‘‘Wide angular intra prediction for versatile video coding,’’ in Proc. Data Compress. Conf. (DCC), Mar. 2019, pp. 53–62.

[13] N. Choi, Y. Piao, K. Choi, and C. Kim, CE 3.3 Related: Intra 67 Modes Coding With 3 MPM, document JVET-K0529, Joint Video Exploration Team, Jul. 2018.

[14] H. Liu, L. Zhang, K. Zhang, J. Xu, Y. Wang, J. Luo, and Y. He, ‘‘Adaptive motion vector resolution for affine-inter mode coding,’’ in Proc. Picture Coding Symp. (PCS), Nov. 2019, pp. 1–4.

[15] K. Zhang, Y.-W. Chen, L. Zhang, W.-J. Chien, and M. Karczewicz, ‘‘An improved framework of affine motion compensation in video coding,’’ IEEE Trans. Image Process., vol. 28, no. 3, pp. 1456–1469, Mar. 2019.

[16] L. Zhang, K. Zhang, H. Liu, H. C. Chuang, Y. Wang, J. Xu, P. Zhao, and D. Hong, ‘‘History-based motion vector prediction in versatile video coding,’’ in Proc. Data Compress. Conf. (DCC), Mar. 2019, pp. 43–52.

[17] X. Zhao, J. Chen, M. Karczewicz, A. Said, and V. Seregin, ‘‘Joint separable and non-separable transforms for next-generation video coding,’’ IEEE Trans. Image Process., vol. 27, no. 5, pp. 2514–2525, May 2018.

[18] M. Wang, J. Li, L. Zhang, K. Zhang, H. Liu, S. Wang, S. Kwong, and S. Ma, ‘‘Extended quad-tree partitioning for future video coding,’’ in Proc. Data Compress. Conf. (DCC), Mar. 2019, pp. 300–309.

[19] M. Wang, J. Li, L. Zhang, K. Zhang, H. Liu, S. Wang, S. Kwong, and S. Ma, ‘‘Extended coding unit partitioning for future video coding,’’ IEEE Trans. Image Process., vol. 29, pp. 2931–2946, 2020.

[20] J. Li, M. Wang, L. Zhang, K. Zhang, H. Liu, S. Wang, S. Ma, and W. Gao, ‘‘History-based motion vector prediction for future video coding,’’ in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2019, pp. 67–72.

[21] K. Zhang, J. Chen, L. Zhang, X. Li, and M. Karczewicz, ‘‘Enhanced cross-component linear model for chroma intra-prediction in video coding,’’ IEEE Trans. Image Process., vol. 27, no. 8, pp. 3983–3997, Aug. 2018.

[22] J. Li, M. Wang, L. Zhang, K. Zhang, S. Wang, S. Wang, S. Ma, and W. Gao, ‘‘Sub-sampled cross-component prediction for chroma component coding,’’ in Proc. Data Compress. Conf. (DCC), Mar. 2020, pp. 203–212.

[23] H. Schwarz, T. Nguyen, D. Marpe, and T. Wiegand, ‘‘Hybrid video coding with trellis-coded quantization,’’ in Proc. Data Compress. Conf. (DCC), Mar. 2019, pp. 182–191.

[24] G. J. Sullivan, Adaptive Quantization Encoding Technique Using an Equal Expected-Value Rule, document JVT-N011, Joint Video Team ISO/IEC ITU-T, Jan. 2005.

[25] H. B. Yin, E.-H. Yang, X. Yu, and Z. Xia, ‘‘Fast soft decision quantization with adaptive preselection and dynamic trellis graph,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 8, pp. 1362–1375, Aug. 2015.

[26] J. Wen, M. Luttrell, and J. Villasenor, ‘‘Trellis-based R-D optimal quantization in H.263+,’’ IEEE Trans. Image Process., vol. 9, no. 8, pp. 1431–1434, Oct. 2000.

[27] W. Gao and S. Ma, ‘‘An overview of AVS2 standard,’’ Adv. Video Coding Syst., vol. 22, pp. 35–49, Jan. 2014.

[28] T. Nguyen, P. Helle, M. Winken, B. Bross, D. Marpe, H. Schwarz, and T. Wiegand, ‘‘Transform coding techniques in HEVC,’’ IEEE J. Sel. Topics Signal Process., vol. 7, no. 6, pp. 978–989, Dec. 2013.

[29] M. Xu, T. Nguyen Canh, and B. Jeon, ‘‘Simplified level estimation for rate-distortion optimized quantization of HEVC,’’ IEEE Trans. Broadcast., vol. 66, no. 1, pp. 88–99, Mar. 2020.

[30] G. Bjøntegaard, Calculation of Average PSNR Differences Between RD-Curves, document ITU-T SG 16 Q.6 VCEG-M33, 2001.

[31] M. Wang, X. Xie, H. Fan, S. Wang, J. Li, S. Dong, G. Xiang, and H. Jia, ‘‘Fast rate distortion optimized quantization method for HEVC,’’ in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2017, pp. 1–4.

[32] Y. Zhang, R. Tian, J. Liu, and N. Wang, ‘‘Fast rate distortion optimized quantization for HEVC,’’ in Proc. Vis. Commun. Image Process. (VCIP), Dec. 2015, pp. 1–4.

[33] H. Lee, S. Yang, Y. Park, and B. Jeon, ‘‘Fast quantization method with simplified rate–distortion optimized quantization for an HEVC encoder,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 26, no. 1, pp. 107–116, Jan. 2016.

[34] M. Xu, T. N. Canh, and B. Jeon, ‘‘Simplified level decision in rate-distortion optimized quantization,’’ in Proc. IEEE Int. Symp. Broadband Multimedia Syst. Broadcast. (BMSB), Jun. 2019, pp. 1–4.

[35] M. Xu, T. N. Canh, and B. Jeon, ‘‘Simplified rate-distortion optimized quantization for HEVC,’’ in Proc. IEEE Int. Symp. Broadband Multimedia Syst. Broadcast. (BMSB), Jun. 2018, pp. 1–6.

[36] S. Li, C. Zhu, Y. Gao, Y. Zhou, F. Dufaux, and M.-T. Sun, ‘‘Lagrangian multiplier adaptation for rate-distortion optimization with inter-frame dependency,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 26, no. 1, pp. 117–129, Jan. 2016.

[37] X. Li, N. Oertel, A. Hutter, and A. Kaup, ‘‘Laplace distribution based Lagrangian rate distortion optimization for hybrid video coding,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 2, pp. 193–205, Feb. 2009.

[38] S. Ma, J. Si, and S. Wang, ‘‘A study on the rate distortion modeling for high efficiency video coding,’’ in Proc. 19th IEEE Int. Conf. Image Process., Sep. 2012, pp. 181–184.

[39] S. Wang, A. Rehman, Z. Wang, S. Ma, and W. Gao, ‘‘SSIM-motivated rate-distortion optimization for video coding,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 4, pp. 516–529, Apr. 2012.

[40] J. Wang, H. Yin, Z. Gao, and X. Zhang, ‘‘Improved rate distortion optimized quantization for HEVC with adaptive thresholding,’’ in Proc. IEEE Int. Symp. Broadband Multimedia Syst. Broadcast. (BMSB), Jun. 2016, pp. 1–4.

[41] J. He, F. Yang, and Y. Zhou, ‘‘High-speed implementation of rate-distortion optimised quantisation for H.265/HEVC,’’ IET Image Process., vol. 9, no. 8, pp. 652–661, Aug. 2015.

[42] J. Cui, S. Wang, S. Wang, X. Zhang, S. Ma, and W. Gao, ‘‘Hybrid Laplace distribution-based low complexity rate-distortion optimized quantization,’’ IEEE Trans. Image Process., vol. 26, no. 8, pp. 3802–3816, Aug. 2017.

[43] (2018). HM HEVC Reference Software. [Online]. Available: https://vcgit.hhi.fraunhofer.de/jct-vc/HM/-/tree/HM-16.20

[44] (2014). A Free H.265/MPEG-H HEVC Encoder. [Online]. Available: http://www.videolan.org/developers/x265.html

[45] H. Igarashi, F. Takano, T. Takenaka, H. Inoue, and T. Moriyoshi, ‘‘Parallel rate distortion optimized quantization for 4K real-time GPU-based HEVC encoder,’’ in Proc. IEEE Vis. Commun. Image Process. (VCIP), Dec. 2018, pp. 1–4.

[46] H. Wei, W. Zhou, X. Zhou, and Z. Duan, ‘‘An efficient all zero block detection algorithm based on frequency characteristics of DCT in HEVC,’’ in Proc. Vis. Commun. Image Process. (VCIP), Dec. 2015, pp. 1–4.

[47] M. Wang, X. Xie, J. Li, H. Jia, and W. Gao, ‘‘Fast rate distortion optimized quantization method based on early detection of zero block for HEVC,’’ in Proc. IEEE 3rd Int. Conf. Multimedia Big Data (BigMM), Apr. 2017, pp. 90–93.

[48] H. Wang and S. Kwong, ‘‘Hybrid model to detect zero quantized DCT coefficients in H.264,’’ IEEE Trans. Multimedia, vol. 9, no. 4, pp. 728–735, Jun. 2007.

[49] H. Wang and S. Kwong, ‘‘Prediction of zero quantized DCT coefficients in H.264/AVC using Hadamard transformed information,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 4, pp. 510–515, Apr. 2008.

[50] J. Cui, R. Xiong, X. Zhang, S. Wang, S. Wang, S. Ma, and W. Gao, ‘‘Hybrid all zero soft quantized block detection for HEVC,’’ IEEE Trans. Image Process., vol. 27, no. 10, pp. 4987–5001, Oct. 2018.

[51] K. Lee, H.-J. Lee, J. Kim, and Y. Choi, ‘‘A novel algorithm for zero block detection in high efficiency video coding,’’ IEEE J. Sel. Topics Signal Process., vol. 7, no. 6, pp. 1124–1134, Dec. 2013.

[52] H. Fan, R. Wang, L. Ding, X. Xie, H. Jia, and W. Gao, ‘‘Hybrid zero block detection for high efficiency video coding,’’ IEEE Trans. Multimedia, vol. 18, no. 3, pp. 537–543, Mar. 2016.


[53] B. Lee, J. Jung, and M. Kim, ‘‘An all-zero block detection scheme for low-complexity HEVC encoders,’’ IEEE Trans. Multimedia, vol. 18, no. 7, pp. 1257–1268, Jul. 2016.

[54] Z. Liu, L. Li, Y. Song, S. Li, S. Goto, and T. Ikenaga, ‘‘Motion feature and Hadamard coefficient-based fast multiple reference frame motion estimation for H.264,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 5, pp. 620–632, May 2008.

[55] X. Zhao, J. Sun, S. Ma, and W. Gao, ‘‘Novel statistical modeling, analysis and implementation of rate-distortion estimation for H.264/AVC coders,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 5, pp. 647–660, May 2010.

[56] S. Wang, S. Wang, X. Zhang, S. Wang, S. Ma, and W. Gao, ‘‘Scalable facial image compression with deep feature reconstruction,’’ in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2019, pp. 2691–2695.

[57] S. Wang, S. Wang, W. Yang, X. Zhang, S. Wang, S. Ma, and W. Gao, ‘‘Towards analysis-friendly face representation with scalable feature and texture compression,’’ CoRR, vol. abs/2004.10043, pp. 1–7, Oct. 2020.

[58] S. Wang, W. Yang, and S. Wang, ‘‘End-to-end facial deep learning feature compression with teacher-student enhancement,’’ CoRR, vol. abs/2002.03627, pp. 1–5, May 2020.

[59] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, ‘‘Variational image compression with a scale hyperprior,’’ in Proc. 6th Int. Conf. Learn. Represent., Vancouver, BC, Canada, Apr. 2018, pp. 1–4.

[60] J. Ballé, V. Laparra, and E. P. Simoncelli, ‘‘End-to-end optimized image compression,’’ in Proc. 5th Int. Conf. Learn. Represent., Toulon, France, Apr. 2017, pp. 1–8.

[61] Z. Chen, L.-Y. Duan, S. Wang, Y. Lou, T. Huang, D. O. Wu, and W. Gao, ‘‘Toward knowledge as a service over networks: A deep learning model communication paradigm,’’ IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1349–1363, Jun. 2019.

[62] Y. Xu, Y. Wang, A. Zhou, W. Lin, and H. Xiong, ‘‘Deep neural network compression with single and multiple level quantization,’’ in Proc. 32nd AAAI Conf. Artif. Intell., 2018, pp. 4335–4342.

[63] Y. Gong, L. Liu, M. Yang, and L. D. Bourdev, ‘‘Compressing deep convolutional networks using vector quantization,’’ CoRR, vol. abs/1412.6115, pp. 1–8, Oct. 2014.

MENG WANG (Graduate Student Member, IEEE) received the B.S. degree (Hons.) in electronic information engineering from China Agricultural University, Beijing, China, in 2015, and the M.S. degree in computer application technology from Peking University, Beijing, in 2018. She is currently pursuing the Ph.D. degree with the Department of Computer Science, City University of Hong Kong, Hong Kong. She has been a Research Intern with Bytedance Inc., since 2017. Her research interests include data compression and image/video coding.

XIAOHAN FANG is currently pursuing the B.S. degree in computer science with the Dalian University of Technology.

SONGCHAO TAN received the B.S. and Ph.D. degrees in computer science and technology from the Dalian University of Technology, Dalian, China. He currently holds a postdoctoral position with Peking University. His research interests include video coding and quality assessment.

XINFENG ZHANG received the B.S. degree in computer science from the Hebei University of Technology, Tianjin, China, in 2007, and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 2014. He is currently an Assistant Professor with the School of Computer Science and Technology, University of Chinese Academy of Sciences. He has authored more than 100 refereed journal/conference papers. His research interests include video compression, image/video quality assessment, and image/video analysis. He received the Best Paper Award of the IEEE MULTIMEDIA, in 2018, the Best Paper Award at the 2017 Pacific-Rim Conference on Multimedia (PCM), and the Best Student Paper Award at the IEEE International Conference on Image Processing, in 2018.

LI ZHANG (Member, IEEE) received the B.S. degree in computer science from Dalian Maritime University, Dalian, China, in 2003, and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 2009.

From 2009 to 2011, she held a postdoctoral position at the Institute of Digital Media, Peking University, Beijing. From 2011 to 2018, she was a Senior Staff Engineer with the Multimedia Research and Development and Standards Group, Qualcomm, Inc., San Diego, CA, USA. She is currently the Lead of the Video Coding Standard Team, Bytedance Inc., San Diego. Her research interests include 2D/3D image/video coding, video processing, and transmission. She was a Software Coordinator for the Audio and Video Coding Standard (AVS) and the 3D extensions of High Efficiency Video Coding (HEVC). She has authored more than 350 standardization contributions, more than 100 granted U.S. patents, and more than 60 technical articles in related book chapters, journals, and proceedings in image/video coding and video processing. She has been an active contributor to Versatile Video Coding, Advanced AVS, the IEEE 1857, the 3D Video (3DV) coding extensions of H.264/AVC and HEVC, and the HEVC screen content coding extensions. During the development of those video coding standards, she has co-chaired several ad hoc groups and core experiments. She has been appointed as an Editor of AVS and the Main Editor of the Software Test Model for 3DV Standards.
