Learning adaptive interpolation kernels for fast single-image super resolution



Page 1: Learning adaptive interpolation kernels for fast single-image super resolution

SIViP
DOI 10.1007/s11760-014-0634-7

ORIGINAL PAPER

Learning adaptive interpolation kernels for fast single-image super resolution

Xiyuan Hu · Silong Peng · Wen-Liang Hwang

Received: 30 June 2013 / Revised: 30 December 2013 / Accepted: 17 March 2014
© Springer-Verlag London 2014

Abstract This paper presents a fast single-image super-resolution approach that involves learning multiple adaptive interpolation kernels. It is based on the assumptions that each high-resolution image patch can be sparsely represented by several simple image structures and that each structure can be assigned a suitable interpolation kernel. Our approach consists of the following steps. First, we cluster the training image patches into several classes and train a class-specific interpolation kernel for each class. Then, for each input low-resolution image patch, we select a few suitable kernels and combine them into the final interpolation kernel. Since the proposed approach is mainly based on simple linear algebra computations, its efficiency can be guaranteed. Experimental comparisons with state-of-the-art super-resolution reconstruction algorithms on simulated and real-life examples validate the performance of our proposed approach.

This work was supported by the National Natural Science Foundation of China (61032007, 61101219, 61201375) and the National High Technology R&D Program of China (863 Program) (Grant No. 2013AA014602).

X. Hu · S. Peng (B)
Institute of Automation, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
e-mail: [email protected]

X. Hu
e-mail: [email protected]

W.-L. Hwang
Institute of Information Science, Academia Sinica, Taipei 11529, Taiwan, ROC
e-mail: [email protected]

W.-L. Hwang
Department of Information Management, Kainan University, Taoyuan County 33857, Taiwan, ROC

Keywords Single-image super resolution · Dual dictionary learning · Sparse representation · Learning multiple interpolation kernels

1 Introduction

The image super-resolution reconstruction (SRR) problem has attracted a great deal of attention in recent decades because of its broad application in fields such as broadcast television and video surveillance. Approaches for recovering the high-resolution (HR) image from a single low-resolution (LR) image can be divided into two main categories: reconstruction-based approaches and learning-based methods [1]. Reconstruction-based approaches [2–8] treat SRR as an inverse problem built on the image degradation model. Since the information provided in a single input LR image is very limited, these approaches need to impose some strong priors or constraints on that ill-posed inverse problem.

To overcome the limitations of reconstruction-based algorithms, machine learning-based techniques have been proposed [9–15]. They reconstruct the HR image by adding details to the input LR image, derived from the known coherence between corresponding LR and HR image patches. This type of algorithm usually consists of two steps: (1) capturing the coherence from a training data set that includes both LR and HR image patches; and (2) predicting the details of the HR image through prediction methods such as Markov random fields [9,10] or locally linear embedding [16]. Nevertheless, learning to recognize local geometric structures directly in the LR and HR patch spaces, as described in [9,10], requires an enormous number of LR and HR patch pairs in the training data set. As a result, these methods incur very high computational cost and cannot meet real-time processing demands. To reduce the size of the

123

Page 2: Learning adaptive interpolation kernels for fast single-image super resolution


training set, Freedman and Fattal [14] proposed a real-time image and video upscaling method based on a local multi-scale self-similarity assumption. Although this assumption allows retaining sharp edges and corners in the HR image, it cannot deal well with textures, because textures violate that assumption. Instead of learning in the raw patch spaces directly, Yang et al. [11,12] proposed a sparse representation-based super-resolution approach that learns two overcomplete dictionaries from the training images to represent the LR and HR image patches simultaneously. The sparse representation-based SRR algorithm assumes that corresponding LR and HR image patches can be represented by the same sparse coefficient vector with respect to their respective overcomplete dictionaries. However, since the sparse representation-based optimization is also a nonlinear model in the reconstruction step, the computational burden is still very high.

To accelerate the sparse representation-based SRR method, Zhang et al. [17] proposed a dual learning-based approach that learns an overcomplete dictionary D and its dual matrix C simultaneously. In the reconstruction step, this algorithm can therefore simplify the nonlinear ℓ1-norm optimization to a matrix multiplication. Although the dual learning method is about 10 times faster than the original sparsity-based algorithm, its reconstruction accuracy is worse than that of the sparsity-based one. This is because the dual learning-based method cannot ensure that the product of the dictionary D and its dual matrix C is an identity matrix. We will set aside discussion of this method because it can be viewed as an interpolation model with a fixed kernel, which will be discussed later.

In this work, we propose a fast single-image SRR approach that learns multiple adaptive kernels to improve upon the dual learning-based algorithm. Similar to other learning-based approaches, we adopt a patch-by-patch processing style for the input image. Our basic idea is that an input LR image patch can be composed of several distinctive structures, and each structure should use the corresponding interpolation kernel to recover the HR image patch. Thus, for each LR image patch, we build its own unique interpolation kernel via a combination of the kernels corresponding to each structure. For ease of presentation, we use bold upper case, e.g., A, to represent matrices, and bold lower case, e.g., a, to represent vectors. More precisely, we first cluster the training image patches into M classes according to the trained HR dictionary Dh, and for each class Pi, we train its individual interpolation kernel Ki. For a given LR image patch xl, we assume that it can be sparsely represented by several simple structures. Then, we use the interpolation kernels from the first N (N ≪ M) relevant classes of that LR patch to construct the interpolation kernel as K = ∑_{i=1}^{N} wi Ki (with wi > 0), where wi and Ki denote the relevance degree and interpolation kernel of the ith class, respectively. Finally, the estimated HR image patch is derived by xh = K xl. Since the ith interpolation kernel Ki reflects the correlation between the LR and HR image patches in the ith class, our approach can be viewed as an adaptive interpolation technique in which each LR image patch has its own interpolation kernel built from the trained kernels Ki. To select the relevant classes, we adopt the PADDLE algorithm [18] to compute the relevance matrix C, which is similar to the method in [17]. Then, after applying the matrix C to the LR image patch xl, we choose only the first several significant coefficients to generate the interpolation kernel, which reflects the most significant structures of the LR image patch. Since our approach reconstructs the HR image by directly applying adaptive interpolation kernels to the LR image patches, it is much faster than most learning-based algorithms and can achieve more realistic textures.
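The per-patch procedure just described can be sketched in a few lines of linear algebra. The following Python fragment is a minimal illustration under assumed toy shapes and data, not the authors' implementation; the function name reconstruct_patch, the random relevance matrix, and the identity sub-kernels are all hypothetical.

```python
import numpy as np

def reconstruct_patch(x_l, C, kernels, n_top=4):
    # Relevance coefficients, one per class: alpha = C x_l.
    alpha = C @ x_l
    # Keep only the N most relevant classes (largest |alpha_i|).
    top = np.argsort(-np.abs(alpha))[:n_top]
    w = np.abs(alpha[top])
    w = w / w.sum()                       # weights w_i > 0, summing to 1
    # Blend the class kernels into one adaptive interpolation kernel.
    K = sum(wi * kernels[i] for wi, i in zip(w, top))
    return K @ x_l                        # estimated HR patch x_h = K x_l

# Toy example: 3 classes, 4-pixel patches, identity sub-kernels.
rng = np.random.default_rng(0)
C = rng.standard_normal((3, 4))           # stand-in relevance matrix
kernels = [np.eye(4) for _ in range(3)]
x_l = rng.standard_normal(4)
x_h = reconstruct_patch(x_l, C, kernels, n_top=2)
```

With identity sub-kernels the blended kernel is again the identity, so x_h equals x_l; with trained class kernels, K adapts to the structures present in each patch.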

The remainder of this paper is organized as follows. In Sect. 2, we review the sparse representation-based SRR algorithm and its efficient implementation. In Sect. 3, we present our multiple-interpolation-kernel learning procedure, and in Sect. 4, we propose our fast single-image SRR algorithm. In Sect. 5, we report experiments using several SRR algorithms on simulated and real-life images and discuss the results. Finally, Sect. 6 contains some concluding remarks.

2 Review of dual learning-based SRR method

Inspired by the assumption that the HR and LR image patches share the same underlying sparse representation, Yang et al. proposed a sparse representation-based SRR method [12]. In this approach, given an HR and LR dictionary pair {Dh, Dl} and an LR image X, the HR image Y can be reconstructed in a patch-by-patch processing style in which each HR image patch is derived by yi = Dh αi, where yi and αi are the ith patch of the HR image Y and its coefficient vector, respectively. The coefficients of the ith HR image patch, αi, can be derived by solving the following equation

αi = arg min_{αi} ‖Dl αi − xi‖²_F + λ ‖αi‖_1 ,   (1)

where xi is the ith patch of the corresponding LR image. To retain the coherence of the HR and LR dictionaries {Dh, Dl}, Yang et al. [12] proposed a joint dictionary learning strategy by optimizing the following equation

{Dh, Dl, α} = arg min_{Dh, Dl, α} ‖X − D α‖²_F + λ ‖α‖_1 ,   (2)

where X = [ (1/√M) Xh^T, (1/√N) Xl^T ]^T consists of HR and LR image patch pairs extracted from the training database, and


Page 3: Learning adaptive interpolation kernels for fast single-image super resolution


D = [ (1/√M) Dh^T, (1/√N) Dl^T ]^T consists of HR and LR dictionaries.

The sparse representation-based approach can achieve

better results than the previously proposed raw-patch learning-based SRR algorithms [10,11]. However, in the reconstruction procedure of the sparse representation-based algorithm, for each input LR image patch, we need to solve a nonlinear optimization problem as in Eq. (1), which makes this procedure very time-consuming. To make it faster, Zhang et al. [17] applied the dual learning method. The basic idea of dual learning is to find a dual matrix of the dictionary Dl to avoid solving the nonlinear optimization problem in Eq. (1). For an overcomplete dictionary Dl ∈ R^{n×K} (K > n), its dual matrix Cl ∈ R^{K×n} is a matrix that satisfies Cl Dl = I, where I denotes the identity matrix. Then, given the dictionary Dl and its dual matrix Cl, the solution αi of Eq. (1) can be derived directly by αi = Cl xi. Consequently, for an input LR image patch xi, its recovered HR image patch yi can be derived via

yi = Dh αi = Dh (Cl xi) .   (3)

Therefore, the dual learning-based algorithm [17] is much faster (about 10 times) than the sparse representation-based algorithm. To find a dual matrix of the dictionary Dl, Zhang et al. modified the PADDLE algorithm [18] to solve the following optimization model

{Dh, Dl, Cl, Z} = arg min_{Dh, Dl, Cl, Z} ‖Xc − Dc Z‖²_F + η ‖Z − Cl Xl‖²_F + λ ‖Z‖_1 ,   (4)

where Xc = [ (1/√M) Xh^T, (1/√N) Xl^T ]^T consists of HR and LR image patches, Dc = [ (1/√M) Dh^T, (1/√N) Dl^T ]^T consists of HR and LR dictionaries, and Cl is the dual matrix of Dl. By solving the optimization model in the above equation, the coupled dictionary pair {Dh, Dl} and the dual matrix of Dl, denoted as Cl, can be acquired.

However, in practice, the optimization model in Eq. (4) cannot ensure that Cl Dl is precisely an identity matrix, which means that the coefficients αi derived by Cl xi are only an approximation of the solution of Eq. (1). Furthermore, as in Eq. (3), if we denote K = Dh Cl, the recovered HR image patch can be formulated as yi = K xi. That is, in the dual learning-based SRR approach [17], the matrix K is fixed after the training process of Eq. (4), after which the HR image patches are recovered by interpolating the LR image patches with a fixed interpolation kernel K. But for real-life images, since the structures in every image patch are quite different, applying a fixed interpolation kernel to interpolate all the LR image patches seems infeasible.
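The fixed-kernel view can be made concrete with a short sketch. The shapes and random matrices below are illustrative stand-ins for trained dictionaries, chosen only to show that Eq. (3) collapses every patch reconstruction into one fixed matrix K = Dh Cl:

```python
import numpy as np

n, n_atoms = 16, 32                      # patch dimension n, dictionary size K > n
rng = np.random.default_rng(1)
Dh = rng.standard_normal((n, n_atoms))   # stand-in HR dictionary
Cl = rng.standard_normal((n_atoms, n))   # stand-in dual matrix of Dl

K_fixed = Dh @ Cl                        # one n-by-n kernel for all patches
x = rng.standard_normal(n)               # an LR patch
y = K_fixed @ x                          # recovered HR patch, y = Dh (Cl x)
```

Because K_fixed does not depend on x, every LR patch is interpolated identically, which is exactly the limitation addressed in the next section.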

3 Dictionary and interpolation kernel learning

To overcome the disadvantage of the dual learning-based SRR approach, which uses a fixed interpolation kernel for all the LR image patches, we synthesize an individualized interpolation kernel for each LR image patch. Here, we focus on the following two topics: (1) clustering the training database of LR and HR image patches into several classes; and (2) learning the interpolation kernel of each class. In the next section, we will discuss the synthesis mechanism of the interpolation kernels and fast image SRR.

Given the sampled image patch pairs P = {Xh, Xl}, where Xh = {x_h^(1), x_h^(2), ..., x_h^(n)} and Xl = {x_l^(1), x_l^(2), ..., x_l^(n)} denote the set of sampled HR image patches and its corresponding set of LR image patches, our goal is to cluster the image patch pairs P into subsets P1, P2, ..., PM, where the image patches in the same subset have similar simple structures. Because the set of training image patch pairs is extremely large (usually more than 100,000 patch pairs, with up to 30–50 dimensions), directly applying the k-nearest neighbor (k-NN) classification algorithm to such a training set would be infeasible [19]. However, according to sparse representation theory, natural image patches can be sparsely represented by an overcomplete dictionary whose atoms reflect simple structures. Hence, we transform the image clustering problem into a dictionary learning process. Once the HR dictionary has been derived, we use each of its atoms as the center of an image patch pair subset and apply the k-NN classification algorithm to the whole image training set.

Traditionally, we can use Eq. (2) to train a coupled HR and LR dictionary (Dh and Dl). Then, for an input LR image patch xl, the sparsest representation coefficients α used to recover the HR image patch xh can be derived by solving the following optimization problem

min_α ‖α‖_0  s.t.  ‖Dl α − xl‖²_2 ≤ ε .   (5)

Although the NP-hard optimization problem in Eq. (5) can be converted to an equivalent ℓ1-norm minimization problem [20,21], solving such a nonlinear optimization problem is very time-consuming. Thus, Zhang et al. [17] proposed an efficient way to solve the nonlinear optimization problem in Eq. (1) by training the dual matrix of the LR dictionary Dl (as shown in Eq. (4)). Here, however, we formulate our dictionary training process by optimizing the following problem

{Dh, C, Z} = arg min_{Dh, C, Z} ‖Xh − Dh Z‖²_F + η ‖Z − C Xl‖²_F + λ ‖Z‖_1 ,   (6)

where Dh, C, Z, Xh, and Xl denote the HR dictionary, the relevance matrix, the coefficients of each LR patch, the set of


Page 4: Learning adaptive interpolation kernels for fast single-image super resolution


HR image patches, and the set of LR image patches, respectively.

Although our model seems similar to Eq. (4), they are different. In Eq. (4), the LR dictionary Dl has nothing to do with the image SRR, because the HR image patch is recovered only by the matrices Dh and Cl, as shown in Eq. (3). In addition, incorporating the optimization of Dl in Eq. (4) could introduce errors into the computation of the coefficients Z, which lowers the accuracy of the matrix C and the convergence rate. Our image super-resolution algorithm does not need the LR dictionary either. Therefore, in our model, we directly train a relevance matrix C and an HR dictionary Dh so that, when we multiply the matrix C by an LR image patch xl to derive the coefficients α, these coefficients can be used to approximate the HR image patch xh with the HR dictionary Dh. These two conditions correspond to the second and the first terms in Eq. (6), respectively.

After the HR dictionary Dh ∈ R^{M×N} has been obtained, we assign each image patch pair {x_l^(j), x_h^(j)} to the nearest atom d_h^(i) according to the Euclidean distance to x_h^(j), computed as ‖x_h^(j) − d_h^(i)‖_2. Since the image patch pairs are sampled from natural images, there exist some HR image patches that are not close to any atom in Dh. Therefore, a threshold T is needed to exclude those HR image patches. That is, an image patch pair (x_l^(j), x_h^(j)) in the ith patch pair set Pi should satisfy both conditions ‖x_h^(j) − d_h^(i)‖_2 < ‖x_h^(j) − d_h^(k)‖_2 for k ≠ i, k = 1, 2, ..., N, and ‖x_h^(j) − d_h^(i)‖_2 < T. After finding the K nearest image patches to the ith atom d_h^(i), we denote the ith set of image patch pairs as Pi = {Xhi, Xli}, where Xhi and Xli are the HR and LR image patches in class Pi, respectively. For each class Pi, whose image patches have a structure similar to the atom d_h^(i), we compute its interpolation kernel Ki by solving the following least squares (LS) problem

Ki = arg min_{Ki} ‖Xhi − Ki Xli‖²_F .   (7)
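The nearest-atom assignment with threshold T can be sketched as follows; assign_to_classes, the toy atoms, and the patches are hypothetical, and a real run would sweep the full training set:

```python
import numpy as np

def assign_to_classes(Xh, Dh, T):
    # Assign each HR patch (a column of Xh) to its nearest atom of Dh
    # (columns); patches farther than T from every atom are discarded.
    labels = []
    for j in range(Xh.shape[1]):
        d = np.linalg.norm(Dh - Xh[:, [j]], axis=0)  # distance to each atom
        i = int(np.argmin(d))
        labels.append(i if d[i] < T else -1)         # -1 marks "excluded"
    return np.array(labels)

# Toy example: two 2-pixel atoms and three patches.
Dh = np.array([[0.0, 1.0],
               [0.0, 1.0]])
Xh = np.array([[0.1, 0.9, 5.0],
               [0.0, 1.0, 5.0]])
labels = assign_to_classes(Xh, Dh, T=0.5)   # third patch exceeds T
```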

Here, we assume that the number of patch pairs in each class is considerably larger than the number of pixels in the image patch. This ensures that the optimization problem in Eq. (7) is overdetermined. Then, we use the gradient descent algorithm to solve it with the following update formula

Ki^{n+1} = Ki^n + (1/σi) (Xhi − Ki^n Xli) Xli^T ,   (8)

where n denotes the nth iteration and σi = 2 ‖Xli Xli^T‖²_F is the step size.
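A minimal numerical sketch of the update in Eq. (8), on a toy overdetermined problem with a known ground-truth kernel (the function name, data, and iteration count are illustrative assumptions):

```python
import numpy as np

def learn_kernel(Xh, Xl, n_iter=10000):
    # Gradient descent on ||Xh - K Xl||_F^2 with step 1/sigma,
    # sigma = 2 ||Xl Xl^T||_F^2, as in Eq. (8).
    K = np.eye(Xh.shape[0])                        # start from the identity
    sigma = 2.0 * np.linalg.norm(Xl @ Xl.T, 'fro') ** 2
    for _ in range(n_iter):
        K = K + (Xh - K @ Xl) @ Xl.T / sigma
    return K

# Toy overdetermined case: 2-pixel patches, 50 pairs, known kernel K*.
rng = np.random.default_rng(2)
K_true = np.array([[2.0, 0.0],
                   [0.0, 0.5]])
Xl = rng.standard_normal((2, 50))
Xh = K_true @ Xl                                    # noiseless pairs
K_est = learn_kernel(Xh, Xl)
```

Since the step size is conservative, many iterations are needed; on this noiseless example, K_est converges to K_true.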

3.1 Some implementation details

To build the set of training image patch pairs P, each raw patch pair (x_l^(i), x_h^(i)) is randomly selected from training images that contain both LR and HR versions. To keep the correspondence of each LR and HR patch pair, they should be sampled at the same position in the LR and HR images. Thus, for a given image database {X^(1), X^(2), ..., X^(n)}, we construct the set of training patch pairs as follows. First, we build the LR and HR image pairs {X_l^(i), X_h^(i)} by downsampling and upsampling the original image. That is, X_h^(i) = X^(i) and X_l^(i) = ↑(↓X^(i)), where ↑ and ↓ denote upsampling and downsampling by bicubic interpolation, respectively. Then, each patch pair is acquired by choosing a random position in the image pair {X_l^(i), X_h^(i)}, sampling both images at the same position and with the same size. Finally, in order to preserve the consistency of the interpolation kernels Ki, we remove the mean value of each LR and HR patch pair before training the interpolation kernels.

Accordingly, if the size of the LR patch is k × k, our interpolation kernel K is of size k² × k². Ideally, if the set of training patch pairs P is extremely large, we can find enough samples for each patch pair subset Pi to ensure that the LS problem in Eq. (7) is overdetermined. However, in practice, for some specific subset Pi, there may not exist enough patch pairs to ensure that its sample size is larger than the number of pixels in the HR image patch, which causes the LS problem in Eq. (7) to be under-determined. In this case, we simply set the kernel of this subset to be the identity matrix rather than estimate it by solving Eq. (7).
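The pair-construction procedure of this subsection can be sketched as follows. For self-containment, block averaging and nearest-neighbor replication stand in for the paper's bicubic ↓ and ↑ operators, and all names are illustrative:

```python
import numpy as np

def down_up(img, s=3):
    # Stand-in for up(down(img)): block-average downsampling by s,
    # then nearest-neighbor upsampling back to the original grid.
    h, w = img.shape
    small = img[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).mean(axis=(1, 3))
    return np.kron(small, np.ones((s, s)))

def make_patch_pair(img, top, left, k=7, s=3):
    # X_h = X and X_l = up(down(X)); sample both at the same position
    # and remove each patch's mean, as done before kernel training.
    low = down_up(img, s)
    ph = img[top:top + k, left:left + k].astype(float)
    pl = low[top:top + k, left:left + k].astype(float)
    return ph - ph.mean(), pl - pl.mean()

img = np.tile(np.linspace(0.0, 1.0, 21), (21, 1))   # toy 21 x 21 ramp image
ph, pl = make_patch_pair(img, top=7, left=7)
```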

4 Fast image super-resolution reconstruction

Our reconstruction approach exploits the underlying sparse structures of the input image patches. Since each structure can be used to generate a customized interpolation kernel, the interpolation kernel applied to a given LR image patch should consist of the interpolation kernels of the underlying structures that make up the LR image patch. In the training step, the interpolation kernel of each structure (corresponding to each image patch pair class) has been derived. Then, in the reconstruction step, we adopt a patch-by-patch processing style. For an input LR image patch xl, we first find its underlying sparse structures and then recover the HR image patch xh by interpolating the LR image patch with a kernel computed as the weighted average of the kernels of these structures:

xh = K xl = ( ∑_{i=1}^{M} wi Ki ) xl   (9)


Page 5: Learning adaptive interpolation kernels for fast single-image super resolution


with wi > 0 and ∑_{i=1}^{M} wi = 1. The weight wi reflects the correlation between the input LR image patch xl and the ith image patch class Pi. As described in the training process, when we multiply the trained matrix C by the LR image patch xl, the sparse coefficients of xl, denoted as α, can be derived. The absolute value of the ith coefficient αi in α reflects the relevance of the current LR image patch xl to the ith class Pi. Therefore, the weight of the ith interpolation kernel can be computed as wi = |αi| / ∑_{i=1}^{M} |αi|.

Theoretically, most of the elements in the weight vector w should equal zero because of the sparsity of α. However, since much of the detail in the HR image patches in the training set has been wiped out in the corresponding LR image patches, the relevance matrix C may not be sufficiently accurate. Consequently, for an input LR image patch xl, the recovered coefficients α cannot always maintain sparsity. But the significant structures in the LR image patch can be characterized by the coefficients with significant values. Therefore, in practice, after using the matrix C to compute the coefficients α of an input LR image patch, we choose only the first N coefficients of α with the largest absolute values to compute the weights wi. Since our approach is a patch-by-patch reconstruction algorithm, enforcing a global reconstruction constraint (as described in [12]) can also improve the reconstructed results. However, in our approach, because the reconstructed HR image patch is derived by interpolating the LR image patch rather than simply combining several atoms of the HR dictionary, the improvement brought by the global reconstruction constraint is much smaller than that in [12]. Also, at the beginning of our SRR algorithm, the LR image Xl is the bicubic-upsampled version of the original input LR image, because the LR image patches in the training set are derived by upsampling the downsampled HR image patches. The overall proposed fast image SRR process is summarized as Algorithm 1.

5 Experimental results

In this section, we evaluate both the SRR quality and the computational cost of the proposed method through experiments on various simulated and real-life images.

5.1 Experimental settings

In all of our experiments, we evaluated the image SR results for three-times magnification; thus, as described in Sect. 4, the input LR image was first upsampled three times using bicubic interpolation. In the training process, the parameters for learning the relevance matrix C and the interpolation kernels Ki were set as follows. To build the training image patch set,

Algorithm 1 Fast Learning-based SRR algorithm

1: Input: a relevance matrix C ∈ R^{M×N}, a set of interpolation kernels Ki with i = 1, 2, ..., M, and an LR image Xl.
2: for each 7 × 7 patch xl of Xl do
3:   Compute the mean pixel value m of the patch xl;
4:   Compute the coefficients α by multiplying the matrix C with the image patch xl, i.e., α = C xl;
5:   Choose the first N largest coefficients to compute the weights wi; the interpolation kernel is then generated as K = ∑_{i=1}^{N} wi Ki;
6:   Generate the HR image patch xh = K xl + m;
7: end for
8: Use the global reconstruction constraint to update the reconstructed HR image;
9: return the super-resolution reconstructed image Xh.

Remarks In step 5, the choice of the parameter N is discussed in Sect. 5.1. Step 8 uses back projection as in [12] to retain the global constraints.
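Step 8's global reconstruction constraint (back projection as in [12]) can be sketched as an iterative correction. Again, block averaging and nearest-neighbor replication stand in for the bicubic resamplers, and the names are illustrative:

```python
import numpy as np

def downsample(img, s=3):
    h, w = img.shape
    return img[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(img, s=3):
    return np.kron(img, np.ones((s, s)))

def back_project(Xh, Yl, s=3, n_iter=20):
    # Push the HR estimate toward consistency with the observed LR
    # image: Xh <- Xh + up(Yl - down(Xh)).
    for _ in range(n_iter):
        Xh = Xh + upsample(Yl - downsample(Xh, s), s)
    return Xh

Yl = np.ones((7, 7))          # observed LR image
Xh0 = np.zeros((21, 21))      # initial HR estimate (e.g., from Algorithm 1)
Xh = back_project(Xh0, Yl)
```

After back projection, downsampling the HR estimate reproduces the observed LR image exactly in this toy setting.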

we randomly selected 100,000 patches, with a patch size of 7 × 7 pixels, from natural images. The number of atoms in our dictionary was set to 128. Then, by optimizing Eq. (6), an HR dictionary Dh ∈ R^{49×128} and a relevance matrix C ∈ R^{128×49} were obtained. The learned HR dictionary Dh is shown in Fig. 1. After the dictionary had been trained, we used the k-NN algorithm to cluster the HR and LR image patch pairs in our training set into 128 classes. Since all the image patches had been normalized between 0 and 1, the threshold used in our clustering process was T = 0.015. For each patch pair class, its interpolation kernel was learned according to Eq. (8). The left and right columns in Fig. 2 show the trained interpolation kernels of classes #9, #18, #37, and #90, together with some of the corresponding HR image patches in these classes. They show that the image patches in the same class are quite similar and that the interpolation kernels of different classes have different structures.

In the reconstruction process of our algorithm, two parameters need to be determined: the number of un-thresholded coefficients used for building the interpolation kernel and the number of overlapping pixels between image patches.

Fig. 1 HR dictionary Dh trained via Eq. (6)

Page 6: Learning adaptive interpolation kernels for fast single-image super resolution

Fig. 2 Trained interpolation kernels and image patches. Left column: interpolation kernels; right column: the first 90 image patches in one class. From top to bottom: the kernels and some of the image patches in the 9th, 18th, 37th, and 90th classes, respectively

The former parameter reflects the effect of sparsity on the recovered image; the latter reflects the effect of global constraints. Figure 3 illustrates the effects of these two parameters on the resulting images. The top sub-figure in Fig. 3 shows that when the number of un-thresholded coefficients is larger than 10, the more sub-kernels used to compose the final interpolation kernel, the lower the derived PSNR value. This property validates the sparse representation assumption that a few structures can represent an image patch well. However, the relevance matrix C solved from Eq. (6) cannot ensure that most of the small coefficients derived by C xl are close to zero. Thus, the more coefficients we used, the smoother the recovered image, which decreased the PSNR values. The bottom sub-figure in Fig. 3 reflects the effect of the number of overlapping pixels on the resulting image. Although fewer overlapping pixels result in lower PSNR values, the PSNR does not decrease by much: the difference in PSNR between 0 and 6 overlapping pixels is less than 0.5 dB. Because our algorithm directly applies the interpolation kernels to the LR image patches, it can retain the global constraints of an image better than the method in [12].

To facilitate comparison with other algorithms, we used four sub-kernels and six overlapping pixels in our algorithm for all of the following SRR experiments. For processing color images, we first transformed the color images into YCbCr space. Then, we applied our algorithm to the luminance channel (Y) and bicubic interpolation to the color layers (Cb, Cr), respectively.

Fig. 3 The effects of the number of un-thresholded coefficients and of pixel overlapping on the PSNRs of the recovered images Lena, Boat, and Beacon, as shown in Fig. 4. Top: when the number of sub-kernels used to build the interpolation kernel is larger than 10, the more kernels used, the lower the PSNR values. Bottom: the fewer the overlapping pixels, the lower the PSNR value; however, the PSNR does not decrease quickly

Fig. 4 Nine input color images used for the experiments. From top left to bottom right: Hydrangeas, Beacon, Parrot, Baboon, Barbara, Boat, Flight, Goldhill, and Lena (color figure online)


Page 7: Learning adaptive interpolation kernels for fast single-image super resolution


Table 1 Comparison of RMSE and SSIM of recovered images derived by different methods (magnification factor is 3)

                          RMSE                                  SSIM
Images (size)             Bicubic  Dual-SR  SP-SR    MKL-SR     Bicubic  Dual-SR  SP-SR   MKL-SR
Hydrangeas (171×171)      6.101    5.179    4.852    4.783      0.946    0.970    0.974   0.974
Beacon (171×171)          8.786    8.180    8.094    8.041      0.938    0.951    0.952   0.954
Parrot (171×171)          10.555   9.787    9.540    9.409      0.936    0.946    0.949   0.951
Baboon (171×171)          17.510   16.979   16.894   16.723     0.802    0.840    0.841   0.850
Barbara (171×171)         13.381   13.139   13.107   13.090     0.859    0.871    0.872   0.873
Boat (171×171)            7.700    7.020    6.848    6.786      0.935    0.950    0.953   0.954
Flight (171×171)          8.490    7.813    7.587    7.497      0.959    0.969    0.971   0.972
Flower (110×57)           3.519    3.267    3.166    3.089      0.917    0.925    0.929   0.930
Girl (85×86)              5.909    5.570    5.509    5.419      0.799    0.817    0.819   0.824
Goldhill (171×171)        12.366   11.814   11.731   11.669     0.865    0.890    0.891   0.894
Lena (171×171)            6.312    5.665    5.488    5.330      0.957    0.966    0.968   0.970
Racoon (109×100)          9.723    9.268    9.147    9.017      0.724    0.755    0.755   0.763
Parthenon (153×98)        12.712   12.075   11.876   11.853     0.697    0.722    0.727   0.730

Bold values indicate results better than the other values

5.2 Results and discussion

In this subsection, we compare the performance of our proposed multiple kernel learning-based SR approach¹ (MKL-SR) with some recent SR methods using both simulated and real-life images. In the simulated examples, we quantitatively compare the speed, root mean square error (RMSE), and structural similarity (SSIM) [22] values of our proposed MKL-SR algorithm against several learning-based super-resolution methods. The simulated LR images were derived directly by using bicubic interpolation to downsample the HR images. Since the image degradation model may differ among super-resolution methods, comparing methods that assume different degradation models in the simulated experiments would be unfair. Therefore, in the simulated examples, we compared MKL-SR only to methods that use the same image model: the sparse representation-based SR approach (SP-SR) [12] and the dual learning-based SR method (Dual-SR) [17]. We used three pictures from [12] (named Flower, Girl, and Racoon) and nine other images (shown in Fig. 4) to set up our simulated downsampled image base.
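The three quality measures used in this section (RMSE and SSIM in Table 1, PSNR in Fig. 3) can be sketched as follows. This is an illustrative implementation, not the paper's evaluation code; in particular, `ssim_global` is a single-window simplification of the index in [22], which averages SSIM over local Gaussian-weighted windows.

```python
import numpy as np

def rmse(ref, est):
    """Root mean square error between a reference and an estimate in [0, 255]."""
    return float(np.sqrt(np.mean((ref.astype(float) - est.astype(float)) ** 2)))

def psnr(ref, est, peak=255.0):
    """PSNR in dB, the measure plotted in Fig. 3."""
    return float(20.0 * np.log10(peak / rmse(ref, est)))

def ssim_global(ref, est, peak=255.0):
    """Single-window SSIM (simplification: [22] uses local 11x11 windows)."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    x, y = ref.astype(float).ravel(), est.astype(float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = float(np.mean((x - mx) * (y - my)))
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

Note that PSNR is a monotone function of RMSE (PSNR = 20 log10(255 / RMSE)), so the RMSE rankings in Table 1 and the PSNR curves in Fig. 3 order the methods identically.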

The RMSE and SSIM values of the recovered HR images derived by the different algorithms are shown in Table 1. These data show that all of the RMSE and SSIM values of our method are slightly better than those of the SP-SR algorithm².

¹ The source code of our proposed SRR approach can be downloaded at http://mda.ia.ac.cn/people/huxy/oproj/fastsrr.htm.
² The code for the SP-SR method used for comparison can be downloaded from the authors' homepage at http://www.ifp.illinois.edu/~jyang29/resources.html. The RMSE and SSIM values of the pictures flower and girl are copied from [12] directly.

Table 2 Comparison of the computational cost (in seconds) of different methods with images of different sizes (magnification factor is 3)

Images       Size        SP-SR   MKL-SR   Speedup
Hydrangeas   171 × 171   134.6   12.13    11.0
Beacon       171 × 171   145.1    9.22    15.7
Boat         171 × 171   163.5   11.46    14.2
Flower       110 × 57     35.1    2.37    14.7
Girl         85 × 86      40.8    3.82    10.7
Racoon       109 × 100    64.8    6.66     9.6
Parthenon    153 × 98     86.6    7.08    12.1

Because the Dual-SR approach is an approximation of the SP-SR method, the RMSE and SSIM values of its results are worse than those of the SP-SR method. Table 2 compares the computational time of our proposed MKL-SR method and the SP-SR algorithm on images of different sizes. In our experiments, all simulations were executed as MATLAB programs on Windows 7 running on an Intel Core i5 CPU at 2.8 GHz with 4 GB RAM. Table 2 reveals that MKL-SR is about 10 times faster than SP-SR. We did not compare the computational cost of MKL-SR and Dual-SR because we implemented the latter ourselves, which might have made its measured time cost inaccurate. However, since the speedup of Dual-SR over SP-SR is reported in [17], we can infer that the efficiency of the MKL-SR approach is comparable with that of the Dual-SR algorithm. We also compare the image details of the recovered images in Fig. 5. The comparison indicates that the textures derived by MKL-SR are comparable to those under Dual-SR and SP-SR, whereas the edges derived by MKL-SR are sharper than those under the other two methods.

Fig. 5 Comparison of single-image super-resolution results derived by different methods with a magnification factor of 3. From top to bottom are the images flower, racoon, and boat. From left to right: bicubic interpolation, Dual-SR [17], SP-SR [12], and our proposed MKL-SR algorithm

For the real-life image super-resolution experiments, because we do not have the real HR images, we compare the visual quality of our proposed algorithm with that of other recent state-of-the-art methods. Figure 6 compares the results derived by our method with the results of Shan et al. [23], Freedman and Fattal [14], Takeda et al. (kernel regression, KR) [4], and the SP-SR approach [12]. The small images girl and koala were taken from the web page provided in [14]. For each image, the methods of Shan et al. and Freedman et al. can generate very sharp edges, but in finely detailed texture areas those methods yield recovered HR image patches that look unrealistic and somewhat faceted. The basic idea behind kernel regression is quite similar to our approach, since both try to find adaptive kernels in different regions to interpolate the LR image. However, the results derived by kernel regression appear too smooth, and many detailed textures have been lost. Our proposed algorithm can retain sharp edges and rich textures; moreover, it is more efficient than the sparse representation-based super-resolution algorithm.

Fig. 6 Visual comparison of super-resolution results of real images girl and koala with a magnification factor of 3. For each image, the methods used, from left to right and top to bottom, are bicubic interpolation, kernel regression (KR) [4], fast image/video upscaling (Shan) [23], image upscaling via local similarity learning (Freedman) [14], SP-SR [12], and our proposed algorithm, respectively

6 Conclusion and future work

We have proposed an efficient single-image super-resolution method that involves learning multiple interpolation kernels. Our approach is also based on the image sparse prior, similar to the dual learning-based SRR algorithm [17]. However, unlike the latter, which uses fixed interpolation kernels for all image patches, our method learns several interpolation kernels, one for each image structure, in the training process. In the reconstruction step, we use a weighted average of these kernels to build an adaptive interpolation kernel for each specific image patch. Experimental results on single-image super resolution demonstrate that the proposed algorithm preserves sharp edges and rich textures; moreover, it is faster than the sparse representation-based super-resolution algorithm. Since our proposed approach is mainly based on simple linear algebra computations, its potential application to real-time image processing warrants further study.
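The reconstruction step summarized above (a weighted average of class-specific kernels applied per patch) might be sketched as follows. The patch and kernel shapes, the top-weight selection rule, and the renormalization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def adaptive_interpolate(lr_patch, kernels, weights, n_used=4):
    """Combine the n_used most strongly weighted class kernels into one
    adaptive interpolation kernel and apply it to a flattened LR patch.

    lr_patch : flattened LR patch, shape (d_lr,)
    kernels  : K class-specific kernels, shape (K, d_hr, d_lr)
    weights  : per-class responses for this patch, shape (K,)

    Selection rule (largest |weight| classes, weights renormalized to
    sum to 1) is a hypothetical choice for illustration.
    """
    idx = np.argsort(-np.abs(weights))[:n_used]   # keep the few best classes
    w = np.abs(weights[idx])
    w = w / w.sum()                               # convex combination
    # Weighted sum over the selected kernels -> one (d_hr, d_lr) matrix,
    # so reconstruction stays a single matrix-vector product per patch.
    adaptive_kernel = np.tensordot(w, kernels[idx], axes=1)
    return adaptive_kernel @ lr_patch
```

Because the per-patch work reduces to a small weighted sum of precomputed matrices plus one matrix-vector product, this structure is consistent with the "simple linear algebra computations" that make the approach a candidate for real-time use.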

Acknowledgments The authors wish to thank the anonymous reviewers for their insightful comments, which helped us improve the quality of the paper significantly.

References

1. Tian, J., Ma, K.K.: A survey on super-resolution imaging. Signal Image Video Process. 5(3), 329–342 (2011)

2. Farsiu, S., Robinson, D., Elad, M., Milanfar, P.: Fast and robust multiframe super resolution. IEEE Trans. Image Process. 13(10), 1327–1345 (2004)

3. Liu, Z., Wang, H., Peng, S.: Image magnification method using joint diffusion. J. Comput. Sci. Technol. 19(5), 698–707 (2004)

4. Takeda, H., Farsiu, S., Milanfar, P.: Kernel regression for image processing and reconstruction. IEEE Trans. Image Process. 16(2), 349–366 (2007)

5. Fattal, R.: Image upsampling via imposed edge statistics. ACM Trans. Gr. (TOG) 26(3), 95:1–95:8 (2007)

6. Sun, J., Xu, Z., Shum, H.Y.: Image super-resolution using gradient profile prior. In: IEEE conference on computer vision and pattern recognition, Anchorage, AK, 23–28 June 2008, pp. 1–8 (2008)

7. Anbarjafari, G., Demirel, H.: Image super resolution based on interpolation of wavelet domain high frequency subbands and the spatial domain input image. ETRI J. 32(3), 390–394 (2010)

8. Shao, W.Z., Deng, H.S., Wei, Z.H.: A posterior mean approach for MRF-based spatially adaptive multi-frame image super-resolution. Signal Image Video Process. 1–13 (2013). doi:10.1007/s11760-013-0458-x

9. Freeman, W.T., Pasztor, E., Carmichael, O.: Learning low-level vision. Int. J. Comput. Vis. 40(1), 25–47 (2000)

10. Freeman, W.T., Jones, T., Pasztor, E.: Example-based super-resolution. IEEE Comput. Gr. Appl. 22(2), 56–65 (2002)

11. Yang, J., Wright, J., Huang, T.S., Ma, Y.: Image super resolution as sparse representation of raw image patches. In: IEEE conference on computer vision and pattern recognition, Anchorage, AK, 23–28 June 2008, pp. 1–8 (2008)

12. Yang, J., Wright, J., Huang, T.S.: Image super resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)

13. Glasner, D., Bagon, S., Irani, M.: Super-resolution from a single image. In: IEEE 12th international conference on computer vision, Kyoto, 29 Sep–2 Oct 2009, pp. 349–356 (2009)

14. Freedman, G., Fattal, R.: Image and video upscaling from local self-examples. ACM Trans. Gr. (TOG) 30(2), 12:1–12:11 (2011)

15. Damkat, C.: Single image super-resolution using self-examples and texture synthesis. Signal Image Video Process. 5(3), 343–352 (2011)

16. Chang, H., Yeung, D.Y., Xiong, Y.: Super-resolution through neighbor embedding. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Washington, DC, 27 June–2 July 2004, pp. 275–282 (2004)

17. Zhang, H., Zhang, Y., Huang, T.S.: Efficient sparse representation based image super resolution via dual dictionary learning. In: IEEE international conference on multimedia and expo, Barcelona, Spain, 11–15 July 2011, pp. 1–6 (2011)

18. Basso, C., Santoro, M., Verri, A., Villa, S.: PADDLE: proximal algorithm for dual dictionaries learning. In: Artificial neural networks and machine learning – ICANN 2011. Lecture notes in computer science, vol. 6791, pp. 379–386 (2011)

19. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is "nearest neighbor" meaningful? In: Database theory – ICDT '99. Lecture notes in computer science, vol. 1540, pp. 217–235 (1999)

20. Donoho, D.L.: For most large underdetermined systems of linear equations, the minimal ℓ1-norm solution is also the sparsest solution. Commun. Pure Appl. Math. 59(6), 797–829 (2006)

21. Donoho, D.L.: For most large underdetermined systems of linear equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution. Commun. Pure Appl. Math. 59(7), 907–934 (2006)

22. Wang, Z., Bovik, A., Sheikh, H., Simoncelli, E.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)

23. Shan, Q., Li, Z., Jia, J., Tang, C.K.: Fast image/video upsampling. ACM Trans. Gr. (TOG) 27(5), 153:1–153:7 (2008)
