
Proceedings of the IConSSE FSM SWCU (2015), pp. MA.1–4 ISBN: 978-602-1047-21-7

SWUP


Penalized spline estimator in nonparametric regression

Helmina Andriani a,*, Wahyu Wibowo b, Santi Puteri Rahayu c

a Student, Institut Teknologi Sepuluh Nopember, Jl. Arif Rahman Hakim, Surabaya 60111, Indonesia

b,c Lecturer, Institut Teknologi Sepuluh Nopember, Jl. Arif Rahman Hakim, Surabaya 60111, Indonesia

Abstract

Regression analysis is a statistical process used to investigate patterns of relationships and to assess the influence of the independent variables on the dependent variable through regression curve estimation. If the shape of the regression curve is assumed unknown, nonparametric regression is used. A popular nonparametric regression approach is the spline. There are two kinds of nonparametric spline approaches, spline regression and smoothing spline; the combination of both is known as penalized spline regression. This paper focuses on how to obtain the penalty matrix using the penalized least squares (PLS) method, and then uses this penalty matrix to estimate a nonparametric regression model based on the penalized spline estimator.

Keywords nonparametric regression, penalized spline, penalty matrix

1. Introduction

Regression analysis is a statistical process used to investigate patterns of relationships and to assess the influence of the independent variables on the dependent variable through regression curve estimation. If the shape of the regression curve is assumed known, parametric regression is used; if it is assumed unknown, nonparametric regression is used (Utami, 2013). Regression curve estimation with a nonparametric regression model can be carried out by several methods, including the kernel, spline, wavelet, and Fourier series expansion approaches (Djuraidah & Aunuddin, 2006).

A popular nonparametric regression approach is the spline. A spline is a kind of piecewise polynomial with segmented properties. These segmented properties provide more flexibility than ordinary polynomials, allowing the spline to adapt more effectively to the local characteristics of a function or data (Budiantara et al., 2006). Other advantages of splines are that they can describe changes in the behavior of a function over specified sub-intervals and, with the help of knots, can handle data patterns that show sharp increases or decreases, while the resulting curve remains relatively smooth (Fathurahman, 2011).

There are two kinds of nonparametric spline approaches, namely spline regression and smoothing spline. The combination of both approaches is known as penalized spline regression (Djuraidah & Aunuddin, 2006). In applying a penalized spline, several things need to be considered, namely: (a) the location and number of knots, (b) the spline basis functions, and (c) the degrees of freedom and the penalty matrix (Montoya et al., 2014). This paper focuses on how to obtain the penalty matrix using the penalized least squares (PLS) method, and then uses this penalty matrix to estimate a nonparametric regression model based on the penalized spline estimator.

* Corresponding author. E-mail address: [email protected]

2. Materials and methods

To obtain the penalty matrix, several mathematical steps are needed.

Step 1. Suppose that $n$ pairs of measurements $(x_i, y_i)$, $i = 1, 2, \ldots, n$, are observed, satisfying the model in Eq. (1), where $f(x)$ is an unknown regression function and the errors $\varepsilon_i$ are independent with constant variance $\sigma^2$:
$$y_i = f(x_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n. \qquad (1)$$

Step 2. It is assumed that $f(x)$ can be well modeled by the truncated power basis of degree $p$, namely $1, x, \ldots, x^p, (x - \kappa_1)_+^p, \ldots, (x - \kappa_K)_+^p$. Hence, the $p$th-degree spline model is
$$f(x) = \beta_0 + \beta_1 x + \cdots + \beta_p x^p + \sum_{k=1}^{K} u_k (x - \kappa_k)_+^p,$$
where $p \ge 1$ is an integer, $\kappa_1 < \cdots < \kappa_K$ is a set of fixed knots, and $(x - \kappa_k)_+ = \max(0,\, x - \kappa_k)$. The nonparametric regression model using the truncated power basis is then
$$y_i = \beta_0 + \beta_1 x_i + \cdots + \beta_p x_i^p + \sum_{k=1}^{K} u_k (x_i - \kappa_k)_+^p + \varepsilon_i. \qquad (2)$$

Step 3. State the nonparametric regression model in matrix form, writing $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_p)'$, $\mathbf{u} = (u_1, \ldots, u_K)'$, $\mathbf{X} = [1 \;\; x_i \;\; \cdots \;\; x_i^p]_{1 \le i \le n}$, and $\mathbf{Z} = [(x_i - \kappa_1)_+^p \;\; \cdots \;\; (x_i - \kappa_K)_+^p]_{1 \le i \le n}$. Eq. (2) can then be rewritten as $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon}$.

Step 4. Obtain the penalty matrix $\mathbf{D}$ by minimizing the function $Q$ with the PLS method, where
$$Q = \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\mathbf{u}\|^2 + \lambda \mathbf{u}'\mathbf{u}.$$

Step 5. Obtain the penalized spline estimator $\hat{\boldsymbol{\theta}}$ and the prediction $\hat{\mathbf{y}}$ by the procedures described below.
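The construction in Steps 2 and 3 can be sketched in code. The snippet below is an illustrative sketch only, not from the paper: the function name `spline_design_matrices` and its interface are hypothetical, and NumPy is assumed.

```python
import numpy as np

def spline_design_matrices(x, knots, p=2):
    """Build the truncated power basis of degree p (hypothetical helper).

    X holds the polynomial part 1, x, ..., x^p (one row per observation);
    Z holds the truncated power terms (x - kappa_k)_+^p, one column per knot.
    """
    x = np.asarray(x, dtype=float)
    X = np.vander(x, p + 1, increasing=True)  # columns: 1, x, ..., x^p
    Z = np.maximum(x[:, None] - np.asarray(knots, dtype=float)[None, :], 0.0) ** p
    return X, Z
```

With $n$ observations, $K$ knots, and degree $p$, `X` is $n \times (p+1)$ and `Z` is $n \times K$, matching the matrix form $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon}$.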

3. Results and discussion

The penalized spline estimator can be obtained by minimizing the function $Q$ with the PLS method, i.e.,
$$\begin{aligned}
Q &= \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\mathbf{u}\|^2 + \lambda \mathbf{u}'\mathbf{u} \\
&= (\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\mathbf{u})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\mathbf{u}) + \lambda \mathbf{u}'\mathbf{u} \\
&= \mathbf{y}'\mathbf{y} - \mathbf{y}'\mathbf{X}\boldsymbol{\beta} - \mathbf{y}'\mathbf{Z}\mathbf{u} - \boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{Z}\mathbf{u} - \mathbf{u}'\mathbf{Z}'\mathbf{y} + \mathbf{u}'\mathbf{Z}'\mathbf{X}\boldsymbol{\beta} + \mathbf{u}'\mathbf{Z}'\mathbf{Z}\mathbf{u} + \lambda \mathbf{u}'\mathbf{u} \\
&= \mathbf{y}'\mathbf{y} - 2\mathbf{y}'\mathbf{X}\boldsymbol{\beta} - 2\mathbf{y}'\mathbf{Z}\mathbf{u} + 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{Z}\mathbf{u} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} + \mathbf{u}'\mathbf{Z}'\mathbf{Z}\mathbf{u} + \lambda \mathbf{u}'\mathbf{u}.
\end{aligned}$$

A necessary condition:

1) $\partial Q / \partial \boldsymbol{\beta} = \mathbf{0}$:
$$\frac{\partial}{\partial \boldsymbol{\beta}} \left( \mathbf{y}'\mathbf{y} - 2\mathbf{y}'\mathbf{X}\boldsymbol{\beta} - 2\mathbf{y}'\mathbf{Z}\mathbf{u} + 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{Z}\mathbf{u} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} + \mathbf{u}'\mathbf{Z}'\mathbf{Z}\mathbf{u} + \lambda \mathbf{u}'\mathbf{u} \right) = \mathbf{0}$$
$$-2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{Z}\mathbf{u} + 2\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{0}.$$

Hence,
$$\mathbf{X}'\mathbf{Z}\mathbf{u} + \mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{X}'\mathbf{y}. \qquad (3)$$

2) $\partial Q / \partial \mathbf{u} = \mathbf{0}$:
$$\frac{\partial}{\partial \mathbf{u}} \left( \mathbf{y}'\mathbf{y} - 2\mathbf{y}'\mathbf{X}\boldsymbol{\beta} - 2\mathbf{y}'\mathbf{Z}\mathbf{u} + 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{Z}\mathbf{u} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} + \mathbf{u}'\mathbf{Z}'\mathbf{Z}\mathbf{u} + \lambda \mathbf{u}'\mathbf{u} \right) = \mathbf{0}$$
$$-2\mathbf{Z}'\mathbf{y} + 2\mathbf{Z}'\mathbf{X}\boldsymbol{\beta} + 2\mathbf{Z}'\mathbf{Z}\mathbf{u} + 2\lambda\mathbf{u} = \mathbf{0}.$$

Hence,
$$\mathbf{Z}'\mathbf{X}\boldsymbol{\beta} + (\mathbf{Z}'\mathbf{Z} + \lambda\mathbf{I})\mathbf{u} = \mathbf{Z}'\mathbf{y}. \qquad (4)$$

Following Eqs. (3) and (4), both equations can be rewritten as
$$\begin{bmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z} \\ \mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} + \lambda\mathbf{I} \end{bmatrix} \begin{bmatrix} \boldsymbol{\beta} \\ \mathbf{u} \end{bmatrix} = \begin{bmatrix} \mathbf{X}' \\ \mathbf{Z}' \end{bmatrix} \mathbf{y}$$
$$\left( \begin{bmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z} \\ \mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} \end{bmatrix} + \lambda \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{bmatrix} \right) \begin{bmatrix} \boldsymbol{\beta} \\ \mathbf{u} \end{bmatrix} = \begin{bmatrix} \mathbf{X}' \\ \mathbf{Z}' \end{bmatrix} \mathbf{y}$$
$$\left( \begin{bmatrix} \mathbf{X}' \\ \mathbf{Z}' \end{bmatrix} \begin{bmatrix} \mathbf{X} & \mathbf{Z} \end{bmatrix} + \lambda \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{bmatrix} \right) \begin{bmatrix} \boldsymbol{\beta} \\ \mathbf{u} \end{bmatrix} = \begin{bmatrix} \mathbf{X}' \\ \mathbf{Z}' \end{bmatrix} \mathbf{y}.$$

Suppose that $\mathbf{C} = \begin{bmatrix} \mathbf{X} & \mathbf{Z} \end{bmatrix}$, $\boldsymbol{\theta} = \begin{bmatrix} \boldsymbol{\beta} \\ \mathbf{u} \end{bmatrix}$, and $\mathbf{D} = \begin{bmatrix} \mathbf{0}_{(p+1)\times(p+1)} & \mathbf{0}_{(p+1)\times K} \\ \mathbf{0}_{K\times(p+1)} & \mathbf{I}_{K} \end{bmatrix}$.

Alongside the advantages it inherits from parametric modeling, the regression spline possesses a serious drawback: a proper strategy for selecting the number and location of knots is needed. An inappropriate choice of the number of knots causes the data to be "overfitted" or "underfitted" (Krivobokova, 2006). There is an optimal number of knots that leads to an intermediate amount of smoothing, avoiding both underfitting and overfitting. This optimal number can be found by experimentation, but that can be time intensive, especially with numerous, large, or complicated datasets. An alternative way to optimize the fit is to impose a penalty on the spline coefficients. Specifically, one chooses a large number of knots (e.g., by the fixed selection method suggested in Ruppert, 2002) and prevents overfitting by putting a constraint on the spline coefficients, i.e., one finds (Krivobokova, 2006; Griggs, 2013)
$$\min_{\boldsymbol{\beta}, \mathbf{u}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\mathbf{u}\|^2, \quad \text{subject to } \|\mathbf{u}\|^2 \le g, \text{ for } g \ge 0.$$
Using a Lagrange multiplier, this minimization problem can be written as
$$\min_{\boldsymbol{\beta}, \mathbf{u}} \left\{ \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}\mathbf{u}\|^2 + \lambda \mathbf{u}'\mathbf{u} \right\} = \min_{\boldsymbol{\theta}} \left\{ \|\mathbf{y} - \mathbf{C}\boldsymbol{\theta}\|^2 + \lambda \boldsymbol{\theta}'\mathbf{D}\boldsymbol{\theta} \right\}, \quad \lambda \ge 0.$$

Suppose that $\mathbf{C} = \begin{bmatrix} \mathbf{X} & \mathbf{Z} \end{bmatrix}$ and $\boldsymbol{\theta} = \begin{bmatrix} \boldsymbol{\beta} \\ \mathbf{u} \end{bmatrix}$; then
$$\|\mathbf{y} - \mathbf{C}\boldsymbol{\theta}\|^2 = (\mathbf{y} - \mathbf{C}\boldsymbol{\theta})'(\mathbf{y} - \mathbf{C}\boldsymbol{\theta}) = \mathbf{y}'\mathbf{y} - 2\mathbf{y}'\mathbf{C}\boldsymbol{\theta} + \boldsymbol{\theta}'\mathbf{C}'\mathbf{C}\boldsymbol{\theta}.$$
Subject to $\boldsymbol{\theta}'\mathbf{D}\boldsymbol{\theta} \le g$, the Lagrange function can be written as
$$Q = \mathbf{y}'\mathbf{y} - 2\mathbf{y}'\mathbf{C}\boldsymbol{\theta} + \boldsymbol{\theta}'\mathbf{C}'\mathbf{C}\boldsymbol{\theta} + \lambda \boldsymbol{\theta}'\mathbf{D}\boldsymbol{\theta}. \qquad (5)$$
To find the solution of the minimization of Eq. (5), the PLS method can be used. A necessary condition:
$$\frac{\partial Q}{\partial \boldsymbol{\theta}} = \mathbf{0}$$
$$\frac{\partial}{\partial \boldsymbol{\theta}} \left( \mathbf{y}'\mathbf{y} - 2\mathbf{y}'\mathbf{C}\boldsymbol{\theta} + \boldsymbol{\theta}'\mathbf{C}'\mathbf{C}\boldsymbol{\theta} + \lambda \boldsymbol{\theta}'\mathbf{D}\boldsymbol{\theta} \right) = \mathbf{0}$$
$$-2\mathbf{C}'\mathbf{y} + 2\mathbf{C}'\mathbf{C}\boldsymbol{\theta} + 2\lambda\mathbf{D}\boldsymbol{\theta} = \mathbf{0}$$
$$(\mathbf{C}'\mathbf{C} + \lambda\mathbf{D})\boldsymbol{\theta} = \mathbf{C}'\mathbf{y}.$$


$$\hat{\boldsymbol{\theta}} = (\mathbf{C}'\mathbf{C} + \lambda\mathbf{D})^{-1}\mathbf{C}'\mathbf{y}.$$
Hence,
$$\hat{\mathbf{y}} = \mathbf{C}\hat{\boldsymbol{\theta}} = \mathbf{C}(\mathbf{C}'\mathbf{C} + \lambda\mathbf{D})^{-1}\mathbf{C}'\mathbf{y}.$$
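As a hedged illustration of this closed form (the helper name `penalized_spline_fit` and its interface below are hypothetical, not from the paper), the estimator can be computed by solving the penalized normal equations directly:

```python
import numpy as np

def penalized_spline_fit(X, Z, y, lam):
    """Solve (C'C + lam * D) theta = C'y with C = [X Z] and
    D = blockdiag(0_{(p+1)x(p+1)}, I_K), so that only the truncated
    power coefficients u are penalized (hypothetical helper)."""
    C = np.hstack([X, Z])
    q = X.shape[1]                               # p + 1 unpenalized columns
    D = np.diag([0.0] * q + [1.0] * Z.shape[1])  # penalty matrix
    theta = np.linalg.solve(C.T @ C + lam * D, C.T @ np.asarray(y, dtype=float))
    return theta, C @ theta                      # estimator and fitted values
```

Because $\mathbf{D}$ leaves the polynomial block unpenalized, data that are exactly polynomial of degree $p$ are reproduced for any $\lambda$, while the knot coefficients $\mathbf{u}$ shrink as $\lambda$ grows.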

The value of $\hat{\boldsymbol{\theta}}$ depends on the value of the smoothing parameter $\lambda$. The larger the value of $\lambda$, the more the fit shrinks towards a polynomial fit, while smaller values of $\lambda$ result in a wiggly, "overfitted" estimate. To find the optimal value of $\lambda$, the generalized cross-validation (GCV) method can be used.
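The paper names GCV but does not derive it. As a sketch under the standard definition $\mathrm{GCV}(\lambda) = (\mathrm{RSS}/n) \,/\, (1 - \mathrm{tr}(\mathbf{S}_\lambda)/n)^2$, with smoother matrix $\mathbf{S}_\lambda = \mathbf{C}(\mathbf{C}'\mathbf{C} + \lambda\mathbf{D})^{-1}\mathbf{C}'$, one can score a grid of $\lambda$ values and keep the minimizer; the helper names below are hypothetical.

```python
import numpy as np

def gcv_score(X, Z, y, lam):
    """GCV(lam) = (RSS / n) / (1 - tr(S)/n)^2 with smoother
    S = C (C'C + lam * D)^{-1} C'; smaller scores are better."""
    C = np.hstack([X, Z])
    D = np.diag([0.0] * X.shape[1] + [1.0] * Z.shape[1])
    S = C @ np.linalg.solve(C.T @ C + lam * D, C.T)  # smoother ("hat") matrix
    y = np.asarray(y, dtype=float)
    resid = y - S @ y
    n = y.size
    return (resid @ resid / n) / (1.0 - np.trace(S) / n) ** 2

def pick_lambda(X, Z, y, grid):
    """Return the lambda in `grid` with the smallest GCV score."""
    return min(grid, key=lambda lam: gcv_score(X, Z, y, lam))
```

In practice the grid is taken on a log scale (e.g., powers of 10), since the fit changes slowly with $\lambda$ over several orders of magnitude.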

4. Conclusion and remarks

It takes several mathematical steps to obtain the penalty matrix using the PLS method. The penalty matrix
$$\mathbf{D} = \begin{bmatrix} \mathbf{0}_{(p+1)\times(p+1)} & \mathbf{0}_{(p+1)\times K} \\ \mathbf{0}_{K\times(p+1)} & \mathbf{I}_{K} \end{bmatrix}$$
is then used to find the penalized spline estimator $\hat{\boldsymbol{\theta}} = (\mathbf{C}'\mathbf{C} + \lambda\mathbf{D})^{-1}\mathbf{C}'\mathbf{y}$, where $\mathbf{C} = \begin{bmatrix} \mathbf{X} & \mathbf{Z} \end{bmatrix}$ and $\boldsymbol{\theta} = \begin{bmatrix} \boldsymbol{\beta} \\ \mathbf{u} \end{bmatrix}$.

Acknowledgment

The author would like to thank Mr. Wahyu and Mrs. Santi Puteri for their suggestions, and for reading portions of this paper and pointing out the gaps that needed to be filled.

References

Budiantara, I.N., Suryadi, F., Otok, B.W., & Guritno, S. (2006). Pemodelan B-spline dan MARS pada nilai ujian masuk terhadap IPK mahasiswa jurusan disain komunikasi visual UK. Petra, Surabaya. Jurnal Teknik Industri, 8(1), 1–13.

Djuraidah, A., & Aunuddin (2006). Pendugaan regresi spline terpenalti dengan pendekatan model linear campuran. Statistika, 6(1), 47–54.

Fathurahman, M. (2011). Estimasi parameter model regresi spline. Jurnal Eksponensial, 2(1), 53–58.

Griggs, W. (2013). Penalized spline regression and its applications. Senior Project, Whitman College, United States.

Krivobokova, T. (2006). Theoretical and practical aspects of penalized spline smoothing. Dissertation Dr. rer. pol., Bielefeld University, Bielefeld.

Montoya, E.L., Ulloa, N., & Miller, V. (2014). A simulation study comparing knot selection methods with equally spaced knots in a penalized regression spline. International Journal of Statistics and Probability, 3(3), 96–110.

Ruppert, D. (2002). Selecting the number of knots for penalized splines. Journal of Computational and Graphical Statistics, 11(4), 735–757.

Utami, T.W. (2013). Estimasi kurva regresi semiparametrik pada data longitudinal berdasarkan estimator polinomial lokal. Statistika, 1(1), 30–36.