

CAMBRIDGE MONOGRAPHS ON APPLIED AND COMPUTATIONAL MATHEMATICS

Series Editors
M. J. ABLOWITZ, S. H. DAVIS, E. J. HINCH, A. ISERLES, J. OCKENDON, P. J. OLVER

Learning Theory: An Approximation Theory Viewpoint


The Cambridge Monographs on Applied and Computational Mathematics reflect the crucial role of mathematical and computational techniques in contemporary science. The series publishes expositions on all aspects of applicable and numerical mathematics, with an emphasis on new developments in this fast-moving area of research.

State-of-the-art methods and algorithms as well as modern mathematical descriptions of physical and mechanical ideas are presented in a manner suited to graduate research students and professionals alike. Sound pedagogical presentation is a prerequisite. It is intended that books in the series will serve to inform a new generation of researchers.

Within the series will be published titles in the Library of Computational Mathematics, published under the auspices of the Foundations of Computational Mathematics organisation. Learning Theory: An Approximation Theory Viewpoint is the first title within this new subseries.

The Library of Computational Mathematics is edited by the following editorial board: Felipe Cucker (Managing Editor), Ron DeVore, Nick Higham, Arieh Iserles, David Mumford, Allan Pinkus, Jim Renegar, Mike Shub.

Also in this series:
A Practical Guide to Pseudospectral Methods, Bengt Fornberg
Dynamical Systems and Numerical Analysis, A. M. Stuart and A. R. Humphries
Level Set Methods, J. A. Sethian
The Numerical Solution of Integral Equations of the Second Kind, Kendall E. Atkinson
Orthogonal Rational Functions, Adhemar Bultheel, Pablo González-Vera, Erik Hendriksen, and Olav Njåstad
Theory of Composites, Graeme W. Milton
Geometry and Topology for Mesh Generation, Herbert Edelsbrunner
Schwarz–Christoffel Mapping, Tobin A. Driscoll and Lloyd N. Trefethen
High-Order Methods for Incompressible Fluid Flow, M. O. Deville, P. F. Fischer, and E. H. Mund
Practical Extrapolation Methods, Avram Sidi
Generalized Riemann Problems in Computational Fluid Dynamics, M. Ben-Artzi and J. Falcovitz
Radial Basis Functions, Martin Buhmann

Learning Theory: An Approximation Theory Viewpoint
FELIPE CUCKER, City University of Hong Kong
DING-XUAN ZHOU, City University of Hong Kong

CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press, The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521865593
© Cambridge University Press 2007

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published in print format 2007. ISBN-13 978-0-511-27407-7 eBook (EBL)

…University. We have written a number of papers together on various aspects of learning theory. It gives me great pleasure to continue to work with both mathematicians. I am proud of our joint accomplishments.

I leave to the authors the task of describing the contents of their book. I will give some personal perspective on and motivation for what they are doing. Computational science demands an understanding of fast, robust algorithms. The same applies to modern theories of artificial and human intelligence. Part of this understanding is a complexity-theoretic analysis. Here I am not speaking of a literal count of arithmetic operations (although that is a by-product…


Stephen Smale
Chicago

Preface

Broadly speaking, the goal of (mainstream) learning theory is to approximate a function (or some function features) from data samples, perhaps perturbed by noise. To attain this goal, learning theory draws on a variety of diverse subjects. It relies on statistics, whose purpose is precisely to infer information from random samples. It also relies on approximation theory, since our estimates of the function must belong to a prespecified class, and therefore the ability of this class to approximate the function accurately is of the essence. And algorithmic considerations are critical because our estimates of the function are the outcome of algorithmic procedures, and the efficiency of these procedures is crucial in practice. Ideas from all these areas have blended together to form a subject whose many successful applications have triggered its rapid growth during the past two decades.

This book aims to give a general overview of the theoretical foundations of learning theory. It is not the first to do so. Yet we wish to emphasize a viewpoint that has drawn little attention in other expositions, namely, that of approximation theory. This emphasis fulfills two purposes. First, we believe it provides a balanced view of the subject. Second, we expect to attract mathematicians working on related fields who find the problems raised in learning theory close to their interests.

While writing this book, we faced a dilemma common to the writing of any book in mathematics: to strike a balance between clarity and conciseness. In particular, we faced the problem of finding a suitable degree of self-containment for a book relying on a variety of subjects. Our solution to this problem consists of a number of sections, all called "Reminders," where several basic notions and results are briefly reviewed using a unified notation.

We are indebted to several friends and colleagues who have helped us in many ways. Steve Smale deserves a special mention. We first became interested in learning theory as a result of his interest in the subject, and much of the material in this book comes from or evolved from joint papers we wrote with him. Qiang Wu, Yiming Ying, Fangyan Lu, Hongwei Sun, Di-Rong Chen, Song Li, Luoqing Li, Bingzheng Li, Lizhong Peng, and Tiangang Lei regularly attended our weekly seminars on learning theory at City University of Hong Kong, where we exposed early drafts of the contents of this book. They, and José Luis Balcázar, read preliminary versions and were very generous in their feedback. We are indebted also to David Tranah and the staff of Cambridge University Press for their patience and willingness to help. We have also been supported by the University Grants Council of Hong Kong through the grants CityU 1087/02P, 10…


Figure 1.1 [figure]

Case 1.2 Case 1.1 readily extends to a classical situation in science, namely, that of learning a physical law by curve fitting to data. Assume that the law at hand, an unknown function f: R → R, has a specific form and that the space of all functions with this form can be parameterized by N real numbers. For instance, if f is assumed to be a polynomial of degree d, then N = d + 1 and the parameters are the unknown coefficients w_0, …, w_d of f. In this case, finding the best fit by the least squares method means estimating the unknown f from a set of pairs {(x_i, y_i)}_{i=1}^m by taking f_w(x) = Σ_{j=0}^d w_j x^j with w chosen so that

    Σ_{i=1}^m (f_w(x_i) − y_i)²

is minimized, where, typically, m > N. In general, the minimum value above is not 0. To solve this minimization problem, one uses the least squares technique, a method going back to Gauss and Legendre that is computationally efficient and relies on numerical linear algebra.

Since the values y_i are affected by noise, one might take as a starting point, instead of the unknown f, a family of probability measures ε_x on R varying with x ∈ R. The only requirement on these measures is that for all x ∈ R, the mean of ε_x is f(x). Then y_i is randomly drawn from ε_{x_i}. In some contexts the x_i, rather than being chosen, are also generated by a probability measure ρ_X on R. Thus, the starting point could even be a single measure ρ on R × R (capturing both the measure ρ_X and the measures ε_x for x ∈ R) from which the pairs (x_i, y_i) are randomly drawn.

A more general form of the functions in our approximating class could be given by f_w(x) = Σ_{i=1}^N w_i φ_i(x), where the φ_i are the elements of a basis of a specific function space, not necessarily of polynomials.

Case 1.3 The training of neural networks is an extension of Case 1.2. Roughly speaking, a neural network is a directed graph containing some input nodes, some output nodes, and some intermediate nodes where certain functions are computed. If X denotes the input space (whose elements are fed to the input nodes) and Y the output space (of possible elements returned by the output nodes…


…a certain gray scale of a digitized photograph of the handwritten letter or some features extracted from the letter. We may take Y to be

    Y = { y = Σ_{i=1}^{26} λ_i e_i ∈ R^{26} : Σ_{i=1}^{26} λ_i = 1, λ_i ≥ 0 },

where e_i is the ith coordinate vector in R^{26}, each coordinate corresponding to a letter. If A ⊂ Y is the set of points y as above such that λ_i ≤ 1 for i = 1, …, 26, one can interpret a point in A as a probability measure on the set {A, B, C, …, X, Y, Z}. The problem is to learn the ideal function f: X → Y that associates, to a given handwritten letter x, the linear combination of the e_i with coefficients Prob{x = A}, Prob{x = B}, …, Prob{x = Z}. Unambiguous letters are mapped into a coordinate vector, and in the (pure) classification problem f takes values on these e_i. Learning f means finding a sufficiently good approximation of f within a given prescribed class.

The approximation of f is constructed from a set of samples of handwritten letters, each of them with a label in Y. The set {(x_1, y_1), …


…known (the measure on [0, 1]^n inherited from the standard Lebesgue measure on R^n…


Fix x ∈ X and consider the function from Y to R mapping y into (y − f_ρ(x)). Since the expected value of this function is 0, its variance is

    σ²(x) = ∫_Y (y − f_ρ(x))² dρ(y|x).


Now average over X, to obtain

    σ²_ρ = ∫_X σ²(x) dρ_X = E(f_ρ).

The number σ_ρ is a measure of how well conditioned ρ is, analogous to the notion of condition number in numerical linear algebra.

Remark 1.7 (i) It is important to note that whereas ρ and f_ρ are generally unknown, ρ_X is known in some situations and can even be the Lebesgue measure on X inherited from Euclidean space (as in Cases 1.2 and 1.6…


…marginal measure of ρ on X is our original ρ_X. In addition, when σ_ρ = 0, the error specializes to the error mentioned in Case 1.5, and the regression function f_ρ of ρ coincides with the target function f_T except for a set of measure zero in X.

Hypothesis spaces and target functions

Learning processes do not take place in a vacuum. Some structure needs to be present at the beginning of the process. In our formal development, we assume that this structure takes the form of a class of functions (e.g., a space of polynomials, of splines, etc.)…


    E_z(f_j) − E_z(f_0) = (1/m) Σ_{i=1}^m ( f_j(x_i) + f_0(x_i) − 2y_i ) ( f_j(x_i) − f_0(x_i) )…


…essentially independent of f_ρ. Consequently, bounds for the sample error of f_z will not depend on properties of f_ρ. However, due to their dependence on the random sample z, they will hold with only a certain confidence. That is, the bounds will depend on a parameter δ and will hold with a confidence of at least 1 − δ.

This discussion extends to some algorithmic issues. Although dependence on the behavior of f_ρ seems unavoidable in the estimates of the approximation error (and hence on the generalization error E(f_z) of f_z…


As another example of overfitting, consider the PAC learning situation in Case 1.5 with C consisting of all subsets of R^n. Consider also a sample {(x_i, 1…


…[36]. In this book we will not go deeper into the details of PAC learning. A standard reference for this is [67]. Other (but not all) books dealing with diverse mathematical aspects of learning theory are [7, 29, 37, 57, 59, 61, 92, 95, 107, 111, 124, 125, 132, 133, 136, 137]. In addition, a number of scientific journals publish papers on learning theory. Two devoted wholly to the theory as developed in this book are Journal of Machine Learning Research and Machine Learning.

Finally, we want to mention that the exposition and structure of this chapter largely follow…


…addition, when ν = ρ_X we simply write ‖·‖_p instead of the more cumbersome ‖·‖_{L^p_ν}. Note that elements in L^p_ν(X) are classes of functions. In general, however, one abuses language and refers to them as functions on X. For instance, we say that f ∈ L^p_ν(X) is continuous when there exists a continuous function in the class of f.

The support of a measure ν on X is the smallest closed subset X_ν of X such that ν(X \ X_ν) = 0.

A function f: X → R is measurable when, for all a ∈ R, the set {x ∈ X | f(x) ≥ a} is a Borel subset of X.

The space L^∞(X) is defined to be the set of all measurable functions on X such that ‖f‖_{L^∞(X)} := …

…Thus, for any f ∈ L²(R^n), there exists a sequence (f_k)_{k≥1} ⊂ C_c^∞(R^n) such that ‖f_k − f‖ → 0 when k → ∞. One can prove that for any such sequence (f_k), the sequence (F(f_k)) converges to the same element in L²(R^n). We denote this element by F(f), and we say that it is the Fourier transform of f. The notation f̂ instead of F(f) is often used.

The following result summarizes the main properties of F: L²(R^n) → L²(R^n).

Theorem 2.3 (Plancherel theorem)
(i) F(f)(w) = lim_{R→∞} ∫_{[−R,R]^n} e^{−i w·x} f(x) dx, where the convergence is for the norm in L²(R^n).
(ii) ‖F(f)‖ = (2π)^{n/2} ‖f‖.
(iii) f(x) = lim_{R→∞} (2π)^{−n} ∫_{[−R,R]^n} e^{i w·x} F(f)(w) dw, where the convergence is for the norm in L²(R^n). If f ∈ L¹(R^n) ∩ L²(R^n), then the convergence holds almost everywhere.
(iv) The map F: L²(R^n) → L²(R^n) is an isomorphism of Hilbert spaces.

(III) Our third reminder is about compactness.

It is well known that a subset of R^n is compact if and only if it is closed and bounded. This is not true for subsets of C(X). Yet a characterization of compact subsets of C(X) in similar terms is still possible.

A subset…


…equicontinuous. The fact that every closed ball in R^n is compact is not true in Hilbert spaces. However, we will use the fact that closed balls in a Hilbert space H are weakly compact. That is, every sequence (f_n)_{n∈N} in a closed ball in H has a weakly convergent subsequence (f_{n_k})_{k∈N}; in other words, there is some f ∈ H such that

    lim_{k→∞} ⟨f_{n_k}, g⟩ = ⟨f, g⟩,  for all g ∈ H.

(IV) We close this section with a discussion of completely monotonic functions. This discussion is on a less general topic than the preceding contents of these reminders.

A function f: [0, ∞) → R is completely monotonic if it is continuous on [0, ∞)… Notice that when s = 0, the inner product above coincides with that of L²(X); that is, ‖·‖_0 = ‖·‖. We define the Sobolev space H^s(X) to be the completion of C^∞(X) with respect to the norm ‖·‖_s. The Sobolev embedding theorem asserts that for all r ∈ N and all s > n/2 + r, the inclusion J_s: H^s(X) ↪ C^r(X)…


…K(x, x) K(t, t).

For x ∈ X, we denote by K_x the function

    K_x: X → R,  t ↦ K(x, t).

The main result of this section is given in the following theorem.

Theorem 2.9 There exists a unique Hilbert space (H_K, ⟨·,·⟩_{H_K}) of functions on X satisfying the following conditions:
(i) for all x ∈ X, K_x ∈ H_K;
(ii) the span of the set {K_x | x ∈ X} is dense in H_K; and
(iii) for all f ∈ H_K and x ∈ X, f(x) = ⟨K_x, f⟩_{H_K}.
Moreover, H_K consists of continuous functions, and the inclusion I_K: H_K → C(X) is bounded with ‖I_K‖ ≤ √C_K.

Proof. Let H_0 be the span of the set {K_x | x ∈ X}. We define an inner product in H_0 as

    ⟨f, g⟩ = Σ_{i,j} a_i b_j K(x_i, t_j)  for f = Σ_i a_i K_{x_i} and g = Σ_j b_j K_{t_j}…
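These definitions can be exercised numerically on the span H_0. The sketch below assumes a Gaussian kernel and random centers (illustrative choices) and checks a consequence of (iii) together with Cauchy–Schwarz: |f(x)| = |⟨K_x, f⟩_K| ≤ √(K(x, x)) ‖f‖_K, with ‖f‖²_K = aᵀ K[x] a for f = Σ_i a_i K_{x_i}.

    import numpy as np

    def K(x, t, sigma=0.5):                  # a Mercer kernel on R (assumed example)
        return np.exp(-(x - t) ** 2 / sigma ** 2)

    rng = np.random.default_rng(2)
    xs = rng.uniform(-1, 1, 6)               # centers x_i
    a = rng.normal(size=6)                   # coefficients a_i

    def f(s):                                # f = sum_i a_i K_{x_i}, an element of H_0
        return a @ K(xs, s)

    G = K(xs[:, None], xs[None, :])          # Gram matrix K(x_i, x_j)
    norm_f = np.sqrt(a @ G @ a)              # ||f||_K on the span

    grid = np.linspace(-1, 1, 401)
    sup = max(abs(f(s)) for s in grid)
    print(sup <= norm_f + 1e-12)             # True, since K(x, x) = 1 here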


…satisfy

    (x · y)^d = Σ_{|α|=d} C(d, α) x^α y^α,  for x, y ∈ R^n.

Let {x_1, …, x_k} ⊂ X. Then, for all c_1, …, c_k ∈ R,

    Σ_{i,j=1}^k c_i c_j K(x_i, x_j) = Σ_{|α|=d} C(d, α) ( Σ_{i=1}^k c_i x_i^α )² ≥ 0.

Therefore, K is a Mercer kernel.
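The same positivity is easy to observe numerically: the sketch below draws arbitrary points (dimension, degree, and sample are test choices) and checks that the Gram matrix of K(x, y) = (x · y)^d has no significantly negative eigenvalue.

    import numpy as np

    rng = np.random.default_rng(3)
    d, n, k = 3, 4, 8
    X = rng.normal(size=(k, n))          # points x_1, ..., x_k in R^n
    G = (X @ X.T) ** d                   # Gram matrix of K(x, y) = (x . y)^d

    print(np.linalg.eigvalsh(G).min() >= -1e-9)   # True: positive semidefinite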


An explicit example is the linear polynomial kernel.

Example 2.12 Let X be a subset of R containing at least two points, and K the Mercer kernel on X given by K(x, y) = 1 + xy. Then H_K is the space of linear functions, and {1, x} forms an orthonormal basis of H_K.

Proof. Note that for a ∈ X, K_a is the function 1 + ax of the variable x ∈ X. Take a ≠ b ∈ X. By the definition of the inner product in H_K,

    ‖K_a − K_b‖²_K = ⟨K_a − K_b, K_a − K_b⟩_K = K(a, a) − 2K(a, b) + K(b, b)
                   = 1 + a² − 2(1 + ab) + 1 + b² = (a − b)²…


…(2π)^{−n} ∫ φ̂(ξ) | Σ_j c_j e^{−i x_j·ξ} |² dξ ≥ 0, … = Σ_{j,ℓ} c_j c_ℓ K(x_j, x_ℓ), where |·| means the module in C and z̄ is the complex conjugate of z. Thus, K is a Mercer kernel on any subset of R^n.

Example 2.15 (A spline kernel) Let φ be the univariate function supported on [−2, 2] given by φ(x) = 1 − |x|/2 for −2 ≤ x ≤ 2. Then the kernel K defined by K(x, y) = φ(x − y) is a Mercer kernel on any subset of R.

Proof. One can easily check that 2φ(x) equals the convolution of the characteristic function χ_{[−1,1]} with itself. But χ̂_{[−1,1]}(ξ) = 2 sin(ξ)/ξ. Thus, φ̂(ξ) = 2 (sin(ξ)/ξ)²…

Let Ξ = [ξ_1 ξ_2 … ξ_b] be an n × b matrix where [ξ_1 … ξ_n] is invertible. Choose φ(x) = (M_Ξ)(x) to be the box spline with direction set Ξ. Then, for all w ∈ R^n,

    φ̂(w) = Π_{j=1}^b sin(ξ_j·w/2) / (ξ_j·w/2) …,

and the kernel K(x, y) = φ(x − y) is a Mercer kernel on any subset X of R^n.

An interesting class of translation invariant kernels is provided by radial basis functions. Here the kernel takes the form K(x, y) = f(‖x − y‖²) for a univariate function f on [0, ∞…
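Both halves of this argument can be tested numerically for the spline kernel above: quadrature reproduces φ̂(ξ) = 2(sin ξ/ξ)², and Gram matrices built from φ are positive semidefinite. The test points below are arbitrary.

    import numpy as np

    def phi(u):                                  # phi(x) = 1 - |x|/2 on [-2, 2]
        return np.maximum(1.0 - np.abs(u) / 2.0, 0.0)

    xs = np.linspace(-2, 2, 4001)
    for xi in (0.7, 1.9, 3.3):                   # check phi_hat = 2*(sin(xi)/xi)^2
        num = np.trapz(phi(xs) * np.exp(-1j * xi * xs), xs).real
        print(np.isclose(num, 2 * (np.sin(xi) / xi) ** 2, atol=1e-4))

    rng = np.random.default_rng(4)
    x = rng.uniform(-5, 5, 30)
    G = phi(x[:, None] - x[None, :])             # Gram matrix of K(x, y) = phi(x - y)
    print(np.linalg.eigvalsh(G).min() >= -1e-9)  # True: a Mercer kernel in practice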


…dν(σ).

Proof. By Proposition 2.5, there is a finite Borel measure ν on [0, ∞) for which … for all t ∈ [0, ∞…


    Let ? R G (-nH< andK: ? ! ? -(!/ t) (x t)d,Let alo: ? &w

    ! =3a(;d' K(x# t) ' Cdx ata,WaWRdKaK?dCdxawa/ 7e kno7 that the .eylR Y Cdxata' 6Kx/ Kt&K,On the other hand/ ince Kx(w) ' ^=a='d inner product of Kxand Ktati;e(Kx#Kt


For f ∈ B_R and y, y′ ∈ Y with d(y, y′) ≤ δ, we have |f(y) − f(y′)| …

Let X be compact and K: X × X → R be a Mercer kernel. By Proposition 2.…, … space. Here and in what follows, B_R denotes the closed ball of radius R centered on the origin.

Reminders II

The general nonlinear programming problem is the problem of finding x ∈ R^n to solve the following minimization problem:

    min f(x)   s.t.  g_i(x) ≤ 0,  i = 1, …, m.  (2.2)

…, that is, f_ρ = L_K^{θ/2}(g) for some g ∈ L²; then A(f_ρ, R) ≤ 2^{1+θ} ‖g‖^{1+θ} R^{−θ}. Conversely, if ρ_X is nondegenerate and A(f_ρ, R) ≤ C R^{−θ} for some constants C and θ, then f_ρ lies in the range of L_K^{(θ−ε)/2} for all ε > 0.

Although Theorem 4.1 may be applied to spline kernels (see Section 4.6), … unless f_ρ is C^∞ itself. Instead, also in Chapter 6, we derive logarithmic orders like A(f_ρ, R) = O((log R)^{−θ}) for analytic kernels and Sobolev smooth regression functions.

4.1 Reminders III

We recall some basic facts about Hilbert spaces.

A sequence {φ_n}_{n≥1} in a Hilbert space H is said to be a complete orthonormal system (or an orthonormal basis) if the following conditions hold:
(i) for all n ≠ m ≥ 1, ⟨φ_n, φ_m⟩ = 0;
(ii) for all n ≥ 1, ‖φ_n‖ = 1; and
(iii) for all f ∈ H, f = Σ_{n≥1} ⟨f, φ_n⟩ φ_n.
A sequence satisfying (i) and (ii) only is said to be an orthonormal system. The numbers ⟨f, φ_n⟩ are the Fourier coefficients of f in the basis {φ_n}_{n≥1}. It is easy to see that these coefficients are unique since, if f = Σ_n a_n φ_n, then a_n = ⟨f, φ_n⟩ for all n ≥ 1.

Theorem 4.2 (Parseval theorem) If {φ_n} is an orthonormal system of a Hilbert space H, then, for all f ∈ H, Σ_n ⟨f, φ_n⟩² ≤ ‖f‖². Equality holds for all f ∈ H if and only if {φ_n} is complete.
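A finite-dimensional illustration of Theorem 4.2: in H = R^100 with an orthonormal basis obtained by QR factorization of a random matrix (an assumed example), the Fourier coefficients satisfy Bessel's inequality for a partial system and Parseval's equality for the complete one.

    import numpy as np

    rng = np.random.default_rng(5)
    N = 100
    Q, _ = np.linalg.qr(rng.normal(size=(N, N)))  # columns: complete orthonormal system
    f = rng.normal(size=N)

    coeffs = Q.T @ f                              # Fourier coefficients <f, phi_n>
    print(np.isclose(np.sum(coeffs ** 2), f @ f)) # equality: the system is complete
    print(np.sum(coeffs[:40] ** 2) <= f @ f)      # Bessel inequality for 40 vectors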

We defined compactness of an operator in Section 2.3. We next recall some other basic properties of linear operators and a main result for operators satisfying them.

Definition 4.3 A linear operator L: H → H on a Hilbert space H is said to be self-adjoint if, for all f, g ∈ H, ⟨Lf, g⟩ = ⟨f, Lg⟩. It is said to be positive (respectively, strictly positive) if it is self-adjoint and, for all nontrivial f ∈ H, ⟨Lf, f⟩ ≥ 0 (respectively, ⟨Lf, f⟩ > 0)…

…L²_ν(X), which, abusing notation, we also denote by L_K.

The function K is said to be the kernel of L_K, and several properties of L_K follow from properties of K. Recall the definitions of C_K and K_x introduced in Section 2.4.

Proposition 4.5 If K is continuous, then L_K: L²_ν(X) → C(X) is well defined and compact. In addition, ‖L_K‖ ≤ √(ν(X)) C_K. Here ν(X) denotes the measure of X.

Proof. To see that L_K is well defined, we need to show that L_K f is continuous for every f ∈ L²_ν(X). To do so, we consider f ∈ L²_ν(X) and x_1, x_2 ∈ X. Then


    |(L_K f)(x_1) − (L_K f)(x_2)| ≤ …


Theorem 2.9 then shows that φ_k ∈ H_K. In fact, …, ν(X) < ∞.

We shall assume, without loss of generality, that λ_k > 0 for all k ≥ 1. Using the eigenfunctions φ_k, we can find an orthonormal system of the RKHS H_K.

Theorem 4.8 Let ν be a Borel measure on X, and K: X × X → R a Mercer kernel. Let λ_k be the kth eigenvalue of L_K, and φ_k the corresponding orthonormal eigenfunction. Then {√λ_k φ_k : λ_k > 0} forms an orthonormal system in H_K.

Proof. We apply the reproducing property stated in Theorem 2.9. Assume i, j ≥ 1…

    ⟨√λ_i φ_i, √λ_j φ_j⟩_K = … = δ_{i,j}.  Moreover, Σ_{k≥1} λ_k φ_k(x)² = K(x, x) ≤ C_K,

hence the series Σ_{k≥1} λ_k φ_k(x)² converges. This is true for each point x ∈ X.

    | Σ_{k=m}^{m′} λ_k φ_k(x) φ_k(t) | ≤ ( Σ_{k=m}^{m′} λ_k φ_k(x)² )^{1/2} ( Σ_{k=m}^{m′} λ_k φ_k(t)² )^{1/2}
                                       ≤ √C_K ( Σ_{k=m}^{m′} λ_k φ_k(x)² )^{1/2}.

Now we fix a point x ∈ X. When the basis {φ_k}_{k≥1} has infinitely many functions, the estimate above, together with the Cauchy–Schwarz inequality, tells us that for each t ∈ X, Σ_{k≥m} λ_k φ_k(x) φ_k(t) → 0 …

    ∫ K(x, y) f(y) dν,

which tends to zero uniformly (for t ∈ X…


    t< con8erge a+olutely and uniformly on?to a continuou function gx, On the other hand/ aa function in L6(?


…and H_K. In addition, considered as an operator on L²_ν(X), L_K^{1/2} is the square root of L_K in the sense that L_K = L_K^{1/2} ∘ L_K^{1/2} (hence the notation L_K^{1/2})…


…


…L_K^{r}(L²_ν(X)) …

Proof of Theorem 4.1 Take … as in Corollary 4.17 and r = ….

If f_ρ ∈ Range(L_K^{θ/2}…




…


    ‖f^{(R)} − f‖_{L²} ≤ 2 K(f, R) ≤ 2C …

This proves the statement and, with it, the second claim.

4.7 References and additional remarks

For a proof of the spectral theorem for compact operators see, for example, [73] and Section 4.1 of [4]. Mercer's theorem was originally proved [85] for X = [0, 1] and ν the Lebesgue measure. Proofs for this simple case can also be found in [63, 73].

Theorems 4.1 and 4.12 are for general nondegenerate measures ν on a compact space X. For an extension to a noncompact space X see [123].

The map Φ in Theorem 4.14 is called the feature map in the literature on learning theory [37, 107, 134]. More general characterizations for the decay of the approximation error being of type O(ψ(R)) with ψ decreasing on (0, ∞) can be derived from the literature on approximation theory (e.g., [87, 94]) by means of K-functionals and moduli of smoothness. For interpolation spaces see [16].

RKHSs generated by general spline kernels are described in [137]. In the proof of Example 4.19 we have used a standard technique in approximation theory (see [78]…


…then |E(f) − E(…)| ≤ …
(i) If f is C¹ on [−M, M] and its derivative satisfies (…), then |E(f) − E(…)| ≤ …
(ii) If f′(u) ≥ c > 0 for every u ∈ [−M, M], then E(f) − E(…) ≥ …

Estimating covering numbers

The bounds for the sample error described in Chapter 3 are in terms of, among other quantities, some covering numbers. In this chapter, we provide estimates for these covering numbers when we take a ball in an RKHS as a hypothesis space. Our estimates are given in terms of the regularity of the kernel. As a particular case, we obtain the following.

Theorem 5.1 Let X be a compact subset of R^n, and Diam(X) := max_{x,y∈X} ‖x − y‖ its diameter.
(i) If K ∈ C^s(X × X) for some s > 0 and X has piecewise smooth boundary, then there is C > 0 depending on X and s only such that

    ln N(I_K(B_R), η) ≤ C (R/η)^{2n/s},  for 0 < η ≤ R/2.

(ii) If K(x, y) = e^{−‖x−y‖²/σ²} for some σ > 0, then, for all 0 < η ≤ R/2, ln N(I_K(B_R), η) is bounded above by a constant multiple of (ln(R/η))^{n+1}, the constant depending only on n, σ, and Diam(X). If, moreover, X contains a cube, in the sense that x_0 + [−ℓ, ℓ]^n ⊆ X for some x_0 ∈ X and ℓ > 0, then, for all 0 < η ≤ R/2, a lower bound of the same logarithmic type holds:

    ln N(I_K(B_R), η) ≥ C′ (ln(R/η))^{…}.

Here, C′ is a positive constant depending only on σ and ℓ.

    4art (i< of TheoremG,H follo7 from Theorem G,G and LemmaG,I, 't ho7 ho7 theco8ering num+er decreae a the inde! of the So+ole8 mooth kernel increae,$cae7here the hypothei of 4art (ii< applie i that of the +o! pline kernel decri+ed in )!ample

    A,HE,.e ho7 thi i o in 4ropoitionG,AG,.hen the kernel i analytic/ +etter than So+ole8 moothne for any inde! _ / one can

    ee from 4art (i< that ln /(IK(r)#n< decay at a rate fater than (Rn)efor any e 4, 0enceone 7ould e!pect a decay rate uch a (ln(Rv


…bigger the index s, the higher the regularity of f.

(II) When X is a subset of a Euclidean space R^n, we can consider a more general measure of regularity. This can be done by means of various orders of divided differences. Let X be a closed subset of R^n and f: X → R. For r ∈ N, t ∈ R^n, and x ∈ X such that x, x + t, …, x + rt ∈ X, define the divided difference

    Δ_t^r f(x) = Σ_{j=0}^{r} (−1)^{r−j} C(r, j) f(x + jt).

In particular, when r = 1, Δ_t f(x) = f(x + t) − f(x). Divided differences can be used to characterize various types of function spaces. Let X_{r,t} = {x ∈ X | x, x + t, …, x + rt ∈ X}. … Lip*(s, L^p(X)) …, Lip*(1, C(X)) being known as the Zygmund class.

Again, the regularity of a function f ∈ Lip*(s, L^p(X)) is measured by the index s. The bigger the index s, the higher the regularity of f.

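The divided difference above is immediate to compute; a useful sanity check (and a classical fact) is that Δ_t^r annihilates polynomials of degree less than r.

    import numpy as np
    from math import comb

    def divided_difference(f, x, t, r):
        # Delta_t^r f(x) = sum_{j=0}^r (-1)^(r-j) * C(r, j) * f(x + j*t)
        return sum((-1) ** (r - j) * comb(r, j) * f(x + j * t) for j in range(r + 1))

    f = lambda u: 3 * u**2 - u + 2                 # a degree-2 polynomial
    print(divided_difference(f, 0.3, 0.1, 1))      # r = 1: f(x + t) - f(x)
    print(divided_difference(f, 0.3, 0.1, 3))      # r = 3 > deg f: 0 up to rounding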


When p = 2 and X = R^n, it is also natural to measure the regularity of functions in L²(R^n) by means of the Fourier transform. Let s > 0. The fractional Sobolev space H^s(R^n) consists of the functions in L²(R^n) such that the following norm is finite: …

5.2 Covering numbers for Sobolev smooth kernels

Recall that if E is a Banach space and R > 0, we denote B_R(E) = {x ∈ E : ‖x‖ ≤ R}. If the space E is clear from the context, we simply write B_R.

Lemma 5.2 Let E ⊂ C(X) be a Banach space. For all η, R > 0, N(B_R, η) = N(B_1, η/R)…


…asymptotic behavior of the covering numbers for Sobolev spaces is a well-known result. For example, the ball B_R(Lip*(s, C([0, 1]^n)))…


…


Combining this inequality with the fact (cf. Theorem 2.9…


…we conclude that f ∈ Lip*(s/2, C(X)) and

    ‖f‖_{Lip*(s/2)} ≤ … ‖K‖_{Lip*(s)}^{1/2} ‖f‖_K.

The proof of the theorem is complete.

Lemma 5.6 Let n ≥ 1, s > 0, T ≥ 1, and x_0 ∈ R^n. If X ⊂ x_0 + [−T, T]^n and X has piecewise smooth boundary, then

    N(B_R(Lip*(s, C(X))), η) ≤ N(B_{CR}(Lip*(s, C([0, 1]^n))), η) …

Recall that C^s(X) ⊂ Lip*(s, C(X)) for any s > 0. Then, by Theorem 5.5 with s ≤ r < s + 1, the assumption K ∈ C^s(X × X) implies that I_K(B_R) ⊂ B_{R′}(Lip*(s/2, C(X))). This, together with (5.2) and Lemma 5.6 with T = Diam(X)…


    n&s!t!"""!sJYlJt . t 2 tl . t3't i eay to check thatatify thee condition,

    ^st=st .H< ...=st . H


…


It follows that …


…

(i) Denote t_0 = (1/2, …, 1/2) ∈ [0, 1]^n. Let g = Σ_{i=1}^m c_i K_{x_i} ∈ I_K(B_R). Then … the last line by the definition of K̃. Since g ∈ I_K(B_R), we have …

If {f_1, …, f_N} is an η-net of I_{K̃}(B_R) on [0, 1]^n with N = N(I_{K̃}(B_R), η), then there is some j ∈ {1, …, N} such that

    sup_{t ∈ [0,1]^n} | Σ_{i=1}^m c_i K̃(t_i, t) − f_j(t) | ≤ η.

This means that

    sup_{x ∈ X} | Σ_{i=1}^m c_i K_{x_i}(x) − f_j((x − x_0)/Δ + t_0) | ≤ η.

Take t = (x − x_0)/Δ + t_0. When x ∈ X ⊂ x_0 + [−Δ/2, Δ/2]^n, we have t ∈ [0, 1]^n. Hence the bound above holds. This shows that if we define f_j^*(x) := f_j((x − x_0)/Δ + t_0), the set {f_1^*, …, f_N^*} is an η-net of the function set {Σ_i c_i K_{x_i}} ⊂ I_K(B_R) in C(X). Since this function set is dense in I_K(B_R), we have N(I_K(B_R), η) ≤ N = N(I_{K̃}(B_R), η).

(ii) If X ⊃ x_0 + [−Δ/2, Δ/2]^n and {g_1, …, g_N} is an η-net of I_K(B_R) with N = N(I_K(B_R), η),


then, for each g ∈ I_{K̃}(B_R), we can find some j ∈ {1, …, N} such that ‖g − g_j‖_{C(X)} < η. Let f = Σ_{i=1}^m c_i K̃_{t_i} ∈ I_{K̃}(B_R). Then, for any t ∈ [0, 1]^n,

    f(t) = Σ_{i=1}^m c_i K̃(t, t_i) = Σ_{i=1}^m c_i K(Δ(t − t_i)) = Σ_{i=1}^m c_i K_{x_i}(x),

where x = x_0 + Δ(t − t_0) ∈ X and x_i = x_0 + Δ(t_i − t_0) ∈ X. It follows from this expression that

    ‖f‖²_{K̃} = Σ_{i,j=1}^m c_i c_j K̃(t_i, t_j) = Σ_{i,j=1}^m c_i c_j K(x_i, x_j) = ‖g‖²_K ≤ R²,

where g = Σ_{i=1}^m c_i K_{x_i}. So g ∈ I_K(B_R), and we have

    ‖g − g_j‖_{C(X)} = sup_{x ∈ X} |g(x) − g_j(x)…



…Combining the two bounds above, we have … Since …


…



…approximation error.

Definition 5.19 Let x = {x_1, …, x_m} ⊂ X. We say that {u_i}_{i=1}^m is a set of nodal functions associated with the nodes x_1, …, x_m if u_i ∈ span{K_{x_1}, …, K_{x_m}} and u_i(x_j) = δ_{ij}…


for all η > 0 satisfying …

Proof. By Proposition 5.20, the set of nodal functions {u_j}_{j=1}^m associated with x exists and can be expressed by

    u_i(x) = Σ_{j=1}^m (K[x]^{-1})_{ij} K_{x_j}(x)…


…Bounding from below the integral over the subset [−π/Δ, π/Δ]^n, we see that

    cᵀ K[x] c ≥ (2π)^{−n} … inf_ξ φ̂(ξ) ‖c‖² …

It follows that the smallest eigenvalue of the matrix K[x] is at least … inf φ̂(ξ), from which the estimate for the norm of the inverse matrix follows.

Combining Theorem 5.21 and Proposition 5.22, we obtain the following result.

Theorem 5.23


Proof. By Example 2.17, the Fourier transform φ̂ of φ(x) = (M_Ξ)(x) satisfies … To get the smoothness of K, we estimate the decay of φ̂. First we observe that the function t ↦ (sin t)/t satisfies, for all t ∈ (−1, 1)…

…for the extension of function classes on a bounded domain X to the corresponding classes on R^n.

Estimating covering numbers for various function spaces is a standard theme in the fields of function spaces [48] and approximation theory [85, 35]. The upper and lower bounds (5.2) for generalized Lipschitz spaces and, more generally, Triebel–Lizorkin spaces can be found in [48].


The upper bounds for covering numbers of balls of RKHSs associated with Sobolev smooth kernels described in Section 5.2 (Theorem 5.5) and the lower bounds given in Section 5.4 (Theorem 5.21) can be found in [156]. The bounds for analytic translation invariant kernels discussed in Section 5.3 are taken from [155].

The bounds (5.23) for the Fourier transform of the inverse multiquadrics can be found in [52] and [105], where properties of nodal functions and Proposition 5.20 can also be found.

For estimates of smoothness of general box splines sharper than those in Proposition 5.25, see [41].

Logarithmic decay of the approximation error

In Chapter 4 we characterized the regression functions and kernels for which the approximation error has a decay of order O(R^{−θ}). This characterization was in terms of the integral operator L_K and interpolation spaces. In this chapter we continue this discussion.

We first show, in Theorem 6.2, that for a C^∞ kernel K (and under a mild condition on ρ_X) the approximation error can decay as O(R^{−θ}) only if f_ρ is C^∞ as well. Since the latter is too strong a requirement on f_ρ, we now focus on regression functions and kernels for which a logarithmic decay in the approximation error holds.

Proof. Since ρ_X dominates the Lebesgue measure, we have that ρ_X is nondegenerate. Hence, H = H_K by Remark 4.18. By Corollary 4.17, our decay assumption implies that f_ρ ∈ (L²_{ρ_X}(X), H_K)_{θ/(2+θ)}. … To do so, we take r ∈ N, r > 2s(2 + θ)/θ > 0, and t ∈ R^n. Let g ∈ H_K and x ∈ X_{r,t}. Then

    Δ_t^r f_ρ(x) = Δ_t^r (f_ρ − g)(x) + Δ_t^r g(x) = Σ_{j=0}^r (−1)^{r−j} C(r, j) (f_ρ − g)(x + jt) + Δ_t^r g(x).

Let … = 2s(2 + θ)/θ. Using the triangle inequality and the definition of Lip*…,

    …

Lemma 8.13 For all γ > 0, ‖f_γ‖_K ≤ √(T(γ)/γ) and ‖f_γ‖_∞ ≤ C_K √(T(γ)/γ).

Proof. Since f_γ is a minimizer of (8.2…


…2c log(1/δ) … Since ‖f‖_∞ ≤ C_K ‖f‖_K ≤ C_K R and |f_ρ(x)| ≤ M almost everywhere, we find that…


…Since an η/(2(M + C_K R))-covering of B_R yields an η-covering of F_R, and vice versa, we see that for any η > 0, an η/(2(M + C_K R))-covering of B_R provides an η-covering of F_R. That is,

    N(F_R, η) ≤ N(B_R, η/(2(M + C_K R))).

But R ≥ M and 2(1 + C_K) ≤ (C_K + 3…


    (KBMA IM< log (ACBing the6uantity *_(m# ;%YR$v_(%# h$) K# loKAC 3M HT(Y)(AKMA 3M< log (ACG;M J;!the lat almot urely, Therefore/ WWfzWWX MCCy for almot all z eA%!

    Lemma B,HE ay that D(9.UV) ' A%up to a et of meaure zero (7e ignore thi null etlater


‖f_z‖ ≤ a_m R + b_m for all z ∈ D(R) \ V_R, with a_m and b_m as given in our statement. In other words, D(R) \ V_R ⊂ D(a_m R + b_m…


Then, there exist real numbers p, q, not both zero, such that

    p ∇F(c) + q ∇H(c) = 0.  (8.9)

…lem of minimizing over {f ∈ H_K : ‖f‖_K ≤ R}. By Corollary …, we can minimize over {f ∈ H_{K,z} : ‖f‖_K ≤ R} and take

    f = Σ_{i=1}^m c_i K(x_i, ·) …

    c_z = (c_{z,1}, …, c_{z,m}) = argmin_{c ∈ R^m} … cᵀ K[x] c …

Since


    c_z = (c_{z,1}, …, c_{z,m}) = argmin_{c ∈ R^m} (1/m) Σ_{i=1}^m ( y_i − Σ_{j=1}^m K(x_i, x_j) c_j )² + γ cᵀ K[x] c.

For each i = 1, …, m, Σ_j K(x_i, x_j) c_j = (K[x] c)_i is a convex function of c ∈ R^m. In addition, since K is a Mercer kernel, the Gramian matrix K[x] is positive semidefinite. Therefore, the function c ↦ cᵀ K[x] c is convex. Thus, c_z is the minimizer of a convex function.

Regularized classifiers associated with general loss functions are discussed in the next chapter. In particular, we show there that the least squares loss yields a satisfactory algorithm from the point of view of convergence rates in its error analysis. Here we restrict our exposition to a special loss, called the hinge loss,

    φ(t) = (1 − t)_+ = max{1 − t, 0}.  (9.7)

…
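A minimal sketch combining the two ingredients above, on assumed toy data: the coefficients of the regularized least squares scheme have the closed form c = (K[x] + γ m I)^{-1} y (the minimizer of the convex problem just described), and the hinge loss φ(t) = (1 − t)_+ can then be used to score the resulting classifier sgn(f_z).

    import numpy as np

    rng = np.random.default_rng(7)

    def K(x, t, sigma=0.7):
        return np.exp(-(x - t) ** 2 / sigma ** 2)

    m = 80
    x = rng.uniform(-2, 2, m)
    y = np.sign(np.sin(1.5 * x) + 0.3 * rng.normal(size=m))  # labels in {-1, +1}

    gamma = 0.01
    G = K(x[:, None], x[None, :])                  # Gramian K[x], PSD
    c = np.linalg.solve(G + gamma * m * np.eye(m), y)

    f = G @ c                                      # f_z at the sample points
    hinge = np.maximum(1.0 - y * f, 0.0)           # phi(y_i f_z(x_i)) = (1 - t)_+
    print("empirical hinge risk:", hinge.mean())
    print("training sign error:", np.mean(np.sign(f) != y))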


Take … and ε = 2(log m)^{…}/m. Then, for m ≥ max{4C², 3}, … and log log m ≤ 2 log m. It follows that … Since … is decreasing, we have

    ε(m, δ) ≤ ε* ≤ 2 (log m)^{…}/m.

In order to obtain estimates for the misclassification error from the generalization error, we need to compare R with E. This is simple.

Theorem 9.21 For any measure ρ and any measurable function f: X → R,

    R(sgn(f)) − R(f_c) ≤ √( E(f) − E(f_ρ) ).

Proof. Denote X_c = {x ∈ X | sgn(f)(x) ≠ f_c(x)}. By the definition of the misclassification error,

    R(sgn(f)) − R(f_c) = ∫_{X_c} Prob(y ≠ sgn(f)(x) | x) − Prob(y ≠ f_c(x) | x) dρ_X.

For a point x ∈ X_c, we know that Prob(y = sgn(f)(x) | x) = Prob(y ≠ f_c(x) | x). Hence, Prob(y ≠ sgn(f)(x) | x) − Prob(y ≠ f_c(x) | x) equals f_ρ(x) or −f_ρ(x) according to whether f_ρ(x) > 0 or not. It follows that

    |f_ρ(x)| = Prob(y ≠ sgn(f)(x) | x) − Prob(y ≠ f_c(x) | x)

and, therefore,

    R(sgn(f)) − R(f_c) = ∫_{X_c} |f_ρ(x)| dρ_X.  (9.14)
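The identity (9.14) and the comparison bound can be checked by quadrature on a toy model, with ρ_X uniform on [−1, 1] and f_ρ(x) = x (so f_c = sgn); the candidate f is arbitrary. All choices here are assumptions for illustration.

    import numpy as np

    x = np.linspace(-1, 1, 200001)           # quadrature grid; rho_X has density 1/2
    w = (x[1] - x[0]) / 2.0

    f_rho = x                                # regression function; f_c = sgn(f_rho)
    f = 0.4 * np.cos(2.0 * x) - 0.3          # some measurable f: X -> R

    Xc = np.sign(f) != np.sign(f_rho)        # where sgn(f) differs from f_c
    lhs = np.sum(np.abs(f_rho)[Xc]) * w      # R(sgn f) - R(f_c), by (9.14)
    rhs = np.sqrt(np.sum((f - f_rho) ** 2) * w)   # sqrt(E(f) - E(f_rho))
    print(lhs, rhs, lhs <= rhs)              # the bound holds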