examples of hilbert spaces lecture 11a

Examples of Hilbert spaces

Lecture 11A.

MA 751, Part 4

Measurability and Hilbert Spaces

1. Some set theory in $\mathbb{R}^p$

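Def. 1: A ball in $\mathbb{R}^p$ with center $\mathbf{a}$ and radius $\varepsilon > 0$ is a set

$$B_\varepsilon(\mathbf{a}) = \{\mathbf{x} \in \mathbb{R}^p : \|\mathbf{x} - \mathbf{a}\| < \varepsilon\}$$

of points within $\varepsilon$ of the fixed point $\mathbf{a}$.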

Def. 2: A set $G \subseteq \mathbb{R}^p$ is open if it is a union of balls in $\mathbb{R}^p$.

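Def. 3: Given a set $G \subseteq \mathbb{R}^p$, the boundary $\partial G$ of $G$ is the set of all points $\mathbf{x} \in \mathbb{R}^p$ such that every ball centered at $\mathbf{x}$ contains points in $G$ and also points in the complement $\mathbb{R}^p \setminus G$.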

[An open set can also be defined as a set that contains none of its own boundary points.]

Def. 4: A set $F \subseteq \mathbb{R}^p$ is closed if it contains its boundary.

[A closed set can also be defined as a set whose complement in $\mathbb{R}^p$ is open.]

Def. 5: A set $R \subseteq \mathbb{R}^p$ is a region if it consists of an open set $G$ together with some part (or maybe none) of its boundary.

2. Measurable functions and sets

Let $C$ be the set of continuous functions on $\mathbb{R}^p$. Let $\mathcal{M}$ be the set of measurable functions, defined below.

Def. 6: A subset $A \subseteq \mathbb{R}^p$ has measure 0 if it can be covered by arbitrarily small balls.

That is, for any number $\varepsilon > 0$, no matter how small, there is a set of open balls $B_1, B_2, \dots$ whose union contains $A$ but whose volumes add up to less than $\varepsilon$.
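As a small illustration (a sketch, not part of the original notes): any countable set has measure 0. The code below covers the points $1, 1/2, 1/3, \dots$ of $\mathbb{R}$ by open intervals (balls in $\mathbb{R}^1$), giving the $k$-th point an interval of length $\varepsilon/2^{k+1}$, so the total length stays below $\varepsilon/2$. It uses exact `Fraction` arithmetic and only the first 200 points; the helper names `cover_points` and `total_length` are hypothetical.

```python
from fractions import Fraction

def cover_points(points, eps):
    """Give the k-th point (k = 1, 2, ...) an open interval of length eps / 2**(k + 1)."""
    intervals = []
    for k, p in enumerate(points, start=1):
        r = eps / 2 ** (k + 2)               # half-length of the k-th interval
        intervals.append((p - r, p + r))
    return intervals

def total_length(intervals):
    return sum((hi - lo for lo, hi in intervals), Fraction(0))

points = [Fraction(1, k) for k in range(1, 201)]   # 1, 1/2, ..., 1/200
for eps in (Fraction(1), Fraction(1, 1000), Fraction(1, 10 ** 9)):
    cover = cover_points(points, eps)
    assert all(lo < p < hi for p, (lo, hi) in zip(points, cover))   # each point covered
    print(float(eps), float(total_length(cover)))                    # total < eps / 2
```

Because the lengths form a geometric series, the same bound holds for the whole infinite sequence, however small $\varepsilon$ is chosen.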

Def. 7: A statement about points in $\mathbb{R}^p$ holds almost everywhere (a.e.) (or for almost all $\mathbf{x}$) if it holds for all $\mathbf{x} \in \mathbb{R}^p$ except for a set of measure 0.

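Def. 8: The set $\mathcal{M}$ of measurable functions on $\mathbb{R}^p$ (or on an interval of $\mathbb{R}$) is the set of functions that are limits of continuous functions almost everywhere, i.e.,

$$\mathcal{M} = \left\{ f(\mathbf{x}) : \text{there are continuous functions } f_n(\mathbf{x}) \text{ such that } \lim_{n\to\infty} f_n(\mathbf{x}) = f(\mathbf{x}) \text{ for almost all } \mathbf{x} \in \mathbb{R}^p \right\}.$$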

Measurable functions

Fig. 1: the function $f(x)$ as a limit of continuous functions


In fact, lots of functions (even discontinuous ones) can be viewed as limits of continuous functions.


For example,

$$f(x) = I_{[0,1]}(x) = \begin{cases} 1 & \text{if } x \in [0,1] \\ 0 & \text{otherwise} \end{cases}$$

is a discontinuous but measurable function.
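One way to see that $I_{[0,1]}$ is a limit of continuous functions (a sketch; this particular approximating sequence is our choice, not the notes'): take $f_n(x) = \max(0, 1 - n\,\mathrm{dist}(x, [0,1]))$, which equals 1 on $[0,1]$ and drops linearly to 0 within distance $1/n$ of the interval. For each fixed $x$, $f_n(x) \to I_{[0,1]}(x)$. The function names are hypothetical.

```python
def indicator(x):
    """I_[0,1](x)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def f_n(x, n):
    """Continuous approximation: 1 on [0, 1], falling linearly to 0 within 1/n of it."""
    dist = max(0.0, -x, x - 1.0)       # distance from x to the interval [0, 1]
    return max(0.0, 1.0 - n * dist)

samples = [-0.5, -0.01, 0.0, 0.3, 1.0, 1.001, 2.0]
for n in (1, 10, 100, 10_000):
    print(n, [f_n(x, n) for x in samples])
```

Each $f_n$ is continuous, yet the pointwise limit jumps at $0$ and $1$.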

Note: the ordinary (Riemann) notion of integral is difficult to use for functions as complicated as general measurable functions.


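Definition 9: A set $E \subseteq \mathbb{R}^p$ is (Lebesgue) measurable if its indicator function

$$I_E(\mathbf{x}) \equiv \begin{cases} 1 & \text{if } \mathbf{x} \in E \\ 0 & \text{if } \mathbf{x} \notin E \end{cases}$$

is a measurable function.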

[This is equivalent on $\mathbb{R}^p$ to our previous definition of measurability on any space.]


10. Integration of measurable functions

To integrate measurable functions (the Lebesgue integral), we first need:


Theorem 1: Given a non-negative measurable function $f : \mathbb{R}^p \to \mathbb{R}$, there is always an increasing sequence $\{f_n(\mathbf{x})\}_{n=1}^{\infty}$ of non-negative continuous functions (i.e., with the property that $f_{n+1}(\mathbf{x}) \ge f_n(\mathbf{x})$ for all $\mathbf{x}$) which converges to $f(\mathbf{x})$ almost everywhere.


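Def. 11: If $f(\mathbf{x}) \ge 0$ is a positive measurable function, define

$$\int_{\mathbb{R}^p} f(\mathbf{x})\,d\mathbf{x} = \lim_{n\to\infty} \int_{\mathbb{R}^p} f_n(\mathbf{x})\,d\mathbf{x},$$

where $f_n(\mathbf{x})$ is any increasing sequence of nonnegative continuous functions which converges to $f$ a.e.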


[Note that we know the values of the integrals of the continuous functions $f_n(\mathbf{x})$: they are ordinary Riemann integrals on $\mathbb{R}^p$.]


Fig. 2: sequence of continuous functions $f_n(\mathbf{x})$ increasing to $f(\mathbf{x})$
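A numerical sketch of Def. 11 for $f = I_{[0,1]}$ on $\mathbb{R}$, whose integral should be 1. The trapezoids $g_n(x) = \max(0, \min(1, nx, n(1-x)))$ are continuous, non-negative, and increase with $n$. They converge to $I_{(0,1)}$, which differs from $I_{[0,1]}$ only at the two endpoints (a set of measure 0). Their ordinary Riemann integrals, approximated here by a midpoint sum, are $1 - 1/n \to 1$ for $n \ge 2$. The choice of $g_n$ and the function names are our own, for illustration.

```python
def g(x, n):
    """Continuous, non-negative, increasing in n: a trapezoid on [0, 1] with
    ramps of width 1/n at each end (zero outside (0, 1))."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return min(1.0, n * x, n * (1.0 - x))

def riemann_integral(h, a=0.0, b=1.0, steps=20_000):
    """Midpoint Riemann sum of h over [a, b]; enough here since g vanishes outside [0, 1]."""
    dx = (b - a) / steps
    return sum(h(a + (k + 0.5) * dx) for k in range(steps)) * dx

for n in (2, 4, 10, 100, 1000):
    print(n, riemann_integral(lambda x: g(x, n)))   # approximately 1 - 1/n
```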


Def. 12: To find the integral of a negative measurable function $f$, we just compute the integral of $-f$ (which is positive), and put a minus sign in front of it.


Since every function $f$ is the sum of a positive function plus a negative function,

$$f = f_1 + f_2,$$

the integral of $f$ is defined as

$$\int_{-\infty}^{\infty} f\,dx = \int_{-\infty}^{\infty} f_1\,dx + \int_{-\infty}^{\infty} f_2\,dx.$$

[Thus we now know how to define the integral of an arbitrary function.]


Ex 1: if $f(x)$ looks like:

Fig. 3: $f(x)$ has a positive and a negative part

Then the integral of $f(x)$ is the integral of a positive plus a negative function:

Fig. 4: now sum the areas between $f_1$ (or $f_2$) and the $x$-axis


Note: we can show fairly easily that all the properties of integrals we are used to also hold for this more general Lebesgue integral.

For example, we still have

$$\int (f + g)\,dx = \int f\,dx + \int g\,dx, \quad \text{etc.}$$


[For now we will assume the above and related facts, already known to be true for standard Riemann integrals.]

Hilbert spaces of functions

2. New Hilbert spaces:

Consider the space

$$H = L^2[-\pi, \pi] = \left\{ f(x) \text{ measurable on } [-\pi, \pi] \text{ with } \int_{-\pi}^{\pi} f^2(x)\,dx < \infty \right\}.$$

We can show that if $f, g \in H$, then $f + g$ and $cf$ are in $H$ if $c$ is a constant (exercise). More generally, $H$ is a vector space.


Further, we can define an inner product on $H$ (known as the $L^2$ inner product):

$$\langle f, g\rangle = \langle f, g\rangle_{L^2} = \int_{-\pi}^{\pi} f(x)\,g(x)\,dx.$$

This satisfies conditions (1)-(4) of an inner product.

We can also show that $H$ is complete (i.e., every Cauchy sequence $\{f_n\}$ converges to a function $f$ in $H$).

Thus $H$ is a Hilbert space.

Note: we always consider two measurable functions the same if they differ just at a finite number of points.

Fig. 5: two functions $f_1$ and $f_2$ which differ at a finite collection of points.

We can show that such functions $f_1$ and $f_2$ have the same integral [certainly the area is unchanged]; furthermore,

$$\int |f_1 - f_2|\,dx = 0. \qquad (1)$$

Def. 13: More generally, we will consider two functions to be the same, or equivalent, if (1) holds.

[Equivalently, (1) holds iff $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ differ on a set of measure 0.]

Function space basis expansions

3. Fourier series: an example in Hilbert spaces

Ex 2: Consider the Hilbert space $H = L^2[-\pi, \pi]$, with the usual inner product

$$\langle f, g\rangle = \langle f, g\rangle_{L^2} = \int_{-\pi}^{\pi} f(x)\,g(x)\,dx.$$


Consider the set of vectors $\{\sin nx \mid n = 1, 2, \dots\}$ together with $\{\cos nx \mid n = 0, 1, 2, \dots\}$:

$$B = \{1, \cos x, \sin x, \cos 2x, \sin 2x, \dots\}.$$

We will show this is an orthogonal set. First, show that $1$ is orthogonal to all other vectors:

$$\langle 1, \cos nx\rangle = \int_{-\pi}^{\pi} \cos nx\,dx = 0 \quad (\forall n = 1, 2, \dots)$$

$$\langle 1, \sin nx\rangle = \int_{-\pi}^{\pi} \sin nx\,dx = 0 \quad (\forall n = 1, 2, \dots)$$


Now show that (for example) $\cos 5x$ is orthogonal to all other vectors:

$$\langle \cos 5x, \sin nx\rangle = \int_{-\pi}^{\pi} \cos 5x \sin nx\,dx = 0 \quad \forall n = 1, 2, \dots$$


To show the above we use the trig identities:

$$\cos a \cos b = \tfrac{1}{2}\left[\cos(a+b) + \cos(a-b)\right]$$

and

$$\sin a \cos b = \tfrac{1}{2}\left[\sin(a+b) + \sin(a-b)\right],$$

$$\sin a \sin b = \tfrac{1}{2}\left[\cos(a-b) - \cos(a+b)\right].$$

[The above holds similarly for any other $\cos mx$.]

Similarly, we also have:

$$\langle \cos 5x, \cos nx\rangle = \int_{-\pi}^{\pi} \cos 5x \cos nx\,dx = 0 \quad \forall n \ne 5.$$

We can similarly show that $\sin mx$ is also orthogonal to all other vectors.

Thus these vectors form an orthogonal set of vectors. Are they orthonormal?

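$$\|\cos nx\|^2 = \langle \cos nx, \cos nx\rangle = \int_{-\pi}^{\pi} \cos^2 nx\,dx = \int_{-\pi}^{\pi} \frac{1 + \cos 2nx}{2}\,dx = \pi \qquad (n \ge 1).$$

Thus:

$$\|\cos nx\| = \sqrt{\pi}.$$

Thus $\frac{1}{\sqrt{\pi}}\cos nx$ has length 1.

Similarly, $\frac{1}{\sqrt{\pi}}\sin nx$ has length 1.

And $\frac{1}{\sqrt{2\pi}} \cdot 1$ has length 1.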


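Thus:

$$\left\{ \frac{1}{\sqrt{2\pi}}, \frac{\cos x}{\sqrt{\pi}}, \frac{\sin x}{\sqrt{\pi}}, \frac{\cos 2x}{\sqrt{\pi}}, \frac{\sin 2x}{\sqrt{\pi}}, \frac{\cos 3x}{\sqrt{\pi}}, \frac{\sin 3x}{\sqrt{\pi}}, \dots \right\} = \{v_1, v_2, v_3, \dots\}$$

are an orthonormal (and hence linearly independent) set in the space of continuous functions.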
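The orthonormality can be checked numerically. This is a sketch under our own assumptions: the integral is approximated by the midpoint rule, which is very accurate for trigonometric polynomials over a full period. The helper `v(k)` is a hypothetical name returning $v_k$ in the ordering above; the resulting $7 \times 7$ matrix of inner products $\langle v_i, v_j\rangle$ should be close to the identity.

```python
import math

def inner(f, g, steps=2000):
    """Approximate <f, g> = int_{-pi}^{pi} f(x) g(x) dx by the midpoint rule."""
    dx = 2 * math.pi / steps
    xs = (-math.pi + (k + 0.5) * dx for k in range(steps))
    return sum(f(x) * g(x) for x in xs) * dx

def v(k):
    """v_1 = 1/sqrt(2 pi); then cos x, sin x, cos 2x, sin 2x, ... divided by sqrt(pi)."""
    if k == 1:
        return lambda x: 1.0 / math.sqrt(2 * math.pi)
    n = k // 2
    trig = math.cos if k % 2 == 0 else math.sin
    return lambda x: trig(n * x) / math.sqrt(math.pi)

gram = [[inner(v(i), v(j)) for j in range(1, 8)] for i in range(1, 8)]
for row in gram:
    print(" ".join(f"{entry:6.3f}" for entry in row))   # approximately the identity
```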

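We can show that they are a basis. So any vector $f(x)$ can be written in the form:

$$f(x) = c_1 v_1 + c_2 v_2 + \cdots = c_1 \frac{1}{\sqrt{2\pi}} + c_2 \frac{1}{\sqrt{\pi}}\cos x + c_3 \frac{1}{\sqrt{\pi}}\sin x + c_4 \frac{1}{\sqrt{\pi}}\cos 2x + c_5 \frac{1}{\sqrt{\pi}}\sin 2x + \cdots$$

$$= \frac{a_0}{2} + a_1 \cos x + b_1 \sin x + a_2 \cos 2x + b_2 \sin 2x + \cdots$$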

[Fourier series of a function]


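Notice that

$$c_4 = \left\langle f(x), \tfrac{1}{\sqrt{\pi}}\cos 2x \right\rangle = \int_{-\pi}^{\pi} f(x)\,\frac{1}{\sqrt{\pi}}\cos 2x\,dx = \frac{1}{\sqrt{\pi}} \int_{-\pi}^{\pi} f(x)\cos 2x\,dx$$

$$\Rightarrow \quad a_2 = \frac{c_4}{\sqrt{\pi}} = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos 2x\,dx.$$

Generally:

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx,$$

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx.$$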

[Using the above linear algebra, we have no need to do advanced calculus for the theory of Fourier series!]
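The coefficient formulas translate directly into a numerical routine. The sketch below approximates $a_n$ and $b_n$ with a midpoint sum on $[-\pi, \pi]$. The function name and the number of sample points are our own choices, and the results are only approximate. For $f(x) = 2x$ (the example that follows) it should give $a_n \approx 0$ and $b_5 \approx 0.8$.

```python
import math

def fourier_coefficients(f, n_max, steps=4000):
    """Approximate a_n, b_n for n = 0..n_max, where
    a_n = (1/pi) int_{-pi}^{pi} f(x) cos(nx) dx and
    b_n = (1/pi) int_{-pi}^{pi} f(x) sin(nx) dx (midpoint rule)."""
    dx = 2 * math.pi / steps
    xs = [-math.pi + (k + 0.5) * dx for k in range(steps)]
    fx = [f(x) for x in xs]
    a = [sum(fv * math.cos(n * x) for fv, x in zip(fx, xs)) * dx / math.pi
         for n in range(n_max + 1)]
    b = [sum(fv * math.sin(n * x) for fv, x in zip(fx, xs)) * dx / math.pi
         for n in range(n_max + 1)]
    return a, b

a, b = fourier_coefficients(lambda x: 2 * x, 6)
print([round(an, 6) for an in a])      # all approximately 0
print([round(bn, 4) for bn in b[1:]])  # approximately 4, -2, 1.3333, -1, 0.8, -0.6667
```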


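Ex: $f(x) = 2x$

Fig. 6

$$2x = \frac{a_0}{2} + a_1 \cos x + b_1 \sin x + a_2 \cos 2x + b_2 \sin 2x + \cdots$$

$$b_5 = \frac{1}{\pi}\int_{-\pi}^{\pi} 2x \sin 5x\,dx = \frac{2}{\pi}\left[ -\frac{x \cos 5x}{5}\bigg|_{-\pi}^{\pi} + \underbrace{\int_{-\pi}^{\pi} \frac{\cos 5x}{5}\,dx}_{0} \right] = \frac{2}{\pi} \cdot \frac{2\pi}{5} = \frac{4}{5}$$

$$b_6 = -\frac{4}{6}$$

Generally:

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} 2x \sin nx\,dx = \begin{cases} -\dfrac{4}{n} & \text{if } n \text{ even} \\[4pt] \dfrac{4}{n} & \text{if } n \text{ odd} \end{cases}$$

We can show $a_n = 0$.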


Thus

$$2x = \underbrace{\underbrace{b_1 \sin x}_{S_1(x)} + b_2 \sin 2x}_{S_2(x)} + b_3 \sin 3x + \cdots = 4\left[1 \cdot \sin x - \frac{1}{2}\sin 2x + \frac{1}{3}\sin 3x - \cdots\right]$$
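To see the partial sums $S_N(x)$ approach $2x$, here is a short sketch that evaluates $S_N(x) = \sum_{n=1}^{N} b_n \sin nx$ with $b_n = 4(-1)^{n+1}/n$ at a point inside $(-\pi, \pi)$. The convergence is slow (at a fixed interior point the error decays roughly like $1/N$), and the function name is hypothetical.

```python
import math

def partial_sum(x, N):
    """S_N(x) = sum_{n=1}^{N} b_n sin(nx), with b_n = 4 (-1)^(n+1) / n."""
    return sum(4 * (-1) ** (n + 1) / n * math.sin(n * x) for n in range(1, N + 1))

for N in (1, 2, 10, 100, 1000):
    print(N, partial_sum(1.0, N))   # tends to 2 * 1.0 = 2 as N grows
```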


Lecture 11B.

Part 5 (MA 751)

Statistical machine learning and kernel methods

Primary references:

John Shawe-Taylor and Nello Cristianini, Kernel Methods for Pattern Analysis.

Christopher Burges, A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2, 121-167 (1998).

Other references:

Aronszajn, Theory of reproducing kernels. Transactions of the American Mathematical Society 68, 337-404 (1950).

Felipe Cucker and Steve Smale, On the mathematical foundations of learning. Bulletin of the American Mathematical Society (2002).

Teo Evgeniou, Massimo Pontil and Tomaso Poggio, Regularization Networks and Support Vector Machines. Advances in Computational Mathematics (2000).

1. Linear functionals

Definition 1. Given a vector space $V$, we define a map $f : V \to \mathbb{R}$ from $V$ to the real numbers to be a functional.

If $f$ is linear, i.e., if for real $a, b$ we have

$$f(a\mathbf{x} + b\mathbf{y}) = a f(\mathbf{x}) + b f(\mathbf{y}),$$

then we say $f$ is a linear functional.

If $V$ is an inner product space (so each $\mathbf{v}$ has a length $\|\mathbf{v}\|$), we say that $f$ is bounded if

$$|f(\mathbf{x})| \le C\|\mathbf{x}\|$$

for some number $C > 0$ and all $\mathbf{x} \in V$.

Reproducing kernel Hilbert spaces

2. Reproducing Kernel Hilbert spaces:

Def. 1. An $n \times n$ matrix $M$ is symmetric if $M_{ij} = M_{ji}$ for all $i, j$.

A symmetric $M$ is positive if all of its eigenvalues are non-negative.


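Equivalently, $M$ is positive if

$$\langle \mathbf{a}, M\mathbf{a}\rangle \equiv \mathbf{a}^T M \mathbf{a} \ge 0$$

for all vectors $\mathbf{a} = (a_1, a_2, \dots, a_n)^T$, with $\langle \cdot, \cdot \rangle$ the standard inner product on $\mathbb{R}^n$.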
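For a symmetric $2 \times 2$ matrix the two characterizations are easy to compare directly. The sketch below (helper names hypothetical) computes the eigenvalues in closed form and evaluates $\mathbf{a}^T M \mathbf{a}$. The matrix with a negative eigenvalue also has a vector $\mathbf{a}$ with $\mathbf{a}^T M \mathbf{a} < 0$.

```python
import math

def eigenvalues_2x2_symmetric(m):
    """Eigenvalues of a symmetric 2x2 matrix [[p, q], [q, r]] (closed form)."""
    (p, q), (_, r) = m
    mean = (p + r) / 2
    radius = math.sqrt(((p - r) / 2) ** 2 + q ** 2)
    return mean - radius, mean + radius

def quadratic_form(m, a):
    """a^T M a = sum_{i,j} a_i M_ij a_j."""
    return sum(a[i] * m[i][j] * a[j] for i in range(len(a)) for j in range(len(a)))

M1 = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 1, 3: positive
M2 = [[1.0, 2.0], [2.0, 1.0]]   # eigenvalues -1, 3: not positive
print(eigenvalues_2x2_symmetric(M1), eigenvalues_2x2_symmetric(M2))
print(quadratic_form(M2, [1.0, -1.0]))   # -2.0 < 0 exhibits the failure
```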


Definition 2: Let $X \subseteq \mathbb{R}^p$ be compact (i.e., a closed bounded subset). A (real) reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ on $X$ is a Hilbert space of functions on $X$ (i.e., a complete collection of functions which is closed under addition and scalar multiplication, and for which an inner product is defined).

$\mathcal{H}$ also needs the property: for any fixed $\mathbf{x} \in X$, the evaluation functional $\mathbf{x}^* : \mathcal{H} \to \mathbb{R}$ defined by

$$\mathbf{x}^*(f) = f(\mathbf{x})$$

is a bounded linear functional on $\mathcal{H}$.


Definition 3: We define a kernel to be a function $K : X \times X \to \mathbb{R}$ which is symmetric, i.e.,

$$K(\mathbf{x}, \mathbf{y}) = K(\mathbf{y}, \mathbf{x})$$

for $\mathbf{x}, \mathbf{y} \in X$.


We say $K$ is positive if for any fixed collection

$$\{\mathbf{x}_1, \dots, \mathbf{x}_n\} \subset X,$$

the $n \times n$ matrix

$$\mathbf{K} = (K_{ij}) \equiv K(\mathbf{x}_i, \mathbf{x}_j)$$

is positive (i.e., non-negative).
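A numerical sanity check (not a proof) of positivity, assuming the Gaussian kernel $K(\mathbf{x}, \mathbf{y}) = e^{-\|\mathbf{x} - \mathbf{y}\|^2 / (2\sigma^2)}$ on the unit square as the example kernel: build the matrix $\mathbf{K}$ for eight random points and evaluate $\mathbf{c}^T\mathbf{K}\mathbf{c}$ for many random vectors $\mathbf{c}$. The Gaussian kernel is a standard example of a positive kernel; the helper names are hypothetical.

```python
import math
import random

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-|x - y|^2 / (2 sigma^2)), used here as the example kernel."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

def gram_matrix(kernel, points):
    """The n x n matrix with entries K_ij = K(x_i, x_j)."""
    return [[kernel(p, q) for q in points] for p in points]

def quadratic_form(m, c):
    """c^T M c."""
    n = len(c)
    return sum(c[i] * m[i][j] * c[j] for i in range(n) for j in range(n))

rng = random.Random(0)
points = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(8)]   # X = unit square
G = gram_matrix(gaussian_kernel, points)
assert all(G[i][j] == G[j][i] for i in range(8) for j in range(8))   # symmetric
worst = min(quadratic_form(G, [rng.gauss(0, 1) for _ in range(8)]) for _ in range(2000))
print(worst)   # non-negative up to rounding, for every sampled c
```

Random sampling can only fail to find a counterexample; positivity itself follows from the argument in the notes.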

Kernel existence

We now have the reason these are called RKHS:


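Theorem 1: Given a reproducing kernel Hilbert space $\mathcal{H}$ of functions on $X \subseteq \mathbb{R}^p$, there exists a unique symmetric positive kernel function $K(\mathbf{x}, \mathbf{y})$ such that for all $f \in \mathcal{H}$,

$$f(\mathbf{x}) = \langle f(\cdot), K(\cdot, \mathbf{x})\rangle_{\mathcal{H}}$$

(the inner product above is in the variable $\cdot$; $\mathbf{x}$ is fixed).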


Note this means that evaluation of $f$ at a fixed $\mathbf{x}$ is equivalent to taking the inner product of $f(\cdot)$ with the fixed function $K(\cdot, \mathbf{x})$ (in the variable $\cdot$, with $\mathbf{x}$ fixed).


Proof (please look at this on your own): For any fixed $\mathbf{x} \in X$, recall that $\mathbf{x}^*$ is a bounded linear functional on $\mathcal{H}$.


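By the Riesz Representation theorem¹ there exists a fixed function, call it $K_{\mathbf{x}}(\cdot)$, such that for all $f \in \mathcal{H}$ (recall $\mathbf{x}$ is fixed; now $f$ is varying)

$$f(\mathbf{x}) = \mathbf{x}^*(f) = \langle f(\cdot), K_{\mathbf{x}}(\cdot)\rangle. \qquad (1)$$

(All inner products are in $\mathcal{H}$, not $L^2$, i.e., $\langle f, g\rangle = \langle f, g\rangle_{\mathcal{H}}$.)

¹Riesz Representation Theorem: If $\varphi : \mathcal{H} \to \mathbb{R}$ is a bounded linear functional on $\mathcal{H}$, there exists a unique $\mathbf{y} \in \mathcal{H}$ such that $\varphi(\mathbf{x}) = \langle \mathbf{y}, \mathbf{x}\rangle$ for all $\mathbf{x} \in \mathcal{H}$.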


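That is, evaluation of $f$ at $\mathbf{x}$ is equivalent to an inner product with the function $K_{\mathbf{x}}$.

Define $K(\mathbf{x}, \mathbf{y}) = K_{\mathbf{x}}(\mathbf{y})$. Note that by (1) (applied with $f = K_{\mathbf{y}}$ at the point $\mathbf{x}$, and with $f = K_{\mathbf{x}}$ at the point $\mathbf{y}$), the functions $K_{\mathbf{x}}(\cdot)$ and $K_{\mathbf{y}}(\cdot)$ satisfy

$$\langle K_{\mathbf{x}}(\cdot), K_{\mathbf{y}}(\cdot)\rangle = K_{\mathbf{y}}(\mathbf{x}) = K_{\mathbf{x}}(\mathbf{y}),$$

so $K(\mathbf{x}, \mathbf{y})$ is symmetric.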


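To prove $K(\mathbf{x}, \mathbf{y})$ is positive definite: let $\{\mathbf{x}_1, \dots, \mathbf{x}_n\}$ be a fixed collection. If $K_{ij} \equiv K(\mathbf{x}_i, \mathbf{x}_j)$ and $\mathbf{K} = (K_{ij})$ is the corresponding matrix, and

$$\mathbf{c} = (c_1, c_2, \dots, c_n)^T,$$

then

$$\langle \mathbf{c}, \mathbf{K}\mathbf{c}\rangle \equiv \mathbf{c}^T\mathbf{K}\mathbf{c} = \sum_{i,j=1}^{n} c_i c_j K(\mathbf{x}_i, \mathbf{x}_j)$$

$$= \sum_{i,j=1}^{n} c_i c_j \langle K_{\mathbf{x}_i}(\cdot), K_{\mathbf{x}_j}(\cdot)\rangle$$

$$= \left\langle \sum_{i=1}^{n} c_i K_{\mathbf{x}_i}(\cdot), \sum_{j=1}^{n} c_j K_{\mathbf{x}_j}(\cdot) \right\rangle$$

$$= \left\| \sum_{i=1}^{n} c_i K_{\mathbf{x}_i}(\cdot) \right\|_{\mathcal{H}}^2 \ge 0.$$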


Definition 4: We call the above kernel $K(\mathbf{x}, \mathbf{y})$ the reproducing kernel of $\mathcal{H}$.

Definition 5: A Mercer kernel is a positive definite kernel $K(\mathbf{x}, \mathbf{y})$ which is also continuous as a function of $\mathbf{x}$ and $\mathbf{y}$, and bounded.


Def. 6: For a continuous function $f$ on a compact set $X \subseteq \mathbb{R}^p$, we define

$$\|f\|_\infty = \max_{\mathbf{x} \in X} |f(\mathbf{x})|.$$

[Recall that here $X \subseteq \mathbb{R}^p$ is assumed to be a closed bounded set.]


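Theorem 2: (i) For every Mercer kernel $K : X \times X \to \mathbb{R}$, there exists a unique Hilbert space $\mathcal{H}$ (an RKHS) of functions on $X$ such that $K$ is its reproducing kernel.

(ii) Moreover, this $\mathcal{H}$ consists of continuous functions, and for any $f \in \mathcal{H}$,

$$\|f\|_\infty \le M_K \|f\|_{\mathcal{H}},$$

where $M_K = \max_{\mathbf{x}, \mathbf{y} \in X} \sqrt{|K(\mathbf{x}, \mathbf{y})|}$.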


Proof (please look at this on your own): Let $K(\mathbf{x}, \mathbf{y}) : X \times X \to \mathbb{R}$ be a Mercer kernel. We will construct a reproducing kernel Hilbert space $\mathcal{H}$ with reproducing kernel $K$ as follows.

Define (below, span means finite span; no infinite sums)


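$$\mathcal{H}_0 = \operatorname{span}\{K_{\mathbf{x}}(\cdot)\}_{\mathbf{x} \in X} = \left\{ \sum_i c_i K_{\mathbf{x}_i}(\cdot) : \{\mathbf{x}_i\} \subseteq X \text{ is any finite subset};\ c_i \in \mathbb{R} \right\}.$$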


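Now we define the inner product $\langle f, g\rangle$ for $f, g \in \mathcal{H}_0$. Assume

$$f(\cdot) = \sum_{i=1}^{l} a_i K_{\mathbf{x}_i}(\cdot), \qquad g(\cdot) = \sum_{i=1}^{l} b_i K_{\mathbf{x}_i}(\cdot).$$

[Note we may assume $f$ and $g$ both use the same set of $\{\mathbf{x}_i\}$, since if not we may take a union without loss.] [Note again that here $\langle \cdot, \cdot \rangle = \langle \cdot, \cdot \rangle_{\mathcal{H}}$.]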


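Then, defining $\langle K_{\mathbf{x}}(\cdot), K_{\mathbf{y}}(\cdot)\rangle = K(\mathbf{x}, \mathbf{y})$, define

$$\langle f(\cdot), g(\cdot)\rangle = \left\langle \sum_{i=1}^{l} a_i K(\mathbf{x}_i, \cdot), \sum_{j=1}^{l} b_j K(\mathbf{x}_j, \cdot) \right\rangle$$

$$= \sum_{i,j=1}^{l} a_i b_j \langle K(\mathbf{x}_i, \cdot), K(\mathbf{x}_j, \cdot)\rangle$$

$$= \sum_{i,j=1}^{l} a_i b_j K(\mathbf{x}_i, \mathbf{x}_j).$$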


It is easy to check that $\mathcal{H}_0$ with the above inner product is an inner product space (i.e., it satisfies properties (a)-(d)). Now form the completion² of this space into the (complete) Hilbert space $\mathcal{H}$.

²The completion of a non-complete inner product space $\mathcal{H}_0$ is the (unique) smallest complete inner product (Hilbert) space $\mathcal{H}$ which contains $\mathcal{H}_0$. That is, $\mathcal{H}_0 \subseteq \mathcal{H}$, the inner product on $\mathcal{H}_0$ is the same as on $\mathcal{H}$, and there is no smaller complete Hilbert space which contains $\mathcal{H}_0$.

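Example 1: $\mathcal{H} = \left\{ \mathbf{a} = (a_1, a_2, \dots) : \sum_{i=1}^{\infty} a_i^2 < \infty \right\}$ with inner product $\langle \mathbf{a}, \mathbf{b}\rangle = \sum_{i=1}^{\infty} a_i b_i$ was discussed in class.

The inner product space

$$\mathcal{H}_0 = \{ (a_1, a_2, \dots) \in \mathcal{H} : \text{all but a finite number of } a_i \text{ are } 0 \} \subseteq \mathcal{H}$$

is an example of an incomplete space. $\mathcal{H}$ is its completion.

Example 2: $\mathcal{H} = L^2(-\pi, \pi)$ with the standard inner product for functions. We know that if $f(x) \in \mathcal{H}$ then

$$f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty} \left(a_k \cos kx + b_k \sin kx\right).$$

Define $\mathcal{H}_0$ to be all $f$ for which the above sum is finite (i.e., all but a finite number of terms are 0). Then $\mathcal{H}$ is the completion of $\mathcal{H}_0$.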


Note that for $f = \sum_i a_i K_{\mathbf{x}_i}(\cdot) \in \mathcal{H}_0$ as above:


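$$|f(\mathbf{x})| = |\langle f(\cdot), K(\mathbf{x}, \cdot)\rangle| \le \|f(\cdot)\|\,\|K(\mathbf{x}, \cdot)\|$$

$$= \|f\| \sqrt{\langle K(\mathbf{x}, \cdot), K(\mathbf{x}, \cdot)\rangle}$$

$$= \|f\| \underbrace{\sqrt{K(\mathbf{x}, \mathbf{x})}}_{\le M_K}$$

$$\le M_K \|f\|_{\mathcal{H}}.$$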


[Note again that here we write $\|f\| = \|f\|_{\mathcal{H}}$ by definition; similarly $\langle f, g\rangle = \langle f, g\rangle_{\mathcal{H}}$.]

The above shows that the identity mapping $I : \mathcal{H}_0 \to C(X)$ (the latter is the space of continuous functions on $X$) is bounded.

By this we mean that $I$ maps $f$ as a function in $\mathcal{H}_0$ to itself as a function in $C(X)$; in $C(X)$ the norm of $f$ is $\|f\|_\infty \equiv \sup_{x \in X} |f(x)|$.


By bounded we mean that $\|If\|_\infty = \|f\|_\infty \le d\,\|f\|_{\mathcal{H}}$ for some constant $d > 0$.

Thus any Cauchy sequence in $\mathcal{H}_0$ is also Cauchy in $C(X)$, and so has a limit as a function in $C(X)$.

So it follows easily that the completion $\mathcal{H}$ of $\mathcal{H}_0$ exists as a subset of $C(X)$.


That $K$ is a reproducing kernel for $\mathcal{H}$ follows by approximation from the fact that $K$ works as a reproducing kernel in $\mathcal{H}_0$.
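The construction of $\mathcal{H}_0$ can be mirrored in a few lines: an element is a finite list of pairs $(a_i, x_i)$ standing for $\sum_i a_i K(x_i, \cdot)$, and the inner product is $\sum_{i,j} a_i b_j K(x_i, x_j)$. The sketch below assumes the Gaussian kernel $e^{-(x-y)^2}$ on $X = [0, 1]$ as the Mercer kernel (for which $M_K = 1$). It checks both the reproducing property $f(x) = \langle f, K(x, \cdot)\rangle$ and the bound $|f(x)| \le M_K\|f\|_{\mathcal{H}}$. All names are hypothetical.

```python
import math

def K(x, y):
    """Example Mercer kernel on X = [0, 1] (our assumption): exp(-(x - y)^2)."""
    return math.exp(-(x - y) ** 2)

def evaluate(f, x):
    """f is a finite list [(a_i, x_i), ...] standing for sum_i a_i K(x_i, .)."""
    return sum(a * K(xi, x) for a, xi in f)

def inner(f, g):
    """<f, g> = sum_{i,j} a_i b_j K(x_i, x_j), the inner product on H_0."""
    return sum(a * b * K(xi, xj) for a, xi in f for b, xj in g)

f = [(1.5, 0.1), (-2.0, 0.4), (0.7, 0.9)]    # an element of H_0
x = 0.63
K_x = [(1.0, x)]                              # the function K(x, .) as an element of H_0
print(evaluate(f, x), inner(f, K_x))          # equal: f(x) = <f, K_x>
norm_f = math.sqrt(inner(f, f))
M_K = 1.0                                     # max of sqrt|K(x, y)| over X x X for this K
print(abs(evaluate(f, x)) <= M_K * norm_f)    # True: |f(x)| <= M_K ||f||
```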

Regularization methods

3. Regularization methods for choosing $f$

Finding the desired $f(\mathbf{x})$ from the training set

$$Nf \equiv \mathbf{g} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$$

is an ill-posed problem: a unique operator $N^{-1}$ does not exist because $N$ is not one-to-one.


We need to combine both:

(a) Data $Nf = \mathbf{g}$ (posterior or a posteriori information);

(b) Prior or a priori information, e.g., "$f$ is smooth", expressing a preference for smooth over wiggly solutions, as seen earlier.

How do we incorporate both? Using Tikhonov regularization methods.


We introduce a regularization loss functional $J(f)$ representing the penalty (loss) for the choice of an "unrealistic" $f$, such as that in (a) above.

Assume we want to find the correct function $f_0(\mathbf{x})$ from the data

$$Nf_0(\mathbf{x}) = ((\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n, y_n)) = \mathbf{g}.$$

Suppose we are given $f(\mathbf{x})$ as a candidate for approximating $f_0(\mathbf{x})$ from the information in $\mathbf{g}$.

We score $f$ as a good or bad approximation based on a combination of

(a) its error on the known points $\{\mathbf{x}_i\}_{i=1}^{n}$;

(b) its "plausibility", i.e., how low the penalty $J(f)$ is.


These are combined in the minimization of the Lagrangian

$$\mathcal{L}(f) = \frac{1}{n}\sum_{i=1}^{n} L(f(\mathbf{x}_i), y_i) + J(f).$$

Here $L(f(\mathbf{x}_i), y_i)$ measures the loss whenever the predicted $f(\mathbf{x}_i)$ is far from the actual value $y_i$, e.g.,

$$L(f(\mathbf{x}_i), y_i) = |f(\mathbf{x}_i) - y_i|^2.$$

And $J(f)$ measures the a priori loss, i.e., a measure of discrepancy between the prospective choice $f$ and our prior expectation about $f$.
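A small sketch of how $\mathcal{L}(f)$ scores candidates, using the squared loss above. The penalty here, $J(f) = \lambda\int_0^3 f''(x)^2\,dx$ with $\lambda = 0.1$, is our own illustrative choice of a non-smoothness measure, computed by finite differences. Both candidates have essentially the same data error on the four made-up data points, but the wiggly one pays a much larger penalty. All names and data are hypothetical.

```python
import math

def lagrangian(f, data, J, loss=lambda pred, y: (pred - y) ** 2):
    """L(f) = (1/n) * sum_i loss(f(x_i), y_i) + J(f)."""
    n = len(data)
    return sum(loss(f(x), y) for x, y in data) / n + J(f)

def roughness(f, lam=0.1, a=0.0, b=3.0, steps=3000, h=1e-3):
    """Illustrative penalty J(f) = lam * int_a^b f''(x)^2 dx: f'' by a central
    second difference, the integral by the midpoint rule."""
    dx = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * dx
        second = (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2
        total += second ** 2
    return lam * total * dx

def straight(x):
    return x                                    # smooth candidate

def wiggly(x):
    return x + 0.2 * math.sin(math.pi * x)      # same values (up to rounding) at x = 0, 1, 2, 3

data = [(0.0, 0.0), (1.0, 1.2), (2.0, 1.9), (3.0, 3.1)]
print(lagrangian(straight, data, roughness))   # about 0.015: data error only
print(lagrangian(wiggly, data, roughness))     # about 0.015 + 0.584
```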

Examples: Regularization methods

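Example:

$$J(f) = \|Af\|_{L^2}^2 = \int d\mathbf{x}\, |Af(\mathbf{x})|^2,$$

where here $Af = f - \Delta f$, with

$$\Delta f = \frac{\partial^2 f}{\partial x_1^2} + \cdots + \frac{\partial^2 f}{\partial x_p^2}.$$

Note that $\Delta f$, and thus $J(f)$, measures the degree of non-smoothness that $f$ has (i.e., we prefer smoother functions a priori).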


Example 3: Consider the case $J(f) = \|Af\|^2$ above. The norm

$$\|f\|_{\mathcal{H}} = \|Af\|_{L^2}$$

is a reproducing kernel Hilbert space norm (at least if the dimension $p$ is small).

That is, this norm comes from an inner product $\langle f, g\rangle = \int_X (Af)(x)\,(Ag)(x)\,dx$, and $\mathcal{H}$ with this inner product is an RKHS.

If this is the case, things in general become easier.


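Example 4: In the case $f = f(x)$, $x \in \mathbb{R}^1$, suppose we choose

$$Af = f - \frac{d^2 f}{dx^2} = \left(1 - \frac{d^2}{dx^2}\right) f;$$

we have

$$J(f) = \|Af\|^2 = \int \left[\left(1 - \frac{d^2}{dx^2}\right) f\right]^2 dx,$$

and $\|Af\|$ is a measure of "lack of smoothness" of $f$.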
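To see that this $J$ penalizes wiggles, take $f(x) = \sin kx$ on $[0, 2\pi]$. Restricting to a finite interval is our simplification, since $\sin kx$ is not square integrable on all of $\mathbb{R}$. Then $(1 - d^2/dx^2)\sin kx = (1 + k^2)\sin kx$, so $J = (1 + k^2)^2\pi$: $4\pi$ for $k = 1$ but $100\pi$ for $k = 3$. The sketch below recovers these values with finite differences; the function name and step sizes are our own choices.

```python
import math

def smoothness_penalty(f, a=0.0, b=2 * math.pi, steps=20_000, h=1e-4):
    """Approximate J(f) = int_a^b [(1 - d^2/dx^2) f]^2 dx on a finite interval:
    f'' by a central second difference, the integral by the midpoint rule."""
    dx = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * dx
        second = (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2   # approximately f''(x)
        total += (f(x) - second) ** 2
    return total * dx

for k in (1, 3):
    J = smoothness_penalty(lambda x: math.sin(k * x))
    print(k, J, (1 + k * k) ** 2 * math.pi)   # numerical value vs exact (1 + k^2)^2 pi
```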


4. More about using the Laplacian to measure smoothness (Sobolev smoothness)

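Basic definitions: Recall that the Laplacian operator $\Delta$ on a function $f$ on $\mathbb{R}^p$,

$$f(\mathbf{x}) = f(x_1, \dots, x_p),$$

is defined by

$$\Delta f = \frac{\partial^2 f}{\partial x_1^2} + \cdots + \frac{\partial^2 f}{\partial x_p^2}.$$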
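The definition can be approximated by replacing each second partial derivative with a central difference, $\partial^2 f/\partial x_i^2 \approx [f(\mathbf{x} + h\mathbf{e}_i) - 2f(\mathbf{x}) + f(\mathbf{x} - h\mathbf{e}_i)]/h^2$. This is a standard finite-difference sketch, not something from the notes; the function name and step size are our own. For $f = x_1^2 + x_2^2$ it should give about 4, and for the harmonic function $x_1^2 - x_2^2$ about 0.

```python
def laplacian(f, x, h=1e-3):
    """Approximate (Delta f)(x) for f: R^p -> R, where x is a list of p
    coordinates, by summing central second differences along each axis."""
    total = 0.0
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        total += (f(up) - 2 * f(x) + f(down)) / h ** 2
    return total

print(laplacian(lambda v: v[0] ** 2 + v[1] ** 2, [0.3, -1.2]))         # about 4
print(laplacian(lambda v: v[0] ** 2 - v[1] ** 2, [0.5, 0.7]))          # about 0
print(laplacian(lambda v: v[0] ** 3 + v[0] * v[1] ** 2, [1.0, 2.0]))   # about 8 (= 6x + 2x at x = 1)
```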

Using the Laplacian for kernels

For $s > 0$ an even integer, we can define the Sobolev space $H^s$ by:

$$H^s = \{ f \in L^2(\mathbb{R}^p) : (1 - \Delta)^{s/2} f \in L^2(\mathbb{R}^p) \}.$$

This is the set of functions $f$ in $L^2(\mathbb{R}^p)$ (i.e., square integrable functions) which are still in $L^2(\mathbb{R}^p)$ after applying the derivative operation $(1 - \Delta)^{s/2}$, i.e., $(I - \Delta)$ repeated $s/2$ times (the operator $1 = I$ is always the identity operator).

For $f, g \in H^s$, define the new inner product

$$\langle f, g\rangle_{H^s} = \langle (1 - \Delta)^{s/2} f, (1 - \Delta)^{s/2} g\rangle_{L^2}$$

[note $\langle h(\mathbf{x}), k(\mathbf{x})\rangle_{L^2} = \int_{\mathbb{R}^p} h(\mathbf{x})\,k(\mathbf{x})\,d\mathbf{x}$].


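We can show that $H^s$ is an RKHS (provided $s > p/2$) with reproducing kernel

$$K(\mathbf{z}) = \mathcal{F}^{-1}\left(\frac{1}{(|\boldsymbol{\omega}|^2 + 1)^s}\right), \qquad (1)$$

where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform. The function $\frac{1}{(|\boldsymbol{\omega}|^2 + 1)^s}$ is a function on $\mathbb{R}^p$, where $\boldsymbol{\omega} = (\omega_1, \dots, \omega_p) \in \mathbb{R}^p$ and

$$|\boldsymbol{\omega}|^2 = \omega_1^2 + \cdots + \omega_p^2.$$

Fig. 7: $K(\mathbf{z})$ in one dimension: a smooth kernel

$K(\mathbf{z})$ is called a radial basis function.

Note: the kernel $K(\mathbf{x}, \mathbf{y})$ (as a function of 2 variables) is defined in terms of $K$ above by

$$K(\mathbf{x}, \mathbf{y}) = K(\mathbf{x} - \mathbf{y}).$$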
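In one dimension with $s = 2$, the kernel (1) has a closed form. Assuming the convention $\mathcal{F}^{-1}h(z) = \frac{1}{2\pi}\int h(\omega)e^{i\omega z}\,d\omega$ (other conventions change constant factors), one gets $K(z) = \frac{1}{4}(1 + |z|)e^{-|z|}$. The sketch below compares this with a direct numerical inverse transform, truncated at $|\omega| \le 50$. The names, the truncation, and the step count are our own choices.

```python
import math

def kernel_numeric(z, s=2, omega_max=50.0, steps=50_000):
    """K(z) = F^{-1}[1 / (omega^2 + 1)^s](z) for p = 1, assuming the convention
    F^{-1}h(z) = (1 / (2 pi)) int h(omega) e^{i omega z} d omega. The integrand is
    even in omega, so this equals (1 / pi) int_0^inf cos(omega z) / (omega^2 + 1)^s,
    approximated by a midpoint sum truncated at omega_max."""
    d = omega_max / steps
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * d
        total += math.cos(w * z) / (w * w + 1) ** s
    return total * d / math.pi

def kernel_closed_form_s2(z):
    """Closed form for p = 1, s = 2 under the same convention."""
    return (1 + abs(z)) * math.exp(-abs(z)) / 4

for z in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(z, kernel_numeric(z), kernel_closed_form_s2(z))
```

The closed form is smooth at every $z \ne 0$ and decays exponentially, matching the picture of a smooth, bump-shaped radial basis function.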


The Representer Theorem for RKHS

1. Application: using RKHS for regularization

Assume again we have an unknown function $y = f(\mathbf{x})$ on $X \subseteq \mathbb{R}^p$, with only the data

$$Nf = ((\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n, y_n)) = \mathbf{g}.$$

To find the best guess $\hat f$ for $f$, approximate it by the minimizer

$$\hat f = \arg\min_{f \in H^s} \frac{1}{n}\sum_{i=1}^{n} \|f(\mathbf{x}_i) - y_i\|^2 + \lambda\|f\|_{H^s}^2, \qquad (1a)$$

where $\lambda$ can be some constant.


We seek $f$ which balances minimizing

$$\sum_{i=1}^{n} \|f(\mathbf{x}_i) - y_i\|^2,$$

i.e., the data error, with minimizing $\|f\|_{H^s}^2$, i.e., maximizing the smoothness.

The solution to such a problem will look like this:

It will compromise between fitting the data (which may have error) and trying to be smooth.

The amazing thing: $\hat f$ can be found explicitly using the above radial basis functions.


2. Solving the minimization

Now consider the general version of optimization problem (1a) with a space $\mathcal{H}$ of functions that is an RKHS.

Claim: we can solve it explicitly.

To see that this works in general for an RKHS, return to the general problem:

General problem: Given an unknown $f_0 \in \mathcal{H}$ = RKHS, try to find the "best" approximation $\hat f$ to $f_0$ fitting the data $Nf_0 \equiv ((\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n, y_n))$, but ALSO satisfying the a priori knowledge that $\|f_0\|_{\mathcal{H}}$ is small (e.g., so $f$ is smooth).


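Specifically, we want to find

$$\arg\min_{f \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^{n} L(f(\mathbf{x}_i), y_i) + \lambda\|f\|_{\mathcal{H}}^2. \qquad (2)$$

Note we can have, e.g.,

$$L(f(\mathbf{x}_i), y_i) = (f(\mathbf{x}_i) - y_i)^2.$$

In that case

$$\sum_{i=1}^{n} L(f(\mathbf{x}_i), y_i) = \sum_{i=1}^{n} (f(\mathbf{x}_i) - y_i)^2 = \text{squared error}.$$

Consider the general case (2), with an arbitrary error measure $L$. We have the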


Representer Theorem: A solution of the Tikhonov optimization problem (2) can be written

$$\hat f(\mathbf{x}) = \sum_{i=1}^{n} a_i K(\mathbf{x}_i, \mathbf{x}), \qquad (3)$$

where $K$ is the reproducing kernel of the RKHS $\mathcal{H}$.

Important theorem: thus we only need to find $n$ numbers $a_i$ to optimize the infinite-dimensional problem (2) above.

Proof: Use the calculus of variations.

If a minimizer $f_1$ of (2) exists, now consider any $g \in \mathcal{H}$.

Assuming derivatives with respect to $\varepsilon$ exist:

[again, all norms and inner products are in $\mathcal{H}$]

Representer theorem proof

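$$0 = \frac{d}{d\varepsilon}\left[ \frac{1}{n}\sum_{i=1}^{n} L((f_1 + \varepsilon g)(\mathbf{x}_i), y_i) + \lambda\|f_1 + \varepsilon g\|_{\mathcal{H}}^2 \right]_{\varepsilon = 0}$$

$$= \frac{1}{n}\sum_{i=1}^{n} \frac{\partial L}{\partial f_1(\mathbf{x}_i)}(f_1(\mathbf{x}_i), y_i) \cdot g(\mathbf{x}_i) + \lambda \frac{d}{d\varepsilon}\Big\{ \langle f_1, f_1\rangle + 2\varepsilon\langle f_1, g\rangle + \varepsilon^2\langle g, g\rangle \Big\}\Big|_{\varepsilon = 0}$$

$$= \frac{1}{n}\sum_{i=1}^{n} L_1(f_1(\mathbf{x}_i), y_i) \cdot g(\mathbf{x}_i) + 2\lambda\langle f_1, g\rangle,$$

where $L_1(a, b) = \frac{\partial}{\partial a} L(a, b)$ and all inner products are in $\mathcal{H}$.

Since the above is true for all $g \in \mathcal{H}$, it follows that if we let $g = K_{\mathbf{x}}$ (recall $K_{\mathbf{x}}(\mathbf{x}_i) \equiv K(\mathbf{x}, \mathbf{x}_i)$), we get: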


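$$0 = \frac{1}{n}\sum_{i=1}^{n} L_1(f_1(\mathbf{x}_i), y_i)\, K_{\mathbf{x}}(\mathbf{x}_i) + 2\lambda\langle f_1, K_{\mathbf{x}}\rangle$$

$$= \frac{1}{n}\sum_{i=1}^{n} L_1(f_1(\mathbf{x}_i), y_i)\, K_{\mathbf{x}}(\mathbf{x}_i) + 2\lambda f_1(\mathbf{x}),$$

using the reproducing property $\langle f_1, K_{\mathbf{x}}\rangle = f_1(\mathbf{x})$, or

$$f_1(\mathbf{x}) = -\frac{1}{2\lambda n}\sum_{i=1}^{n} L_1(f_1(\mathbf{x}_i), y_i)\, K(\mathbf{x}_i, \mathbf{x}).$$

Thus if a minimizer $\hat f = f_1$ exists for (2), it can be written in the form (3) as claimed, with

$$a_i = -\frac{1}{2\lambda n} L_1(f_1(\mathbf{x}_i), y_i).$$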


Note that this does not solve the problem, since the $a_i$ are expressed in terms of the solution itself.

But it does reduce the possibilities for what a solution looks like.
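For the squared loss $L(u, y) = (u - y)^2$ we have $L_1(u, y) = 2(u - y)$, so the formula above reads $a_i = (y_i - \hat f(\mathbf{x}_i))/(\lambda n)$. Substituting $\hat f(\mathbf{x}_i) = \sum_j a_j K(\mathbf{x}_j, \mathbf{x}_i)$ turns this self-referential condition into the linear system $(\mathbf{K} + \lambda n I)\mathbf{a} = \mathbf{y}$, i.e., kernel ridge regression. The sketch below assumes a Gaussian kernel with $\sigma = 0.5$, made-up noisy samples of $\sin x$, and $\lambda = 0.05$. It solves the system with plain Gaussian elimination and then checks that the resulting $a_i$ satisfy the stationarity condition from the proof. All function names are hypothetical.

```python
import math

def gaussian_kernel(x, y, sigma=0.5):
    """Example reproducing kernel (an assumption for this sketch)."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def fit(xs, ys, lam, kernel=gaussian_kernel):
    """Square loss: solve (K + lam * n * I) a = y for the coefficients a_i."""
    n = len(xs)
    A = [[kernel(xi, xj) + (lam * n if i == j else 0.0) for j, xj in enumerate(xs)]
         for i, xi in enumerate(xs)]
    return solve(A, ys)

def predict(a, xs, x, kernel=gaussian_kernel):
    """f_hat(x) = sum_i a_i K(x_i, x), the form (3) from the representer theorem."""
    return sum(ai * kernel(xi, x) for ai, xi in zip(a, xs))

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [math.sin(x) + 0.1 * (-1) ** i for i, x in enumerate(xs)]   # made-up noisy samples
lam = 0.05
a = fit(xs, ys, lam)
n = len(xs)
for ai, xi, yi in zip(a, xs, ys):
    # a_i = -L_1(f(x_i), y_i) / (2 lam n) = (y_i - f(x_i)) / (lam n) for the square loss
    print(ai, (yi - predict(a, xs, xi)) / (lam * n))
```

For other losses $L$ the condition is generally nonlinear in the $a_i$, which is why the proof alone does not solve the problem.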
