NONPARAMETRIC ESTIMATION OF CONDITIONAL QUANTILES USING NEURAL NETWORKS


ABSTRACT

We establish the consistency of nonparametric conditional quantile estimators based on artificial neural networks. The results follow from general results on sieve estimation for dependent processes. We also show that conditional quantiles can be learned to any pre-specified accuracy using approximate rather than exact network optimization.

1. INTRODUCTION

In forecasting, it is by far most common to give point forecasts, the most common point forecast being some estimate of the conditional expectation of the variable of interest given some more or less explicit set of conditioning variables. By their very nature, point forecasts generally provide no information about the variation of the variable of interest around its conditional expectation. Sometimes point forecasts are augmented by "margins of error", but these margins may be based on unwarranted assumptions (e.g. normality) about the conditional distribution of the variable of interest.

An attractive alternative to point forecasts, whether or not accompanied by error margins, is a centered interval forecast (e.g. Granger, White and Kamstra, 1989). The median (50th percentile) provides a convenient central value, while the interval limits specify upper and lower bounds such that the variable of interest lies below the lower bound or above the upper bound with (the same) given probability, conditional on available information. For example, a 50% centered interval forecast is such that 25% of the time the forecasted variable will lie below the lower bound and 25% of the time the forecasted variable will lie above the upper bound.

Construction of interval forecasts is based on the conditional quantiles of the variable of interest given a specified set of conditioning variables. Often, there is little information that would permit a confident specification of a parametric model for the conditional quantile. Linear models are those most frequently considered (e.g. Koenker and Bassett, 1978), but this is probably due more to the tractability of linear models than to certain knowledge that the phenomena of interest are linear. In the absence of a firm model for the conditional quantile function, it is desirable to use some flexible model specification capable of adapting to whatever features the data may present. In this paper, we consider the use of artificial neural network models to provide the desired flexibility in estimating conditional quantiles.


Artificial neural networks can be viewed as flexible nonlinear functional forms suitable for approximating arbitrary mappings. As numerous authors have recently shown (e.g. Carroll and Dickinson, 1989; Cybenko, 1989; Funahashi, 1989; Hecht-Nielsen, 1989; Hornik, 1989; Hornik, Stinchcombe and White, 1989a,b (HSWa,b); Stinchcombe and White, 1989, 1990), suitable classes of network output functions are dense in a broad range of function spaces under general conditions. White (1990) exploited this fact to show that a class of artificial neural networks, the "single hidden layer feedforward networks," can be used to obtain a consistent nonparametric estimator of a square integrable conditional expectation function. White's (1990) results are proven by applying the method of sieves (Grenander, 1981; Geman and Hwang, 1982; White and Wooldridge, 1990). Here we give analogous results, using the method of sieves to establish the consistency of nonparametric conditional quantile estimators based on single hidden layer feedforward networks.

We also consider approximate conditional quantile estimators obtained by approximately solving the optimization problem that delivers the exact quantile estimators. Our results are formulated both for independent identically distributed (i.i.d.) random variables and for stationary mixing or ergodic stochastic processes relevant to interval forecasting of time series.

2. MAIN RESULTS

2.a Exact Optimization

Our first assumption describes the properties of the stochastic process generating our observations.

ASSUMPTION A.1: Observations are generated as the realization of a bounded vector-valued stochastic process $\{Z_t\}$ defined on a complete probability space $(\Omega, F, P)$, where $P$ is such that either

(i) $\{Z_t\}$ is an independent identically distributed (i.i.d.) process; or

(ii) $\{Z_t\}$ is a stationary $\phi$- or $\alpha$-mixing process such that $\phi(k) = \phi_0 \zeta^k$ or $\alpha(k) = \alpha_0 \zeta^k$, $0 < \zeta < 1$, $\phi_0, \alpha_0 > 0$, $k > 0$. $\square$

We partition $Z_t$ as $Z_t = (Y_t, X_t')'$, where $Y_t$ is a random scalar and $X_t$ is a random $r \times 1$ vector. Without loss of generality, we may assume $Z_t: \Omega \to D^{r+1} \equiv \times_{i=1}^{r+1} [0,1]$. In part (ii), we define the $\phi$- and $\alpha$-mixing coefficients in the usual way as

$$\phi(m) = \sup_{\{A \in F_{-\infty}^{t},\, B \in F_{t+m}^{\infty},\, P(A) > 0\}} |P(B \mid A) - P(B)|$$

$$\alpha(m) = \sup_{\{A \in F_{-\infty}^{t},\, B \in F_{t+m}^{\infty}\}} |P(A \cap B) - P(A)P(B)|.$$

The object of our interest is the conditional quantile $p$ of $Y_t$ given $X_t$, defined by a function $\theta_p: D^r \to D$ such that

$$P[Y_t \le \theta_p(X_t) \mid X_t] = p, \qquad p \in (0, 1).$$

A $k \times 100\%$ centered interval forecast is obtained using $\theta_p$ by taking the lower bound for $Y_t$ given $X_t$ as $\theta_{k/2}(X_t)$ and the upper bound for $Y_t$ given $X_t$ as $\theta_{1-k/2}(X_t)$.
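To make the construction concrete, here is a minimal sketch (ours, not the paper's; the placeholder linear quantile models stand in for fitted networks) of a 50% centered interval forecast in Python:

```python
import numpy as np

# Hypothetical fitted conditional quantile functions; in Section 2 these
# would be network outputs f(x, delta_hat). Placeholders for illustration.
theta_hat = {0.25: lambda x: 0.40 + 0.10 * x,
             0.75: lambda x: 0.60 + 0.10 * x}

def centered_interval(x, k=0.5):
    """k x 100% centered interval forecast: lower bound theta_{k/2}(x),
    upper bound theta_{1-k/2}(x), as in the text (here k = 0.5)."""
    return theta_hat[k / 2](x), theta_hat[1 - k / 2](x)

print(centered_interval(0.3))  # (0.43, 0.63): 25% of mass below, 25% above
```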

We consider approximations to $\theta_p$ obtained as the output of a single hidden layer feedforward network,

$$f(x, \delta^q) = \beta_0 + \sum_{j=1}^{q} \beta_j \psi(\tilde{x}' \gamma_j), \tag{2.1}$$

where $\tilde{x} = (1, x')'$ is an $(r+1) \times 1$ vector consisting of a "bias unit" and "network inputs" $x$, the function $\psi: \mathbb{R} \to \mathbb{R}$ is a nonlinear "hidden layer activation function" (often chosen to be a cumulative distribution function (c.d.f.)), $q \in \mathbb{N}$ is the number of "hidden units" of the network, and $\delta^q = (\beta', \gamma')'$ (where $\beta = (\beta_0, \beta_1, \ldots, \beta_q)'$, $\gamma = (\gamma_1', \ldots, \gamma_q')'$, $\gamma_j = (\gamma_{j0}, \gamma_{j1}, \ldots, \gamma_{jr})'$) is an $s \times 1$ parameter vector, $s = q(r+2) + 1$. In network parlance, $\delta^q$ is the vector of "network connection strengths." We explicitly index $\delta$ by $q$ in order to emphasize the dependence of the dimension of this vector on $q$.
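As an illustration, the output function (2.1) is straightforward to evaluate; the sketch below (ours) takes $\psi$ to be the logistic c.d.f., one of the choices the paper permits:

```python
import numpy as np

def network_output(x, beta, gamma):
    """Single hidden layer feedforward network output (2.1):
    f(x, delta^q) = beta_0 + sum_j beta_j * psi(x_tilde' gamma_j),
    where x_tilde = (1, x')' prepends the bias unit."""
    psi = lambda a: 1.0 / (1.0 + np.exp(-a))  # logistic c.d.f. as psi
    x_tilde = np.concatenate(([1.0], x))      # bias unit plus inputs
    hidden = psi(gamma @ x_tilde)             # q hidden unit activations
    return beta[0] + beta[1:] @ hidden

# Example with r = 2 inputs and q = 3 hidden units, so s = q(r+2)+1 = 13.
rng = np.random.default_rng(0)
beta = rng.normal(size=4)        # (beta_0, beta_1, ..., beta_q)
gamma = rng.normal(size=(3, 3))  # row j is gamma_j, an (r+1)-vector
print(network_output(np.array([0.2, 0.7]), beta, gamma))
```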

To obtain our results, we use the "connectionist sieve" introduced by White (1990), defined as follows:

DEFINITION 2.1: Let $\psi: \mathbb{R} \to \mathbb{R}$ be a given bounded function, and let $\Theta$ be a subset of the set of functions from $D^r$ to $\mathbb{R}$. For any $q \in \mathbb{N}$ and $\Delta \in \mathbb{R}^+$ define a collection of network output functions $T(\psi, q, \Delta)$ as

$$T(\psi, q, \Delta) = \Big\{\theta \in \Theta: \theta(x) = \beta_0 + \sum_{j=1}^{q} \beta_j \psi(\tilde{x}' \gamma_j) \text{ for all } x \text{ in } D^r,\ \sum_{j=0}^{q} |\beta_j| \le \Delta,\ \sum_{j=1}^{q} \sum_{i=0}^{r} |\gamma_{ji}| \le q\Delta \Big\}.$$

For given $\psi$ and sequences $\{q_n\}$, $\{\Delta_n\}$, define the sequence of (single hidden layer) connectionist sieves $\{\Theta_n(\psi)\}$ as

$$\Theta_n(\psi) = T(\psi, q_n, \Delta_n), \qquad n = 1, 2, \ldots. \quad \square$$

When $\theta \in \Theta_n(\psi)$, we have $\theta(x) = f(x, \delta^{q_n})$ in the notation of (2.1). By letting $q_n \to \infty$ and $\Delta_n \to \infty$, we obtain a sequence of increasingly flexible network models.
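In code (our sketch; the absolute-sum bounds are exactly those of Definition 2.1), membership in $T(\psi, q, \Delta)$ reduces to two checks on the weights:

```python
import numpy as np

def in_sieve(beta, gamma, Delta):
    """Weight bounds of Definition 2.1: sum_{j=0}^q |beta_j| <= Delta and
    sum_{j=1}^q sum_{i=0}^r |gamma_ji| <= q * Delta (gamma is q x (r+1))."""
    q = gamma.shape[0]
    return np.abs(beta).sum() <= Delta and np.abs(gamma).sum() <= q * Delta
```

Letting $\Delta = \Delta_n$ and $q = q_n$ grow with $n$ enlarges the admissible weight set, which is what makes the sieve increasingly flexible.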

We obtain the conditional quantile estimator (say $\hat{\theta}_n$) from a sample of size $n$ by solving the optimization problem

$$\min_{\theta \in \Theta_n(\psi)} Q_n(\theta) = n^{-1} \sum_{t=1}^{n} |Y_t - \theta(X_t)| \big( p \, 1[Y_t \ge \theta(X_t)] + (1-p) \, 1[Y_t < \theta(X_t)] \big), \tag{2.2}$$

where $1[\cdot]$ is the indicator function for the specified event.
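The objective (2.2) is the "check function" (pinball) loss familiar from quantile regression; a minimal sketch (ours) of the sample objective:

```python
import numpy as np

def Q_n(theta_vals, y, p):
    """Sample objective (2.2): residuals above the fitted quantile are
    weighted by p, residuals below by (1 - p)."""
    u = y - theta_vals  # u_t = Y_t - theta(X_t)
    return np.mean(np.abs(u) * np.where(u >= 0, p, 1.0 - p))
```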

Solving this problem is equivalent to solving the problem

$$\min_{\delta^{q_n} \in D_n} n^{-1} \sum_{t=1}^{n} |Y_t - f(X_t, \delta^{q_n})| \big( p \, 1[Y_t \ge f(X_t, \delta^{q_n})] + (1-p) \, 1[Y_t < f(X_t, \delta^{q_n})] \big),$$

where $D_n \equiv \{\delta^{q_n}: \sum_{j=0}^{q_n} |\beta_j| \le \Delta_n,\ \sum_{j=1}^{q_n} \sum_{i=0}^{r} |\gamma_{ji}| \le q_n \Delta_n\}$. This is a direct non-linear analog

[...]

The growth of $\Theta_n(\psi)$ must be sufficiently restricted to ensure that the uniform convergence condition (2.3) holds. In the present application $Q$ is defined by $Q(\theta) = E(Q_n(\theta))$. The required uniform convergence holds under straightforward conditions using Theorem 4.2 and Lemma 4.3 of White (1990). To control the growth of $\Theta_n(\psi)$ appropriately, we impose conditions on $\psi$, $\{q_n\}$ and $\{\Delta_n\}$. We say that $\psi$ satisfies a Lipschitz condition if $|\psi(a_1) - \psi(a_2)| \le L |a_1 - a_2|$ for all $a_1, a_2 \in \mathbb{R}$ and some $L \in \mathbb{R}^+$. We denote by $\mathcal{L}$ the set of all functions $\psi: \mathbb{R} \to \mathbb{R}$ such that $\psi$ is bounded, satisfies a Lipschitz condition (for given $L < \infty$) and is either a c.d.f. or is $\ell$-finite. The following condition imposes the appropriate structure for our problem.

ASSUMPTION A.2: $\Theta_n(\psi) = T(\psi, q_n, \Delta_n)$, $n = 1, 2, \ldots$, where $\psi \in \mathcal{L}$ and $\{q_n\}$ and $\{\Delta_n\}$ are such that $q_n \uparrow \infty$, $\Delta_n \uparrow \infty$ as $n \to \infty$, $\Delta_n = o(n^{1/2})$ and either (i) $q_n \Delta_n^2 \log q_n \Delta_n = o(n)$ or (ii) $q_n \Delta_n \log q_n \Delta_n = o(n^{1/2})$. $\square$
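As a concrete check (our illustration; any rates meeting A.2 will do), the choices $q_n = \lfloor n^{1/3} \rfloor$ and $\Delta_n = \log n$ satisfy condition (i):

$$q_n \Delta_n^2 \log q_n \Delta_n \le n^{1/3} (\log n)^2 \log\big(n^{1/3} \log n\big) = O\big(n^{1/3} (\log n)^3\big) = o(n), \qquad \Delta_n = \log n = o(n^{1/2}).$$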

LEMMA 2.4: Suppose Assumptions A.1(i) and A.2(i) or A.1(ii) and A.2(ii) hold. Then

$$P\big[\sup_{\theta \in \Theta_n(\psi)} |Q_n(\theta) - E(Q_n(\theta))| > \varepsilon\big] \to 0 \text{ as } n \to \infty.$$

It remains to ensure the identification condition (2.4) and the continuity of $Q$ at $\theta_0 = \theta_p$. Continuity is straightforward. For (2.4), the following condition suffices.

ASSUMPTION A.3: For given $p \in (0,1)$, $\theta_p: D^r \to D$ is a measurable function such that $P[Y_t \le \theta_p(X_t) \mid X_t] = p$, and for every $\theta \in \Theta$ and all $\varepsilon > 0$ sufficiently small, $E|\theta(X_t) - \theta_p(X_t)| > \varepsilon$ implies

$$E\big[1[(\theta_p(X_t) + \theta(X_t))/2 \le Y_t < \theta_p(X_t)] \,\big|\, \theta(X_t) < \theta_p(X_t)\big] > \delta_\varepsilon \quad \text{and}$$

$$E\big[1[\theta_p(X_t) \le Y_t < (\theta_p(X_t) + \theta(X_t))/2] \,\big|\, \theta(X_t) \ge \theta_p(X_t)\big] > \delta_\varepsilon. \quad \square$$

This assumption ensures that the conditional distribution of $Y_t$ given $X_t$ is continuous in a neighborhood of $\theta_p(X_t)$, ensuring the uniqueness (a.s.-$P$) of $\theta_p(X_t)$.

[...]

optimum. For given $\zeta > 0$, picking $\hat{\theta}_{n,\varepsilon} \in S_n(\varepsilon, \zeta)$ requires only approximate optimization. The result for approximate estimation can now be stated.

THEOREM 2.8: Given Assumptions B.1, B.2 and A.3, for any $\varepsilon > 0$ there exist $q_\varepsilon \in \mathbb{N}$ and $\zeta_\varepsilon > 0$ such that if $\hat{\theta}_{n,\varepsilon} \in S_n(\varepsilon, \zeta_\varepsilon)$, $n = 1, 2, \ldots$, then $\hat{\theta}_{n,\varepsilon} \in S(\varepsilon)$ a.a. $n$, a.s.-$P$. $\square$

Thus, conditional quantile functions $\theta_p$ can be estimated to any desired degree of ($L^1$-) accuracy using artificial neural networks having estimated parameters that approximately solve the nonlinear quantile regression problem.

3. SOME REMARKS ON COMPUTATION

Due to the nonlinearity in parameters of the neural network output function, standard linear programming methods for computing $\hat{\theta}_n$ or $\hat{\theta}_{n,\varepsilon}$ (e.g. Fulton, Subramanian and Carson, 1987) cannot be used. However, the neural network output function is linear in the $\beta$ parameters. This suggests that computation of estimates for fixed $q$ could proceed by selecting parameters $\gamma$ (e.g. at random) and then applying standard techniques to estimate the parameters $\beta$. Picking sufficiently many values for $\gamma$ (e.g. by multi-start methods) and then estimating $\beta$ as just described should produce useful estimates.
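A minimal sketch (ours, not an algorithm from the paper) of this two-step strategy: for each random draw of $\gamma$ the $\beta$-step is an exact linear program, solved here with scipy.optimize.linprog; the logistic choice of $\psi$ and all helper names are our assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def fit_beta_lp(H, y, p):
    """Given fixed hidden-unit activations H (n x (q+1), first column ones),
    solve min sum_t p*u_t(+) + (1-p)*u_t(-) subject to
    H beta + u(+) - u(-) = y, u(+), u(-) >= 0: linear quantile
    regression cast as a linear program."""
    n, k = H.shape
    c = np.concatenate([np.zeros(k), p * np.ones(n), (1 - p) * np.ones(n)])
    A_eq = np.hstack([H, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * k + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:k]

def multistart_fit(x, y, p, q, n_starts=20, seed=0):
    """Multi-start heuristic from the text: draw gamma at random, fit beta
    by LP for each draw, and keep the draw with the smallest Q_n."""
    rng = np.random.default_rng(seed)
    psi = lambda a: 1.0 / (1.0 + np.exp(-a))         # logistic c.d.f.
    x_tilde = np.column_stack([np.ones(len(x)), x])  # add bias unit
    best = None
    for _ in range(n_starts):
        gamma = rng.normal(size=(q, x_tilde.shape[1]))
        H = np.column_stack([np.ones(len(x)), psi(x_tilde @ gamma.T)])
        beta = fit_beta_lp(H, y, p)
        u = y - H @ beta
        loss = np.mean(np.abs(u) * np.where(u >= 0, p, 1 - p))
        if best is None or loss < best[0]:
            best = (loss, beta, gamma)
    return best  # (Q_n value, beta, gamma) for the best start
```

Each $\beta$-step is a standard linear quantile regression, so any LP or dedicated quantile-regression solver could replace linprog here.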

Alternatively, the nonsmoothness and nonlinearity of the objective function $Q_n(\theta)$ make it an attractive candidate for application of simulated annealing (e.g. Hajek, 1985) or genetic algorithm (GA) optimization methods (Holland, 1975; Goldberg, 1989).
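As a hedged sketch (ours), the objective can also be handed to a general-purpose global optimizer; scipy's dual_annealing serves here only as a modern stand-in for the annealing methods cited, and the box bounds are a simplification of the sieve's absolute-sum bounds:

```python
import numpy as np
from scipy.optimize import dual_annealing

def fit_by_annealing(x, y, p, q, Delta=5.0, seed=0):
    """Minimize the nonsmooth objective Q_n over delta = (beta, gamma)
    by simulated-annealing-style global search within box bounds."""
    psi = lambda a: 1.0 / (1.0 + np.exp(-a))
    x_tilde = np.column_stack([np.ones(len(x)), x])
    r1 = x_tilde.shape[1]        # r + 1
    s = (q + 1) + q * r1         # dimension of delta

    def objective(delta):
        beta, gamma = delta[:q + 1], delta[q + 1:].reshape(q, r1)
        theta = beta[0] + psi(x_tilde @ gamma.T) @ beta[1:]
        u = y - theta
        return np.mean(np.abs(u) * np.where(u >= 0, p, 1 - p))

    res = dual_annealing(objective, bounds=[(-Delta, Delta)] * s,
                         maxiter=200, seed=seed)
    return res.x, res.fun
```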

Regardless of the method applied, computing either $\hat{\theta}_n$ or $\hat{\theta}_{n,\varepsilon}$ is certain to be computationally demanding. The attractiveness of using such estimates to produce nonparametric interval forecasts may suffice in certain applications to justify this effort.

MATHEMATICAL APPENDIX

All notation and assumptions are as given in the text.

PROOF OF LEMMA 2.3: Define $\Sigma^r(\psi) \equiv \{g: D^r \to \mathbb{R} \mid g(x) = \beta_0 + \sum_{j=1}^{q} \beta_j \psi(\tilde{x}'\gamma_j),\ x \in D^r,\ \beta_0 \in \mathbb{R},\ \beta_j \in \mathbb{R},\ \gamma_j \in \mathbb{R}^{r+1},\ j = 1, \ldots, q,\ q \in \mathbb{N}\}$. It follows from Theorem 2.4 of HSWa or Corollary 3.6 of HSWb and Theorem 3.14 of Rudin (1974) that $\Sigma^r(\psi)$ is dense in $L^1(D^r, \mu)$. Let $g$ be an arbitrary element of $\Sigma^r(\psi)$, so that for some $q \in \mathbb{N}$, $\beta_0 \in \mathbb{R}$, $\beta_j \in \mathbb{R}$, $\gamma_j \in \mathbb{R}^{r+1}$, $j = 1, \ldots, q$, we have $g(x) = \beta_0 + \sum_{j=1}^{q} \beta_j \psi(\tilde{x}'\gamma_j)$. Because $q_n \to \infty$ and $\Delta_n \to \infty$, we can always pick $n$ sufficiently large that $\sum_{j=0}^{q} |\beta_j| \le \Delta_n$, $\sum_{j=1}^{q} \sum_{i=0}^{r} |\gamma_{ji}| \le q_n \Delta_n$ and $q \le q_n$. Thus, for $n$ sufficiently large, $g$ belongs to $\Theta_n(\psi) = T(\psi, q_n, \Delta_n)$ and therefore to $\cup_{n=1}^{\infty} \Theta_n(\psi)$. Because $g$ is arbitrary, $\Sigma^r(\psi) \subset \cup_{n=1}^{\infty} \Theta_n(\psi)$. It follows from the denseness of $\Sigma^r(\psi)$ in $L^1(D^r, \mu)$ that $\cup_{n=1}^{\infty} \Theta_n(\psi)$ is dense in $L^1(D^r, \mu)$. $\square$

PROOF OF LEMMA 2.4: We apply Theorem 4.2 and Lemma 4.3 of White (1990). For Theorem 4.2, the complete probability space $(\Omega, F, P)$ is that of Assumption A.1; the metric space $(\Theta, \rho)$ is that of the continuous functions on $D^r$ with the uniform metric. The sequence $\{\Theta_n\}$ of Theorem 4.2 corresponds to $\{\Theta_n(\psi)\}$ here. By choice of $\psi$, $\{q_n\}$ and $\{\Delta_n\}$, this is an increasing sequence of (separable) subsets of $\Theta$. The summands of interest are

$$s_n(Z_t, \theta) = |Y_t - \theta(X_t)| \big( p \, 1[Y_t \ge \theta(X_t)] + (1-p) \, 1[Y_t < \theta(X_t)] \big),$$

where $s_n$ (White's (1990) notation) does not depend on $n$ here. Note that the continuity of $\theta$ ensures that for each $\theta$ in $\Theta_n(\psi)$, $s_n(\cdot, \theta)$ is continuous on $\Omega$, as required.

Now for each $z$ in $D^{r+1}$ and $\theta^0$ in $\Theta_n(\psi)$, geometric arguments ensure that

$$|s_n(z, \theta) - s_n(z, \theta^0)| \le \max(p, 1-p) \, |\theta(x) - \theta^0(x)| \le \max(p, 1-p) \, \rho(\theta, \theta^0)$$


for all $\theta$ in $\eta_n(\theta^0) \equiv \{\theta \in \Theta_n(\psi): \rho(\theta, \theta^0) < 1\}$. Consequently, putting $m_n(z, \theta^0) = 1$ and $\bar{m}_n = 1$ (White's (1990) notation) guarantees that the required domination condition $|s_n(z, \theta) - s_n(z, \theta^0)| \le m_n(z, \theta^0)\, \rho(\theta, \theta^0)^\lambda$ holds with $\lambda = 1$ and $d_n(\theta^0) = 1$ in White's (1990) notation.

We also require a choice for $\bar{s}_n \ge \sup_{\theta \in \Theta_n(\psi)} |s_n(z, \theta)|$. Now

$$|s_n(z, \theta)| \le |y - \theta(x)| \le |y| + |\theta(x)|.$$

Taking the bound on $\psi$ to be unity (without loss of generality), we have $|\theta(x)| \le \Delta_n$. With $\Delta_n \ge 1$ we then have $|s_n(z, \theta)| \le 2\Delta_n$ for all $z \in D^{r+1}$ and $\theta \in \Theta_n(\psi)$. We therefore take $\bar{s}_n = 2\Delta_n$.

Because the conditions of Theorem 4.2 of White (1990) thus hold, it follows from (i) of Theorem 4.2 ($\{Z_t\}$ i.i.d.) that

$$P\Big[\sup_{\theta \in \Theta_n(\psi)} \Big|n^{-1} \sum_{t=1}^{n} [s_n(Z_t, \theta) - E(s_n(Z_t, \theta))]\Big| > \varepsilon\Big] \le 2 G_n(\varepsilon/6)\big[\exp(-\varepsilon n/7) + \exp(-\varepsilon^2 n / 4\Delta_n^2[18 + 4\varepsilon])\big] \tag{a.1.i}$$

for any $\varepsilon > 0$ and all $n$ sufficiently large, where $G_n(\varepsilon) = \exp H_n(\varepsilon)$ and $H_n(\varepsilon)$ is the metric entropy of $\Theta_n(\psi)$. If in addition $n^{-1}\bar{s}_n^2 \to 0$ as $n \to \infty$ and for all $\varepsilon > 0$, $(\bar{s}_n^2/n)\, H_n(\varepsilon/6) \to 0$ as $n \to \infty$, then the right hand side of (a.1.i) converges to zero, and we are done with the proof of case (i).

Similarly, (ii) of Theorem 4.2 of White (1990) ($\{Z_t\}$ mixing as in Assumption A.1(ii)) ensures that there exist constants $0 < c_1, c_2 < \infty$ independent of $n$ such that for any $\varepsilon > 0$ and all $n$ sufficiently large

$$P\Big[\sup_{\theta \in \Theta_n(\psi)} \Big|n^{-1} \sum_{t=1}^{n} [s_n(Z_t, \theta) - E(s_n(Z_t, \theta))]\Big| > \varepsilon\Big] \le c_1 G_n(\varepsilon/6)\big[\exp(-c_2 n^{1/2}) + \exp(-c_2 \varepsilon n^{1/2} / 12\Delta_n)\big]. \tag{a.1.ii}$$


If in addition $n^{-1}\bar{s}_n^2 \to 0$ as $n \to \infty$ and for all $\varepsilon > 0$, $(\bar{s}_n/n^{1/2})\, H_n(\varepsilon/6) \to 0$ as $n \to \infty$, then the right hand side of (a.1.ii) converges to zero, and we are done with the proof of case (ii).

It remains to verify the conditions on $\bar{s}_n$ and the convergence to zero of the bounds in (a.1.i) and (a.1.ii). Because $\bar{s}_n = 2\Delta_n$, in either case it suffices that $\Delta_n = o(n^{1/2})$. Now Lemma 4.3 of White (1990) ensures that for all $\varepsilon > 0$ sufficiently small,

$$H_n(\varepsilon) \le v_n \log(8/\varepsilon) + v_n \log[\Delta_n + rL\Delta_n^2] + v_n \log v_n,$$

where $v_n = q_n(r+2) + 1$. We seek choices for $\{q_n\}$ and $\{\Delta_n\}$ ensuring that $(\bar{s}_n^2/n)\, H_n(\varepsilon/6) \to 0$ or $(\bar{s}_n/n^{1/2})\, H_n(\varepsilon/6) \to 0$. Now for all $n$ sufficiently large, we will have $\Delta_n^2 > \Delta_n$ and $\Delta_n^2 > (1 + rL)$, as well as $\Delta_n^2 > 48/\varepsilon$. Consequently,

$$H_n(\varepsilon/6) \le v_n \log(48/\varepsilon) + v_n \log[\Delta_n + rL\Delta_n^2] + v_n \log v_n \le v_n \log \Delta_n^2 + v_n \log \Delta_n^2(1 + rL) + v_n \log v_n \le 6 v_n \log \Delta_n + v_n \log v_n \le 6 v_n \log \Delta_n v_n.$$

Under Assumption A.2(i) we have $v_n \Delta_n^2 \log v_n \Delta_n = o(n)$, so that $(\bar{s}_n^2/n)\, H_n(\varepsilon/6) \to 0$ as required. Similarly, $(\bar{s}_n/n^{1/2})\, H_n(\varepsilon/6) \le 12\, n^{-1/2} v_n \Delta_n \log v_n \Delta_n$; because $q_n \Delta_n \log q_n \Delta_n = o(n^{1/2})$ under A.2(ii), the desired result again follows. $\square$

PROOF OF THEOREM 2.5: (a) The argument is sketched in the text. Note that the compactness of $\Theta_n(\psi)$ follows because it is totally bounded and closed.

(b) The denseness of $\cup_{n=1}^{\infty} \Theta_n(\psi)$ required in Theorem 2.2(b) is established by Lemma 2.3, and the uniform convergence condition (2.3) is established by Lemma 2.4. It remains to verify (2.4) and the continuity at $\theta_p$ of $Q$, where

[...]

$\inf_{\theta \in \eta^c(\theta_p, \varepsilon)} Q(\theta) - Q(\theta_p) > \delta_\varepsilon \varepsilon/2 > 0$, verifying (2.4). $\square$

PROOF OF THEOREM 2.8: The argument is similar to that of Theorem 3.1 of Stinchcombe and White (1990). For given $\varepsilon > 0$, choose $\delta_\varepsilon > 0$ as guaranteed by Assumption A.3. Given Assumption B.2, it follows from either Theorem 2.3 or 2.6 of Stinchcombe and White (1990) and the argument of the Corollary of HSWa that there exist $q_\varepsilon \in \mathbb{N}$ sufficiently large and $\theta_\varepsilon \in \Theta_\varepsilon(\psi, \Delta)$ such that $\rho(\theta_\varepsilon, \theta_p) < \zeta_\varepsilon/2$, $\zeta_\varepsilon \equiv \delta_\varepsilon \varepsilon/2$.

Because $\Theta_\varepsilon(\psi, \Delta)$ is the continuous image of the compact set $\{\delta^{q_\varepsilon}: |\beta_0| \le \Delta,\ |\beta_j| \le \Delta,\ |\gamma_{ji}| \le \Delta,\ i = 0, \ldots, r,\ j = 1, \ldots, q_\varepsilon\}$, $\Theta_\varepsilon(\psi, \Delta)$ is compact. Because $Q_n(\theta)$ is continuous on $\Theta_\varepsilon(\psi, \Delta)$ for each realization of $\{Z_t\}$, it follows that there always exists a minimizer $\bar{\theta}_{n,\varepsilon}$ of $Q_n$ on $\Theta_\varepsilon(\psi, \Delta)$. It follows from Theorem 1 of White (1989) that $\bar{\theta}_{n,\varepsilon} \to \Theta_\varepsilon^*$ a.s., where $\Theta_\varepsilon^*$ denotes the set of minimizers of $Q$ on $\Theta_\varepsilon(\psi, \Delta)$, provided that there exists $D_t \ge |Y_t - \theta(X_t)|\big((1-p)1[Y_t \le \theta(X_t)] - p\,1[Y_t > \theta(X_t)]\big)$ for all $\theta \in \Theta_\varepsilon(\psi, \Delta)$ such that $E(D_t) < \infty$. But $|Y_t - \theta(X_t)|\big((1-p)1[Y_t \le \theta(X_t)] - p\,1[Y_t > \theta(X_t)]\big) \le 2|Y_t - \theta(X_t)| \le 2|Y_t| + 2 \sup_{\theta \in \Theta_\varepsilon(\psi, \Delta),\, x \in D^r} |\theta(x)|$, and Assumptions B.1 and B.2 ensure that $E(D_t) < \infty$. Note that Theorem 1 of White (1989) assumes that $\{Y_t, X_t\}$ is an i.i.d. sequence, but the argument extends to the stationary case by applying the ergodic theorem (e.g. Stout, 1974, Theorem 3.5.7) instead of the Kolmogorov law of large numbers for i.i.d. sequences in establishing the uniform law of large numbers,

$$\sup_{\theta \in \Theta_\varepsilon(\psi, \Delta)} |Q_n(\theta) - Q(\theta)| \to 0 \text{ a.s.}$$

We establish that $\hat{\theta}_{n,\varepsilon} \in S(\varepsilon)$ a.a. $n$ a.s. by contradiction. Suppose the desired conclusion is false. Then there exists $F \in \mathcal{F}$, $P(F) > 0$, such that for each $\omega \in F$ there exists a subsequence $\{n'\}$ for which $\hat{\theta}_{n',\varepsilon} \notin S(\varepsilon)$ for all $n'$. Without loss of generality, we may also choose $\omega$ so that $\sup_{\theta \in \Theta_\varepsilon(\psi, \Delta)} |Q_n(\omega, \theta) - Q(\theta)| \to 0$ and $\bar{\theta}_{n,\varepsilon}(\omega) \to \theta_\varepsilon^*$ as $n \to \infty$, as these events occur for $\omega$ in a set of probability one. Now


Q(fJn-,£«(J))1 + IQ(fJn-,£«(J))-Q(8;)1 + IQ(8;)-Q(8p)1 for8;e e;. For the last teffil,the

argument of Theorem 2.5 establishes that Q is minimized at 8po Hence Q(8p) ~ Q(8; ) ~ Q(8£),

so that O ~ Q«(}; ) -Q«(}p) ~ Q«(}E) -Q«(}p) <P«(}E' (}p) < ~E/2 by choice of (}Eo

I Q((); ) -Q(()p) I < ~e/2. For the second term, the fact that {}n'.e((J)) -78; and the continuity of

Q imply that IQ({)n'.£((J))) -Q(e; ) I < '£18 for all n' sufficiently large.

For the first telm. ~ I Q({)n',e(co)) -Qn'(CO, {)n',e(co))1

+ IQ(en,.E({J)))-Qn'({J),en,.E((J)))1 + IQn'((O' {)n',e((0)) -Qn'((O, {)n',e((0))1 < t;e/8 + t;e/8 + t;

for all n' sufficiently large by the triangle inequality, the unifolm law of large numbers and the

definition of {) n'. E. Putting t; = t;E/8 and collecting together all the preceding inequalities, we

--have I Q({)n',e(ro)) -Q((Jp) I < 'e for all n' sufficiently large.

We complete the proof by showing that--

IQ({)n'.e((J)))-Q(ep)1 «e implies

P({)n',e(ro).8p) <e for all n' sufficiently large. contradicting {)n',e(ro) E S(e) for all n

In the proof of Theorem 2.5 we established that given Assumption A.3,

Q(f))-Q(f)p) > 8eP(f),f)p)/2 for f) E 17C(f)p,E)

Because {)n',e(OJ) e S(e), {)n',£((J)) E 1JC(ep. e) so

Q(8 '.',I:;(tt») -Q(9p) > Dc p(8 '.',I:;(tt», 9p)J2 for all Eufficiendy large. Butn

--

t;e> Q(8n,.e«(J)))- Q«(}p) > 8ep(8n'.e«(J)), (}p)/2 impliesp(8n,.e«(J)), (}p) <e for all n' sufficiently

large, because ,£ = 8£ e/2. We thus have a contradiction, and as (0 E F was arbitrary, the proof is

complete. 0


REFERENCES

Carroll, S.M. and B.W. Dickinson (1989), "Construction of Neural Nets Using the Radon Transform," in Proceedings of the International Joint Conference on Neural Networks, Washington, D.C. New York: IEEE Press, pp. I:607-611.

Cybenko, G. (1989), "Approximation by Superpositions of a Sigmoidal Function," Mathematics of Control, Signals and Systems 2, 303-314.

Fulton, M., S. Subramanian and R. Carson (1987), "Estimating Fast Regression Quantiles Using a Modification of the Barrodale and Roberts L1 Algorithm," Department of Economics Discussion Paper 87-8, University of California, San Diego.

Funahashi, K. (1989), "On the Approximate Realization of Continuous Mappings by Neural Networks," Neural Networks 2, 183-192.

Geman, S. and C. Hwang (1982), "Nonparametric Maximum Likelihood Estimation by the Method of Sieves," The Annals of Statistics 10, 401-414.

Goldberg, D. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley.

Granger, C.W.J., H. White and M. Kamstra (1989), "Interval Forecasting: An Analysis Based Upon ARCH-Quantile Estimators," Journal of Econometrics 40, 87-96.

Grenander, U. (1981). Abstract Inference. New York: Wiley.

Hajek, B. (1985), "A Tutorial Survey of Theory and Applications of Simulated Annealing," in Proceedings of the 24th IEEE Conference on Decision and Control, pp. 755-760.

Hecht-Nielsen, R. (1989), "Theory of the Back-Propagation Neural Network," in Proceedings of the International Joint Conference on Neural Networks, Washington, D.C. New York: IEEE Press, pp. I:593-606.

Hornik, K. (1989), "Learning Capabilities of Multilayer Feedforward Networks," Technische Universität Wien, technical report.

Hornik, K., M. Stinchcombe and H. White (1989), "Multilayer Feedforward Networks are Universal Approximators," Neural Networks 2, 359-366.

Hornik, K., M. Stinchcombe and H. White (1990), "Universal Approximation of an Unknown Mapping and Its Derivatives," Neural Networks 3 (to appear).

Holland, J. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press.

Koenker, R. and G. Bassett (1978), "Regression Quantiles," Econometrica 46, 33-50.

Rinnooy Kan, A.H.G., C.G.E. Boender and G.Th. Timmer (1985), "A Stochastic Approach to Global Optimization," in K. Schittkowski, ed., Computational Mathematical Programming, NATO ASI Series, Vol. F15. Berlin: Springer-Verlag, pp. 281-308.

Rudin, W. (1974). Real and Complex Analysis. New York: McGraw-Hill.

Stinchcombe, M. and H. White (1989), "Universal Approximation Using Feedforward Networks with Non-Sigmoid Hidden Layer Activation Functions," in Proceedings of the International Joint Conference on Neural Networks, Washington, D.C. New York: IEEE Press, pp. I:613-617.

Stinchcombe, M. and H. White (1990), "Approximating and Learning Unknown Mappings Using Multilayer Feedforward Networks with Bounded Weights," in Proceedings of the International Joint Conference on Neural Networks, San Diego. New York: IEEE Press, pp. III:7-15.

Stout, W.F. (1974). Almost Sure Convergence. New York: Academic Press.

White, H. (1989), "Learning in Artificial Neural Networks: A Statistical Perspective," Neural Computation 1, 425-464.

White, H. (1990), "Connectionist Nonparametric Regression: Multilayer Feedforward Networks Can Learn Arbitrary Mappings," Neural Networks 3 (to appear).

White, H. and J. Wooldridge (1990), "Some Results on Sieve Estimation with Dependent Observations," in W. Barnett, J. Powell and G. Tauchen, eds., Nonparametric and Semi-Parametric Methods in Econometrics and Statistics. New York: Cambridge University Press (to appear).