![Page 1: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/1.jpg)
Goodness of Fit using Bootstrap
G. Jogesh Babu
Center for Astrostatistics
http://astrostatistics.psu.edu
![Page 2: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/2.jpg)
Astrophysical Inference from astronomical data
Fitting astronomical data
• Non-linear regression
• Density (shape) estimation
• Parametric modeling
  – Parameter estimation of assumed model
  – Model selection to evaluate different models
    • Nested (in a quasar spectrum, should one add a broad absorption line (BAL) component to a power-law continuum?)
    • Non-nested (is the quasar emission process a mixture of blackbodies or a power law?)
    • Goodness of fit
![Page 3: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/3.jpg)
Chandra X-ray Observatory ACIS data: COUP source #410 in the Orion Nebula, with 468 photons
Fitting to binned data using $\chi^2$ (XSPEC package): thermal model with absorption, $A_V \sim 1$ mag
![Page 4: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/4.jpg)
Fitting to the unbinned EDF by maximum likelihood (C-statistic): thermal model with absorption
![Page 5: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/5.jpg)
Incorrect model family: power-law model with absorption, $A_V \sim 1$ mag
Question: Can a power-law model be excluded with 99% confidence?
![Page 6: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/6.jpg)
Empirical Distribution Function
![Page 7: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/7.jpg)
K-S confidence bands: $F = F_n \pm D_n(\alpha)$
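A minimal Python sketch of the EDF and of the band $F_n(x) \pm D_n(\alpha)$. It assumes the asymptotic Kolmogorov distribution (the law of $\sup_t |B(t)|$ for a Brownian bridge $B$), so $D_n(\alpha) \approx K_{1-\alpha}/\sqrt{n}$; for small $n$, exact tables of $D_n$ should be used instead. The function names are illustrative, not taken from any package.

```python
import bisect
import math


def edf(sample):
    """Return the EDF F_n as a function: F_n(x) = #{i : X_i <= x} / n."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n


def kolmogorov_cdf(t, terms=100):
    """Asymptotic Kolmogorov CDF P(K <= t) = 1 - 2 * sum_k (-1)^(k-1) exp(-2 k^2 t^2)."""
    if t <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * t * t) for k in range(1, terms + 1))
    return 1.0 - 2.0 * s


def ks_halfwidth(n, alpha=0.05):
    """Approximate D_n(alpha) = K_{1-alpha} / sqrt(n), finding K_{1-alpha} by bisection."""
    lo, hi = 0.1, 5.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / math.sqrt(n)


def ks_band(sample, x, alpha=0.05):
    """Lower and upper limits F_n(x) -/+ D_n(alpha), clipped to [0, 1]."""
    fn_x = edf(sample)(x)
    d = ks_halfwidth(len(sample), alpha)
    return max(0.0, fn_x - d), min(1.0, fn_x + d)
```

For $\alpha = 0.05$ the bisection recovers the familiar constant $K_{0.95} \approx 1.358$, so with $n = 100$ the band is roughly $F_n(x) \pm 0.136$.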
![Page 8: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/8.jpg)
Model fitting
Find the most parsimonious 'best' fit to answer:
• Is the underlying nature of an X-ray stellar spectrum a non-thermal power law or a thermal gas with absorption?
• Are the fluctuations in the cosmic microwave background best fit by Big Bang models with dark energy or with quintessence?
• Are there interesting correlations among the properties of objects in any given class (e.g. the Fundamental Plane of elliptical galaxies), and what are the optimal analytical expressions of such correlations?
![Page 9: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/9.jpg)
Statistics Based on EDF
Kolmogorov-Smirnov: $\sup_x |F_n(x) - F(x)|$, $\sup_x (F_n(x) - F(x))^+$, $\sup_x (F_n(x) - F(x))^-$
Cramér-von Mises: $\int (F_n(x) - F(x))^2 \, dF(x)$
Anderson-Darling: $\int \frac{(F_n(x) - F(x))^2}{F(x)(1 - F(x))} \, dF(x)$
All of these statistics are distribution free (nonparametric statistics) when $F$ is continuous and fully specified.
But they are no longer distribution free if the parameters are estimated or the data are multivariate.
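For a fully specified continuous $F$, all three statistics reduce to sums over the ordered values $u_{(1)} \le \dots \le u_{(n)}$ of $u_i = F(X_i)$. The sketch below uses the standard computing formulas; note that it returns $n$ times the Cramér-von Mises and Anderson-Darling integrals above, the scaling under which they are usually tabulated. The function name is hypothetical.

```python
import math


def edf_statistics(sample, cdf):
    """K-S (D, D+, D-), n * Cramer-von Mises and n * Anderson-Darling for a fully
    specified continuous CDF. Assumes 0 < cdf(X_i) < 1 for every observation."""
    u = sorted(cdf(x) for x in sample)  # u_(1) <= ... <= u_(n)
    n = len(u)
    # With 0-based i, the ordered value u[i] has rank i + 1.
    d_plus = max((i + 1) / n - ui for i, ui in enumerate(u))  # sup (F_n - F)
    d_minus = max(ui - i / n for i, ui in enumerate(u))  # sup (F - F_n)
    w2 = 1 / (12 * n) + sum((ui - (2 * i + 1) / (2 * n)) ** 2 for i, ui in enumerate(u))
    a2 = -n - sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1 - u[n - 1 - i])) for i in range(n)
    ) / n
    return {"D": max(d_plus, d_minus), "D+": d_plus, "D-": d_minus, "W2": w2, "A2": a2}
```

For example, `edf_statistics(data, lambda x: min(max(x, 0.0), 1.0))` tests `data` against the uniform distribution on (0, 1).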
![Page 10: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/10.jpg)
K-S probabilities are invalid when the model parameters are estimated from the data. Some astronomers use them incorrectly.
(Lilliefors 1964)
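A small simulation makes the point concrete. Assuming Gaussian data with $(\mu, \sigma^2)$ estimated by $(\bar X, s_n^2)$, it counts how often the K-S statistic exceeds the fully specified asymptotic 5% critical value $1.3581/\sqrt{n}$. The rejection rate comes out far below the nominal 5%, so the tabulated K-S probabilities are badly miscalibrated in this setting. The sample size, replication count and seed are arbitrary choices.

```python
import math
import random
import statistics


def norm_cdf(x, mu, sigma):
    """Gaussian CDF via math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_estimated_normal(sample):
    """sup_x |F_n(x) - F(x; theta_hat_n)|, theta_hat_n = (X_bar, s_n^2) from the same data."""
    n = len(sample)
    mu = statistics.fmean(sample)
    sigma = statistics.pstdev(sample)  # divisor n, matching s_n^2
    u = sorted(norm_cdf(x, mu, sigma) for x in sample)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))


def naive_rejection_rate(n=50, reps=2000, seed=1):
    """Fraction of truly Gaussian samples 'rejected' when the fully specified
    asymptotic 5% K-S critical value 1.3581 / sqrt(n) is (incorrectly) used."""
    rng = random.Random(seed)
    crit = 1.3581 / math.sqrt(n)
    rejections = sum(
        ks_estimated_normal([rng.gauss(0.0, 1.0) for _ in range(n)]) > crit
        for _ in range(reps)
    )
    return rejections / reps
```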
![Page 11: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/11.jpg)
Multivariate case
Warning: K-S does not work in multiple dimensions.
Example: Paul B. Simpson (1951)
$F(x,y) = a x^2 y + (1 - a) y^2 x$, $0 < x, y < 1$
$(X_1, Y_1)$ data from $F$, $F_1$ the EDF of $(X_1, Y_1)$
$P(|F_1(x,y) - F(x,y)| < 0.72 \text{ for all } x, y)$ is greater than 0.065 if $a = 0$ ($F(x,y) = y^2 x$), and less than 0.058 if $a = 0.5$ ($F(x,y) = xy(x+y)/2$), so the distribution of the statistic depends on $F$ even when $F$ is fully specified.
The treatment of a two-dimensional K-S test in Numerical Recipes is mathematically invalid.
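A Monte Carlo check of Simpson's example, assuming only the formula for $F$ given above. With a single observation, $F_1$ is the indicator of $\{X_1 \le x, Y_1 \le y\}$, so the supremum of $|F_1 - F|$ is $\max\{1 - F(X_1,Y_1),\, F(X_1,1),\, F(1,Y_1)\}$. Points are drawn from the density $\partial^2 F/\partial x\,\partial y = 2ax + 2(1-a)y$ by rejection sampling. For $a = 0$ the exact probability is about 0.066, and the estimate for $a = 0.5$ should come out lower, in line with the bounds quoted above.

```python
import random


def simpson_probability(a, reps=100_000, seed=0, c=0.72):
    """Monte Carlo estimate of P(sup_{x,y} |F_1(x,y) - F(x,y)| < c) for a single
    observation from F(x, y) = a x^2 y + (1 - a) y^2 x on the unit square."""
    rng = random.Random(seed)

    def F(x, y):
        return a * x * x * y + (1 - a) * y * y * x

    hits = 0
    for _ in range(reps):
        # Rejection sampling from the density f(x, y) = 2 a x + 2 (1 - a) y, bounded by 2.
        while True:
            x, y = rng.random(), rng.random()
            if 2.0 * rng.random() <= 2 * a * x + 2 * (1 - a) * y:
                break
        # F_1 is the indicator of {X_1 <= x, Y_1 <= y}; the sup of |F_1 - F| is
        # 1 - F(X_1, Y_1) on that quadrant, otherwise approached along the edges.
        sup_dev = max(1.0 - F(x, y), F(x, 1.0), F(1.0, y))
        hits += sup_dev < c
    return hits / reps
```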
![Page 12: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/12.jpg)
Processes with estimated Parameters
$\{F(\cdot;\theta) : \theta \in \Theta\}$: a family of distributions
$X_1, \dots, X_n$: sample from $F$
Kolmogorov-Smirnov, Cramér-von Mises, etc., when $\theta$ is estimated from the data, are continuous functionals of the empirical process
$Y_n(x; \hat\theta_n) = \sqrt{n}\,\big(F_n(x) - F(x; \hat\theta_n)\big)$
![Page 13: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/13.jpg)
In the Gaussian case, $\hat\theta_n = (\bar X, s_n^2)$, with
$\bar X = \frac{1}{n} \sum_{i=1}^{n} X_i$ and $s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2$
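In code, the only subtlety is the divisor: $s_n^2$ divides by $n$ (the maximum likelihood estimate), which in Python's standard library is `statistics.pvariance`, not `statistics.variance` (divisor $n - 1$). A minimal sketch, with a hypothetical function name:

```python
import statistics


def gaussian_theta_hat(sample):
    """Return theta_hat_n = (X_bar, s_n^2), with s_n^2 using divisor n."""
    x_bar = statistics.fmean(sample)
    s2_n = statistics.pvariance(sample, mu=x_bar)
    return x_bar, s2_n
```

For example, `gaussian_theta_hat([1.0, 2.0, 3.0, 4.0])` returns `(2.5, 1.25)`.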
![Page 14: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/14.jpg)
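Bootstrap
$G_n$ is an estimator of $F$, based on $X_1, \dots, X_n$
$X_1^*, \dots, X_n^*$ i.i.d. from $G_n$
$\hat\theta_n^* = \hat\theta_n(X_1^*, \dots, X_n^*)$
If $F(\cdot;\theta)$ is Gaussian with $\theta = (\mu, \sigma^2)$ and $\hat\theta_n = (\bar X, s_n^2)$, then $\hat\theta_n^* = (\bar X^*, s_n^{*2})$
Parametric bootstrap if $G_n = F(\cdot;\hat\theta_n)$: $X_1^*, \dots, X_n^*$ i.i.d. from $F(\cdot;\hat\theta_n)$
Nonparametric bootstrap if $G_n = F_n$ (EDF)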
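Both resampling schemes can be sketched in a few lines of Python. The sketch assumes the Gaussian family of the example, so a parametric resample draws from $N(\bar X, s_n^2)$, while a nonparametric resample draws $n$ values with replacement from the data (that is, from $F_n$). The helper names are hypothetical.

```python
import random
import statistics


def gaussian_theta_hat(sample):
    """theta_hat_n = (X_bar, s_n^2), with divisor n."""
    return statistics.fmean(sample), statistics.pvariance(sample)


def bootstrap_sample(sample, kind, rng):
    """Draw X_1*, ..., X_n* i.i.d. from G_n.
    kind="parametric":    G_n = F(.; theta_hat_n), here an assumed Gaussian family.
    kind="nonparametric": G_n = F_n, i.e. n draws with replacement from the data."""
    n = len(sample)
    if kind == "parametric":
        mu, s2 = gaussian_theta_hat(sample)
        return [rng.gauss(mu, s2 ** 0.5) for _ in range(n)]
    if kind == "nonparametric":
        return rng.choices(sample, k=n)
    raise ValueError(f"unknown bootstrap kind: {kind!r}")


def bootstrap_theta_hats(sample, kind, b=1000, seed=0):
    """theta_hat_n^* = theta_hat_n(X_1*, ..., X_n*) for b independent resamples."""
    rng = random.Random(seed)
    return [gaussian_theta_hat(bootstrap_sample(sample, kind, rng)) for _ in range(b)]
```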
![Page 15: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/15.jpg)
Parametric Bootstrap
$X_1^*, \dots, X_n^*$ sample generated from $F(\cdot;\hat\theta_n)$. In the Gaussian case, $\hat\theta_n^* = (\bar X^*, s_n^{*2})$.
Both $\sup_x \sqrt{n}\,|F_n(x) - F(x;\hat\theta_n)|$ and $\sup_x \sqrt{n}\,|F_n^*(x) - F(x;\hat\theta_n^*)|$ have the same limiting distribution.
(In the XSPEC package, the parametric bootstrap is the command FAKEIT, which makes a Monte Carlo simulation of a specified spectral model.)
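A minimal sketch of a parametric bootstrap goodness-of-fit test, assuming a Gaussian family and the K-S statistic. Each bootstrap sample is drawn from $F(\cdot;\hat\theta_n)$, the parameters are re-estimated on it, and the statistic is recomputed, so the bootstrap distribution accounts for the parameter estimation. The p-value $(k+1)/(B+1)$, with $k$ the number of bootstrap statistics at least as large as the observed one, is a common convention, not something the slide prescribes; the function names are hypothetical.

```python
import math
import random
import statistics


def norm_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def sqrt_n_ks(sample):
    """sqrt(n) * sup_x |F_n(x) - F(x; theta_hat_n)| for a Gaussian fit to the sample."""
    n = len(sample)
    mu, sigma = statistics.fmean(sample), statistics.pstdev(sample)
    u = sorted(norm_cdf(x, mu, sigma) for x in sample)
    d = max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))
    return math.sqrt(n) * d


def parametric_bootstrap_pvalue(sample, b=1000, seed=0):
    """Parametric bootstrap p-value: resample from F(.; theta_hat_n), re-estimate
    theta on every resample, and compare the recomputed statistics with the observed one."""
    rng = random.Random(seed)
    n = len(sample)
    mu, sigma = statistics.fmean(sample), statistics.pstdev(sample)
    observed = sqrt_n_ks(sample)
    exceed = sum(
        sqrt_n_ks([rng.gauss(mu, sigma) for _ in range(n)]) >= observed
        for _ in range(b)
    )
    return (exceed + 1) / (b + 1)
```

This plays the same role as repeated FAKEIT simulations: the reference distribution is simulated from the fitted model rather than read from K-S tables.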
![Page 16: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/16.jpg)
Nonparametric Bootstrap
$X_1^*, \dots, X_n^*$ i.i.d. from $F_n$. A bias correction
$B_n(x) = F_n(x) - F(x;\hat\theta_n)$ is needed.
$\sup_x \sqrt{n}\,|F_n(x) - F(x;\hat\theta_n)|$ and $\sup_x \sqrt{n}\,|F_n^*(x) - F(x;\hat\theta_n^*) - B_n(x)|$ have the same limiting distribution.
(XSPEC does not provide a nonparametric bootstrap capability.)
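The nonparametric version differs in two places: resamples come from $F_n$, and the bootstrap statistic subtracts $B_n(x)$. The sketch below assumes a Gaussian family and approximates the supremum over $x$ on a finite set of points (every data point, using both one-sided EDF values, plus a regular grid), so it is an approximation rather than an exact supremum. Names and the grid choice are illustrative.

```python
import bisect
import math
import random
import statistics


def norm_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def edf_value(xs, x, left=False):
    """F_n(x) for sorted xs; with left=True, the left limit F_n(x-)."""
    k = bisect.bisect_left(xs, x) if left else bisect.bisect_right(xs, x)
    return k / len(xs)


def nonparametric_bootstrap_pvalue(sample, b=500, seed=0, grid_size=200):
    """Nonparametric bootstrap with bias correction B_n(x) = F_n(x) - F(x; theta_hat_n),
    for an assumed Gaussian family. The sup over x in the bootstrap statistic is
    approximated on the data points (both one-sided EDF values) plus a regular grid."""
    rng = random.Random(seed)
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    lo, hi = min(xs[0], mu - 5 * sigma), max(xs[-1], mu + 5 * sigma)
    points = xs + [lo + (hi - lo) * j / (grid_size - 1) for j in range(grid_size)]
    f_hat = [norm_cdf(x, mu, sigma) for x in points]

    # Observed sqrt(n) * sup_x |F_n(x) - F(x; theta_hat_n)|, exact via order statistics.
    u = [norm_cdf(x, mu, sigma) for x in xs]
    observed = math.sqrt(n) * max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))

    exceed = 0
    for _ in range(b):
        ys = sorted(rng.choices(xs, k=n))  # X_1*, ..., X_n* i.i.d. from F_n
        mu_s, sigma_s = statistics.fmean(ys), statistics.pstdev(ys)
        # F_n*(x) - F(x; theta*) - B_n(x) = (F_n*(x) - F_n(x)) - (F(x; theta*) - F(x; theta_hat_n))
        stat = math.sqrt(n) * max(
            abs(edf_value(ys, x, left) - edf_value(xs, x, left)
                - (norm_cdf(x, mu_s, sigma_s) - fh))
            for x, fh in zip(points, f_hat)
            for left in (False, True)
        )
        exceed += stat >= observed
    return (exceed + 1) / (b + 1)
```

Without the $B_n(x)$ term, the bootstrap statistic would be centred at the observed misfit rather than at zero.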
![Page 17: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/17.jpg)
• Chi-Square type statistics – (Babu, 1984, Statistics with linear combinations of chi-squares as weak limit. Sankhya, Series A, 46, 85-93.)
• U-statistics – (Arcones and Giné, 1992, On the bootstrap of U and V statistics. Ann. of Statist., 20, 655–674.)
![Page 18: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/18.jpg)
Confidence limits under misspecification of model family
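$X_1, \dots, X_n$ data from unknown $H$. $H$ may or may not belong to the family $\{F(\cdot;\theta) : \theta \in \Theta\}$.
$H$ is closest to $F(\cdot;\theta_0)$ in Kullback-Leibler information:
$\int h(x) \log\big(h(x)/f(x;\theta)\big)\, d\nu(x) \ge 0$
$\int h(x)\, |\log h(x)|\, d\nu(x) < \infty$
$\int h(x) \log f(x;\theta_0)\, d\nu(x) = \max_\theta \int h(x) \log f(x;\theta)\, d\nu(x)$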
![Page 19: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/19.jpg)
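For any $0 < \alpha < 1$,
$P\big(\sup_x \sqrt{n}\,|F_n(x) - F(x;\hat\theta_n) - (H(x) - F(x;\theta_0))| < C_\alpha^*\big) \to \alpha$,
where $C_\alpha^*$ is the $\alpha$-th quantile of the bootstrap distribution of
$\sup_x \sqrt{n}\,|F_n^*(x) - F(x;\hat\theta_n^*) - (F_n(x) - F(x;\hat\theta_n))|$.
This provides an estimate of the distance between the true distribution and the family of distributions under consideration.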
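One way to turn this result into numbers, as a sketch under a Gaussian-family assumption: compute $C_\alpha^*$ as the empirical $\alpha$-quantile of the bootstrap statistic. By the triangle inequality, on the event whose probability tends to $\alpha$, $\sup_x |H(x) - F(x;\theta_0)|$ lies within $C_\alpha^*/\sqrt{n}$ of the estimated distance $\sup_x |F_n(x) - F(x;\hat\theta_n)|$, so the interval below covers the true distance with asymptotic probability at least $\alpha$. The supremum in the bootstrap statistic is approximated on a finite set of points, and all names are hypothetical.

```python
import bisect
import math
import random
import statistics


def norm_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def edf_value(xs, x, left=False):
    """F_n(x) for sorted xs; with left=True, the left limit F_n(x-)."""
    k = bisect.bisect_left(xs, x) if left else bisect.bisect_right(xs, x)
    return k / len(xs)


def distance_limits(sample, alpha=0.95, b=500, seed=0, grid_size=200):
    """Return (lower, upper, C*_alpha) for sup_x |H(x) - F(x; theta_0)|, assuming a
    Gaussian model family. The sup in the bootstrap statistic is approximated on the
    data points (both one-sided EDF values) plus a regular grid."""
    rng = random.Random(seed)
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    lo, hi = min(xs[0], mu - 5 * sigma), max(xs[-1], mu + 5 * sigma)
    points = xs + [lo + (hi - lo) * j / (grid_size - 1) for j in range(grid_size)]
    f_hat = [norm_cdf(x, mu, sigma) for x in points]

    # Estimated distance sup_x |F_n(x) - F(x; theta_hat_n)|; checking both one-sided
    # EDF values at the data points gives the exact supremum.
    dist = max(abs(edf_value(xs, x, left) - fh)
               for x, fh in zip(points, f_hat) for left in (False, True))

    stats = []
    for _ in range(b):
        ys = sorted(rng.choices(xs, k=n))  # nonparametric resample from F_n
        mu_s, sigma_s = statistics.fmean(ys), statistics.pstdev(ys)
        stats.append(math.sqrt(n) * max(
            abs(edf_value(ys, x, left) - edf_value(xs, x, left)
                - (norm_cdf(x, mu_s, sigma_s) - fh))
            for x, fh in zip(points, f_hat) for left in (False, True)))
    stats.sort()
    c_star = stats[min(b - 1, math.ceil(alpha * b) - 1)]  # empirical alpha-quantile
    half_width = c_star / math.sqrt(n)
    return max(0.0, dist - half_width), dist + half_width, c_star
```

A lower limit well above zero indicates that no member of the family is close to $H$, which is the kind of evidence needed to exclude a model family such as the power law.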
![Page 20: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/20.jpg)
References
• G. J. Babu and C. R. Rao (1993). Handbook of Statistics, Vol. 9, Chapter 19.
• G. J. Babu and C. R. Rao (2003). Confidence limits to the distance of the true distribution from a misspecified family by bootstrap. J. Statist. Plann. Inference 115, 471-478.
• G. J. Babu and C. R. Rao (2004). Goodness-of-fit tests when parameters are estimated. Sankhya, Series A, 66, no. 1, 63-74.
![Page 21: Goodness of Fit using Bootstrap](https://reader033.vdocuments.net/reader033/viewer/2022061605/56812bdf550346895d904e3d/html5/thumbnails/21.jpg)
The End