three data analysis problems
DESCRIPTION
Three data analysis problems. Andreas Zezas University of Crete CfA. Two types of problems: Fitting Source Classification. Fitting: complex datasets. Fitting: complex datasets. Maragoudakis et al. i n prep. Fitting: complex datasets. Fitting: complex datasets. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/1.jpg)
Three data analysis problems
Andreas Zezas
University of CreteCfA
![Page 2: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/2.jpg)
Two types of problems:• Fitting
• Source Classification
![Page 3: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/3.jpg)
Fitting: complex datasets
![Page 4: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/4.jpg)
Fitting: complex datasets
Maragoudakis et al. in prep.
![Page 5: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/5.jpg)
Fitting: complex datasets
![Page 6: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/6.jpg)
Fitting: complex datasets
Iterative fitting may work, but it is inefficient and confidence intervals on parameters not reliable
How do we fit jointly the two datasets ?
VERY common problem !
![Page 7: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/7.jpg)
Problem 2
Model selection in 2D fits of images
![Page 8: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/8.jpg)
A primer on galaxy morphology
Three components:
spheroidal
exponential disk
and nuclear point source (PSF)
€
I(R) = I e exp −7.67RRe
⎛ ⎝ ⎜
⎞ ⎠ ⎟
1/ 4
−1 ⎡
⎣ ⎢ ⎢
⎤
⎦ ⎥ ⎥
⎡
⎣ ⎢ ⎢
⎤
⎦ ⎥ ⎥
€
I(R) = I0 exprrh
⎛ ⎝ ⎜
⎞ ⎠ ⎟
![Page 9: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/9.jpg)
Fitting: The method
Use a generalized model
n=4 : spheroidal
n=1 : disk
Add other (or alternative) models as needed Add blurring by PSF
Do χ2 fit (e.g. Peng et al., 2002)
€
I(R)=Ieexp−kRRe
⎛ ⎝ ⎜
⎞ ⎠ ⎟
1/n
−1 ⎡
⎣ ⎢ ⎢
⎤
⎦ ⎥ ⎥
⎡
⎣ ⎢ ⎢
⎤
⎦ ⎥ ⎥
![Page 10: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/10.jpg)
Fitting: The method
Typical model tree
n=free n=4 n=1
n=4 n=4
n=4
PSF PSF PSF PSF PSF
€
I(R)=Ieexp−kRRe
⎛ ⎝ ⎜
⎞ ⎠ ⎟
1/n
−1 ⎡
⎣ ⎢ ⎢
⎤
⎦ ⎥ ⎥
⎡
⎣ ⎢ ⎢
⎤
⎦ ⎥ ⎥
![Page 11: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/11.jpg)
Fitting: Discriminating between models
Generally χ2 works
BUT: Combinations of different models may give similar χ2
How to select the best model ?
Models not nested: cannot use standard methods
Look at the residuals
![Page 12: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/12.jpg)
Fitting: Discriminating between models
![Page 13: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/13.jpg)
Fitting: Discriminating between models
Excess variance
Best fitting model among least χ2 models the one that has the lowest exc. variance
€
σXS2 =σ obj
2 −σ sky2
![Page 14: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/14.jpg)
Fitting: Examples
Bonfini et al. in prep.
![Page 15: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/15.jpg)
Fitting: Problems
However, method not ideal: It is not calibrated
Cannot give significance Fitting process computationally intensive
Require an alternative, robust, fast, method
![Page 16: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/16.jpg)
Problem 3
Source Classification
(a) Stars
![Page 17: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/17.jpg)
Classifying stars
Relative strength of lines discriminates between different types of stars
Currently done “by eye”
orby cross-correlation analysis
![Page 18: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/18.jpg)
Classifying stars
Would like to define a quantitative scheme based on strength of different lines.
![Page 19: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/19.jpg)
Classifying stars
Maravelias et al. in prep.
![Page 20: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/20.jpg)
Classifying stars
Not simple….
• Multi-parameter space• Degeneracies in parts of
the parameter space• Sparse sampling• Continuous distribution of
parameters in training sample (cannot use clustering)
• Uncertainties and intrinsic variance in training sample
![Page 21: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/21.jpg)
Problem 3
Source Classification(b) Galaxies
![Page 22: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/22.jpg)
Classifying galaxies
Ho et al. 1999
![Page 23: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/23.jpg)
Classifying galaxies
Kewley et al. 2001 Kewley et al. 2006
![Page 24: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/24.jpg)
Classifying galaxies
Basically an empirical scheme
• Multi-dimensional parameter space
• Sparse sampling - but now large training sample available
• Uncertainties and intrinsic variance in training sample
Use observations to define locus of different classes
![Page 25: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/25.jpg)
Classifying galaxies
• Uncertainties in classification due to
• measurement errors• uncertainties in diagnostic
scheme• Not always consistent results
from different diagnostics
Use ALL diagnostics together Obtain classification with a
confidence interval
Maragoudakis et al in prep.
![Page 26: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/26.jpg)
Classification
• Problem similar to inverting Hardness ratios to spectral parameters
• But more difficult• We do not have well
defined grid• Grid is not continuous
Taeyoung Park’s thesis
NH
Γ
![Page 27: Three data analysis problems](https://reader036.vdocuments.net/reader036/viewer/2022081503/56816130550346895dd0858e/html5/thumbnails/27.jpg)
Summary
• Model selection in multi-component 2D image fits• Joint fits of datasets of different sizes• Classification in multi-parameter space
• Definition of the locus of different source types based on sparse data with uncertainties
• Characterization of objects given uncertainties in classification scheme and measurement errors
All are challenging problems related to very common data analysis tasks.
Any volunteers ?