


Preliminary Census of ROSAT Bright Sources: Results from ClassX

R.L. White, A.A. Suchkov, R.J. Hanisch, M. Postman, M.E. Donahue (STScI), T.A. McGlynn, L. Angelini, M.F. Corcoran, S.A. Drake, W.D. Pence, N. White, E.L. Winter (NASA/GSFC), F. Genova, F. Ochsenbein, P. Fernique, S. Derriere (CDS), W. Voges (MPE)

Abstract. ClassX is being developed as a system for automated classification of X-ray sources within the Virtual Observatory environment. Its core is a network of classifiers “trained” using diverse data sets for X-ray sources of known object type and their optical, infrared, radio, etc. counterparts. The network is integrated in the ClassX pipeline with a search engine that queries remote multi-wavelength data repositories, using systems such as the CDS VizieR service, to get data (in the VOTable format) for the sources to be classified.

In this paper we present a preliminary census from ClassX of previously unclassified X-ray sources observed with the ROSAT PSPC. The early results include the finding that our sources are dominated by QSOs (Fig. 1), in contrast to the star-dominated samples that were used to train our classifiers. The ClassX census appears to be consistent with expectations when one considers that the sources being studied are a fainter population than the previously classified objects.

Fig. 1. Class distribution. Comparison of class fraction for X-ray sources classified in the WGACAT (blue) and sources previously unclassified (red) for which GSC2 counterparts were found. Each panel presents results from different classifiers: “X-ray–optical” (xo9a_xo), “X-ray” (xo9a_x), and “optical” (xo9a_o). The green histogram at the bottom shows the distribution of the classes from the original WGACAT classification for the training set. Note that while the training set of known classifications was dominated by stars, the most common class among the newly classified objects is QSO, followed by galaxy clusters. This change in population is expected since the unidentified X-ray sources are generally fainter. That our classifier is able to respond to the changing population is encouraging, as it is generally a challenging problem to classify a set of objects that differs substantially from the training set.

Fig. 5. Near IR colors. 2MASS J-K color distribution of X-ray sources with previously known classifications in the WGACAT.

Fig. 6. Near IR colors for newly classified sources. Same as in Fig. 5 but for previously unclassified sources. Classes are from the “X-ray” classifier.

Fig. 2. Classification probabilities. Class probability distribution for previously unclassified X-ray sources with GSC2 and 2MASS counterparts (from the classifier trained using only X-ray data). ClassX provides classification probabilities for every class for each source. The plot shows the distribution of QSO probabilities for all objects (gray) and objects classified as QSOs (red), meaning that QSO is the highest probability class. The probabilities are relatively low because the QSO and AGN classes are so similar.
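
The rule "classified as QSO means QSO is the highest probability class" can be sketched in a few lines. This is only an illustration, not the ClassX code: the function name `assign_class` and the probability values are made up, chosen so that the QSO and AGN classes split most of the probability between them.

```python
def assign_class(probabilities):
    """Return (class_name, probability) for the highest-probability class."""
    best = max(probabilities, key=probabilities.get)
    return best, probabilities[best]


# Hypothetical class probabilities for a single source (illustrative values only).
source = {"Star": 0.20, "QSO": 0.35, "AGN": 0.30, "Galaxy": 0.10, "Cluster": 0.05}

label, p = assign_class(source)
print(label, p)
```

In this made-up case the winning class holds well under half of the total probability, which is the kind of behavior the caption attributes to the similarity of the QSO and AGN classes.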

Fig. 7. Mean IR & X-ray colors. Mean 2MASS J-K color (upper panel) and mean X-ray “color” (lower) for classified sources (classes based on the WGACAT -- blue) and unclassified sources (classes from the X-ray classifier -- red).

Reference

A.A. Suchkov, T.A. McGlynn, L. Angelini, M.F. Corcoran, S.A. Drake, W.D. Pence, N. White, E.L. Winter, R.J. Hanisch, R.L. White, M. Postman, M.E. Donahue, F. Genova, F. Ochsenbein, P. Fernique, & S. Derriere, 2002. Automated Object Classification with ClassX, astro-ph/0210407

Introduction: ClassX classifiers. Classification of observed astronomical objects plays a major role in converting observational data into science. It is also trickier than one might guess because the class categories often overlap: the same object can be called a star and a white dwarf, a galaxy and an AGN, an AGN and a QSO, etc. The situation gets even more complicated when the same object is viewed with different instruments: for instance, at the position of an X-ray cluster of galaxies, an optical counterpart from, say, GSC2, would typically be a galaxy rather than a combined entity called a cluster of galaxies. These and similar conceptual issues related to the ClassX project were discussed earlier by Suchkov et al. (2002).

This paper. In this paper, we classify previously unidentified ROSAT sources with several different classifiers, each trained with a different set of parameters. For instance, the training of the “X-ray” classifier involves X-ray magnitudes but not optical and infrared magnitudes, while the training of the “X-ray and optical” classifier involves both X-ray and optical magnitudes. Fig. 1 compares the class frequencies of two samples of X-ray sources, using classifications from three classifiers.

Data. The WGA catalog of X-ray sources from ROSAT PSPC observations contains 36995 sources for which we found optical counterparts within 30 arcsec in the GSC2, with both F and J magnitudes. Of those, 6505 sources were classified in the WGACAT; we used this sample to train our classifiers. The classifiers were then applied to the remaining 30490 sources to determine the object type (class) associated with these previously unclassified sources (see Fig. 1).
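
A positional match within 30 arcsec, and the resulting sample split, can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say how a counterpart was chosen when several fall inside the radius, so picking the nearest one is an assumption here, and the function names, catalog layout, and coordinates are all hypothetical.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def nearest_counterpart(xray_pos, optical_catalog, radius_arcsec=30.0):
    """Return the nearest optical entry within the radius, or None.

    Taking the nearest entry is an assumption made for this sketch.
    """
    best, best_sep = None, radius_arcsec
    for entry in optical_catalog:
        sep = angular_separation_arcsec(xray_pos[0], xray_pos[1],
                                        entry["ra"], entry["dec"])
        if sep <= best_sep:
            best, best_sep = entry, sep
    return best


# Hypothetical optical catalog: one entry 18 arcsec away, one 72 arcsec away.
optical = [{"id": "a", "ra": 10.0, "dec": 0.005},
           {"id": "b", "ra": 10.02, "dec": 0.0}]

match = nearest_counterpart((10.0, 0.0), optical)
no_match = nearest_counterpart((50.0, 20.0), optical)

# Sample sizes quoted in the text: matched sources, and the classified subset.
n_matched, n_training = 36995, 6505
n_unclassified = n_matched - n_training
print(match["id"], no_match, n_unclassified)
```

The last lines simply confirm that the quoted sample sizes are mutually consistent: the sources left after removing the training set number 30490.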

AAS Meeting 201, January 5 – 9, Seattle, WA

Class properties of previously unclassified sources. Not surprisingly, the unclassified sources are on average fainter, which implies that the respective class objects are on average more distant or less luminous. We expect some systematic differences between classes in the classified and unclassified samples. Illustrations of such differences can be found in the figures presented here. For example, QSOs are much more common at fainter magnitudes, which accounts for the large increase in the fraction of QSOs compared with the training set.

Observational biases. Class properties in the classified and unclassified samples also differ because source detectability differs from band to band. Bluer 2MASS colors are found in the unclassified sample because at faint magnitudes detections in the K band are possible only for bluer sources (Fig. 7, upper panel). Similarly, fainter sources are softer in the X-ray because detections in the hard band, x3, are available only for softer sources (Fig. 7, lower panel).

Future work. Clearly these statistical checks on the properties of different classes are not a substitute for checking the accuracy of the classifications using spectroscopically identified sources. In the near future we plan detailed comparisons of our classes with external data such as SDSS.

Highlights

Compared with the previously classified objects, for the newly classified sources:

• QSOs and clusters of galaxies are much more common (whereas stars dominate the training set).
• All classes in the newly classified sample are softer in X-rays (except for OF stars).
• All classes in the newly classified sample are bluer in the 2MASS bands.
• Class QSO is the “softest” and much softer than class AGN in both samples.
• Class AGN is the reddest and much redder than class QSO in both samples.
• AGNs, galaxies, & clusters of galaxies all show bimodal IR color distributions.
• In the infrared, AGNs, galaxies, and clusters of galaxies are dominated by the group of blue 2MASS counterparts as opposed to the group of red counterparts in the classified sample.

[Plot for Fig. 4. Horizontal axis: P(QSO or AGN), 0.0 to 0.8. Vertical axis: P(Star), 0.0 to 1.0. Legend: QSO+AGN, Stars, Other.]

Validation of ClassX classification. We explore the validity of the ClassX classification using a variety of checks on the internal and external consistency of the classification results. Figs. 5 and 6 display the distribution of the 2MASS J-K color, a parameter that was not used in the training or classification of the sources. Comparing the two figures, we notice a number of common features. For example, the distribution of AGNs is obviously bimodal in both the classified and unclassified samples, which isolates two groups, blue and red, centered at ~0.7 and ~1.4 (although the relative prominence of the two groups is different for the two samples). This kind of consistency suggests that the classifier does indeed do a good job statistically in identifying AGNs among X-ray sources.

Fig. 7 shows the class variation of the mean infrared and X-ray colors for classified and unclassified sources. There is a remarkable consistency between the two samples in the color variation from class to class: classes that are redder/softer in the classified sample are also redder/softer in the unclassified sample. Again, this is indicative of a substantial degree of reliability of the ClassX classification.
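
One simple way to express this class-to-class consistency check is to compute the mean color per class in each sample and compare the resulting orderings. The sketch below is illustrative only: the helper names are hypothetical and the color values are invented for the example, not taken from the figures.

```python
from statistics import mean


def class_means(rows):
    """rows: iterable of (class_name, color) pairs -> {class_name: mean color}."""
    by_class = {}
    for cls, color in rows:
        by_class.setdefault(cls, []).append(color)
    return {cls: mean(values) for cls, values in by_class.items()}


def rank_order(means):
    """Class names sorted from bluest (smallest mean color) to reddest."""
    return sorted(means, key=means.get)


# Invented J-K colors for two samples (illustrative values only).
classified = [("Star", 0.5), ("Star", 0.7), ("QSO", 1.0),
              ("QSO", 1.2), ("AGN", 1.4), ("AGN", 1.6)]
unclassified = [("Star", 0.4), ("Star", 0.6), ("QSO", 0.9),
                ("QSO", 1.0), ("AGN", 1.2), ("AGN", 1.3)]

order_classified = rank_order(class_means(classified))
order_unclassified = rank_order(class_means(unclassified))
same_order = order_classified == order_unclassified
print(order_classified, same_order)
```

Comparing orderings rather than absolute means keeps the check insensitive to an overall shift between the samples, such as the bluer colors of the fainter sources.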

Fig. 4. Separation in probabilities. The total probabilities P(Star) vs. P(QSO or AGN) are plotted for the combined classes of Fig. 3. Note that the histograms in Fig. 3 result from summing this distribution along the y direction. Most of the stars are very well separated from the other classes, as are many of the QSOs+AGN. Objects near the intersections and boundaries are difficult to classify.
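
The relation between the two-dimensional distribution and the one-dimensional histograms (summing along the y direction) can be illustrated with a small binning sketch. Everything here is hypothetical: the function names, the bin count, and the five sample points are chosen only to show the marginalization step.

```python
def bin_index(value, n_bins, lo=0.0, hi=1.0):
    """Index of the equal-width bin containing value, clamped to the valid range."""
    i = int((value - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)


def histogram_2d(points, n_bins):
    """Count (x, y) points on an n_bins x n_bins grid, indexed as grid[ix][iy]."""
    grid = [[0] * n_bins for _ in range(n_bins)]
    for x, y in points:
        grid[bin_index(x, n_bins)][bin_index(y, n_bins)] += 1
    return grid


def marginal_over_y(grid):
    """Sum each x column over all y bins, giving the 1-D histogram in x."""
    return [sum(column) for column in grid]


# Invented (P(QSO or AGN), P(Star)) pairs for five sources.
points = [(0.05, 0.90), (0.10, 0.85), (0.70, 0.05), (0.75, 0.10), (0.45, 0.40)]

grid = histogram_2d(points, n_bins=5)
hist_x = marginal_over_y(grid)
print(hist_x)
```

Summing over y discards the P(Star) information, which is why sources that are well separated in the two-dimensional plane can still overlap in a one-dimensional histogram.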

[Plot for Fig. 2: “Distribution of QSO probabilities”. Horizontal axis: Probability (QSO), 0.0 to 0.8. Vertical axis: Number, 0 to 2500. Legend: Non-QSOs, QSOs.]

Fig. 3. Combining class probabilities. The class probabilities can be usefully combined to compare groups of similar classes. Here the normal stellar classes have been combined into a single “Stars” class, QSOs & AGNs have been combined into a second class, and the X-ray Binary, Galaxy and Cluster classes are left unchanged. Now the QSOs/AGNs (red) separate very well from the stars (blue).

[Plot for Fig. 3: “Distribution of combined QSO+AGN probabilities”. Horizontal axis: Probability (QSO or AGN), 0.0 to 0.8. Vertical axis: Number, 0 to 1500. Legend: Stars, QSO/AGNs, Other.]
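
Combining class probabilities as described for Fig. 3 amounts to summing the per-class probabilities within each group. The sketch below assumes that simple sum; the stellar class names (`Star_early`, `Star_late`) and all probability values are placeholders, since the individual stellar classes are not listed in the text.

```python
# Hypothetical grouping; the stellar class names are placeholders.
GROUPS = {
    "Stars": ["Star_early", "Star_late"],
    "QSO+AGN": ["QSO", "AGN"],
}


def combine(probabilities, groups=GROUPS):
    """Sum probabilities within each group; ungrouped classes pass through unchanged."""
    combined = {}
    grouped = set()
    for name, members in groups.items():
        combined[name] = sum(probabilities.get(m, 0.0) for m in members)
        grouped.update(members)
    for cls, p in probabilities.items():
        if cls not in grouped:
            combined[cls] = p
    return combined


# Invented class probabilities for one source (they sum to 1).
source = {"Star_early": 0.05, "Star_late": 0.10, "QSO": 0.35, "AGN": 0.30,
          "X-ray Binary": 0.05, "Galaxy": 0.10, "Cluster": 0.05}

combined = combine(source)
top = max(combined, key=combined.get)
print(top, round(combined[top], 2))
```

In this invented case neither QSO nor AGN alone exceeds 0.35, but the combined group reaches 0.65, which mirrors the improved separation the caption describes.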