How Common are the Magellanic Clouds? (In
Progress)
Lulu Liu
March 31, 2010
1 Introduction
The tremendous amount of data generated by the Sloan Digital Sky Survey lends
itself to statistically significant observational tests on the current theories of galaxy
formation and evolution. This document outlines in detail our investigation so far,
which seeks to establish empirical corroboration with the galaxy clustering predictions
of the latest numerical galaxy formation models. Here, we specifically target the
occurrence of large satellites around Milky Way-Sized isolated host galaxies in order
to address the statistical likelihood of the existence of the LMC and SMC in the
Local Group. We expect, however, that the methodology developed for extracting
useful background-subtracted statistics from the SDSS catalog can be adapted to
future studies in galaxy-clustering and satellite formation.
2 Assumptions
Several assumptions are implicit in this analysis. All distances are calculated using
the benchmark cosmological model with Ωm ≈ 0.3, ΩΛ ≈ 0.7, and h ≈ 0.7.
...what else?
Finally, a generally one-to-one, direct correlation between mass and luminosity
is assumed for central galaxies of interest. Ordinary galaxies of similar luminosity
are expected to be approximately equal in mass. The abundance matching technique
developed by (Risa’s paper) et al. allows us to assign virial masses and radii to galaxies
of certain luminosities with a well-defined uncertainty.
3 The SDSS Catalogs
We draw our primary galaxy sample from the k-corrected NYU Value Added Galaxy
Catalog derived from the most recent (7th) data release of the Sloan Digital Sky
Survey (SDSS), filename kcorrect.none.model.z0.10.fits. This database contains pho-
tometric information for all objects matching the main sample target criteria of the
SDSS imaging survey as well as spectroscopic data for the subset of those which are
non-conflicting and available. We cross-reference the parallel spectroscopic catalog,
filename object_sdss_spectro.fits, and use the information contained in the PRIMTARGET
tag [2] as well as the stated apparent magnitude limit of the galaxy survey to isolate
only those members of the main catalog identified as potential Milky Way-sized or
larger host galaxies, the set of which we will from now on refer to as our main galaxy
sample.
As the Sloan Digital Sky Survey is a ground-based study, the apparent magnitude
limit for galaxies in the main sample is around 17.77. Since we are interested
in relatively dim bodies (satellites between 2 and 4 magnitudes dimmer than their
primaries), the spectroscopic catalog alone is insufficient for our purposes. For this
search we avail ourselves of the Stripe 82 photometric database, which, as a result
of summing over many exposures, reaches several magnitudes deeper, to
a magnitude limit of about 23. The general strategy is as follows: locate candidate host
galaxies with certain desired traits within the main sample, and use the celestial
coordinates of these candidates to cross-correlate and conduct our search for satellites
within the deeper Co-Add database.
That there exists no spectroscopic data, and consequently, no high precision red-
shift values, for galaxies in the Stripe 82 catalog complicates our investigation. Much
of our attention will be paid to establishing a reliable method for foreground/background
subtraction which will separate the signal, the presence of satellites within a certain
radius of candidate host galaxies, from the noise, line-of-sight contamination from
galaxies of a wide range of redshifts.
Though not of a high enough precision to eliminate the need for foreground/background
subtraction, the existence of photometric redshift values for members of the Co-Add
catalog will certainly assist in bringing the noise down to a more manageable level.
Since our candidate hosts in the spectroscopic catalog range between z=0 and z=0.15,
reasonable faith in the photometric redshift values of objects in the Stripe 82 catalog
should allow us to selectively dial down the noise while keeping the signal mostly
intact. The trade-offs involved here will be examined in Section 8.1. The general
strategy imposes a sharp maximum redshift cut on the members of the Stripe 82 catalog
based on the spectroscopic redshift values of our candidate hosts. The choice, which
will be explained later, is zmax = 0.21, which trims the catalog from some 13 million
elements to a much more tractable 470,000.
4 Selection of Milky Way-Sized Central Galaxies
4.1 Preliminary Cut on Apparent Magnitude
Our first challenge is selecting, from our main galaxy sample, a statistically robust
set of suitable hosts. The main catalog is populated by a total of about 2.5 million
objects, around 750,000 of which are galaxies. QSOs are tagged independently and
are excluded from this sample, since those likely to scatter into our magnitude and
redshift ranges correspond to low-mass galaxies with bright nuclei, distinctly
non-Milky-Way-like outliers in our assumed one-to-one mapping of luminosity to galactic
mass (citation: Risa & co.'s abundance matching paper).
Within this sample we make several cuts. The apparent magnitude limit of galaxies
surveyed by the SDSS is 17.77 (citation?). Ideally, an apparent magnitude distribution
of our galaxy sample should show a sharp cut-off at this value and no scatter
beyond. However, due to photometric recalibration and post-processing effects, the
hard limit is smoothed somewhat in our sample obtained from the NYU-VAGC and
there appears to be a small tail of dimmer objects. To avoid potentially systematic
selection effects, we limit our pool of candidate hosts to only those galaxies brighter
than the peak magnitude of 17.60, about 600,000 in all. See Figure 1.
4.2 Spatial Constraints
Since our actual search is to be conducted within the Stripe 82 data, we trim our
main catalog to the corresponding region of the sky (right ascension from 309.1°
through 0° to 59.8°, and declination between −1.258° and +1.258°). This reduces our
sample size to about 23,000.
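
Because the right ascension interval wraps through 0°, the footprint cut is a union of two ranges rather than a single inequality. The following is a minimal Python/NumPy sketch of such a cut, not the code used for this study; the function name `in_stripe82` and the assumption that coordinates are supplied in degrees are illustrative.

```python
import numpy as np

# Stripe 82 bounds quoted in the text, assumed to be in degrees.
RA_MIN, RA_MAX = 309.1, 59.8
DEC_MIN, DEC_MAX = -1.258, 1.258

def in_stripe82(ra_deg, dec_deg):
    """Return a boolean mask for objects inside the Stripe 82 footprint."""
    ra = np.mod(np.asarray(ra_deg, dtype=float), 360.0)
    dec = np.asarray(dec_deg, dtype=float)
    # The RA interval wraps through 0 degrees, so it is the union of
    # [RA_MIN, 360) and [0, RA_MAX], not a single "between" test.
    ra_ok = (ra >= RA_MIN) | (ra <= RA_MAX)
    dec_ok = (dec >= DEC_MIN) & (dec <= DEC_MAX)
    return ra_ok & dec_ok
```

A naive test of the form `RA_MIN <= ra <= RA_MAX` would select nothing here, since the lower bound exceeds the upper bound.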
We then carefully consider edge effects which may result in a systematic under-
counting of neighbors, and conservatively eliminate the candidates which are within a
certain distance Riso of any bounding edge of the Stripe 82 region. The optimal value
of Riso will be discussed in Section 4.4. Depending on the choice of a reasonable Riso,
this constraint has the effect of throwing out an additional 10% to 20% of candidate
hosts.
Figure 1: The apparent magnitude distribution of galaxies takes on the usual appearance for
magnitudes brighter than 17.60. We limit our pool of potential hosts to objects in this region only.
4.3 Magnitude Binning
Our best estimates for the luminosity of the Milky Way Galaxy place its V-band
absolute magnitude, MV, at around -20.9 [1]. As measurements were only taken
and absolute magnitudes computed in the ugriz bands for the SDSS catalog, a
V − R value of X is used to convert between the two systems. The "Milky Way-sized"
constraint is imposed here as a ±0.2 bin around an R-band absolute magnitude of
-20.73. There are 3200 such galaxies within our sample.
4.4 Isolation Criteria
We are interested specifically in galaxy clusters with Milky-Way sized primaries and
to this end require that no candidate host is itself a satellite of a larger galaxy. Rather
primitively, we begin by imposing the most straight-forward criterion available: we
require that there exists no galaxy potentially more massive within a certain radius
Riso of our candidate host. Specifically, within this region, we seek the answers to two
questions: (1) Does there exist a more luminous galaxy at the same redshift as our
candidate? (2) Is there another galaxy possibly larger than our candidate for which
we have no redshift information?
In the first test we compare absolute magnitudes and treat any galaxy within
1000 km/s of the candidate's velocity as being at the same redshift. In the second
test we compare apparent magnitudes only, given a null value for redshift. A 'yes' to
either question is enough to eliminate a galaxy from our pool of Milky Way-sized primaries.
The first condition proves much more restrictive than the second, as the main
sample of galaxies is well-covered spectroscopically. The difference between applying
condition (1) only and conditions (1) and (2) simultaneously is only a few galaxies;
still, we choose to go with the more conservative sample in each case.
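
The two isolation tests can be sketched as below. This is an illustrative Python/NumPy version under stated assumptions, not the code used here: projected separations in Mpc are assumed to be precomputed, missing redshifts are assumed to be stored as NaN, the velocity difference is approximated as c·Δz, and the function name `is_isolated` is hypothetical.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def is_isolated(i, sep_mpc, z, abs_mag, app_mag, r_iso=0.6, dv_max=1000.0):
    """Apply both isolation tests to candidate host i.

    sep_mpc : projected separation (Mpc) from candidate i to every galaxy
    z       : spectroscopic redshifts, NaN where none is available
    abs_mag, app_mag : magnitudes (numerically smaller means brighter)
    """
    z = np.asarray(z, dtype=float)
    abs_mag = np.asarray(abs_mag, dtype=float)
    app_mag = np.asarray(app_mag, dtype=float)
    near = np.asarray(sep_mpc, dtype=float) < r_iso
    near[i] = False  # the candidate is not its own neighbor

    has_z = ~np.isnan(z)
    # Assumption: velocity difference approximated as c * |delta z|.
    dv = np.where(has_z, C_KMS * np.abs(z - z[i]), np.inf)

    # Test (1): a more luminous galaxy at the same redshift.
    brighter_same_z = near & (dv <= dv_max) & (abs_mag < abs_mag[i])
    # Test (2): an apparently brighter galaxy with no redshift.
    brighter_no_z = near & ~has_z & (app_mag < app_mag[i])

    return not (brighter_same_z.any() or brighter_no_z.any())
```

Comparisons against NaN evaluate to False, so galaxies without absolute magnitudes never trigger test (1) and are handled only by test (2).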
[Figure 2 plot. x-axis: Distance Cut (Mpc), 0 to 1.2; y-axis: Completeness / Purity Fraction, 0.4 to 1; curves: Completeness, Purity; 2D projected distance, 1000 km/s redshift slice.]
Figure 2: Purity and completeness of our sample of isolated Milky Way-Sized centrals as a function
of a varying isolation radius, Riso. An N-Body simulation was used to derive these values.
There are certain trade-offs in our choice of Riso in terms of completeness and
purity of the host sample. The conservative choice is a larger Riso, which while
ensuring a greater degree of isolation, thus a ’purer’ sample of Milky Way-Sized
centrals, would preclude the detection of certain actually isolated candidates and
harm the completeness of our sample. Though purity is of more import as it impacts
the accuracy and relevance of the results of this study, due to the limiting nature
of our candidate selection, a more complete sample stands to improve the statistical
significance of our measurements. Therefore, our choice for Riso seeks to maximize
completeness while holding impurities to an acceptably low level.
Figure 2 is the result of numerical simulations on ΛCDM galaxy formation models
performed by Peter Behroozi (of Stanford University) using the ((())). Completeness
is shown as the fraction out of all Milky Way-Sized central galaxies which are included
in our sample using this isolation method as a function of Riso. Purity is plotted as
the fraction of our sample of candidate primaries which are actually primaries and
not satellites of more massive galaxies.
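
These two definitions can be written compactly. The sketch below is in Python/NumPy and assumes a simulated catalog in which the true central/satellite status of every Milky Way-sized galaxy is known; the function and argument names are illustrative only.

```python
import numpy as np

def completeness_and_purity(is_true_central, is_selected):
    """Completeness and purity of an isolation-selected host sample.

    is_true_central : True where the simulated galaxy really is a central
    is_selected     : True where the galaxy passes the isolation cut
    """
    truth = np.asarray(is_true_central, dtype=bool)
    picked = np.asarray(is_selected, dtype=bool)
    both = np.count_nonzero(truth & picked)
    # Fraction of all true centrals that end up in the sample.
    completeness = both / np.count_nonzero(truth)
    # Fraction of the sample that really consists of centrals.
    purity = both / np.count_nonzero(picked)
    return completeness, purity
```

Evaluating this function for a grid of Riso values on a simulated catalog would produce curves of the kind shown in Figure 2.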
The results are encouraging. Given a target purity of well above 90%, we are
able to choose the Riso which yields the most complete sample. We make our cut
at Riso = 0.6 Mpc for the moment and, in Section 8, check the stability of our results
as this parameter is varied.
5 Properties of the Candidate Primaries
Figure 3: Histogram of redshift values of 1332 Milky Way-Sized hosts. 0.005 bins.
5.1 Redshift Distribution
Our 1332 hosts, selected for 0.6 Mpc isolation, are distributed in the expected way
(Figure 3). Since redshift is a measure of cosmic distance, it is natural that the number
of objects within ∆z of a certain z will increase as z increases. The abrupt drop-off is due to
a correlation between apparent magnitude and redshift: at fixed absolute magnitude, more
distant hosts appear dimmer and eventually fall below the apparent magnitude cut.
Figure 4: Histogram of apparent magnitudes of 1332 Milky Way-Sized hosts is correlated with
value. Bins are of width 0.1.
Figure 5: Histogram of absolute magnitude values of 1332 Milky Way-Sized hosts.
5.2 Apparent Magnitude Distribution
The sharp apparent magnitude cut-off imposed earlier on the main galaxy sample is
evident here. Bins are 0.1 wide. Figure 4.
5.3 Absolute Magnitude Distribution
Our selected galaxies, on the other hand, given our selection criteria, are distributed
fairly uniformly within a thin band in absolute magnitude. Figure 5.
6 Satellites of Milky Way-Sized Primaries
6.1 Definition of a Satellite
We consider any galaxy lying within the virial radius of another larger (read: more
luminous) galaxy to be a satellite of that galaxy. Galaxies with V-band magnitudes
around -20.9 have been found by (()) methods to have dark matter halos extending
out to distances of XX kpc or more.
Centered on each Milky Way-sized host, we limit our search to a circular aperture
with radius 100 kpc ((how is this chosen?)). We do not specifically correct for projection
effects on this level as they will be well-covered by the noise subtraction later on.
Since we are interested in LMC/SMC-type objects around Milky Way-sized centrals,
we further impose a lower and upper limit on the apparent luminosity of the satellite
objects. For our standard exercise we search between 2 and 4 magnitudes dimmer
than the host (bracketing the brightness of the SMC and LMC relative to the Milky
Way).
6.2 Signal Counts and Noise Subtraction Overview
Our search is conducted within the photometric Stripe 82 Co-Add database. For
observation of any piece of the sky, the lack of redshift information presents an ana-
lytical challenge in isolating members of physical clusters from the inherent noise due
to line-of-sight projection effects.
A sort of foreground-background subtraction must take place. Since it is difficult to
determine with good accuracy, for any individual object, whether its apparent proximity
to a host is real, we choose to work on a statistical level.
We employ a three-part process to separate signal from noise, roughly
represented in Figure 6 and discussed in detail in the following sections.
Figure 6: With our target galaxy at the center of the aperture, we employ a 3-Part process to
statistically remove line-of-sight projected noise.
1. Retrieve composite signal and noise. We count up all luminous objects around
each candidate host fitting a certain luminosity and projected angular distance
criterion. See Panel (1) of Figure 6 and Section 6.3.
2. Generate noise profile. Mimicking the redshift and apparent magnitude distributions
of candidate hosts, a set of virtual Milky Way-sized central galaxies
is generated with no preference for spatial position. Conducting an identical
search around these virtual galaxies allows us to sample the line-of-sight noise,
which we assume to be relatively isotropic within the Stripe 82 region. See
Panel (2) of Figure 6 and Section 6.4.
3. Extract signal. A deconvolution is performed given this noise profile in order to
extract the signal from the composite counts. See Panel (3) of Figure 6 and Section
6.5.
6.3 All Potential Satellites
To retrieve the total composite counts of all potential satellites around each Milky
Way-Sized primary, we draw a perfectly circular aperture centered on the candidate
host with angular radius equivalent to 100 kpc at the appropriate redshift. In this
cone-shaped region our algorithm collects every object within a certain relative bright-
ness of the host. Naturally, this sample is populated by not only actual satellites of
the galaxy in question, but foreground and background galaxies which have scattered
into the apparent magnitude bins of interest. Depending on the surface density
of objects in the Stripe 82 catalog, this count could be mostly signal or mostly noise.
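
The conversion from a 100 kpc physical radius to an angular aperture depends on the host redshift through the angular diameter distance. The sketch below is a Python/NumPy illustration for the flat benchmark cosmology of Section 2 (Ωm = 0.3, ΩΛ = 0.7, h = 0.7), using a simple trapezoid-rule integral. The function names are illustrative and this is not the code used for the analysis.

```python
import numpy as np

C_KMS = 299792.458          # speed of light in km/s
H0 = 70.0                   # km/s/Mpc, from h ~ 0.7
OMEGA_M, OMEGA_L = 0.3, 0.7

def comoving_distance_mpc(z, n_steps=2001):
    """Line-of-sight comoving distance in a flat universe, for z > 0."""
    zs = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(OMEGA_M * (1.0 + zs) ** 3 + OMEGA_L)
    dz = zs[1] - zs[0]
    # Trapezoid rule written out explicitly.
    integral = dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))
    return (C_KMS / H0) * integral

def aperture_radius_arcsec(z, r_kpc=100.0):
    """Angular radius subtended by r_kpc at redshift z."""
    # Angular diameter distance for a flat universe, converted to kpc.
    d_a_kpc = comoving_distance_mpc(z) / (1.0 + z) * 1000.0
    return np.degrees(r_kpc / d_a_kpc) * 3600.0
```

Under these parameters the aperture shrinks from a few arcminutes for the nearest hosts to under an arcminute near z = 0.1, so the search area on the sky varies strongly from host to host.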
A probability density function (PDF) is constructed from the normalized histogram
of number counts of objects around each candidate host and represents the
probability of finding N objects within a 100 kpc aperture around a targeted galaxy.
See the starred data points in Figure 7.
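
The counting and normalization steps can be sketched as follows in Python/NumPy. The sketch assumes angular separations from the host are precomputed and uses the 2 to 4 magnitude window of Section 6.1; `count_candidates` and `count_pdf` are hypothetical names, not functions from the analysis code.

```python
import numpy as np

def count_candidates(sep_arcsec, obj_mag, host_mag, radius_arcsec,
                     dm_min=2.0, dm_max=4.0):
    """Count objects inside the aperture that are dm_min to dm_max
    magnitudes dimmer than the host."""
    sep = np.asarray(sep_arcsec, dtype=float)
    dm = np.asarray(obj_mag, dtype=float) - host_mag
    inside = sep <= radius_arcsec
    in_window = (dm >= dm_min) & (dm <= dm_max)
    return int(np.count_nonzero(inside & in_window))

def count_pdf(counts, n_max):
    """Normalized histogram of per-host counts, with bins 0..n_max.

    Assumes n_max is at least the largest count in the sample.
    """
    hist = np.bincount(np.asarray(counts, dtype=int), minlength=n_max + 1)
    return hist / hist.sum()
```

Calling `count_candidates` once per host and passing the resulting list to `count_pdf` gives the composite-count distribution described above.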
6.4 Isotropic Noise Profile
The same algorithm which searches in the vicinity of actual galaxies can be used to
generate a probability density profile of foreground and background contaminating
objects, which we call the isotropic noise profile, by randomizing the centroid location
of each aperture in Right Ascension and Declination, while mimicking the targeted
search in every other way.
In short, a set of 25,000 virtual galaxies is generated, each assigned a set of
properties (absolute magnitude, apparent magnitude, redshift) belonging to a randomly
chosen target host galaxy. Only the RA and Dec coordinates are reassigned
randomized values within the confines of Stripe 82. This manner of setting parameters
is preferred to a completely stochastic attempt to mimic the distributions of
the target galaxies due to a somewhat complicated dependence of R-band apparent
magnitude on redshift and luminosity given absolute magnitude.
The 25,000 virtual galaxies are then subject to the same isolation constraints
as the real target hosts, with one additional imposition. In order that our noise
profile not be contaminated by the signal we seek, we require that no randomly
generated aperture encloses, or is within 100 kpc of enclosing, a Milky Way-sized
galaxy in our host sample. Ultimately, around 15,000 to 17,000 data points are used
to generate the isotropic noise PDF; see the dotted line in Figure 7.
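
A minimal sketch of the virtual-host generation step is given below in Python/NumPy. It assumes the Stripe 82 bounds quoted in Section 4.2 are in degrees, draws declination uniformly in sin(Dec) so that positions are uniform on the sky, and copies each property set whole from one randomly chosen real host. The function name and return layout are illustrative. The isolation and signal-exclusion cuts described above would be applied afterwards.

```python
import numpy as np

RA_START, RA_WIDTH = 309.1, 110.7   # 309.1 -> 59.8 deg, wrapping through 0
DEC_LIM = 1.258                     # degrees

def make_virtual_hosts(host_z, host_app_mag, host_abs_mag, n_virtual, rng):
    """Draw virtual hosts: real property sets at random sky positions."""
    host_z = np.asarray(host_z)
    host_app_mag = np.asarray(host_app_mag)
    host_abs_mag = np.asarray(host_abs_mag)
    # One index per virtual galaxy keeps (z, m, M) from the same real host.
    idx = rng.integers(0, len(host_z), size=n_virtual)
    ra = (RA_START + rng.uniform(0.0, RA_WIDTH, size=n_virtual)) % 360.0
    s_lim = np.sin(np.radians(DEC_LIM))
    dec = np.degrees(np.arcsin(rng.uniform(-s_lim, s_lim, size=n_virtual)))
    return {"ra": ra, "dec": dec, "z": host_z[idx],
            "app_mag": host_app_mag[idx], "abs_mag": host_abs_mag[idx]}
```

Sampling an index rather than sampling each property independently preserves the joint dependence of apparent magnitude on redshift and absolute magnitude.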
6.5 Signal Extraction via Deconvolution
We assume the signal (the number of satellites around a Milky Way-sized galaxy) and
the noise (the number of line-of-sight projected objects in the aperture) are
independent random variables. Both are measured in counts, and for simplicity we
assign them the variables S and N, respectively. p(S) is then the probability of
finding S LMC/SMC-like satellites within 100 kpc of a Milky Way-Sized galaxy, and
p(N) is the probability of finding N objects in a random, signal-free, 100 kpc aperture.
We can write the PDF of the random variable Z = N + S as the convolution of
the two individual PDFs.
p(Z) = p(N + S) = p(N) ∗ p(S) (1)
In Fourier space,
F(p(Z)) = F(p(N)) · F(p(S)) (2)
We measure p(Z) and p(N), and we invert equation 2 to find p(S), assuming each
function to be well-behaved and non-singular,
p(S) = F^{-1}( F(p(Z)) / F(p(N)) ) (3)
The Fast Fourier Transform (FFT) algorithm in MATLAB assists us in finding p(S).
To check the validity of our results, we perform a direct convolution p(S) ∗ p(N)
and compare with the empirical p(Z). Their agreement (starred points and solid line
in Figure 7) supports our earlier assumptions.
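
The inversion in equation (3) and the round-trip check can be illustrated with a short Python/NumPy sketch. This is a stand-in for the MATLAB computation, not a transcription of it. It assumes both PDFs are zero-padded to a common length long enough to hold the full linear convolution, and it raises an error where the transform of the noise PDF is too close to zero. The function name and tolerance are illustrative.

```python
import numpy as np

def deconvolve_pdf(p_z, p_n, eps=1e-12):
    """Recover p(S) from p(Z) = p(N) * p(S) by division in Fourier space.

    p_z and p_n must have the same length, zero-padded so that the
    linear convolution of p(N) and p(S) fits without wrapping around.
    """
    p_z = np.asarray(p_z, dtype=float)
    p_n = np.asarray(p_n, dtype=float)
    f_z = np.fft.fft(p_z)
    f_n = np.fft.fft(p_n)
    if np.any(np.abs(f_n) < eps):
        raise ValueError("F(p(N)) is singular; equation (3) cannot be inverted")
    return np.real(np.fft.ifft(f_z / f_n))

# Round-trip check on made-up PDFs (illustrative numbers only).
p_n = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0])
p_s_true = np.array([0.6, 0.3, 0.1])
p_z = np.zeros(8)
p_z[:5] = np.convolve(p_n[:3], p_s_true)
p_s = deconvolve_pdf(p_z, p_n)
p_z_check = np.convolve(p_s, p_n)[:8]  # direct convolution for comparison
```

With noisy measured histograms the recovered p(S) can contain small negative values, since the division amplifies noise wherever F(p(N)) is small.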
6.6 Error Bars
Since we seek to maximize our statistics, we choose to generate each PDF with the full
available sample. In the case of our Milky Way-Sized candidate hosts, we have only
one such set, so repeated trials are not an option. Instead, the jackknife resampling
technique is used to estimate the bias-corrected mean and error on each histogram
value.
With Stripe 82 divided into 30 spatial sections of equal area, we use the subtract-one
technique to compute the normalized PDF of counts 30 times, each time omitting
one such section. The average in each histogram bin is the unbiased mean, and
the errors on these values are approximately √(N − 1) · σi, where N = 30 is the number
of sections and σi is the standard deviation in each bin of these subtract-one values.
Figure 7: Deconvolution through Fast Fourier Transform.
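
The subtract-one procedure can be sketched as below in Python/NumPy. The sketch assumes each host has already been assigned an integer section label from 0 to 29. The function name is illustrative, and the standard deviation is taken with the population (divide-by-N) convention, which is what makes the √(N − 1) prefactor the usual jackknife error.

```python
import numpy as np

def jackknife_pdf(counts, section, n_sections=30):
    """Jackknife mean and error of the normalized count histogram.

    counts  : number of candidate satellites found around each host
    section : integer label (0..n_sections-1) of each host's sky section
    """
    counts = np.asarray(counts, dtype=int)
    section = np.asarray(section, dtype=int)
    n_bins = counts.max() + 1
    pdfs = np.empty((n_sections, n_bins))
    for k in range(n_sections):
        keep = section != k                      # omit one section
        hist = np.bincount(counts[keep], minlength=n_bins)
        pdfs[k] = hist / hist.sum()
    mean = pdfs.mean(axis=0)
    # np.std defaults to ddof=0, matching sqrt(N - 1) * sigma_i.
    err = np.sqrt(n_sections - 1) * pdfs.std(axis=0)
    return mean, err
```

Dividing by spatial section rather than by individual host lets the error estimate absorb any correlation between hosts that lie close together on the sky.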
Error bars on our histograms are then propagated through the deconvolution into
our results using stochastic methods. The same computation is performed approximately
10,000 times, each time with a set of initial values (for the PDFs) randomly
drawn from Gaussian distributions with the means and standard deviations found
above. The result is a normalized vector of values representing the probability of
occurrence of N satellites around a Milky Way-Sized primary, with N = 0, 1, 2, 3, ...
The spread in each bin is the empirically determined uncertainty.
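
A sketch of this Monte Carlo propagation in Python/NumPy follows. It assumes both input PDFs are zero-padded to the same length and, as an assumption not stated in the text, renormalizes each random draw to unit sum before deconvolving so that every output vector is itself normalized. Names and defaults are illustrative.

```python
import numpy as np

def propagate_errors(p_z, e_z, p_n, e_n, n_trials=10000, seed=0):
    """Propagate Gaussian bin errors through the FFT deconvolution.

    p_z, p_n : measured PDFs, zero-padded to equal length
    e_z, e_n : one-sigma errors on each bin (0 for padded bins)
    Returns the per-bin mean and standard deviation of p(S).
    """
    rng = np.random.default_rng(seed)
    p_z, e_z = np.asarray(p_z, float), np.asarray(e_z, float)
    p_n, e_n = np.asarray(p_n, float), np.asarray(e_n, float)
    draws_z = rng.normal(p_z, e_z, size=(n_trials, len(p_z)))
    draws_n = rng.normal(p_n, e_n, size=(n_trials, len(p_n)))
    # Assumption: renormalize each draw so every trial is a unit-sum PDF.
    draws_z /= draws_z.sum(axis=1, keepdims=True)
    draws_n /= draws_n.sum(axis=1, keepdims=True)
    ratio = np.fft.fft(draws_z, axis=1) / np.fft.fft(draws_n, axis=1)
    p_s = np.real(np.fft.ifft(ratio, axis=1))
    return p_s.mean(axis=0), p_s.std(axis=0)
```

Because the deconvolution is not linear in the noise PDF, the spread of the trial outputs is a more direct error estimate than analytic propagation would be.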
7 Initial Results
For the 1332 Milky Way-Sized galaxies discussed in Section 5, we find that the prob-
ability of occurrence of zero, one, and two bound satellite galaxies is 0.61 ± 0.05, 0.21
± 0.8, and 0.12 ± 0.9 respectively, which compares well with results from the Bolshoi
numerical simulation on large-scale galaxy formation [3]. Full results are displayed in
Figure 8.
[Figure 8 plot. x-axis: Number of Satellites; y-axis: Normalized Probability; series: Noise Profile, Composite Counts, Satellites.]
Figure 8: Deconvolved PDF of Satellites Around Milky Way-Sized Galaxies, generated with Stripe
82 catalog trimmed to zphot < 0.21 .
8 Variation of Parameters
8.1 Choice of Maximum Photometric Redshift Cut-Off
An otherwise trivial exercise of counting is complicated immensely by lack of precise
redshift information for the dim objects of interest populating the Stripe 82 Co-Add
catalog. Much of our problem becomes a matter of finding an optimal signal to
noise ratio. Since our candidate hosts extend no further than z = 0.15, including
objects at redshifts greater than 0.15 in our sample of potential satellites in reality
adds only noise, as our signal, the count of actual satellites, remains unchanged. We
have, however, in our possession no precise measurement of the redshift of the deeper
catalog, so we must be careful not to eliminate vital signal and introduce a significant
systematic error to our findings. Therefore, we have in our hands yet another purity
vs. completeness problem. But this time, the emphasis is on completeness. While
losing as few satellites as possible, we seek to make the most efficient redshift cut we
can in the Co-Add catalog, where the only redshift information we have follows from
the Photo-Z work of ((Ribamar’s group)).
We cite their future paper as reference for a more detailed description of their
work. For our purposes, we present a relevant portion of the results of their Photo-Z
validation set, which proved to be tremendously helpful as an estimator of systematic
error (Figure 9). A validation set is used to quantify the effectiveness of any Photo-Z
algorithm. It is a database of objects of disparate apparent magnitudes and known
redshifts which is fed through the procedure, with each object assigned a Photo-Z
exactly as an object of unknown redshift would be. Ideally, the two redshifts should be
the same, but due to the imperfect nature of the exercise, the photometric redshift at
times fails quite dramatically to approximate the spectroscopic redshift of the object,
usually by overestimating it. In these cases, a real satellite galaxy of redshift z < 0.15
may be mistaken for a larger galaxy at a much higher redshift and left out of our analysis.
The probability of this occurring varies with apparent magnitude, with dimmer
objects more likely to have failed Photo-Z’s. Thus, in order to obtain a realistic
account of the systematic error introduced by performing the cut in photo-Z, we
must use a sample of the validation set which has the same distribution in apparent
magnitude (and to some extent, spectroscopic redshift), as those which we ultimately
count as satellites. When this matching is applied, we obtain the result in Figure 10.
The general shape of the distribution is a skewed Gaussian with an apparently
constant noise floor, an indication that catastrophic failures are equally likely to result
[Figure 9 plot. x-axis: Known Spectroscopic Redshift; y-axis: Generated Photometric Redshift; series: SDSS Objects, Y=X; marked area: Region of Interest.]
Figure 9: Photo-Z validation set for those objects with apparent magnitudes brighter than 21.77
(corresponding to a limit of 4 magnitudes dimmer than the dimmest object in the spectroscopic
catalog). Plotted is spectroscopic redshift against photometric redshift. Those objects which lie far
from the y = x line are catastrophic failures, which are of concern.
[Figure 10 plot. x-axis: Photometric Redshift; left y-axis: Probability Density; right y-axis: Cumulative Distribution; curves: Probability Density, Cumulative Probability.]
Figure 10: PDF of objects in Photo-Z validation set matched in apparent magnitude with our
satellite objects of interest in the Stripe 82 catalog. The cumulative distribution gives us the com-
pleteness of our sample.
in a mis-assignment in any photo-z bin. Analyzing our cumulative distribution, we
find that a photometric redshift cut at 0.21 corresponds to no better than about 85%
completeness.
In progress: there are several ways to incorporate this knowledge into our analysis.
1. Deem the systematic error small enough to be negligible by comparison with the
random error.
2. Improve the completeness of the sample so that option 1 holds. This can be done by
(a) increasing the maximum photo-z threshold or (b) changing the apparent magnitude
lower limit on candidate hosts, which removes the dimmest satellites from contention.
The trouble with (a) is too much noise. The trouble with (b) is a lack of statistics.
Both result in an expansion of the error bars.
3. Adjust our PDF to take this effect into account, which may ultimately be the best
option. This essentially introduces a second random variable, η, the number of satellites
missed due to photo-z errors, which is uncorrelated with the measured signal (that
is, we are no more or less likely to lose a satellite from a galaxy with 3 than with 2
satellites). We must make certain informed assumptions about the PDF of η. The
simplest case ignores the possibility of losing two or more satellites from a single
galaxy. Given a completeness of 85%, the PDF would be 0.85 for η = 0, 0.15
for η = 1, and zero everywhere else. Our bias-adjusted results would be
simply the convolution of the PDF from Section 7 with p(η).
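
The simplest-case adjustment can be illustrated with a few lines of Python/NumPy. The sketch takes the two-point p(η) described above at face value. The input PDF in the usage example is a made-up vector chosen for easy arithmetic, not the measured result of Section 7, and the function name is hypothetical.

```python
import numpy as np

def correct_for_missed(p_measured, completeness=0.85):
    """Convolve a measured satellite PDF with the two-point p(eta).

    p(eta = 0) = completeness, p(eta = 1) = 1 - completeness, so the
    output has one more bin than the input.
    """
    p_eta = np.array([completeness, 1.0 - completeness])
    return np.convolve(np.asarray(p_measured, dtype=float), p_eta)

# Illustrative input only; not the measured values.
p_adjusted = correct_for_missed([0.6, 0.3, 0.1])
```

The adjustment shifts probability toward higher satellite counts while preserving normalization, since both input PDFs sum to one.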
References
[1] van den Bergh, Sidney. "The Local Group of Galaxies." Astronomy and Astrophysics
Review (1999) 9: 273-318.
[2] Schlegel, David. "Princeton/MIT SDSS Spectroscopy Home Page."
http://spectro.princeton.edu/
[3] Michael’s paper.