How Common are the Magellanic Clouds? (In Progress)

Lulu Liu

March 31, 2010

1 Introduction

The tremendous amount of data generated by the Sloan Digital Sky Survey lends

itself to statistically significant observational tests on the current theories of galaxy

formation and evolution. This document outlines in detail our investigation so far,

which seeks to establish empirical corroboration with the galaxy clustering predictions

of the latest numerical galaxy formation models. Here, we specifically target the

occurrence of large satellites around Milky Way-Sized isolated host galaxies in order

to address the statistical likelihood of the existence of the LMC and SMC in the

Local Group; however, we expect that the methodology developed for extracting

useful background-subtracted statistics from the SDSS catalog can be adapted to

future studies in galaxy-clustering and satellite formation.

2 Assumptions

Several assumptions are implicit in this analysis. All distances are calculated using

the benchmark cosmological model with Ωm ≈ 0.3, ΩΛ ≈ 0.7, and h ≈ 0.7.

...what else?

Finally, a generally one-to-one, direct correlation between mass and luminosity

is assumed for central galaxies of interest. Ordinary galaxies of similar luminosity

are expected to be approximately equal in mass. The abundance matching technique

developed by (Risa’s paper) et al. allows us to assign virial masses and radii to galaxies

of certain luminosities with a well-defined uncertainty.
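For reference, the distance measures implied by the benchmark model above can be written out explicitly. This is a sketch that assumes a spatially flat universe with negligible radiation density (consistent with Ωm + ΩΛ ≈ 1, though not stated in the text); the symbols E(z), D_C, D_A and D_L are introduced here for illustration only.

```latex
E(z) = \sqrt{\Omega_m (1+z)^3 + \Omega_\Lambda}, \qquad
H_0 = 100\,h\ \mathrm{km\,s^{-1}\,Mpc^{-1}}

D_C(z) = \frac{c}{H_0} \int_0^z \frac{dz'}{E(z')}, \qquad
D_A(z) = \frac{D_C(z)}{1+z}, \qquad
D_L(z) = (1+z)\,D_C(z)
```

Here D_C is the line-of-sight comoving distance, D_A the angular diameter distance (used to convert a physical radius into an angle on the sky), and D_L the luminosity distance (used to convert apparent into absolute magnitudes).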


3 The SDSS Catalogs

We draw our primary galaxy sample from the k-corrected NYU Value Added Galaxy

Catalog derived from the most recent (7th) data release of the Sloan Digital Sky

Survey (SDSS), filename kcorrect.none.model.z0.10.fits. This database contains pho-

tometric information for all objects matching the main sample target criteria of the

SDSS imaging survey as well as spectroscopic data for the subset of those which are

non-conflicting and available. We cross-reference the parallel spectroscopic catalog,
filename object_sdss_spectro.fits, and use the information contained in the PRIMTARGET
tag [2] as well as the stated apparent magnitude limit of the galaxy survey to isolate
only those members of the main catalog identified as potential Milky Way-sized or

larger host galaxies, the set of which we will from now on refer to as our main galaxy

sample.

As the Sloan Digital Sky Survey is a ground-based study the apparent magni-

tude limit for galaxies in the main sample is around 17.77. Since we are interested

in relatively dim bodies (satellites between 2 and 4 magnitudes dimmer than their

primaries), the spectroscopic catalog alone is insufficient for our purposes. For this

search we avail ourselves of the Stripe 82 photometric database, which, by co-adding
many exposures, extends our depth by several magnitudes to a magnitude limit of
about 23. The general strategy is as follows: locate candidate host galaxies with
certain desired traits within the main sample, and use the celestial coordinates of
these candidates to cross-match and conduct our search for satellites within the
deeper Co-Add database.

That there exists no spectroscopic data, and consequently, no high precision red-

shift values, for galaxies in the Stripe 82 catalog complicates our investigation. Much

of our attention will be paid to establishing a reliable method for foreground/background

subtraction which will separate the signal, the presence of satellites within a certain

radius of candidate host galaxies, from the noise, line-of-sight contamination from

galaxies of a wide range of redshifts.

Though not of a high enough precision to eliminate the need for foreground/background

subtraction, the existence of photometric redshift values for members of the Co-Add

catalog will certainly assist in bringing the noise down to a more manageable level.

Since our candidate hosts in the spectroscopic catalog range between z=0 and z=0.15,

reasonable faith in the photometric redshift values of objects in the Stripe 82 catalog

should allow us to selectively dial down the noise while keeping the signal mostly


intact. The trade-offs involved here will be examined in Section 8.1. The general

strategy imposes a sharp maximum redshift cut on the members of the Stripe 82 catalog

based on the spectroscopic redshift values of our candidate hosts. The choice, which

will be explained later, is zmax = 0.21, which trims the catalog from some 13 million

elements to a much more tractable 470,000.

4 Selection of Milky Way-Sized Central Galaxies

4.1 Preliminary Cut on Apparent Magnitude

Our first challenge is selecting, from our main galaxy sample, a statistically robust

set of suitable hosts. The main catalog is populated by a total of about 2.5 million

objects, around 750,000 of which are galaxies. QSOs are tagged independently and
are excluded from this sample since those likely to scatter into our magnitude and
redshift ranges correspond to low mass galaxies with bright nuclei, distinctly
non-Milky-Way-like outliers in our assumed one-to-one mapping of luminosity to galactic
mass (citation: Risa & co.’s abundance matching paper).

Within this sample we make several cuts. The apparent magnitude limit of galaxies
surveyed by the SDSS is 17.77 (citation?). Ideally, an apparent magnitude distribution
of our galaxy sample should show a sharp cut-off at this value and no scatter

beyond. However, due to photometric recalibration and post-processing effects, the

hard limit is smoothed somewhat in our sample obtained from the NYU-VAGC and

there appears to be a small tail of dimmer objects. To avoid potentially systematic

selection effects, we limit our pool of candidate hosts to only those galaxies brighter

than the peak magnitude of 17.60, about 600,000 in all. See Figure 1.

4.2 Spatial Constraints

Since our actual search is to be conducted within the Stripe 82 data, we trim our

main catalog to the corresponding region of the sky (with right ascension between
309.1° and 59.8°, wrapping through 0°, and declination between −1.258° and +1.258°). This reduces our

sample size to about 23,000.
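The footprint cut above can be sketched as follows. This is a minimal illustration, assuming the quoted limits are in degrees and that the right ascension range wraps through 0°; the function names `in_stripe82` and `bright_hosts` are hypothetical and not part of any SDSS tool.

```python
import numpy as np

# Stripe 82 footprint as quoted in the text (assumed to be in degrees).
RA_MIN, RA_MAX = 309.1, 59.8        # RA range wraps through 0 degrees
DEC_MIN, DEC_MAX = -1.258, 1.258

def in_stripe82(ra_deg, dec_deg):
    """Boolean mask: True where (RA, Dec) lies inside the Stripe 82 box."""
    ra = np.mod(np.asarray(ra_deg, dtype=float), 360.0)
    dec = np.asarray(dec_deg, dtype=float)
    # Because RA_MIN > RA_MAX, the allowed interval is [RA_MIN, 360) U [0, RA_MAX].
    in_ra = (ra >= RA_MIN) | (ra <= RA_MAX)
    in_dec = (dec >= DEC_MIN) & (dec <= DEC_MAX)
    return in_ra & in_dec

def bright_hosts(ra_deg, dec_deg, app_mag, mag_limit=17.60):
    """Combine the apparent-magnitude cut of Section 4.1 with the footprint cut."""
    return in_stripe82(ra_deg, dec_deg) & (np.asarray(app_mag, dtype=float) < mag_limit)
```

Treating the right ascension interval as a union of two pieces avoids the common mistake of testing `RA_MIN <= ra <= RA_MAX`, which would select nothing for a range that crosses 0°.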

We then carefully consider edge effects which may result in a systematic under-

counting of neighbors, and conservatively eliminate the candidates which are within a

certain distance Riso of any bounding edge of the Stripe 82 region. The optimal value

of Riso will be discussed in Section 4.4. Depending on the choice of a reasonable Riso,
this constraint has the effect of throwing out an additional 10% to 20% of candidate
hosts.

Figure 1: The apparent magnitude distribution of galaxies takes on the usual appearance for
magnitudes brighter than 17.60. We limit our pool of potential hosts to objects in this region only.

4.3 Magnitude Binning

Our best estimates for the luminosity of the Milky Way Galaxy place its V-band
absolute magnitude, M_V, at around -20.9 [1]. As measurements were only taken
and absolute magnitudes computed in the ugriz bands for the SDSS catalog, a
V − R value of X is used to convert between the two systems. The "Milky Way-sized"
constraint is imposed here as a ±0.2 bin around an R-band absolute magnitude of
-20.73. There are 3200 such galaxies within our sample.

4.4 Isolation Criteria

We are interested specifically in galaxy clusters with Milky-Way sized primaries and

to this end require that no candidate host is itself a satellite of a larger galaxy. Rather

primitively, we begin by imposing the most straightforward criterion available: we

require that there exists no galaxy potentially more massive within a certain radius

Riso of our candidate host. Specifically, within this region, we seek the answers to two

questions: (1) Does there exist a more luminous galaxy at the same redshift as our


candidate? (2) Is there another galaxy possibly larger than our candidate for which

we have no redshift information?

In the first test we compare absolute magnitudes and allow a velocity difference
of up to 1000 km/s. In the second test we compare apparent magnitudes only, given a null
value for redshift. A 'yes' to either question is enough to eliminate a galaxy from our
pool of Milky Way-sized primaries.

The first condition proves much more restrictive than the second, as the main

sample of galaxies is well-covered spectroscopically. The difference between applying

condition (1) only and conditions (1) and (2) simultaneously is only a few galaxies;

still, we choose to go with the more conservative sample in each case.
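The two-part isolation test described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in this study: the 1000 km/s window is converted to redshift as c·|Δz|/(1 + z_host), the projected separation is evaluated at the host's angular diameter distance (supplied by the caller), and the names `angular_sep_rad` and `is_isolated` as well as the dictionary layout are hypothetical.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def angular_sep_rad(ra1, dec1, ra2, dec2):
    """Great-circle separation in radians (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    s = (np.sin((dec2 - dec1) / 2) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return 2 * np.arcsin(np.sqrt(s))

def is_isolated(host, neighbors, d_a_mpc, r_iso_mpc=0.6, dv_kms=1000.0):
    """Return True if no potentially more massive galaxy lies within r_iso.

    host      : dict with keys ra, dec, z, abs_mag, app_mag (scalars)
    neighbors : dict of NumPy arrays with the same keys; z is NaN where
                no spectroscopic redshift exists
    d_a_mpc   : angular diameter distance to the host in Mpc (computed elsewhere)
    """
    sep = angular_sep_rad(host["ra"], host["dec"], neighbors["ra"], neighbors["dec"])
    near = d_a_mpc * sep < r_iso_mpc  # projected separation at the host distance

    has_z = ~np.isnan(neighbors["z"])
    # Test (1): within the velocity window and brighter in absolute magnitude.
    dv = C_KM_S * np.abs(neighbors["z"] - host["z"]) / (1.0 + host["z"])
    test1 = has_z & (dv < dv_kms) & (neighbors["abs_mag"] < host["abs_mag"])
    # Test (2): no redshift available, but brighter in apparent magnitude.
    test2 = ~has_z & (neighbors["app_mag"] < host["app_mag"])

    return not np.any(near & (test1 | test2))
```

A candidate is rejected as soon as either test fires for any neighbor inside the projected radius, which matches the conservative choice of applying conditions (1) and (2) together.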

[Figure 2 plot. Horizontal axis: Distance Cut (Mpc), 0 to 1.2. Vertical axis: Completeness / Purity Fraction, 0.4 to 1. Curves: Completeness, Purity. Panel title: 2D projected distance, 1000 km/s redshift slice.]

Figure 2: Purity and completeness of our sample of isolated Milky Way-Sized centrals as a function

of a varying isolation radius, Riso. An N-Body simulation was used to derive these values.

There are certain trade-offs in our choice of Riso in terms of completeness and

purity of the host sample. The conservative choice is a larger Riso, which while

ensuring a greater degree of isolation, thus a ’purer’ sample of Milky Way-Sized

centrals, would preclude the detection of certain actually isolated candidates and

harm the completeness of our sample. Though purity is of more import as it impacts

the accuracy and relevance of the results of this study, due to the limiting nature

of our candidate selection, a more complete sample stands to improve the statistical


significance of our measurements. Therefore, our choice for Riso seeks to maximize

completeness while holding impurities to an acceptably low level.

Figure 2 is the result of numerical simulations of ΛCDM galaxy formation models
performed by Peter Behroozi (of Stanford University) using the ((())). Completeness

is shown as the fraction out of all Milky Way-Sized central galaxies which are included

in our sample using this isolation method as a function of Riso. Purity is plotted as

the fraction of our sample of candidate primaries which are actually primaries and

not satellites of more massive galaxies.
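The two definitions above can be stated compactly. In this sketch, T denotes the set of all true Milky Way-Sized central galaxies in the simulation and S(R_iso) the set of Milky Way-Sized galaxies that pass the isolation test at radius R_iso; both symbols are introduced here for illustration.

```latex
C(R_{\mathrm{iso}}) = \frac{|S(R_{\mathrm{iso}}) \cap T|}{|T|}, \qquad
P(R_{\mathrm{iso}}) = \frac{|S(R_{\mathrm{iso}}) \cap T|}{|S(R_{\mathrm{iso}})|}
```

Increasing R_iso shrinks S, which tends to raise P while lowering C; this is the trade-off shown in Figure 2.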

The results are encouraging. Given a target purity of well above 90%, we are

able to choose the Riso which yields the most complete sample. We make our cut
at Riso = 0.6 Mpc for the moment and, in Section 8, check the stability of our results
as this parameter is varied.

5 Properties of the Candidate Primaries

5.1 Redshift Distribution

Our 1332 hosts, selected for 0.6 Mpc isolation, are distributed in the expected way.
Since redshift is a measure of cosmic distance, it is natural that the number of objects
within ∆z of a certain z will increase as z increases. The abrupt drop-off is due to
a correlation between apparent magnitude and redshift: beyond a certain redshift,
galaxies in our narrow absolute magnitude bin fall below the apparent magnitude
limit. See Figure 3.

Figure 3: Histogram of redshift values of 1332 Milky Way-Sized hosts. Bins are of width 0.005.

5.2 Apparent Magnitude Distribution

The sharp apparent magnitude cut-off imposed earlier on the main galaxy sample is
evident here. Bins are 0.1 wide. See Figure 4.

Figure 4: Histogram of apparent magnitudes of 1332 Milky Way-Sized hosts. Bins are of width 0.1.

5.3 Absolute Magnitude Distribution

Our selected galaxies, on the other hand, given our selection criteria, are distributed
fairly uniformly within a thin band in absolute magnitude. See Figure 5.

Figure 5: Histogram of absolute magnitude values of 1332 Milky Way-Sized hosts.

6 Satellites of Milky Way-Sized Primaries

6.1 Definition of a Satellite

We consider any galaxy lying within the virial radius of another larger (read: more
luminous) galaxy to be a satellite of that galaxy. Galaxies with V-band magnitudes
around -20.9 have been found by (()) methods to have dark matter halos extending
out to distances of XX kpc or more.

Centered on each Milky Way-sized host, we limit our search to a circular aperture
with radius 100 kpc ((how is this chosen?)). We do not specifically correct for projection
effects on this level, as they will be accounted for by the statistical noise subtraction later on.

Since we are interested in LMC/SMC-type objects around Milky Way-sized centrals,

we further impose a lower and upper limit on the apparent luminosity of the satellite

objects. For our standard exercise we search between 2 and 4 magnitudes dimmer

than the host (bracketing the brightness of the SMC and LMC relative to the Milky

Way).
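The search window defined above can be sketched numerically. This is an illustrative calculation assuming the flat benchmark cosmology of Section 2 and the small-angle approximation; the function names are hypothetical, and z must be greater than zero.

```python
import numpy as np

C_KM_S = 299792.458                    # speed of light, km/s
OMEGA_M, OMEGA_L, H = 0.3, 0.7, 0.7    # benchmark values from Section 2

def angular_diameter_distance_mpc(z, n_steps=2048):
    """D_A in Mpc for a flat Lambda-CDM model (trapezoidal integration)."""
    zs = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(OMEGA_M * (1.0 + zs) ** 3 + OMEGA_L)
    integral = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zs))
    d_c = (C_KM_S / (100.0 * H)) * integral   # line-of-sight comoving distance
    return d_c / (1.0 + z)

def aperture_radius_deg(z, r_kpc=100.0):
    """Angle subtended by a physical radius r_kpc at redshift z, in degrees."""
    d_a_kpc = angular_diameter_distance_mpc(z) * 1000.0
    return np.degrees(r_kpc / d_a_kpc)        # small-angle approximation

def satellite_mag_window(host_app_mag, dm_min=2.0, dm_max=4.0):
    """Apparent-magnitude range for objects 2 to 4 magnitudes dimmer than the host."""
    return host_app_mag + dm_min, host_app_mag + dm_max
```

For example, `aperture_radius_deg(0.1)` is roughly 0.015 degrees (a little under one arcminute), so the aperture shrinks quickly toward the far end of the host redshift range.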

6.2 Signal Counts and Noise Subtraction Overview

Our search is conducted within the photometric Stripe 82 Co-Add database. For

observation of any piece of the sky, the lack of redshift information presents an ana-

lytical challenge in isolating members of physical clusters from the inherent noise due

to line-of-sight projection effects.


A sort of foreground-background subtraction must take place. Since it is difficult,
on the level of individual objects, to determine with good accuracy whether objects that
appear close to one another on the sky are physically associated, we choose to work on a statistical level.

We essentially employ a three-part process to separate signal from noise, roughly

represented in Figure 6 and discussed in detail in the following sections.

Figure 6: With our target galaxy at the center of the aperture, we employ a 3-Part process to

statistically remove line-of-sight projected noise.

1. Retrieve composite signal and noise. We count up all luminous objects around

each candidate host fitting a certain luminosity and projected angular distance

criterion. See Panel (1) of Figure 6 and Section 6.3.

2. Generate noise profile. Mimicking the redshift and apparent magnitude distributions
of candidate hosts, a set of virtual Milky Way-sized central galaxies
is generated with no preference for spatial position. Conducting an identical
search around these virtual galaxies allows us to sample the line-of-sight noise,
which we assume to be relatively isotropic within the Stripe 82 region. See
Panel (2) of Figure 6 and Section 6.4.

3. Extract signal. A deconvolution is performed given this noise profile in order to
extract the signal from the composite counts. See Panel (3) of Figure 6 and Section
6.5.


6.3 All Potential Satellites

To retrieve the total composite counts of all potential satellites around each Milky

Way-Sized primary, we draw a perfectly circular aperture centered on the candidate

host with angular radius equivalent to 100 kpc at the appropriate redshift. In this

cone-shaped region our algorithm collects every object within a certain relative bright-

ness of the host. Naturally, this sample is populated by not only actual satellites of

the galaxy in question, but foreground and background galaxies which have scattered

into the apparent magnitude bins of interest. Depending on the per unit area density

of objects in the Stripe 82 catalog, this figure could be mostly signal or mostly noise.

A probability density function (PDF) is constructed out of the normalized his-

togram of number counts of objects around each candidate host and represents the

probability of finding N objects within a 100 kpc aperture around a targeted galaxy.

See Figure 7 starred data points.
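The counting and histogram steps above can be sketched as follows. This is a minimal illustration; the function names are hypothetical, and the angular separations and aperture radius are assumed to have been computed beforehand.

```python
import numpy as np

def count_in_aperture(sep_deg, app_mag, radius_deg, mag_lo, mag_hi):
    """Number of catalog objects inside the aperture and the magnitude window."""
    sep_deg = np.asarray(sep_deg, dtype=float)
    app_mag = np.asarray(app_mag, dtype=float)
    inside = (sep_deg < radius_deg) & (app_mag >= mag_lo) & (app_mag <= mag_hi)
    return int(np.sum(inside))

def count_pdf(counts, n_max=None):
    """Normalized histogram: pdf[k] is the fraction of apertures containing k objects."""
    counts = np.asarray(counts, dtype=int)
    if n_max is None:
        n_max = counts.max()
    hist = np.bincount(counts, minlength=n_max + 1).astype(float)
    return hist / hist.sum()
```

Calling `count_pdf` on the list of per-host counts gives the empirical composite distribution; the same function applied to counts around random apertures gives the noise distribution of Section 6.4.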

6.4 Isotropic Noise Profile

The same algorithm which searches in the vicinity of actual galaxies can be used to

generate a probability density profile of foreground and background contaminating

objects, which we call the isotropic noise profile, by randomizing the centroid location

of each aperture in Right Ascension and Declination, while mimicking the targeted

search in every other way.

In short, a set of 25000 virtual galaxies is generated, and each is assigned a set of
properties (absolute magnitude, apparent magnitude, redshift) belonging to a randomly
chosen target host galaxy. Only the RA and Dec coordinates are reassigned
randomized values within the confines of Stripe 82. This manner of setting parameters
is preferred to a completely stochastic attempt to mimic the distributions of
the target galaxies due to a somewhat complicated dependence of R-band apparent
magnitude on redshift and luminosity given absolute magnitude.

The 25000 virtual galaxies are then subject to the same isolation constraints
as the real target hosts, with one additional imposition. In order that our noise
profile not be contaminated by the signal we seek, we require that no randomly
generated aperture encloses, or is within 100 kpc of enclosing, a Milky Way-sized
galaxy in our host sample. Ultimately, around 15,000 to 17,000 data points are used

to generate the isotropic noise PDF, see Figure 7 dotted line.
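The generation of virtual hosts can be sketched as below. This is an illustration under assumptions: positions are drawn uniformly in solid angle over the Stripe 82 box (the text only says the coordinates are randomized), the footprint limits are taken to be in degrees, and the function name `make_virtual_hosts` is hypothetical. The isolation test and the exclusion of apertures near real hosts would be applied afterwards and are not shown.

```python
import numpy as np

def make_virtual_hosts(host_z, host_app_mag, host_abs_mag, n_virtual=25000, seed=0):
    """Draw virtual hosts: properties copied from random real hosts, positions randomized."""
    rng = np.random.default_rng(seed)
    host_z = np.asarray(host_z, dtype=float)
    host_app_mag = np.asarray(host_app_mag, dtype=float)
    host_abs_mag = np.asarray(host_abs_mag, dtype=float)

    # Copy (z, apparent mag, absolute mag) as a unit from a randomly chosen real
    # host, so the joint distribution of the three properties is preserved.
    idx = rng.integers(0, len(host_z), size=n_virtual)

    # RA: uniform over the wrapped interval [309.1, 360) U [0, 59.8).
    ra_width = (360.0 - 309.1) + 59.8
    ra = np.mod(309.1 + rng.uniform(0.0, ra_width, n_virtual), 360.0)

    # Dec: uniform in sin(dec), which is uniform in solid angle.
    s_lo, s_hi = np.sin(np.radians([-1.258, 1.258]))
    dec = np.degrees(np.arcsin(rng.uniform(s_lo, s_hi, n_virtual)))

    return {"ra": ra, "dec": dec, "z": host_z[idx],
            "app_mag": host_app_mag[idx], "abs_mag": host_abs_mag[idx]}
```

Drawing a single index per virtual galaxy is what avoids having to model the dependence of apparent magnitude on redshift and absolute magnitude separately.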


6.5 Signal Extraction via Deconvolution

We assume that the signal (the number of satellites around a Milky Way-sized galaxy) and
the noise (the number of line-of-sight projected objects) are statistically independent random
variables. Both are measured in counts; for simplicity, we assign them the variables S
and N, respectively. p(S) is then the probability of finding S LMC/SMC-like
satellites within 100 kpc of a Milky Way-Sized galaxy and p(N) is the probability
of finding N objects in a random, signal-free, 100 kpc aperture.

We can write the PDF of the random variable Z = N + S as the convolution of

the two individual PDFs.

$$p(Z) = p(N + S) = p(N) * p(S) \qquad (1)$$

In Fourier space,

$$\mathcal{F}(p(Z)) = \mathcal{F}(p(N)) \cdot \mathcal{F}(p(S)) \qquad (2)$$

We measure p(Z) and p(N), and we invert equation 2 to find p(S), assuming each
function to be well-behaved and non-singular,

$$p(S) = \mathcal{F}^{-1}\left( \frac{\mathcal{F}(p(Z))}{\mathcal{F}(p(N))} \right) \qquad (3)$$

The Fast Fourier Transform (FFT) algorithm in Matlab assists us in finding p(S).

To check the validity of our results, we perform an integral convolution p(S) ∗ p(N)

and compare with the empirical p(Z). Their agreement (starred points and solid line

in Figure 7) affirms our earlier assumptions.
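The inversion and the consistency check can be sketched with NumPy's FFT routines. This is not the Matlab code used for this work; it is a minimal illustration on made-up input PDFs, and the function name `deconvolve_pdf` is hypothetical. It assumes p(Z) is stored on a grid at least as long as the linear convolution of p(N) and p(S), so that the circular convolution implied by the discrete transform equals the linear one.

```python
import numpy as np

def deconvolve_pdf(p_z, p_n):
    """Recover p(S) from p(Z) = p(N) * p(S) by division in Fourier space."""
    n = len(p_z)
    f_z = np.fft.fft(p_z, n)
    f_n = np.fft.fft(p_n, n)     # zero-pads p_n to the length of p_z
    # Division is only safe where the transform of p(N) has no zeros.
    return np.real(np.fft.ifft(f_z / f_n))

# Illustrative inputs only (not measured values).
p_n = np.array([0.5, 0.3, 0.2])          # noise PDF
p_s_true = np.array([0.6, 0.25, 0.15])   # signal PDF to be recovered
p_z = np.convolve(p_n, p_s_true)         # composite PDF, length 5

p_s = deconvolve_pdf(p_z, p_n)           # padded with zeros beyond the true support
p_z_check = np.convolve(p_n, p_s[:3])    # re-convolve and compare with p_z
```

With measured histograms the division amplifies statistical noise, so individual bins of the recovered p(S) can come out slightly negative; this is one reason the error analysis of Section 6.6 matters.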

6.6 Error Bars

Since we seek to maximize our statistics we choose to generate each PDF with the full

available sample. In the case of our Milky Way-Sized candidate hosts, we have only

one such set, so repeated trials are not an option. Instead, the jackknife resampling

technique is used to estimate the bias-corrected mean and error on each histogram

value.

With Stripe 82 divided into 30 spatial sections of equal area, we use the subtract-one
technique to compute the normalized PDF of counts 30 times, each time omitting
one such section. The average in each histogram bin is the unbiased mean, and
the errors on these values are approximately √(N − 1) · σi, where N = 30 is here the
number of spatial sections and σi is the standard
deviation in each bin of these subtract-one values.

Figure 7: Deconvolution through Fast Fourier Transform.

Error bars on our histograms are then propagated through the deconvolution into

our results using stochastic methods. The same computation is performed approximately
10,000 times, each time with a set of initial values (for the PDFs) randomly

drawn from Gaussian distributions with the means and standard deviations found

above. The result is a normalized vector of values representing the probability of

occurrence of N satellites around a Milky Way-Sized primary, with N = 0,1,2,3...

The spread in each bin is the empirically determined uncertainty.
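The subtract-one jackknife described earlier in this section can be sketched as follows. This is an illustration only: the function name `jackknife_pdf`, the integer section labels 0 to 29, and the fixed number of histogram bins are assumptions introduced here.

```python
import numpy as np

def jackknife_pdf(counts, section_id, n_sections=30, n_bins=11):
    """Delete-one jackknife mean and error for each bin of a normalized count histogram.

    counts     : number of objects found around each host
    section_id : integer label (0 .. n_sections-1) of the spatial section of each host
    """
    counts = np.asarray(counts, dtype=int)
    section_id = np.asarray(section_id)
    pdfs = np.empty((n_sections, n_bins))
    for k in range(n_sections):
        kept = counts[section_id != k]                 # omit section k
        hist = np.bincount(kept, minlength=n_bins)[:n_bins].astype(float)
        pdfs[k] = hist / hist.sum()
    mean = pdfs.mean(axis=0)
    sigma = pdfs.std(axis=0)                           # standard deviation with ddof=0
    err = np.sqrt(n_sections - 1) * sigma              # jackknife error per bin
    return mean, err
```

The factor √(N − 1) compensates for the strong overlap between the subtract-one samples, whose raw scatter would otherwise badly underestimate the true uncertainty.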

7 Initial Results

For the 1332 Milky Way-Sized galaxies discussed in Section 5, we find that the prob-

ability of occurrence of zero, one, and two bound satellite galaxies is 0.61 ± 0.05, 0.21

± 0.8, and 0.12 ± 0.9 respectively, which compares well with results from the Bolshoi

numerical simulation on large-scale galaxy formation [3]. Full results are displayed in

Figure 8.


[Figure 8 plot. Horizontal axis: Number of Satellites, −1 to 10. Vertical axis: Normalized Probability, −0.2 to 0.8. Curves: Noise Profile, Composite Counts, Satellites.]

Figure 8: Deconvolved PDF of Satellites Around Milky Way-Sized Galaxies, generated with Stripe
82 catalog trimmed to zphot < 0.21.


8 Variation of Parameters

8.1 Choice of Maximum Photometric Redshift Cut-Off

An otherwise trivial exercise of counting is complicated immensely by lack of precise

redshift information for the dim objects of interest populating the Stripe 82 Co-Add

catalog. Much of our problem becomes a matter of finding an optimal signal to

noise ratio. Since our candidate hosts extend no further than z = 0.15, including

objects at redshifts greater than 0.15 in our sample of potential satellites in reality

adds only noise, as our signal, the count of actual satellites, remains unchanged. We

have, however, in our possession no precise measurement of the redshift of the deeper

catalog, so we must be careful not to eliminate vital signal and introduce a significant

systematic error to our findings. Therefore, we have in our hands yet another purity

vs. completeness problem. But this time, the emphasis is on completeness. While

losing as few satellites as possible, we seek to make the most efficient redshift cut we

can in the Co-Add catalog, where the only redshift information we have follows from

the Photo-Z work of ((Ribamar’s group)).

We cite their future paper as reference for a more detailed description of their

work. For our purposes, we present a relevant portion of the results of their Photo-Z
validation set, which proved to be tremendously helpful as an estimator of systematic
error (Figure 9). A validation set is used to quantify the effectiveness of any Photo-Z
algorithm. It is a database of objects of disparate apparent magnitudes and known
redshifts which is fed through the procedure, each object being assigned a Photo-Z in
the same way as an object of unknown redshift would be. Ideally, the two redshifts should be the same,
but due to the imperfect nature of the exercise, the photometric redshift at times fails
quite dramatically to approximate the spectroscopic redshift of the object, usually
by overestimating it. In these cases, a real satellite galaxy of redshift z < 0.15 may be
mistaken for a larger galaxy at a much higher redshift and left out of our analysis.

The probability of this occurring varies with apparent magnitude, with dimmer

objects more likely to have failed Photo-Z’s. Thus, in order to obtain a realistic

account of the systematic error introduced by performing the cut in photo-Z, we

must use a sample of the validation set which has the same distribution in apparent

magnitude (and to some extent, spectroscopic redshift), as those which we ultimately

count as satellites. When this matching is applied, we obtain the result in Figure 10.

The general shape of the distribution is a skewed Gaussian with an apparently
constant noise floor, an indication that catastrophic failures are equally likely to result
in a mis-assignment in any photo-z bin. Analyzing our cumulative distribution, we
find that a photometric redshift cut at 0.21 corresponds to no better than about 85%
completeness.

[Figure 9 plot. Horizontal axis: Known Spectroscopic Redshift, 0 to 1. Vertical axis: Generated Photometric Redshift, 0 to 1. Legend: SDSS Objects, Y=X, Region of Interest.]

Figure 9: Photo-Z validation set for those objects with apparent magnitudes brighter than 21.77
(corresponding to a limit of 4 magnitudes dimmer than the dimmest object in the spectroscopic
catalog). Plotted is spectroscopic redshift against photometric redshift. Those objects which lie far
from the y = x line are catastrophic failures which are of concern.

[Figure 10 plot. Horizontal axis: Photometric Redshift, 0 to 0.5. Left vertical axis: Probability Density, 0 to 12. Right vertical axis: Cumulative Distribution, 0 to 1.0. Legend: Probability Density, Cumulative Probability.]

Figure 10: PDF of objects in Photo-Z validation set matched in apparent magnitude with our
satellite objects of interest in the Stripe 82 catalog. The cumulative distribution gives us the
completeness of our sample.

In progress: there are several ways to incorporate this knowledge into our analysis.

1. Deem the systematic error small enough to be negligible by comparison with
random error.

2. Improve the completeness of the sample so that option 1 holds. This
can be done by (a) increasing the maximum photo-z threshold or (b) changing the apparent
magnitude lower limit on candidate hosts, which removes the dimmest satellites from
contention. The trouble with (a) is the issue of too much noise. The trouble
with (b) is lack of statistics. Both result in an expansion of error bars.

3. Adjust our PDF to take this effect into account, which may ultimately be the
best option. This essentially introduces a second random variable, η (the number of satellites
missed due to photo-z errors), which is uncorrelated with the measured signal (that
is, we are no more or less likely to lose a satellite from a galaxy with 3 than with 2
satellites). We must make certain informed assumptions about the PDF of η. The
simplest case ignores the possibility of losing two or more satellites from a single
galaxy. Given a completeness of 85%, the PDF would be 0.85 for η = 0, and 0.15
for η = 1, and uniformly zero everywhere else. Our bias-adjusted results would be

simply the convolution of the PDF from Section 7 with p(η).
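The simplest case described above can be sketched numerically. This is an illustration of the proposed adjustment only, applied to a made-up input PDF rather than to the results of Section 7; the function name and the final renormalization are assumptions.

```python
import numpy as np

def correct_for_missed_satellites(p_measured, completeness=0.85):
    """Convolve the measured satellite PDF with p(eta), eta in {0, 1}."""
    # p(eta = 0) = completeness, p(eta = 1) = 1 - completeness, zero elsewhere.
    p_eta = np.array([completeness, 1.0 - completeness])
    p_corrected = np.convolve(p_measured, p_eta)
    return p_corrected / p_corrected.sum()   # guard against rounding drift

# Illustrative measured PDF for 0, 1 and 2 satellites (not a result of this study).
p_measured = np.array([0.6, 0.25, 0.15])
p_adjusted = correct_for_missed_satellites(p_measured)
```

The adjustment shifts probability toward higher satellite counts and lengthens the PDF by one bin, since a host measured with two satellites may in fact have three.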

References

[1] van den Bergh, Sidney. "The Local Group of Galaxies." Astronomy and Astrophysics Review (1999) 9: 273-318.

[2] Schlegel, David. "Princeton/MIT SDSS Spectroscopy Home Page." http://spectro.princeton.edu/

[3] Michael’s paper.
