TRANSCRIPT
IMPROVED LINEAR ALGEBRA METHODS FOR
REDSHIFT COMPUTATION FROM LIMITED
SPECTRUM DATA
Miranda Braselton, Kelley Cartwright, Michael Hurley, Maheen Khan, Miguel Angel Rodriguez, David Shao, Jason Smith, Jimmy Ying, Genti Zaimi
December 8, 2006
MOTIVATION
Drs. Way and Srivastava (2006) compared different statistical models to approximate the redshifts of galaxies.
Computational limitations prevented them from using one particular model, known as Gaussian process regression, for large data sets.
We will present several efficient methods that allow us to overcome these limitations.
GENTI ZAIMI CAMCOS REPORT DAY, DECEMBER 8, 2006
OUTLINE
I. The Problem:
   A. Photometric Redshift - Miranda Braselton
   B. Gaussian Process - Jason Smith
   C. Gaussian Elimination - Miguel Angel Rodriguez
   D. Polynomial Kernel - Kelley Cartwright
II. The Solution:
   A. Reduced Rank - Genti Zaimi
   B. Cholesky Update - Maheen Khan
   C. Conjugate Gradient - Jimmy Ying
   D. Gibbs Sampler - Michael Hurley
   E. Testing Results - David Shao
WHAT IS A REDSHIFT?
A redshift z is the change in wavelength divided by the initial wavelength: z = (λ_observed − λ_emitted) / λ_emitted.
It indicates that an object is moving away from you.
The Doppler Effect: the light waves emitted by the moving object shift.
MIRANDA BRASELTON CAMCOS REPORT DAY, DECEMBER 8, 2006
WHY ARE THEY IMPORTANT?
Scientists can determine many characteristics of galaxies and our universe.
This information is useful for understanding the structure of the universe.
SPECTROSCOPY VS. PHOTOMETRY
Spectroscopy uses a diffraction grating to collect data from a few objects.
Photometry uses photometric filters to collect data from multiple objects.
SPECTROSCOPY VS. PHOTOMETRY
Spectroscopy is more accurate; photometry is faster and cheaper.
PHOTOMETRIC DATA
Photometric data collected from a galaxy: the area under each filter curve gives us 5 numbers (u, g, r, i, z) that are used in our calculations.
TRAINING SET METHOD
Scientists need to find a better mathematical model for the photometric data that will estimate the redshift.
Candidates include linear and quadratic regression, the Artificial Neural Network approach (ANNz), the ensemble model, and the Gaussian process.
REDSHIFT CALCULATIONS
Goal: Predict the redshift of a galaxy from a photometric measurement (u, g, r, i, z) based upon the known redshifts of other galaxies.
The statistical model developed for this project is based on the Gaussian process.
JASON SMITH CAMCOS REPORT DAY, DECEMBER 8, 2006
GAUSSIAN PROCESS
A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian (i.e. bell-shaped) distribution.
THE MODEL
The predicted photometric redshift is given by

ŷ = κᵀ(λ²I + K)⁻¹y

K and y come from a training set.
κ comes from a new photometric measurement.
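A minimal numerical sketch of this prediction formula, using toy data and an assumed quadratic polynomial kernel (the slides' actual kernel choices are discussed later; the data here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))   # 50 training galaxies, ugriz features (toy)
y = rng.standard_normal(50)        # their known redshifts (toy values)
x_new = rng.standard_normal(5)     # a new photometric measurement

lam = 0.5
K = (1.0 + X @ X.T) ** 2           # kernel matrix from the training set
kappa = (1.0 + X @ x_new) ** 2     # cross-kernel vector for the new point

# Solve (lambda^2 I + K) w = y, then predict y_hat = kappa^T w
w = np.linalg.solve(lam**2 * np.eye(len(y)) + K, y)
y_hat = kappa @ w
```

The expensive step is the linear solve; everything that follows in the talk is about doing that solve efficiently.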
OUR JOB
The major calculation is reduced to solving the following system:

(λ²I + K)w = y

with up to 180,000 equations and 180,000 unknowns.
MIGUEL ANGEL RODRIGUEZ CAMCOS REPORT DAY, DECEMBER 8, 2006
GAUSSIAN ELIMINATION
                   general     n = 1000    n ≈ 180,000
Operations count   (2/3)n³     7 × 10⁸     4 × 10¹⁵
Storage            n²          10⁶         3 × 10¹⁰
THE BAD NEWS
Out of memory!
at n ≈ 11, 000
THE CHALLENGE
Find efficient ways to solve linear systems for large n.
THE COVARIANCE MATRIX, K
K models the interdependencies between the data.
Its properties depend on a kernel function Σ of the training data:

K = Σ(u, g, r, i, z)

Different choices of Σ give different models.
KELLEY CARTWRIGHT CAMCOS REPORT DAY, DECEMBER 8, 2006
THE POLYNOMIAL KERNEL
K is low rank:

K = QQᵀ

where Q is n by 21.
Q can be obtained quickly from the training data.
The storage requirement for Q is far less than for K.
This special structure of K leads to fast linear solvers.
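The slides do not spell out the kernel here, but one polynomial kernel with exactly this structure is k(x, x') = (1 + x·x')², whose explicit feature map for 5 inputs (u, g, r, i, z) has exactly 1 + 5 + 5 + 10 = 21 monomial columns. A sketch under that assumption:

```python
import numpy as np
from itertools import combinations

def feature_map(X):
    """Explicit feature map of (1 + x.x')^2; for d = 5 it has 21 columns."""
    n, d = X.shape
    cols = [np.ones(n)]                                  # constant term
    cols += [np.sqrt(2) * X[:, i] for i in range(d)]     # linear terms
    cols += [X[:, i] ** 2 for i in range(d)]             # squared terms
    cols += [np.sqrt(2) * X[:, i] * X[:, j]              # cross terms
             for i, j in combinations(range(d), 2)]
    return np.column_stack(cols)

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
Q = feature_map(X)                   # n-by-21, cheap to form
K = (1.0 + X @ X.T) ** 2             # kernel matrix computed directly
assert np.allclose(Q @ Q.T, K)       # the low-rank factorization matches
```

Storing Q takes 21n numbers instead of n² for K, which is the point of the factorization.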
AND NOW
Our challenge is to solve
(λ²I + QQᵀ)w = y
quickly for large n.
BIG-O NOTATION
A procedure is O(n) if the number of operations required is proportional to the input size n.
Let n = 10⁶. Then:

         operations
O(n)     10⁶
O(n³)    10¹⁸
SHERMAN-MORRISON-WOODBURY FORMULA
(λ²I + QQᵀ)⁻¹ y = (1/λ²) [ y − Q ( (λ²I + QᵀQ)⁻¹ (Qᵀy) ) ]

where λ²I + QQᵀ is n × n, λ²I + QᵀQ is only 21 × 21, and Qᵀy is 21 × 1.
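A minimal sketch of the reduced-rank solve via this formula, on made-up data; only a 21 × 21 system is ever solved, so the whole computation is O(n):

```python
import numpy as np

def reduced_rank_solve(Q, y, lam):
    """Solve (lam^2 I + Q Q^T) w = y by Sherman-Morrison-Woodbury."""
    m = Q.shape[1]                             # 21
    small = lam**2 * np.eye(m) + Q.T @ Q       # 21 x 21 system
    return (y - Q @ np.linalg.solve(small, Q.T @ y)) / lam**2

rng = np.random.default_rng(2)
n = 2000
Q = rng.standard_normal((n, 21))
y = rng.standard_normal(n)
lam = 0.5

w = reduced_rank_solve(Q, y, lam)
# Check against the full n x n solve on this small example
w_full = np.linalg.solve(lam**2 * np.eye(n) + Q @ Q.T, y)
assert np.allclose(w, w_full)
```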
REDUCED RANK METHOD
ADVANTAGES
This method is O(n): VERY FAST.
Storage requirement is O(n).
DISADVANTAGE
Numerical instability
ITERATIVE REFINEMENT
Iterative refinement is a method for improving the accuracy of an approximate solution to a linear system.
Size    Before        After
50      1.7 × 10⁻⁶    9.1 × 10⁻⁹
100     1.2 × 10⁻⁵    2.2 × 10⁻⁸
500     1.1 × 10⁻⁴    2.8 × 10⁻⁷
1000    3.5 × 10⁻⁴    8.4 × 10⁻⁷
5000    4.2 × 10⁻³    1.3 × 10⁻⁵
TABLE: Error before and after iterative refinement
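The refinement loop itself is short: solve once, then repeatedly solve for a correction against the exact residual. A sketch with a stand-in low-precision solver (a float32 inverse) playing the role of the fast-but-unstable method:

```python
import numpy as np

def refine(A, y, solve, steps=2):
    """Improve an approximate solution to A w = y by iterative refinement."""
    w = solve(y)
    for _ in range(steps):
        r = y - A @ w       # residual of the current solution (full precision)
        w = w + solve(r)    # correct w by approximately solving A dw = r
    return w

rng = np.random.default_rng(3)
n = 200
M = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)               # well-conditioned SPD test matrix
y = rng.standard_normal(n)

A_inv32 = np.linalg.inv(A.astype(np.float32)).astype(np.float64)
solve = lambda b: A_inv32 @ b             # cheap, low-precision inner solver

w0 = solve(y)                             # before refinement
w = refine(A, y, solve)                   # after refinement
err0 = np.linalg.norm(A @ w0 - y)
err = np.linalg.norm(A @ w - y)
assert err < err0                         # refinement reduced the residual
```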
SPEED COMPARISON
Method                  n = 1000    n = 10,000    n = 180,000
Gaussian Elimination    0.1510      NA            NA
Reduced Rank w/
Iterative Refinement    0.0035      0.0047        0.1173
TABLE: Running time (in seconds) for finding w
CHOLESKY RANK-ONE UPDATE
Find L₁, D₁ so that LDLᵀ + qqᵀ = L L₁ D₁ L₁ᵀ Lᵀ, where L₁ is unit lower triangular with entries pᵢβⱼ below the diagonal:

L₁ = [ 1
       p₂β₁   1
       p₃β₁   p₃β₂   1
       ⋮      ⋮       ⋱
       pₙβ₁   pₙβ₂   ⋯   pₙβₙ₋₁   1 ]
Storage: only the p and β vectors and the diagonal entries of D₁ are stored, which is O(n).
Operations: finding p, β, and D₁ is O(n).
MAHEEN KHAN CAMCOS REPORT DAY, DECEMBER 8, 2006
CHOLESKY RANK-ONE UPDATE FOR
A = λ²I + QQᵀ = λ²I + Σᵢ₌₁²¹ qᵢqᵢᵀ

λ²I + q₁q₁ᵀ = L₁D₁L₁ᵀ
CHOLESKY RANK-ONE UPDATE FOR
A = λ²I + QQᵀ = λ²I + Σᵢ₌₁²¹ qᵢqᵢᵀ

λ²I + q₁q₁ᵀ + q₂q₂ᵀ = L₁L₂D₂L₂ᵀL₁ᵀ
CHOLESKY RANK-ONE UPDATE FOR
A = λ²I + QQᵀ = λ²I + Σᵢ₌₁²¹ qᵢqᵢᵀ

λ²I + q₁q₁ᵀ + q₂q₂ᵀ + q₃q₃ᵀ = L₁L₂L₃D₃L₃ᵀL₂ᵀL₁ᵀ

How many factors will we need?
CHOLESKY RANK-ONE UPDATE FOR
A = λ²I + QQᵀ = λ²I + Σᵢ₌₁²¹ qᵢqᵢᵀ

λ²I + q₁q₁ᵀ + q₂q₂ᵀ + … + q₂₁q₂₁ᵀ = L₁L₂⋯L₂₁D₂₁L₂₁ᵀ⋯L₂ᵀL₁ᵀ

Just 21!
FORWARD AND BACKWARD SOLVERS
Solving (L₁L₂⋯L₂₁D₂₁L₂₁ᵀ⋯L₂ᵀL₁ᵀ)w = y reduces to a sequence of triangular solves. For a factor of the form

[ 1                              ] [ w₁ ]   [ y₁ ]
[ p₂β₁   1                       ] [ w₂ ]   [ y₂ ]
[ p₃β₁   p₃β₂   1                ] [ w₃ ] = [ y₃ ]
[ ⋮      ⋮       ⋱               ] [ ⋮  ]   [ ⋮  ]
[ pₙβ₁   pₙβ₂   ⋯  pₙβₙ₋₁   1   ] [ wₙ ]   [ yₙ ]

the forward solve is

wⱼ = yⱼ − pⱼ(β₁w₁ + β₂w₂ + … + βⱼ₋₁wⱼ₋₁)
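Because the parenthesized sum only grows by one term per step, the whole forward solve runs in O(n) with a single running sum. A sketch with made-up p, β, y vectors; the explicit L₁ build at the end is only there to check the answer:

```python
import numpy as np

def forward_solve(p, beta, y):
    """Solve L1 w = y where L1 is unit lower triangular with entries p_i * beta_j."""
    w = np.empty_like(y)
    s = 0.0                         # running sum of beta_i * w_i for i < j
    for j in range(len(y)):
        w[j] = y[j] - p[j] * s      # w_j = y_j - p_j (beta_1 w_1 + ... + beta_{j-1} w_{j-1})
        s += beta[j] * w[j]
    return w

rng = np.random.default_rng(4)
n = 300
p = 0.1 * rng.standard_normal(n)    # small entries keep the toy example stable
beta = 0.1 * rng.standard_normal(n)
y = rng.standard_normal(n)

w = forward_solve(p, beta, y)
# Check against explicitly building L1 (O(n^2) storage, only for testing)
L1 = np.eye(n) + np.tril(np.outer(p, beta), k=-1)
assert np.allclose(L1 @ w, y)
```

The backward solves with the transposed factors work the same way, so each of the 21 factors costs O(n).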
CHOLESKY UPDATE METHOD
ADVANTAGES
This method is O(n): FAST.
Storage requirement is O(n).
Numerically more stable.
SPEED COMPARISON
Method                  n = 1000    n = 10,000    n = 180,000
Gaussian Elimination    0.1510      NA            NA
Reduced Rank w/
Iterative Refinement    0.0035      0.0047        0.1173
Cholesky Update         0.2066      0.4437        6.2529
TABLE: Running time (in seconds) for finding w
CONJUGATE GRADIENT METHOD
A semi-iterative method that works well with special (i.e. positive definite) linear systems.
In practice, it takes a small number of iterations to get an acceptable solution.
Each iteration involves a matrix-vector multiplication.
In general, matrix-vector multiplication requires O(n²) operations.
JIMMY YING CAMCOS REPORT DAY, DECEMBER 8, 2006
FAST MULTIPLICATION
The special form of A gives a fast matrix-vector multiplication, which requires O(n) operations:

Au = λ²u + Q(Qᵀu)
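A sketch of conjugate gradient using this identity, so the n × n matrix is never formed; the data are made up. Since λ²I + QQᵀ has only 22 distinct eigenvalue clusters, CG converges in very few iterations here:

```python
import numpy as np

def cg_solve(Q, y, lam, tol=1e-10, max_iter=200):
    """Conjugate gradient for (lam^2 I + Q Q^T) w = y with an O(n) matvec."""
    matvec = lambda u: lam**2 * u + Q @ (Q.T @ u)   # fast product, O(n)
    w = np.zeros_like(y)
    r = y - matvec(w)          # residual
    d = r.copy()               # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ad = matvec(d)
        alpha = rs / (d @ Ad)
        w += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return w

rng = np.random.default_rng(5)
n = 1000
Q = rng.standard_normal((n, 21))
y = rng.standard_normal(n)
lam = 1.0

w = cg_solve(Q, y, lam)
assert np.allclose(lam**2 * w + Q @ (Q.T @ w), y)   # residual is tiny
```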
CONJUGATE GRADIENT METHOD
ADVANTAGES
It is O(n): FAST.
Storage requirement is O(n).
It converges quickly to the actual solution.
DISADVANTAGE
Susceptible to rounding error for badly behaved matrices.
SPEED COMPARISON
Method                  n = 1000    n = 10,000    n = 180,000
Gaussian Elimination    0.1510      NA            NA
Reduced Rank w/
Iterative Refinement    0.0035      0.0047        0.1173
Cholesky Update         0.2066      0.4437        6.2529
Conjugate Gradient      0.2487      0.3058        10.4278
TABLE: Running time (in seconds) for finding w
THE MONTE CARLO IDEA
Predict the mean of a population using the average of a sample.
A random (unbiased) sample must be used.
The sample is generated from a simulated distribution of the population.
MICHAEL HURLEY CAMCOS REPORT DAY, DECEMBER 8, 2006
THE GIBBS SAMPLER
Solve special (i.e. symmetric positive definite) linear systems using the covariance matrix of a sample of vectors.
Vectors are generated from simulated normal distributions with means and standard deviations derived from the given linear system.
GIBBS SAMPLER METHOD
Works properly if

1 < (max eigenvalue)/(min eigenvalue) < 100

i.e. if the condition number of the matrix is below 100.
ADVANTAGE
Works well for the exponential kernel,the above ratio ≈ 50.
DISADVANTAGE
Works badly for the polynomial kernel, where the above ratio ≈ 10¹⁰.
HOW TO MEASURE A METHOD’S PERFORMANCE
Given separate training and testing datasets, each with ugriz and redshift values:
1. Fit the model using the training dataset.
2. Use the model to predict redshift values for the ugriz of the testing dataset.
3. Average the square of the errors, then take the square root to obtain the root mean square error (RMS).
4. Resample using the bootstrap 100 times.
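The four steps above can be sketched as follows; the "model" here is plain linear least squares on synthetic ugriz-like data, standing in for the actual regression and Gaussian process models being compared:

```python
import numpy as np

def rms(pred, truth):
    """Root mean square error (step 3)."""
    return np.sqrt(np.mean((pred - truth) ** 2))

rng = np.random.default_rng(6)
coef = rng.standard_normal(5)                       # hypothetical true relation
X_train = rng.standard_normal((500, 5))
z_train = X_train @ coef + 0.01 * rng.standard_normal(500)
X_test = rng.standard_normal((200, 5))
z_test = X_test @ coef + 0.01 * rng.standard_normal(200)

fit, *_ = np.linalg.lstsq(X_train, z_train, rcond=None)  # 1. fit on training set
pred = X_test @ fit                                      # 2. predict on test set
base_rms = rms(pred, z_test)                             # 3. RMS error

boot_rms = []                                            # 4. bootstrap 100 times
for _ in range(100):
    idx = rng.integers(0, len(z_test), len(z_test))      # resample with replacement
    boot_rms.append(rms(pred[idx], z_test[idx]))
boot_rms = np.array(boot_rms)
```

The spread of `boot_rms` is what the range-of-RMS plots later in the talk summarize.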
DAVID SHAO CAMCOS REPORT DAY, DECEMBER 8, 2006
QUADRATIC REGRESSION VS GAUSSIAN PROCESS
Quadratic regression: the simplest model, with ugriz and interactions; the baseline method for comparison.
Gaussian process: polynomial kernel, with three improved linear algebra methods: Reduced Rank, Cholesky Update, and Conjugate Gradient.
TIME FOR 100 BOOTSTRAP RMS CALCULATIONS
Training    QR            Reduced    Cholesky    Conjugate
Set Size    (baseline)    Rank       Update      Gradient
1000        10            10         12          14
5000        11            12         23          23
10000       14            15         38          40
50000       37            44         165         202
100000      68            83         387         578
180045      116           142        690         1055
TABLE: Time (in seconds) for 100 bootstrap RMS calculations
REDUCED RANK RMS AS TRAINING SIZE INCREASES
The behavior for Reduced Rank as training set size increases is the same as for Cholesky Update and Conjugate Gradient.
As training set size increases, the range of RMS decreases.
RMS FOR SMALL TRAINING SET SIZE 1000
Compare Quadratic Regression, Reduced Rank, Cholesky Update, and Conjugate Gradient for a small training set size.
RMS has a wide range, from 0.02 to 0.3.
RMS FOR LARGE TRAINING SET SIZE 180045
Compare Quadratic Regression, Cholesky Update, and Conjugate Gradient for a large training set size.
RMS has a narrow range, from 0.0240 to 0.0256.
Reduced Rank is not shown because of its larger RMS range.
CONCLUSIONS
We are able to solve large linear systems fast in a limited-memory environment.
The performance of the Gaussian process improves as the size of the training sample increases.
The Gaussian process performs better than quadratic regression when a small training set is used, BUT it does not when a large training set is used.
FUTURE DIRECTIONS
Explain the data.
Use the optimal value of λ.
Try the exponential kernel function.
ACKNOWLEDGEMENT
We would like to thank the Woodward Fund for the financial support, and the following people for their guidance:
Drs. Michael Way, Ashok Srivastava, Tim Lee, Paul Gazis (NASA scientists)
Dr. Tim Hsu (CAMCOS director)
Drs. Bem Cayco, Wasin So (faculty advisors)
Drs. Leslie Foster, Steve Crunk (SJSU faculty)
THANK YOU!
Any Questions?