AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy

ASTROINFORMATICS 2015 | DUBROVNIK, CROATIA | OCTOBER 7TH, 2015. Large Sky Surveys: Entering the Era of Software-Bound Astronomy. Mario Juric, WRF Data Science Chair in Astronomy, University of Washington; LSST Data Management Project Scientist. ASTROINFORMATICS 2015, Dubrovnik, Croatia, October 7th, 2015

Upload: mario-juric

Post on 14-Apr-2017


TRANSCRIPT

Page 1: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy

ASTROINFORMATICS 2015 | DUBROVNIK, CROATIA | OCTOBER 7TH, 2015

Large Sky Surveys: Entering the Era of Software-Bound Astronomy

Mario Juric
WRF Data Science Chair in Astronomy, University of Washington
LSST Data Management Project Scientist

ASTROINFORMATICS 2015
Dubrovnik, Croatia, October 7th, 2015

Page 2: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Page 3: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy
Page 4: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy

• Large Surveys and Why They’re Different

Page 5: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy

Hipparchus of Rhodes (180-125 BC)

Discovered the precession of the equinoxes.

Measured the length of the year to ~6 minutes.

In 129 BC, constructed one of the first star catalogs, containing about 850 stars.

n.b.: also the one to blame for the magnitude system …

Page 6: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy

Galileo Galilei (1564-1642)

Researched a variety of topics in physics, but called out here for the introduction of the Galilean telescope.

Galileo’s telescope allowed us for the first time to zoom in on the cosmos, and study the individual objects in great detail.

Page 7: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy

The Astrophysics Two-Step

• Surveys
– Construct catalogs and maps of objects in the sky. Focus on coarse classification and discovering targets for further follow-up.

• Large telescopes
– Acquire detailed observations of a few representative objects. Understand the details of astrophysical processes that govern them, and extrapolate that understanding to the entire class.

Page 8: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy

Analogy: Google Search

Google’s index is a catalog of the Web. We use it to “zoom in” on individual entries to find out more.

Page 9: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy

But, it’s more than just a catalog of pointers – more and more, Google itself collects, processes, indexes, visualizes, and serves the actual information we need. More and more often, our “research” begins and ends with Google!

Page 10: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy

Entering the Era of Massive Sky Surveys

• There’s a close parallel with large surveys in astronomy, in scale, quality, and richness of the collected information
– Scale: We’re entering the era when we can image and catalog the entire sky
– Quality: Those catalogs will be as precise as the measurements taken with “pointed” observations (used to be ~5-10x worse)
– Richness: Those catalogs contain not only positions and magnitudes, but also shapes, profiles, and temporal behavior of the objects.

• Quite often, the research can begin and end with the survey.
• This is what makes large surveys of today not just bigger, but better; different. They’re much more than just “finding charts”.

Page 11: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy

Sloan Digital Sky Survey

2.5m telescope >10000 deg2 0.1” astrometry r<22.5 flux limit

5 band, 2% photometry for >50M stars
>300k R=2000 stellar spectra

10 years of ops: ~10 TB of imaging

Page 12: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy

SDSS DR6 Imaging Sky Coverage (Adelman-McCarthy et al. 2008)

Panoramic Survey Telescope and Rapid Response System

1.8m telescope 30000 deg2 50mas astrometry r<23 flux limit

5 band, better than 1% photometry (goal)

~700 GB/night

Page 13: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


LSST: A Deep, Wide, Fast, Optical Sky Survey

8.4m telescope 18000+ deg2 10mas astrom. r<24.5 (<27.5@10yr)

ugrizy 0.5-1% photometry

3.2Gpix camera 30sec exp/4sec rd 15TB/night 37 B objects

Imaging the visible sky, once every 3 days, for 10 years (825 revisits)
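To put the three surveys' quoted data rates side by side, here is a small arithmetic sketch. It uses only the figures from the slides (SDSS ~10 TB over 10 years, Pan-STARRS ~700 GB/night, LSST 15 TB/night); the variable and function names are illustrative, not from any survey software.

```python
# Data volumes as quoted on the survey slides (all in terabytes).
SDSS_TOTAL_TB = 10.0            # ~10 TB of imaging over 10 years of operations
PANSTARRS_TB_PER_NIGHT = 0.7    # ~700 GB/night
LSST_TB_PER_NIGHT = 15.0        # 15 TB/night


def ratio(numerator: float, denominator: float) -> float:
    """Plain ratio, kept as a function so the intent of each line is explicit."""
    return numerator / denominator


# How many times more data LSST takes per night than Pan-STARRS.
lsst_vs_panstarrs = ratio(LSST_TB_PER_NIGHT, PANSTARRS_TB_PER_NIGHT)

# Fraction of one LSST night needed to match the whole SDSS imaging data set.
lsst_nights_to_match_sdss = ratio(SDSS_TOTAL_TB, LSST_TB_PER_NIGHT)

print(f"LSST / Pan-STARRS nightly volume: {lsst_vs_panstarrs:.1f}x")
print(f"LSST nights to equal SDSS imaging: {lsst_nights_to_match_sdss:.2f}")
```

On these numbers, LSST would gather the equivalent of the entire SDSS imaging data set in less than one night.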

Page 14: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Turning the Sky into a Database

− A wide (half the sky), deep (24.5/27.5 mag), fast (image the sky once every 3 days) survey telescope. Beginning in 2022, it will repeatedly image the sky for 10 years.

− The LSST will be an automated survey system. In nighttime, the observatory and the data system will operate with minimum human intervention.

− The ultimate deliverable of LSST is not the telescope, nor the instruments; it is the fully reduced data.
• All science will come from survey catalogs and images

Telescope Images Catalogs

Page 15: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


History

1996-2000 “Dark Matter Telescope”: This project began as a quest to understand cosmology and the Solar System.

2000 - … “LSST”: Emphasizes a broad range of science from the same multi-wavelength survey data, including unique time domain exploration

We’ve reached the scale where a single telescope, a single general purpose data set, can serve to answer a wide swath of science questions

The evolution of LSST design

LSST: Evolution of Design and Purpose

Page 16: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Frontiers of Survey Astronomy (1/4)

− Time domain science
• Nova, supernova, GRBs
• Source characterization
• Instantaneous discovery

− Census of the Solar System
• NEOs, MBAs, Comets
• KBOs, Oort Cloud

− Mapping the Milky Way
• Tidal streams
• Galactic structure

− Dark energy and dark matter
• Strong Lensing
• Weak Lensing
• Constraining the nature of dark energy

Page 17: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Figure panels: Exposure 1; Exposure 2; Exposure 1 − Exposure 2

Frontiers of Survey Astronomy (2/4)

− Time domain science
• Nova, supernova, GRBs
• Source characterization
• Instantaneous discovery

− Census of the Solar System
• NEOs, MBAs, Comets
• KBOs, Oort Cloud

− Mapping the Milky Way
• Tidal streams
• Galactic structure

− Dark energy and dark matter
• Strong Lensing
• Weak Lensing
• Constraining the nature of dark energy

Page 18: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Frontiers of Survey Astronomy (3/4)

− Time domain science
• Nova, supernova, GRBs
• Source characterization
• Instantaneous discovery

− Census of the Solar System
• NEOs, MBAs, Comets
• KBOs, Oort Cloud

− Mapping the Milky Way
• Tidal streams
• Galactic structure

− Dark energy and dark matter
• Strong Lensing
• Weak Lensing
• Constraining the nature of dark energy

Figure labels: RR Lyrae limit; MS stars limit; Dwarf Galaxies

Page 19: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Frontiers of Survey Astronomy (4/4)

− Time domain science
• Nova, supernova, GRBs
• Source characterization
• Instantaneous discovery

− Census of the Solar System
• NEOs, MBAs, Comets
• KBOs, Oort Cloud

− Mapping the Milky Way
• Tidal streams
• Galactic structure

− Dark energy and dark matter
• Strong lensing
• Weak lensing
• Constraining the nature of dark energy

Page 20: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Location: Cerro Pachón, Chile

Leveling of El Peñón (the summit of Cerro Pachón)

Page 21: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


LSST Site (April 14th, 2015)

Page 22: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


LSST Observatory (circa late 2018)

Page 23: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Done!

Page 24: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


LSST Camera

Parameter    Value
Diameter     1.65 m
Length       3.7 m
Weight       3000 kg
F.P. Diam    634 mm

1.65 m (5’-5”)

– 3.2 Gigapixels
– 0.2 arcsec pixels
– 9.6 square degree FOV
– 2 second readout
– 6 filters
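The pixel count, pixel scale, and field of view quoted for the camera can be cross-checked against each other. The sketch below tiles the 9.6 square degree field with 0.2 arcsec pixels, assuming square pixels and ignoring chip gaps and any detector area outside the quoted field; the function name is illustrative.

```python
ARCSEC_PER_DEG = 3600.0


def pixels_in_field(fov_deg2: float, pixel_scale_arcsec: float) -> float:
    """Number of square pixels of the given scale needed to tile a field of view."""
    fov_arcsec2 = fov_deg2 * ARCSEC_PER_DEG ** 2   # square degrees -> square arcsec
    return fov_arcsec2 / pixel_scale_arcsec ** 2   # divide by area of one pixel


n_pix = pixels_in_field(fov_deg2=9.6, pixel_scale_arcsec=0.2)
print(f"{n_pix / 1e9:.2f} Gpix")  # close to the quoted 3.2 Gigapixels
```

The result, about 3.1 Gpix, agrees with the quoted 3.2 Gigapixels to within a few percent.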

Page 25: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


LSST’s #1 Challenge:

Processing the Data in a way that Enables Science Effectively

Page 26: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


From Data to Knowledge

Model inference – Data (and metadata!)

Model inference – Catalog – Data Processing – Data

Diagram labels: Project; Scientists

Computationally (and cognitively) expensive, science-case specific

Computationally cheaper, easier to understand, science-case specific

• Computationally expensive, general
• Reprojection; may or may not involve compression
• Almost always introduces some information loss
• Data Processing == Instrumental Calibration + Measurement

Page 27: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Guiding Principles for LSST Data Products

− There are virtually infinite options on what quantities (features) one can measure on images. But if catalog generation is understood as a (generalized) cost reduction tool, the guiding principles become easier to define:

1. Maximize science enabled by the catalogs
- Working with images takes time and resources; a large fraction of LSST science cases should be enabled by just the catalog.
- Be considerate to the user: provide even sub-optimal measurements if they will enable leveraging of existing experience and tools

2. Minimize information loss
- Provide (as much as possible) estimates of likelihood surfaces, not just single point estimators

3. Provide and document the transformation (the software)
- Measurements are becoming increasingly complex and systematics limited; need to be maximally transparent about how they’re done

Page 28: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


What LSST will Deliver: A Data Stream, a Database, and a (small) Cloud

− A stream of ~10 million time-domain events per night, detected and transmitted to event distribution networks within 60 seconds of observation.

− A catalog of orbits for ~6 million bodies in the Solar System.

− A catalog of ~37 billion objects (20B galaxies, 17B stars), ~7 trillion single-epoch detections (“sources”), and ~30 trillion forced sources, produced annually, accessible through online databases.

− Deep co-added images.

− Services and computing resources at the Data Access Centers to enable user-specified custom processing and analysis.

− Software and APIs enabling development of analysis codes.

Diagram labels: Level 1, Level 2, Level 3
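The catalog sizes quoted above imply rough per-object row counts. The sketch below divides the quoted totals; the constant and function names are illustrative. The forced-source ratio comes out near 800 per object, the same order as the 825 revisits quoted on the LSST overview slide. That is what one would expect if a forced measurement is made for each object on each visit, though the slides do not spell this out.

```python
# Catalog sizes as quoted on the slide.
N_GALAXIES = 20_000_000_000
N_STARS = 17_000_000_000
N_OBJECTS = 37_000_000_000
N_SOURCES = 7_000_000_000_000          # single-epoch detections
N_FORCED_SOURCES = 30_000_000_000_000  # forced sources


def rows_per_object(n_rows: int, n_objects: int = N_OBJECTS) -> float:
    """Average number of rows in a per-epoch table for each catalogued object."""
    return n_rows / n_objects


sources_per_object = rows_per_object(N_SOURCES)
forced_per_object = rows_per_object(N_FORCED_SOURCES)

print(f"galaxies + stars = {N_GALAXIES + N_STARS:,} objects")
print(f"single-epoch detections per object: ~{sources_per_object:.0f}")
print(f"forced sources per object:          ~{forced_per_object:.0f}")
```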

Page 29: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Level 1: Transient Alerts

− LSST will generate ~10k time-domain events per observation (average)
• Includes anything that changed with respect to the deep template (variables, explosive transients, asteroids, etc.)

− Planning to measure and transmit with each alert:
• position
• PSF flux, aperture flux(es), trailed model fits
• shape (adaptive moments)
• light curves in all bands (up to a ~year; stretch: all)
• variability characterization (e.g., low-order light-curve moments, probability the object is variable; Richards et al. 2011)
• cut-outs centered on the object (template, difference image)
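As a way of picturing the planned alert contents, the list above can be written down as a record type. This is only a sketch: every field name and type below is hypothetical, chosen to mirror the bullet list, and does not reflect the actual LSST alert schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Cutout = List[List[float]]                       # small 2-D pixel array
LightCurve = List[Tuple[float, float]]           # (time, flux) pairs


@dataclass
class Alert:
    """Hypothetical container mirroring the per-alert measurements on the slide."""
    ra_deg: float                                # position
    dec_deg: float
    psf_flux: float                              # PSF flux
    aperture_fluxes: List[float]                 # aperture flux(es)
    trailed_fit: Optional[Dict[str, float]]      # trailed model fit, if any
    adaptive_moments: Tuple[float, float, float] # shape: (Ixx, Iyy, Ixy)
    light_curves: Dict[str, LightCurve] = field(default_factory=dict)  # per band
    variability: Dict[str, float] = field(default_factory=dict)        # summary stats
    template_cutout: Cutout = field(default_factory=list)
    difference_cutout: Cutout = field(default_factory=list)


def observed_bands(alert: Alert) -> List[str]:
    """Bands that have at least one light-curve point, in sorted order."""
    return sorted(band for band, curve in alert.light_curves.items() if curve)


example = Alert(
    ra_deg=150.0, dec_deg=-30.0, psf_flux=12.3, aperture_fluxes=[11.9, 12.8],
    trailed_fit=None, adaptive_moments=(1.2, 1.1, 0.05),
    light_curves={"r": [(0.0, 12.0), (3.0, 12.3)], "g": [], "i": [(1.0, 14.1)]},
    variability={"prob_variable": 0.9},
)
print(observed_bands(example))
```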

Page 30: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Level 2: Data Release Catalogs

− Object characterization (models):
• Moving Point Source model
• Double Sérsic model (bulge+disk)
- Maximum likelihood peak
- Samples of the posterior (hundreds)

− Object characterization (non-parametric):
• Centroid: (α, δ), per band
• Adaptive moments and ellipticity measures (per band)
• Aperture fluxes and Petrosian and Kron fluxes and radii (per band)

− Colors:
• Seeing-independent measure of object color

− Variability statistics:
• Period, low-order light-curve moments, etc.

LSST Science Book, Fig. 9.3
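To make the "moments and ellipticity" entries concrete, here is a minimal sketch of how second moments of a postage-stamp image map to an ellipticity. It is a simplification: the moments here are unweighted, whereas adaptive moments use a weight function that is iteratively matched to the object. The function names and the toy image are illustrative only.

```python
from typing import List, Tuple

Image = List[List[float]]  # image[y][x] pixel values


def second_moments(image: Image) -> Tuple[float, float, float, float, float]:
    """Flux-weighted centroid (xc, yc) and central second moments (Ixx, Iyy, Ixy)."""
    total = sum(sum(row) for row in image)
    xc = sum(x * v for row in image for x, v in enumerate(row)) / total
    yc = sum(y * v for y, row in enumerate(image) for v in row) / total
    ixx = iyy = ixy = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            dx, dy = x - xc, y - yc
            ixx += v * dx * dx
            iyy += v * dy * dy
            ixy += v * dx * dy
    return xc, yc, ixx / total, iyy / total, ixy / total


def ellipticity(ixx: float, iyy: float, ixy: float) -> Tuple[float, float]:
    """Two-component ellipticity (e1, e2) built from the second moments."""
    size = ixx + iyy
    return (ixx - iyy) / size, 2.0 * ixy / size


# Toy object: a single row of flux, i.e. maximally elongated along x.
stamp = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 3.0, 2.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
xc, yc, ixx, iyy, ixy = second_moments(stamp)
e1, e2 = ellipticity(ixx, iyy, ixy)
print(xc, yc, e1, e2)
```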

Page 31: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


The DPDD Captures the Details of this Thinking

LSST Data Products Definition Document

A document giving a high-level description of LSST data products.

http://ls.st/dpdd

Level 1 Data Products: Section 4.

Level 2 Data Products: Section 5.

Level 3 Data Products: Section 6.

Special Programs DPs: Section 7.

Questions: http://community.lsst.org
Change Requests: https://github.com/lsst/data_products

Page 32: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Looking Ahead: Astronomical Data Processing in the 2020s

(or why software is important)

Page 33: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Astronomy was hardware-limited

− Astronomical data processing software is a lesson in (intelligent) cutting of corners. We (generally) didn’t need particularly sophisticated codes so far.

− We were lucky because typically
• a) there was no data,
• b) the data was poor, or
• c) there wasn’t enough computing power available to process it right.

− Very frequently, all of the above.

− Simple algorithms were sufficient. The challenge was in the hardware.

Page 34: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Astronomy is becoming software-bound

− Things are changing at a rapid pace.

− We can collect amazing quantities of data.
− Telescopes and cameras built today are extremely well characterized and calibrated instruments.
− Computing has reached a level where more complex algorithms are feasible. It’s also getting cheap.
• Also, lagging memory bandwidths are forcing us towards more complex algorithms.

− The science requires it. The alternatives (“build a bigger telescope!”) are incredibly costly.

− Time to develop software is becoming >= time to build (astronomical) hardware

Page 35: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


LSST Science Pipelines

− 02C.01.02.01/02. Data Quality Assessment Pipelines
− 02C.01.[02.01.04,04.01,04.02] Calibration Pipelines
− 02C.03.01. Single-Frame Processing Pipeline
− 02C.03.02. Association pipeline
− 02C.03.03. Alert Generation Pipeline
− 02C.03.04. Image Differencing Pipeline
− 02C.03.06. Moving Object Pipeline
− 02C.04.03. PSF Estimation Pipeline
− 02C.04.04. Image Coaddition Pipeline
− 02C.04.05. Deep Detection Pipeline
− 02C.04.06. Object Characterization Pipeline
− 02C.01.02.03. Level 3 Toolkit
− 02C.03.05/04.07 Core Libraries (afw)

Side labels: Level 1, Level 2, L3

Data Management Applications Design (LDM-151)

+ middleware, databases, user interfaces, computing+storage hardware, networking, etc…

Page 36: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Example: Multi-Epoch Measurement via Coadds

Diagram: Exposure 1, Exposure 2, Exposure 3 → Warp, Convolve → Coadd → Fitting → Galaxy / Star Models

Coadd Measurement

Hard, but we only have to do it once.

Easy; relatively few data points.
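The coadd step in the diagram can be sketched as a per-pixel weighted average. The toy below assumes the exposures have already been warped to a common pixel grid and matched to a common PSF (the "Warp, Convolve" step), and that each exposure has a single, known noise variance. The inverse-variance weighting is a common choice used here for illustration; it is not stated on the slide.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # image[y][x] pixel values


def inverse_variance_coadd(
    exposures: Sequence[Image], variances: Sequence[float]
) -> Tuple[Image, float]:
    """Per-pixel inverse-variance weighted mean of aligned exposures.

    Returns the coadded image and the (uniform) variance of a coadd pixel.
    """
    weights = [1.0 / v for v in variances]
    weight_sum = sum(weights)
    ny, nx = len(exposures[0]), len(exposures[0][0])
    coadd = [
        [
            sum(w * img[y][x] for w, img in zip(weights, exposures)) / weight_sum
            for x in range(nx)
        ]
        for y in range(ny)
    ]
    return coadd, 1.0 / weight_sum


# Two tiny aligned exposures with equal noise: the coadd is the plain mean,
# and the pixel variance is halved.
exp1 = [[1.0, 2.0], [3.0, 4.0]]
exp2 = [[3.0, 4.0], [5.0, 6.0]]
coadd, coadd_var = inverse_variance_coadd([exp1, exp2], [1.0, 1.0])
print(coadd, coadd_var)
```

Model fitting then runs on the single coadded image, which is why it involves relatively few data points.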

Page 37: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Except that you should not do that…

− Impossible to combine multi-epoch data taken in different seeings/exposure times without information loss
• Suboptimal S/N

− Warping and resampling correlates pixel values and noise; correlation matrices are (practically) impossible to carry forward.
• Source of systematic error

− Detector effects have to be taken out at the pixel level
• Further correlates the noise

− The effective bandpass changes from exposure to exposure.
• Coadding different sorts of apples…

− Stars move!

Page 38: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Multi-Epoch Measurement with Forward Modeling

Diagram: Galaxy / Star Models → Warp, Convolve → Transformed Model 1, Transformed Model 2, Transformed Model 3 → Fitting against Exposure 1, Exposure 2, Exposure 3

MultiFit (Simultaneous Multi-Epoch Fitting)

Easier (depends on model), but we have to do it every iteration!

Same number of parameters, but with orders of magnitude more data points.
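A toy version of simultaneous multi-epoch fitting is sketched below. It is not the LSST MultiFit implementation. The assumptions: a 1-D point source at a known position, a Gaussian PSF whose width differs per epoch, uniform Gaussian noise per epoch, and flux as the only free parameter. The model is transformed to each epoch's seeing and compared with the pixels of every exposure at once; for a linear parameter the maximum-likelihood estimate has a closed form.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class Epoch(NamedTuple):
    xs: List[float]       # pixel coordinates
    pixels: List[float]   # measured pixel values
    sigma: float          # PSF width (seeing) for this epoch
    variance: float       # per-pixel noise variance for this epoch


def gaussian_psf(xs: Sequence[float], x0: float, sigma: float) -> List[float]:
    """Unit-flux Gaussian PSF sampled on the pixel grid."""
    vals = [math.exp(-0.5 * ((x - x0) / sigma) ** 2) for x in xs]
    norm = sum(vals)
    return [v / norm for v in vals]


def multifit_flux(epochs: Sequence[Epoch], x0: float) -> Tuple[float, float]:
    """Maximum-likelihood flux, and its variance, using all epochs' pixels at once."""
    num = den = 0.0
    for ep in epochs:
        model = gaussian_psf(ep.xs, x0, ep.sigma)  # model transformed to this epoch
        for d, m in zip(ep.pixels, model):
            num += d * m / ep.variance
            den += m * m / ep.variance
    return num / den, 1.0 / den


# Noiseless synthetic data: the same source seen in three different seeings.
TRUE_FLUX = 100.0
grid = [float(x) for x in range(-10, 11)]
epochs = [
    Epoch(grid, [TRUE_FLUX * p for p in gaussian_psf(grid, 0.0, s)], s, var)
    for s, var in [(1.0, 4.0), (2.0, 1.0), (3.0, 9.0)]
]
flux, flux_var = multifit_flux(epochs, x0=0.0)
print(flux, flux_var)
```

The number of fitted parameters is the same as in a coadd fit (one), but the sums run over every pixel of every epoch, which is the cost the slide points to.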

Page 39: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Detecting and Estimating Proper Motions Below the Single-Epoch Flux Limit

Optimal measurement of properties of objects imaged in multiple epochs. Left: extraction of a moving point source (Lang 2009).

Individual exposures: objects are undetected or marginally detected

Moving point-source and galaxy models are indistinguishable on the coadd

Page 40: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Example: Sampling and retaining the Likelihoods

Perform importance sampling from a proposal distribution determined on the coadd. Plan to characterize (and keep!) the full posterior for each object. (Unexplored) possibilities for compression.
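A minimal sketch of importance sampling in this spirit: draw samples from a proposal (here a Gaussian standing in for the coadd-based estimate), weight each by the ratio of the target posterior to the proposal density, and keep the weighted samples as the retained description of the posterior. The one-parameter toy posterior, the numbers, and all names are assumptions for illustration only.

```python
import math
import random
from typing import Callable, List, Tuple


def log_gauss(x: float, mu: float, sigma: float) -> float:
    """Log-density of a normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def importance_sample(
    log_post: Callable[[float], float],
    prop_mu: float,
    prop_sigma: float,
    n: int,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Draw n samples from a Gaussian proposal; return them with normalized weights."""
    rng = random.Random(seed)
    samples = [rng.gauss(prop_mu, prop_sigma) for _ in range(n)]
    log_w = [log_post(x) - log_gauss(x, prop_mu, prop_sigma) for x in samples]
    shift = max(log_w)                       # subtract the max for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return samples, [wi / total for wi in w]


def effective_sample_size(weights: List[float]) -> float:
    """Kish effective sample size for normalized weights."""
    return 1.0 / sum(w * w for w in weights)


# Toy posterior: flat prior times Gaussian likelihoods from three epochs,
# each given as (measured flux, uncertainty).
MEASUREMENTS = [(10.2, 1.0), (9.6, 0.8), (10.5, 1.5)]


def log_posterior(flux: float) -> float:
    return sum(log_gauss(m, flux, s) for m, s in MEASUREMENTS)


# Proposal: centered on a hypothetical coadd estimate, broader than the posterior.
samples, weights = importance_sample(log_posterior, prop_mu=10.0, prop_sigma=1.0, n=2000)
post_mean = sum(x * w for x, w in zip(samples, weights))
ess = effective_sample_size(weights)
print(f"posterior mean ~ {post_mean:.2f}, effective samples ~ {ess:.0f}")
```

For this toy case the exact posterior mean is the inverse-variance weighted mean of the three measurements (about 9.93), which gives a direct check on the weighted samples.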

Page 41: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Looking Ahead: Astrophysical Inference in the 2020s

(or why software is even more important than you think)

Page 42: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Pushing the Boundaries of Optimal Inference

− As our measurements become more and more systematics limited, what occurs in the “Data Processing” box above becomes incredibly important.

Model inference – Data

Model inference – Catalog Data Processing – Data

− Sometimes, an assumption or an algorithmic choice that’s been made there may introduce a systematic that drowns out the signal (or eliminates it).

− For optimal inference, one wants to design measurements that directly probe the relevant aspects of the original (imaging) data, and not the (lossy-compressed) catalog.

• Or derive more appropriate catalogs/feature sets/etc.

Page 43: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


− Reasons we don’t do this today:
1. Computationally (and I/O) intensive
2. Conceptually difficult
- Expertise in statistics, applied math, and software engineering is often not there
- Catalogs are too often taken as a “God given”, fundamental result of a survey

− Things are changing
• Big data problems are becoming increasingly computationally tractable
• The average astronomer in the 2020s will grow up with an expectation of being well versed in Stats, SE, Appl. Math.
• A concerted effort is under way, primarily driven by people in large survey and telescope projects, to create the necessary software to make this possible.

Model inference – Data

Model inference – Catalog Data Processing – Data

Pushing the Boundaries of Optimal Inference

Page 44: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Astronomy 2025: Personalized Medicine

− The LSST “Level 3” concept is our first step in that direction: Enabling the community to create new products using LSST’s software, services, or computing resources. This means:
• Providing the software primitives to construct custom measurement/inference codes
• Enabling the users to run those codes at the LSST data center, leveraging the investment in I/O (piggyback onto LSST’s data trains).

− Looking ahead: Right now, we see the data releases as the key product of a survey. By the end of LSST, I wouldn’t be surprised if we saw the software as the key product, with hundreds of specialized (and likely ephemeral) catalogs being generated by it.

− LSST “data releases” will just be some of those catalogs, designed to be more broadly useful than others, and retained for a longer period of time.

− LSST software and hardware are being engineered to make this possible.

Page 45: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy


Astronomy in the Age of Large Surveys

− Traditionally, astronomy was a data-starved science. Our methods and approach to research were shaped by this environment. Surveys are altering it; data is becoming abundant, well characterized, and rich.

− LSST is a poster-child of this transformation: it will deliver the positions, magnitudes and variability information for virtually everything in the southern sky to 24th-27th magnitude.

− In that environment, success in research will depend on the ability to mine knowledge from that existing data. A bigger instrument is not a cost-effective solution any more. A Stage V DETF experiment may be a software project.

− This will re-ignite the drive behind VO-like ideas, though emphasis will shift to algorithms, tools, and reusable, collaborative, organically grown software frameworks. Projects like AstroPy and LSST are already pushing on this.

− Need is the mother of invention, and she’s here.

Page 46: AstroInformatics 2015: Large Sky Surveys: Entering the Era of Software-Bound Astronomy
