Time Series Analysis for evenly spaced !! Kepler Photometry Eric Feigelson Penn State University Kepler Science Conference V March 2019


Post on 21-Apr-2020


Page 1

Time Series Analysis for evenly spaced !! Kepler Photometry

Eric Feigelson, Penn State University

Kepler Science Conference V March 2019

Page 2

Who am I?

Prof. of Astronomy & Astrophysics and of Statistics, Penn State

Lead author of award-winning 2012 graduate text: Modern Statistical Methods for Astronomy with R Applications

Co-organizer, Penn State Summer Schools in Astrostatistics (2005--)

Past-President, IAU Commission on Astroinformatics & Astrostatistics

Editor, Astrostatistics & Astroinformatics Portal

Statistical Scientific Editor, AAS Journals

A space astronomer who has been talking with statisticians for 30 years

Page 3

Outline

• Broad introduction to time series analysis for evenly spaced cadences (12 slides)

• A new approach to Kepler planet detection: Autoregressive Planet Search (16 slides, poster #11)

• An R tutorial illustrating some time series methods:
– Displaying Kepler lightcurves
– Time series diagnostics (normality, autocorrelation, stationarity)
– Smoothing & gap interpolation
– ARIMA modeling
– Periodicity search
– Wavelet analysis & denoising

The tutorial is available as a Jupyter notebook and an R script

Page 4

The Essential Message of Kepler from a Methodology Viewpoint

Since Tycho Brahe & Johannes Kepler, astronomers have had to interpret sparse, irregularly spaced time series with a limited suite of specialized statistical tools.

But Kepler provides a rich, regularly spaced time series. We can thus inherit a vast world of powerful analytic tools from the statistical field of time series analysis designed for signal processing & econometrics.

These tools are described in textbooks and have software implementations in Matlab and R.

Page 5

Useful books for Kepler researchers

The Analysis of Time Series: An Introduction C. Chatfield, 6th ed, 2004
Ubiquitous undergraduate text, brief & clear

Time Series Analysis: Forecasting and Control G. Box, G. Jenkins et al. 5th ed 2015
Famous text since 1970, foundation of our ARPS methodology

Forecasting: Principles and Practice R. Hyndman & G. Athanasopoulos, 2nd ed 2018
Excellent, author of important `forecast` R/CRAN package. Free ebook.

Introductory Time Series with R P. Cowpertwait & A. Metcalfe 2008
Brief presentation & cookbook, Springer Use R! series

Wavelet Methods in Statistics with R G. Nason
Brief presentation & cookbook, Springer Use R! series

Introduction to Nonparametric Regression K. Takezawa 2006
Broad coverage of methods for irregular cadences with R code

Time Series Analysis by State Space Methods J. Durbin & S. Koopman, 2nd ed, 2012
Advanced likelihood modeling (e.g. stochastic + periodic + astrophysics)

Page 6

Fundamental concepts of time series

• Cadence & noise Nearly all standard methodology assumes: [a] regularly spaced observations, although missing data (NAs) are often allowed; [b] signal of interest; and [c] homoscedastic Gaussian white noise, $\epsilon \sim N(0, \sigma^2)$

• Stationarity Statistical properties unchanged by shifts in time. Variations can be deterministic or stochastic. Nonstationarity includes: trends, heteroscedasticity, and change points.

• Periodicity Measured levels repeat with fixed periods. The signal is concentrated in a narrow range of frequencies (spectral, Fourier, harmonic analysis)


Page 7


• Autocorrelation Measured levels depend on previous values. The nonparametric autocorrelation function ACF(k) gives fraction of variance attributed to correlation at lag k:

$\mathrm{ACF}(k) = \sum_t (x_t - \langle x \rangle)(x_{t-k} - \langle x \rangle) \,/\, \sum_t (x_t - \langle x \rangle)^2$

• Stochastic process Dynamic evolution with random ingredients. Short-memory behaviors are concentrated in a few lags of the ACF. Long-memory behaviors include $1/f^\alpha$ "red" noise.

• (Non)(semi)parametric methods Nonparametric procedures make no assumption regarding the global behavior (e.g. kernel smoothing). Parametric procedures assume a mathematical form to the global behavior; regression is used to find the best-fit parameter values (e.g. ARIMA). Semi-parametric methods have local but not global models (e.g. Gaussian Processes regression).
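As a minimal sketch of the ACF definition on this slide, the Python snippet below computes the sample autocorrelation of a simulated, evenly spaced series. The AR(1) test signal, its coefficient 0.8, and the function name `acf` are illustrative assumptions rather than anything from the talk, whose own tutorial uses R.

```python
import random


def acf(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # sum of squared deviations
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Toy short-memory process (assumed for illustration): x_t = 0.8 x_{t-1} + noise
random.seed(1)
x = [0.0]
for _ in range(1999):
    x.append(0.8 * x[-1] + random.gauss(0.0, 1.0))

rho = acf(x, 5)
for lag, value in enumerate(rho):
    print(f"lag {lag}: ACF = {value:.3f}")
```

For a short-memory process like this one, the printed values should decay quickly with lag, which is the behavior the "Stochastic process" bullet describes.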

Page 8

Nonparametric time domain methods

• (Partial) autocorrelation function For zero-mean Gaussian white noise, the ACF is asymptotically normal (i.e. probabilities can be inferred): $\mathrm{Var}(\mathrm{ACF}(k)) = (n-k)/[n(n+2)]$. PACF(k) gives the effect at lag k with smaller lags removed.

• Density estimation = smoothing Kernel density estimation, local polynomial regressions (splines, LOESS, …), Gaussian Processes regression. Preferred if the deterministic global functional form is not known.

• Differencing operator Replacing $x_t$ with $x_t - x_{t-1}$ removes many types of trend. Reversed with the integrating operator.
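The differencing and integrating operators can be sketched in a few lines of Python. The function names and the toy linear and quadratic trends are assumptions chosen for illustration; the point is only that one difference flattens a linear trend, two flatten a quadratic one, and a cumulative sum undoes the operation.

```python
from itertools import accumulate


def difference(x):
    """Differencing operator: replace x_t with x_t - x_{t-1}."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


def integrate(dx, x0):
    """Integrating operator: cumulative sum that reverses `difference`."""
    return list(accumulate(dx, initial=x0))


# A linear trend becomes a constant after one difference.
trend = [2.0 + 0.5 * t for t in range(10)]
d_trend = difference(trend)

# A quadratic trend needs two differences to become constant.
quad = [t * t for t in range(10)]
dd_quad = difference(difference(quad))

# Integration recovers the original series given its first value.
restored = integrate(d_trend, trend[0])
print(d_trend, dd_quad, restored == trend)
```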


Page 9

Parametric time domain methods

• Deterministic trend Variation in time has known form, xt = f(t). Example: an astrophysical Keplerian orbit for radial velocities.

• Stochastic trend Many stochastic behaviors can be successfully fit with linear autoregressive models: ARMA (short-memory stationary), ARIMA (with nonstationary trend), ARFIMA (with long-memory $1/f^\alpha$), GARCH (with volatility), etc.


ARMA-type models are fit by maximum likelihood with order (p,q) chosen with the Akaike Information Criterion. Lots of methodology: parameter errors, goodness-of-fit tests, hierarchical models, Bayesian inference, Kalman filter updates, time series diagnostics, etc
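To illustrate order selection by AIC, here is a hedged sketch restricted to pure AR(p) models. It swaps in Yule-Walker estimation (via the Levinson-Durbin recursion) for the maximum likelihood fit named on the slide, because that keeps the example short and dependency-free. The simulated AR(2) series, its coefficients, and all function names are assumptions for illustration.

```python
import math
import random


def autocov(x, max_lag):
    """Biased sample autocovariances for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(gamma, max_order):
    """Yule-Walker AR fits for orders 0..max_order.

    Returns a list of (coefficients, innovation variance) pairs.
    """
    phi, var = [], gamma[0]
    fits = [(phi, var)]
    for p in range(1, max_order + 1):
        kappa = (gamma[p] - sum(phi[j] * gamma[p - 1 - j]
                                for j in range(p - 1))) / var
        phi = [phi[j] - kappa * phi[p - 2 - j] for j in range(p - 1)] + [kappa]
        var *= 1.0 - kappa * kappa
        fits.append((phi, var))
    return fits


# Simulated AR(2) series (assumed): x_t = 0.5 x_{t-1} + 0.3 x_{t-2} + noise
random.seed(7)
n = 3000
x = [0.0, 0.0]
for _ in range(n - 2):
    x.append(0.5 * x[-1] + 0.3 * x[-2] + random.gauss(0.0, 1.0))

max_order = 6
fits = levinson_durbin(autocov(x, max_order), max_order)

# Gaussian AIC up to an additive constant: n log(sigma^2) + 2 p
aic = [n * math.log(var) + 2 * p for p, (_, var) in enumerate(fits)]
best_order = min(range(len(aic)), key=lambda p: aic[p])
print("AIC-selected AR order:", best_order)
print("AR(2) coefficients:", [round(c, 3) for c in fits[2][0]])
```

AIC is known to favor slightly over-parameterized models, so the selected order may exceed the true order of 2; the fitted AR(2) coefficients should still land close to the simulated values.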

Page 10

Nonparametric frequency domain method

• Fourier analysis Fourier transform, power spectrum (Schuster periodogram) concentrates strictly periodic signal into sharp peak. Theorems are highly restrictive: evenly spaced data of infinite duration, Gaussian white noise with single sinusoidal-shaped signal at fixed period. For realistic data, difficult to establish significance of periodogram peaks.

• Modern Fourier analysis improves the power spectrum with smoothing (reduced variance), tapering (reduced bias), detrending (reduced zero-frequency power), and pre-whitening (removing strong periodicity). Multitapering is recommended.
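A bare Schuster periodogram for evenly spaced data can be sketched as follows. This is the raw estimator only, with mean subtraction as the sole detrending step; it applies none of the smoothing, tapering, or multitaper refinements listed above. The sinusoid-plus-noise test signal and the function name are assumptions for illustration.

```python
import math
import random


def schuster_periodogram(x):
    """Raw periodogram at the Fourier frequencies j/n, j = 1..n//2.

    Frequencies are in cycles per sample. The mean is subtracted first,
    which removes the zero-frequency power.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    freqs, power = [], []
    for j in range(1, n // 2 + 1):
        f = j / n
        c = sum(d * math.cos(2 * math.pi * f * t) for t, d in enumerate(dev))
        s = sum(d * math.sin(2 * math.pi * f * t) for t, d in enumerate(dev))
        freqs.append(f)
        power.append((c * c + s * s) / n)
    return freqs, power


# Toy signal (assumed): sinusoid with a 20-sample period plus white noise
random.seed(3)
n = 200
x = [math.sin(2 * math.pi * t / 20) + random.gauss(0.0, 0.5) for t in range(n)]

freqs, power = schuster_periodogram(x)
peak_index = max(range(len(power)), key=lambda i: power[i])
peak_freq = freqs[peak_index]
print("peak frequency (cycles/sample):", peak_freq)
```

The direct sums cost O(n²) operations; for real lightcurves an FFT-based routine would be the usual choice.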


Page 11

Advanced signal processing methods are being introduced into astronomy by leading astrostatistics groups:

§ Independent Component Analysis (Waldmann et al. 2012)
§ Correntropy (Huijse et al. 2012)
§ Hilbert-Huang transform (Kolotkov et al. 2015)
§ Empirical Mode Decomposition (Roberts et al. 2013)
§ Singular Spectrum Analysis (Boufleur et al. 2018)
§ Dynamic time warping (B. Zhang et al. 2016)
§ … …
§ Ensemble methods to identify astronomical instrumental & atmospheric effects: SysRem, TFA, EPD, PDC-MAP, ARC (Tamuz et al. 2005, Kovács et al. 2005, Bakos et al. 2007, Stumpe et al. 2012, Roberts et al. 2013)
§ Time series classification methods (J. Richards et al. 2011, Huppenkothen et al. 2017, etc.)


Page 12


General strategy for time series analysis

The ACF, ARMA maximum likelihood fit, wavelet transform, and the Fourier transform all contain the same information as the original time series if an infinite number of coefficients are maintained. Each specializes in highlighting different characteristics of the dataset.

Choice of method depends on properties of the variability and the scientific question being addressed. There is no problem using different methods for different purposes.

Page 13

Change Point Analysis

Wide variety of nonparametric and parametric methods to find sudden change in level or variability behaviors.

• CUSUM (cumulative sum) tests used extensively in manufacturing quality control

• Many extensions to Wald’s Sequential Probability Ratio Test for sudden change in mean, variance, etc.

• Many tests of sudden change of ARMA coefficients.

No applications in astronomy yet
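As a rough illustration of the CUSUM idea, the sketch below locates a single shift in mean by finding where the cumulative sum of deviations from the overall mean is largest in absolute value. This is the simplest textbook variant, not a full sequential test with control limits; the simulated step of 2 standard deviations at sample 150 and the function name are assumptions.

```python
import random


def cusum_change_point(x):
    """Estimate a single change in mean from the CUSUM of deviations.

    Returns (index of the first sample after the estimated change,
             maximum absolute CUSUM value).
    """
    mean = sum(x) / len(x)
    running, best_abs, best_k = 0.0, 0.0, 0
    for k, value in enumerate(x):
        running += value - mean
        if abs(running) > best_abs:
            best_abs, best_k = abs(running), k
    return best_k + 1, best_abs


# Toy series (assumed): the level jumps from 0 to 2 at sample 150
random.seed(11)
x = [random.gauss(0.0, 1.0) for _ in range(150)]
x += [random.gauss(2.0, 1.0) for _ in range(150)]

change_index, cusum_max = cusum_change_point(x)
print("estimated change point:", change_index)
```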


Page 14

R for Kepler

R is the most comprehensive and widely used software environment for statistical analysis of data.

– R is an easily learned scripting language similar to IDL and Matlab. No methods coding, just data wrangling, graphics & interpretation.

– With >12,000 community-written CRAN packages, R has ~100,000 functions: infrastructure, graphics, statistical tests, regressions, machine learning, etc.

– ~300 packages treat evenly-spaced time series (https://cran.r-project.org/web/views/TimeSeries.html)

Page 15

Comments on standard time series analysis

• Time series analysis is a huge field of methodology and there is a lot for us to learn. I suggest that everyone be familiar at the level of texts like Feigelson/Babu (2012) and Chatfield (6th ed 2003).

• Characterization of time series with weird ground-based cadences is not in any textbook, and non-trivial challenges arise. Caution needed in interpretation (no theorems). But regularly cadenced space-based time series can use established standard methods (lots of theorems).

• A vast array of statistics codes are already written, most in R/CRAN and Matlab (Signal Processing & Econometrics toolboxes). Can wrap R/CRAN from Python and vice versa.

Page 16

Kepler: Transit detection of exoplanets

Photometric time series with periodic transits when planet passes in front of star

But a significant challenge is removal of stellar variability unrelated to the planet.

“[E]xoplanets may still be detected by exploiting differences in timescale, shape and wavelength dependence between the planetary and stellar signals. ... [Stellar] variability, combined with residual instrumental systematics, is still limiting the detection of habitable planets by Kepler.” (Suzanne Aigrain IAU 2015)

Page 17

Planets and stellar activity: Hide and seek in the CoRoT-7 system

Haywood et al. 2014

GP local regression fit

Page 18

Stellar variability reduction methods

Nonparametric modeling

§ Wavelet analysis (Jenkins 2002 & Kepler pipeline, Carter & Winn 2009)

§ Gaussian Processes regression (Gibson 2014, Aigrain et al 2016, Luger et al 2016)

§ Advanced signal processing methods: Independent Component Analysis, correntropy, trend entropy, Empirical Mode Decomposition, Singular Spectrum Analysis, … (Waldmann et al 2013, Huijse et al 2012, Roberts et al 2013, Greco et al 2016, Boufleur et al 2018, …)

Parametric modeling

Rarely used because stellar & atmospheric variations do not follow any obvious function: Flux ~ f(time). However, textbooks in time series analysis are dominated by stochastic autoregressive regression models, Flux ~ f(past fluxes, past changes). These are broad model families widely used in engineering signal processing, econometrics, and other fields (millions of Google hits).

We have found these models are also very effective for time domain astronomy!!

Page 19

ARPS: Four steps to exoplanet transit detection

1. Pre-process the data
2. Reduce/remove uninteresting stellar variability but … “Don’t eat the planet!” (David Jones, SAMSI)
3. Conduct periodicity search for recurring transits
4. Establish decision criteria to report Planet Candidates

[Pipeline diagram, stages: Ingest, Preprocess, AR fit, TCF periodogram, Features, Classification & vetting, Tables & plots]

Page 20

Autoregressive modeling for evenly spaced time series

ARIMA(p,d,q) includes d differencing operations: $x'_t = x_t - x_{t-1}$

ARFIMA(p,d,q) has fractional differencing

CARMA (continuous), VARIMA (vector), SARIMA (periodic), GARCH and dozens of other extensions to the ARMA family

[Labels on the slide's model equation: current value of stellar flux; recent past flux value; coefficient of linear regression; current Gaussian noise value; recent past noise value]
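The equation these labels annotate did not survive in the transcript. As an assumption, they match the textbook ARMA(p,q) form, sketched here in notation chosen for this note (the symbols $\phi_i$ and $\theta_j$ are not necessarily those used in the talk):

```latex
x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p}
      + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q},
\qquad \epsilon_t \sim N(0, \sigma^2)
```

In this reading, $x_t$ is the current flux, $x_{t-i}$ are recent past flux values, $\phi_i$ and $\theta_j$ are the regression coefficients, $\epsilon_t$ is the current Gaussian noise value, and $\epsilon_{t-j}$ are recent past noise values.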

Page 21

• AR & MA components model short-memory processes

• I differencing operator removes many forms of trend and non-stationarity

• F component models long-memory processes and is equivalent to the astronomers’ $1/f^\alpha$-type red noise. The coefficient d is arithmetically related to $\alpha$ and the economists’ Hurst parameter.

Roughly speaking …

The ARIMA and ARFIMA models can remove an enormous variety of temporal variations seen in astronomical data
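The arithmetic relation for the F component is not spelled out on the slide. Under the usual textbook conventions for a stationary ARFIMA(0,d,0) process with $|d| < 1/2$ and frequency $f$ in cycles per sample (assumed here, not taken from the talk), the low-frequency spectrum gives:

```latex
S(f) \propto \left| 2 \sin(\pi f) \right|^{-2d} \approx (2 \pi f)^{-2d}
\quad (f \to 0)
\qquad \Longrightarrow \qquad
\alpha = 2d, \qquad H = d + \tfrac{1}{2}
```

Here $\alpha$ is the red-noise power-law index in $1/f^\alpha$ and $H$ is the Hurst parameter.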

Page 22

[Figure: Sample Lightcurve with Fit & Residuals. Three panels versus Time: Original Flux (typical 4 yr Kepler lightcurve), Fitted Flux (maximum likelihood ARFIMA model), and Residual Flux (residuals).]

KARPS: Kepler AutoRegressive Planet Search

Caceres, Feigelson, et al. 2019b

The ARPS procedure is applied to ~200,000 Kepler stars

Page 23

Problem:

The differencing operator transforms box-shaped transit signal into a periodic double-spike signal

[Figure: Schematic of ARMA-Model Effect on Transit Signal. Two panels versus Time: Transit and Residual.]

Transit Comb Filter: a matched filter for a periodic double-spike pattern produced by a box-shaped dip subject to the differencing operator. The TCF power plotted for a range of periods gives a periodogram analogous to the BLS periodogram

Caceres, Feigelson et al. 2019a
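The effect described on this slide can be reproduced with a toy example: differencing a noise-free box-shaped transit leaves a down-spike at ingress and an up-spike at egress, and a simple comb that adds up spikes at trial periods, phases, and durations peaks at the true period. This is only a brute-force sketch of the matched-filter idea, not the TCF algorithm of Caceres, Feigelson et al.; the lightcurve parameters, the search grid, and all function names are assumptions.

```python
import math


def box_transit_lightcurve(n, period, t0, duration, depth, baseline=1000.0):
    """Noise-free flux with periodic box-shaped dips, first dip starting at t0."""
    return [
        baseline - depth if (t - t0) % period < duration else baseline
        for t in range(n)
    ]


def difference(x):
    """Differencing operator: d[j] = x[j+1] - x[j]."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


def comb_score(d, period, phase, duration):
    """Sum (up-spike minus down-spike) over a periodic comb, normalized
    by the square root of the number of comb teeth used."""
    total, teeth = 0.0, 0
    i = phase
    while i + duration < len(d):
        total += d[i + duration] - d[i]
        teeth += 2
        i += period
    return total / math.sqrt(teeth) if teeth else 0.0


flux = box_transit_lightcurve(n=1000, period=200, t0=50, duration=10, depth=30.0)
d = difference(flux)

# The box dips have become isolated spikes: one down and one up per transit.
down_spikes = sum(1 for v in d if v < 0)
up_spikes = sum(1 for v in d if v > 0)

# Brute-force search over trial period, duration, and phase.
best_score, best_period, best_phase, best_duration = max(
    (comb_score(d, p, ph, dur), p, ph, dur)
    for p in range(150, 251)
    for dur in (5, 10, 15)
    for ph in range(p)
)
print("spikes (down, up):", down_spikes, up_spikes)
print("best period and duration:", best_period, best_duration)
```

Plotting the best score at each trial period would give a periodogram in the spirit of the one described above.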

Page 24

Comparison of TCF and BLS periodograms

KIC 010024701

Here neither method detects periodicity in the original lightcurve

BLS detects periodicity in the ARIMA residuals, but with lower SNR than TCF

[Figure panels: BLS and TCF periodograms]

Page 25

Feature importance in Random Forest classifier

Page 26

ROC curves for 1 vs. 37 features

Page 27

Result for discovery of new Kepler planets

Red = previously known KOI planets
Magenta = new KARPS planets
Gray = stars without planets

Caceres, Feigelson et al. 2019b

Page 28

The ARIMA & ARFIMA TCF periodograms show an interesting spectral peak: Period~0.3 days

The folded lightcurve shows a faint double-spike (left) and box (right).

The Random Forest classifier gives high confidence for a transiting planet: Prob(RF) = 0.53

Page 29

ARPS Conclusion

Low-dimensional stochastic ARIMA-type models can be very effective in removing autocorrelated noise in stellar lightcurves, leaving periodic transits in the residuals.

TCF gives an effective periodogram for ARIMA residuals.

Random Forest is an effective classifier if strong training sets are available. False Positive rejection is tricky but feasible.

Coding is easy: extensive econometric software in R. Computational burden is reasonable.

Science result: 97 candidate new planets, most hot (P<5 days) sub-Earths (Depth < 100 ppm)

Page 30

AutoRegressive Planet Search Project

Methodology: Caceres, Feigelson et al. 2019a, submitted arXiv:1901.0511

Application to 4-year Kepler lightcurves: Caceres, Feigelson et al. 2019b, submitted

Simulations for irregularly spaced ground-based transit surveys: Stuhr, Feigelson et al. 2019, submitted

Application to HATSouth lightcurves: Stuhr, Feigelson, Hartman et al., in progress

Application to TESS lightcurves: Wolfgang & Feigelson, in progress

Supported by NSF AST-1614690 and NASA 80NSSC17K0122 at Penn State