
A novel methodology for identification of inhomogeneities in climate time series

Andrés Farall (1), Jean-Philippe Boulanger (1), Liliana Orellana (2)

(1) CLARIS LPB Project - University of Buenos Aires
(2) Biostatistics Unit - Deakin University

CLARIS LPB. A Europe-South America Network for Climate Change Assessment and Impact Studies in La Plata Basin

Climate time series. Quality control

Climatology relies on observational data to understand the climate.

To monitor long-term marine or atmospheric climate change accurately, the quality of the data is of utmost importance.

One key challenge is to discriminate the climatic signal from noise generated by errors or inhomogeneities.

Errors and inhomogeneities arise from changes in the conditions under which data are measured, recorded, transmitted and/or stored.


Quality control

In this talk
• We will focus on the problem of detecting inhomogeneities in temperature series

Most common causes of inhomogeneities
• Station relocations
• Changes in instruments
• Changes in the surroundings or land use (gradual changes)
• Changes in the observational and calculation procedures

Instant change ⇒ Error: detection of atypical data

Lasting change ⇒ Inhomogeneity: detection of breakpoints

[Figure: Minimum temperature, Salta Aero, 1920 to 2000; percentiles p5, p25, p50, p75 and p95]

Metadata: station relocations in 1931, 1949 and 1958

Traditional approaches
• Rely on metadata and/or expertise to identify the breakpoints (e.g. Craddock et al., 1976)
• Make strong DGP assumptions (e.g. Anderson et al., 1997; Caussinus and Mestre, 2004)
• Use a reference (homogeneous) time series (e.g. Vincents, 1999; Della-Marta and Wanner, 2006)

• Some are designed to
  • detect one type of change in the series (usually a shift)
  • detect just one breakpoint in the time series
  • work on univariate time series

• Many assume independent observations, or aggregate daily data (say, to monthly values) to overcome dependence

Inhomogeneity definition

Goal ⇒ Identify all “inhomogeneities” in a climate time series, i.e., identify all potential breakpoints

Let $x_{ti}$ be the temperature TS at station $i$ adjusted for seasonality. There is an inhomogeneity at time $t_0$ if the data generating process changes at $t_0$.

Influence set for a target station

Natural fluctuations may be confused with inhomogeneities. Information from neighbouring stations can help distinguish between natural and artificial changes.

• Target station, $i$: the one to be controlled
• $S_i$: the influence set of station $i$
• $\mathbf{x}_t$: vector of observations recorded on day $t$ in the stations of $S_i$

[Figure: target station and its influence set]

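Depth of a multivariate observation

Detecting an inhomogeneity ⇒ comparing multivariate distributions before and after potential breakpoints.

To retain the multivariate pattern and make the problem tractable we use the depth of each observation vector $\mathbf{x}_t$. The Mahalanobis depth

$$ MD(\mathbf{x}_t) = \left[ 1 + (\mathbf{x}_t - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x}_t - \boldsymbol{\mu}) \right]^{-1} $$

can be calculated by plugging in robust estimates of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$.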
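As a minimal sketch of the depth calculation, the function below evaluates the Mahalanobis depth $1 / (1 + d^2)$ of a single observation vector, where $d^2$ is the squared Mahalanobis distance. The function names and the small Gaussian-elimination helper are illustrative assumptions, not the authors' R code; `mu` and `sigma` stand for whatever robust location and scatter estimates are plugged in.

```python
def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def mahalanobis_depth(x, mu, sigma):
    """Depth in (0, 1]; equals 1 at the centre mu and decays with distance."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(sigma, diff)                       # z = sigma^{-1} (x - mu)
    d2 = sum(di * zi for di, zi in zip(diff, z))  # squared Mahalanobis distance
    return 1.0 / (1.0 + d2)
```

An observation at the centre has depth 1, and observations far from the bulk of the data (relative to the scatter matrix) have depth close to 0, so each multivariate observation is reduced to one number.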

Estimation of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$

• $\boldsymbol{\mu}$: multivariate median, using sliding windows centred at $t$
• $\boldsymbol{\Sigma}$: Orthogonalized Gnanadesikan/Kettenring (OGK)¥ procedure; relatively fast, based on robust pairwise covariance estimates
• Assumption: correlations between monitoring stations do not change over time

¥ Maronna and Zamar, 2002
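The two ingredients named on this slide can be sketched as follows. This is not the full OGK procedure (which also orthogonalizes and re-estimates). It shows only the Gnanadesikan-Kettenring pairwise identity $\mathrm{cov}(X, Y) = [s(X+Y)^2 - s(X-Y)^2] / 4$ with a robust scale $s$. Using the scaled MAD as the scale, taking a one-coordinate median as the sliding location, and the function names are all assumptions made for illustration.

```python
from statistics import median


def sliding_median(series, t, half_width):
    """Median of the window centred at index t (truncated at the series ends)."""
    lo = max(0, t - half_width)
    hi = min(len(series), t + half_width + 1)
    return median(series[lo:hi])


def mad_scale(values):
    """Median absolute deviation, scaled to be consistent at the normal."""
    med = median(values)
    return 1.4826 * median([abs(v - med) for v in values])


def gk_covariance(x, y):
    """Robust pairwise covariance via the Gnanadesikan-Kettenring identity."""
    s_plus = mad_scale([a + b for a, b in zip(x, y)])
    s_minus = mad_scale([a - b for a, b in zip(x, y)])
    return (s_plus ** 2 - s_minus ** 2) / 4.0
```

Because both pieces rely on medians, a single gross error inside the window barely moves the estimates, which is the property the depth calculation needs.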

Distribution of depths

[Figure: distribution of depths before the potential breakpoint, $\{x_{ti},\ t = 1, \dots, n\}$, and after it, $\{x_{ti},\ t = n+1, \dots, n+m\}$]

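The standardized Kolmogorov-Smirnov statistic

We can compare the distributions of depths before and after the potential breakpoint using the statistic

$$ KS = \sqrt{\frac{n\,m}{n+m}} \; \sup_{d} \left| \hat F_{n}(d) - \hat G_{m}(d) \right| $$

where $\hat F_n$ and $\hat G_m$ are the empirical distribution functions of the $n$ depths before and the $m$ depths after the potential breakpoint.

The approximate distribution of $KS$ under the null ($H_0$: no change in the distribution of depths) can be obtained using Block-Bootstrap¥

• We sample blocks of consecutive observations to capture the dependence structure of the stationary process.

¥ Hall et al. (1995)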
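A small sketch of a standardized two-sample Kolmogorov-Smirnov statistic on two samples of depths is given below. It assumes the usual standardization $\sqrt{nm/(n+m)}$ times the largest gap between the two empirical distribution functions; the function name is hypothetical.

```python
from bisect import bisect_right
from math import sqrt


def standardized_ks(before, after):
    """sqrt(n*m/(n+m)) * sup |F_n - G_m| for two samples of depths."""
    n, m = len(before), len(after)
    a, b = sorted(before), sorted(after)
    sup = 0.0
    # Both empirical CDFs are step functions, so the supremum of their
    # difference is attained at one of the observed values.
    for v in a + b:
        f = bisect_right(a, v) / n  # share of 'before' depths <= v
        g = bisect_right(b, v) / m  # share of 'after' depths <= v
        sup = max(sup, abs(f - g))
    return sqrt(n * m / (n + m)) * sup
```

For two completely separated samples of four values each the gap is 1 and the statistic is $\sqrt{2}$; for identical samples it is 0.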


Block Bootstrap

Blocks of fixed length $l$ are defined
• non-overlapping or overlapping (moving BB)
• blocks are randomly sampled with replacement
• the sequence of blocks forms a new TS of length $n+m$

The null distribution of $KS$ is approximated by the distribution of $KS^{*}$, the statistic computed on the resampled series

Performance of BB depends on $l$, the DGP and the statistic under study¥

¥ Lahiri (1999)
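The resampling scheme can be sketched as a moving (overlapping) block bootstrap. The code below is an illustration under assumptions: the names, the default of 200 replicates, the way the resampled series is split at the same position as the original, and the `+1` correction in the p-value are all choices made here, not details taken from the talk. The `stat` argument is any two-sample statistic, such as a standardized KS on depths.

```python
import random


def moving_block_bootstrap(series, block_len, rng):
    """Concatenate randomly chosen overlapping blocks into a series of equal length."""
    n = len(series)
    starts = range(n - block_len + 1)  # every admissible block start
    out = []
    while len(out) < n:
        s = rng.choice(starts)
        out.extend(series[s:s + block_len])
    return out[:n]


def bootstrap_p_value(series, split, stat, block_len, n_boot=200, seed=0):
    """Share of resampled statistics at least as large as the observed one."""
    rng = random.Random(seed)
    observed = stat(series[:split], series[split:])
    exceed = 0
    for _ in range(n_boot):
        resampled = moving_block_bootstrap(series, block_len, rng)
        if stat(resampled[:split], resampled[split:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)
```

Sampling whole blocks rather than single days keeps the short-range dependence of daily data inside each block, which is why the block length matters for the quality of the approximation.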


Multiple breakpoints: binary trees

We have a methodology to decide whether there is a breakpoint at a given time. How do we identify all the breakpoints in a TS?

Binary trees with non-crossing partition (time binary trees)
• Recursive partitioning of the TS into two time spans, such that their distributions of depths are as distant as possible
• The first best breakpoint splits the multivariate time series into the two time series with the largest standardized KS statistic
• We repeat the procedure until some stopping rule is satisfied
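The recursive partitioning described above can be sketched as a binary segmentation over a series of depths. This is a simplified illustration: the stopping rule used here (a minimum segment length and a fixed threshold of 1.36, the asymptotic 5% critical value of the KS statistic for independent data) is a stand-in assumption, since the talk relies on block-bootstrap p-values and a separate pruning step. All names are hypothetical.

```python
from bisect import bisect_right
from math import sqrt


def ks(a, b):
    """Standardized two-sample Kolmogorov-Smirnov statistic."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    sup = max(abs(bisect_right(a, v) / n - bisect_right(b, v) / m) for v in a + b)
    return sqrt(n * m / (n + m)) * sup


def grow_tree(depths, lo, hi, min_len, threshold, found):
    """Split depths[lo:hi] at the best breakpoint, then recurse on both halves."""
    best_t, best_stat = None, threshold
    for t in range(lo + min_len, hi - min_len + 1):
        s = ks(depths[lo:t], depths[t:hi])
        if s > best_stat:
            best_t, best_stat = t, s
    if best_t is None:  # stopping rule: no split exceeds the threshold
        return
    found.append(best_t)
    grow_tree(depths, lo, best_t, min_len, threshold, found)
    grow_tree(depths, best_t, hi, min_len, threshold, found)


def find_breakpoints(depths, min_len=5, threshold=1.36):
    """Return sorted indices t such that a new regime starts at depths[t]."""
    found = []
    grow_tree(depths, 0, len(depths), min_len, threshold, found)
    return sorted(found)
```

Because each split is searched only inside the segment produced by the previous split, the partitions never cross, which is what makes the result a tree over time. The exhaustive search is quadratic in the segment length, so this form is suitable only for short illustrative series.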

Growing the tree. First step

Growing the tree. Second step

The finest partition (saturated tree)

7 breakpoints, 8 segments

Pruning of the tree

3 breakpoints, 4 segments

Final step

For each detected breakpoint

1. We aim to identify the “responsible” station (if any)
• Jackknife: the statistic is recalculated excluding one station at a time, to detect the stations that produce the smallest and largest p-values

2. Once the responsible station has been singled out, we can identify the kind of inhomogeneity
• Comparing distributional parameters before and after the breakpoint

Approximate p-values can be obtained by block bootstrap.
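The leave-one-station-out idea can be sketched as follows. The breakpoint statistic is passed in as a function, and the example statistic used here (sum of absolute mean differences across stations) is a deliberately simple stand-in, not the depth-based KS statistic of the talk. Rows are days and columns are stations; all names are assumptions.

```python
def drop_station(rows, j):
    """Remove column j (one station) from a list of daily observation rows."""
    return [row[:j] + row[j + 1:] for row in rows]


def jackknife_stations(before, after, statistic):
    """Recompute the breakpoint statistic with each station excluded in turn."""
    k = len(before[0])
    return [statistic(drop_station(before, j), drop_station(after, j))
            for j in range(k)]


def mean_shift_stat(before, after):
    """Stand-in statistic: sum over stations of |mean after - mean before|."""
    k = len(before[0])
    total = 0.0
    for j in range(k):
        mb = sum(r[j] for r in before) / len(before)
        ma = sum(r[j] for r in after) / len(after)
        total += abs(ma - mb)
    return total


def responsible_station(before, after, statistic):
    """Index of the station whose exclusion weakens the breakpoint the most."""
    stats = jackknife_stations(before, after, statistic)
    return min(range(len(stats)), key=lambda j: stats[j])
```

If the evidence for a breakpoint largely disappears when one particular station is left out (smallest statistic, hence largest p-value), that station is the natural candidate for the source of the inhomogeneity.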

Regional Model Simulated Data*

Four time series of daily minimum temperature for Argentina were generated.
Time span: 1981 to 2100 (120 years = 43929 days)

We introduced 4 inhomogeneities
1. Grid point 1, day 8,000, mean shift = +0.5 °C
2. Grid point 2, day 16,000, mean shift = -0.5 °C
3. Grid point 3, day 24,000, mean shift = +0.5 °C
4. Grid point 4, day 30,000, mean shift = -0.5 °C

* The Rossby Centre regional climate model (Swedish Meteorological and Hydrological Institute) simulates the main atmospheric variables for the South American region on a daily basis.
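To experiment with the same design without access to the regional model output, one can inject the listed mean shifts into synthetic series. The sketch below uses independent AR(1) noise as a placeholder for the simulated temperatures, which is a strong simplification (real grid points are spatially correlated and seasonal). The AR(1) parameters, the 0-based day indexing and the function names are assumptions.

```python
import random


def ar1_series(n, phi, sigma, seed):
    """Placeholder daily series: AR(1) noise, NOT regional-model output."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out


def add_mean_shift(series, day, shift):
    """Add a constant shift from index `day` (0-based) to the end of the series."""
    return [v + shift if t >= day else v for t, v in enumerate(series)]


# (grid point, day, mean shift in degrees C), as listed on the slide
SHIFTS = [(1, 8000, 0.5), (2, 16000, -0.5), (3, 24000, 0.5), (4, 30000, -0.5)]


def simulate(n_days=43929, phi=0.7, sigma=1.0):
    """One shifted series per grid point, keyed by grid point number."""
    data = {}
    for point, day, shift in SHIFTS:
        base = ar1_series(n_days, phi, sigma, seed=point)
        data[point] = add_mean_shift(base, day, shift)
    return data
```

Since the true breakpoint positions are known by construction, the detected positions can be compared with them directly.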

Growing the tree

Detected breakpoints

[Figure: detected breakpoints; labelled positions include days 8005 and 29985]

Identifying the responsible station

[Figure: P-value by grid point (1 to 4), four panels]

Performance of the method

Multivariate time series were generated from regional climate models under different scenarios
• Number of stations in the influence set and distances between them
• Kind and magnitude of changes in distributions

5 breakpoints at random locations (at least 5 years apart), i.e., 6 different regimes were artificially created, with a mean expected duration of 20 years.

The procedure is repeated 20 times so that 100 breakpoints can be detected under the same conditions.

Performance of the method was evaluated using the AUC (ROC curves).

Performance increases with information (number of stations, closeness of stations) and with the size/length of the change.
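For reference, the AUC of a detection score can be computed directly as the probability that a true breakpoint receives a higher score than a non-breakpoint. The sketch below uses that pairwise definition; how the scores were defined and how true and false candidates were matched in the study is not stated in the talk, so the inputs here are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUC of 1 means every true breakpoint outranks every false candidate, and 0.5 corresponds to a score with no discriminating power.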

Conclusions

We have developed a methodology that
• Is automated and does not require expert knowledge input
• Uses information from multiple stations simultaneously
• Detects several breakpoints per station
• Evaluates the significance of each breakpoint
• Identifies the kind of change/inhomogeneity (mean, variance, etc.)
• Makes no distributional assumptions
• Accounts for dependence in the climatic data
• Is based on robust estimators

Code developed in R

Remarks

The methodology can be used with any continuous variable, such as atmospheric pressure, humidity or heliophany.

Detecting breakpoints in precipitation TS requires an adaptation

1. Precipitation is less smooth than temperature, both spatially and temporally

2. Precipitation data contain two pieces of information: whether rain occurred (rain yes/no) and, given that it occurred, its intensity
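The two pieces of information in a precipitation record can be separated as in the sketch below. The wet-day threshold of 0.1 mm is an assumed convention, not a value given in the talk, and the function name is hypothetical.

```python
def split_precipitation(daily_mm, wet_threshold=0.1):
    """Split daily totals into a rain yes/no indicator and wet-day intensities."""
    occurrence = [1 if v >= wet_threshold else 0 for v in daily_mm]
    intensity = [v for v in daily_mm if v >= wet_threshold]
    return occurrence, intensity
```

The occurrence series is binary and the intensity series is defined only on wet days, so each would need its own treatment when looking for breakpoints.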


Thank you!
