A novel methodology for identification of inhomogeneities in climate time series
Andrés Farall (1), Jean-Philippe Boulanger (1), Liliana Orellana (2)
(1) CLARIS LPB Project, University of Buenos Aires; (2) Biostatistics Unit, Deakin University
CLARIS LPB. A Europe-South America Network for Climate Change Assessment and Impact Studies in La Plata Basin
Climate time series. Quality control
• Climatology relies on observational data to understand the climate.
• To monitor long-term marine or atmospheric climate change accurately, the quality of the data is of utmost importance.
• One key challenge is to discriminate the climatic signal from noise generated by errors or inhomogeneities.
• Errors and inhomogeneities arise from changes in the conditions under which data are measured, recorded, transmitted and/or stored.
Quality control
In this talk we focus on the detection of inhomogeneities in temperature series.
Most common causes of inhomogeneities:
• Station relocations
• Changes in instruments
• Changes in the surroundings or land use (gradual changes)
• Changes in the observational and calculation procedures
Instant change ⇒ error ⇒ detection of atypical data
Lasting change ⇒ inhomogeneity ⇒ detection of breakpoints
[Figure: Minimum temperature, Salta Aero, 1920 to 2000 (percentiles p5, p25, p50, p75, p95). Metadata: station relocations in 1931, 1949 and 1958.]
Traditional approaches
• Rely on metadata and/or expertise to identify the breakpoints (e.g., Craddock et al., 1976)
• Make strong assumptions about the data generating process (DGP) (e.g., Anderson et al., 1997; Caussinus and Mestre, 2004)
• Use a reference (homogeneous) time series (e.g., Vincent, 1999; Della-Marta and Wanner, 2006)
• Some are designed to
   • detect one type of change in the series (usually a shift)
   • detect just one breakpoint in the time series
   • work on univariate time series
• Many assume independent observations, or aggregate daily data (e.g., into monthly values) to overcome dependence
Goal ⇒ identify all “inhomogeneities” in a climate time series, i.e., identify all potential breakpoints.
Let $\{x_t^i\}$ be the temperature time series (TS) at station $i$, adjusted for seasonality. The series has an inhomogeneity (breakpoint) at time $t_0$ if its data generating process changes at $t_0$.
Inhomogeneity definition
Natural fluctuations may be confused with inhomogeneities. Information from neighbouring stations can help distinguish between natural and artificial changes.
• Target station $i$: the station to be controlled
• $S_i$: the influence set of station $i$
• $\mathbf{x}_t$: vector of observations recorded on day $t$ at the stations in $S_i$
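The slides do not say how the influence set is chosen. The sketch below assumes stations are given by latitude and longitude. It also assumes the influence set is simply the $k$ nearest stations to the target, optionally restricted to those within a maximum distance. `influence_set`, `k` and `max_km` are hypothetical names, not part of the authors' R code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def influence_set(target, stations, k=4, max_km=None):
    """Hypothetical rule: names of the k stations closest to `target`.

    stations: dict name -> (lat, lon). The target itself is excluded; if max_km
    is given, farther stations are dropped before taking the k nearest."""
    lat0, lon0 = stations[target]
    ranked = sorted(
        (haversine_km(lat0, lon0, lat, lon), name)
        for name, (lat, lon) in stations.items()
        if name != target
    )
    if max_km is not None:
        ranked = [(d, name) for d, name in ranked if d <= max_km]
    return [name for _, name in ranked[:k]]

if __name__ == "__main__":
    demo = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 3.0), "D": (0.0, 10.0)}
    print(influence_set("A", demo, k=2))  # ['B', 'C']
```

Whether the target station's own series also enters the vector $\mathbf{x}_t$ is not specified on the slide. Here it is simply kept outside the returned list.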
Influence set for a target station
[Figure: a target station and its influence set]
Detecting an inhomogeneity ⇒ comparing multivariate distributions before and after potential breakpoints.
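To retain the multivariate pattern and make the problem tractable, we use the depth of the observations, $D(\mathbf{x}_t)$. The Mahalanobis depth
$$D(\mathbf{x}_t) = \frac{1}{1 + (\mathbf{x}_t - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x}_t - \boldsymbol{\mu})}$$
can be calculated by plugging in robust estimates of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$.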
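Below is a minimal sketch of the Mahalanobis depth in Python with numpy. It is an illustration only; the authors' code is in R. It accepts any location `mu` and scatter `sigma`, so robust estimates can be plugged in directly. `mahalanobis_depth` is a hypothetical name.

```python
import numpy as np

def mahalanobis_depth(X, mu, sigma):
    """D(x_t) = 1 / (1 + (x_t - mu)^T Sigma^{-1} (x_t - mu)) for each row x_t of X.

    X: (T, p) array, one row per day and one column per station;
    mu: (p,) location estimate; sigma: (p, p) scatter estimate."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    diff = X - np.asarray(mu, dtype=float)
    # Solve Sigma z = (x_t - mu) rather than inverting Sigma explicitly.
    z = np.linalg.solve(np.asarray(sigma, dtype=float), diff.T)
    d2 = np.einsum("ij,ji->i", diff, z)  # squared Mahalanobis distances
    return 1.0 / (1.0 + d2)

if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
    print(mahalanobis_depth(X, mu=[0.0, 0.0], sigma=np.eye(2)))  # depths 1.0, 0.5, 0.2
```

Depth equals 1 at the centre and decreases towards 0 for atypical days. A breakpoint therefore shows up as a change in the distribution of $D(\mathbf{x}_t)$.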
Depth of a multivariate observation
Using sliding windows centred at $t$:
• $\boldsymbol{\mu}$: multivariate median
• $\boldsymbol{\Sigma}$: Orthogonalized Gnanadesikan/Kettenring (OGK)¥ procedure
   • relatively fast, based on robust estimation of univariate scales
• Assumption: correlations between monitoring stations do not change over time
¥ Maronna and Zamar, 2002
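The OGK idea can be sketched as below with numpy. This is a simplified one-step version, not the authors' implementation. The median and the normalized MAD stand in for the univariate robust location and scale estimates. The reweighting step of Maronna and Zamar (2002) is omitted. The function returns the OGK location; the talk uses a multivariate median for $\boldsymbol{\mu}$, which could be substituted. `ogk`, `sliding_ogk` and `half_width` are hypothetical names.

```python
import numpy as np

def _mad(v):
    # Normal-consistent median absolute deviation (a robust scale estimate).
    return 1.4826 * np.median(np.abs(v - np.median(v)))

def ogk(X):
    """Simplified one-step OGK location and scatter for a (T, p) array."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    # 1. Standardize each station by its robust scale.
    d = np.array([_mad(X[:, j]) for j in range(p)])
    Y = X / d
    # 2. Gnanadesikan-Kettenring pairwise robust correlations:
    #    u_jk = (s(Y_j + Y_k)^2 - s(Y_j - Y_k)^2) / 4
    U = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            U[j, k] = U[k, j] = (_mad(Y[:, j] + Y[:, k]) ** 2
                                 - _mad(Y[:, j] - Y[:, k]) ** 2) / 4.0
    # 3. U may not be positive definite: project the data on its eigenvectors.
    _, E = np.linalg.eigh(U)
    Z = Y @ E
    # 4. Robust location and variance along each orthogonal direction.
    nu = np.median(Z, axis=0)
    gamma = np.array([_mad(Z[:, j]) ** 2 for j in range(p)])
    # 5. Map back to the original coordinates: x = D E z.
    A = np.diag(d) @ E
    return A @ nu, A @ np.diag(gamma) @ A.T

def sliding_ogk(X, t, half_width):
    """OGK estimates from the window of days centred at t (clipped at the ends)."""
    lo, hi = max(0, t - half_width), min(len(X), t + half_width + 1)
    return ogk(np.asarray(X)[lo:hi])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=2000)
    X[:100] = 50.0  # 5% gross errors barely move the robust estimate
    print(ogk(X)[1])
```

Step 3 is what makes OGK attractive here. Pairwise robust covariances alone need not form a positive definite matrix, but the re-estimated variances along orthogonal directions always do.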
Estimation of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$
Distribution of depths (shift at $t = n$)
Before: $\{x_t^i,\ t = 1, \dots, n\}$
After: $\{x_t^i,\ t = n+1, \dots, n+m\}$
The standardized Kolmogorov-Smirnov statistic
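We can compare the distributions of depths before and after the potential breakpoint using the statistic
$$K = \sqrt{\frac{nm}{n+m}}\, \sup_{z} \left| \hat F_n(z) - \hat G_m(z) \right|,$$
where $\hat F_n$ and $\hat G_m$ are the empirical distribution functions of the depths for $t = 1, \dots, n$ and $t = n+1, \dots, n+m$, respectively.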
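The standardized two-sample Kolmogorov-Smirnov statistic can be computed directly from the two empirical distribution functions. The minimal numpy sketch below illustrates this; `standardized_ks` and `ks_at_split` are hypothetical names.

```python
import numpy as np

def standardized_ks(before, after):
    """K = sqrt(n*m/(n+m)) * sup_z |F_n(z) - G_m(z)| for two samples (e.g. depths)."""
    a = np.sort(np.asarray(before, dtype=float))
    b = np.sort(np.asarray(after, dtype=float))
    n, m = len(a), len(b)
    # Both empirical CDFs are step functions, so the supremum is attained at an
    # observed value: evaluate them on the pooled sample.
    grid = np.concatenate([a, b])
    F = np.searchsorted(a, grid, side="right") / n
    G = np.searchsorted(b, grid, side="right") / m
    return np.sqrt(n * m / (n + m)) * np.max(np.abs(F - G))

def ks_at_split(depths, n):
    """K for a candidate breakpoint after position n: depths[:n] vs depths[n:]."""
    depths = np.asarray(depths, dtype=float)
    return standardized_ks(depths[:n], depths[n:])

if __name__ == "__main__":
    print(ks_at_split([1, 2, 3, 4, 5, 6], 3))  # sqrt(3*3/6) * 1 = sqrt(1.5)
```

The factor $\sqrt{nm/(n+m)}$ puts splits with different segment lengths on a comparable scale. This matters when candidate breakpoints are scanned across the series.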
The approximate distribution of the standardized KS statistic $K$ under the null hypothesis ($H_0$: no breakpoint) can be obtained using the block bootstrap¥.
• We sample blocks of consecutive observations to capture the structure of the stationary process.
¥ Hall et al. (1995)
Block Bootstrap
Blocks of fixed length $\ell$ are defined:
• non-overlapping or overlapping (moving block bootstrap)
• blocks are randomly sampled with replacement
• the sequence of sampled blocks forms a new TS of length $n + m$
The null distribution of the statistic $K$ is approximated by the distribution of $K^*$, the same statistic computed on the bootstrap series.
Performance of the block bootstrap depends on the block length $\ell$, the DGP and the statistic under study¥.
¥ Lahiri (1999)
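Below is a minimal sketch of the moving block bootstrap applied to a depth series. Resampling blocks from the whole series mixes the two regimes, which mimics the null of no breakpoint at position $n$. Short-range dependence is preserved inside each block. The block length `block_len`, the number of replicates `n_boot` and the add-one p-value correction are illustrative choices, not taken from the talk. All function names are hypothetical.

```python
import numpy as np

def standardized_ks(before, after):
    # K = sqrt(n*m/(n+m)) * max |F_n - G_m|, evaluated on the pooled sample.
    a, b = np.sort(before), np.sort(after)
    grid = np.concatenate([a, b])
    F = np.searchsorted(a, grid, side="right") / len(a)
    G = np.searchsorted(b, grid, side="right") / len(b)
    return np.sqrt(len(a) * len(b) / (len(a) + len(b))) * np.max(np.abs(F - G))

def moving_block_resample(x, block_len, rng):
    """Moving (overlapping) block bootstrap: glue blocks of consecutive observations,
    sampled with replacement, and trim the result to len(x)."""
    x = np.asarray(x)
    N = len(x)
    n_blocks = -(-N // block_len)  # ceil(N / block_len)
    starts = rng.integers(0, N - block_len + 1, size=n_blocks)
    return np.concatenate([x[s:s + block_len] for s in starts])[:N]

def block_bootstrap_pvalue(depths, n, block_len, n_boot=500, seed=0):
    """Approximate p-value of K for a breakpoint after position n."""
    rng = np.random.default_rng(seed)
    depths = np.asarray(depths, dtype=float)
    k_obs = standardized_ks(depths[:n], depths[n:])
    k_star = np.empty(n_boot)
    for r in range(n_boot):
        xb = moving_block_resample(depths, block_len, rng)
        k_star[r] = standardized_ks(xb[:n], xb[n:])  # K* on the bootstrap series
    # Add-one correction so the p-value is never exactly zero.
    return (1 + np.count_nonzero(k_star >= k_obs)) / (n_boot + 1)
```

The choice of `block_len` trades off two effects. Longer blocks keep more of the dependence but give fewer distinct blocks to resample. This is the sensitivity to $\ell$ noted on the slide.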
Multiple breakpoints – Binary trees
We have a methodology to decide whether there is a breakpoint at a given time. How do we identify all the breakpoints in a TS?
Binary trees with non-crossing partitions (time binary trees):
• Recursive partitioning of the TS into two time spans whose distributions of depths are as distant as possible
• The first (best) breakpoint splits the multivariate time series into the two series with the largest standardized KS statistic $K$
• We repeat the procedure on each resulting span until some stopping rule is satisfied
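The recursive splitting can be sketched as follows. The stopping rule is hypothetical: a minimum segment length plus a fixed, arbitrary threshold on $K$. In the talk, significance is assessed with the block bootstrap and the saturated tree is then pruned; neither step is reproduced here. The input is meant to be the depth series $D(\mathbf{x}_t)$, although any real-valued series works for illustration.

```python
import numpy as np

def standardized_ks(before, after):
    # K = sqrt(n*m/(n+m)) * max |F_n - G_m|, evaluated on the pooled sample.
    a, b = np.sort(before), np.sort(after)
    grid = np.concatenate([a, b])
    F = np.searchsorted(a, grid, side="right") / len(a)
    G = np.searchsorted(b, grid, side="right") / len(b)
    return np.sqrt(len(a) * len(b) / (len(a) + len(b))) * np.max(np.abs(F - G))

def best_split(depths, lo, hi, min_seg):
    """Split point k in (lo, hi) maximizing K(depths[lo:k], depths[k:hi])."""
    best_k, best_stat = None, -np.inf
    for k in range(lo + min_seg, hi - min_seg + 1):
        stat = standardized_ks(depths[lo:k], depths[k:hi])
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

def grow_time_tree(depths, min_seg=30, threshold=3.0, lo=0, hi=None):
    """Sorted breakpoints found by recursive binary splitting of depths[lo:hi].

    Hypothetical stopping rule: a span is not split if it is shorter than
    2 * min_seg or if its best K is below `threshold`."""
    depths = np.asarray(depths, dtype=float)
    hi = len(depths) if hi is None else hi
    if hi - lo < 2 * min_seg:
        return []
    k, stat = best_split(depths, lo, hi, min_seg)
    if stat < threshold:
        return []
    # Non-crossing partition: each child covers a contiguous time span.
    left = grow_time_tree(depths, min_seg, threshold, lo, k)
    right = grow_time_tree(depths, min_seg, threshold, k, hi)
    return left + [k] + right
```

Because every child node covers a contiguous span, the breakpoints of the saturated tree come out already in time order.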
Growing the tree. First step
Growing the tree. Second step
The finest partition (saturated tree)
7 breakpoints, 8 segments
Pruning of the tree
3 breakpoints, 4 segments
For each detected breakpoint
1. We aim to identify the “responsible” station (if any).
• Jackknife: the statistic is recalculated excluding one station at a time, to detect the stations that produce the smallest and the largest p-values
2. Once the responsible station has been singled out, we can identify the kind of inhomogeneity:
• by comparing distributional parameters (e.g., mean, variance) before and after the breakpoint
• approximate p-values can be obtained with the block bootstrap
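Below is a sketch of the leave-one-station-out (jackknife) step. Two simplifications keep it short, and neither is the talk's method. It recomputes the statistic $K$ rather than a bootstrap p-value. It also estimates location and scatter by the median and sample covariance of the pre-break span. The station whose removal lowers $K$ the most roughly corresponds to the largest p-value and is the candidate responsible station. `depth_ks_at` and `jackknife_stations` are hypothetical names.

```python
import numpy as np

def standardized_ks(a, b):
    # K = sqrt(n*m/(n+m)) * max |F_n - G_m|, evaluated on the pooled sample.
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    F = np.searchsorted(a, grid, side="right") / len(a)
    G = np.searchsorted(b, grid, side="right") / len(b)
    return np.sqrt(len(a) * len(b) / (len(a) + len(b))) * np.max(np.abs(F - G))

def depth_ks_at(X, k):
    """K between Mahalanobis depths before/after day k. Simplification (hypothetical):
    location and scatter come from the pre-break span X[:k] only."""
    ref = X[:k]
    mu = np.median(ref, axis=0)
    sigma = np.atleast_2d(np.cov(ref, rowvar=False))
    diff = X - mu
    d2 = np.einsum("ij,ji->i", diff, np.linalg.solve(sigma, diff.T))
    depth = 1.0 / (1.0 + d2)
    return standardized_ks(depth[:k], depth[k:])

def jackknife_stations(X, k, stat=depth_ks_at):
    """Statistic on all stations, and recomputed with each station (column) left out."""
    X = np.asarray(X, dtype=float)
    full = stat(X, k)
    loo = np.array([stat(np.delete(X, j, axis=1), k) for j in range(X.shape[1])])
    return full, loo

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.normal(size=(600, 1)) + 0.3 * rng.normal(size=(600, 4))  # correlated stations
    X[300:, 1] += 1.0  # shift in station 2 (column index 1) after day 300
    full, loo = jackknife_stations(X, 300)
    print(full, loo, "candidate:", int(np.argmin(loo)))
```

The shift is easy to see in depth because the stations are strongly correlated. A change at a single station breaks the joint pattern even when it is small relative to that station's own variability.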
Final step
Four time series of daily minimum temperature at grid points in Argentina were generated.
Time span: 1981 to 2100 (120 years = 43,829 days).
We introduced 4 inhomogeneities:
1. Grid point 1, day 8,000, mean shift = +0.5 °C
2. Grid point 2, day 16,000, mean shift = −0.5 °C
3. Grid point 3, day 24,000, mean shift = +0.5 °C
4. Grid point 4, day 30,000, mean shift = −0.5 °C
*The Rossby Centre regional climate model (Swedish Meteorological and Hydrological Institute) simulates the main atmospheric variables for the South American region on a daily basis.
Regional Model Simulated Data*
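The injection of the four mean shifts can be sketched as follows. The sketch assumes grid points are numbered from 1, days are 0-based row indices, and each shift persists from its day to the end of the series. Gaussian noise stands in for the regional model output in the demo. `inject_shifts` and `SLIDE_SHIFTS` are hypothetical names.

```python
import numpy as np

# (grid point, day of change, mean shift in °C), as listed on the slide.
SLIDE_SHIFTS = [(1, 8_000, +0.5), (2, 16_000, -0.5), (3, 24_000, +0.5), (4, 30_000, -0.5)]

def inject_shifts(base, shifts):
    """Copy of `base` (days x grid points) with persistent step changes in the mean.

    Assumptions (hypothetical): grid points are numbered from 1, days are 0-based
    row indices, and each shift lasts from its day to the end of the series."""
    x = np.array(base, dtype=float, copy=True)
    for point, day, delta in shifts:
        x[day:, point - 1] += delta
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for the model output: noise long enough to contain day 30,000.
    base = rng.normal(size=(32_000, 4))
    x = inject_shifts(base, SLIDE_SHIFTS)
    print((x - base)[[7_999, 8_000, 31_999]])  # shifts in force on those days
```

Knowing the true change days makes it possible to check the detected breakpoints against them directly.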
Growing the tree
Detected breakpoints: days 8,005 and 29,985
[Figure: p-values for grid points 1 to 4]
Identifying the responsible station
Performance of the method
Multivariate time series were generated from regional climate models under different scenarios:
• number of stations in the influence set and distances between them
• kind and magnitude of changes in distributions
5 breakpoints were placed at random locations (separated by at least 5 years), i.e., 6 different regimes were artificially created, with a mean expected duration of 20 years.
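Random breakpoint locations with a minimum separation can be drawn by simple rejection sampling, sketched below. The slide does not state how the locations were drawn, so this is an assumption. It further assumes that 5 years means 5 × 365 days and that the first and last regimes must also last at least that long. `random_breakpoints` is a hypothetical name.

```python
import numpy as np

def random_breakpoints(n_days, n_breaks=5, min_gap=5 * 365, rng=None, max_tries=10_000):
    """Sorted breakpoint days such that all regimes, including the first and the
    last, last at least min_gap days (rejection sampling)."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(max_tries):
        b = np.sort(rng.choice(np.arange(1, n_days), size=n_breaks, replace=False))
        regime_lengths = np.diff(np.concatenate([[0], b, [n_days]]))
        if np.all(regime_lengths >= min_gap):
            return b
    raise RuntimeError("no valid configuration found; reduce min_gap or n_breaks")
```

Rejection sampling keeps the accepted configurations uniformly distributed over all valid ones. For 5 breakpoints in a 120-year series with 5-year gaps, a sizeable fraction of draws is accepted, so the loop ends quickly.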
The procedure is repeated 20 times, so that 100 breakpoints are to be detected under the same conditions.
Performance of the method was evaluated using the area under the ROC curve (AUC).
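The AUC can be computed without drawing the ROC curve, using the Mann-Whitney identity: it is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The slides do not say how cases were defined. A natural assumption is that each candidate day is labelled as a true breakpoint or not and scored by the detector, e.g. by 1 − p-value. `auc` is a hypothetical name.

```python
import numpy as np

def auc(scores, labels):
    """AUC via the Mann-Whitney identity: P(score of a positive > score of a
    negative), counting ties as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative case")
    greater = np.count_nonzero(pos[:, None] > neg[None, :])
    ties = np.count_nonzero(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

if __name__ == "__main__":
    print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 3 of 4 pairs ordered correctly: 0.75
```

An AUC of 1 means every true breakpoint outscores every non-breakpoint; 0.5 is no better than chance.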
Performance increases with the amount of information (number of stations, closeness of stations) and with the size/length of the change.
Conclusions
We have developed a methodology that
• is automated and does not require expert knowledge as input
• uses information from multiple stations simultaneously
• detects several breakpoints per station
• evaluates the significance of each breakpoint
• identifies the kind of change/inhomogeneity (mean, variance, etc.)
• makes no distributional assumptions
• accounts for dependence in the climatic data
• is based on robust estimators
Code developed in R.
Remarks
The methodology can be used for any continuous variable, such as atmospheric pressure, humidity or heliophany (sunshine duration).
Detecting breakpoints in precipitation TS requires an adaptation, because
1. precipitation is less smooth than temperature, both spatially and temporally
2. precipitation data carry two pieces of information: whether rain occurred (yes/no) and, given that it occurred, its intensity
Thank you!