
www.aclimatesectoragropecuariocolombiano.org

RClimTool USER MANUAL

By Lizeth Llanos Herrera, Statistics student

This tool is designed to support the automated processing and analysis of climatic series under the CIAT-MADR agreement. It is not intended to compete with or supplant tools developed by other entities. Rather, we seek collaborative, ongoing feedback between methodologies.


RClimTool is designed to facilitate statistical analysis, quality control, filling of missing data, homogeneity analysis and the calculation of indicators for daily weather series of maximum temperature, minimum temperature and precipitation.

INSTALLING AND RUNNING R

The tool was developed in the R language, so you have to install R first, specifically version 2.15.0, which can be downloaded from the following link: http://cran.r-project.org/bin/windows/base/old/2.15.0/

Once R is installed and you start it, the following window will appear:


INSTALLING AND RUNNING RClimTool

To run the application interface, load the source code as shown in the following figure:

Once the code has loaded successfully, the following GUI will appear:

The previous figure shows the main window of the tool, which is divided into modules, each accessed from a panel on the left side of the interface. The modules are described in the sections below.
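The source code can presumably also be loaded from the R console with source(). The sketch below first reports the running R version and then loads the script; the file name RClimTool.R is a hypothetical placeholder, since the manual does not name the file.

```r
# Report the running R version; the manual targets R 2.15.0
message("Running ", R.version.string)
if (getRversion() != "2.15.0") {
  message("Note: the manual specifies R 2.15.0.")
}

# Hypothetical file name: replace with the actual path to the RClimTool source file
rclimtool_script <- "RClimTool.R"
if (file.exists(rclimtool_script)) {
  source(rclimtool_script)
} else {
  message("File not found: ", rclimtool_script)
}
```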


WHAT DOES RClimTool DO?

RClimTool offers different analysis options, with the aim of providing a single application that brings together everything needed to perform a comprehensive study of climate data.

To illustrate the functions of each module, the following sections demonstrate the analysis of daily weather series for maximum temperature, minimum temperature and precipitation from 10 meteorological stations.

1. Data reading

In the data reading module you will find buttons that allow you to read and load the data files for the variables of interest. Important: Do not use accents or the letter "ñ" in the names of folders and files to be used with the tool, as this causes conflicts in the application.

The Change Directory button (1) lets you select the directory from which the files will be loaded. This is also the location where all outputs of the application are saved.

Figure 1: Data reading

Part (2) of Figure 1 contains the buttons that allow you to load the information for each variable. For example, clicking Maximum Temperature opens a popup window in which you can locate the file that contains the daily maximum temperatures of the different stations. Repeat this procedure for every other variable that needs to be analyzed.


Figure 2: Example file selection

In this window, browse to the location of the file you want to load, select the file and then click OK, as shown in Figure 2. Remember to close the popup window each time a different variable is loaded.

Important: The input data format is specified in Appendix A.

2. Graphical and descriptive analysis

Once the data for all variables to be analyzed have been loaded, proceed to the descriptive analysis of each of them. You can optionally specify the analysis period, which is useful if you want to analyze only a section of the series, e.g. March-1990 to January-1991. To analyze the full data set, leave these fields empty.
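Outside the GUI, restricting a series to a period amounts to a date filter. The following is a minimal sketch in base R, assuming a data frame laid out as in Appendix A (day, month, year, then one column per station); the station name and the values are hypothetical.

```r
# Hypothetical daily maximum temperatures, laid out as in Appendix A
dates <- seq(as.Date("1990-01-01"), as.Date("1991-12-31"), by = "day")
tmax <- data.frame(
  day      = as.integer(format(dates, "%d")),
  month    = as.integer(format(dates, "%m")),
  year     = as.integer(format(dates, "%Y")),
  StationA = round(28 + 3 * sin(2 * pi * seq_along(dates) / 365), 1)
)

# Rebuild a Date vector from the day, month and year columns
tmax_dates <- as.Date(sprintf("%d-%02d-%02d", tmax$year, tmax$month, tmax$day),
                      format = "%Y-%m-%d")

# Keep March 1990 to January 1991 (inclusive)
in_period <- tmax_dates >= as.Date("1990-03-01") &
  tmax_dates <= as.Date("1991-01-31")
tmax_period <- tmax[in_period, ]

nrow(tmax_period)
```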

Figure 3: Example descriptive analysis

[Figure callouts: popup window; period of analysis]


After selecting the variable to be analyzed as shown in Figure 3, click the Descriptives button; the results appear in the R console (see Figure 4).

Figure 4: Descriptive analysis
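The manual does not list which statistics the Descriptives button prints. As a rough illustration of a per-station descriptive summary, the sketch below computes a few common ones in base R; the choice of statistics, the station names and the values are all assumptions.

```r
# Hypothetical daily values for two stations; NA marks missing data
tmax <- data.frame(
  StationA = c(30.1, 31.4, NA, 29.8, 32.0, 30.5),
  StationB = c(24.3, 25.1, 24.8, NA, NA, 26.0)
)

# Summary statistics for one station, ignoring missing values
describe <- function(x) {
  c(n       = sum(!is.na(x)),
    missing = sum(is.na(x)),
    min     = min(x, na.rm = TRUE),
    mean    = mean(x, na.rm = TRUE),
    median  = median(x, na.rm = TRUE),
    max     = max(x, na.rm = TRUE),
    sd      = sd(x, na.rm = TRUE))
}

# One column of statistics per station
descriptives <- sapply(tmax, describe)
round(descriptives, 2)
```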

For graphical analysis, the tool can automatically generate several types of graphs for all variables. To work with monthly climatological information (monthly average temperature and monthly total precipitation), select the Monthly Analysis Type option, then click any of the buttons (Plot Charts, Graphs Scatter plots or Boxplot). A message with the location of the generated graphs will appear (see Figure 5).

Figure 5: Automatic graphical analysis
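The monthly aggregation described above (mean for temperature, total for precipitation) can be sketched in base R as follows. This only illustrates the idea and is not RClimTool's own code; the column names and values are hypothetical.

```r
# Hypothetical daily series for one station covering two months
dates <- seq(as.Date("2000-01-01"), as.Date("2000-02-29"), by = "day")
daily <- data.frame(
  year  = as.integer(format(dates, "%Y")),
  month = as.integer(format(dates, "%m")),
  tmax  = 30 + (seq_along(dates) %% 5),                  # degrees Celsius
  prec  = rep(c(0, 2, 10), length.out = length(dates))   # mm
)

# Monthly mean for temperature, monthly total for precipitation
# (the formula interface of aggregate() drops rows with NA by default)
monthly_tmax <- aggregate(tmax ~ year + month, data = daily, FUN = mean)
monthly_prec <- aggregate(prec ~ year + month, data = daily, FUN = sum)

# Combine both summaries by year and month
merge(monthly_tmax, monthly_prec)
```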

[Figure callouts: option for monthly graphs; R console]


Another option is custom graphics. Clicking the Custom Graphics button in the module opens a window in which the x and y arguments must be specified; the corresponding variables can be chosen from a dropdown list.

Other attributes, such as the title, axis labels and color, can be used to customize the graph (for more information on the graph attributes, click the Help button). Once the variables are selected and the attributes set, click OK and a new window will display the graph (see Figure 6).

Figure 6: Custom graphics

3. Quality control

An important aspect of the analysis of climate data is quality control, which applies criteria and/or filters to identify unreasonable and/or erroneous data.

Figure 7: Quality control


Figure 7 shows the Quality Control module. It contains editable fields that must be filled in by the user, for example the number of standard deviations, a useful criterion for identifying outliers in a series (the default is 3). The range of the variable must be specified according to the plausible values that the variable can take.

Clicking the Validate button opens a window indicating the status of each station with respect to the range set for the variable. The criteria reported in the console are (see Figure 8):

% Atypical data: Defined as the percentage of data that fall outside the range $[\bar{x} - k s,\ \bar{x} + k s]$, where $\bar{x}$ and $s$ are the sample mean and sample standard deviation of the variable being validated, and $k$ is the number of standard deviations set in the module (default 3). Note: This criterion is not suitable for the precipitation variable, which usually has an asymmetric distribution.

% Data out of range: Indicates the percentage of data that fall outside the limits defined for the range of the variable. The data identified by this criterion are replaced automatically by NA.

% Data tmax < tmin: Calculated only for temperatures; indicates the percentage of data for which the maximum temperature was lower than the minimum temperature on the same date. The data identified by this criterion are replaced automatically by NA.

% Data variation ≥ 10 (TM_10): Calculated only for temperature variables; indicates the percentage of days on which the temperature varied by 10 °C or more relative to another day's value.

% Consecutive data: Identifies runs of identical values lasting more than five consecutive days in the analyzed time series; these values are replaced by NA.

Figure 8: Criteria for the quality control
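To make the criteria above concrete, the sketch below computes each percentage for a single hypothetical station in base R. It is not RClimTool's own code: the data, the range limits (0 to 40 °C) and the helper names are illustrative, and TM_10 is interpreted here, as an assumption, as a change of 10 °C or more from the previous day.

```r
# Hypothetical daily temperatures for one station (degrees Celsius)
tmax <- c(30, 31, 29, 45, 30, 30, 30, 30, 30, 30, 18, 29)
tmin <- c(20, 21, 19, 22, 31, 20, 19, 21, 20, 22, 17, 19)

# Values outside [mean - k*sd, mean + k*sd]
atypical <- function(x, k = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  x < m - k * s | x > m + k * s
}

# Values outside the user-defined range of the variable
out_of_range <- function(x, lower, upper) x < lower | x > upper

# Assumed reading of TM_10: change of 10 degrees or more from the previous day
jump10 <- function(x) c(FALSE, abs(diff(x)) >= 10)

# Runs of identical values longer than max_run consecutive days
long_runs <- function(x, max_run = 5) {
  r <- rle(x)
  rep(r$lengths > max_run, r$lengths)
}

# Percentage of flagged values
pct <- function(flag) round(100 * mean(flag, na.rm = TRUE), 1)

qc <- c(
  atypical     = pct(atypical(tmax)),
  out_of_range = pct(out_of_range(tmax, lower = 0, upper = 40)),
  tmax_lt_tmin = pct(tmax < tmin),
  tm_10        = pct(jump10(tmax)),
  consecutive  = pct(long_runs(tmax))
)
qc

# Criteria that the manual says are replaced automatically by NA
tmax_clean <- tmax
tmax_clean[out_of_range(tmax, 0, 40) | tmax < tmin | long_runs(tmax)] <- NA
```

With only twelve values the three-standard-deviation rule flags nothing here, which also shows why that criterion needs a reasonably long series.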

For the atypical data and TM_10 filters, a separate Excel file is created for each station. Each file lists the flagged values together with their dates. It is up to the user to replace the data identified by these filters with NA. This must be done manually in the files generated in the Missing Data folder, which are available once the Quality Control has been completed for all variables (see Figure 9).


[Figure 9 callouts: folders with the files of unreasonable and/or erroneous data for each station; files in which the data identified in the Quality Control should be replaced by NA]

Figure 9: Identification and replacement of unreasonable data by NA's

Figure 10: Creating the preliminary report

Clicking the button generates a preliminary report, which is created automatically as a Word file. The report includes a preliminary descriptive analysis and the criteria generated in the Quality Control module, supplemented by the graphs produced by the application. It is stored in the directory shown in the popup window (see Figure 10).

4. Missing data

Missing data are filled using the R package RMAWGEN, which fills the gaps based on an estimated vector autoregressive (VAR) model. Importantly, this methodology is useful when the percentage of NA values is low and when the series from the different stations are related and do not show much variability.

For this module it is essential that the maximum temperature, minimum temperature and precipitation data of the stations cover the SAME PERIOD, because the variables interact with each other to complete the missing data.
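Before running this module it may help to confirm both conditions: that the three variables cover the same dates and that the share of NA values is low. The following is a minimal base R sketch under the Appendix A layout; the data and helper names are hypothetical, and it does not call RMAWGEN.

```r
# Hypothetical inputs in the Appendix A layout (day, month, year, stations)
make_series <- function(from, to, values) {
  d <- seq(as.Date(from), as.Date(to), by = "day")
  data.frame(day      = as.integer(format(d, "%d")),
             month    = as.integer(format(d, "%m")),
             year     = as.integer(format(d, "%Y")),
             StationA = rep(values, length.out = length(d)))
}
tmax <- make_series("2000-01-01", "2000-12-31", c(30, 31, NA, 29))
tmin <- make_series("2000-01-01", "2000-12-31", c(20, 21, 19, 18))
prec <- make_series("2000-01-01", "2000-12-31", c(0, 5, 0, NA))

# One text key per record, built from the date columns
date_key <- function(df) paste(df$year, df$month, df$day, sep = "-")

# TRUE only if the three variables cover exactly the same dates, in the same order
same_period <- identical(date_key(tmax), date_key(tmin)) &&
  identical(date_key(tmax), date_key(prec))

# Percentage of missing values per station column
pct_na <- function(df) {
  round(100 * colMeans(is.na(df[, -(1:3), drop = FALSE])), 1)
}

same_period
pct_na(tmax)
```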


Figure 11: Filling missing data

Figure 11 shows the fields that must be specified to fill the missing data. Click the complete data button to start; the process can take several minutes.

Once the process has finished, a window appears indicating that it is complete. The Missing Data folder will then contain data files for each variable and graphs comparing the original series with the generated series (see Figure 12).

Figure 12: Location of the missing data files

[Figure 12 callouts: folders with graphical outputs; generated data files (no missing data)]


5. Homogeneity analysis of the series

In this module, several statistical tests are implemented to analyze the homogeneity of the series:

Normality tests: These tests check whether the data of the variable under study come from a normal distribution. If this assumption holds, parametric tests should be used; otherwise, non-parametric tests are required.

Stationarity (trend): Spearman’s rank correlation* and the Mann-Kendall test are proposed. For future estimates it is necessary that the stationarity assumption is met.

Stability in variance: The F-test* is applied to subsets of the data.

Stability in mean: Includes the t-test* and the Mann-Whitney U test as a non-parametric alternative to the t-test, using the median as a more robust statistic than the mean.

Note: Tests marked with * require the normality assumption to be met.
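The tests listed above all have counterparts in base R, which the sketch below applies to one hypothetical series. Several choices are assumptions rather than documented RClimTool behaviour: Shapiro-Wilk as the normality test, Kendall's tau against time as a stand-in for the Mann-Kendall trend test (it is based on the same statistic), and a split into first and second half as the "subsets" for the variance and mean tests.

```r
set.seed(1)

# Hypothetical annual mean tmax for one station, with a mild upward trend
years <- 1981:2010
x <- 28 + 0.05 * (years - 1981) + rnorm(length(years), sd = 0.4)

alpha <- 0.05

# First half versus second half of the series (assumed split)
half <- seq_along(x) <= length(x) / 2

p_values <- c(
  normality    = shapiro.test(x)$p.value,
  spearman     = cor.test(x, years, method = "spearman")$p.value,
  kendall      = cor.test(x, years, method = "kendall")$p.value,
  f_test       = var.test(x[half], x[!half])$p.value,
  t_test       = t.test(x[half], x[!half])$p.value,
  mann_whitney = wilcox.test(x[half], x[!half])$p.value
)

# p-value and decision at the chosen significance level, one row per test
data.frame(p_value = round(p_values, 4), reject_H0 = p_values < alpha)
```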

Figure 13 shows some of the results obtained with this module. In this example, the variable tmax and a significance level of 5% were used. The console displays a table for each test, which includes the p-value and the decision at the chosen significance level for each station.

Figure 13: Homogeneity analysis


This module provides the option to generate a report that summarizes all statistical tests included in the homogeneity analysis. To do so, click the Generate Report button.

6. Indicator calculations

The following sub-modules are available for indicator calculations:

Annual indicators: The number of days per year that meet the specified condition (Higher than or Lower than) is calculated. The threshold value defining the condition is chosen by the user.

Monthly indicators: For this sub-module, the monthly maximum, minimum or average of the temperature or precipitation data is calculated.

To perform these calculations, first select the period and the variable to be analyzed. Next, select the indicator of interest by ticking its checkbox. Finally, Excel files with the calculated indicators are generated in the Indicators folder (see Figure 14).

Figure 14: Calculation of annual and monthly indicators
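Both kinds of indicator reduce to grouped summaries of the daily series. The sketch below shows one annual indicator (days above a threshold) and one monthly indicator (monthly maximum) in base R; the series and the threshold of 30 °C are hypothetical, and writing the results to Excel is left out.

```r
# Hypothetical two-year daily maximum temperature series (degrees Celsius)
dates <- seq(as.Date("2000-01-01"), as.Date("2001-12-31"), by = "day")
tmax  <- 28 + 4 * sin(2 * pi * as.integer(format(dates, "%j")) / 365)
year  <- format(dates, "%Y")
month <- format(dates, "%m")

# Annual indicator: number of days per year above a user-chosen threshold
threshold <- 30
days_above <- tapply(tmax > threshold, year, sum, na.rm = TRUE)

# Monthly indicator: monthly maximum, one row per year and one column per month
monthly_max <- tapply(tmax, list(year, month), max, na.rm = TRUE)

days_above
round(monthly_max, 1)
```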

7. ENSO condition (El Niño/Southern Oscillation)

RClimTool includes information on ENSO conditions from 1950 to 2013, available at monthly (1) or quarterly (2) intervals (see Figure 15). After selecting the period of interest, click the query of interest and the results will appear in the R console (see Figure 16).


Figure 15: ENSO condition

Figure 16: Example ENSO condition query

KNOWN ISSUES

One problem identified in this version concerns the Missing Data module: to carry out the data filling, the date range of the variables must run from 1 January of the first year of analysis to 31 December of the final year.
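This constraint is easy to check before starting the data filling. The following is a small sketch in base R; the function name covers_full_years is a hypothetical helper, not part of RClimTool.

```r
# TRUE if the dates run from 1 January of the first year
# to 31 December of the last year
covers_full_years <- function(dates) {
  dates <- sort(dates)
  first <- dates[1]
  last  <- dates[length(dates)]
  format(first, "%m-%d") == "01-01" && format(last, "%m-%d") == "12-31"
}

ok  <- seq(as.Date("1990-01-01"), as.Date("1992-12-31"), by = "day")
bad <- seq(as.Date("1990-03-01"), as.Date("1992-12-31"), by = "day")

covers_full_years(ok)
covers_full_years(bad)
```

The first call passes and the second fails, because the second series starts in March.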

REPORT PROBLEMS

Please report any problems to Lizeth Llanos ([email protected]) and David Arango ([email protected]), including screenshots of the error messages and the data used for the analysis. We also appreciate any suggestions that help improve the tool.


APPENDIX A: INPUT DATA FORMAT

Files must be in CSV format (comma delimited). You must supply a separate data file for each variable, containing the stations to be analyzed. These files must meet the following requirements:

1. Columns in the following sequence: day, month, year, followed by the names of the stations. NOTE: precipitation units = mm and temperature units = degrees Celsius.

2. Missing data must be coded as NA. Data records must be in chronological order, and missing dates are not allowed.

Example input data format for RClimTool:

Figure 17: Precipitation variable input format

[Figure 17 callout: station names]


Figure 18: Maximum temperature variable input format

Figure 19: Minimum temperature variable input format
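Since the figures are not reproduced here, the sketch below shows what a file in this format might look like and checks it against the requirements listed in this appendix. The header names (day, month, year), the station names and the values are assumptions; check_input is a hypothetical helper, not an RClimTool function.

```r
# Hypothetical precipitation file content (mm); header names are an assumption
csv_text <- "day,month,year,StationA,StationB
1,1,2000,0.0,2.5
2,1,2000,NA,0.0
3,1,2000,12.3,1.1"

prec <- read.csv(text = csv_text)

check_input <- function(df) {
  # Invalid calendar dates become NA because the format is given explicitly
  dates <- as.Date(sprintf("%d-%02d-%02d", df$year, df$month, df$day),
                   format = "%Y-%m-%d")
  valid <- !any(is.na(dates))
  c(
    date_columns_first = identical(names(df)[1:3], c("day", "month", "year")),
    valid_dates        = valid,
    # Consecutive days only: chronological order and no missing dates
    no_gaps_in_order   = valid && all(diff(dates) == 1),
    stations_numeric   = all(sapply(df[, -(1:3), drop = FALSE], is.numeric))
  )
}

check_input(prec)
```

Removing a row, for example with check_input(prec[-2, ]), makes the no_gaps_in_order check fail, which corresponds to the "missing dates are not allowed" rule.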
