
Discovering linkages between catchment characteristics and water quality using catchment classification

Ankit Deshmukh1, Riddhi Singh2, Ashok Samal3

1 Indian Institute of Technology Hyderabad, India

2 Indian Institute of Technology Bombay, India

3 University of Nebraska, Lincoln, USA

Water Future Conference 2019.

Classification helps to organize knowledge


Adapted and recreated from Whittaker, R.H., 1969. New concepts of kingdoms of organisms. Science, 163(3863), pp. 150-160.

Adapted from Whittaker, R.H., 1975. Communities and Ecosystems.


Classification helps to guide modelling studies in ungauged catchments and is also used to understand regional drivers of hydrology


Pechlivanidis and Arheimer, 2015, Hydrology and Earth System Sciences.

Figure panels: Clustering based on flow signatures; Clustering based on physio-climatic characteristics


However, key gaps remain:

• clustering of variables other than streamflow-quantity-derived metrics;
• the multiplicity of clustering algorithms;
• the relative utility of clustering, modelling and correlation studies in disentangling catchment behaviour.


In this study, we:

• address clustering of variables other than streamflow-quantity-derived metrics by exploring the value of classification in understanding the drivers of water quality;
• address the multiplicity of clustering algorithms by exploring the impact of different algorithmic choices on the results;
• develop a generic framework to perform classification using multiple algorithms.


A classification study to reveal how water quality metrics are explained by different catchment properties


Figure: Classification workflow. Catchment properties and water quality metrics → data cleaning over a common period → independent metrics/properties → groupings based on properties (explanatory variables) and groupings based on water quality indicators (target variables) → clustering algorithms → clusters of water quality/catchment properties → similarity metrics → selection of high-performing clusters.

Deshmukh et al., In Preparation


a. We develop a physio-climatic characteristics dataset for 567 Indian catchments [CC].


Category | No. of properties
Climate | 39
Geology | 16
Hydrology | 9
Land cover | 48
Land use | 19
Socio-economic | 8
Soil | 38
Topography | 14

b. Water quality data are obtained from WRIS India for 358 catchments [WQ]. The water quality dataset is monthly, with 33 indicators; data availability varies from case to case.

c. We find 254 catchments common to both datasets (CC and WQ). We further reduce this set based on data availability in the water quality dataset.
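The selection in steps a to c can be illustrated with a minimal Python sketch. It assumes catchments are identified by string IDs and that each water quality record is a list of monthly values with `None` marking missing months; the function name `common_catchments` and the `min_fraction` completeness cut-off are hypothetical, since the slides do not state the availability rule that was applied.

```python
from typing import Dict, Iterable, List, Optional


def common_catchments(cc_ids: Iterable[str],
                      wq_series: Dict[str, List[Optional[float]]],
                      min_fraction: float) -> List[str]:
    """Return catchments present in both datasets (CC and WQ) whose monthly
    water quality series has at least `min_fraction` of months observed.

    `min_fraction` is a hypothetical availability cut-off.
    """
    shared = set(cc_ids) & set(wq_series)  # catchments in both CC and WQ
    kept = []
    for cid in sorted(shared):
        series = wq_series[cid]
        observed = sum(value is not None for value in series)  # non-missing months
        if series and observed / len(series) >= min_fraction:
            kept.append(cid)
    return kept
```

For example, `common_catchments(["A", "B", "C"], {"B": [1.0, None, 2.0, 3.0], "C": [None, None, 1.0, None]}, 0.5)` keeps only `"B"`, the sole shared catchment with at least half of its months observed.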


We clean the water quality dataset and shortlist 6 indicators with 88 catchments across


Deshmukh et al., In Preparation


Grouping of water quality indices and catchment characteristics

Grouping name: indicators

1. Basic: Temperature, pH, electrical conductivity
2. BaK: Temperature, pH, electrical conductivity, Potassium
3. BaSO4: Temperature, pH, electrical conductivity, Sulphate
4. BaCO3: Temperature, pH, electrical conductivity, Carbonate
5. KSOCO: Potassium, Sulphate, Carbonate
6. PhEKS: pH, electrical conductivity, Potassium, Sulphate
7. PhEKC: pH, electrical conductivity, Potassium, Carbonate
8. AllWQ: Temperature, pH, electrical conductivity, Potassium, Sulphate, Carbonate


Grouping name: indicators

1. Climate: Mean annual precipitation, temperature, mean January precipitation
2. GeolSl: Unconsolidated sediments, basic volcanic rock, siliciclastic sedimentary rock, acid plutonic rock, average silt concentration
3. Hydrol: Topographic wetness index, stream density, first-order streams, fourth-order streams
4. LndCov: Agriculture, barren land
5. LndUse: Village, residential cropland
6. Topo: Mean elevation of basin, aspect (degrees), aspect northness, slope
7. Pop: Population, agriculture, GDP
8. AllNat: All natural properties, excluding human-related properties
9. AllHum
10. AllCC: All 25 catchment characteristics
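The two grouping tables define which variables enter each clustering run. A hedged sketch of how the target (water quality) and explanatory (catchment) groupings might be paired for comparison is given below; the column names are hypothetical abbreviations, and the composite groupings (AllNat, AllHum, AllCC) are omitted because their members are not fully listed.

```python
from itertools import product

# Target groupings (water quality indicators); column names are hypothetical.
WQ_GROUPS = {
    "Basic": ["temp", "ph", "ec"],
    "BaK": ["temp", "ph", "ec", "k"],
    "BaSO4": ["temp", "ph", "ec", "so4"],
    "BaCO3": ["temp", "ph", "ec", "co3"],
    "KSOCO": ["k", "so4", "co3"],
    "PhEKS": ["ph", "ec", "k", "so4"],
    "PhEKC": ["ph", "ec", "k", "co3"],
    "AllWQ": ["temp", "ph", "ec", "k", "so4", "co3"],
}

# Explanatory groupings (catchment characteristics); composite groupings
# (AllNat, AllHum, AllCC) are left out of this sketch.
CC_GROUPS = {
    "Climate": ["mean_annual_precip", "temperature", "mean_jan_precip"],
    "GeolSl": ["unconsolidated_sed", "basic_volcanic", "siliciclastic_sed",
               "acid_plutonic", "avg_silt"],
    "Hydrol": ["twi", "stream_density", "first_order", "fourth_order"],
    "LndCov": ["agriculture", "barren_land"],
    "LndUse": ["village", "residential_cropland"],
    "Topo": ["mean_elevation", "aspect_deg", "aspect_northness", "slope"],
    "Pop": ["population", "agriculture", "gdp"],
}


def select(record, columns):
    """Extract one catchment's feature vector for a grouping."""
    return [record[c] for c in columns]


def grouping_pairs():
    """Every (water quality grouping, catchment grouping) pair to compare."""
    return list(product(WQ_GROUPS, CC_GROUPS))
```

Each pair would be clustered twice (once on the water quality indicators, once on the catchment properties) and the two labelings compared with a similarity metric.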


Clustering methods used in the study


Category of algorithm | Algorithm name | Features

Centroid based | K-means, Partitioning around medoids (PAM) | Each cluster is represented by a central point (the mean for K-means, a medoid, i.e. an actual data point, for PAM), and objects are assigned to the nearest centre so that the distance to the cluster centre is minimized (squared Euclidean distance for K-means).

Connectivity based | Hierarchical clustering (Hclust) | Connectivity-based clustering, also known as hierarchical clustering, builds a hierarchy of nested clusters.

Distribution based | Gaussian mixture models (GMMs) | Distribution-based clustering models the data as a mixture of probability distributions (here Gaussians) whose parameters are estimated iteratively; each record is assigned to the component most likely to have generated it.

Membership based | Fuzzy C-means (FCM) | Membership grades are assigned to each data point. These grades indicate the degree to which the point belongs to each cluster.

Dimensionality reduction | Spectral clustering (Specc) | Spectral clustering uses the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions.
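As an illustration of the centroid-based family, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not the implementation used in the study, which the slides do not specify; seeding with the first k points is a simplification chosen to keep the example deterministic, whereas practical implementations use random or k-means++ seeding with several restarts.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: List[Point], k: int,
           iters: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Alternate nearest-centroid assignment and centroid (mean) update
    until the assignments stop changing or `iters` is reached."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]  # simplified deterministic seeding
    labels = [-1] * len(points)
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], 2)` separates the two pairs of points, giving labels `[0, 0, 1, 1]` and centroids `[[0.0, 0.5], [10.0, 10.5]]`.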


The optimum number of clusters is selected using a 0.75 intra- to inter-cluster distance criterion


Good clustering: minimize intra-cluster distance and maximize inter-cluster distance.
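The slides do not define the 0.75 criterion precisely. One plausible reading, sketched below purely as an assumption, computes the ratio of the mean point-to-centroid (intra-cluster) distance to the mean centroid-to-centroid (inter-cluster) distance and picks the smallest number of clusters whose ratio is at or below 0.75. The labelings for each candidate k would come from any of the clustering algorithms; the function names are hypothetical.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Optional, Sequence

Point = Sequence[float]


def _centroids(points: List[Point], labels: List[int]) -> Dict[int, List[float]]:
    """Mean position of each labelled cluster."""
    groups: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: [sum(col) / len(ms) for col in zip(*ms)]
            for lab, ms in groups.items()}


def intra_inter_ratio(points: List[Point], labels: List[int]) -> float:
    """Mean point-to-own-centroid distance divided by the mean
    centroid-to-centroid distance (hypothetical reading of the criterion)."""
    cents = _centroids(points, labels)
    if len(cents) < 2:
        raise ValueError("need at least two clusters")
    intra = sum(dist(p, cents[lab]) for p, lab in zip(points, labels)) / len(points)
    pairs = list(combinations(cents.values(), 2))
    inter = sum(dist(a, b) for a, b in pairs) / len(pairs)
    return intra / inter


def pick_k(points: List[Point], labelings: Dict[int, List[int]],
           threshold: float = 0.75) -> Optional[int]:
    """Smallest candidate k whose intra/inter ratio is at or below threshold."""
    for k in sorted(labelings):
        if intra_inter_ratio(points, labelings[k]) <= threshold:
            return k
    return None  # no candidate met the criterion
```

A lower ratio means compact clusters that are far apart, matching the "minimize intra, maximize inter" goal; scanning k upward returns the most parsimonious clustering that meets the threshold.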

Deshmukh et al., In Preparation

Six metrics as external measures of clustering similarity

• Adjusted Rand index (ARI): higher is better
• Fowlkes-Mallows index (FMI): higher is better
• Jaccard index (JI): higher is better
• Normalized variation of information (NVI): lower is better
• Normalized mutual information (NMI): higher is better
• Purity: higher is better

We use 4 external measures.
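All six metrics can be computed from the contingency table of two labelings of the same catchments. The plain-Python sketch below is one possible implementation, not the study's code. Normalization conventions differ between references, so the choices here are assumptions: the Jaccard index is the pair-counting version, NMI is normalized by the arithmetic mean of the two entropies, NVI divides the variation of information by the joint entropy, and purity scores the second labeling against the first.

```python
from collections import Counter
from math import comb, log, sqrt


def _pairs(counts):
    """Number of unordered pairs that fall within the same group."""
    return sum(comb(c, 2) for c in counts)


def _entropy(counts, n):
    return -sum((c / n) * log(c / n) for c in counts if c)


def clustering_similarity(labels_a, labels_b):
    """Compare two flat clusterings of the same catchments."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same catchments")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # contingency table n_ij
    count_a = Counter(labels_a)               # row sums
    count_b = Counter(labels_b)               # column sums

    # Pair counting: pairs placed together in both, in a, and in b.
    both = _pairs(joint.values())
    in_a = _pairs(count_a.values())
    in_b = _pairs(count_b.values())
    total = comb(n, 2)

    expected = in_a * in_b / total if total else 0.0
    max_index = 0.5 * (in_a + in_b)
    ari = (both - expected) / (max_index - expected) if max_index != expected else 1.0
    fmi = both / sqrt(in_a * in_b) if in_a and in_b else 0.0
    union = in_a + in_b - both
    ji = both / union if union else 1.0

    # Information-theoretic measures.
    h_a = _entropy(count_a.values(), n)
    h_b = _entropy(count_b.values(), n)
    mi = sum((nij / n) * log(n * nij / (count_a[i] * count_b[j]))
             for (i, j), nij in joint.items())
    nmi = mi / ((h_a + h_b) / 2) if h_a + h_b else 1.0
    h_joint = h_a + h_b - mi
    nvi = (h_a + h_b - 2 * mi) / h_joint if h_joint else 0.0

    # Purity: each cluster of b is credited with its most frequent class in a.
    purity = sum(max(c for (i, j), c in joint.items() if j == jb)
                 for jb in count_b) / n
    return {"ARI": ari, "FMI": fmi, "JI": ji,
            "NMI": nmi, "NVI": nvi, "Purity": purity}
```

Relabeling clusters does not change the scores: comparing `[0, 0, 1, 1, 2, 2]` with `[1, 1, 0, 0, 2, 2]` gives ARI, FMI, JI, NMI and purity of 1 and NVI of 0, up to floating-point rounding.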


Grouping between catchment characteristics and water quality indices


Figure: Normalized mutual information for K-means clustering

Deshmukh et al., In Preparation


A 10-fold cross-validation is performed to identify robust clustering combinations

We perform 10-fold cross-validation, each time using 90% of the catchments.

The inset shows the results for K-means clustering with the normalized mutual information (NMI) similarity measure.

With the selected threshold, the Climate and Basic groupings show agreement in 6 out of 10 cases.
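A hedged sketch of the cross-validation loop follows. It assumes the folds are formed by shuffling the catchment indices and dealing them into 10 parts, so each fold retains roughly 90% of the catchments; `score_fn` (which would re-cluster the retained catchments for both groupings and return a similarity such as NMI) and `threshold` are hypothetical placeholders for the study's unspecified settings.

```python
import random
from typing import Callable, List


def ten_fold_subsets(n: int, folds: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle catchment indices, deal them into `folds` parts, and return for
    each fold the indices that are kept (everything except that part)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    parts = [idx[f::folds] for f in range(folds)]
    return [sorted(i for p, part in enumerate(parts) if p != f for i in part)
            for f in range(folds)]


def agreement_count(n: int, score_fn: Callable[[List[int]], float],
                    threshold: float, folds: int = 10, seed: int = 0) -> int:
    """Number of folds whose similarity score on the retained catchments
    reaches the (hypothetical) threshold."""
    return sum(score_fn(subset) >= threshold
               for subset in ten_fold_subsets(n, folds, seed))
```

With the 88 shortlisted catchments, each fold retains 79 or 80 catchments and every catchment is left out exactly once; the returned count is the kind of figure behind a statement such as 6 out of 10 cases showing agreement.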


We chose combinations that perform well across all clustering methods and similarity metrics


Deshmukh et al., In Preparation

We accumulate the 10-fold cross-validation results for all similarity indices and each clustering algorithm.
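One way to accumulate the per-fold results is sketched below, under the assumption that a combination counts as robust only if it reaches a minimum number of agreeing folds for every clustering algorithm and every similarity metric. The `min_folds` cut-off and the nested-dictionary layout are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Combo = Tuple[str, str]  # (water quality grouping, catchment grouping)
# results[algorithm][metric][combo] = number of agreeing folds (out of 10)
Results = Dict[str, Dict[str, Dict[Combo, int]]]


def robust_combinations(results: Results, min_folds: int) -> List[Combo]:
    """Combinations reaching `min_folds` agreeing folds for every clustering
    algorithm and every similarity metric (hypothetical rule)."""
    robust: Optional[Set[Combo]] = None
    for per_metric in results.values():
        for counts in per_metric.values():
            passing = {combo for combo, n in counts.items() if n >= min_folds}
            robust = passing if robust is None else robust & passing
    return sorted(robust or set())
```

Using intersection rather than an average is a deliberately strict choice: a combination that agrees strongly under one algorithm but fails under another is discarded. With toy counts in which ("Basic", "Climate") falls below the cut-off for one metric, only the remaining combination is returned.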


Result 1: Land cover properties explain the variation in the PhEKC combination of water quality metrics.


ARI: 0.06, FMI: 0.35, JI: 0.20, NMI: 0.14, NVI: 0.93, Purity: 0.62

Deshmukh et al., In Preparation