Discovering linkages between catchment characteristics and water quality using catchment classification
Ankit Deshmukh1, Riddhi Singh2, Ashok Samal3
1 Indian Institute of Technology Hyderabad, India
2 Indian Institute of Technology Bombay, India
3 University of Nebraska, Lincoln, USA
Water Future Conference 2019.
Classification helps to organize knowledge
Adapted and recreated from Whittaker, R.H., 1969. New concepts of kingdoms of organisms. Science, 163(3863), pp.150-160.
Adapted from R.H. Whittaker, Communities and Ecosystems, 1975.
Classification helps to guide modelling studies in ungauged catchments and is also used to understand regional drivers of hydrology
Pechlivanidis and Arheimer, 2015, Hydrology and Earth System Sciences.
Clustering based on flow signatures
Clustering based on physio-climatic characteristics
However, key gaps remain:
Clustering of variables other than streamflow quantity derived metrics
Multiplicity of clustering algorithms
Relative utility of clustering/modelling/correlation studies in disentangling catchment behaviour
In this study, we:
Clustering of variables other than streamflow quantity derived metrics: explore the value of classification in understanding drivers of water quality
Multiplicity of clustering algorithms: explore the impact of different algorithmic choices on results
Develop a generic framework to perform classification using multiple algorithms
A classification study to reveal how water quality metrics are explained by different catchment properties
[Flowchart: catchment properties and water quality metrics → data cleaning, common period → independent metrics/properties → groupings based on properties (explanatory variables) and groupings based on water quality indicator (target variables) → clustering algorithms → clusters of water quality/catchment properties → similarity metrics → select high performing clusters]
Deshmukh et al., In Preparation
a. We develop a physio-climatic characteristics dataset for 567 Indian catchments [CC].
Category          #Properties
Climate           39
Geology           16
Hydrology         9
Land cover        48
Land use          19
Socio-economic    8
Soil              38
Topography        14
b. Water quality data are obtained from WRIS India for 358 catchments [WQ].
The water quality dataset is monthly, with 33 indicators. Data availability differs in each case.
c. We find 254 catchments common to both datasets (CC and WQ).
We further reduce the set of catchments based on data availability in the water quality dataset.
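The matching step in (c) can be sketched as a set intersection followed by a data-availability filter. This is only an illustration: the function name, the `wq_records` layout (catchment id mapped to a count of usable monthly records) and the `min_records` cut-off are assumptions, not details given in the slides.

```python
def common_catchments(cc_ids, wq_records, min_records):
    """Catchments present in both datasets that have enough water quality data.

    cc_ids      -- ids of catchments in the characteristics dataset (CC)
    wq_records  -- hypothetical dict: catchment id -> number of usable WQ records
    min_records -- hypothetical availability cut-off
    """
    # Keep only catchments that appear in both CC and WQ.
    shared = set(cc_ids) & set(wq_records)
    # Drop catchments whose water quality record is too short.
    return sorted(c for c in shared if wq_records[c] >= min_records)
```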
We clean the water quality dataset and shortlist 6 indicators with 88 catchments across
Deshmukh et al., In Preparation
Grouping of water quality indices and catchment characteristics
Grouping name: Indicators
1. Basic: Temperature, pH, electrical conductivity
2. BaK: Temperature, pH, electrical conductivity, Potassium
3. BaSO4: Temperature, pH, electrical conductivity, Sulphate
4. BaCO3: Temperature, pH, electrical conductivity, Carbonate
5. KSOCO: Potassium, Sulphate and Carbonate
6. PhEKS: pH, electrical conductivity, Potassium, Sulphate
7. PhEKC: pH, electrical conductivity, Potassium, Carbonate
8. AllWQ: Temperature, pH, electrical conductivity, Potassium, Sulphate, Carbonate
Grouping name: Indicators
1. Climate: Mean annual precipitation, temperature, mean January precipitation
2. GeolSl: Unconsolidated sediments, basic volcanic rock, siliciclastic sedimentary, acid plutonic rock, average silt concentration
3. Hydrol: Topographic wetness index, stream density, first-order stream, fourth-order stream
4. LndCov: Agriculture, barren land
5. LndUse: Village, Residential cropland
6. Topo: Mean elevation of basin, aspect degrees, aspect northness, slope
7. Pop: Population, agriculture, GDP
8. AllNat: All natural properties (human-related properties excluded)
9. AllHum
10. AllCC: All 25 catchment characteristics
Clustering methods used in the study
Category of algorithm: Algorithm name. Features
Centroid based: K-means, Partitioning around medoids (PAM). Each cluster is represented by a central vector, and objects are assigned to clusters by proximity so that the squared distance from the central vector is minimized.
Connectivity based: Hierarchical (Hclust). Connectivity-based clustering, also known as hierarchical clustering, builds the clusters in a hierarchy.
Distribution based: Gaussian mixture models (GMMs). Distribution-based clustering is an iterative process over the input data: clusters are modelled as Gaussian distributions, and each input record is read in succession.
Membership based: Fuzzy C-means (FCM). Membership grades are assigned to each of the data points (tags). These grades indicate the degree to which a data point belongs to each cluster.
Dimensionality reduction: Spectral clustering (Specc). Spectral clustering techniques use the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions.
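As an illustration of the centroid-based category, here is a minimal sketch of K-means (Lloyd's algorithm) using only the Python standard library. It is not the implementation used in the study. The function name, the random initialisation from the data points and the `seed` parameter are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm. Returns (labels, centroids).

    points -- list of equal-length numeric tuples, one per catchment
    k      -- number of clusters
    """
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids
```

The other categories in the list differ in how they define a cluster. The overall pattern of features in, one label per catchment out, is the same for all of them, which is what lets the clusterings be compared with a common similarity measure.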
The optimum number of clusters is selected with a 0.75 intra- to inter-cluster distance criterion
Good clustering
Minimize intra-cluster distance
Maximize inter-cluster distance
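One possible reading of the 0.75 criterion is sketched below. The slides do not define the ratio, so two assumptions are made: intra-cluster distance is taken as the mean Euclidean distance over pairs in the same cluster and inter-cluster distance as the mean over pairs in different clusters, and the smallest k whose ratio is at or below the threshold is chosen. The function names and the `labelings` input are hypothetical.

```python
import math
from itertools import combinations


def intra_inter_ratio(points, labels):
    """Mean within-cluster pair distance over mean between-cluster pair distance."""
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (intra if lp == lq else inter).append(math.dist(p, q))
    if not intra or not inter:
        raise ValueError("need both within-cluster and between-cluster pairs")
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


def pick_k(points, labelings, threshold=0.75):
    """Smallest k whose labeling has a ratio at or below the threshold.

    labelings -- hypothetical dict: k -> list of cluster labels for `points`,
                 e.g. from running one clustering algorithm for several k.
    Returns None if no k meets the threshold.
    """
    for k in sorted(labelings):
        if intra_inter_ratio(points, labelings[k]) <= threshold:
            return k
    return None
```

A low ratio means compact, well-separated clusters, which matches the two goals listed above.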
Deshmukh et al., In Preparation
Six metrics as external measures of clustering similarity
External measures we use, and what a good similarity looks like:
• Adjusted Rand index (ARI): high
• Fowlkes-Mallows index: high
• Jaccard index: high
• Normalized variation of information (NVI): low
• Normalized mutual information (NMI): high
• Purity: high
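The six measures can be computed from two label lists, for example the cluster labels from a catchment-characteristics grouping and from a water-quality grouping. The sketch below uses standard textbook definitions in plain Python. Two normalisation choices are assumptions, because several variants exist and the slides do not say which was used: NMI is divided by the mean of the two entropies, and NVI by the joint entropy. The code assumes each partition has at least two clusters.

```python
import math
from collections import Counter
from itertools import combinations


def pair_counts(a, b):
    """Count point pairs by whether partitions a and b group them together."""
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(a)), 2):
        same_a, same_b = a[i] == a[j], b[i] == b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def jaccard(a, b):
    both, only_a, only_b, _ = pair_counts(a, b)
    return both / (both + only_a + only_b)


def fowlkes_mallows(a, b):
    both, only_a, only_b, _ = pair_counts(a, b)
    return both / math.sqrt((both + only_a) * (both + only_b))


def adjusted_rand(a, b):
    both, only_a, only_b, neither = pair_counts(a, b)
    total = both + only_a + only_b + neither
    expected = (both + only_a) * (both + only_b) / total
    maximum = ((both + only_a) + (both + only_b)) / 2
    return (both - expected) / (maximum - expected)


def purity(a, b):
    """Share of points falling in the majority b-cluster of their a-cluster."""
    best = Counter()
    for (ca, _), n in Counter(zip(a, b)).items():
        best[ca] = max(best[ca], n)
    return sum(best.values()) / len(a)


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    return sum(
        nij / n * math.log(n * nij / (ca[x] * cb[y]))
        for (x, y), nij in Counter(zip(a, b)).items()
    )


def nmi(a, b):
    # Assumption: normalise by the arithmetic mean of the two entropies.
    return 2 * mutual_information(a, b) / (entropy(a) + entropy(b))


def nvi(a, b):
    # Assumption: variation of information divided by the joint entropy.
    ha, hb, mi = entropy(a), entropy(b), mutual_information(a, b)
    return (ha + hb - 2 * mi) / (ha + hb - mi)
```

Under these definitions, two partitions that are identical up to relabelling give 1 for every measure except NVI, which gives 0. This is why NVI is the only measure in the list where low is good.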
Grouping between catchment characteristics and water quality indices
Normalized mutual information for K-means clustering
Deshmukh et al., In Preparation
A 10-fold cross-validation is performed to identify robust clustering combinations
We performed 10-fold cross-validation, using 90% of the catchments in each fold.
The inset window shows the result for K-means clustering using the normalized mutual information similarity measure.
With the selected threshold, 6 out of 10 folds show agreement for the Climate and Basic groupings.
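The fold counting described here can be sketched as follows. The slides do not show the actual procedure, so this is an assumption about its shape: catchments are shuffled into 10 folds, each fold is left out in turn so that roughly 90% remain, both groupings are clustered on the retained catchments, and a fold counts as agreeing when the similarity reaches the threshold. `cluster_cc`, `cluster_wq` and `similarity` are hypothetical callables supplied by the caller.

```python
import random


def count_agreeing_folds(ids, cluster_cc, cluster_wq, similarity, threshold,
                         n_folds=10, seed=0):
    """Number of folds in which the two clusterings agree.

    cluster_cc, cluster_wq -- hypothetical callables: list of catchment ids
                              -> one cluster label per id
    similarity             -- compares two label lists; higher is assumed to
                              mean more similar (flip the sign for NVI)
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled ids into n_folds nearly equal folds.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    agreeing = 0
    for held_out in folds:
        dropped = set(held_out)
        kept = [c for c in shuffled if c not in dropped]  # roughly 90%
        if similarity(cluster_cc(kept), cluster_wq(kept)) >= threshold:
            agreeing += 1
    return agreeing
```

A return value of 6 with `n_folds=10` would correspond to the "6 out of 10" statement above.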
We chose combinations that perform well across all clustering methods and similarity metrics
Deshmukh et al., In Preparation
We accumulate the 10-fold cross-validation results across all similarity indices for each clustering algorithm.
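The accumulation step can be sketched as a simple tally. The layout of `fold_counts` and the choice to sum agreeing folds per pair of groupings are assumptions for illustration. The slides do not say how the accumulated results are scored or ranked.

```python
from collections import defaultdict


def rank_combinations(fold_counts):
    """Sum agreeing-fold counts per (CC grouping, WQ grouping) pair and rank them.

    fold_counts -- hypothetical dict keyed by
                   (cc_grouping, wq_grouping, algorithm, metric)
                   with the number of agreeing folds as the value
    """
    totals = defaultdict(int)
    for (cc, wq, _algorithm, _metric), n in fold_counts.items():
        totals[(cc, wq)] += n
    # Highest total first: pairs that agree across many algorithms and metrics.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```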