NEW: Surface water quality assessment (PCA/cluster approach, and Piper/Schoeller diagrams) using R

Hicham AMAR · 7 March 2022

Introduction

Today, surface water quality is a serious and sensitive concern worldwide, because surface water is the most important resource for subsistence (Kumar et al., 2018; Varol et al., 2012). Access to clean water has declined in recent years, and it is a crucial limiting factor for economic development and for improving the population's quality of life (Sener et al., 2017).

Environmental contaminants such as heavy metals are the main cause of surface water quality deterioration worldwide (Cengiz et al., 2017). Pollution by heavy metals is a serious problem because these metals i) are toxic, ii) accumulate in the biota, and iii) are not readily biodegradable in the environment (Nasrabadi, 2015).

The natural sources of heavy metals are atmospheric transport and bedrock erosion, while the anthropogenic sources are mainly agriculture, mining, and mineral processing with chemical and metallurgical operations (Prasanna et al., 2012; Vasistha and Ganguly, 2020).

Thirty-four surface water points were sampled in November 2017, and the samples were collected in pre-cleaned plastic bottles. Sampling was carried out without seasonal analysis, to give an overview of the surface water quality in the study area.

Physicochemical parameters such as electrical conductivity (EC), temperature (°C), and pH were measured in situ. The main chemical elements (Mg2+, Ca2+, HCO3–, Na+, K+, Ni2+, Fe2+, Fe3+, Cl–, SO42-, Cd2+, Pb2+, Zn2+, Cu2+) were also determined.

Basic statistics

The summary results of the physicochemical parameters and heavy metal concentrations are shown in Table 1.

Code:

# Install the packages if necessary
pck <- c("readxl","shiny","Factoshiny","FactoMineR","PerformanceAnalytics","pastecs","tidyverse","factoextra")
install.packages(pck)

# Load the packages
library(readxl)
library(shiny)
library(Factoshiny)
library(FactoMineR)  # provides PCA(), CA() and HCPC(), which are used later
library(PerformanceAnalytics)
library(pastecs)
library(tidyverse)
library(factoextra)
# Import the data from a file
## You only need to change the file path
#Data <- read.csv2("C:/Users/hicha/Desktop/Bureau/khdoje/Spatial parameters/Data_Ziz-csv.csv")
Data <- read.csv2("C:/Users/hicha/Desktop/Desktop_Hicham/Bureau/Khadija_Dani-2020/Data_KD.csv")
# Prepare the data for the statistical approach:
# use the first column as row names, then drop it
rownames(Data) <- Data[,1]
Data_Ziz.csv <- Data[,-1]
# Basic statistics
stat.desc(Data_Ziz.csv)

Parameter	Min	Median	Max	Avg	SD	Control standard	Number of observations over control standard
Temp °C	4.0	17.5	49.0	19.0	8.8	<25	6
pH	6.7	7.6	10.0	7.8	0.8	6.5-8.5	5
EC 25°C	110.0	889.2	14380.0	2174.1	3720.5	≤1300	13
Ca2+	22.0	84.0	505.0	111.8	99.9	≤200	3
Mg2+	16.8	47.9	153.8	54.6	34.0	≤50	17
Na+	10.9	66.5	2584.2	251.3	540.0	≤20	31
K+	0.5	2.8	18.8	3.5	3.8	10-15	34
HCO3–	46.4	205.5	549.0	191.0	95.6	≤518	1
CO3–	0.0	0.0	0.0	0.0	0.0	–	0
Cl–	7.6	81.7	4565.7	531.4	1006.5	<300	12
SO42-	37.0	142.2	1442.4	240.2	39.6	<250	7
PO43-	0.0	0.0	0.0	0.0	0.0	≤0.5	0
NO3–	0.1	10.1	39.5	15.0	12.7	10-25	25
Fe2+	0.0	20.4	33.1	19.5	8.4	<5	31
Cd2+	0.0	0.0	0.0	0.0	0.0	≤3	0
Cu2+	0.0	4.5	20.1	5.0	4.8	<0.05	26
Fe3+	0.0	21.9	33.1	18.1	10.6	≤1	27
Ni2+	0.0	0.0	8.9	1.0	2.5	≤20	0
Pb2+	0.0	29.8	283.2	41.9	56.9	≤10	22
Zn2+	0.0	16.3	42.9	14.8	10.7	≤1	28

Table 1: Basic statistics of the physicochemical parameters and metals in the studied samples compared with standard values
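The last column of Table 1 counts the samples that fall outside each control standard. A minimal sketch of that count in base R is given here; the helper `count_over()` and the EC and pH values are made up for illustration and are not part of the original script or data set.

```r
# Hypothetical helper: count the observations outside a permitted range.
# `lower` and `upper` are the bounds of the control standard (inclusive).
count_over <- function(x, lower = -Inf, upper = Inf) {
  x <- x[!is.na(x)]            # ignore missing measurements
  sum(x < lower | x > upper)   # values below or above the permitted range
}

# Made-up EC values (illustration only, not the study data)
ec <- c(110, 640, 889, 1250, 1300, 2100, 14380)
n_over_ec <- count_over(ec, upper = 1300)               # standard: EC <= 1300

# Made-up pH values checked against the 6.5-8.5 range
ph <- c(6.7, 7.2, 7.6, 8.5, 9.1, 10.0)
n_over_ph <- count_over(ph, lower = 6.5, upper = 8.5)
```

For a strict standard such as "<25", the comparison would need `>=` instead of `>`. On the real data, the same helper could be applied column by column to `Data_Ziz.csv`.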

Identification of the water facies

The Piper diagram presents the chemistry of water samples graphically (Piper, 1944). The cations (magnesium, calcium, and sodium plus potassium) and the anions (chloride, sulphate, and hydrogen carbonate plus carbonate) are expressed as percentages and plotted in two separate ternary diagrams, one for the cations and one for the anions. The points of both triangles are then projected onto a rhombus.
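The percentages plotted on the two triangles are computed from concentrations in meq/L rather than mg/L. The sketch below shows this conversion in base R for a single made-up sample; the concentrations are hypothetical, and the equivalent weights are standard molar masses divided by the ionic charge.

```r
# Equivalent weights (g/eq) = molar mass / charge
eq_weight <- c(Ca = 40.078 / 2, Mg = 24.305 / 2, Na = 22.990, K = 39.098,
               Cl = 35.453, SO4 = 96.06 / 2, HCO3 = 61.017)

# Made-up concentrations in mg/L for one sample (illustration only)
mg_l <- c(Ca = 84, Mg = 48, Na = 66, K = 3, Cl = 82, SO4 = 142, HCO3 = 205)

meq <- mg_l / eq_weight[names(mg_l)]   # mg/L -> meq/L

# Cation triangle: Ca, Mg and Na + K as percentages of the total cations
cations <- c(Ca = meq[["Ca"]], Mg = meq[["Mg"]], NaK = meq[["Na"]] + meq[["K"]])
cation_pct <- 100 * cations / sum(cations)

# Anion triangle: Cl, SO4 and HCO3 (carbonate is zero in Table 1)
anions <- meq[c("Cl", "SO4", "HCO3")]
anion_pct <- 100 * anions / sum(anions)
```

Each set of three percentages sums to 100, which is what allows a sample to be placed as a single point in a ternary diagram.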

Code:

# Install the packages
install.packages("remotes")  # needed to install packages from GitHub
remotes::install_github("USGS-R/smwrGraphs")
remotes::install_github("USGS-R/smwrData")
remotes::install_github("USGS-R/smwrBase")

# Load the packages
library(remotes)
library(dplyr)
library(smwrBase)
library(smwrGraphs)
# Data preparation
Data_Ziz.csv
# Display the column names
colnames(Data_Ziz.csv)
# Convert the analysed concentrations to meq/L
df <- transform(Data_Ziz.csv, Ca.meq = conc2meq(Ca, "calcium"), 
                Mg.meq = conc2meq(Mg, "magnesium"),
                Na.meq = conc2meq(Na, "sodium"), 
                Cl.meq = conc2meq(Cl, "chloride"),
                SO4.meq = conc2meq(SO4, "sulfate"),
                HCO3.meq = conc2meq(HCO3, "bicarb"))
df.pd <- df %>% select(Ca.meq, Mg.meq, Na.meq, Cl.meq, SO4.meq, HCO3.meq) %>%
  rename(Ca = Ca.meq, Mg = Mg.meq, Na = Na.meq, Cl = Cl.meq, SO4 = SO4.meq, HCO3 = HCO3.meq)
# Produce the Piper diagram
setPNG("piperplot_demo23.png", 8, 8)
clr21 <- c("yellow", "brown", "pink", "maroon","purple","red","blue", "green","black")
pipr <- with(df.pd, piperPlot(Ca, Mg, Na, Cl, HCO3, SO4, 
                              Plot=list(name=rownames(df.pd), 
                                        color=colorRampPalette(clr21)(34), pch=2)))
graphics.off()
# Produce the legend of the Piper diagram for the 34 samples
setPNG("legend_plot13.png", 8, 8)
plot(NULL ,xaxt='n',yaxt='n',bty='n',ylab='',xlab='', xlim=0:1, ylim=0:1, main = "Code samples")
legend("topleft", legend =c(1:34), ncol = 3,pch=16, pt.cex=1.5, cex=0.7, bty='n',
       col = colorRampPalette(clr21)(34))
graphics.off()
# You can paste the legend onto the Piper diagram figure to obtain the final figure

The Piper diagram shows that the samples from the study area belong to several facies: i) a chloride and sulphate, calcium and magnesium facies for samples 2, 3, 19, 20, 13, 10, 18, 17 and 22; ii) a chloride, sodium and potassium facies for samples 1, 8, 9, 24, 25 and 27; and iii) a bicarbonate, calcium and magnesium facies for samples 28, 30, 31 and 32 (Fig. 1).

Fig. 1: Identification of surface water facies using the Piper diagram

The Schoeller-Berkaloff diagram is a semi-logarithmic plot: the major ions are arranged along the x-axis, the concentration of each ion in mg/L is plotted on the logarithmic y-axis, and the points obtained for each sample are connected by straight segments. The Piper and Schoeller-Berkaloff diagrams were drawn with DIAGRAMME (Version 6.72) (Simler, 2009).
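To make the semi-logarithmic layout concrete, here is a minimal base R sketch of a Schoeller-type plot. It is not the `manager` package function used in this article, and the three samples are made-up values for illustration only.

```r
# Made-up concentrations in mg/L for three samples (illustration only)
ions <- c("Ca", "Mg", "Na", "K", "Cl", "SO4")
conc <- rbind(
  S1 = c(84, 48, 66, 2.8, 82, 142),
  S2 = c(505, 154, 2584, 18.8, 4566, 1442),
  S3 = c(22, 17, 11, 0.5, 8, 37)
)
colnames(conc) <- ions

pdf(NULL)   # null device, so nothing is written to disk
# One line per sample: ions along the x-axis, log-scaled mg/L on the y-axis
matplot(t(conc), type = "b", log = "y", lty = 1, pch = 16, col = 1:3,
        xaxt = "n", xlab = "Ion", ylab = "Concentration (mg/L)")
axis(1, at = seq_along(ions), labels = ions)
legend("topleft", legend = rownames(conc), col = 1:3, lty = 1, pch = 16,
       bty = "n")
invisible(dev.off())
```

Removing the `pdf(NULL)` and `dev.off()` lines draws the plot on screen. Because the y-axis is logarithmic, all concentrations must be strictly positive.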

The schoeller_plot function requires a specific file structure to produce the final plot, as illustrated in the figure below (Fig. 2).

Fig. 2: Structure of the input file for the Schoeller diagram

We therefore need to reshape our data file into the same structure.

Code:

remotes::install_github("jentjr/manager")
library(manager)
library(dplyr)
library(reshape2)  # provides melt(), which is used below
# Display the column names to select the columns used for the Schoeller plot
colnames(Data_Ziz.csv)
# The required columns are "Ca","Mg","Na","K","Cl","SO4" and "Alk"; "Alk" is filled with NA values
df_sw <- data.frame("Surface water",1:34,Data_Ziz.csv[c(4:7, 9, 10)],NA  )
colnames(df_sw)
names(df_sw)[names(df_sw) == 'NA.'] <- 'Alk'
# Reshape the data to the long format with the melt function
df_swf <- melt(df_sw, id.vars=c("X.Surface.water.", "X1.34"))
# Build the input file required by schoeller_plot
df_swf2 <- data.frame(df_swf$X.Surface.water., df_swf$X1.34, df_swf$value,"NA", "mg/L",df_swf$variable )
colnames(df_swf2) <- c("location_id",	"sample_date",	"analysis_result",	"lt_measure",	"default_unit",	"param_name")
df_swf2
View(df_swf2)
# Replace the parameter names with their labels
levels(df_swf2$param_name)
df_swf2$param_name <- with(df_swf2, factor(param_name, 
                           levels = c('Ca', 'Mg', 'Na','K', 'Cl', 'SO4','Alk'  ), 
                     labels = c("Calcium, dissolved", "Magnesium, dissolved","Sodium, dissolved", "Potassium, dissolved", "Chloride, total","Sulfate, total", "Alkalinity, total (lab)")))
# Run schoeller_plot
df_swf2 %>%
  filter(location_id %in% c("Surface water")) %>%
  schoeller_plot(., facet_var = "location_id", title = "Schoeller plot of surface water")
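The wide-to-long reshaping step can also be sketched with base R alone, which may help to see what the long table contains. The small table below is made up for illustration; only the six column names of the long table are taken from the script above.

```r
# Made-up wide table: one row per sample, one column per ion (mg/L)
wide <- data.frame(sample = 1:3,
                   Ca = c(84, 505, 22),
                   Mg = c(48, 154, 17),
                   Cl = c(82, 4566, 8))

# stack() turns the ion columns into two columns:
# `values` (the concentration) and `ind` (the ion name, as a factor)
stacked <- stack(wide[, c("Ca", "Mg", "Cl")])

long <- data.frame(
  location_id     = "Surface water",
  sample_date     = rep(wide$sample, times = 3),  # sample index stands in for the date
  analysis_result = stacked$values,
  lt_measure      = NA,
  default_unit    = "mg/L",
  param_name      = stacked$ind
)
```

Each measurement becomes one row, so 3 samples and 3 ions give 9 rows. The `rep()` call relies on `stack()` returning all values of the first column, then the second, and so on.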

The Schoeller-Berkaloff diagram was used to determine the chemical profile of each sample (Fig. 3).

Fig. 3: Schoeller diagram

Statistical approaches to evaluate surface water pollution

** Data suitability for factor analysis

The Kaiser-Meyer-Olkin (KMO) test was used to evaluate the suitability of the data for factor analysis (Kaiser, 1974).

Code:

# Data suitability for factor analysis
kmo <- function(x)
{
  # Omit missing values
  x <- subset(x, complete.cases(x))
  # Correlation matrix
  r <- cor(x)
  # Squared correlation coefficients
  r2 <- r^2
  # Inverse of the correlation matrix
  i <- solve(r)
  # Diagonal elements of the inverse matrix
  d <- diag(i)
  # Squared partial correlation coefficients
  p2 <- (-i/sqrt(outer(d, d)))^2
  # Set the diagonal elements to zero
  diag(r2) <- diag(p2) <- 0
  # Overall KMO index
  KMO <- sum(r2)/(sum(r2)+sum(p2))
  # Individual measure of sampling adequacy (MSA) for each variable
  MSA <- colSums(r2)/(colSums(r2)+colSums(p2))
  return(list(KMO=KMO, MSA=MSA))
}
colnames(Data)
kmo(Data_Ziz.csv)

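The overall KMO value is 0.66, which lies at the upper end of the range that Kaiser (1974) describes as mediocre and close to the middling range that starts at 0.7, so the data set is adequate, though not ideal, for factor analysis.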

> kmo(Data_Ziz.csv)
$KMO
[1] 0.6631314

$MSA
     Temp        pH        EC        Ca        Mg        Na         K      HCO3
0.5900944 0.6108969 0.7167425 0.7563461 0.6807744 0.6822760 0.6752804 0.5994221
       Cl       SO4       NO3     Fe_II        Cu        Fe        Ni        Pb
0.9104867 0.7258076 0.6045702 0.6431057 0.4731010 0.5962081 0.4798145 0.6106080
       Zn
0.4409104
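The individual MSA values can be screened in the same way as the overall index. The sketch below copies the MSA values from the kmo() output and flags the variables below 0.5. The helper `kaiser_label()` is hypothetical and encodes the verbal scale of Kaiser (1974).

```r
# Individual MSA values copied from the kmo() output
msa <- c(Temp = 0.5900944, pH = 0.6108969, EC = 0.7167425, Ca = 0.7563461,
         Mg = 0.6807744, Na = 0.6822760, K = 0.6752804, HCO3 = 0.5994221,
         Cl = 0.9104867, SO4 = 0.7258076, NO3 = 0.6045702, Fe_II = 0.6431057,
         Cu = 0.4731010, Fe = 0.5962081, Ni = 0.4798145, Pb = 0.6106080,
         Zn = 0.4409104)

# Variables whose sampling adequacy is below the 0.5 threshold
low_msa <- names(msa)[msa < 0.5]
low_msa
# [1] "Cu" "Ni" "Zn"

# Hypothetical helper: Kaiser's (1974) verbal label for a KMO or MSA value
kaiser_label <- function(v) {
  cut(v, breaks = c(-Inf, 0.5, 0.6, 0.7, 0.8, 0.9, Inf), right = FALSE,
      labels = c("unacceptable", "miserable", "mediocre",
                 "middling", "meritorious", "marvelous"))
}
as.character(kaiser_label(0.6631314))
# [1] "mediocre"
```

Variables with a low individual MSA, here Cu, Ni and Zn, are the usual candidates to review before a factor analysis.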

** Visualize Correlation Matrix using Correlogram

A correlation matrix is used to analyse the correlation between several variables at the same time.

Code:

# Correlation chart with histograms on the diagonal
chart.Correlation(Data_Ziz.csv, histogram=TRUE, pch=23)

Fig. 4: Correlation between variables using the chart.Correlation function

A correlogram is a graph of the correlation matrix. It is useful for highlighting the most correlated variables in a data table. In this plot, the correlation coefficients are coloured according to their value.
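The most correlated pairs can also be listed numerically. The following base R sketch ranks the pairs of a correlation matrix by absolute correlation; the data are simulated for illustration, with a hypothetical EC column built to follow Cl closely.

```r
set.seed(1)
# Simulated data: EC is built to track Cl closely, pH is independent noise
toy <- data.frame(Cl = rnorm(30, 500, 100), pH = rnorm(30, 7.8, 0.8))
toy$EC <- 3 * toy$Cl + rnorm(30, 0, 20)

r <- cor(toy)
# Keep each pair once by blanking the diagonal and the lower triangle
r[lower.tri(r, diag = TRUE)] <- NA
top_pairs <- as.data.frame(as.table(r), stringsAsFactors = FALSE)
top_pairs <- top_pairs[!is.na(top_pairs$Freq), ]
names(top_pairs) <- c("var1", "var2", "r")
# Strongest correlations (by absolute value) first
top_pairs <- top_pairs[order(-abs(top_pairs$r)), ]
top_pairs
```

On the real data, `cor(Data_Ziz.csv)` would replace `cor(toy)`.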

Code:

library(corrplot)
library(RColorBrewer)
cor_data <-cor(Data_Ziz.csv)
corrplot(cor_data, type="upper", order="hclust",
         col=brewer.pal(n=8, name="RdYlBu"))

Fig. 4b: Correlation between variables using the corrplot package

** Principal component analysis

A dataset of quantitative variables measured on different individuals can be analysed and visualised using principal component analysis (PCA). PCA is an efficient pattern recognition technique for explaining the variance of a large dataset with several variables (Khoshnam et al., 2017; Sharma et al., 2020).

Figs. 5 to 7 illustrate the results of the PCA obtained with R.

Code:

############################## PCA ##############################
Out_PCA <- PCA(Data_Ziz.csv)
### Summary of the PCA results
summary(Out_PCA, nbelements=Inf)
Out_PCA$eig   # eigenvalues and percentages of variance
Out_PCA$ind   # results for the individuals (samples)
Out_PCA$var   # results for the variables
### Description of the dimensions
dimdesc(Out_PCA)
dimdesc(Out_PCA, proba=0.2)
## Percentage of explained variance
fviz_eig(Out_PCA, addlabels = TRUE, ylim = c(0, 50))

Fig. 5: Percentage of explained variance

Four factors, PC(1) to PC(4), reduce the initial dimensions of the dataset (Fig. 5) and account for a cumulative 75.7 % of the total variance: PC(1), PC(2), PC(3) and PC(4) explain 33.8 %, 18.4 %, 14.9 % and 8.6 % of the total variance, respectively.
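The cumulative percentage can be checked directly from the four reported values. The second part of the sketch shows, on simulated data with five hypothetical variables, how such percentages are derived from the eigenvalues; it uses base R `prcomp()` rather than the FactoMineR `PCA()` call of this article.

```r
# Percentages of variance for PC(1) to PC(4), as reported in the text
pct <- c(PC1 = 33.8, PC2 = 18.4, PC3 = 14.9, PC4 = 8.6)
cum_pct <- cumsum(pct)   # cumulative percentage after each component

# Same calculation from eigenvalues, shown on simulated standardized data
set.seed(42)
toy <- matrix(rnorm(34 * 5), nrow = 34)     # 34 samples, 5 hypothetical variables
eig <- prcomp(toy, scale. = TRUE)$sdev^2    # eigenvalues of the correlation matrix
eig_pct <- 100 * eig / sum(eig)             # percentage of variance per component
```

With standardized variables, the eigenvalues sum to the number of variables, so each percentage is the eigenvalue divided by that number.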

Code:

### Graph of the individuals (samples)
plot(Out_PCA, cex=1.2, invisible="quali", title="Individuals PCA graph")
### Graph of the variables for dimensions 1 and 2
plot(Out_PCA, choix="var", title="Variables PCA graph", axes=1:2)

PCA loadings of variables and individuals of significant principal components

PC(1) is highly loaded on Na+, EC, Ca2+, Cl–, K+ and SO42- (Fig. 6). These parameters are most pronounced in samples 5, 24 and 25 (Fig. 7).

PC(2) is loaded on Fe3+, Fe2+, Cu2+, Pb2+ and Zn2+, which are most pronounced in samples 6, 7 and 10 to 13.

PC(3) is highly loaded on HCO3–, NO3–, Mg2+ and pH, and shows high scores for samples 4, 15, 16, 18, 19 and 21.

Fig. 6: Projection of the statistical variables in the factorial plane F1-F2

Fig. 7: Representation of the water points in the factorial plane F1×F2

** Clusters of variables and the water samples studied

Cluster analysis (CA) identifies groups of samples based on the similarity of their heavy metal content (Nasrabadi, 2015; Sharma et al., 2020). The CA was employed to measure the distance separating clusters characterised by similar metal concentrations (Prasanna et al., 2012).

Hierarchical cluster analysis (HCA) with Euclidean distances was performed in R to i) understand the groupings of the physicochemical parameters and ii) illustrate the similarity between the surface water samples. Parameters grouped in the same cluster are taken to share a common source. Visual inspection of the dendrograms in Figs. 8 and 9 makes it possible to classify the water samples and the variables, and helps to understand the correlation between the different elements. HCA was applied to the concentration data using the average method with Euclidean distances. The average method takes the distance between two clusters to be the average distance between all pairs of observations, one from each cluster (Yoo et al., 2020). The 34 sampling points and the 17 variables were classified into clusters in this way.
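The average-linkage procedure described above can be sketched with base R alone. This is an illustration on simulated data, not the FactoMineR `HCPC()` call of this article, and the four well-separated groups are built in on purpose.

```r
set.seed(7)
# Simulated data: 12 samples drawn around 4 well-separated centres
centres <- rep(c(0, 10, 20, 30), each = 3)
toy <- cbind(a = centres + rnorm(12, sd = 0.5),
             b = centres + rnorm(12, sd = 0.5))
rownames(toy) <- paste0("S", 1:12)

d    <- dist(toy, method = "euclidean")   # Euclidean distances between samples
tree <- hclust(d, method = "average")     # average linkage between clusters
grp  <- cutree(tree, k = 4)               # cut the dendrogram into 4 clusters

table(grp)   # number of samples in each cluster
```

`plot(tree)` draws the dendrogram. To cluster variables instead of samples, one option is to compute the distances on the transposed, standardized table, for example `dist(t(scale(toy)))`.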

Code:

############################## Clusters of the variables ##############################
# Hierarchical ascendant clustering of the variables:
# correspondence analysis, followed by clustering of the columns (i.e. the variables)
data.ca <- CA(Data_Ziz.csv,graph = F,ncp = 3)
data.ca$col
data.hcpc <- HCPC(data.ca, nb.clust=4, cluster.CA = "columns", cex=1.2, graph = T)

The cluster analysis groups the variables into four clusters. The parameters Pb2+, Mg2+, HCO3–, NO3–, Ni2+, Temp, K+, Ca2+, pH, Zn2+, Cu2+, Fe2+ and Fe3+ form cluster 1, Cl– forms cluster 2, Na+ and EC form cluster 3, and SO42- forms cluster 4 (Fig. 8).

Fig. 8: Cluster analysis of the studied variables

Code:

############################## Clusters of the studied samples ##############################
# Hierarchical ascendant clustering of the samples (the individuals of the PCA)
# You can set the number of clusters with nb.clust (nb.clust = -1 cuts the tree automatically)
res.hcpc <- HCPC(Out_PCA,nb.clust=4, metric="euclidean", method="ward")

The studied samples are classified into four clusters (Fig. 9): the first contains samples 5, 24 and 25; the second contains samples 1 to 3, 7 to 9, 14, 16, 20 to 23, 26, and 28 to 34; the third contains samples 6 and 10 to 13; and the fourth contains samples 4, 15, 18, 19 and 27.

Fig. 9: Cluster analysis of the studied surface water samples

You will find the detailed characterisation related to this study in Diani et al. (2021).

Hicham AMAR, Engineer in Geomatic Sciences, Co-founder of Geoinfo4all.com

References

Cengiz, M. F., Kilic, S., Yalcin, F., Kilic, M., Gurhan Yalcin, M., 2017. Evaluation of heavy metal risk potential in Bogacayi River water (Antalya, Turkey). Environ Monit Assess, 189(6), 248. doi:https://doi.org/10.1007/s10661-017-5925-3

Diani, K., Amar, H., Rida, A., Hicham, E., Nordine, N., Najlaa, F., 2021. Surface water quality assessment in the semi-arid area by a combination of heavy metal pollution indices and statistical approaches for sustainable management. Environmental Challenges, 5, 100230.

Kaiser, H. F., 1974. An index of factorial simplicity. Psychometrika, 39(1), 31-36.

Khoshnam, Z., Sarikhani, R., Ghassemi Dehnavi, A., Ahmadnejad, Z., 2017. Evaluation of water quality using heavy metal index and multivariate statistical analysis in Lorestan Province, Iran. Journal of Advances in Environmental Health Research, 5(1), 29-37.

Kumar, V., Sharma, A., Kumar, R., Bhardwaj, R., Kumar Thukral, A., Rodrigo-Comino, J., 2018. Assessment of heavy-metal pollution in three different Indian water bodies by combination of multivariate analysis and water pollution indices. Human and Ecological Risk Assessment: An International Journal, 26(1), 1-16. doi:https://doi.org/10.1080/10807039.2018.1497946

Nasrabadi, T., 2015. An index approach to metallic pollution in river waters. International Journal of Environmental Research, 9(1), 385-394. doi:https://doi.org/10.22059/IJER.2015.910

Piper, A. M., 1944. A graphic procedure in the geochemical interpretation of water‐analyses. Eos, Transactions American Geophysical Union, 25(6), 914-928.

Prasanna, M. V., Praveena, S. M., Chidambaram, S., Nagarajan, R., Elayaraja, A., 2012. Evaluation of water quality pollution indices for heavy metal contamination monitoring: A case study from Curtin Lake, Miri City, East Malaysia. Environmental Earth Sciences, 67(7), 1987-2001. doi:https://doi.org/10.1007/s12665-012-1639-6

Sener, S., Sener, E., Davraz, A., 2017. Evaluation of water quality using water quality index (WQI) method and GIS in Aksu River (SW-Turkey). Sci Total Environ, 584-585, 131-144. doi:https://doi.org/10.1016/j.scitotenv.2017.01.102

Sharma, A., Ganguly, R., Kumar Gupta, A., 2020. Impact assessment of leachate pollution potential on groundwater: An indexing method. Journal of Environmental Engineering, 146(3), 05019007. doi:https://doi.org/10.1061/(asce)ee.1943-7870.0001647

Simler, R., 2009. Diagrammes: Logiciel d’hydrochimie multilangage en distribution libre. Laboratoire d’Hydrogéologie d’Avignon, France.

Varol, M., Gökot, B., Bekleyen, A., Şen, B., 2012. Spatial and temporal variations in surface water quality of the dam reservoirs in the Tigris River basin, Turkey. Catena, 92, 11-21. doi:https://doi.org/10.1016/j.catena.2011.11.013

Vasistha, P., Ganguly, R., 2020. Assessment of spatio-temporal variations in lake water body using indexing method. Environ Sci Pollut Res Int, 27(33), 41856-41875. doi:https://doi.org/10.1007/s11356-020-10109-3

Yoo, C., Yoo, Y., Um, H. Y., Yoo, J. K., 2020. On hierarchical clustering in sufficient dimension reduction. Communications for Statistical Applications and Methods, 27(4), 431-443. doi:https://doi.org/10.29220/csam.2020.27.4.431
