- Research
- Open access
Machine learning downscaling of GRACE/GRACE-FO data to capture spatial-temporal drought effects on groundwater storage at a local scale under data-scarcity
Environmental Systems Research volume 13, Article number: 38 (2024)
Abstract
The continued threat from climate change and human impacts on water resources demands high-resolution and continuous hydrological data accessibility for predicting trends and availability. This study proposes a novel threefold downscaling method based on machine learning (ML) which integrates: data normalization; interaction of hydrometeorological variables; and the application of a time series split for cross-validation. The method produces a high spatial resolution groundwater storage anomaly (GWSA) dataset from the Gravity Recovery and Climate Experiment (GRACE) and its successor mission, GRACE Follow-On (GRACE-FO). In the study, the relationship between the terrestrial water storage anomaly (TWSA) from GRACE and other land surface and hydrometeorological variables (e.g., vegetation coverage, land surface temperature, precipitation, and in situ groundwater level data) is leveraged to downscale the GWSA. The predicted downscaled GWSA datasets were tested using monthly in situ groundwater level observations, and the results showed that the model satisfactorily reproduced the spatial and temporal variations in the GWSA in the study area, with Nash-Sutcliffe efficiency (NSE) values of 0.8674 (random forest) and 0.7909 (XGBoost), respectively. Evapotranspiration was the most influential predictor variable in the random forest model, whereas rainfall was the most influential in the XGBoost model. In particular, the random forest model excelled in aligning closely with the observed groundwater storage patterns, as evidenced by its high positive correlations and lower error metrics (Mean Absolute Error (MAE) of 54.78 mm; R-squared (R²) of 0.8674). The downscaled 5 km GWSA data (based on random forest) showed a decreasing trend in storage associated with variability in the rainfall pattern. An increase in drought severity during El Niño lengthened the full recovery time of groundwater based on historical storage trends.
Furthermore, the time lag between the occurrence of precipitation and recharge was likely controlled by the drought intensity and the spatial recharge characteristics of the aquifer. Projected increases in drought severity could further increase groundwater recovery times in response to droughts in a changing climate, resetting storage to a new tipping condition. Therefore, climate change adaptation strategies must recognise that less groundwater will be available to supplement the surface water supply during droughts.
Introduction
Climate change poses a significant challenge to water resources in Sub-Saharan Africa, where forecasts predict not only a warming trend but also increased aridity and altered rainfall patterns, particularly a reduction in southern Africa (Mupangwa et al. 2023). These environmental shifts have direct implications for both surface and groundwater storage, which are crucial components of terrestrial water storage (TWS) (Serdeczny et al. 2017). The TWS encompasses all the water stored within the land environment, including lakes, groundwater, soil and rivers (Rodell et al. 2009). Traditionally, TWS measurements involve resource-intensive methods that directly observe hydrological variables in the water budget equation. However, the advent of the Gravity Recovery and Climate Experiment (GRACE) satellite mission in 2002 and its successor, GRACE Follow-On (GRACE-FO), revolutionised this process by providing a remote, geodetic approach to monitor water storage globally (Ferreira et al. 2023; Humphrey et al. 2023).
Groundwater storage (GWS) is a critical constituent of TWS because it is indispensable for human utilisation. Several studies concerning water availability and usage have been conducted in the Barotse catchment where groundwater was identified as the principal source of the rural water supply (Banda et al. 2021; Chongo et al. 2011; Milupi et al., 2022). GRACE/GRACE-FO has made GWS quantifiable at global scales; nevertheless, the coarse spatial resolution of the dataset limits its application at local and regional scales, necessitating the use of downscaling methods to improve its spatial resolution and utility. Downscaling can be accomplished using dynamic approaches, which use physically based global models that are computationally expensive, or statistical methods that leverage relationships between large-scale and small-scale data to improve local estimates (Fatolazadeh et al. 2022; Zuo et al. 2021).
Downscaling of groundwater storage estimates from GRACE/GRACE-FO data has increasingly depended on high-resolution hydrometeorological variables to improve spatial resolution. Early methods focused on simple correlations, such as the study by Yin et al. (2018), which primarily used evapotranspiration (ET) data and exploited the relationship between ET and groundwater storage through a correlation regression approach. However, this approach can be limiting when applied in different geological settings. More thorough techniques, such as those employed by Kalu et al. (2024), Khorrami et al. (2023), Ning et al. (2014) and Zhong et al. (2021), integrate a wider variety of variables, such as vegetative indices, soil moisture and precipitation, to enable a deeper examination through preliminary correlation tests. This improvement highlights the importance of integrating different datasets to obtain precise and accurate downscaling outcomes (Yin et al. 2022).
Several past studies have incorporated various machine learning (ML) algorithms, such as: artificial neural networks (ANN); random forest (RF); boosted regression tree; extreme gradient boosting (XGBoost); and deep learning, to downscale GRACE satellite data to produce GWS variations at high resolution (Agarwal et al. 2023; Ali et al. 2024; Chen et al. 2019, 2020; Milewski et al. 2019; Miro and Famiglietti 2018; Rahaman et al. 2019). Chen et al. (2019) employed the RF algorithm to downscale terrestrial water storage anomaly (TWSA) and groundwater storage anomaly (GWSA) data by integrating six hydrological variables. The results indicated a maximum Nash-Sutcliffe efficiency (NSE) of 0.68 and a correlation coefficient (R) of 0.83. Rahaman et al. (2019) employed the RF model to downscale GRACE-derived GWSA data. The authors observed a notable increase in the NSE values, ranging from 0.58 to 0.84, specifically within the Northern High Plains aquifer, and generated analytics depicting changes in the GWSA from 2009 to 2016 at a finer resolution. Ali et al. (2021) utilised RF and ANN techniques to downscale GRACE TWSA and GWSA data in the Irrigated Indus Basin Irrigation System, and the R values obtained ranged from 0.67 to 0.99. Previous studies have identified various limitations in the machine learning-based downscaling of GRACE/GRACE-FO data. These include issues with data handling and variable selection, which can affect the accuracy and reliability of the downscaled groundwater storage estimates. Furthermore, the accuracy of machine learning output depends not only on the choice of input variables but also on the selection of ML algorithms and the way these models are built (Foroumandi et al. 2023; Satizabal-Alarc et al. 2023; Seyoum et al. 2019; Tao et al. 2023; Yazdian et al. 2023). Additionally, there is a need to explore better approaches for setting up machine learning models.
The Upper Zambezi catchment in southern Africa is a climate hotspot area, and projections predict a temperature increase more than twice as high as the global rate (Engelbrecht et al. 2015). This will have serious implications for the hydrological dynamics leading to increased evaporation and low soil moisture content. This study therefore aimed to use machine learning techniques to downscale the GWS derived from the GLDAS (Global Land Data Assimilation System) dataset, which incorporates GRACE/GRACE-FO estimates. The GLDAS GWS dataset was directly used for the downscaling process due to its detailed integration of terrestrial water and energy fluxes, including the high-precision harmonic solutions of GRACE/GRACE-FO data. This dataset separates soil moisture from TWS, thereby providing a distinct representation of GWS (Liu et al. 2015; Rodell et al. 2004; Save et al. 2016; Scanlon et al. 2012; Zaitchik et al. 2008). The capabilities of the XGBoost and RF algorithms were harnessed to perform downscaling, leveraging their strengths in handling complex, nonlinear data relationships. RF and XGBoost were chosen for their superior performance in the context of downscaling GRACE/GRACE-FO data. RF’s ability to handle large, multi-dimensional datasets and its resistance to overfitting, along with XGBoost’s high accuracy, efficiency and scalability, make them ideal. Under certain conditions, these algorithms outperform others such as: ANN, which requires extensive tuning and computing resources; k-nearest neighbours (kNN), which is inefficient with large datasets; and support vector machine (SVM), which struggles with large-scale data and multiclass problems. These strengths ensure accurate downscaling of satellite-derived data (Breiman 2001; Jyolsna et al. 2021; Rodriguez-Galiano et al. 2012). 
The distinct contribution of this study is a threefold approach that integrates: data normalization; interaction of hydrometeorological variables; and application of a time series split for cross-validation (Bhanja and Das 2019; Izonin et al. 2022). This combination resulted in an improved distributed estimation of groundwater storage and depletion variations, leading to the development of a novel locally relevant remote sensing-assisted spatial water balance approach for identifying climatic effects (droughts) on groundwater storage. The results of this research are relevant for the development of water resource management interventions.
Materials and methods
The study area
The Barotse catchment is a major sub-basin in Zambia’s western and southern provinces that is situated in the Upper Zambezi River Basin. This vast catchment spans 402 km from north to south and 530 km east to west, with an approximate total area of 106,486 km² (Chomba et al. 2022).
With an average slope of only 0.015%, the catchment is characterised by Kalahari sands. It consists of a trellis drainage system maintained by the Luanginga, Lungwebungu and Kabompo Rivers, which drain into the Zambezi River, and the landscape’s elevation varies from approximately 1,187 m above sea level in the north east to 993 m in the south (Banda et al. 2019). The Zambezi River flows through this region, with many channels moving from upstream at Lukulu, to midstream at Mongu, and downstream at Senanga. The main factors influencing the hydrological dynamics of floodplains are yearly flooding sequences and rainfall (Money 1972). The catchment is an important hydrological and ecological zone, with an annual rate of evaporation reaching 1,578 mm. As noted by Banda et al. (2023), the region is characterised by an unconfined Kalahari aquifer system with specific yield values that vary from 0.04 to 0.28, with a median value of 0.16. The Kalahari aquifer is composed of silts, sands and weakly cemented sandstones. The alternating clay layers in the region can sometimes lead to the formation of perched water habitats. The recharge of the Kalahari aquifer is primarily driven by seasonal rainfall, and this recharge supplies water to the wetland area. Aquifer transmissivity varies greatly, ranging from 0.44 m²/day to 63 m²/day. Well yield is likewise variable, and may range from 3 to 10 L per second (Beilfuss 2012; Makungu and Hughes 2021).
The wet season in the Barotse spans from October to May of the following year (Pasqualino et al. 2015). Some academics contend that there is a knowledge gap, whereby what indigenous people currently understand about droughts and floods does not always align with what is confirmed by quantitative assessments (Mapedza et al. 2022). The Barotse Catchment is also a renowned tourist attraction because it hosts the Kuomboka Ceremony, which celebrates the evacuation of the monarch of the Lozi people to higher land before the commencement of the flood season (Cai et al. 2017).
The study utilised observation well data from three wells situated in the Lukulu, Mongu and Senanga districts, which correspond to upstream, midstream and downstream locations along the Zambezi River. These wells were selected because they are the sole sources of time series water level data in the region, offering daily measurements from December 2019 to October 2020. The wells vary in depth, but all exhibit a near-surface water table at a depth of less than approximately 10 m. The study region is depicted in Fig. 1 below.
Study design
The study applied a statistical downscaling method using machine learning algorithms, in which the GLDAS-GRACE GWS dataset was the target feature and the hydrometeorological variables were the input features. The goal was to enhance the spatial detail of GLDAS-GRACE GWS data to a fine 5 km scale, as outlined in the flow chart in Fig. 2.
a) Pre-processing
To ensure that every input variable had the same spatial and temporal resolution before being passed to the machine learning model, the spatial resolution and temporal structure of the input data were modified. The dependent variable in this study was the GWS derived from the GLDAS dataset, which incorporates GRACE/GRACE-FO data. This approach allowed the utilisation of the groundwater storage dataset provided by GLDAS, which separates the groundwater component from the terrestrial water storage.
The independent variables included various hydrological variables that are essential components of the water balance, such as precipitation, soil moisture and evapotranspiration. These variables were selected to capture the complex interactions within the hydrological cycle that influence groundwater storage.
The target spatial resolution of 5 km was achieved through two processes: resampling high-resolution variables; and interpolating lower-resolution datasets. High-resolution datasets were directly resampled to the 5 km resolution. For lower-resolution datasets, bilinear interpolation was applied to increase their spatial resolution to 5 km. Bilinear interpolation was chosen because it provides a smoother transition between pixels than simpler methods such as nearest-neighbour interpolation, thereby preserving the spatial patterns of the data.
Additionally, all input data with daily or weekly temporal resolutions were aggregated into monthly means to align with the temporal structure of the GWS dataset.
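The two pre-processing operations described above can be illustrated with a minimal pure-Python sketch. It is not the study's implementation: the function names `bilinear_resample` and `monthly_mean` are hypothetical, the grids are small nested lists, and the corner-aligned sampling convention is an assumption. Real rasters would normally be processed with a geospatial library.

```python
def bilinear_resample(grid, out_rows, out_cols):
    """Resample a 2D grid (list of rows) to out_rows x out_cols.

    Assumption of this sketch: the corner cells of the output grid
    coincide with the corner cells of the input grid.
    """
    in_rows, in_cols = len(grid), len(grid[0])
    out = []
    for r in range(out_rows):
        # Fractional row position expressed in input-grid coordinates.
        y = r * (in_rows - 1) / (out_rows - 1) if out_rows > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_rows - 1)
        wy = y - y0
        row = []
        for c in range(out_cols):
            x = c * (in_cols - 1) / (out_cols - 1) if out_cols > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_cols - 1)
            wx = x - x0
            # Interpolate along x on the two bracketing rows, then along y.
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def monthly_mean(daily_values):
    """Aggregate the daily values of one month into a single mean."""
    return sum(daily_values) / len(daily_values)


# Illustrative 2 x 2 coarse grid refined to 3 x 3 (values are made up).
coarse = [[0.0, 10.0],
          [20.0, 30.0]]
fine = bilinear_resample(coarse, 3, 3)
```

The new centre cell takes a weighted average of its four coarse neighbours, which is what gives bilinear interpolation its smoother transitions compared with nearest-neighbour resampling.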
b) Feature engineering and data normalization
The datasets were normalized using the Z score approach to standardise values and lessen the influence of outliers to optimise the machine learning process. The dataset’s prediction potential was subsequently increased by feature engineering, which included variable multiplication, to ensure that the intricate relationships observed in the hydrometeorological data were appropriately captured.
c) Downscaling
Downscaling through the use of RF and XGBoost-based machine learning models was the main focus of the research.
d) Validation
Groundwater level measurements from on-site observation wells were used to validate the downscaled model outputs.
e) Hypothesis testing
Thorough hypothesis testing was required in the final stage to validate the model outputs’ statistical significance in relation to the predictions that were made.
The programming of the downscaling model was carried out using Python 3.11.4, and the cartographic work was performed in ArcGIS Pro.
Datasets and processing
The study period, spanning from January 2009 to December 2020, was carefully selected to encompass a significant climatic event, as the study aimed to capture the impact of the 2015–2016 El Niño event as reported by Kolusu et al. (2019). This period provided a comprehensive dataset that includes significant climatic variability, which was crucial for the analysis. The datasets utilised in the study are detailed below and summarised in Table 1. Each dataset was carefully selected and processed to ensure compatibility with the GLDAS-GRACE GWS data and the specific needs of the downscaling approach.
GRACE/GRACE-FO
The measurement of Earth’s gravitational field by the GRACE/GRACE-FO mission has been an essential component of Earth observation efforts, providing insights into changes in ice sheets, water reservoirs and crustal movements. The German Research Centre for Geosciences (GFZ), Jet Propulsion Laboratory (JPL) and the University of Texas’ Centre for Space Research (CSR) are the three main analysis centres supporting this project (Byron et al. 2019). For the purpose of researching changes in terrestrial water storage, each centre processes GRACE data in a distinct manner, resulting in a variety of gravity field solutions.
This study integrates high-resolution hydrometeorological data with the CSR dataset to obtain a localised understanding of groundwater dynamics and to improve the management of water resources (Landerer and Swenson 2012).
GLDAS
In the effort to offer estimates of land surface states and fluxes, GLDAS combines land surface modelling with data from satellites and ground-based observations. At a spatial resolution of 0.25°, the GLDAS Catchment Land Surface Model (CLSM) Version 2.2 provides groundwater storage data. Detailed elements such as soil moisture, snow water equivalent and canopy water storage are provided by land surface models such as NOAH (Zaitchik et al. 2008). The residual, which is obtained by deducting these elements from the total TWS as determined by GRACE, shows the variations in groundwater storage. To effectively assess global groundwater variability, groundwater storage estimations from GLDAS use data assimilation techniques using GRACE observations from the CSR solutions (Rodell et al. 2004).
In this work, downscaling GRACE groundwater storage using high-resolution hydrometeorological variables critically depends on GLDAS data. Through the utilisation of GRACE’s gravimetric solutions processed by GLDAS, the study endeavoured to enhance the forecasts of groundwater availability at local scales (Li and Rodell 2015). The study refers to the GLDAS derived GWS dataset as the GLDAS-GRACE GWS, reflecting the incorporation of the GRACE/GRACE-FO CSR solutions in the GLDAS dataset to generate its groundwater storage component.
FLDAS
Optimised fields of land surface states and fluxes are produced globally by the Famine Early Warning Systems Network (FEWS NET) Land Data Assimilation System (FLDAS) Model L4 Global Monthly (McNally et al. 2022). Given that soil moisture fluctuations and water availability are important factors in hydrogeology, this dataset helps to pinpoint regions that are critical for groundwater recharge. This study uses the soil moisture parameter at a depth of 100 to 200 cm underground. This deeper soil moisture data provides a more stable and reliable input for downscaling, reflecting long-term trends and subsurface hydrogeological processes critical for predicting groundwater storage (van der Schalie et al. 2017).
MODIS
The National Aeronautics and Space Administration (NASA) Terra satellite provides daily land surface temperature (LST) data at a resolution of one kilometre, as well as the normalized difference vegetative index (NDVI) and enhanced vegetative index (EVI), packaged as the Moderate Resolution Imaging Spectroradiometer (MODIS) MOD11A1 version 6 and MOD13A2 version 6 datasets, respectively (Didan 2021; Wan et al. 2021). The study of the surface energy balance, ecosystem health and effects of climate change requires the use of LST data. The vegetation indices, which are compiled every 16 days at a resolution of one kilometre, are essential for determining the presence and amount of green vegetation as well as biomass, and have the potential to identify regions with near-surface groundwater. While the EVI accounts for atmospheric conditions and canopy background signals, the NDVI is primarily associated with biomass productivity (Gstaiger et al. 2012).
WaPOR
Provided by the Food and Agriculture Organisation of the United Nations (FAO), the Water Productivity Open-access Portal (WaPOR) offers comprehensive ET datasets for the Near East and Africa (FAO 2020). WaPOR version 2.2 provides information on water consumption and stress in various landscapes through the integration of remote sensing technology (Zimba et al. 2024).
CHIRPS
The Climate Hazards Centre at the University of California Santa Barbara provides a moderate resolution precipitation dataset named the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) (Funk et al. 2015). For trend analysis and drought monitoring, CHIRPS generates a rainfall time series dataset from in situ station data and satellite imagery at a spatial resolution of 5 km, available from 1981 to the present.
Well data
Validating remotely sensed and modelled groundwater data requires observation well data, which are measured as water levels or head in metres (Gleeson et al. 2011). This dataset contains direct measurements, taken either manually or automatically, from wells drilled into aquifers. Head measurements provide accurate information about groundwater conditions, seasonal variations and long-term trends by indicating the height of the water column above a reference point (Fan et al. 2013). The well data for this study were collected from three water loggers installed in observation wells located in Lukulu, Mongu and Senanga.
Groundwater storage from terrestrial water storage
The water budget equation can be modified to account for satellite data, such as GRACE/GRACE-FO data, which provide TWS changes in this estimation. Therefore, distinguishing between the main elements of TWS is crucial for isolating changes in GWS, as represented in the equation below (Rodell et al. 2009):
\[\varDelta TWS = \varDelta GWS + \varDelta SW + \varDelta SM + \varDelta VC\]
where:
(\(\varDelta TWS\)) is the change in total terrestrial water storage as measured by GRACE/GRACE-FO;
(\(\varDelta GWS\)) is the change in groundwater storage;
(\(\varDelta SW\)) is the change in surface water storage, including all surface reservoirs, lakes and rivers;
(\(\varDelta SM\)) is the change in the soil moisture content; and
(\(\varDelta VC\)) is the change in the vegetative cover.
Rearranging to solve for (\(\varDelta GWS\)):
\[\varDelta GWS = \varDelta TWS - \left(\varDelta SW + \varDelta SM + \varDelta VC\right)\]
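As a worked illustration of the residual approach, the short sketch below subtracts the surface water, soil moisture and vegetative components from the total storage change to leave the groundwater component. The function name `groundwater_change` and all numbers are hypothetical and only show the arithmetic, assuming every term is expressed in the same unit (here mm of equivalent water height).

```python
def groundwater_change(d_tws, d_sw, d_sm, d_vc):
    """Residual groundwater storage change: dGWS = dTWS - (dSW + dSM + dVC)."""
    return d_tws - (d_sw + d_sm + d_vc)


# Hypothetical monthly changes in mm; not values from the study.
d_gws = groundwater_change(d_tws=-40.0, d_sw=-5.0, d_sm=-20.0, d_vc=-1.0)
print(d_gws)
```

In this made-up case, 26 mm of the 40 mm total loss is explained by the non-groundwater components, so the residual attributed to groundwater is a loss of 14 mm.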
Correlation of input variables
To ensure that the datasets used in the ML models had significant predictive potential, a correlation test between groundwater storage (GLDAS-GRACE GWS) and the hydrometeorological variables was conducted for the period from January 2009 to December 2020 before passing them to the machine learning models. To better understand the relationships, we used seasonal trend decomposition with loess (STL) to isolate the trend component from seasonal and residual noise, and we analysed the trend component only, which reflects long-term changes (Ouyang et al. 2021). Thereafter, the trend component was used to determine the Pearson, Kendall tau and Spearman correlations between the GLDAS-GRACE GWS and the input variables (Puth et al. 2015; Teng and Chen 2024).
Spearman’s rank correlation coefficient (ρ) measures the strength and direction of the association between two ranked variables, which can be calculated as:
\[\rho = 1 - \frac{6\sum_{i=1}^{n} d_{i}^{2}}{n\left(n^{2}-1\right)}\]
where:
- \(d_{i}\) is the difference between the ranks of corresponding values; and
- n is the number of observations.
Kendall tau (τ) measures the association between two variables by considering the concordance and discordance of pairs of observations. It is calculated using the following formula:
\[\tau = \frac{C - D}{\frac{1}{2}n\left(n-1\right)}\]
where:
- C is the number of concordant pairs;
- D is the number of discordant pairs; and
- n is the number of observations.
Pearson’s correlation coefficient (r) measures the linear relationship between two variables, as represented in the equation:
\[r = \frac{\sum_{i=1}^{n}\left(x_{i}-\bar x\right)\left(y_{i}-\bar y\right)}{\sqrt{\sum_{i=1}^{n}\left(x_{i}-\bar x\right)^{2}}\sqrt{\sum_{i=1}^{n}\left(y_{i}-\bar y\right)^{2}}}\]
where:
\(x_{i}\) and \(y_{i}\) are the individual sample points; and
\(\bar x\) and \(\bar y\) are the means of the x and y variables, respectively.
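The three coefficients can be compared on a small example. The sketch below is a plain-Python illustration using the standard textbook forms (Spearman from squared rank differences, Kendall tau from concordant minus discordant pairs, Pearson from centred products). It assumes no tied values, and the helper names are hypothetical. In practice a statistics library would handle ties and significance testing.

```python
from math import sqrt


def pearson(x, y):
    """Pearson r: covariance divided by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks from 1 to n; this sketch assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def kendall_tau(x, y):
    """Kendall tau = (C - D) / (n * (n - 1) / 2)."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# A monotonic but nonlinear relationship (y = x squared), made-up data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]
```

For these values the two rank-based coefficients equal 1, while Pearson's r is lower. A gap of this kind between Spearman and Pearson is the usual reason for reading a higher Spearman value as a hint of a monotonic, nonlinear relationship.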
Feature engineering and data normalization
In machine learning frameworks, feature engineering and data normalization are essential for dataset optimisation. First, we applied Z score normalization, which recalibrates every variable to guarantee consistency and lessen the impact of outliers.
Next, we constructed interaction terms between pairs of normalized hydrometeorological variables. The dataset’s predictive power was increased by these interacted terms, which represent interactions between important hydrological and climatic elements.
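A minimal sketch of these two steps, assuming the usual Z score definition z = (x − mean) / standard deviation and an interaction term formed by multiplying two normalized variables element by element. The variable names and values are hypothetical, and the use of the population standard deviation is an assumption of the sketch rather than something stated in the study.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardise a series to zero mean and unit (population) std."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def interaction(a, b):
    """Element-wise product of two already normalized series."""
    return [x * y for x, y in zip(a, b)]


# Made-up monthly values for two predictors.
rainfall = [10.0, 20.0, 30.0]
et = [3.0, 1.0, 2.0]

rain_z = z_score(rainfall)
et_z = z_score(et)
rain_x_et = interaction(rain_z, et_z)  # one hypothetical interaction feature
```

Normalizing before multiplying keeps the product on a comparable scale across variable pairs that originally have very different units.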
Machine learning-based downscaling
We downscaled GLDAS-GRACE GWS data to a finer spatial resolution using RF and XGBoost algorithms. The data included a range of hydrometeorological variables and their interacted terms, which were normalized for consistency.
Random forest
Breiman (2001) developed the random forest algorithm, which reduces overfitting by building numerous decision trees during training and averaging their outputs. Each tree is built from a bootstrap sample of the data, with a random subset of features considered at each split, which introduces randomness and diversity among the trees. This ensemble method results in a model that is both accurate and able to generalise. In the context of groundwater storage, RF models capture complex interactions and nonlinear relationships between the dependent and independent variables, enhancing the spatial resolution of GRACE-derived estimates (Rahaman et al. 2019). The algorithm also provides feature importance estimates, which help to identify significant predictors, and handles missing data effectively.
Extreme gradient boosting (XGBoost)
XGBoost, introduced by Chen and Guestrin (2016), is a gradient boosting algorithm known for its performance and speed. Gradient boosting builds an ensemble of decision trees sequentially, where each tree corrects the errors of its predecessors, enhancing overall model accuracy. Unlike random forests, which build trees independently, XGBoost builds trees sequentially, and its optimisations lead to significantly faster training times. XGBoost incorporates several strategies to prevent overfitting, including L1 (Lasso) and L2 (Ridge) regularisation, which control model complexity. It also prunes trees to remove branches that add little predictive power. Like RF, XGBoost supports parallel processing, further speeding up the training process. The algorithm has been used to downscale GRACE/GRACE-FO derived GWS estimates, obtaining promising results that model groundwater dynamics after validation with in situ data (Sahour 2020).
Model training and evaluation
The input datasets consisted of 25 features (7 hydrometeorological variables in their original form and 18 interacted terms generated from the hydrometeorological variables) which served as independent variables, while the GLDAS-GRACE GWS data was used as the dependent variable (Verdonck et al. 2024). The datasets were flattened into 2D arrays to fit the machine learning models. To account for the temporal structure of the data during cross-validation, scikit-learn’s `TimeSeriesSplit` was utilised. `TimeSeriesSplit` ensures that the training set always precedes the testing set, maintaining the temporal order and preventing data leakage, which is crucial for time series data (Peixeiro 2022).
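To show what an expanding-window split looks like, the sketch below re-implements the index logic by hand in plain Python. It is intended to follow the default behaviour of scikit-learn's `TimeSeriesSplit` (no gap, default test size), but it is an illustration only, and it assumes one sample per month (144 months from January 2009 to December 2020), which ignores the spatial dimension of the flattened arrays used in the study.

```python
def time_series_split(n_samples, n_splits):
    """Yield (train_indices, test_indices) for an expanding-window split.

    Each test block directly follows its training block, so the model
    is never trained on observations that come after the test period.
    """
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for k in range(n_splits):
        start = first_test_start + k * test_size
        train = list(range(start))
        test = list(range(start, start + test_size))
        yield train, test


n_months = 144  # assumption: one sample per month, 2009 to 2020
folds = list(time_series_split(n_months, n_splits=7))

for train, test in folds:
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

With these settings the training window grows by 18 months per fold while the test window always covers the 18 months that follow it.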
For both models, a randomised cross-validation search was employed to optimise the hyperparameters. Randomised cross-validation search involves randomly sampling from a predefined set of hyperparameters, making the process more efficient compared to an exhaustive grid search (King et al. 2021). Sevenfold cross-validation and 700 fits were used in the tuning procedure for the XGBoost model and the parameters from the cross-validation search included a subsample rate of 0.7, 1,000 estimators, a maximum tree depth of 11, a learning rate of 0.005 and regularisation (Rodriguez et al. 2010). On the other hand, the RF model utilised 350 fits. The best parameters from the cross-validation search for the RF model included using 100 trees, configuring the minimum sample split and leaf, and enabling bootstrap sampling.
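The idea of a randomised search can be sketched in a few lines of plain Python: draw a fixed number of candidate settings at random from a predefined space, then fit each candidate once per cross-validation fold. The search space below is hypothetical apart from the reported XGBoost values (subsample 0.7, 1,000 estimators, depth 11, learning rate 0.005), and the candidate counts are inferred from the reported fit totals under the assumption that both searches used seven folds.

```python
import random

# Hypothetical search space; only some entries match reported values.
param_space = {
    "subsample": [0.5, 0.7, 0.9],
    "n_estimators": [100, 500, 1000],
    "max_depth": [3, 7, 11],
    "learning_rate": [0.005, 0.01, 0.1],
}


def sample_candidates(space, n_iter, seed=0):
    """Draw n_iter random parameter settings from the space.

    This sketch samples with replacement, so duplicates are possible.
    """
    rng = random.Random(seed)
    return [
        {name: rng.choice(options) for name, options in space.items()}
        for _ in range(n_iter)
    ]


def total_fits(n_candidates, n_folds):
    """Each candidate is fitted once per cross-validation fold."""
    return n_candidates * n_folds


candidates = sample_candidates(param_space, n_iter=100)
print(total_fits(len(candidates), n_folds=7))
```

Under these assumptions, 100 candidates over seven folds give the 700 fits reported for XGBoost, and 50 candidates would give the 350 fits reported for RF. An exhaustive grid over the same space would need every combination to be fitted in every fold.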
The accuracy of the two models was evaluated using: R-squared (R²); Mean Absolute Error (MAE); NSE; and Root Mean Square Error (RMSE) to obtain insights into their accuracy, consistency and predictive capabilities.
R² measures the proportion of the variance in the dependent variable that is predictable from the independent variables. MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. NSE is used to assess the predictive power of hydrological models. RMSE quantifies the average magnitude of the prediction errors in a model, giving an indication of the accuracy of predictions. These can be calculated using the equations below.
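\[R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_{i}-\hat y_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\bar y\right)^{2}}\]
\[MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_{i}-\hat y_{i}\right|\]
\[NSE = 1 - \frac{\sum_{i=1}^{n}\left(y_{i}-\hat y_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\bar y\right)^{2}}\]
\[RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat y_{i}\right)^{2}}\]
where:
\(y_{i}\) is the observed value;
\(\hat y_{i}\) is the predicted value;
\(\bar y\) is the mean of the observed values; and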
\(\:{y}_{max}\)
\(\:{y}_{min}\:\)
n is the number of observations.
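The metrics can be computed directly from paired observed and predicted values. The sketch below uses the common textbook definitions and made-up numbers. It assumes R² is computed as one minus the ratio of residual to total sum of squares, in which case it coincides with NSE, consistent with the identical R² and NSE values reported for the random forest model.

```python
from math import sqrt


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SS_residual / SS_total."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def r_squared(obs, pred):
    """Assumption: R^2 defined as 1 - SS_res / SS_tot, the same as NSE."""
    return nse(obs, pred)


# Made-up observed and predicted values; the last prediction is off by 2.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 6.0]
```

A perfect prediction gives NSE = 1, a prediction no better than the observed mean gives NSE = 0, and negative values indicate that the mean would have been the better predictor.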
Validation of downscaled GWS estimates
The need to validate groundwater storage estimates derived from GRACE/GRACE-FO data is underscored by the discrepancies observed between ground measurements and those obtained from remote sensing techniques. To effectively perform this validation process, it was essential to transform data from ground-based observation wells, specifically water level readings, into water storage (WS) anomalies. This transformation hinges on detailed knowledge of the aquifer’s characteristics. In the context of this study, data for the transformation was sourced from observation wells situated in the Kalahari aquifer, located within the Barotse Flood Plain. Consequently, to adjust the water level data for the purposes of validation, the following equation was utilised.
\[\varDelta GWS = S_{y} \times \varDelta h\]
(Nenweli et al. 2024)
where:
(\(\varDelta GWS\)) is the groundwater storage derived from the specific yield value and the water level;
(\(S_{y}\)) is the specific yield of the aquifer; and
(\(\varDelta h\)) is the change in the water level in the observation well.
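As a worked illustration, the sketch below converts a short series of water levels into storage anomalies by multiplying the deviation of each level from the series mean by the specific yield. Several choices are assumptions of the sketch rather than details given in the study: the levels are heads in metres (a higher value means more water), the anomaly is taken relative to the mean of the record, the result is converted to millimetres, and the median specific yield of 0.16 reported for the Kalahari aquifer is used. The function name and the water levels are hypothetical.

```python
def storage_anomaly_mm(water_levels_m, specific_yield):
    """Convert heads (m) to storage anomalies (mm): Sy * (h - mean(h))."""
    mean_level = sum(water_levels_m) / len(water_levels_m)
    return [
        specific_yield * (h - mean_level) * 1000.0  # metres to millimetres
        for h in water_levels_m
    ]


# Made-up heads in metres above a reference point.
levels = [10.0, 10.5, 9.5]
anomalies = storage_anomaly_mm(levels, specific_yield=0.16)
print(anomalies)
```

A head that is 0.5 m above the record mean therefore corresponds to roughly 80 mm of additional stored water. If the loggers record depth to water instead of head, the sign of the anomaly would need to be reversed.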
Results
Correlations of the input hydrometeorological variables
The results from the Pearson, Kendall tau and Spearman correlation analyses in Table 2 present the relationships between the GLDAS-GRACE GWS and the hydrometeorological variables.
Precipitation’s moderate correlations with GLDAS-GRACE GWS indicated a monotonic relationship, and its higher Spearman correlation indicated a potential nonlinear relationship. ET exhibited considerable linear and nonlinear components, with stronger Spearman correlations indicating a nonlinear relationship. The LST - Day had substantial negative correlations with the GLDAS-GRACE GWS, indicating an inverse relationship; its Pearson and Spearman scores were similar, indicating a strong linear correlation with probable nonlinear elements. The NDVI exhibited substantial positive correlations, revealing a strong linear relationship with groundwater storage. The EVI showed lower correlations, implying a weak linear relationship with potential nonlinear features, as evidenced by the greater Spearman correlation. The LST - Night exhibited broad negative correlations, indicating a strong inverse association with a strong linear component.
This analysis served as the foundation for selecting inputs for the machine learning model. We ensured that only datasets that met the 18% correlation threshold were selected for the downscaling process. This approach was critical for obtaining the best model performance, as it guided the inclusion of variables that have a considerable influence on groundwater storage.
The permutation feature importance technique (PFIT) was applied to identify the most influential predictor variables in the machine learning models. According to the PFIT results for the downscaling of GLDAS-GRACE GWS data in Table 3, the XGBoost model relied heavily on a single variable: rainfall emerged as the predominant predictor, indicating a strong model fit to precipitation data. This focus suggests that XGBoost effectively leveraged the high predictive power of rainfall, along with soil moisture and NDVI. Conversely, the RF model distributed its reliance across a broader range of features, including ET, EVI, soil moisture, NDVI and both daytime and night-time LST. This broader distribution indicates that RF integrated a diverse set of variables to capture the multifaceted nature of hydrological processes, providing a more balanced approach than the more narrowly focused XGBoost model.
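The idea behind permutation feature importance can be sketched in a few lines: shuffle one predictor at a time and record how much the model error grows. The sketch below is illustrative only. It uses a toy function in place of a fitted XGBoost or RF model, mean squared error as the score, and hypothetical names throughout; the study's scoring metric and number of repeats are not stated in the text.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn.

    `predict` maps one feature row (a list) to a prediction.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y_true, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only the chosen column permuted.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            score = mean_squared_error(y_true, [predict(r) for r in permuted])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Stand-in for a fitted model: depends on feature 0 only."""
    return 3.0 * row[0]


data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(60)]
y = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, y)
```

A model that leans on one predictor, as described for XGBoost and rainfall, would show one large importance value and small values elsewhere; a more distributed model would show several moderate values.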
Model accuracy
The two models differed in how they handled the input hydrometeorological data. In the optimisation phase, hyperparameter tuning was used for both models to adapt the learning process of each algorithm to the characteristics of the GLDAS-GRACE GWS data. The resulting accuracies are shown in the scatter plots in Fig. 3, and the error metrics are listed in Table 4.
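The abstract notes that a time series split was used for cross-validation. A minimal sketch of an expanding-window split is given below, in which every test fold lies after its training window in time. The fold layout follows the common expanding-window convention and is an assumption; the number of splits used in the study is not stated, and the function name is hypothetical.

```python
def time_series_splits(n_samples, n_splits=5):
    """Yield (train_indices, test_indices) for expanding-window validation.

    Each test fold follows its training window in time, so the model is
    never evaluated on months that precede its training data.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        yield list(range(start)), list(range(start, start + test_size))


# 144 monthly steps correspond to January 2009 to December 2020.
folds = list(time_series_splits(n_samples=144, n_splits=5))
```

With 144 months and five splits, each test fold covers 24 months and the training window grows from 24 to 120 months.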
A visual comparison of the scatter plots for the two models showed that the RF predictions were less scattered and more evenly distributed. This suggests that averaging several decision trees, as the RF method does, reduces overfitting and yields more consistent and reliable results.
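The two accuracy measures quoted in the abstract, the Nash-Sutcliffe efficiency (NSE) and the mean absolute error (MAE), can be computed as sketched below. The observed and simulated values are hypothetical. Note that NSE has the same form as the coefficient of determination defined as one minus the ratio of the residual to the total sum of squares, which may be why the abstract reports the same value for both.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def mean_absolute_error(observed, simulated):
    """Average absolute difference, in the units of the data."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


# Hypothetical GWS anomalies (mm): target values and model predictions.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 37.0]
nse = nash_sutcliffe(observed, simulated)
mae = mean_absolute_error(observed, simulated)
```

An NSE of 1 indicates a perfect match, while a value of 0 means the model is no better than predicting the observed mean.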
Validation of downscaled GWS with in situ well data
We validated the downscaled GLDAS-GRACE GWS estimates using observed well data (WS anomaly) as the ground-truth proxy. The validity of the downscaling procedure was assessed by analysing the correlations between the residuals (detrended and deseasonalised data) of the WS anomaly and those of the downscaled GWS anomalies from the XGBoost and RF models.
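The residual comparison can be illustrated with a simple decomposition. The sketch below removes a least-squares linear trend and then a mean monthly cycle before correlating the two residual series. This is a stand-in chosen for brevity and is not necessarily the decomposition used in the study; all series and names are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(values):
    """Least-squares intercept and slope against the time index 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return v_mean - slope * t_mean, slope


def residuals(monthly_values):
    """Detrend, then subtract the mean seasonal cycle (needs >= 12 months)."""
    intercept, slope = linear_fit(monthly_values)
    detrended = [v - (intercept + slope * t) for t, v in enumerate(monthly_values)]
    # The mean of every 12th value is the average for that calendar month.
    seasonal = [sum(detrended[m::12]) / len(detrended[m::12]) for m in range(12)]
    return [d - seasonal[t % 12] for t, d in enumerate(detrended)]


# Hypothetical 36-month series sharing one anomaly pattern but with
# different trends and amplitudes.
anomaly = [((7 * t) % 5) - 2 for t in range(36)]
signal = [10 * math.sin(2 * math.pi * t / 12) + 3 * a for t, a in enumerate(anomaly)]
well = [0.5 * t + s for t, s in enumerate(signal)]
downscaled = [0.2 * t + 0.5 * s for t, s in enumerate(signal)]
r = pearson(residuals(well), residuals(downscaled))
```

Correlating residuals rather than raw series prevents a shared trend or seasonal cycle from inflating the apparent agreement between wells and model output.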
Our analysis of GWS predictions from the XGBoost and RF models at three locations (DWRD in Mongu, Lukanda primary school in Senanga and Lukulu hospital in Lukulu) revealed clear differences in model performance, as summarised in Fig. 4 and Table 5.
The XGBoost model results at the DWRD in Mongu exhibited strong negative correlations, suggesting that accurately capturing the groundwater dynamics was difficult. On the other hand, the RF model showed a strong positive correlation, indicating its better accuracy in representing the groundwater storage that was observed. This finding implies that at this midstream site, the random forest model is more reliable.
Similarly, high positive associations were observed at Lukulu hospital, where the RF model outperformed the XGBoost model. The ability of the RF model to capture groundwater fluctuations upstream in the study region was confirmed by the consistency of the correlation metrics.
The outcomes at Lukanda primary school in Senanga differed. The XGBoost model showed modest positive correlations that were not statistically significant. In contrast, the RF model showed significant negative monotonic correlations but only moderate negative linear correlations. This negative correlation is likely due to the lag between surface processes and the groundwater response. The inability of the models to capture the groundwater dynamics at this location could be attributed to the presence of artesian wells, which are characteristic of confined aquifer systems. In such systems, groundwater is stored under pressure, leading to delayed responses to surface recharge events.
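One way to probe a suspected lag between a surface driver and the groundwater response is a lagged cross-correlation, sketched below. The text does not state that such an analysis was performed; this is only an illustration, and the series, the three-month delay and the function names are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(driver, response, max_lag):
    """Correlate response[t + lag] with driver[t] for lag = 0..max_lag."""
    return {
        lag: pearson(driver[: len(driver) - lag], response[lag:])
        for lag in range(max_lag + 1)
    }


# Hypothetical monthly series in which storage repeats the rainfall
# signal three months later: storage[t + 3] == rainfall[t].
base = [math.sin(2 * math.pi * t / 12) + 0.1 * ((5 * t) % 7) for t in range(51)]
rainfall = base[3:]
storage = base[:48]
correlations = lagged_correlation(rainfall, storage, max_lag=6)
best_lag = max(correlations, key=correlations.get)
```

A weak or negative correlation at lag zero that strengthens at a later lag would be consistent with a delayed aquifer response.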
Rainfall-induced groundwater recharge
After the validation procedure, we used data spanning the entire period from January 2009 to December 2020 to examine the relationship between rainfall during the wet season and groundwater recharge. Our analysis concentrated on three groundwater storage datasets: RF, XGBoost and GLDAS-GRACE. The wet season was defined as the period from October to May of the following year. The objective was to identify wet season rainfall thresholds that, if not reached, resulted in considerable decreases in groundwater recharge, as depicted in Fig. 5. Generally, a significant reduction in rainfall across the study region was accompanied by a reduction in recharge in the same months, as seen in the El Niño years (2015, 2016, 2018 and 2019).
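Accumulating monthly rainfall over the October to May wet season can be sketched as follows. Labelling a season by the year in which it starts and skipping incomplete seasons are assumptions of this sketch, and the rainfall values are hypothetical.

```python
def wet_season_totals(monthly_rain):
    """Sum rainfall over October to May of the following year.

    `monthly_rain` maps (year, month) to rainfall in mm. A season is
    labelled by the year in which it starts, and seasons with any
    missing month are skipped.
    """
    season_months = [(0, m) for m in (10, 11, 12)] + [(1, m) for m in (1, 2, 3, 4, 5)]
    totals = {}
    for start_year in sorted({year for year, _ in monthly_rain}):
        keys = [(start_year + offset, m) for offset, m in season_months]
        if all(k in monthly_rain for k in keys):
            totals[start_year] = sum(monthly_rain[k] for k in keys)
    return totals


# Hypothetical record: 50 mm in every wet-season month, none otherwise.
rain = {
    (year, month): 50.0 if month >= 10 or month <= 5 else 0.0
    for year in (2009, 2010)
    for month in range(1, 13)
}
totals = wet_season_totals(rain)
```

Here only the season starting in October 2009 is complete, so the season starting in October 2010 is dropped rather than reported with a partial total.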
Spatiotemporal improvement in GWS estimates
Figure 6 illustrates the visual differences between the downscaled datasets and the GLDAS-GRACE GWS data. This visual comparison highlights the enhanced resolution and detail captured by the XGBoost and RF downscaling models. The GLDAS-GRACE GWS data provide a broad overview of groundwater storage changes, while the downscaled datasets offer a more refined view that captures localised variations and finer spatial patterns.
Discussion
Downscaling of GRACE/GRACE-FO GWS estimates using machine learning
This study used hydrometeorological variables derived from the water budget equation to downscale GRACE/GRACE-FO GWS data from GLDAS to a 5 km spatial resolution. The XGBoost and RF machine learning algorithms were employed in a monthly time series spanning from January 2009 to December 2020. The most significant inputs for the downscaling procedure were chosen by conducting an initial correlation analysis on the input variables using the GLDAS-GRACE GWS as the target. Spearman, Kendall and Pearson correlations were examined to ensure that the correlations were greater than 18% (Gemitzi et al. 2021).
The XGBoost model showed promising results, capturing a significant portion of the variance in GLDAS-GRACE GWS data, which is similar to what was reported by Sahour (2020). The dependability of the model varied with local conditions: at some places it correlated strongly with observed well data, whereas at others it failed to depict the true groundwater dynamics and underestimated the expected GWS. In comparison, the RF model proved to be more accurate and dependable in general, producing accuracy metrics that were similar to the findings of Ali et al. (2021). The validation results showed that this model produced substantial positive correlations with the observed well data at both the upstream and midstream locations, highlighting the effectiveness of the model under different hydrogeological settings.
Both models had weakly significant correlations at places exhibiting artesian well features, which are characteristic of deep confined aquifer systems. Model predictions of confined aquifers are difficult due to their pressurised conditions, isolated recharge zones and delayed responses to surface conditions. This finding suggested that neither model works well under these conditions, indicating that unconfined aquifer systems with simpler groundwater dynamics are better suited for this machine learning-based downscaling technique (Fajar et al. 2021; Wang et al. 2015; Zhang et al. 2022).
Despite the promising results, there are several limitations to the downscaled GWS estimates. Firstly, the accuracy of the downscaling process is highly dependent on the quality and spatial resolution of the input hydrometeorological variables. Any discrepancies or errors within these input datasets can propagate through the model, leading to significant inaccuracies in the downscaled outputs (Li et al. 2011). Secondly, the performance of the models is suboptimal in regions characterised by complex hydrogeological conditions, such as deep confined aquifers. The unique pressurised conditions and isolated recharge zones inherent to these areas complicate the accurate estimation of GWS (Fan et al. 2013). Furthermore, the temporal resolution of the input data is a critical factor, as it can restrict the model’s ability to accurately capture short-term fluctuations in groundwater storage. Additionally, the validation of downscaled GWS estimates was constrained by the short-term time series data from a few observation wells, leading to unusually high correlations. This highlights the need for more extensive observational data to fully validate the models. These limitations highlight the need for ongoing refinement and validation of downscaling techniques to enhance their reliability and applicability.
Drought effects on groundwater
We determined rainfall thresholds using the 90th and 10th percentiles of the dataset as the upper and lower bounds, respectively, following methodologies inspired by Huang et al. (2015), Shilengwe et al. (2023) and Deng et al. (2022). Rainfall below the lower bound was associated with a drastic reduction in groundwater storage. We then identified the periods that fell within or outside the threshold values (Table 6). According to the data, there was a considerable decrease in groundwater recharge during El Niño years (2015, 2016, 2018 and 2019) because the cumulative rainfall fell below the 10th percentile threshold. These periods are marked by low groundwater storage and low rainfall, suggesting that groundwater recharge is highly sensitive to lower rainfall during these anomalous years (Leasor et al. 2020).
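The percentile-based thresholds described above can be sketched as follows. The linear-interpolation percentile and the strict inequalities at the bounds are assumptions, and the seasonal totals are hypothetical values chosen for illustration, not the study's data.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def classify_seasons(totals, low_q=10, high_q=90):
    """Label each season as below, within or above the percentile bounds."""
    low = percentile(totals.values(), low_q)
    high = percentile(totals.values(), high_q)
    labels = {}
    for season, total in totals.items():
        if total < low:
            labels[season] = "below"
        elif total > high:
            labels[season] = "above"
        else:
            labels[season] = "within"
    return low, high, labels


# Hypothetical wet-season rainfall totals (mm) for eleven seasons.
totals = {
    "s1": 445.0, "s2": 420.0, "s3": 460.0, "s4": 400.0, "s5": 410.0, "s6": 320.0,
    "s7": 300.0, "s8": 390.0, "s9": 380.0, "s10": 335.0, "s11": 415.0,
}
low, high, labels = classify_seasons(totals)
```

The same function can be applied to seasonal GWS values to obtain the corresponding storage thresholds.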
For example, in the XGBoost dataset, rainfall during El Niño years was less than 330.56 mm, which led to GWS values of less than 62.76 billion m³. These observations are similar to those of studies on reservoir conditions during anomalous years reported by Mathivha et al. (2024). The majority of the remaining intervals were within the threshold range, indicating typical conditions for recharge. Rainfall totals surpassing 438.78 mm (XGBoost) were recorded in high-rainfall years such as 2010 and 2012, which resulted in GWS levels reaching 77.16 billion m³ (XGBoost). The RF dataset also revealed that rainfall and GWS were below the 10th percentile threshold during El Niño years. In 2015 and 2016, for example, rainfall was frequently less than 331.23 mm, and the associated GWS values were less than 63.20 billion m³ (RF). These results are in agreement with the findings of Kolusu et al. (2019).
Additionally, the analysis revealed several outlier periods in which the GWS and rainfall were significantly outside the normal thresholds. In 2010 and 2012, rainfall and GWS values were exceptionally high across all the datasets. The GLDAS-GRACE GWS exceeded 78.49 billion m³, the XGBoost GWS exceeded 77.16 billion m³, and the RF GWS exceeded its threshold of 77.78 billion m³, which is consistent with the findings of Xulu et al. (2020). The corresponding rainfall values also surpassed the upper thresholds of 441.87 mm (GLDAS-GRACE GWS), 438.78 mm (XGBoost) and 440.45 mm (RF). These storage variations suggest that storage carried over from the previous hydrological season influences how much storage increases in the following season, depending on the rainfall received.
Spatiotemporal characterisation of GWS
GWS fluctuations from 2009 to 2020 were analysed spatiotemporally, and the results revealed important patterns and trends throughout the study region. The largest reductions in GWS were observed in the western portion of the catchment, as depicted in Fig. 7. Significant decreases in GWS were also observed upstream in Lukulu. Similar declines were observed in Mongu and Senanga, the most populous parts of the study area. There were some increases in groundwater storage in the sparsely populated middle portion of the basin, which suggests localised groundwater recharge (Oiro et al. 2020). The south-western part of the catchment also experienced reductions in groundwater, although these were not as severe as those on the western side.
A significant decrease in the quantity of groundwater was observed during the run-up to and following the El Niño events of 2015–2016 and 2018–2019, mainly on the western side of the catchment. These decreases may have been caused by a combination of climate-driven groundwater depletion and interactions between surface water and groundwater (Ndehedehe et al. 2023). This indicates that these climatic events, combined with human activity, have a substantial negative effect on groundwater levels, as noted by Bierkens and Wada (2019). With values ranging from −400 mm to +36 mm, the map in Fig. 7 shows areas of significant groundwater depletion and marginal gains in storage.
Long-term trend of GWS
Distinct trends in groundwater storage from January 2009 to December 2020 were identified from the trend analysis of the RF, XGBoost and GLDAS-GRACE datasets (Figs. 8 and 9). For all three datasets, the analysis showed a period of significant groundwater recharge beginning in mid-2009 and peaking around mid-2011, indicating an early increase in groundwater storage. After this peak, a clear downward trend started in 2012 and persisted until the end of 2016. In line with the research of Hellwig and Stahl (2018) and Gong et al. (2015), this period coincides with decreasing rainfall and may have been influenced by El Niño episodes, resulting in drier conditions and decreased groundwater recharge.
Groundwater storage showed noticeable stabilisation and some recovery starting in 2017; however, it did not reach the levels noted in the first half of the study period. Over the study period, the general trend was a net decrease in groundwater storage throughout the Barotse catchment, as highlighted in Fig. 9. The downscaled XGBoost and RF GWS datasets revealed more localised patterns, with RF capturing the trends more consistently and reliably, whereas the GLDAS-GRACE GWS data exhibited the most variability, reflecting its broader geographical scope. Understanding these trends is essential for comprehending long-term shifts in groundwater storage.
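A net trend of this kind can be quantified with a robust slope estimate. The text does not specify which trend estimator was used, so the sketch below substitutes the Theil-Sen estimator (the median of all pairwise slopes) as one common choice for hydrological series; the sample series is hypothetical.

```python
from statistics import median


def sen_slope(values):
    """Median of all pairwise slopes (Theil-Sen estimator), per time step."""
    slopes = [
        (values[j] - values[i]) / (j - i)
        for i in range(len(values))
        for j in range(i + 1, len(values))
    ]
    return median(slopes)


# Hypothetical monthly storage (mm): a steady decline of 2 mm per month
# with a single outlier at month 7.
storage = [100 - 2 * t + (5 if t == 7 else 0) for t in range(24)]
trend = sen_slope(storage)
```

Because the estimate is a median, the single outlier does not change the recovered decline of 2 mm per month.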
Conclusions
In this study, we have shown that a novel machine learning approach can downscale the GLDAS-GRACE GWSA from 27 km to a finer 5 km resolution. Hydrometeorological data, primarily obtained from remote sensing, were used in the downscaling. The RF estimates performed better than those of XGBoost. Based on the downscaled GWSA, this study identified a climatic threshold: cumulative rainfall of less than 330 mm over the rainy season results in considerable decreases in groundwater recharge. Rainfall frequently dropped below this lower threshold during El Niño years (2015, 2016, 2018 and 2019), resulting in drastically reduced groundwater recharge. Spatially, the changes in groundwater storage varied, indicating that they are potentially controlled by rainfall and aquifer recharge properties. A change detection analysis for 2009 to 2020 revealed changes in GWS anomalies ranging from −400 mm to +36 mm, indicating a net reduction in groundwater in the Barotse region. In conclusion, this study highlights the usefulness of machine learning models for downscaling GRACE/GRACE-FO GWS data, the significance of choosing suitable input variables and the crucial role that the determined climatic thresholds play in groundwater recharge. Expected increases in drought severity will likely increase aquifer vulnerability to droughts, as aquifer recovery times may lengthen. Climate change adaptation strategies must therefore recognise that less groundwater will be available to supplement the surface water supply during drought conditions.
Data availability
Datasets used and analysed during the study are described in the dataset section, where links for accessing these datasets are also provided. For any additional data produced in the study, interested parties may contact the corresponding author for access.
References
Agarwal V, Akyilmaz O, Shum CK, Feng W, Yang T-Y, Forootan E, Syed TH, Haritashya UK, Uz M (2023) Machine learning based downscaling of GRACE-estimated groundwater in Central Valley, California. Sci Total Environ 865:161138. https://doi.org/10.1016/j.scitotenv.2022.161138
Ali S, Liu D, Fu Q, Cheema MJM, Pham QB, Rahaman MM, Dang TD, Anh DT (2021) Improving the resolution of GRACE data for spatio-temporal groundwater storage assessment. Remote Sens 13:3513. https://doi.org/10.3390/rs13173513
Ali S, Ran J, Khorrami B, Wu H, Tariq A, Jehanzaib M, Khan MM, Faisal M (2024) Downscaled GRACE/GRACE-FO observations for spatial and temporal monitoring of groundwater storage variations at the local scale using machine learning. Groundw Sustain Dev 25:101100. https://doi.org/10.1016/j.gsd.2024.101100
Banda AM, Banda K, Sakala E, Chomba M, Nyambe IA (2021) Land Use change and its drivers in the wetlands of Barotse Floodplain of Zambezi River Sub-basin, Zambia. https://doi.org/10.21203/rs.3.rs-501786/v1
Banda KE, Mwandira W, Jakobsen R, Ogola J, Nyambe I, Larsen F (2019) Mechanism of salinity change and hydrogeochemical evolution of groundwater in the Machile-Zambezi Basin, South-western Zambia. J Afr Earth Sci 153:72–82. https://doi.org/10.1016/j.jafrearsci.2019.02.022
Banda K, Mulema M, Chomba I, Chomba M, Levy J, Nyambe I (2023) Investigating groundwater and surface water interactions using remote sensing, hydrochemistry, and stable isotopes in the Barotse Floodplain, Zambia. Geol Ecol Landsc. 1–16
Beilfuss R (2012) A risky climate for southern African hydro: assessing hydrological risks and consequences for Zambezi River Basin dams
Bhanja S, Das A (2019) Impact of data normalization on deep neural network for Time Series forecasting. https://doi.org/10.48550/arXiv.1812.05519
Bierkens MFP, Wada Y (2019) Non-renewable groundwater use and groundwater depletion: a review. Environ Res Lett 14:063002. https://doi.org/10.1088/1748-9326/ab1a5f
Breiman L (2001) Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324
Tapley BD, Watkins MM, Flechtner F, Reigber C, Bettadpur S, Rodell M, Sasgen I, Famiglietti JS, Landerer FW, Chambers DP, Reager JT, Gardner AS, Save H, Ivins ER, Swenson SC, Boening C, Dahle C, Wiese DN, Dobslaw H, Tamisiea ME, Velicogna I (2019) Contributions of GRACE to understanding climate change. Nat Clim Change 9:358–369. https://doi.org/10.1038/s41558-019-0456-2
Cai X, Haile AT, Magidi J, Mapedza E, Nhamo L (2017) Living with floods – Household perception and satellite observations in the Barotse floodplain, Zambia. Phys Chem Earth Parts ABC 100:278–286. Infrastructural Planning for Water Security in Eastern and Southern Africa. https://doi.org/10.1016/j.pce.2016.10.011
Chen C, He W, Zhou H, Xue Y, Zhu M (2020) A comparative study among machine learning and numerical models for simulating groundwater dynamics in the Heihe River Basin, northwestern China. Sci Rep 10:3904
Chen L, He Q, Liu K, Li J, Jing C (2019) Downscaling of GRACE-derived groundwater storage based on the random forest model. Remote Sens 11:2979
Chen T, Guestrin C (2016) XGBoost: A Scalable Tree Boosting System, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16. Association for Computing Machinery, New York, NY, USA, pp. 785–794. https://doi.org/10.1145/2939672.2939785
Chomba C, Banda K, Winsemius K, Eunice H, Sichingabula M, Nyambe H, I (2022) Integrated Hydrologic-Hydrodynamic Inundation Modeling in a Groundwater Dependent Tropical Floodplain. J Hum Earth Future 3:237–246. https://doi.org/10.28991/HEF-2022-03-02-09
Chongo M, Wibroe J, Staal-Thomsen K, Moses M, Nyambe IA, Larsen F, Bauer-Gottwein P (2011) The use of Time Domain Electromagnetic method and continuous Vertical Electrical sounding to map groundwater salinity in the Barotse sub-basin, Zambia. Phys Chem Earth Parts ABC 11th WaterNet/WARFSA/GWP–SA Symposium: IWRM for National and Regional Integration through Science, Policy and Practice 36:798–805. https://doi.org/10.1016/j.pce.2011.07.044
Deng R, Liu H, Zheng X, Zhang Q, Liu W, Chen L (2022) Towards establishing empirical rainfall thresholds for shallow landslides in Guangzhou, Guangdong Province, China. Water 14:3914. https://doi.org/10.3390/w14233914
Didan K (2021) MODIS/Terra Vegetation Indices 16-Day L3 Global 1km SIN Grid V061 [Data set]. NASA EOSDIS Land Processes Distributed Active Archive Center. https://doi.org/10.5067/MODIS/MOD13A2.061
Engelbrecht F, Adegoke J, Bopape M-J, Naidoo M, Garland R, Thatcher M, McGregor J, Katzfey J, Werner M, Ichoku C, Gatebe C (2015) Projections of rapidly rising surface temperatures over Africa under low mitigation. Environ Res Lett 10:085004. https://doi.org/10.1088/1748-9326/10/8/085004
Fajar MHM, Warnana DD, Widodo A, Prabawa SE, Iswahyudi A (2021) Aquifer system analysis to identify the cause of groundwater depletion at Umbulan Spring, Indonesia. Chem Eng Trans 89:385–390. https://doi.org/10.3303/CET2189065
Fan Y, Li H, Miguez-Macho G (2013) Global patterns of groundwater table depth. Science 339:940–943. https://doi.org/10.1126/science.1229881
FAO (2020) WaPOR V2 Database Methodology. Remote Sensing for Water Productivity Technical Report: Methodology Series. Rome: FAO
Fatolazadeh F, Eshagh M, Goïta K (2022) New spectro-spatial downscaling approach for terrestrial and groundwater storage variations estimated by GRACE models. J Hydrol 615:128635
Ferreira V, Yong B, Montecino H, Ndehedehe CE, Seitz K, Kutterer H, Yang K (2023) Estimating GRACE terrestrial water storage anomaly using an improved point mass solution. Sci Data 10:234. https://doi.org/10.1038/s41597-023-02122-1
Foroumandi E, Nourani V, Jeanne Huang J, Moradkhani H (2023) Drought monitoring by downscaling GRACE-derived terrestrial water storage anomalies: a deep learning approach. J Hydrol 616:128838. https://doi.org/10.1016/j.jhydrol.2022.128838
Funk C, Peterson P, Landsfeld M, Pedreros D, Verdin J, Shukla S, Husak G, Rowland J, Harrison L, Hoell A, Michaelsen J (2015) The climate hazards infrared precipitation with stations—a new environmental record for monitoring extremes. Sci Data 2:150066. https://doi.org/10.1038/sdata.2015.66
Gemitzi A, Koutsias N, Lakshmi V (2021) A spatial downscaling methodology for GRACE Total Water Storage anomalies using GPM IMERG Precipitation estimates. Remote Sens 13:5149. https://doi.org/10.3390/rs13245149
Gleeson T, Smith L, Moosdorf N, Hartmann J, Dürr HH, Manning AH, van Beek LPH, Jellinek AM (2011) Mapping permeability over the surface of the Earth. Geophys Res Lett 38. https://doi.org/10.1029/2010GL045565
Gong Y, Liu G, Schwartz FW (2015) Quantifying the response time of a Lake–Groundwater Interacting System to Climatic Perturbation. Water 7:6598–6615. https://doi.org/10.3390/w7116598
Gstaiger V, Huth J, Gebhardt S, Wehrmann T, Kuenzer C (2012) Multi-sensoral and automated derivation of inundated areas using TerraSAR-X and ENVISAT ASAR data. Int J Remote Sens 33:7291-7304. https://doi.org/10.1080/01431161.2012.700421
Hellwig J, Stahl K (2018) An assessment of trends and potential future changes in groundwater-baseflow drought based on catchment response times. Hydrol Earth Syst Sci 22:6209–6224. https://doi.org/10.5194/hess-22-6209-2018
Huang J, Ju NP, Liao YJ, Liu DD (2015) Determination of rainfall thresholds for shallow landslides by a probabilistic and empirical method. Nat Hazards Earth Syst Sci 15:2715–2723. https://doi.org/10.5194/nhess-15-2715-2015
Humphrey V, Rodell M, Eicker A (2023) Using Satellite-based Terrestrial Water Storage Data: a review. Surv Geophys. https://doi.org/10.1007/s10712-022-09754-9
Izonin I, Tkachenko R, Shakhovska N, Ilchyshyn B, Singh KK (2022) A two-Step Data Normalization Approach for improving classification accuracy in the medical diagnosis domain. Mathematics 10:1942. https://doi.org/10.3390/math10111942
Jyolsna PJ, Kambhammettu BVNP, Gorugantula S (2021) Application of random forest and multi-linear regression methods in downscaling GRACE derived groundwater storage changes. Hydrol Sci J 66:874–887. https://doi.org/10.1080/02626667.2021.1896719
Kalu I, Ndehedehe CE, Ferreira VG, Janardhanan S, Currell M, Kennard MJ (2024) Statistical downscaling of GRACE terrestrial water storage changes based on the Australian Water Outlook model. Sci Rep 14:10113. https://doi.org/10.1038/s41598-024-60366-2
Khorrami B, Gorjifard S, Ali S, Feizizadeh B (2023) Local-scale monitoring of evapotranspiration based on downscaled GRACE observations and remotely sensed data: an application of terrestrial water balance approach. Earth Sci Inf 16:1329–1345. https://doi.org/10.1007/s12145-023-00964-2
King RD, Orhobor OI, Taylor CC (2021) Cross-validation is safe to use. Nat Mach Intell 3:276–276. https://doi.org/10.1038/s42256-021-00332-z
Kolusu SR, Shamsudduha M, Todd MC, Taylor RG, Seddon D, Kashaigili JJ, Ebrahim GY, Cuthbert MO, Sorensen JPR, Villholth KG, MacDonald AM, MacLeod DA (2019) The El Niño event of 2015–2016: climate anomalies and their impact on groundwater resources in East and Southern Africa. Hydrol Earth Syst Sci 23:1751–1762. https://doi.org/10.5194/hess-23-1751-2019
Landerer FW, Swenson SC (2012) Accuracy of scaled GRACE terrestrial water storage estimates. Water Resour Res 48(4). https://doi.org/10.1029/2011WR011453
Leasor ZT, Quiring SM, Svoboda MD (2020) Utilizing Objective Drought Severity thresholds to Improve Drought Monitoring. https://doi.org/10.1175/JAMC-D-19-0217.1
Li B, Rodell M (2015) Evaluation of a model-based groundwater drought indicator in the conterminous U.S. J Hydrol 526:78-88. https://doi.org/10.1016/j.jhydrol.2014.09.027
Li J, Heap AD, Potter A, Daniell JJ (2011) Application of machine learning methods to spatial interpolation of environmental variables. Environ Model Softw 26:1647–1659. https://doi.org/10.1016/j.envsoft.2011.07.004
Liu YY, van Dijk AIJM, de Jeu RAM, Canadell JG, McCabe MF, Evans JP, Wang G (2015) Recent reversal in loss of global terrestrial biomass. Nat Clim Change 5:470–474. https://doi.org/10.1038/nclimate2581
Makungu E, Hughes DA (2021) Understanding and modelling the effects of wetland on the hydrology and water resources of large African river basins. J Hydrol 603:127039. https://doi.org/10.1016/j.jhydrol.2021.127039
Mapedza E, Rashirayi T, Xueliang C, Haile AT, van Koppen B, Ndiyoi M, Sellamuttu SS (2022) Chapter 11 - indigenous Knowledge systems for the management of the Barotse Flood Plain in Zambia and their implications for policy and practice in the developing world. In: Sioui M (ed) Current directions in Water Scarcity Research, Indigenous Water and Drought Management in a changing World. Elsevier, pp 209–225. https://doi.org/10.1016/B978-0-12-824538-5.00011-X
Mathivha FI, Mabala L, Matimolane S, Mbatha N (2024) El Niño-Induced Drought impacts on Reservoir Water resources in South Africa. Atmosphere 15:249. https://doi.org/10.3390/atmos15030249
McNally A, Jacob J, Arsenault K, Slinski K, Sarmiento DP, Hoell A, Pervez S, Rowland J, Budde M, Kumar S, Peters-Lidard C, Verdin JP (2022) A Central Asia hydrologic monitoring dataset for food and water security applications in Afghanistan. Earth Syst Sci Data 14:3115-3135. https://doi.org/10.5194/essd-14-3115-2022
Milewski AM, Thomas MB, Seyoum WM, Rasmussen TC (2019) Spatial downscaling of GRACE TWSA data to identify spatiotemporal groundwater level trends in the Upper Floridan Aquifer, Georgia, USA. Remote Sens 11:2756
Milupi ID, Wallace CS, Janes C (2022) Impact and adaptation to flooding: a focus on water supply, sanitation, and health in rural communities on the Barotse floodplain in Zambia. https://doi.org/10.21203/rs.3.rs-1283256/v1
Miro M, Famiglietti J (2018) Downscaling GRACE Remote sensing datasets to High-Resolution Groundwater Storage Change maps of California’s Central Valley. Remote Sens 10:143. https://doi.org/10.3390/rs10010143
Money NJ (1972) An outline of the geology of western Zambia [WWW Document]. URL https://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=PASCALGEODEBRGM7620022330 (accessed 6.7.24)
Mupangwa W, Chipindu L, Ncube B, Mkuhlani S, Nhantumbo N, Masvaya E, Ngwira A, Moeletsi M, Nyagumbo I, Liben F (2023) Temporal changes in Minimum and Maximum temperatures at selected locations of Southern Africa. Climate 11:84. https://doi.org/10.3390/cli11040084
Ndehedehe CE, Adeyeri OE, Onojeghuo AO, Ferreira VG, Kalu I, Okwuashi O (2023) Understanding global groundwater-climate interactions. Sci Total Environ 904:166571. https://doi.org/10.1016/j.scitotenv.2023.166571
Nenweli R, Watson A, Brookfield A, Münch Z, Chow R (2024) Is groundwater running out in the Western Cape, South Africa? Evaluating GRACE data to assess groundwater storage during droughts. J Hydrol Reg Stud 52:101699
Ning S, Ishidaira H, Wang J (2014) Statistical Downscaling of Grace-Derived Terrestrial Water Storage Using Satellite and Gldas Products. 土木学会論文集b1(水工学) 70, I_133-I_138. https://doi.org/10.2208/jscejhe.70.I_133
Oiro S, Comte J-C, Soulsby C, MacDonald A, Mwakamba C (2020) Depletion of groundwater resources under rapid urbanisation in Africa: recent and future trends in the Nairobi Aquifer System, Kenya. Hydrogeol J 28:2635–2656
Ouyang Z, Ravier P, Jabloun M (2021) STL decomposition of Time Series can benefit forecasting done by statistical methods but not by machine learning ones. Eng Proc 5:42. https://doi.org/10.3390/engproc2021005042
Pasqualino M, Kennedy G, Nowak V (2015) Seasonal food availability: Barotse floodplain system (Working Paper). WorldFish
Peixeiro M (2022) Time Series forecasting in Python. Simon and Schuster
Puth M-T, Neuhäuser M, Ruxton GD (2015) Effective use of Spearman’s and Kendall’s correlation coefficients for association between two measured traits. Anim Behav 102:77–84. https://doi.org/10.1016/j.anbehav.2015.01.010
Rahaman MM, Thakur B, Kalra A, Li R, Maheshwari P (2019) Estimating High-Resolution Groundwater Storage from GRACE: a Random Forest Approach. Environments 6:63. https://doi.org/10.3390/environments6060063
Rodell M, Houser PR, Jambor U, Gottschalck J, Mitchell K, Meng C-J (2004) The Global Land Data Assimilation System in: Bulletin of the American Meteorological Society Volume 85 Issue 3 (2004) [WWW Document]. URL https://journals.ametsoc.org/view/journals/bams/85/3/bams-85-3-381.xml (accessed 6.7.24)
Rodell M, Velicogna I, Famiglietti JS (2009) Satellite-based estimates of groundwater depletion in India. Nature 460:999–1002. https://doi.org/10.1038/nature08238
Rodriguez-Galiano VF, Ghimire B, Rogan J, Chica-Olmo M, Rigol-Sanchez JP (2012) An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J Photogramm Remote Sens 67:93–104. https://doi.org/10.1016/j.isprsjprs.2011.11.002
Rodriguez JD, Perez A, Lozano JA (2010) Sensitivity analysis of k-Fold Cross Validation in Prediction Error Estimation. IEEE Trans Pattern Anal Mach Intell 32:569–575. https://doi.org/10.1109/TPAMI.2009.187
Sahour H (2020) Statistical downscaling techniques to enhance the spatial resolution of the Grace Satellite Data and to fill temporal gaps. Western Michigan University
Satizabal-Alarc DA, Suhogusoff A, Ferrari LCKMF (2023) Characterization of groundwater storage changes in the Amazon River Basin based on downscaling of GRACE/GRACE-FO data with machine learning models [WWW Document]. URL https://www.researchgate.net/publication/375964151_Characterization_of_groundwater_storage_changes_in_the_Amazon_River_Basin_based_on_downscaling_of_GRACEGRACE-FO_data_with_machine_learning_models (accessed 6.5.24)
Save H, Bettadpur S, Tapley BD (2016) High-resolution CSR GRACE RL05 mascons. J Geophys Res Solid Earth 121:7547–7569. https://doi.org/10.1002/2016JB013007
Scanlon BR, Longuevergne L, Long D (2012) Ground referencing GRACE satellite estimates of groundwater storage changes in the California Central Valley, USA. Water Resour Res 48. https://doi.org/10.1029/2011WR011312
Serdeczny O, Adams S, Baarsch F, Coumou D, Robinson A, Hare W, Schaeffer M, Perrette M, Reinhardt J (2017) Climate change impacts in Sub-Saharan Africa: from physical changes to their social repercussions. Reg Environ Change 17:1585–1600. https://doi.org/10.1007/s10113-015-0910-2
Seyoum WM, Kwon D, Milewski AM (2019) Downscaling GRACE TWSA Data into High-Resolution Groundwater Level Anomaly using machine learning-based models in a glacial Aquifer System. Remote Sens 11:824. https://doi.org/10.3390/rs11070824
Shilengwe C, Nyimbili PH, Msendo R, Banda F, Mukupa W, Erden T (2023) Synthetic aperture radar and optical sensor techniques using Google Earth Engine for flood monitoring and damage assessment – a case study of Mumbwa district, Zambia. Zamb ICT J 7:7–15. https://doi.org/10.33260/zictjournal.v7i1.122
Tao H, Al-Sulttani AH, Salih SQ, Mohammed MKA, Khan MA, Beyaztas BH, Ali M, Elsayed S, Shahid S, Yaseen ZM (2023) Development of high-resolution gridded data for water availability identification through GRACE data downscaling: development of machine learning models. Atmospheric Res 291:106815. https://doi.org/10.1016/j.atmosres.2023.106815
Teng T-P, Chen W-J (2024) Using Pearson correlation coefficient as a performance indicator in the compensation algorithm of asynchronous temperature-humidity sensor pair. Case Stud Therm Eng 53:103924. https://doi.org/10.1016/j.csite.2023.103924
van der Schalie R, de Jeu RAM, Kerr YH, Wigneron JP, Rodríguez-Fernández NJ, Al-Yaari A, Parinussa RM, Mecklenburg S, Drusch M (2017) The merging of radiative transfer based surface soil moisture data from SMOS and AMSR-E. Remote Sens Environ 189:180–193. https://doi.org/10.1016/j.rse.2016.11.026
Verdonck T, Baesens B, Óskarsdóttir M, vanden Broucke S (2024) Special issue on feature engineering editorial. Mach Learn 113:3917–3928. https://doi.org/10.1007/s10994-021-06042-2
Wang J-Z, Jiang X-W, Wan L, Wörman A, Wang H, Wang X-S, Li H (2015) An analytical study on artesian flow conditions in unconfined-aquifer drainage basins. Water Resour Res 51:8658–8667. https://doi.org/10.1002/2015WR017104
Wan Z, Hook S, Hulley G (2021) MODIS/Terra Land Surface Temperature/Emissivity Daily L3 Global 1km SIN Grid V061 [Data set]. NASA EOSDIS Land Processes Distributed Active Archive Center. https://doi.org/10.5067/MODIS/MOD11A1.061
Xulu NG, Chikoore H, Bopape M-JM, Nethengwe NS (2020) Climatology of the Mascarene High and its influence on Weather and Climate over Southern Africa. Climate 8:86. https://doi.org/10.3390/cli8070086
Yazdian H, Salmani-Dehaghi N, Alijanian M (2023) A spatially promoted SVM model for GRACE downscaling: using ground and satellite-based datasets. J Hydrol 626:130214. https://doi.org/10.1016/j.jhydrol.2023.130214
Yin W, Hu L, Zhang M, Wang J, Han S-C (2018) Statistical downscaling of GRACE-derived groundwater storage using ET data in the North China plain. J Geophys Res Atmos 123:5973–5987
Yin W, Zhang G, Han S-C, Yeo I-Y, Zhang M (2022) Improving the resolution of GRACE-based water storage estimates based on machine learning downscaling schemes. J Hydrol 613:128447. https://doi.org/10.1016/j.jhydrol.2022.128447
Zaitchik BF, Rodell M, Reichle RH (2008) Assimilation of GRACE Terrestrial Water Storage Data into a Land Surface Model: results for the Mississippi River Basin. J Hydrometeorol 9:535–548. https://doi.org/10.1175/2007JHM951.1
Zhang Y-P, Jiang X-W, Cherry J, Zhang Z-Y, Wang X-S, Wan L (2022) Revisiting hydraulics of flowing artesian wells: a perspective from basinal groundwater hydraulics. J Hydrol 609:127714. https://doi.org/10.1016/j.jhydrol.2022.127714
Zhong D, Wang S, Li J (2021) Spatiotemporal downscaling of GRACE Total Water Storage using Land Surface Model outputs. Remote Sens 13:900. https://doi.org/10.3390/rs13050900
Zimba HM, Coenders-Gerrits M, Banda KE, Hulsman P, van de Giesen N, Nyambe IA, Savenije HHG (2024) On the importance of plant phenology in the evaporative process of a semi-arid woodland: could it be why satellite-based evaporation estimates in the miombo differ? Hydrol Earth Syst Sci 28:3633–3663. https://doi.org/10.5194/hess-28-3633-2024
Zuo J, Xu J, Chen Y, Li W (2021) Downscaling simulation of groundwater storage in the Tarim River basin in northwest China based on GRACE data. Phys Chem Earth Parts ABC 123:103042. https://doi.org/10.1016/j.pce.2021.103042
Funding
This work was supported by the SASSCAL 2.0 project Tipping Points Explained by Climate Change (TIPPECC), which is funded by the German Federal Ministry of Education and Research.
Author information
Contributions
C. S. conceived the ideas, designed the methodology, collected, analysed and visualised the data, and led the writing of the manuscript. K. B. co-developed the methodology, designed and interpreted the study, supervised the work and sourced the funding. I. N. supervised the study and sourced the funding. All authors contributed critically to the drafts and gave final approval for publication.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Shilengwe, C., Banda, K. & Nyambe, I. Machine learning downscaling of GRACE/GRACE-FO data to capture spatial-temporal drought effects on groundwater storage at a local scale under data-scarcity. Environ Syst Res 13, 38 (2024). https://doi.org/10.1186/s40068-024-00368-1
DOI: https://doi.org/10.1186/s40068-024-00368-1