Evaluation of CORDEX Africa regional climate models performance in simulating climatology of Zarima sub-basin northwestern Ethiopia

Climate models are basic tools for obtaining reliable estimates of future climate change and its effects on water resources and agriculture in a given basin. However, not all climate models are equally valuable for all areas. Therefore, determining the most appropriate climate model for a specific study area is essential. This study examines the performance of 10 CORDEX-Africa 0.22° Regional Climate Models (RCMs), three institution-based ensemble means (Reg ensemble, CCLM ensemble and REMOO ensemble) and the multi-model ensemble mean. The models were evaluated on their ability to replicate seasonal and annual rainfall, minimum and maximum temperature, and inter-annual variability for the period 1986–2005 using statistical metrics such as bias, Root Mean Square Error (RMSE), Pearson correlation coefficient (r), coefficient of variation (CV), Kling-Gupta Efficiency (KGE) and the Taylor diagram. The findings indicated that HadREMOO, MPI-Reg4-7


Introduction
Climate models are fundamental tools for projecting future climate conditions and developing strategies for adaptation and mitigation (Giorgi et al. 2009; Endris et al. 2017; Luhunga et al. 2018). Global Climate Models (GCMs) are widely used for climate projections; however, their low spatial resolution can limit their effectiveness in capturing small-scale climate variations influenced by topography and land surface heterogeneity (Dosio and Panitz 2016; Yimer et al. 2022).
Regional Climate Models (RCMs) are developed with higher spatial resolution than GCMs, making them better suited for simulating climate in coastal and mountainous regions and at regional scales. RCMs can provide more detailed and accurate information about local climate conditions (Giorgi et al. 2009; Endris et al. 2013; Dosio and Panitz 2016).
Studies have shown that RCMs outperform GCMs in simulating climate in coastal and mountainous areas, as well as at regional levels. This is due to their ability to capture localized features such as land-sea interactions, complex terrain, and mesoscale atmospheric processes (Endris et al. 2013; ARSET 2022). The higher spatial resolution of RCMs makes them more suitable for end users, such as water resource managers and agricultural planners, who require predictions of future regional and local climate changes. These predictions are crucial for conducting climate change impact assessments on water resources and agriculture at regional and local scales (Endris et al. 2013; Kassie et al. 2014; Matiu et al. 2020). By providing more detailed and localized information, RCMs can improve the accuracy of climate change impact assessments and aid in the development of targeted adaptation and mitigation strategies (Ilori and Balogun 2022; Demessie et al. 2023).
Currently, the Coordinated Regional Climate Downscaling Experiment (CORDEX) program, sponsored by the World Climate Research Programme, aims to fill this gap by coordinating international efforts in regional climate downscaling (Giorgi et al. 2009; Gutowski and Solman 2019). CORDEX provides dynamically downscaled regional climate information to the climate modeling community and to end users (Hernández-Díaz et al. 2013; Nikulin et al. 2012).
CORDEX-Africa is a CORDEX domain, experimentally developed specifically for climate impact studies in Africa, which is especially vulnerable to climate change due to limited adaptive capacity (Giorgi et al. 2009). Phase two of CORDEX-Africa, at 0.22° resolution, has 10 Regional Climate Models (RCMs) and provides valuable climate model data for countries like Ethiopia, where data are scarce. The CORDEX-Africa RCMs are particularly useful for climate projections and impact assessments because they generate downscaled climate information at regional scales. To gain confidence in evaluating how climate change affects water resources and agriculture in regions with complex topography, such as northwestern Ethiopia, the best-performing RCMs should be selected (Yimer et al. 2022). However, before selecting RCMs for climate change projections and impact assessments, it is essential to evaluate how well the RCMs and their ensemble mean replicate the historical climate of the area (Endris et al. 2013; Worku et al. 2018; Kamworapan and Surussavadee 2019; Demissie and Sime 2021).
The Zarima subbasin in northwestern Ethiopia is characterized by complex topography and is home to the Zarima river, a crucial source for water supply and traditional irrigation systems in the region. This river, along with its tributaries, plays a vital role in supporting agricultural activities and meeting the water needs of local communities. However, the subbasin faces challenges related to water management and competition for irrigation water. Without proper water management practices, there may be conflicts and unsustainable use of water resources. A lack of coordinated water management can lead to inefficient allocation and use of water, potentially affecting agricultural productivity and livelihoods (Lemma et al. 2010).
Additionally, the construction of an irrigation project covering approximately 9500 hectares of land in the Zarima subbasin is a significant development. The dam can store a total of 3.6 billion cubic meters of water, and the project aims to support irrigation for a sugarcane plantation spanning 50,000 hectares (Weldesadik 2021). This initiative has the potential to enhance local food security, contribute to the country's economy by creating employment opportunities, and facilitate the export of industrial products, particularly from the sugarcane industry. To sustain these benefits, addressing water management challenges such as land use and land cover change and climate change in the Zarima subbasin is crucial for maintaining the availability of water resources, supporting agriculture, and promoting the well-being of the local communities that rely on the Zarima river and its tributaries. Thus, evaluating the performance of CORDEX-Africa RCMs, as well as the ensemble mean of these models, in replicating historical rainfall and temperature patterns in the Zarima subbasin is a crucial step. By assessing how well the RCMs and the ensemble mean replicate historical climate data, researchers can gain insight into the models' accuracy and reliability for future climate projections. This evaluation provides valuable information for selecting appropriate RCMs to inform water resource management and climate change adaptation strategies in the Zarima subbasin. The main objective of the study is therefore to evaluate the performance of the available CORDEX-Africa RCMs and their ensemble means in replicating historical rainfall and temperature patterns in the Zarima subbasin, as a step toward ensuring that the selected RCMs are reliable and applicable for climate change projection and impact assessment studies in the region.

Description of the study area
The study was carried out in the Zarima subbasin, which is part of the Tekeze river basin and through which the Zarima river flows. According to information obtained from the Ministry of Water and Energy (MoWE), the subbasin is located in the northwest highlands of Ethiopia, 804.9 km from Addis Ababa, and has a surface area of 663,157.58 ha. Geographically, it lies between longitudes 37°30'-38°27'E and latitudes 13°15'-14°13'N, with an elevation between 736 and 4399 m a.s.l. (Fig. 1). The subbasin's topography is characterized by steep, undulating hills and a narrow gorge that includes a portion of the Semen Mountains (Weldesadik 2021).
The major land use types of the subbasin are agricultural land, forest land, shrubland, grassland, bare land and water bodies. Most of the lower portions are lowland areas, owing to the increased demand for agricultural land, while forests and shrubland are found in the upper portions of the basin. Agricultural land covers 57.81% of the subbasin, shrubland 23.81% and forest land 16.26% (Fig. 2b).
Agriculture is one of the most basic and important economic activities sustaining the livelihood of the subbasin's people. Examples include irrigated agriculture, commercial farming, agropastoralism, gum and incense collection, and cereal cropping. Mixed agricultural techniques, which involve crop and livestock production, are widespread in the subbasin. Smallholder farmers grow grain for sustenance under rainfed conditions using a traditional ox-drawn plough (Mowie 2009).
Based on rainfall amount, Ethiopia has three local seasons, namely Bega, Belg and Kiremit. Bega is the dry season, covering the period from October to January. It is characterized by hot, dry days and cool nights, with frost in the early mornings in the majority of the highland areas (NMA 2015). Belg is the small rainy season in Ethiopia, except in the southern and southeastern lowland areas. It covers the period from February to May. High variability of rainfall in time and space and high maximum temperatures are common characteristics of the Belg season (Ven Chow et al. 1988). It is the warmest season, as March, April and May are the warmest months of the year (NMA 2015). Kiremit is the main rainy season, contributing 85-95% of the annual rainfall and of the country's food crop production (Ven Chow 1988). It spans from June to September, with frequent rains and homogeneous temperatures in July and August (NMA 2015).
The subbasin received annual rainfall ranging from 1051 mm in the lowland areas to 1863.72 mm in the highland areas from 1984 to 2018. The average annual temperature was 13.7 °C in the highlands and 20.4 °C around the lowlands (Zegeye et al. 2022). The Sub-basin

Observed climate data
The gridded daily observational rainfall and temperature data sets for the Zarima subbasin for the period 1984-2018 were obtained from the Ethiopian Meteorological Institute (EMI). To compare the observed and model-simulated climate data at the Maytsebri, Adiramets, Debarik, Ketema Niguse, and Zarima stations of the Zarima subbasin for the period 1986-2005, the station data were extracted using the R statistical package Climate Data Tools (CDT).

Data analysis
There is no single criterion for identifying the best RCMs. In this study, a combination of performance metrics, namely bias (BIAS) (Eq. 1), RMSE (Eq. 2), Pearson's correlation (r) (Eq. 3), coefficient of variation (CV) (Eq. 4), Kling-Gupta Efficiency (KGE) (Eq. 5) and the Taylor diagram, was used to evaluate the performance of the CORDEX-Africa RCMs, the institution-based ensemble means (Reg ensemble, CCLM ensemble and REMOO ensemble) and the multi-model ensemble mean (grand ensemble) in replicating the kiremit season and annual climate of the subbasin. BIAS, RMSE and r are commonly used to evaluate the performance of CORDEX-Africa RCMs (Dibaba et al. 2019; Ayugi et al. 2020; Mendez et al. 2020).
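As a minimal sketch of how an institution-based ensemble mean and a grand ensemble mean can be formed, the snippet below averages member series with equal weights. The grouping of the Reg ensemble by model name, the equal weighting, the definition of the grand ensemble as the mean over all individual members, and the rainfall values are all assumptions made for illustration; they are not taken from the study.

```python
import numpy as np

def ensemble_mean(series_by_model, members):
    """Average the listed member series element by element (equal weights)."""
    stacked = np.array([series_by_model[m] for m in members], dtype=float)
    return stacked.mean(axis=0)

# Hypothetical annual rainfall totals (mm) for two years; NOT data from the study.
sims = {
    "HadReg4-7":     [1000.0, 1100.0],
    "MPI-Reg4-7":    [1200.0, 1300.0],
    "NorESM-Reg4-7": [1100.0, 1200.0],
    "HadCCLM":       [900.0, 1000.0],
}

# Assumed grouping: the Reg ensemble averages the three Reg4-7 simulations.
reg_members = ["HadReg4-7", "MPI-Reg4-7", "NorESM-Reg4-7"]
reg_ensemble = ensemble_mean(sims, reg_members)

# Assumed definition: the grand ensemble averages every individual member.
grand_ensemble = ensemble_mean(sims, list(sims))
```

Averaging the institution-based ensemble means instead of the individual members would weight the institutions equally rather than the models; the two choices differ whenever the groups have unequal sizes.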
BIAS measures the systematic error between the observed and simulated climate variables: zero indicates good performance, while values away from zero show deviation from the observed data. RMSE measures how accurately climate models simulate climate variables. There is no commonly accepted threshold for BIAS or RMSE; RMSE values close to zero indicate good model performance, and larger values indicate poorer performance (Gleckler et al. 2008; Chai and Draxler 2014). The correlation coefficient (r) describes the linear relationship between the observed values and those simulated by the RCMs. It ranges from −1 for a perfect negative correlation to 1 for a perfect positive correlation between the modeled (RCM) and observed climate variables (Schober et al. 2018). Furthermore, the correlation strength was classified following Evans (1996): very weak (0.00-0.19), weak (0.20-0.39), moderate (0.40-0.59), strong (0.60-0.79), and very strong (0.80-1.00). In this study, CV is used to classify the degree of inter-annual variability of the rainfall and temperature data as less (CV < 20%), moderate (20 < CV < 30%), high (CV > 30%), very high (CV > 40%) and extremely high (CV > 70%), as applied by Eshetu (2020).
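The metrics described above can be sketched in a few lines of Python. Because Eqs. 1-4 are not reproduced in this text, the definitions below are assumptions: bias is taken as the mean of simulated minus observed values, CV uses the sample standard deviation, and the class boundaries that the text leaves open (for example CV exactly equal to 20% or 30%) are resolved arbitrarily. The function names are hypothetical.

```python
import numpy as np

def bias(sim, obs):
    """Mean of (simulated - observed); 0 is ideal. Assumed form of Eq. 1."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return float(np.mean(sim - obs))

def rmse(sim, obs):
    """Root mean square error between simulated and observed values."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def pearson_r(sim, obs):
    """Pearson product-moment correlation coefficient."""
    return float(np.corrcoef(sim, obs)[0, 1])

def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation)."""
    v = np.asarray(values, dtype=float)
    return float(100.0 * v.std(ddof=1) / v.mean())

def correlation_strength(r):
    """Label |r| with the Evans (1996) classes quoted in the text."""
    a = abs(r)
    if a < 0.20:
        return "very weak"
    if a < 0.40:
        return "weak"
    if a < 0.60:
        return "moderate"
    if a < 0.80:
        return "strong"
    return "very strong"

def variability_class(cv):
    """Label a CV (%) with the classes quoted in the text.
    Boundary handling at exactly 20, 30, 40 and 70 is an assumption."""
    if cv < 20:
        return "less"
    if cv <= 30:
        return "moderate"
    if cv <= 40:
        return "high"
    if cv <= 70:
        return "very high"
    return "extremely high"
```

For example, `bias([110, 125, 105, 140], [100, 120, 110, 130])` gives 5.0 with these definitions, and `correlation_strength(0.65)` returns "strong".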
Kling-Gupta efficiency (KGE) is a performance metric which was developed as an upgrade to the commonly used Nash-Sutcliffe efficiency, taking into account multiple types of model errors, namely mean, variability, and correlation.It was introduced by Gupta et al. (2009) and modified by Kling et al. (2012) and is defined as Eq. 5.
Three components are involved in the calculation of the KGE index: (1) the Pearson product-moment correlation coefficient, r, with an ideal value of r = 1; (2) beta ($\beta$), the ratio of the mean simulated value ($\mu_s$) to the mean observed value ($\mu_o$), with an ideal value of $\beta = 1$; and (3) alpha ($\alpha$), the variability ratio, which is the ratio of the coefficient of variation of the simulated values ($CV_s$) to that of the observed values ($CV_o$), where $\beta = \mu_s / \mu_o$ and $\alpha = CV_s / CV_o$.
KGE ranges between negative infinity and one. Model performance is considered poor for 0 < KGE < 0.5 (Vrugt and de Oliveira 2022).
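A minimal sketch of the KGE calculation from the three components above is given here. It assumes that Eq. 5 has the usual modified form of Kling et al. (2012), KGE = 1 − sqrt((r − 1)² + (β − 1)² + (α − 1)²), with α written for the CV ratio as in this text; the function name and the use of the sample standard deviation are likewise assumptions.

```python
import numpy as np

def kge_modified(sim, obs):
    """Return (KGE, r, beta, alpha) for simulated vs. observed series.

    beta  = mean(sim) / mean(obs)
    alpha = CV(sim) / CV(obs)
    Assumed form: KGE = 1 - sqrt((r-1)^2 + (beta-1)^2 + (alpha-1)^2).
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]
    beta = sim.mean() / obs.mean()
    cv_sim = sim.std(ddof=1) / sim.mean()
    cv_obs = obs.std(ddof=1) / obs.mean()
    alpha = cv_sim / cv_obs
    kge = 1.0 - np.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (alpha - 1.0) ** 2)
    return float(kge), float(r), float(beta), float(alpha)
```

A simulation that doubles every observed value has r = 1 and α = 1 but β = 2, so its KGE under this form is 0: the mean error alone removes all of the score, which is the kind of decomposition that distinguishes KGE from a single aggregate error measure.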
In addition to statistical metrics, monthly and annual plots of observed versus simulated rainfall, maximum and minimum temperature were used.
The cumulative distribution function (CDF) was used to evaluate the areal rainfall distribution.
This finding implied that different simulations or ensembles performed better at different stations, highlighting the spatial variability in their performance.
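For the CDF comparison used in the data analysis, the following sketch builds an empirical CDF from a daily rainfall series. The helper that reports the largest vertical gap between two empirical CDFs is an illustrative addition, not a measure the study reports; both function names are hypothetical.

```python
import numpy as np

def empirical_cdf(values):
    """Return sorted values x and cumulative probabilities p = rank / n."""
    x = np.sort(np.asarray(values, dtype=float))
    p = np.arange(1, x.size + 1) / x.size
    return x, p

def max_cdf_gap(sim, obs):
    """Largest vertical distance between the two empirical CDFs.
    Illustrative summary only; not a metric used in the study."""
    sim = np.sort(np.asarray(sim, dtype=float))
    obs = np.sort(np.asarray(obs, dtype=float))
    grid = np.concatenate([sim, obs])
    f_sim = np.searchsorted(sim, grid, side="right") / sim.size
    f_obs = np.searchsorted(obs, grid, side="right") / obs.size
    return float(np.max(np.abs(f_sim - f_obs)))
```

Plotting `p` against `x` for the observed and simulated series on the same axes gives the kind of visual CDF comparison described here; a gap of 0 means the two empirical distributions coincide.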

Results and discussion
Similarly, some RCMs overestimated rainfall in the dry months and underestimated it in the wet months in southwest Ethiopia (Demissie and Sime 2021).
The statistical evaluations of the mean annual rainfall are shown in Table 3. The results revealed that the majority of the RCMs, all institution-based ensemble means and the multi-model ensemble mean overestimated the observed mean annual rainfall, by between 0.02 and 2.81 mm. The lowest and highest overestimations were observed in the HadReg4-7 and MPI-CCLM simulations at Ketema Niguse and Maytsebri stations, respectively (Table 3). An underestimation of mean annual rainfall was observed in the NorESM-Reg4-7 and HadReg4-7 simulations at Ketema Niguse and Zarima stations, respectively (Table 3), with values ranging from 0.02 to 17.93 mm. Similarly, Hernández-Díaz et al. (2013) reported that a dynamically downscaled version of CORDEX-Africa overestimated the annual rainfall in the Ethiopian highlands and elevated areas of Sudan, and Ayugi et al. (2020) evaluated RCMs in East Africa and found that most of the models and ensembles overestimated the basin-average annual rainfall amount. Otieno and Anyah (2013), in a study focused on the Greater Horn of Africa, observed similar overestimations of basin-average annual rainfall by the RCMs. Yimer et al. (2022) assessed RCMs in the East Africa region and the Ethiopian highlands and also found that the RCMs tended to overestimate rainfall in these regions. Additionally, Demessie et al. (2023) examined the Guder subbasin of the upper Blue Nile basin and found that the mean annual rainfall in this subbasin was overestimated by the CORDEX-Africa RCMs.
The RCMs, institution-based ensemble means and multi-model ensemble mean (ensemble in the tables) showed both overestimation and underestimation of the kiremit season rainfall of the Zarima subbasin. The overestimation ranged from 0.02 to 2.99 mm. The lowest and highest overestimations were seen in the HadReg4-7 and MPI-CCLM simulations at Ketema Niguse and Maytsebri stations, respectively. The underestimation ranged from 0.15 to 2.53 mm, observed in the HadReg4-7 simulation at Debarik station and the NorESM-Reg4-7 simulation at Ketema Niguse station, respectively (Table 3). In terms of bias, HadReg4-7 was better at Adiramets, Ketema Niguse and Maytsebri stations; MPI-CCLM and the CCLM ensemble were better at Debarik station; and the multi-model ensemble mean was better at Zarima station (Table 3).
The RMSE values indicated that the error between the observed and model-simulated mean annual rainfall reached up to 426.25 mm (Table 3). The lowest value was almost zero, observed for MPI-CCLM at Zarima station, while the highest RMSE was 426.25 mm, seen for the NorESM-REMOO simulation at Ketema Niguse station. The multi-model ensemble mean showed lower RMSE values than the individual RCMs at Debarik and Ketema Niguse, whereas the Reg ensemble, NorESM-Reg4-7 and MPI-CCLM showed lower RMSE values at Adiramets, Maytsebri and Zarima stations, respectively. The kiremit season rainfall RMSE ranged between 77.43 mm and 432.90 mm (Table 3). The lowest and highest values were observed for the multi-model ensemble mean simulation at Debarik station and the NorESM-REMOO simulation at Adiramets station, respectively. In terms of RMSE, the multi-model ensemble mean was better at all stations of the subbasin except Adiramets, where the Reg ensemble mean was better (Table 3). This finding is in line with a study conducted in Eastern Africa, where the ensemble mean showed lower RMSE than the individual RCMs (Endris et al. 2013). The ensemble mean also showed lower RMSE values than the individual RCMs at most stations in southwest Ethiopia (Demissie and Sime 2021).
The correlation (r) between observed and model-simulated mean annual and kiremit season rainfall was strong (0.60-0.79) to very strong (0.80-0.99) in the majority of the simulations. The exceptions, which showed moderate correlation, were the MPI-REMOO and NorESM-Reg4-7 simulations of mean annual and kiremit season rainfall at Ketema Niguse station, and the NorESM-CCLM, NorESM-REMOO, MPI-Reg4-7 and MPI-REMOO simulations of kiremit season rainfall at Debarik station (Table 3). This result coincides with a study conducted in Western Africa, where the correlation coefficients indicated that RCM simulations of annual rainfall matched the observed mean annual rainfall well (Ilori and Balogun 2022). The multi-model ensemble mean had higher correlation coefficients than the individual RCMs and institution-based ensemble means throughout the subbasin at seasonal and annual time scales, except for annual rainfall at Maytsebri station, where MPI-CCLM was better (Table 3). Similarly, studies of RCMs in Eastern Africa (Endris et al. 2013) and the Jemma subbasin (Worku et al. 2018) found that the multi-model ensemble mean had relatively higher correlation than the individual RCMs.
The observed mean annual and kiremit season rainfall showed less inter-annual variability (CV < 20%) over the subbasin. The annual rainfall simulated by the RCMs, the institution-based ensemble means and the multi-model ensemble mean showed low (CV < 20%) to extremely high (CV > 70%) variability (Table 3). The mean annual rainfall variability of the REMOO ensemble mean was closer to that of the observed rainfall at Adiramets, Debarik and Ketema Niguse stations, whereas at Maytsebri station the CCLM ensemble, and at Zarima station the Reg ensemble, showed closer variability than the others. For kiremit season rainfall, the REMOO ensemble showed the closest seasonal variability at Adiramets and Ketema Niguse stations, NorESM-Reg4-7 at Debarik, NorESM-CCLM at Maytsebri and HadReg4-7 at Zarima (Table 3). Generally, the ensemble means showed better results than the individual RCMs at the annual time scale. Similarly, previous studies found the ensemble better at simulating annual rainfall variability (Endris et al. 2013; Alemseged and Tom 2015; Worku et al. 2018; Dibaba et al. 2019; Demissie and Sime 2021).
The Kling-Gupta efficiency (KGE) of the mean annual (Table 3) and mean kiremit season (Table 4) rainfall revealed that the majority of the simulations had good performance (KGE > 0.50). The multi-model ensemble performed relatively better than the others in simulating the annual and kiremit season rainfall over the subbasin, with some exceptions, such as the MPI-CCLM and CCLM ensemble simulations of mean annual rainfall at Debarik and Ketema Niguse stations, respectively (Table 3). The CCLM ensemble mean simulation at Maytsebri station and all the institution-based ensemble simulations at Ketema Niguse and Adiramets stations performed better in replicating the kiremit season mean rainfall (Table 4).
Figure 6a, b illustrates the Taylor diagram of the long-term (1986-2005) areal annual and kiremit season rainfall, comparing observed rainfall with the RCMs, the institution-based ensemble means and the multi-model ensemble mean. The best models are those whose simulated patterns correspond with the observed ones and are located near the reference point on the x-axis. Accordingly, the multi-model ensemble mean simulation of the areal annual and kiremit season rainfall best matches the observed values throughout the subbasin. The multi-model ensemble mean, which overlaps the observed rainfall most closely, had the smallest RMSE, a standard deviation close to the observed one and a strong correlation. The poorly performing models, MPI-REMOO and MPI-Reg4-7, showed relatively lower correlation and higher RMSE and standard deviation (SD) than the best-fitting multi-model ensemble mean in simulating the mean annual rainfall and kiremit season rainfall, respectively (Fig. 6a, b).
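The three quantities a Taylor diagram displays can be computed as in the sketch below. The function name is hypothetical, and the population standard deviation is used so that the law-of-cosines relation underlying the diagram, E'² = σ_s² + σ_o² − 2 σ_s σ_o r, holds exactly; the study does not state which convention it used.

```python
import numpy as np

def taylor_stats(sim, obs):
    """Standard deviations, correlation and centered RMS difference."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    sd_sim = sim.std()   # population standard deviation (ddof=0)
    sd_obs = obs.std()
    r = np.corrcoef(sim, obs)[0, 1]
    # Centered RMSD: remove each series' mean before differencing.
    crmsd = np.sqrt(np.mean(((sim - sim.mean()) - (obs - obs.mean())) ** 2))
    return {"sd_sim": float(sd_sim), "sd_obs": float(sd_obs),
            "r": float(r), "crmsd": float(crmsd)}
```

On the diagram, the observed series sits on the x-axis at radius `sd_obs`; a simulation is plotted at radius `sd_sim` and angle arccos(r), and its distance from the reference point equals `crmsd`. Note that the centered RMSD ignores any mean bias, so the diagram complements rather than replaces the bias statistics.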
Evaluation of the mean annual and kiremit season rainfall with the statistical metrics (bias, RMSE, r, CV and KGE) showed that the multi-model ensemble was better in two or more metrics at each station, except for kiremit season rainfall at Maytsebri station, where the CCLM ensemble was better. Similarly, the Taylor diagram showed that the multi-model ensemble was better at replicating the areal annual and kiremit season rainfall of the subbasin. This finding is in line with other studies (Nikulin et al. 2012; Endris et al. 2013; Dibaba et al. 2019; Demissie and Sime 2021; Mengistu et al. 2021), which reported that the ensemble mean performed relatively better than individual RCMs. The grand ensemble was also better at replicating the mean annual and seasonal rainfall in western Africa (Ilori and Balogun 2022).
The cumulative distribution functions of the daily rainfall of the subbasin showed that the RCMs, institution-based ensemble means and multi-model ensemble mean replicated the CDF of the observed daily rainfall well (Fig. 7). Similarly, the frequency of rainy days showed the same pattern in the observed and model-simulated daily rainfall, with the number of days decreasing from the dry to the very heavy rainfall category (Fig. 8). There was a large number (4500 days) of dry days (RR = 0) in both the observed and model-simulated rainfall. Among rainy days, light rainy days (RR ≤ 10 mm) occurred more often than heavy (RR ≥ 10 mm) and very heavy (RR ≥ 20 mm) rainy days (Fig. 8). Additionally, the average number of light rainfall days was overestimated by the RCMs, institution-based ensemble means and multi-model ensemble mean, while heavy and very heavy rainy days were more frequent in the observed rainfall than in the model simulations, except for heavy rainfall days in the MPI-Reg4-7 and Reg ensemble mean simulations. The results revealed that MPI-REMOO, NorESM-REMOO, the Reg ensemble and MPI-CCLM were better at replicating the number of days of dry, light,
Similarly, Table 4 showed that the mean annual maximum temperature of the subbasin was overestimated by all the considered RCMs, institution-based ensemble means and the multi-model ensemble mean, except for the MPI-Reg4-7 simulation at Adiramets station, which underestimated it by 0.10 °C. The overestimation ranged between 0.06 and 8.9 °C. The lowest and highest overestimations were observed in the HadReg4-7 simulation at Maytsebri and the NorESM-Reg4-7 simulation at Zarima, respectively (Table 4). A similar result was found in southwest Ethiopia, where all evaluated CORDEX-Africa RCMs overestimated the maximum temperature (Demissie and Sime 2021).
Regarding the bias of the mean annual maximum temperature, MPI-Reg4-7 at Adiramets station, the multi-model ensemble mean at Debarik station, HadReg4-7 at Ketema Niguse and Maytsebri stations, and the CCLM ensemble mean at Zarima station performed better than the others.
Based on analysis of the observed and model-simulated mean annual minimum temperature, the majority of the model simulations represent the observations well at Adiramets, Debarik, and Zarima stations (Fig. 12). Specifically, the CanESM-RCM, HadReg4-7, REMOO ensemble, multi-model ensemble, and Reg ensemble simulations perform better at Adiramets, Debarik, Ketema Niguse, Maytsebri, and Zarima stations, respectively. On the other hand, the mean annual minimum temperature simulations of NorESM-CCLM at Adiramets, Ketema Niguse, and Zarima stations, NorESM-REMOO at Debarik station, and HadReg4-7 at Maytsebri station represent the observed values poorly.
The statistical evaluation of the annual minimum temperature is tabulated in Table 5. The results showed that, as with the annual maximum temperature, the majority of the simulations overestimated the mean annual minimum temperature. The overestimation ranged from 0.43 to 3.57 °C. The lowest and highest values were observed in the HadReg4-7 and MPI-CCLM simulations at Ketema Niguse and Zarima stations, respectively. The underestimation ranged from 0.09 to 5.07 °C, observed in the REMOO ensemble simulation at Zarima station and the HadReg4-7 simulation at Adiramets station, respectively (Table 5). The overestimation was high in the highland areas and low in the lowland areas of the subbasin. Similarly, in a study conducted in Tanzania, the bias was relatively high in high-elevation areas and low in low-elevation areas (Luhunga et al. 2016). RCM performance varies with location and topography in the upper Blue Nile basin (Dibaba et al. 2019). In northwestern Ethiopia, CORDEX-Africa RCM climate simulations are sensitive to elevation, resulting in higher biases at higher elevations (Van Vooren et al. 2019).
In terms of bias, the simulations of the mean annual minimum temperature by the multi-model ensemble mean at Adiramets station, HadReg4-7 at Debarik station, the REMOO ensemble mean at Ketema Niguse and Zarima stations, and the CanESM-RCM ensemble at Maytsebri station outperformed the other simulations.
The kiremit season mean maximum temperature was overestimated by the majority of the RCMs, institution-based ensemble means and the multi-model ensemble mean. The exceptions were HadCCLM at Adiramets station, HadREMOO and MPI-Reg4-7 at Maytsebri station, and MPI-CCLM and the CCLM ensemble mean at Zarima station, which underestimated the mean kiremit season maximum temperature by between 0.02 and 2.75 °C. The lowest and highest underestimations were observed in the HadCCLM simulation at Adiramets station and the CCLM ensemble simulation at Zarima station, respectively. The overestimation ranged from 0.1 to 6.12 °C. The lowest and highest overestimations were observed in the CCLM ensemble mean simulation at Adiramets and the MPI-REMOO simulation at Zarima station, respectively (Table 4).
As with the kiremit season mean maximum temperature, the kiremit season mean minimum temperature was overestimated by the majority of the simulations, by 0.01 to 6.03 °C, whereas the underestimation ranged between 0.32 and 7.01 °C. The highest and lowest overestimations were observed in the HadREMOO and MPI-REMOO simulations at Debarik and Adiramets stations, respectively, while the highest and lowest underestimations were observed in the REMOO simulation at Ketema Niguse and the HadReg4-7 simulation at Adiramets station, respectively (Table 5). Similarly, almost 50% of the CORDEX RCM simulations overestimated minimum temperature during the kiremit season (JJAS) (Tumsa 2022).
For the kiremit season mean maximum temperature, HadCCLM at Adiramets station, HadReg4-7 at Debarik and Ketema Niguse, and NorESM-CCLM at Maytsebri and Zarima stations performed better than the other RCMs. For the kiremit season mean minimum temperature, HadReg4-7 outperformed the others over the subbasin, except at Ketema Niguse station, where the REMOO ensemble was better.
The RMSE of the annual maximum temperature simulated by the RCMs, the institution-based ensemble means and the multi-model ensemble mean was lowest (0.24 °C) for the HadReg4-7 simulation at Ketema Niguse station and highest (9.28 °C) for the NorESM-Reg4-7 simulation at Zarima station (Table 5). For the kiremit season mean maximum temperature, the NorESM-REMOO and MPI-REMOO simulations showed the lowest (0.36 °C) and highest (6.31 °C) RMSE values, at Adiramets and Maytsebri stations, respectively (Table 4). Generally, in terms of annual RMSE, the multi-model ensemble mean was better at Adiramets and Ketema Niguse stations, HadReg4-7 at Debarik station and the CCLM ensemble at Maytsebri and Zarima stations. The kiremit season RMSE showed that the multi-model ensemble mean was superior in simulating the observed kiremit season mean maximum temperature at Adiramets, Debarik and Ketema Niguse stations, whereas at Maytsebri and Zarima stations CanESM-RCM and NorESM-REMOO, respectively, were better than the others.
Unlike for the maximum temperature, the minimum temperature simulated by the multi-model ensemble mean and HadReg4-7 showed low RMSE values at both annual and kiremit season time scales at Adiramets station. The RMSE at annual and kiremit season time scales ranged between 0.09-4.08 °C and 0.27-5.55 °C, respectively. In terms of RMSE, the multi-model ensemble was superior in simulating the annual and kiremit season mean minimum temperature over the subbasin, except for the mean annual minimum temperature at Debarik and Zarima and the kiremit season mean minimum temperature at Ketema Niguse, where HadReg4-7, the REMOO ensemble mean and the Reg ensemble mean were better.
The majority of the annual and kiremit season maximum temperature simulations showed strong correlation with the observed mean annual and seasonal maximum temperature, except for the HadCCLM, HadREMOO, MPI-CCLM, NorESM-CCLM and NorESM-REMOO mean annual maximum temperature simulations at Zarima station, which showed weak correlation. The multi-model ensemble showed better correlation than the individual RCMs and institution-based ensemble means over most of the subbasin at annual and kiremit season time scales (Tables 4 and 5). Further, CanESM-RCM performed equally with the multi-model ensemble mean at Adiramets and Debarik stations, as did HadReg4-7 and the CCLM ensemble at Debarik station, in simulating the mean annual maximum temperature (Table 4). Similarly, for minimum temperature, the multi-model ensemble mean showed better correlation than the other simulations at annual and kiremit season time scales, except for the kiremit season REMOO ensemble mean simulation at Zarima station (Table 5). In contrast, a negative correlation was observed for the HadREMOO mean annual minimum temperature simulation at Adiramets station (Table 5).
The KGE values of the annual and kiremit season mean maximum and minimum temperatures demonstrated good performance (KGE > 0.50) over the subbasin, with the exception of a few areas where the other RCMs were better. The MPI-Reg4-7 simulation of mean annual maximum temperature at Maytsebri station and the CCLM ensemble simulation of mean minimum temperature at Adiramets and Ketema Niguse performed better. Seasonally, the CCLM ensemble simulation of the kiremit season mean maximum temperature at Maytsebri station, and the Reg ensemble mean and MPI-Reg4-7 simulations of the mean maximum and minimum temperature at Zarima station, respectively, performed better than the other RCMs.
The RCMs, institutional based ensemble means and multi-model ensemble mean of annual and kiremit season mean maximum and minimum temperature showed low inter-annual and seasonal variability (CV < 20) throughout the subbasin. In terms of CV, the mean annual maximum temperature was better simulated by HadReg4-7 at Adiramets and Debarik stations, MPI-REMOO at Ketema Niguse and MPI-Reg4-7 at Maytsebri and Zarima stations (Table 4), whereas NorESM-REMOO at Adiramets station, the Reg ensemble mean at Debarik station, the REMOO ensemble mean at Ketema Niguse, MPI-CCLM at Maytsebri station and HadREMOO at Zarima station were closer to the kiremit season mean maximum temperature (Table 4). This finding is in line with a study conducted in southwestern Ethiopia, where the CORDEX-AFRICA RCMs showed low variability of the maximum and minimum temperatures on annual and seasonal time scales (Demissie and Sime 2021).
Among the evaluated simulations of mean annual minimum temperature, the CV values closest to the observations were shown by the multi-model ensemble mean at Ketema Niguse and Zarima stations, the CCLM ensemble at Adiramets station, HadReg4-7 at Debarik and NorESM-REMOO at Zarima station (Table 5). For kiremit season mean minimum temperature, the multi-model ensemble mean, REMOO ensemble, CCLM ensemble and MPI-Reg4-7 showed CV values closest to the observed minimum temperature at Adiramets and Debarik, Ketema Niguse, Maytsebri and Zarima stations, respectively (Table 5).
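The CV comparison described above amounts to computing the coefficient of variation of each simulated series and picking the one closest to the observed CV. The following is a minimal sketch, assuming CV is expressed in percent and uses the sample standard deviation; the model names and values are hypothetical placeholders, not results from the study.

```python
import statistics

def cv_percent(series):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * statistics.stdev(series) / statistics.mean(series)

def closest_cv(obs, simulations):
    """Return the name of the simulation whose CV is closest to the observed CV."""
    target = cv_percent(obs)
    return min(simulations, key=lambda name: abs(cv_percent(simulations[name]) - target))

# Hypothetical annual series for one station.
obs = [10.0, 12.0, 14.0]
simulations = {
    "model_a": [11.0, 12.0, 13.0],   # too little year-to-year variability
    "model_b": [20.0, 24.0, 28.0],   # biased mean but similar relative variability
}

best = closest_cv(obs, simulations)
print(best, round(cv_percent(obs), 2))
```

Because CV is normalized by the mean, a simulation with a large bias can still match the observed CV, which is why CV is read alongside bias and RMSE rather than on its own.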
The Taylor diagrams of the mean annual and kiremit season maximum and minimum temperature showed that the multi-model ensemble mean was better at replicating the subbasin areal mean annual and kiremit season maximum and minimum temperature (Fig. 13a-d). As with rainfall, the multi-model ensemble mean of kiremit season and annual maximum and minimum temperature was better in two to four of the evaluation metrics; thus, the multi-model ensemble was better at replicating the maximum and minimum temperature of the subbasin.
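A Taylor diagram summarizes three statistics for each simulation: the correlation with the observations, the standard deviation, and the centered root mean square difference (CRMSD). The sketch below computes these quantities and checks the geometric relation that lets them share one plot, CRMSD² = σ_sim² + σ_obs² − 2 σ_sim σ_obs r. It is an illustrative sketch with hypothetical values and population statistics, not the plotting procedure used in the study.

```python
import math

def taylor_stats(sim, obs):
    """Return the statistics summarized by a Taylor diagram for one simulation."""
    n = len(obs)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    anom_s = [s - mean_s for s in sim]          # anomalies remove the mean bias
    anom_o = [o - mean_o for o in obs]
    std_s = math.sqrt(sum(a * a for a in anom_s) / n)
    std_o = math.sqrt(sum(a * a for a in anom_o) / n)
    r = sum(a * b for a, b in zip(anom_s, anom_o)) / (n * std_s * std_o)
    crmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(anom_s, anom_o)) / n)
    return {"r": r, "std_sim": std_s, "std_obs": std_o, "crmsd": crmsd}

# Hypothetical areal mean maximum temperatures in degrees Celsius.
obs = [20.0, 22.0, 24.0, 26.0]
sim = [21.0, 24.0, 25.0, 28.0]

stats = taylor_stats(sim, obs)
# Law-of-cosines relation underlying the diagram.
rhs = stats["std_sim"] ** 2 + stats["std_obs"] ** 2 \
    - 2 * stats["std_sim"] * stats["std_obs"] * stats["r"]
print(stats, math.isclose(stats["crmsd"] ** 2, rhs))
```

Since the statistics are computed on anomalies, a Taylor diagram does not show the mean bias, so it complements rather than replaces the bias and RMSE tables.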

Conclusion
This study evaluated the performance of historical simulations of 10 CMIP5 CORDEX-AFRICA-22 0 RCMs, three institutional based ensemble means (Reg ensemble, CCLM ensemble and REMOO ensemble), and one overall ensemble mean in replicating the observed mean annual and kiremit season rainfall and maximum and minimum temperature over the Zarima subbasin between 1986 and 2005. The evaluation used statistical metrics including bias, Root Mean Square Error (RMSE), Pearson correlation coefficient (r), coefficient of variation (CV), Kling-Gupta Efficiency (KGE) and the Taylor diagram, which shows r, the standard deviation and the centered root mean square difference graphically. The results indicated that the monthly rainfall patterns of the multi-model ensemble and CanESM-RCM simulations exhibited better performance across the subbasin, except at Ketema Niguse station, where the REMOO ensemble and Reg ensemble were superior in capturing the monthly rainfall patterns. Additionally, the NorESM-Reg4-7 simulation performed well at the Debarik and Maytsebri stations, while the CCLM ensemble performed well at the Adiramets station in capturing the monthly rainfall variations.
The monthly patterns of maximum and minimum temperature in the RCMs, institutional based ensemble means and multi-model ensemble mean generally represented the observed patterns well. While some simulations were closer to the observed maximum and minimum temperatures, others showed some differences. The MPI-CCLM simulation, multi-model ensemble mean, REMOO ensemble mean, HadREMOO and HadReg4-7 performed relatively better than the other simulations in replicating the monthly pattern of maximum temperature, showing good agreement with the observed monthly patterns at the Maytsebri, Adiramets and Debarik stations. For the monthly minimum temperature, the multi-model ensemble mean, HadCCLM, CCLM ensemble mean, MPI-CCLM and MPI-Reg4-7 were relatively better than the other simulations, exhibiting closer agreement with the observed monthly patterns at the Maytsebri, Adiramets, Debarik, Ketema Niguse and Zarima stations.
The mean annual rainfall analysis showed that HadREMOO, MPI-Reg4-7, HadReg4-7, the Reg ensemble and the multi-model ensemble mean performed relatively better in representing the observed mean annual rainfall at the Adiramets, Debarik, Ketema Niguse, Maytsebri and Zarima stations, respectively. In contrast, NorESM-CCLM, MPI-CCLM, NorESM-Reg4-7 and NorESM-REMOO performed weakly in reproducing the observed mean annual rainfall at the Adiramets, Debarik, Ketema Niguse, Maytsebri and Zarima stations, respectively. Similarly, the RCMs generally captured the mean annual maximum temperature at the climatic stations of the Zarima subbasin well. Specifically, the MPI-Reg4-7 simulation performed well in representing the observed mean annual maximum temperature at Adiramets and Maytsebri stations, the HadReg4-7 simulation performed better at Debarik and Ketema Niguse stations, and the CCLM ensemble simulations gave the better representation at Zarima station. The majority of the model simulations represented the mean annual minimum temperature well at Adiramets, Debarik and Zarima stations. Specifically, the CanESM-RCM, HadReg4-7, REMOO ensemble, multi-model ensemble and Reg ensemble simulations performed better at Adiramets, Debarik, Ketema Niguse, Maytsebri and Zarima stations, respectively.
The majority of the RCMs, as well as the institutional based ensemble means and the multi-model ensemble, overestimated the observed mean annual and kiremit season rainfall. Specifically, the annual HadReg4-7 simulation showed the highest overestimation at Maytsebri station, while the MPI-CCLM simulation had the lowest overestimation at Ketema Niguse station. On the other hand, the NorESM-Reg4-7 simulation at Ketema Niguse station and the HadReg4-7 simulation at Zarima station underestimated the mean annual rainfall. Seasonally, the minimum overestimation was seen in the HadReg4-7 simulation at Ketema Niguse station and the maximum overestimation in the MPI-CCLM simulation at Maytsebri station. The RCMs are sensitive to elevation: the highest bias values were observed in the highly elevated area of the subbasin (Ketema Niguse) and the lowest bias values in the lowland area (Maytsebri and Zarima stations).
The combined analysis of the statistical metrics revealed that the multi-model ensemble mean outperformed the individual models in two or more statistical metrics at each station in simulating the annual and seasonal rainfall and maximum and minimum temperature, except for the kiremit season rainfall at Maytsebri station, where the CCLM ensemble performed better. Additionally, the Taylor diagrams demonstrated that the multi-model ensemble mean best matched the observed values for the areal annual and kiremit season rainfall and maximum and minimum temperature across the entire subbasin. These findings imply that the multi-model ensemble mean provides a more reliable representation of the observed values and performs better across multiple evaluation criteria. This reinforces the value of using ensemble approaches to capture uncertainties and enhance the accuracy of climate model simulations.
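To make the ensemble idea concrete, the sketch below forms a multi-model ensemble mean by averaging the member simulations at each time step. It assumes equal weights for all members, which the text does not specify, and the model names and values are hypothetical. The example also illustrates why an ensemble mean can outperform its members: errors of opposite sign partly cancel.

```python
import math

def ensemble_mean(simulations):
    """Equal-weight mean across models at each time step (assumed weighting)."""
    series = list(simulations.values())
    length = len(series[0])
    if any(len(s) != length for s in series):
        raise ValueError("all simulations must cover the same time steps")
    return [sum(values) / len(values) for values in zip(*series)]

def rmse(sim, obs):
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))

# Hypothetical annual rainfall totals (mm) for one station.
obs = [900.0, 1000.0, 1100.0]
simulations = {
    "wet_model": [1000.0, 1100.0, 1200.0],   # overestimates by 100 mm
    "dry_model": [850.0, 950.0, 1050.0],     # underestimates by 50 mm
}

mme = ensemble_mean(simulations)
print(mme, rmse(mme, obs))
```

Here the ensemble mean has a smaller RMSE than either member because the wet and dry errors offset each other; when all members share the same sign of error, as with the widespread rainfall overestimation noted earlier, averaging alone does not remove the bias.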
The performance of the models, including the multi-model ensemble mean, differed across statistical metrics, locations and time scales (seasonal and annual). Selecting the most representative simulation and applying bias correction is therefore important for climate projection and climate change impact assessment studies in the Zarima subbasin. This study assists water resource managers and hydrologists in selecting suitable models for their specific needs and responsibilities. Moreover, it contributes to the development of a reliable climate service assessment and facilitates decision making for climate adaptation, ultimately leading to optimal benefits in mitigating the impacts of climate change.

Fig. 2 Agroclimatic zone (a) and land use land cover (b) of Zarima subbasin

Fig. 3 Climate diagram of Zarima subbasin. Data obtained from the Ethiopian Meteorological Institute for the period of 1984 to 2018

Fig. 5 The observed and CORDEX-AFRICA RCMs simulated mean annual rainfall

Fig. 11 The observed and model simulated mean annual maximum temperature of the Zarima subbasin

Fig. 13 Taylor diagrams of (a) the areal annual maximum temperature, (b) the areal kiremit season maximum temperature, (c) the areal annual minimum temperature and (d) the areal kiremit season minimum temperature

Table 1
List of stations and their location

Table 3
Statistical evaluation of the mean annual and kiremit season rainfall of Zarima subbasin (the statistical evaluation of kiremit season mean rainfall is written in brackets)

Table 4
Statistical evaluation of RCMs, institutional based ensemble means and multi-model ensemble mean of the annual and kiremit season maximum temperature (the kiremit season statistical evaluation is written in brackets)

Table 5
Statistical evaluation of the observed and model simulated mean annual and kiremit season minimum temperature (the kiremit season statistical evaluation is written in brackets)