Analysis of urban sprawl dynamics using machine learning, CA-Markov chain, and the Shannon entropy model: a case study in Mbombela City, South Africa

Over half of the world's population resides in urban areas, and this pattern is expected to become more pronounced, notably in South Africa. Research on past and projected urban sprawl is therefore necessary for efficient urban land use planning and management. This study assesses the spatio-temporal dynamics of urban sprawl in Mbombela, South Africa, from 2003 to 2033. We combined machine learning, the cellular automata-Markov chain, and the Shannon entropy model with Landsat 4–5 Thematic Mapper and Landsat 8 Operational Land Imager imagery to examine how urban sprawl changes over time. The study addresses a gap in existing research, which focuses primarily on past and current urban growth rather than future trends. Between 2003 and 2023, built-up areas and vegetation expanded by 1.98 km² and 13.23 km² per year, respectively, while open land and water bodies declined by 12.56 km² and 2.65 km² per year, respectively. By 2033, built-up areas and vegetation are projected to grow by a further 0.57 km² and 7.60 km² per year, respectively, whereas water bodies and open land are projected to decline by 0.39 km² and 7.78 km² per year, respectively. This work can assist planners and policymakers in improving sustainable urban land-use planning.


Introduction
More than half of the world's population lives in cities, changing land use and socioeconomic phenomena (Magidi and Ahmed 2019). During the 19th and early 20th centuries, Western regions such as Europe and America experienced a significant increase in urbanization, influenced by regional factors. Asia and Africa have also shown a higher rate of urbanization than other regions (Dhanaraj and Angadi 2022). Sub-Saharan Africa (SSA) will account for 50% of global population growth by 2050, driving urbanization and land use and land cover changes (Forget et al. 2021). Urbanization is an inevitable outcome of economic advancement and rapid population growth (Deep and Saklani 2014). However, urban sprawl, characterized by uncontrolled, disordered expansion due to population growth and migration, is a significant concern because of its rapid and unplanned pace (Hamad 2020). Urbanization increases demand for transportation, water systems, housing, businesses, healthcare, education, and recreation infrastructure, often encroaching on rural areas and potential agricultural land (Dhanaraj and Angadi 2022; Shao et al. 2021). Improper urban land use planning has contributed significantly to the loss of potentially productive agricultural land (Gidey et al. 2023a), because urbanization converts various land use types to built-up areas. Protecting ecosystems, such as agricultural land and wetlands, is critical to maintaining ecological balance and preventing the contamination of air and water resources.
Monitoring changes in urban land use and cover, including the "from-to" transformations, is crucial for identifying the socioeconomic factors that facilitate sustainable development, given the global intensification of urbanization (Kabanda 2022). Urban growth will inevitably require land from various land cover types (Gidey et al. 2023a). This calls for proper guidance from land use planners to ensure social, economic, and ecological sustainability, as informal settlements pose challenges such as traffic congestion, deforestation, limited open space, and climate change effects (e.g., an intensified urban heat island). In SSA, human-driven factors such as urban sprawl and industrialization are causing significant changes in land use and cover (Gidey and Mhangara 2023b). A variety of factors drive this expansion, including infrastructure development, industrial growth, economic trends, population growth, regulatory policies, water resources, and transportation developments (Dhanaraj and Angadi 2022; Hamad 2020; Mudau et al. 2014). This can accelerate the alteration of land use and land cover, affecting the carbon cycle and increasing atmospheric carbon monoxide levels (Hegazy and Kaloop 2015). Geospatial technologies such as Geographic Information Systems (GIS) and remote sensing can accurately measure and analyze urban sprawl. We employed Google Earth Engine (GEE), machine learning algorithms, cellular automata, and the Markov chain model, all supported by ArcGIS Pro v. 3.2, to address the shortcomings of existing studies that focus primarily on historical urban growth rather than future patterns. The CA-Markov chain model (Gidey et al. 2023a) is more accurate at predicting future changes in land use and cover than other models, such as the agent-based model (ABM) and the conversion of land use and its effects at small regional extent modeling framework (CLUE or CLUE-S model). The application of Shannon's entropy also improves the understanding of urban growth patterns by assessing the distribution of urban expansion and identifying changes in built-up areas. Together, these technologies assist in monitoring and planning future urbanization, providing comprehensive information for policymakers, managers, and planners to improve decision-making and land use planning in urban and semi-urban regions. Owing to their accuracy, consistent coverage, and high spatial, spectral, and temporal resolution, remotely sensed data help to improve the transformation of urban land use systems and to identify key factors that affect urban land use planning (Hamad 2020; Hegazy and Kaloop 2015; Magidi and Ahmed 2019; Shao et al. 2021).
Several researchers currently use high- and moderate-resolution satellite images to study urban sprawl (Gidey et al. 2023a; Forget et al. 2021; Kabanda 2022; Satterthwaite 2017; Wolff et al. 2020), such as optical Landsat, Sentinel-2, Worldview, Satellite Pour l'Observation de la Terre (SPOT) 5, and RapidEye, to name a few. Changes in land use patterns both within and beyond city boundaries are a cause for concern (Ghosh et al. 2023). In this context, researchers use each image's spectral bands to group objects that exhibit comparable spectral responses and signatures into feature classes. Retrieving and managing such large volumes of data can be difficult for local computational systems. Cloud computing platforms such as Google Earth Engine (GEE), combined with machine learning algorithms such as support vector machines (SVM), offer a solution to this problem. Zhao et al. (2021) describe GEE as an open-source platform that analyzes satellite data on a petabyte scale. The classification method uses a mathematical algorithm to organize the data and produce a categorized map that contains comprehensive information. The accuracy of image classification, however, depends primarily on the quality of the data (Lang et al. 2006).
Human-made infrastructure and areas of vegetation, which host a variety of wildlife and have a higher population density than natural areas, distinguish the urban environments in Southern Africa (McPherson et al. 2021). For instance, South African cities are experiencing a decline in green spaces and an increase in impervious surfaces due to natural urban expansion and government-led initiatives such as the Reconstruction and Development Programme (Kabanda 2022). In a country like South Africa, which is experiencing rapid economic growth, it is of the utmost importance to analyze urban sprawl in order to develop effective strategies and policies that encourage sustainable development (Magidi and Ahmed 2019). South Africa's urban population is expanding rapidly, especially in Polokwane, Rustenburg, Vanderbijlpark, Nelspruit, and Ekurhuleni, recognized as the five regions with the highest growth rates as of the current year. For instance, Kabanda (2022) reported that urban areas in Kimberley, South Africa, experienced a 15% total urban expansion rate between 2013 and 2018, with an increase of 6.7 km² in extent. Likewise, Rustenburg's urban growth surpasses the UN's 1.21% annual African average, primarily due to population growth, post-apartheid development, and mining expansion (Mudau et al. 2024). Several factors, such as migration, economic progress, and population growth, have further exacerbated this increase (Magidi and Ahmed 2019; Mudau et al. 2014). As a result, urban areas have a higher population density than their rural counterparts. Demand for urban land is expected to increase in the study area, and inadequate urban land use planning significantly impacts the environment. Thus, studies of historical and predicted urban sprawl are essential for effective urban land use planning and management to protect our ecosystem for future generations, reduce conflict, and promote equitable development. However, urban sprawl research in Mbombela City, South Africa, is scarce (Yiran et al. 2020). If urban land use is not efficiently managed, the Sustainable Development Goals may be harder to achieve because urbanization is expected to continue growing at twice the current rate over the next twenty years. This study aimed to examine the historical and projected patterns of urban growth in Mbombela City, South Africa, for the years 2003, 2013, 2023, and 2033, focusing on their spatial and temporal characteristics. A comprehensive understanding of the findings is essential to manage the transformation of urban environments effectively, protect the environment, and promote efficient urban land use planning.

Description of the study area
We conducted the study in Mbombela, the administrative center of Mpumalanga Province. The city sits at an elevation of 667 m above sea level, with coordinates of 25°15'S to 25°30'S and 30°30'E to 30°15'E (Fig. 1). Mbombela was formerly known as Nelspruit and is currently one of the urban areas experiencing rapid growth. Manikela (2009) states that the site is located at the intersection of the N4 National Road and the R40 Road. The area is part of the Inkomati catchment, a transboundary catchment shared with Mozambique and Swaziland. A temperate, highland tropical climate with dry winters characterizes the study area. Based on long-term climate data, the study area receives an average annual rainfall of 75.3 millimeters, with a high number of days without rain (238.9 days, or 65.45% of the year).
The highest and lowest annual average temperatures are about 28.16 °C (82.69 °F) and 14.74 °C (58.53 °F), respectively. The seasonal characteristics of rainfall have a significant impact on the plant phenology cycle in the region, which includes both natural and farmed flora. Extensive wetlands characterize the riverbanks. In addition to a wide range of fruits and vegetables, the agricultural produce occasionally includes staple crops such as maize, potatoes, and beans. Granite and migmatite are the two rock types that make up the underlying geology. The soil is predominantly red to yellow-brown in color and is known as the Hutton form. Mbombela was first recognized as a village in 1905 and attained municipal status in 1940. The study area encompasses three designated urban hubs, namely Mbombela, White River, and Hazyview, along with three informal communities. Agriculture, tourism, and mining were the primary activities that led to the establishment of these centers. According to Manikela (2009), the South African National Space Agency Policy Brief emphasizes the significant increase in urban population caused by apartheid planning.

Data acquisition techniques
We monitored urban expansion from 2003 to 2023 in ten-year intervals using Landsat imagery. Imagery for these dates had to be obtained from two different sources to allow comparison between periods. The first dataset was the Landsat 4-5 TM Collection 2, providing calibrated Top-of-Atmosphere (TOA) reflectance data for 2003. The second dataset was the calibrated TOA reflectance from Landsat 8 OLI/TIRS Collection 2, which covered 2013 and 2023. The United States Geological Survey and the National Aeronautics and Space Administration generated both datasets in a comparable way, but the bands differ (Table 1). The rationale for choosing this period was the 2005 launch of the Tourism Black Economic Empowerment Charter and Scorecard (Monakhisi 2008), which sought to promote tourism growth in the Mpumalanga region. Urban growth is crucial for tourism development as cities become increasingly attractive destinations for tourists. Factors such as affordable transportation, increased mobility, and technological advancements contribute to this expansion. Cities offer a wide variety of cultural activities, historical landmarks, and vibrant lifestyles, which make them popular tourist destinations.
We selected imagery acquired in May because the influence of the weather is relatively low during that month. Clear images are crucial for preventing misclassification caused by clouds and their shadows. For each year, we carried out acquisition and processing independently, using a consistent methodology throughout. During image collection, filtering criteria were defined in Google Earth Engine for every image acquisition. We applied an additional filter to retain only images with a cloud cover of less than 1%. We uploaded the shapefile of the region of interest, Mbombela City, to the GEE platform to crop and align the satellite images with the boundaries of the study area.

Satellite image processing, classification, and analysis
We used various image pre-processing techniques on the GEE platform to address discrepancies in the collected satellite images. These techniques included geometric corrections (specifically, the UTM Zone 35S projection), atmospheric corrections, and radiometric corrections (to eliminate issues such as cloud cover, darkness/haze, and noise) (Fig. 2). We then computed the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Built-Up Index (NDBI) to assess the condition of the vegetation and gather insights into the characteristics and development patterns of urban sprawl (Eqs. 1-2):
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \quad (1)$$
$$\mathrm{NDBI} = \frac{\mathrm{SWIR} - \mathrm{NIR}}{\mathrm{SWIR} + \mathrm{NIR}} \quad (2)$$
NDVI measures the health and quantity of vegetation from the difference between near-infrared (NIR) and red reflectance. NDBI focuses on built-up areas and uses the NIR and shortwave infrared (SWIR) bands. The values of NDVI and NDBI range from -1 to 1. NDBI values close to -1 indicate low built-up density, as in vegetation, water bodies, and areas with bare soil or rocky terrain. NDBI values close to 0 indicate a combination of built-up and non-built-up areas, such as suburban regions with dispersed structures and vegetation; this category includes arable land and various built structures. Factors such as urban area expansion, infrastructure construction, industrial operations, intensified agricultural practices, sensor and data issues, unique urban configurations, changes in land use, and threshold selection can push NDBI values toward the upper limit of 1.
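The two indices can be illustrated with a minimal Python sketch. It is not the GEE workflow used in the study; the reflectance values are hypothetical and the function names are chosen for illustration only.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b); return 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir: float, red: float) -> float:
    """NDVI contrasts near-infrared and red reflectance."""
    return normalized_difference(nir, red)


def ndbi(swir: float, nir: float) -> float:
    """NDBI contrasts shortwave-infrared and near-infrared reflectance."""
    return normalized_difference(swir, nir)


# Hypothetical TOA reflectance values for two pixels (not taken from the study).
vegetated = {"red": 0.05, "nir": 0.45, "swir": 0.20}
built_up = {"red": 0.20, "nir": 0.25, "swir": 0.35}

for name, px in (("vegetated", vegetated), ("built-up", built_up)):
    print(name, round(ndvi(px["nir"], px["red"]), 3), round(ndbi(px["swir"], px["nir"]), 3))
```

In this sketch the vegetated pixel yields a high NDVI and a negative NDBI, while the built-up pixel yields a low NDVI and a positive NDBI, which is the contrast the classification relies on.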
Both NDVI and NDBI facilitate the differentiation between water bodies, vegetation, built-up areas, and open land in satellite imagery, because each of these features exhibits unique reflectance characteristics. We used the feature collection tool in GEE to create training points. Each class was added independently and assigned a distinct label, and 150 points were collected per class. We then combined the points of all classes to form a training dataset and overlaid them on the imagery. We sampled all the bands as inputs using a scale of 10. We then divided the dataset into two subsets: 70% of the data for training and the remaining 30% for testing. Once this step was complete, the classification algorithm could be applied. We selected SVM due to its strong performance in both classification and regression tasks, its robustness, and its accuracy (Ahmad et al. 2014; Shaharum et al. 2020). It also copes well with small sample sizes and has strong fitting and predictive abilities in a high-dimensional feature space (Gao et al. 2022; Zhou et al. 2021). SVM is a non-probabilistic binary classifier (Zhou et al. 2021).
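The 70/30 partition of the labelled points can be sketched in plain Python as follows. This is an illustration only: the random seed, the class labels, and the `split_points` helper are assumptions, and the study performed the split inside GEE rather than with this code.

```python
import random

# Four classes with 150 labelled points each, as described in the text.
CLASSES = ["vegetation", "built_up", "water", "open_land"]
POINTS_PER_CLASS = 150


def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle the points reproducibly and split them into training and holdout sets."""
    rng = random.Random(seed)  # fixed seed is an assumption, used for repeatability
    shuffled = points[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Each point is represented here only by its class label and an index.
points = [(label, i) for label in CLASSES for i in range(POINTS_PER_CLASS)]
train, holdout = split_points(points)
print(len(train), len(holdout))
```

With 600 points in total, this yields 420 training points and 180 testing points.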

Analysis of urban sprawl using the SVM machine learning model
This study used the SVM algorithm because it excels at classification and regression (Shaharum et al. 2020). The method can solve non-linear problems, but it takes longer and requires more computation. The algorithm categorizes data points by separating the classes with a hyperplane defined by the support vectors (Gao et al. 2022; Shaharum et al. 2020). Specifying the kernel type, gamma, and cost parameters improves GEE classification accuracy. Both polynomial and sigmoid kernels are available. Gamma controls how the data are distributed in the new feature space, while the support vectors determine the outcome of training (Gao et al. 2022).
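To show where gamma enters the kernels mentioned above, the following sketch writes out the polynomial and sigmoid kernels, plus the radial basis function kernel for comparison, in the forms commonly used by LIBSVM-style implementations. The `coef0` and `degree` parameters and the sample vectors are assumptions for illustration; the text does not report the values used in the study.

```python
import math


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    """K(u, v) = (gamma * <u, v> + coef0) ** degree."""
    return (gamma * dot(u, v) + coef0) ** degree


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    """K(u, v) = tanh(gamma * <u, v> + coef0)."""
    return math.tanh(gamma * dot(u, v) + coef0)


def rbf_kernel(u, v, gamma=1.0):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical two-band feature vectors for two pixels.
x, y = [0.2, 0.4], [0.3, 0.1]
for g in (0.5, 5.0):
    print(g, polynomial_kernel(x, y, gamma=g, degree=2), sigmoid_kernel(x, y, gamma=g), rbf_kernel(x, y, gamma=g))
```

A larger gamma makes the RBF similarity fall off faster with distance, which is one way to see why the choice of gamma changes how samples are arranged in the transformed space.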

McNemar's test
McNemar's test is a straightforward, non-parametric, and user-friendly technique for comparing maps that share the same reference points but were produced with different classification methods for the same year. In contrast, the Kappa coefficient presupposes that the samples are independent (Manandhar et al. 2009). The test statistic is computed as shown in Eq. 3:
$$z = \frac{f_{12} - f_{21}}{\sqrt{f_{12} + f_{21}}} \quad (3)$$
where $f_{12}$ is the number of cases misclassified by classifier one but correctly classified by classifier two, and $f_{21}$ is the number of cases correctly classified by classifier one but misclassified by classifier two.
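As a worked illustration, the sketch below computes the McNemar statistic in its standard form, z = (f12 - f21) / sqrt(f12 + f21), from the two discordant counts. The counts are hypothetical and are not results from this study.

```python
import math


def mcnemar_z(f12: int, f21: int) -> float:
    """McNemar z statistic from the two discordant counts of a paired comparison."""
    if f12 + f21 == 0:
        return 0.0  # the classifiers never disagree, so there is no evidence of a difference
    return (f12 - f21) / math.sqrt(f12 + f21)


# Hypothetical discordant counts for two classifications of the same reference points.
z = mcnemar_z(25, 9)
print(round(z, 3))
```

Under the usual two-sided convention, an absolute z value above 1.96 indicates that the two classifications differ at the 5% significance level.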

Analysis of urban sprawl dynamics
We classified the satellite images, exported them in TIFF format, and then imported them into ArcGIS Pro v. 3.2. We reclassified the maps to assign a distinct numerical identifier to each class. We then employed two change detection methods. The first step was to convert the raster maps to polygons for each year in order to determine the land area of each class in hectares using the summary statistics tool. We created a bar graph of the overall coverage of each class in each year to visually assess the differences in area coverage across classes and years. We plotted the built-up class on its own axis because it was significantly smaller than the other classes; this makes visible changes in the developed regions that would have gone unnoticed if presented alongside the remaining classes. Shannon's entropy was computed as follows:

Shannon's entropy model
According to Jat et al. (2008), Shannon's entropy is a commonly employed measure for assessing the extent of spatial concentration or dispersion of a variable across spatial units. Its integration with remote sensing and GIS has gained significant popularity in recent years because it can quantify the extent of urban expansion at many levels, ranging from state to national (Das and Angadi 2021). Its use (Eq. 4) facilitates the calculation of spatial concentration or dispersion across all spatial units:
$$H_n = -\sum_{i=1}^{n} p_i \log(p_i) \quad (4)$$
where $p_i$ is the probability of the variable occurring in the $i$th zone, and $n$ is the total number of zones.
The range of entropy values can span from 0 to 1. According to Verma et al. (2017), values that are closer to the latter imply a dispersed distribution. The formula was used to calculate Shannon's entropy for each year based on the area coverage in hectares of each class. We employed the ArcGIS Pro change detection tool to generate a category map that illustrates the changes between 2003 and 2013 and between 2013 and 2023.
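The calculation can be sketched in Python under two assumptions that the text does not state explicitly: the entropy takes the form H = -Σ p_i ln(p_i) with the natural logarithm, and the vegetation share for 2003 is the remainder after the three class shares reported in the results for that year.

```python
import math


def shannon_entropy(shares):
    """Shannon entropy H = -sum(p_i * ln(p_i)) over non-zero class shares."""
    total = sum(shares)
    probs = [s / total for s in shares if s > 0]
    return -sum(p * math.log(p) for p in probs)


# Percentage shares reported for 2003 (built-up, water bodies, open land).
shares_2003 = {"built_up": 0.78, "water": 3.66, "open_land": 43.97}
# Assumption: vegetation takes the remaining share of the study area.
shares_2003["vegetation"] = 100.0 - sum(shares_2003.values())

h_2003 = shannon_entropy(list(shares_2003.values()))
print(round(h_2003, 2))
```

Under these assumptions the result is about 0.86, and four equally sized classes would give ln(4), roughly 1.39.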

Accuracy assessment
We used the remaining 30% of the data, the testing data, to evaluate accuracy. We generated a confusion matrix and derived accuracy metrics from it, including overall accuracy, the Kappa statistic, user's (consumer's) accuracy, and producer's accuracy (Tassi and Vizzari 2020). These metrics indicate the level of accuracy and the extent of errors in the classification.
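The metrics listed above can be derived from a confusion matrix as in the following sketch. The matrix is hypothetical (180 testing points, rows as reference classes and columns as predicted classes) and does not reproduce the study's results; the function name is chosen for illustration.

```python
def accuracy_metrics(matrix):
    """Return overall accuracy, kappa, producer's and user's accuracies.

    Rows are reference (ground truth) classes; columns are predicted classes.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]

    overall = diagonal / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    producers = [matrix[i][i] / row_totals[i] for i in range(n)]  # omission side
    users = [matrix[i][i] / col_totals[i] for i in range(n)]      # commission side
    return overall, kappa, producers, users


# Hypothetical matrix: vegetation, built-up, water, open land.
matrix = [
    [44, 0, 0, 1],
    [0, 43, 0, 2],
    [1, 0, 44, 0],
    [0, 3, 0, 42],
]
overall, kappa, producers, users = accuracy_metrics(matrix)
print(round(overall, 4), round(kappa, 4))
```

Producer's accuracy divides each diagonal cell by its reference total, whereas user's accuracy divides it by its predicted total, so the two can differ for the same class.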

Prediction of future urban sprawl using CA-Markov model
Using discrete cells and simple rules covering the cell unit, state, neighborhood range, and transition rules, the Cellular Automata (CA) and Markov chain models can be used to run dynamic simulations (Zhang et al. 2021). We analyzed the dynamics of urban sprawl to identify its patterns, magnitude, and potential future changes. The CA model is expressed as follows (Eq. 5):
$$S(t, t+1) = f\big(S(t), N\big) \quad (5)$$
where $S$ is the state set of discrete and finite cells, $N$ is the neighborhood of the cell, $t$ and $t + 1$ represent two different moments, and $f$ is the cellular state transition rule. Initially, Hamad et al. (2018) and Subedi et al. (2013) primarily used the Markov chain model for ecological modeling and monitoring. Sang et al. (2011) describe it as a theory of prediction and optimal control based on the formation of Markov process systems. We used the following equation (Eq. 6) to predict changes in land use:
$$S(t+1) = P_{ij} \times S(t) \quad (6)$$
where $S(t)$ and $S(t+1)$ are the system status at times $t$ and $t + 1$, and $P_{ij}$ is the transition probability matrix, which is calculated as follows (Eq. 7):
$$P_{ij} = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1n} \\ P_{21} & P_{22} & \cdots & P_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ P_{n1} & P_{n2} & \cdots & P_{nn} \end{bmatrix}, \quad 0 \le P_{ij} < 1, \quad \sum_{j=1}^{n} P_{ij} = 1 \quad (i, j = 1, 2, \ldots, n) \quad (7)$$
According to Subedi et al. (2013), the CA-Markov model is a combination of the CA model and the Markov chain. Zhang et al. (2021) assert that the Markov model can provide quantitative predictions but lacks the capacity to generate spatial predictions. The use of geographical information in the CA model allows dynamic evolution to be simulated. The Markov chain focuses on calculating probability fluctuations, whereas the CA focuses on changes in specific locations. The CA-Markov prediction model consists of four components related to spatial objects: cellular space, cellular state, neighborhood, and transition rules. Nevertheless, precise forecasting is difficult because of the complexity of the associated socioeconomic factors (Gidey et al. 2023a). Following the preceding procedures, we imported the projected map into ArcGIS Pro and conducted a change detection analysis comparing the 2023 and 2033 maps to measure and illustrate the differences between them. We also determined and plotted the area coverage of each class on the projected map.
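The Markov step can be illustrated with a short sketch that projects class areas one period ahead using a row-stochastic transition matrix. The 2023 areas are those reported in the results; the transition probabilities are invented for illustration and are not the matrix estimated in this study. The sketch covers only the quantitative Markov component, not the spatial allocation performed by the CA.

```python
CLASSES = ["built_up", "vegetation", "water", "open_land"]

# Class areas in km² reported for 2023.
areas_2023 = [54.71, 1381.68, 29.87, 724.73]

# Hypothetical transition matrix: P[i][j] is the probability that land in
# class i at time t belongs to class j at time t + 1. Each row sums to 1.
P = [
    [1.000, 0.000, 0.000, 0.000],  # built-up stays built-up
    [0.005, 0.980, 0.000, 0.015],  # vegetation
    [0.000, 0.200, 0.750, 0.050],  # water bodies
    [0.005, 0.200, 0.000, 0.795],  # open land
]


def project(areas, transition):
    """Apply one Markov step: next_j = sum_i areas_i * P[i][j]."""
    n = len(areas)
    return [sum(areas[i] * transition[i][j] for i in range(n)) for j in range(n)]


areas_next = project(areas_2023, P)
for name, area in zip(CLASSES, areas_next):
    print(name, round(area, 2))
```

Because every row of the matrix sums to 1, the total area is conserved and only its distribution among the classes changes.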

Spatial and temporal trends of urban sprawl from 2003 to 2023
Based on the SVM model, we identified four main land cover types: vegetation, built-up areas, water bodies, and open or barren land (Fig. 3a-c). Vegetation was primarily concentrated in the northern and southwestern parts of the study area. Elsewhere it was more scattered, likely because of aridity or scarcity driven by meteorological factors, and conditions plausibly improved in subsequent years. In 2003, the built-up area encompassed 17.08 km², which accounted for only 0.78% of the total study area. The city of Mbombela, situated to the east of the central point on the map, covers a comparatively smaller area than the other classes. Water bodies and open land covered 80.23 km² (3.66%) and 963.3 km² (43.97%), respectively. The built-up area increased to 20.71 km² (1.13%) in 2013, while the vegetation cover decreased to 957.75 km² (43.71%), which increased the amount of open land (Fig. 3b). To increase the vigor of the vegetation, physical conservation measures that decrease surface runoff and increase infiltration are essential (Gebregergs et al. 2021). In the same year, the water body area decreased to 47.02 km² (2.15%). Conversely, open land increased to 1161.50 km² (53.11%). In 2023, the built-up area and vegetation coverage increased to 54.71 km² (2.50%) and 1381.68 km² (63.06%), respectively (Fig. 3c). Nevertheless, the water body area declined substantially to 29.87 km² (1.36%). Alongside the growth of built-up areas, open land followed a similar downward pattern, falling to 724.73 km² (33.08%). Over the period from 2003 to 2023, the built-up area increased by 37.63 km². This corresponds to an annual increase of 1.98 km² for built-up areas, alongside an annual increase of 13.23 km² for vegetation coverage. Gidey et al. (2023a) predict a 2.80% increase in urban and peri-urban lands from 2019 to 2029, while arable land is expected to shrink by 2.8%. In addition, water bodies decreased by 50.36 km² (2.65 km² per year), and open land decreased by 238.58 km² (12.56 km² per year). Identifying and classifying highways is challenging because of their spectral properties. Higher-spectral-resolution images could provide a more comprehensive depiction of road infrastructure than optical Landsat images.
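The annual rates quoted above can be checked with simple arithmetic. In the sketch below, the class areas are those reported for 2003 and 2023; the divisor of 19 years is an inference, since it is the value that reproduces the reported annual figures and is not stated in the text.

```python
# Class areas in km² as reported for 2003 and 2023.
areas_2003 = {"built_up": 17.08, "water": 80.23, "open_land": 963.30}
areas_2023 = {"built_up": 54.71, "water": 29.87, "open_land": 724.73}

# Assumption: the reported annual rates are reproduced with a divisor of 19.
DIVISOR_YEARS = 19


def annual_change(start: float, end: float, years: int) -> float:
    """Mean change per year between two dates."""
    return (end - start) / years


rates = {
    name: round(annual_change(areas_2003[name], areas_2023[name], DIVISOR_YEARS), 2)
    for name in areas_2003
}
print(rates)
```

The same calculation applied to the projected 2033 areas gives the annual rates for the prediction period.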
The average NDVI values of the study area increased consistently, from 0.543 in 2003 to 0.578 in 2013 and 0.596 in 2023. This indicates a gradual increase in vegetation density over these periods (Figs. 4a and 5a-c). In addition, Fig. 5a-c shows that a higher NDBI value indicates the presence of urban development, while a lower value indicates a greater presence of vegetation or water bodies in undeveloped areas. Nevertheless, the mean NDBI values of the study area are all negative, which indicates a dominance of non-built-up areas (such as vegetation and water bodies) over built-up areas (such as urban areas and infrastructure) (Fig. 4b). NDBI values can be influenced by a variety of factors, including the image capture date and time, sensor characteristics, atmospheric conditions, architecture, and the morphology of urban areas.
Over the same period, the water body area decreased significantly by 50.36 km² (2.65 km² per year), from 80.23 km² (3.66% of the total) to 29.87 km² (1.36%).
In general, the SVM model achieved an overall accuracy ranging from 96 to 99%. Accuracy was highest in 2003, followed by 2013, and in 2023 it was 3% lower than in the other years. When comparing the classified base map to the original satellite image, however, it became clear that the 2023 classification outperformed the preceding years despite having the lowest overall accuracy. The observed difference may be attributed to the specific training samples used for each year.
Furthermore, it is important to consider the possible impact of radiometric interference, as emphasized by Mishra et al. (2016), who noted improvements in the radiometric calibration of Landsat since 1972. The observation of relatively small, disordered, and scattered regions of incongruity, attributed to a type of interference on the initial satellite image, supports this assertion. The later images showed improved visual clarity and greater variety due to the increased number of bands, which allows a broader range of inaccuracy while encountering less interference. All the kappa statistics ranged between 0.95 and 0.99. The year 2003 recorded the highest kappa value (0.99) because all the objects were relatively clear during classification compared to the other years; in 2013, the kappa value decreased slightly to 0.98 as a result of misclassification in the satellite images. The kappa value for 2023 was the lowest at 0.95, but it was still well above the acceptable threshold. Gidey et al. (2017) reported that kappa index values above 0.50 are considered acceptable. If the value is less than 0.40, the analyst should reclassify the images. Several researchers, including Akalu et al. (2019), have applied the same principle in their investigations.

Prediction of future urban and non-urban land use patterns from 2023 to 2033
This study, based on the 2023 spatial coverage trend, predicted the future land use status of the study area for 2033 (Fig. 6). The findings indicate that built-up areas and vegetation cover will increase to a total land area of 65.62 km² (2.99%) and 1525.99 km² (69.65%), respectively. Both Rahman and Ferdous (2021) and El Haj et al. (2023) also reported an increase in open land and built-up areas. The observation can be considered sensible, as the region consists of vegetation and arid terrain. The construction of asphalt and gravel roads, residential housing, industrial areas, and factories may impede air circulation, increase land surface temperature, and contribute to the urban heat island effect (Hishe et al. 2024). Additionally, we expect water bodies and open land to decline to 22.42 km² (1.02%) and 576.95 km² (26.33%), respectively. Notably, the availability of water decreased over time, concomitant with a constant increase in the extent of developed regions. We can attribute the observed decline to the disruption of water systems resulting from accumulated growth. This phenomenon makes sense because population growth tends to increase water demand. Consequently, the water systems were adjusted to meet the increased demand, which resulted in subsequent reductions. We anticipate that built-up areas and vegetation will increase by 10.91 km² (0.57 km² annually) and 144.31 km² (7.60 km² annually), respectively, between 2023 and 2033. Conversely, we anticipate a decrease of 7.45 km² (0.39 km² annually) in water bodies and a decrease of 7.78 km² annually in open land. However, it is important to examine additional variables, such as ENSO events and their impact on the water supply, which have not been accounted for. Overall, human activities have gradually changed the surface of the Earth (Hishe et al. 2024).
Fig. 6 The predicted map of Mbombela for 2033, produced using the CA-Markov model

Validation of the predicted urban sprawl trend of 2033
Validation of the predicted map yielded a high Kappa coefficient, indicating strong agreement with the 2023 map. The overall kappa statistic was 0.95 when we compared the predicted map of 2033 to the map of 2023 at the time of validation. Additionally, the kappa statistic for agreement in location was 0.99. The change detection assessment of the 2023 map and the projected 2033 map showed that a large share of land use categories remained unchanged (Fig. 7). Gidey et al. (2023a) analyzed the expected urban/peri-urban growth and land cover classes in Shire Indaselassie, Northwestern Tigray, Ethiopia, comparing actual and simulated data and finding a cellular automata-Markov chain model agreement of 98.66%. These results suggest that the CA-Markov chain model has strong predictive power for future land use change probabilities, both urban and non-urban.
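As a rough illustration of how a kappa statistic summarizes agreement between two classified maps, the sketch below computes the standard Cohen's kappa from a cross-tabulation of pixel counts. The function name `cohen_kappa` and the 2 x 2 matrix are hypothetical. The sketch does not reproduce the location-specific kappa variant reported above, nor the software used in the study.

```python
# Minimal sketch: standard Cohen's kappa from a cross-tabulation matrix.
# The matrix values are hypothetical pixel counts, not the study's data.


def cohen_kappa(matrix):
    """Return Cohen's kappa for a square cross-tabulation matrix.

    matrix[i][j] is the number of pixels assigned to class i in one map
    and to class j in the other map.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix must contain at least one pixel")
    # Observed agreement: share of pixels on the diagonal.
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement from the row and column marginals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class example (e.g. urban vs. non-urban).
example = [
    [45, 5],
    [5, 45],
]
print(f"kappa = {cohen_kappa(example):.2f}")
```

A value near 1 indicates agreement well beyond what the class proportions alone would produce, while a value near 0 indicates agreement no better than chance.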

Shannon's entropy
Table 2 shows Shannon's entropy values ranging from 0.81 to 0.86. The value was highest in 2003 and was 0.81 in 2013. Jat et al. (2008) suggest that Shannon's entropy increases with sprawl. In this study, however, 2003 had the highest entropy, followed by 2013 and 2023. Because sprawl is expected to increase entropy, the highest value was expected in 2023, so this result runs contrary to expectations. Das and Angadi (2021) and Ozturk (2017) found that subdividing study areas into wards and municipalities makes entropy values more meaningful and applicable, because values can then be compared across subdivisions to pinpoint where sprawl occurs. This study's approach may therefore have been incomplete, but it suggests that growth in 2023 was more coherent and consistent than in previous years, when it was more dispersed. Entropy is useful for comparing past, present, and future years to determine whether growth trends are upward, downward, or stable, even without expected patterns (Zachary and Dobson 2021).
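For readers unfamiliar with the measure, a minimal sketch of relative Shannon's entropy follows. It assumes the common formulation in which the built-up area is split across n zones, p_i is each zone's share of the total built-up area, and H = -sum(p_i ln p_i) / ln(n), so that values near 1 indicate dispersed growth and values near 0 indicate compact growth. The zoning scheme and the areas are hypothetical, since this section does not state how the study area was subdivided.

```python
import math

# Minimal sketch of relative Shannon's entropy for built-up dispersion.
# Zone areas are hypothetical; the study's zoning is not given in this section.


def relative_shannon_entropy(built_up_by_zone):
    """Return relative entropy in [0, 1] for built-up area per zone (km2)."""
    n = len(built_up_by_zone)
    total = sum(built_up_by_zone)
    if n < 2 or total <= 0:
        raise ValueError("need at least two zones and a positive total area")
    entropy = 0.0
    for area in built_up_by_zone:
        if area < 0:
            raise ValueError("areas must be non-negative")
        if area > 0:  # zones without built-up land contribute nothing
            share = area / total
            entropy -= share * math.log(share)
    # Dividing by ln(n) rescales the result to the 0-1 range.
    return entropy / math.log(n)


# Hypothetical built-up area (km2) in five zones for two dates.
earlier = [12.0, 9.5, 8.0, 6.5, 4.0]
later = [30.0, 9.0, 6.0, 3.0, 2.0]
print(f"earlier: {relative_shannon_entropy(earlier):.3f}")
print(f"later:   {relative_shannon_entropy(later):.3f}")
```

In this hypothetical case the later date has more built-up land overall but a lower entropy, because growth is concentrated in one zone. This shows how entropy can fall even while the built-up area expands.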
Another interesting result from the change detection was that the water class changed only into vegetation and not into any other class (Fig. 7). This is notable because a loss of water is usually attributed to drought, which would be expected to expand bare land through evaporation rather than promote the growth of flora. According to Draper and Kundell (2007), a decrease in the availability and release of water is anticipated, particularly in Southern Africa. This concerns both the welfare of individuals and economic activities in the region, and it carries substantial safety implications because the area lies within a transboundary watershed. Without a coordinated effort to address disputes, shared river basins can present a safety hazard (Heyns et al. 2008). Efficient management practices and strategic planning can prevent such scenarios.
However, open land in 2023 did change to vegetation and built-up areas, but not to water. Over time, the conversion of open land into built-up areas is directly proportional to the extent of vegetation conversion into built-up areas, as depicted in Fig. 6. One of the most intriguing observations concerned the transformation into developed regions, which was concentrated primarily at the periphery of the metropolitan center or in previously undesignated areas. The empirical evidence indicates that Mbombela is undergoing urban sprawl and will continue to do so. Open land shows the highest level of transformation mainly because it is easier to exploit than other land categories, such as vegetation and water, which are less prone to modification. It is worth mentioning that places with less vegetation are classified as open land, which includes areas where vegetation can be easily removed. This classification choice increases the measured extent of open land conversion. However, these alterations have significant consequences, such as deterioration in the overall state of the ecosystem, endangerment of biodiversity and wildlife, decreased air quality, and the creation of heat islands (Mansour et al. 2020). Determining the occurrence of urban sprawl is nonetheless challenging because of the intricate debate surrounding its definition and implications. We can infer a certain level of urban sprawl, albeit slight. However, there has been a notable rise in the built-up area between 2003 and 2023.

Mbombela's economy relies on the region's agricultural and forestry sectors, and its ecotourism and nature-oriented activities, such as outdoor pursuits, are significant draws for tourists. Hence, to sustain the influx of tourists and the economy, it is imperative to preserve the area's allure, including the natural environment. According to Adams and Moila (2004), Mbombela has experienced growth in its tourism industry, resulting in the inclusion of peri-urban communities inside its boundaries, totaling 2000. Nevertheless, despite the efforts made by local municipalities, these regions face inadequate service provision, a backlog of services, and elevated levels of unemployment. In 2005, the Tourism Black Economic Empowerment Charter and Scorecard introduced significant changes in the tourism sector. However, further efforts are required in urban areas such as Mbombela (Monakhisi 2008). Mpumalanga's tourism industry is expanding across various regions, contributing significantly to other industries like agriculture, mining, and manufacturing (Rogerson 2002).

Overall, urban growth spread most quickly where land was easiest to convert. The change detection maps revealed that open land experienced the most substantial conversion to built-up status. Shahraki et al. (2011) produced similar results in their research. Vegetation ranked next, followed by water, which showed the least conversion to built-up area owing to the difficulties inherent in constructing on water. The primary factor behind the notable transformation of open land is the significant presence of open land within urbanized areas, which served as yard space or inter-house spacing.

Conclusion
This study utilized optical remote sensing to assess and predict the historical and future urban sprawl in Mbombela, South Africa. The results indicated that between 2003 and 2023, the built-up area and vegetation coverage in the study area increased by 37.63 km² and 251.31 km², respectively. During the same period, the water body experienced a significant decrease of 50.36 km², a 2.65 km² annual change, from 80.23 km² to 29.87 km². The total open (bare) land area decreased by 238.58 km², a 12.56 km² annual change, from 963.30 km² to 724.73 km². The classifications generated using the supervised machine learning SVM model had accuracies ranging from 96 to 99%. Additionally, the study area's average NDVI values increased consistently across 2003, 2013, and 2023, indicating a gradual increase in vegetation density. A higher NDBI value indicates urban development, while a lower value indicates vegetation or water bodies in undeveloped areas. Factors such as image capture date and time, sensor characteristics, atmospheric conditions, architecture, and urban morphology influence NDBI values. Furthermore, the CA-Markov chain model predicted the study area's future urban sprawl status. The findings showed that built-up and vegetation cover would increase to 65.62 km² and 1525.99 km², respectively, during the period 2023-2033. Conversely, we predicted a decrease in water bodies and open land to 22.42 km² and 576.95 km², respectively. To address the shortcomings of this study, future research should examine the socioeconomic effects of urban sprawl dynamics. Despite the limitations, policymakers and planners can use our study's findings for a variety of developmental plans.

Fig. 1 Location of Mbombela city (study area)

linked the error term cost to the regularization parameter C. SVM image classification requires both C and gamma. The regularization parameter C balances model complexity and data fit. Gamma controls the kernel function's locality and ranges from 0 to infinity. We tuned all model parameters for optimal performance. The Gaussian radial basis function (RBF) outperformed the other two kernel types in performance and stability, according to Gao et al. (2022) and Zhou et al. (2021). After splitting the data, we trained the SVM classifier (Classifier.libsvm) with the training data. The classifier specification requires additional parameters. We used the Radial Basis Function (RBF) kernel type with a gamma of 0.5 and a cost of 10. With these settings, we trained and applied the classifier in GEE. The gamma value controls the kernel spread, model complexity, generalization performance, and classification boundary shape, making it essential to SVM kernel function optimization.

Fig. 2 Overall schematic diagram of urban sprawl dynamics analysis using machine learning, the CA-Markov chain, and the Shannon entropy model

Fig. 3 A spatial and temporal trend of urban and non-urban land use types from 2003-2023

Fig. 4 a-b. Mean NDVI and NDBI values of the study area from 2003-2023

Fig. 5 a-c. Spatio-temporal trends of NDVI and NDBI from 2003-2023

The rise in built-up area between 2003 and 2023 is projected to persist until at least 2033. Urban area expansion is focused primarily on the central region, where the river is located to the east and south. Notably, a significant portion of the Mbombela region lacks urban infrastructure, with the exception of scattered agricultural dwellings.

Fig. 7 The change detection results comparing the 2023 and 2033 anticipated land use land cover maps

Table 1 Characteristics of Landsat images for urban sprawl analysis