Fusion of Sentinel-1 SAR and Sentinel-2 MSI data for accurate urban land use-land cover classification in Gondar City, Ethiopia

Abstract

Effective urban planning and management rely on accurate land cover mapping, which can be achieved by combining remote sensing data with machine learning algorithms. This study explored the potential benefits of integrating Sentinel-1 SAR and Sentinel-2 MSI satellite imagery for urban land cover classification in Gondar city, Ethiopia. Synthetic Aperture Radar (SAR) data from Sentinel-1A and Multispectral Instrument (MSI) data from Sentinel-2B for the year 2023 were used, and the Support Vector Machine (SVM) and Random Forest (RF) machine learning algorithms were applied for classification. Google Earth Engine (GEE) was used for the processing, classification, and validation of the remote sensing data. When applied to the Sentinel-2B MSI dataset, both SVM and RF achieved an overall accuracy (OA) of 0.69, with a moderate level of agreement indicated by a Kappa score of 0.357. For the Sentinel-1A SAR data, SVM maintained the same OA of 0.69 but showed an improved Kappa score of 0.67, indicating its suitability for SAR image classification, whereas RF achieved a slightly lower OA of 0.66. However, when the Sentinel-2B MSI and Sentinel-1A SAR datasets were combined, SVM achieved an impressive OA of 0.91 with a high Kappa score of 0.80, while RF achieved an OA of 0.81 with a Kappa score of 0.809.
These findings highlight the potential of fusing satellite data from multiple sources to enhance the accuracy and effectiveness of image classification algorithms, making them valuable tools for various applications, including land use mapping and environmental monitoring.

Introduction

Sentinel-2 and Sentinel-1 data have found extensive applications across various scientific disciplines. The uses of Sentinel-2 MultiSpectral Instrument (MSI) data encompass tasks such as distinguishing burned areas, mapping hydrothermally altered minerals, assessing landslide susceptibility, and mapping the extent of mangroves (Huang et al. 2016; Roteta et al. 2019; Hu et al. 2018; Wang et al. 2018). Additionally, Sentinel-2 MSI data have proven valuable for land cover mapping in urban areas. However, a notable challenge arises when classifying dark impervious surfaces and water, as the visual differences between them are not distinct. Consequently, accurate classification using solely Sentinel-2 MSI data becomes problematic, as dark impervious surfaces may be erroneously identified as water, causing confusion (Zhang et al. 2018).

Sentinel-1 Synthetic Aperture Radar (SAR) imagery has been widely employed in various applications, including land cover mapping, flood monitoring, soil moisture retrieval, and rice production estimation (Balzter et al. 2015; Ruzza et al. 2019; Bao et al. 2018). Notably, when the image quality of Sentinel-2 MSI is affected by cloud cover or fog, Sentinel-1 SAR data can serve as a suitable replacement. The timely production of land cover maps using Sentinel-1 SAR images is particularly valuable for effective urban management. However, due to the intricate material distributions in urban areas, generating accurate land cover classifications from Sentinel-1 SAR data remains a challenge. To address this challenge and enhance the classification outcomes, researchers have explored the combination of Sentinel-2 MSI and Sentinel-1 SAR data. This fusion approach has proven useful for vegetation type classification, biomass estimation of mangrove forests, and mapping of burned areas (Erinjery et al. 2018; Castillo et al. 2017; Colson et al. 2018). By leveraging the complementary strengths of both sensors, the integration of Sentinel-2 MSI and Sentinel-1 SAR data offers improved insights and more robust results for these specific applications (Mahdianpari et al. 2019).

Numerous studies have demonstrated that the integration of SAR data with optical data can enhance land cover classifications (Walker et al. 2010; Weng 2012). To achieve higher accuracy in land cover classification, it is essential to synergistically combine Sentinel-1 SAR data with Sentinel-2 MSI data. However, there is a lack of comprehensive research focusing on the fusion of Sentinel-2 MSI and Sentinel-1 SAR data specifically for urban land cover classification. As a result, the effectiveness of fusing these two types of satellite images for urban land cover classification remains an area that requires further assessment and investigation. By evaluating the fusion approach, researchers can gain valuable insights into its potential benefits and limitations, ultimately advancing the capabilities of remote sensing techniques for urban land cover mapping.

Nowadays, Google Earth Engine (GEE) has become a powerful platform that offers image processing, data fusion, classification, and interactive visualization capabilities for Sentinel-2 MSI and Sentinel-1 SAR data (Vizzari 2022). It enables users to perform various image processing tasks, including filtering, geometric corrections, and radiometric adjustments on vast archives of satellite imagery. GEE’s ability to fuse optical and radar data from different sources like Sentinel-2 and Sentinel-1 provides users with a comprehensive view of the Earth’s surface, overcoming the limitations of individual sensors (Tassi and Vizzari 2020). Moreover, it supports the implementation of classification algorithms for land cover mapping, change detection, and time-series analysis, making it a valuable tool for monitoring environmental changes and urban development. The platform’s cloud-based architecture simplifies data access, processing, and interactive visualization, making it an essential resource for researchers, scientists, and developers in the remote sensing community (Gomes et al. 2020).

Support Vector Machine (SVM) and Random Forest (RF) are popular machine learning algorithms utilized extensively in desktop software for image classification (Pal et al. 2013). The integration of these algorithms into the Google Earth Engine (GEE) platform offers the advantage of processing big remote sensing datasets efficiently, eliminating the time-consuming nature of standalone computer software. However, the performance of these machine learning algorithms has not yet been evaluated for producing urban land cover classifications from Sentinel-2 MSI and Sentinel-1 SAR satellite imagery in the study area of Gondar city, Ethiopia. Thus, this research aimed to assess the performance of SVM and RF in generating urban land cover classifications using Sentinel-2 MSI and Sentinel-1 SAR satellite imagery, presenting a novel approach to exploit the capabilities of GEE for large-scale image analysis. The objectives of this research are: (i) to independently classify Sentinel-2 Multispectral Instrument (MSI) data using both Random Forest (RF) and Support Vector Machine (SVM) algorithms; (ii) to independently classify Sentinel-1A Synthetic Aperture Radar (SAR) data using RF and SVM algorithms; (iii) to fuse the classified results from the Sentinel-2 MSI and Sentinel-1 SAR datasets; and (iv) to evaluate the performance of the SVM and RF classifiers individually and in combination for the urban land cover classification task.

Methodology

Study area

Gondar city is situated in the northwestern part of Ethiopia, extending from 12°10′0″ to 12°40′0″ N latitude and from 37°21′0″ to 37°47′30″ E longitude. It serves as the capital of the Central Gondar administrative zone in the Amhara National Regional State (Fig. 1). Founded in 1636 A.D. by Emperor Fasilides, it served as the capital of Ethiopia for more than two centuries and features numerous historical sites that attract tourists and stimulate the local economy. The landscape of Gondar, characterized by rugged hills and plateaus, contributes to its variable temperatures. Additionally, the town lies north of Lake Tana, the source of the Blue Nile (Abay) River, and serves as a promising transit center for goods and services between Ethiopia and Sudan (Wubneh 2021).

Fig. 1
figure 1

Location map of the study area

Satellite data

For the study, data from two satellite missions were used: Sentinel-2 (MSI) and Sentinel-1 (SAR), both acquired on February 12, 2023. The choice of this acquisition date is tied to the region's seasonal cycle: crop harvesting in the neighboring areas concludes in January, so vegetation density decreases in February. The Sentinel-2 MSI sensor is equipped with a sophisticated optical system that captures data in multiple spectral bands (Pahlevan et al. 2019). The satellite provides a spatial resolution of 10 m for bands 2, 3, 4, and 8, which cover the visible and near-infrared wavelengths; these bands were selected for image classification in the study. For the other bands, which capture red-edge, short-wave infrared, and atmospheric data, the spatial resolution is 20 m (Drusch et al. 2012).

On the other hand, the Sentinel-1 SAR mission employs an active radar sensor operating in the C-band, providing all-weather, day-and-night imaging capabilities. The spatial resolution of Sentinel-1 SAR varies with the operational mode: the Interferometric Wide Swath (IW) mode offers a resolution of approximately 5 m in range and 20 m in azimuth, while the Stripmap mode offers approximately 5 m in both range and azimuth (Bauer-Marschallinger et al. 2021). In this study, the Interferometric Wide Swath (IW) mode was used.

Reference data acquisition and accuracy assessment

The study performed a validation of remote sensing data using 169 Ground Control Points (GCP) collected between January 20, 2023, and February 28, 2023. These GCPs were distributed among six land use land cover types. For the “built-up” category, GCPs were obtained from impervious surfaces and roads. Urban forest areas were selected as GCPs for the “forest” category. Barren lands provided reference points for “bare soil” GCPs, while the Angereb Dam was used for “water body” reference points. GCPs for “shrubland” were collected from outskirts with short trees, and urban green areas covered with grasses served as GCPs for “grassland.” The utilization of GCPs from representative locations aimed to validate the remote sensing data and enhance the accuracy of land use land cover classification (Fallati et al. 2017).

Table 1 outlines the number of Ground Control Points (GCPs) collected for validation and Region of Interest (ROI) selection for each LULC class. The “Training ROI” column represents the number of ROI gathered from diverse representative regions, with a total of 300 ROI used as training data for the classification algorithm. Similarly, the “Validation GCP” column indicates that 169 GCPs were set aside solely for validation purposes, independent of the training data. These validation GCPs were used to assess the accuracy of the classification results. The “Training Pixels” column specifies the number of pixels within the training ROIs, with a total of 14,300 pixels used for training the classification model and capturing the spectral signatures of different land cover classes. By utilizing these GCPs and pixels in the GEE platform, the classification algorithm can be trained, and the accuracy of the land cover maps produced from Sentinel-2 MSI and Sentinel-1 SAR data can be evaluated and improved iteratively using the validation GCPs.

Table 1 Description of training ROI, validation ROI, and training pixels used for the study

The rationale for determining the number of Ground Control Points (GCPs) per land use land cover class and the ratio between training and validation GCPs is based on practical considerations to ensure a reliable validation of land use land cover classification. The study utilized 169 GCPs distributed among six land use land cover classes. The selection of GCPs from representative locations (roads, buildings, urban forest areas, barren lands, the Angereb Dam, outskirts with short trees, and urban green areas) aimed to cover the spectral and spatial diversity within each class, enhancing the accuracy of land use land cover classification (Fig. 2). The specific number of GCPs per class and the training-validation split were determined from a combination of these factors to ensure a reliable and accurate validation of the classified image.

Fig. 2
figure 2

Picture of each LULC class: (a) Shrubland, (b) Forest, (c) Agriculture, (d) Grassland, (e) Water body (Angereb Dam), (f) Impervious surface (asphalt road), (g) Building, (h) Bare land

Method of data analysis

Figure 3 outlines the general methodological flow of the research work. The Sentinel 2 MSI and Sentinel 1 SAR data for the year 2023 are first preprocessed in GEE. Radiometric calibration and atmospheric corrections are applied to the MSI data, while terrain corrections are performed on the SAR data. This ensures the data is in a consistent and usable format for subsequent analysis. The two satellite datasets are fused in the GEE platform. Fusion techniques are applied to combine the optical (MSI) and radar (SAR) data, exploiting the complementary information from both sensors to enhance the land cover classification.
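The fusion step described above operates at the feature level: optical and radar measurements are stacked into a single feature vector per pixel before classification. The study performed this in GEE, but the idea can be illustrated with a minimal scikit-learn sketch. All data below are synthetic stand-ins, and the band names in the comments are assumptions for illustration, not the study's actual inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic per-pixel features standing in for real imagery.
rng = np.random.default_rng(42)
n = 600
msi = rng.normal(size=(n, 4))   # e.g. Sentinel-2 bands B2, B3, B4, B8 (assumed)
sar = rng.normal(size=(n, 2))   # e.g. Sentinel-1 VV and VH backscatter (assumed)
labels = (msi[:, 3] + sar[:, 0] > 0).astype(int)  # toy two-class target

# Feature-level fusion: stack optical and radar bands per pixel.
fused = np.hstack([msi, sar])   # shape (n, 6)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(fused[:400], labels[:400])
acc = clf.score(fused[400:], labels[400:])
```

The same stacking is what `ee.Image.addBands()` accomplishes in GEE before sampling training pixels from the combined image.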

Fig. 3
figure 3

Methodological flowchart

Training samples representing different urban land use land cover classes are collected within the study area using GEE tools. These training samples are essential for training the machine learning algorithms and building the land cover classification model. GEE provides access to machine learning algorithms such as Random Forest (RF) and Support Vector Machine (SVM) in its catalog. These algorithms are imported into the GEE environment for use in the land cover classification. Using the training samples, the RF and SVM algorithms are applied to perform the land cover classification. The classifiers are trained on the training samples, and the entire study area is classified based on the derived models.

Validation points, collected separately from the training samples, are imported into the GEE platform. These validation points serve as ground truth data for accuracy assessment. The classified land cover map is compared to the validation points to evaluate the accuracy of the classification. The accuracy of the land cover classification is assessed by comparing the classified results to the validation points. Metrics such as overall accuracy and kappa coefficient are calculated to quantify the classification performance.

After validation and accuracy assessment, the final urban land use land cover map is produced. The classified land cover map is refined based on the accuracy assessment results, and the final output is generated for the entire study area.

Sentinel (Sentinel-2 MSI and Sentinel-1 SAR) image processing in the GEE platform

The image processing workflow for Sentinel-2 MSI and Sentinel-1 SAR data in the Google Earth Engine (GEE) platform involves several key steps. Initially, the relevant satellite imagery is acquired from the GEE data catalog. Preprocessing procedures, such as radiometric calibration for MSI and conversion of SAR intensities to backscatter coefficients, are applied to ensure accurate and consistent data. Additionally, atmospheric corrections are performed for Sentinel-2 MSI data, while terrain corrections are conducted for Sentinel-1 SAR data (Filipponi 2019). The images are registered to a common geographic coordinate system for seamless integration. Subsequently, land cover classification is carried out using machine learning algorithms on the processed Sentinel-2 MSI data, while the Sentinel-1 SAR data is analyzed for backscatter intensities and change detection. Results from both analyses are integrated for a comprehensive understanding of the study area. Visualization tools in GEE are utilized to create maps and visual representations.

Classification algorithms

Random forest (RF)

Random Forest (RF) is a widely used ensemble learning algorithm that combines the predictions of multiple decision trees to achieve more robust and accurate results. In the GEE platform, RF is available for both classification and regression tasks (Shelestov et al. 2017). The algorithm starts by creating a diverse set of decision trees through bootstrapping, where random subsets of the dataset are sampled with replacement. For each tree, a random subset of features is considered at each node for splitting, reducing the risk of overfitting and increasing the model's stability. The final prediction in classification tasks is determined by majority voting among the individual trees, each grown on a bootstrapped training set (Eq. 1).

The bootstrapped training dataset can be described as follows:

$$D=\{\left(x_1,y_1\right),\left(x_2,y_2\right),\dots ,\left(x_n,y_n\right)\}$$
(1)

where: D is the bootstrapped dataset of n samples, xi is a vector of M feature values for the ith sample, and yi is the corresponding class label for classification.
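The bootstrap-and-vote procedure behind Eq. 1 can be sketched with scikit-learn decision trees. This is a toy illustration of the mechanism, not the GEE implementation the study used; the dataset is synthetic and the ensemble size is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy labelled dataset D = {(x_i, y_i)} standing in for training pixels.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(25):
    # Bootstrap: draw n sample indices with replacement (Eq. 1).
    idx = rng.integers(0, len(X), size=len(X))
    # max_features limits the features considered at each split,
    # mirroring RF's per-node random feature subset.
    t = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(t.fit(X[idx], y[idx]))

# Majority vote across the ensemble for each sample (binary labels here).
votes = np.stack([t.predict(X) for t in trees])   # shape (25, 300)
pred = (votes.mean(axis=0) > 0.5).astype(int)
```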

In the Google Earth Engine (GEE) platform, Random Forest (RF) for classification tasks is implemented using the ee.Classifier.smileRandomForest() function, which supersedes the earlier ee.Classifier.randomForest() (Mahdianpari et al. 2019, 2020). This function allows users to create an ensemble of decision trees, where each tree contributes to the final classification. Users can customize the RF model by specifying parameters such as the number of trees, the variables considered for splitting, and the size of the bootstrapped samples. Once the RF classifier is defined, it can be applied to Earth Engine Images or ImageCollections using the classify() method, producing accurate land cover classification results. With its seamless integration with Earth Engine's extensive data processing capabilities and cloud-based infrastructure, this function simplifies the implementation of RF for large-scale remote sensing applications, providing an efficient tool for generating reliable land cover maps and supporting various environmental monitoring and land management tasks.

Support vector machine (SVM)

Support Vector Machine (SVM) is a powerful machine learning algorithm available in the Google Earth Engine (GEE) platform for classification tasks (Tassi and Vizzari 2020). SVM is designed to find the optimal hyperplane that separates different classes in the feature space. Mathematically, SVM aims to solve the following optimization problem (Eq. 2):

$$\min_{w,b}\;\frac{1}{2}\left\| w \right\|^{2}+C\sum_{i=1}^{n}\max\left(0,\,1-y_{i}\left(w\cdot x_{i}+b\right)\right)$$
(2)

where: w is the weight vector and b is the bias term, which together define the hyperplane; xi represents the feature vector of the ith data point, and yi is its corresponding class label (+1 or −1); C is the regularization parameter that controls the trade-off between maximizing the margin and minimizing the classification error.

In the case of non-linearly separable data, SVM employs a kernel trick to map the original feature space into a higher-dimensional space, where the data may become linearly separable. The kernel function K(xi, xj) computes the inner product of the mapped feature vectors in the higher-dimensional space without explicitly calculating the transformation. Common kernel functions include the linear kernel (K(xi, xj) = xi·xj), the polynomial kernel (K(xi, xj) = (γ xi·xj + r)^d), the radial basis function (RBF) kernel (K(xi, xj) = exp(−γ‖xi − xj‖²)), and the sigmoid kernel (K(xi, xj) = tanh(γ xi·xj + r)).
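These kernel functions follow directly from the formulas above; the parameter values in this sketch (γ, r, d) are arbitrary illustrations, not values used in the study.

```python
import numpy as np

def linear_kernel(xi, xj):
    # K(xi, xj) = xi · xj
    return xi @ xj

def poly_kernel(xi, xj, gamma=1.0, r=0.0, d=3):
    # K(xi, xj) = (gamma * xi · xj + r) ** d
    return (gamma * (xi @ xj) + r) ** d

def rbf_kernel(xi, xj, gamma=0.5):
    # K(xi, xj) = exp(-gamma * ||xi - xj||^2)
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid_kernel(xi, xj, gamma=0.01, r=0.0):
    # K(xi, xj) = tanh(gamma * xi · xj + r)
    return np.tanh(gamma * (xi @ xj) + r)

a = np.array([1.0, 2.0])
b = np.array([2.0, 0.5])
```

Note that the RBF kernel of any point with itself is 1, since the squared distance is zero.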

In GEE, the SVM classifier is implemented through the ee.Classifier.libsvm() function, allowing users to create an SVM classifier and specify the kernel type, C parameter, and kernel parameters (Bayable et al. 2023). The SVM classifier can be applied to Earth Engine Images or ImageCollections using the classify() method to perform land cover classification, producing accurate and reliable results for various environmental and land management applications.
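Outside GEE, an equivalent SVM workflow can be sketched with scikit-learn's SVC, which exposes the same kernel and C choices described above. The data here are synthetic, not the study's training samples.

```python
import numpy as np
from sklearn.svm import SVC

# Toy per-pixel features and a two-class land cover label.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# RBF kernel with regularization parameter C, mirroring the Eq. 2 trade-off.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X[:150], y[:150])
acc = clf.score(X[150:], y[150:])
```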

Accuracy assessments

Overall accuracy (OA)

The Overall Accuracy (OA) is a fundamental metric in image classification that assesses the performance of a classification model. It represents the percentage of correctly classified instances, both positive (True Positives) and negative (True Negatives), out of the total instances in the dataset.

$$OA=\frac{TP+TN}{TP+TN+FP+FN}*100$$
(3)

where: TP represents True Positives (correctly classified positive instances); TN represents True Negatives (correctly classified negative instances); FP represents False Positives (negative instances incorrectly classified as positive); FN represents False Negatives (positive instances incorrectly classified as negative).
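For a multiclass map like the one produced here, the same idea generalizes to the trace of the confusion matrix divided by the total number of validation samples (expressed below as a proportion rather than a percentage). The matrix is a hypothetical example, not the study's results.

```python
import numpy as np

def overall_accuracy(cm):
    """OA from a confusion matrix: correctly classified / total samples."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

# Hypothetical 3-class confusion matrix (rows = reference, columns = predicted).
cm = [[50, 2, 3],
      [4, 40, 6],
      [1, 4, 45]]
oa = overall_accuracy(cm)  # (50 + 40 + 45) / 155
```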

Kappa coefficient (K)

The Kappa coefficient (often referred to as Cohen's Kappa) is a statistical measure used to assess the level of agreement between the observed classification results and the expected results when classifying data, such as in image classification. The formula for calculating the Kappa coefficient is as follows:

$$K=\frac{{P}_{o}-{P}_{e}}{1-{P}_{e}}$$
(4)

where: K represents the Kappa coefficient; Po is the relative observed agreement, which is the proportion of actual agreement (the sum of the diagonal elements of the confusion matrix) to the total number of observations; Pe is the expected agreement by chance, which is calculated from the marginal frequencies of the confusion matrix.
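Equation 4 can be computed directly from a confusion matrix; the two-class matrix below is a hypothetical example chosen so that Po = 0.80 and Pe = 0.50.

```python
import numpy as np

def kappa(cm):
    """Cohen's Kappa coefficient computed from a confusion matrix (Eq. 4)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                           # observed agreement, Po
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n**2   # chance agreement, Pe
    return (po - pe) / (1.0 - pe)

# Hypothetical two-class confusion matrix (rows = reference, cols = predicted).
k = kappa([[45, 5], [15, 35]])
```

With perfect agreement (an all-diagonal matrix) the function returns 1.0, and values near 0 indicate agreement no better than chance.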

F-score

The F-score, often referred to as the F1-score, is a widely used metric in image classification and other classification tasks. It is a measure of a model’s accuracy that balances both precision and recall. The formula for calculating the F-score is as follows:

$$Fscore=\frac{2(Precision*Recall)}{Precision+Recall}$$
(5)

where: Fscore represents the F1-score; Precision (also known as Positive Predictive Value) is the ratio of true positive predictions to the total number of instances predicted as positive, measuring the accuracy of positive predictions; Recall (also known as Sensitivity or True Positive Rate) is the ratio of true positive predictions to the total number of actual positive instances, measuring the model’s ability to identify all positive instances.
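Equation 5 is straightforward to implement. As a worked check, taking User's Accuracy as precision and Producer's Accuracy as recall, a class with UA 0.40 and PA 1.00 yields an F-score of about 0.57.

```python
def f_score(precision, recall):
    """F1-score: harmonic mean of precision (UA) and recall (PA), per Eq. 5."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Worked example: UA (precision) 0.40, PA (recall) 1.00 -> about 0.57.
f_example = f_score(0.40, 1.00)
```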

Result

Spectral signatures of LULC classes using sentinel-2 MSI bands

Figure 4 delineates the spectral signatures of the various land use and land cover categories in the visible, near-infrared, and short-wave infrared regions of the electromagnetic spectrum. Forests manifest high reflectance in the near-infrared (NIR) range (1200–1400 nm), while built-up areas exhibit heightened reflectance in both the visible (400–500 nm) and NIR (800–900 nm) bands. Agriculture displays moderate reflectance in the visible (600–700 nm) and NIR (900–1000 nm) ranges, while shrubland has moderate reflectance in the visible (550–650 nm) and NIR (1000–1200 nm) regions. Grasslands feature moderate reflectance in the visible (576–625 nm) and NIR (1100–1200 nm) ranges, and water bodies show low reflectance across the entire spectrum, most notably in the visible (450–500 nm), NIR (720–800 nm), and short-wave infrared (1900–2000 nm) bands.

Fig. 4
figure 4

Spectral Signature of different LULC classes from Sentinel 2 MSI bands

SAR backscattering coefficient

SAR backscattering coefficients were generated in the GEE platform: following preprocessing of the SAR data, backscattering coefficients were computed for the entire city of Gondar. To enhance visualization, the backscattering coefficient values in decibels (dB) were extracted at specific Ground Control Points (GCPs) corresponding to the different Land Use and Land Cover (LULC) classes. The bar graph (Fig. 5) displays the SAR backscattering coefficients in decibels (dB) for the distinct LULC classes. Built-up areas exhibit a strong radar reflection with a coefficient of 14.03 dB, while water bodies have a low coefficient of 2.65 dB, indicating minimal radar return because smooth water surfaces reflect the signal away from the sensor. Agricultural areas show a moderate SAR coefficient of 9.88 dB, reflecting an intermediate radar response. Forested regions have a moderate coefficient of 10.91 dB, suggesting a balanced radar reflection due to the scattering of radar signals by the canopy. Grasslands and shrublands exhibit similar moderate coefficients of 8.64 and 8.39 dB, respectively, indicating comparable radar response characteristics. These coefficients are essential for applications such as land use classification, environmental monitoring, and disaster assessment, enabling a deeper understanding of radar interactions with different land cover types (Fig. 6).
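The dB values above come from the standard logarithmic scaling of the linear backscatter coefficient (σ⁰). A minimal sketch of that conversion follows; the function names are illustrative, and note that GEE's Sentinel-1 GRD collection already serves dB-scaled bands.

```python
import math

def to_db(sigma0_linear):
    """Convert a linear backscatter coefficient (sigma-0) to decibels."""
    return 10.0 * math.log10(sigma0_linear)

def to_linear(sigma0_db):
    """Inverse conversion, from decibels back to the linear scale."""
    return 10.0 ** (sigma0_db / 10.0)
```

A tenfold increase in linear backscatter corresponds to a +10 dB change, so the ~11 dB gap between water (2.65 dB) and built-up areas (14.03 dB) spans more than an order of magnitude in linear power.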

Fig. 5
figure 5

SAR Backscattering coefficient for different LULC classes

Fig. 6
figure 6

(a) LULC classification from Sentinel 2B MSI using Random Forest (RF) algorithm; (b) LULC classification from Sentinel 2B MSI using Support Vector Machine (SVM) algorithm

RF and SVM performance evaluation in sentinel 2 MSI classification

Table 2 presents the performance evaluation of the Support Vector Machine (SVM) and Random Forest (RF) algorithms in classifying Sentinel 2B MSI data for different land use land cover (LULC) types. The evaluation metrics used to assess the performance of each algorithm include Producer's Accuracy (PA), User's Accuracy (UA), and F-score. PA measures the percentage of correctly classified pixels for each land use type, while UA indicates the accuracy of correctly identifying a specific land use class. The F-score is the harmonic mean of precision and recall and provides a balanced measure of classification accuracy. For the Forest land use type, both SVM and RF achieved high accuracy with PA and UA scores of 1.00, resulting in an F-score of 1.00, indicating perfect classification for this category.

In the Built-up land use type, SVM attained a PA of 1.00 but a UA of only 0.40, leading to an F-score of 0.57, while RF showed lower performance with a PA of 0.38, a UA of 0.38, and an F-score of 0.45. For the Agriculture, Shrubland, and Water body land use types, SVM demonstrated good accuracy, with PA values ranging from 0.89 to 0.93 and UA scores between 0.81 and 0.87; the F-scores for these classes were also high, ranging from 0.85 to 0.90. RF's performance for these classes was slightly lower, with PA values ranging from 0.65 to 0.80, UA scores between 0.65 and 0.80, and F-scores from 0.69 to 0.85. For Grassland, SVM achieved perfect classification with PA, UA, and F-score of 1.00, whereas RF showed lower performance with a PA of 0.84, a UA of 0.72, and an F-score of 0.61.

The Overall Accuracy (OA) for SVM was 0.87, indicating a high level of overall accuracy in the classification results. The Kappa coefficient, a measure of agreement between observed and predicted classifications, was 0.80, indicating substantial agreement. In contrast, RF's OA was 0.69, suggesting a slightly lower overall accuracy compared to SVM. The Kappa coefficient for RF was 0.357, indicating fair agreement between observed and predicted classifications. This accuracy assessment table (Table 2) provides valuable information about the performance of SVM and RF algorithms in classifying different land use types, which can help in selecting the most suitable algorithm for accurate land use land cover mapping in the study area using Sentinel 2B MSI data (Talukdar et al. 2020).

Table 2 Performance evaluation of SVM and RF in classifying Sentinel 2B MSI data

RF and SVM performance evaluation in sentinel 1A SAR classification

Table 3 presents the performance evaluation of the Support Vector Machine (SVM) and Random Forest (RF) algorithms in classifying Sentinel 1A Synthetic Aperture Radar (SAR) data for different land use land cover (LULC) types.

Table 3 Performance evaluation of SVM and RF in classifying Sentinel 1A SAR data

For the Forest land use type, SVM achieved a PA of 0.40, a UA of 1.00, and an F-score of 0.571; RF performed slightly better, with a PA of 0.62, a UA of 0.67, and an F-score of 0.65. In the Built-up land use type, both algorithms achieved a UA of 1.00, indicating accurate identification of this class; SVM outperformed RF in terms of F-score (0.882 versus 0.67), with SVM achieving a PA of 0.79 and RF a PA of 0.80.

For the Agriculture, Shrubland, and Grassland land use types, both algorithms showed varying levels of accuracy. SVM achieved PAs ranging from 0.41 to 0.63, UAs from 0.78 to 0.92, and F-scores from 0.542 to 0.743. RF's performance for these classes was also mixed, with PAs ranging from 0.56 to 0.80, UAs from 0.47 to 0.80, and F-scores from 0.50 to 0.80. For the Water body land use type, the two algorithms performed identically, each achieving a PA of 0.74, a UA of 0.69, and an F-score of 0.71.

The Overall Accuracy (OA) for SVM was 0.69, indicating a relatively high level of overall accuracy in the classification results. The Kappa coefficient, a measure of agreement between observed and predicted classifications, was 0.67, indicating substantial agreement. For RF, the OA was slightly lower at 0.66, suggesting a slightly lower overall accuracy compared to SVM. The Kappa coefficient for RF was 0.55, indicating moderate agreement between observed and predicted classifications.

In the comparison of classification results between the Sentinel 2B MSI and Sentinel 1A SAR satellites using the RF and SVM algorithms, performance was assessed using the Kappa statistic and overall accuracy. These metrics provide insights into the agreement between the predicted and reference classifications and the overall accuracy of the classification. With Sentinel 2B MSI data, RF achieved an overall accuracy of 0.69, indicating that approximately 69% of the pixels were correctly classified. The corresponding Kappa statistic was 0.35, suggesting a fair but relatively low level of agreement between the predicted and reference classifications (Fig. 7). This indicates that the classification results may have some inconsistencies and may not fully capture the true distribution of land use classes (Vasilakos et al. 2020) (Figs. 8, 9).

Fig. 7
figure 7

Random Forest (RF) and Support Vector Machine (SVM) algorithms performance in classifying Sentinel 2B MSI

Fig. 8
figure 8

(a) LULC classification from Sentinel 1A SAR using Random Forest (RF) algorithm; (b) LULC classification from Sentinel 1A SAR using Support Vector Machine (SVM) algorithm

Fig. 9
figure 9

Random Forest (RF) and Support Vector Machine (SVM) algorithms performance in classifying Sentinel 1A SAR data

RF and SVM performance evaluation in the combined (Sentinel-2 MSI and Sentinel-1 SAR) classification results

The SVM algorithm achieved an overall accuracy (OA) of 0.69 for image classification using the Sentinel 2B MSI dataset. The Kappa score, a measure of agreement between predicted and actual classes, was 0.357 (Fig. 10), indicating a moderate level of agreement. However, it is important to consider other factors, such as the specific application and the desired level of accuracy, when evaluating the performance of SVM with Sentinel 2B MSI data (Figs. 11, 12).

Fig. 10

Whisker plot for overall accuracy comparisons of Sentinel 2B MSI and Sentinel 1A SAR using RF and SVM

Fig. 11

(a) LULC classification from Combined Satellites (Sentinel 2B MSI and Sentinel 1A SAR) using Random Forest (RF) algorithm; (b) LULC classification from Combined Satellites (Sentinel 2B MSI and Sentinel 1A SAR) using Support Vector Machine (SVM) algorithm

Fig. 12

Comparison of machine learning algorithms in performing Sentinel image classification

Similar to SVM, the RF algorithm also achieved an overall accuracy of 0.69 when applied to the Sentinel 2B MSI dataset. The Kappa score was 0.357, suggesting a moderate level of agreement. RF is known for its ability to handle high-dimensional data and capture complex relationships between features. These results indicate that RF performs comparably to SVM for image classification using Sentinel 2B MSI data.

When using the Sentinel 1A SAR dataset, the SVM algorithm achieved an OA of 0.69, which is the same as the performance with Sentinel 2B MSI data. However, the Kappa score increased to 0.67, indicating a higher level of agreement between predicted and actual classes. This suggests that SVM might be more suitable for image classification using Sentinel 1A SAR data compared to Sentinel 2B MSI data.

Unlike SVM, the RF algorithm showed a slightly lower OA of 0.66 when applied to the Sentinel 1A SAR dataset. The Kappa score was 0.55, indicating a moderate level of agreement. While RF performed slightly worse than SVM in terms of accuracy, it's worth noting that RF can still provide valuable insights and capture complex patterns in the Sentinel 1A SAR data for image classification tasks.

When the SVM algorithm was applied to the combined dataset of Sentinel 2B MSI and Sentinel 1A SAR, a significantly higher OA of 0.91 was achieved. The Kappa score also increased substantially to 0.80, indicating a high level of agreement. These results suggest that combining the information from both satellites can greatly improve the performance of SVM for image classification tasks.

Similar to SVM, RF also benefited from the combination of Sentinel 2B MSI and Sentinel 1A SAR datasets. RF achieved an OA of 0.81, higher than the individual performance with either Sentinel 2B MSI or Sentinel 1A SAR data. The Kappa score also increased to 0.809, indicating a high level of agreement. These findings highlight the potential of combining satellite data from multiple sources to enhance the accuracy of RF-based image classification algorithms.
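Conceptually, the fusion evaluated here amounts to stacking the SAR backscatter features alongside the optical bands so that the classifier sees one extended feature vector per pixel. The following minimal sketch illustrates the idea; the band names and values are hypothetical and are not the study's actual inputs or its GEE processing chain:

```python
# Hypothetical per-pixel optical reflectances (e.g. three Sentinel-2 bands)
optical = [
    [0.12, 0.10, 0.35],   # pixel 1: e.g. red, green, NIR
    [0.30, 0.28, 0.15],   # pixel 2
]
# Hypothetical SAR backscatter for the same pixels (e.g. Sentinel-1 VV, VH in dB)
sar = [
    [-8.5, -14.2],
    [-5.1, -11.0],
]

# Feature-level fusion: concatenate the two sources pixel by pixel,
# so each training sample carries both spectral and backscatter information.
fused = [opt + bs for opt, bs in zip(optical, sar)]

print(len(fused[0]))  # 5 features per pixel instead of 3 or 2
```

The classifier trained on `fused` can exploit complementary cues: optical bands separate spectrally distinct covers, while backscatter responds to surface roughness and structure, which is one plausible explanation for the accuracy gains reported above.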

In summary, the performance evaluation of the SVM and RF algorithms for image classification using different datasets (Sentinel 2B MSI and Sentinel 1A SAR) was conducted, and the results revealed interesting insights. Both SVM and RF achieved an overall accuracy (OA) of 0.69 when applied to the Sentinel 2B MSI dataset, with a moderate level of agreement indicated by the Kappa score of 0.357. For Sentinel 1A SAR data, SVM maintained the same OA of 0.69 but showed an improved Kappa score of 0.67, suggesting its suitability for SAR image classification. RF, on the other hand, achieved a slightly lower OA of 0.66 with Sentinel 1A SAR data. However, when combining the datasets of Sentinel 2B MSI and Sentinel 1A SAR, both SVM and RF showed significant improvements in performance. SVM achieved an impressive OA of 0.91 with a high Kappa score of 0.80, while RF achieved an OA of 0.81 with a Kappa score of 0.809. These findings emphasize the potential of combining satellite data from multiple sources to enhance the accuracy and effectiveness of image classification algorithms, making them valuable tools for various applications, such as land use mapping and environmental monitoring.

Confusion matrix for combined Sentinel-1A SAR and Sentinel-2 MSI datasets

As indicated in Table 4, Water Body stands out as the best-performing class, with a high User Accuracy of 0.71, indicating that a large proportion of the pixels classified as Water Body were indeed water bodies. Built-up demonstrates the highest Producer Accuracy (0.68), meaning that a substantial portion of the actual built-up areas were correctly identified by the model. Conversely, Shrubland appears to be the poorest-performing class, with a User Accuracy of 0.56, suggesting that the model struggled to classify instances of this land use category reliably. Grassland has the lowest Producer Accuracy (0.63), indicating that a comparatively large share of actual grassland was missed. Overall, Water Body and Built-up are the top-performing classes in the combined dataset, while Shrubland and Grassland exhibit comparatively lower classification accuracy.
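The user and producer accuracies reported in Table 4 are per-class row and column ratios of the confusion matrix. A sketch with an illustrative two-class matrix (not the study's actual Table 4 counts), taking rows as mapped (predicted) classes and columns as reference classes:

```python
def user_producer_accuracy(cm):
    """Per-class accuracies from a confusion matrix.

    Convention: rows = classes as mapped (predicted),
                columns = classes in the reference data.
    User accuracy (commission side): diagonal / row total.
    Producer accuracy (omission side): diagonal / column total.
    """
    n = len(cm)
    ua = [cm[i][i] / sum(cm[i]) for i in range(n)]
    pa = [cm[i][i] / sum(row[i] for row in cm) for i in range(n)]
    return ua, pa

cm = [[40, 10],   # mapped as class 0
      [20, 30]]   # mapped as class 1
ua, pa = user_producer_accuracy(cm)
print([round(v, 2) for v in ua])  # [0.8, 0.6]
print([round(v, 3) for v in pa])  # [0.667, 0.75]
```

Reading them together matters: a class can have high user accuracy (few false alarms on the map) yet low producer accuracy (much of it missed on the ground), which is the distinction drawn between Water Body and Built-up above.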

Table 4 Confusion matrix for the combined datasets

Discussion

Hu et al. (2021) conducted a study on improving urban land cover classification using the combined data of Sentinel-2B Multispectral Instrument (MSI) and Sentinel-1A Synthetic Aperture Radar (SAR) imagery over the Wuhan Metropolis, China. They introduced the Support Vector Machine with Composite Kernels (SVM-CK) approach, which effectively integrates spatial information from the fusion of Sentinel-2B and Sentinel-1A data. The classification results obtained from the fused data showed superior performance, with an overall accuracy (OA) of 92.12% and a kappa coefficient (K) of 0.89, surpassing the results achieved using the individual Sentinel-2B MSI and Sentinel-1A SAR imagery.

Similarly, our study highlights the benefits of combining Sentinel-2B MSI and Sentinel-1A SAR data for land use and land cover (LULC) classification in Gondar city, Ethiopia. With the combined data, the SVM algorithm classified the various LULC types in Gondar city with high accuracy, achieving strong producer accuracy for multiple classes. While the RF algorithm performed well in forest classification, the SVM algorithm outperformed RF in the other categories, showing higher user accuracy and comparable F-scores. These findings underscore the potential of combining Sentinel-2B MSI and Sentinel-1A SAR data and utilizing SVM for accurate and comprehensive urban land cover classification. In addition, Steinhausen et al. (2018) conducted a study on land use and land cover mapping in the cloud-prone monsoon region of the Chennai Basin in India during the Rabi 2015/16 cropping season. That study achieved its highest overall accuracy of 91.53% when combining one Sentinel-2 scene with eight Sentinel-1 scenes, an improvement of 5.68% over using Sentinel-2 data alone. Their findings demonstrate the value of fusing Sentinel-1 and Sentinel-2 data for land use classification in cloud-prone monsoon regions and have important implications for environmental modeling and water resource management in the area.

The studies conducted by Hu et al. (2021) in the Wuhan Metropolis, China, and by Steinhausen et al. (2018) in the Chennai Basin, India, both explored the benefits of combining Sentinel-2 Multispectral Instrument (MSI) and Sentinel-1 Synthetic Aperture Radar (SAR) data for land use and land cover classification. Hu et al. (2021) introduced the Support Vector Machine with Composite Kernels (SVM-CK) approach, achieving superior performance with an OA of 92.12% and a kappa coefficient of 0.89. In the Chennai Basin, Steinhausen et al. (2018) demonstrated the effectiveness of combining Sentinel-1 and Sentinel-2 data, achieving an OA of 91.53% with RF-based classification. Both studies highlight the potential of combining radar and optical data for accurate and comprehensive land use classification in complex environments, providing valuable insights for environmental modeling and resource management.

The novelty of our study lies in its investigation of the combined use of the Sentinel-2B MSI and Sentinel-1A SAR datasets for image classification. While previous studies, such as those by Hu et al. (2021) and Steinhausen et al. (2018), have explored the benefits of fusing these satellite data in specific contexts, this research evaluates the performance of two popular machine learning algorithms, SVM and RF, for image classification using both datasets separately and in combination. The results reveal that SVM and RF achieved moderate accuracies when applied to the individual datasets, but a significant improvement in performance was observed when the Sentinel-2B MSI and Sentinel-1A SAR data were combined: SVM achieved an impressive OA of 0.91 with high agreement (Kappa score of 0.80), while RF achieved a slightly lower OA of 0.81 with a comparable Kappa score of 0.809. Our research highlights the potential of leveraging multiple satellite data sources to enhance the accuracy and effectiveness of image classification algorithms, contributing valuable insights for land use mapping and environmental monitoring applications.

Conclusion

In conclusion, the study successfully demonstrated the potential of integrating Sentinel-1A SAR and Sentinel-2B MSI data for improved urban LULC classification in Gondar city, Ethiopia. By combining these datasets and employing machine learning algorithms, namely SVM and RF, the research achieved higher accuracy and agreement in identifying various land cover types compared to using individual datasets. The SVM algorithm showed superior performance in classifying different LULC classes, while RF excelled in accurately identifying forested regions. The results emphasize the importance of leveraging the complementary information provided by optical and radar satellite data to enhance the accuracy and effectiveness of land cover classification in urban areas. The findings have significant implications for urban planning, environmental management, and sustainable development initiatives in Gondar city and other similar regions worldwide.

Moving forward, we recommend further exploration of fusion techniques for combining data from different satellite sources. This study employed a specific fusion technique for integrating Sentinel-1A SAR and Sentinel-2B MSI data, which yielded promising results; however, future research should explore and compare different fusion methods to identify the most suitable approach for specific urban environments. This could include investigating other machine learning-based fusion techniques. Advancements in fusion techniques could lead to even more accurate and robust land cover classification results, especially in areas with complex and diverse land cover patterns.

Availability of data and materials

All of the datasets and materials of the study are included in the manuscript.

References

  • Balzter H, Cole B, Thiel C, Schmullius C (2015) Mapping CORINE land cover from sentinel-1A SAR and SRTM digital elevation model data using random forests. Remote Sens 7(11):14876–14898

  • Bao Y, Lin L, Wu S, Kwal Deng KA, Petropoulos GP (2018) Surface soil moisture retrievals over partially vegetated areas from the synergy of sentinel-1 and landsat 8 data using a modified water-cloud model. Int J Appl Earth Obs Geoinf 1(72):76–85

  • Bauer-Marschallinger B, Cao S, Navacchi C, Freeman V, Reuß F, Geudtner D et al (2021) The normalised sentinel-1 global backscatter model, mapping earth’s land surface with C-band microwaves. Sci Data. https://doi.org/10.1038/s41597-021-01059-7

  • Bayable G, Cai J, Mekonnen M, Legesse SA, Ishikawa K, Imamura H et al (2023) Detection of water hyacinth (Eichhornia crassipes) in Lake Tana, Ethiopia, using machine learning algorithms. Water. https://doi.org/10.3390/w15050880

  • Castillo JAA, Apan AA, Maraseni TN, Salmo SG (2017) Estimation and mapping of above-ground biomass of mangrove forests and their replacement land uses in the Philippines using Sentinel imagery. ISPRS J Photogramm Remote Sens 1(134):70–85

  • Colson D, Petropoulos GP, Ferentinos KP (2018) Exploring the potential of sentinels-1 & 2 of the copernicus mission in support of rapid and cost-effective wildfire assessment. Int J Appl Earth Obs Geoinf 1(73):262–276

  • Drusch M, Del Bello U, Carlier S, Colin O, Fernandez V, Gascon F et al (2012) Sentinel-2: ESA’s optical high-resolution mission for GMES operational services. Remote Sens Environ 15(120):25–36

  • Erinjery JJ, Singh M, Kent R (2018) Mapping and assessment of vegetation types in the tropical rainforests of the Western Ghats using multispectral sentinel-2 and SAR sentinel-1 satellite imagery. Remote Sens Environ 1(216):345–354

  • Fallati L, Savini A, Sterlacchini S, Galli P (2017) Land use and land cover (LULC) of the Republic of the Maldives: first national map and LULC change analysis using remote-sensing data. Environ Monit Assess. https://doi.org/10.1007/s10661-017-6120-2

  • Filipponi F (2019) Sentinel-1 GRD preprocessing workflow. Proc West Mark Ed Assoc Conf 18(1):11

  • Gomes VCF, Queiroz GR, Ferreira KR (2020) An overview of platforms for big earth observation data management and analysis. Remote Sens. https://doi.org/10.3390/rs12081253

  • Hu B, Xu Y, Wan B, Wu X, Yi G (2018) Hydrothermally altered mineral mapping using synthetic application of Sentinel-2A MSI, ASTER and hyperion data in the Duolong area, Tibetan Plateau, China. Ore Geol Rev 1(101):384–397

  • Hu B, Xu Y, Huang X, Cheng Q, Ding Q, Bai L et al (2021) Improving urban land cover classification with combined use of sentinel-2 and sentinel-1 imagery. ISPRS Int J Geoinf. https://doi.org/10.3390/ijgi10080533

  • Huang H, Roy DP, Boschetti L, Zhang HK, Yan L, Kumar SS et al (2016) Separability analysis of sentinel-2A multi-spectral instrument (MSI) data for burned area discrimination. Remote Sens. https://doi.org/10.3390/rs8100873

  • Mahdianpari M, Salehi B, Mohammadimanesh F, Homayouni S, Gill E (2019) The first wetland inventory map of Newfoundland at a spatial resolution of 10 m using sentinel-1 and sentinel-2 data on the google earth engine cloud computing platform. Remote Sens. https://doi.org/10.3390/rs11010043

  • Mahdianpari M, Salehi B, Mohammadimanesh F, Brisco B, Homayouni S, Gill E et al (2020) Big data for a big country: the first generation of Canadian wetland inventory map at a spatial resolution of 10-m using sentinel-1 and sentinel-2 data on the google earth engine cloud computing platform. Can J Remote Sens 46(1):15–33

  • Pahlevan N, Chittimalli SK, Balasubramanian SV, Vellucci V (2019) Sentinel-2/Landsat-8 product consistency and implications for monitoring aquatic systems. Remote Sens Environ 1(220):19–29

  • Pal M, Maxwell AE, Warner TA (2013) Kernel-based extreme learning machine for remote-sensing image classification. Remote Sens Lett 4(9):853–862

  • Roteta E, Bastarrika A, Padilla M, Storm T, Chuvieco E (2019) Development of a sentinel-2 burned area algorithm: generation of a small fire database for sub-Saharan Africa. Remote Sens Environ 1(222):1–17

  • Ruzza G, Guerriero L, Grelle G, Guadagno FM, Revellino P (2019) Multi-method tracking of monsoon floods using sentinel-1 imagery. Water. https://doi.org/10.3390/w11112289

  • Shelestov A, Lavreniuk M, Kussul N, Novikov A, Skakun S (2017) Exploring google earth engine platform for big data processing: classification of multi-temporal satellite imagery for crop mapping. Front Earth Sci 1(5):1–10

  • Steinhausen MJ, Wagner PD, Narasimhan B, Waske B (2018) Combining sentinel-1 and sentinel-2 data for improved land use and land cover mapping of monsoon regions. Int J Appl Earth Obs Geoinf 1(73):595–604

  • Talukdar S, Singha P, Mahato S, Shahfahad PS, Liou YA et al (2020) Land-use land-cover classification by machine learning classifiers for satellite observations-A review. Remote Sens. https://doi.org/10.3390/rs12071135

  • Tassi A, Vizzari M (2020) Object-oriented LULC classification in google earth engine combining SNIC, GLCM, and machine learning algorithms. Remote Sens 12(22):1–17

  • Vasilakos C, Kavroudakis D, Georganta A (2020) Machine learning classification ensemble of multitemporal sentinel-2 images: the case of a mixed mediterranean ecosystem. Remote Sens. https://doi.org/10.3390/rs12122005

  • Vizzari M (2022) PlanetScope, sentinel-2, and sentinel-1 data integration for object-based land cover classification in google earth engine. Remote Sens. https://doi.org/10.3390/rs14112628

  • Walker WS, Kellndorfer JM, Kirsch KM, Stickler CM, Nepstad DC et al (2010) Large-area classification and mapping of forest and land cover in the Brazilian Amazon: a comparative analysis of ALOS/PALSAR and landsat data sources. IEEE J Sel Top Appl Earth Obs Remote Sens 3(4):594–604

  • Wang D, Wan B, Qiu P, Su Y, Guo Q, Wang R et al (2018) Evaluating the performance of sentinel-2, landsat 8 and pléiades-1 in mapping mangrove extent and species. Remote Sens. https://doi.org/10.3390/rs10091468

  • Weng Q (2012) Remote sensing of impervious surfaces in the urban areas: requirements, methods, and trends. Remote Sens Environ 15(117):34–49

  • Wubneh M (2021) Urban resilience and sustainability of the city of Gondar (Ethiopia) in the face of adverse historical changes. Plan Perspect 36(2):363–391

  • Zhang H, Li J, Wang T, Lin H, Zheng Z, Li Y et al (2018) A manifold learning approach to urban land cover classification with optical and radar data. Landsc Urban Plan 1(172):11–24


Acknowledgements

The authors would like to express their gratitude to the European Space Agency (ESA) for providing access to Sentinel images free of charge, enabling valuable data for this research. We extend our appreciation to the dedicated field data collectors whose efforts and commitment contributed significantly to the success of this study. Additionally, we acknowledge the collaboration and support received from the Gondar City Administration Office for their valuable assistance in the field data collection process, which greatly enriched the quality and reliability of our findings. The contributions of these organizations and individuals have been instrumental in the preparation of this manuscript and are deeply appreciated.

Funding

No funding is received for this research work.

Author information

Authors and Affiliations

Authors

Contributions

SS was responsible for the study's conceptualization, design, data collection, and analysis. HH played a key role in interpreting the data and developing the research's synthesis methodology. AT and AA were involved in the validation of LULC classification results. YBD was responsible for the final manuscript language editing. All authors collaborated on drafting, refining, and finalizing the manuscript, granting their approval for its submission.

Corresponding author

Correspondence to Shimelis Sishah Dagne.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Dagne, S.S., Hirpha, H.H., Tekoye, A.T. et al. Fusion of sentinel-1 SAR and sentinel-2 MSI data for accurate Urban land use-land cover classification in Gondar City, Ethiopia. Environ Syst Res 12, 40 (2023). https://doi.org/10.1186/s40068-023-00324-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s40068-023-00324-5

Keywords