Soil quality assessment based on a machine learning approach for cultivated lands under semi-humid environmental conditions in part of the Black Sea region

ABSTRACT To manage arable land in line with its resources for future generations, it is crucial to determine soil quality. The main purpose of this study is to identify the soil quality of cultivated lands in the semi-humid terrestrial ecosystem of the Black Sea region. Multi-criteria decision analysis was performed using a weighted linear combination approach and standard scoring functions (linear, L, and nonlinear, NL), integrated with GIS techniques and interpolation models. An artificial neural network was also tested for predicting soil quality index (SQI) values (SQI-ANN). The SQI values obtained using the linear method ranged from 0.444 to 0.751, while those obtained using the nonlinear method ranged from 0.315 to 0.683. The soil quality indices of the cultivated areas were thereby determined. According to our statistical analysis, there were no statistically significant differences between the SQI values obtained from SQI-L and SQI-L-ANN, and the same result was found for SQI-NL and SQI-NL-ANN. Cluster analysis showed 98.2% similarity between SQI-L and SQI-L-ANN and 99.2% similarity between SQI-NL and SQI-NL-ANN. In addition, the spatial distribution maps obtained from both the cluster analysis and the geostatistical analysis showed considerable similarity between the SQI values.


Introduction
The world's growing population faces increasing difficulty in meeting its food needs from limited agricultural land. Soil provides a large portion of the food needed by the world's population, serves as a sink for greenhouse gases whose atmospheric concentrations continue to rise with global warming, is an important habitat for living things, contributes to the environmental cycling of nutrients and pollutants, and supports biological diversity (Kopittke et al. 2019; van Es and Karlen 2019; Van Leeuwen et al. 2019). Soils are one of the most important indicators of environmental change. Changes in land use and environmental impacts affect the physical, chemical and biological properties of the soil, and thereby its functions, or in other words, its quality (Zhang et al. 2016). Incorrect land use, improper agricultural practices and climate change resulting from global warming negatively impact soil productivity (Bakhshandehmehr et al. 2009). Quality soil should have biological, chemical and physical properties

Description of the study area
The study area covers the arable lands of Gümüşhane province in the eastern Black Sea region of Türkiye, which borders Bayburt to the east, Trabzon to the north, and Giresun and Erzincan to the west. Gümüşhane province covers approximately 674,714 ha; the study area (arable lands) within the province is about 1291 km², located between 500,000-580,000 m east and 4,400,000-4,520,000 m north (WGS 84, UTM Zone 37) (Figure S1).
The study area lies at an altitude of 380-3330 m above sea level. Apart from the alluvial lands, most of the land in the north is mountainous, rugged terrain with very steep slopes (>45%), whereas some of the southern areas have very gentle slopes (Figure S2). Gümüşhane is surrounded by high mountains, including the Zigana-Trabzon Mountains to the north, the Çimen Mountains to the south, the Giresun Mountains to the west, and the Pulur and Soğanlı Mountains to the east. Mountainous terrain accounts for 56% of the province's total land area.
According to data from the Gümüşhane meteorology station, located in the research area (1980-2021), the mean annual precipitation is 453.6 mm and the mean annual temperature is 9.7 °C. Bölük (2016) reports that the study area is classified as 'semi-humid', with a precipitation effectiveness index of 29.28 based on Erinç's macroclimate regions of Turkey. In addition, the Newhall simulation model (Van Wambeke 2000) shows that the study area has a xeric soil moisture regime (typic xeric subgroup) and a mesic soil temperature regime.
Gümüşhane is situated in the eastern part of the Pontide orogenic belt in northeastern Turkey. The primary basement rocks in Gümüşhane and the surrounding area are Palaeozoic metamorphic rocks and the Gümüşhane granites that intrude them (Taş et al. 2003 provide extensive data on their geological ages). Granite, granodiorite and quartz-diorite, Eocene volcanic facies, undifferentiated Cretaceous, Upper Cretaceous and flysch units make up the majority of the geological units around the city center and the center of Gümüşhane province (Figure S3) (Öztürk et al. 2021). There are nine common great soil groups in the study area. Among these, brown soils, brown forest soils and non-calcareous brown forest soils have the widest distribution at about 68%, followed by basaltic soils, chestnut soils, gray-brown podzolic soils and alluvial soils (Figure S3). According to the World Reference Base for Soil Resources (WRB) (2014), most of the soils in the area are classified as Cambisols, Kastanozems, Leptosols, Alisols and Podzols, with Fluvisols in the alluvial areas.
Based on CORINE (2018) land cover data, 45.6% of the total study area is covered by forest and pastureland, while 17.6% is used for cultivation. In addition, 0.4% of the study area consists of artificial surfaces such as continuous and discontinuous urban fabric, industrial or commercial units, mineral extraction sites, construction sites and roads.

Soil sampling and analysis
Soil samples were collected at a total of 319 georeferenced points at 0-20 cm depth from cultivated areas between September and October 2018 (Figure S4). In the southern part of the study area, 2 × 2 km grid sampling was carried out on flat lands. In the northern part, where intensive agriculture is practised on narrow alluvial lands, random sampling was used so that these areas would be represented. Disturbed and undisturbed soil samples were taken from the cultivated areas. Sampling was done after harvest so that irrigation or fertilization would not affect the soil analyses.
The parameters and methods examined in the soils are specified in Table S1.

Soil quality assessment
SQI values were calculated with a weighted linear combination technique, using the weights obtained from the Analytic Hierarchy Process (AHP) together with linear and non-linear standard scoring functions.
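The weighted linear combination described above can be written compactly as follows. This is a minimal sketch that assumes the usual weighted additive form. Here S_i is the 0-1 score of indicator i from the standard scoring functions, W_i is its AHP-derived weight, and n is the number of indicators (28 in this study):

```latex
\mathrm{SQI} = \sum_{i=1}^{n} W_i \, S_i, \qquad \sum_{i=1}^{n} W_i = 1, \qquad 0 \le S_i \le 1
```

Under these assumptions the SQI is itself bounded between 0 and 1, which is consistent with the ranges reported for the linear and non-linear indices.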
The standard scoring functions (SSF) given in Table S2 were used to convert the soil quality indicators into dimensionless values scored between 0 and 1. Both linear and non-linear scoring functions were evaluated. Generally, three types of scoring function are used (Karlen and Stott 1994; Wymore 1993). When a higher value of a parameter indicates better soil quality (more is better), a positive SSF is used. When a lower value is associated with good soil quality (less is better), a negative SSF is used. For parameters with an optimum range, an optimum SSF scoring formula is used (Armenise et al. 2013).
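Table S2 is not reproduced here, so the sketch below uses scoring forms that are common in the soil quality literature rather than the study's exact functions. The linear functions divide by the maximum ("more is better") or divide the minimum by each value ("less is better"). The non-linear function is the sigmoid 1 / (1 + (x/x0)^b), where x0 is typically the indicator mean and b is about -2.5 (more is better) or +2.5 (less is better). The optimum parametrisation and all function names are hypothetical, and values are assumed to be positive.

```python
import numpy as np

def linear_more_is_better(x):
    """Linear 'more is better' score: each value divided by the maximum value."""
    x = np.asarray(x, dtype=float)
    return x / x.max()

def linear_less_is_better(x):
    """Linear 'less is better' score: minimum value divided by each value (x > 0)."""
    x = np.asarray(x, dtype=float)
    return x.min() / x

def linear_optimum(x, low, high):
    """Hypothetical optimum score: 1 inside [low, high], x/low below, high/x above."""
    x = np.asarray(x, dtype=float)
    return np.where(x < low, x / low, np.where(x > high, high / x, 1.0))

def sigmoid_score(x, x0, b):
    """Non-linear score 1 / (1 + (x / x0) ** b); b < 0 rewards high values,
    b > 0 rewards low values, and the score equals 0.5 at x = x0."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + (x / x0) ** b)

def weighted_sqi(scores, weights):
    """Weighted additive SQI; scores has shape (n_samples, n_indicators)."""
    w = np.asarray(weights, dtype=float)
    return np.asarray(scores, dtype=float) @ (w / w.sum())

# Hypothetical organic matter values (%) scored with both function types
om = np.array([1.0, 2.0, 4.0])
linear_scores = linear_more_is_better(om)
nonlinear_scores = sigmoid_score(om, x0=om.mean(), b=-2.5)
```

The linear score changes in proportion to the indicator value. The sigmoid compresses differences far from x0 and is most sensitive near it, which is one reason the two index families can rank samples differently.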
The Analytic Hierarchy Process (AHP) is a multi-criteria decision-making method in which variables are organized in a hierarchical structure (Saaty 1990). AHP derives priority scales from pairwise comparisons of criteria, based on expert judgement (Saaty 2008). It is one of the most commonly used multi-criteria decision-making methods because of its simplicity, flexibility, ease of use and ease of interpretation in solving complex decision problems (Akıncı et al. 2013). In AHP, the goal is identified first. The criteria and sub-criteria that affect the decision are then arranged in a hierarchical structure in line with this goal. The AHP approach proceeds as follows:
- Firstly, the factors that make up the decision problem are identified and arranged in hierarchical order. The numerical scale given by Saaty (1977) is used to express the relative importance of each pair of factors. At each level, the criteria are compared pairwise with respect to the elements of the level above. A factor that is more important than the one it is compared with is rated from 1 to 9. In the inverse case, the reciprocal ratings 1/2 to 1/9 are used, according to expert opinion (Saaty 1977).
- Secondly, a pairwise comparison matrix is constructed and the normalized principal eigenvector, which gives the weight of each factor, is calculated (Saaty and Vargas 2001). A significant feature of AHP is that inconsistencies in the judgements can be detected with a defined consistency index (Saaty 2000). Once the weights (%) have been determined and the consistency has been evaluated, the process is complete.
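The two steps above can be sketched in NumPy as follows. This is a minimal illustration, not the study's software. It takes the weights from the principal eigenvector of the comparison matrix and computes the consistency index CI = (λmax − n)/(n − 1). It then divides CI by Saaty's random index (RI) to obtain the consistency ratio CR; a CR below 0.1 is the usual acceptance threshold. The 4 × 4 judgement matrix is hypothetical and is not the study's Table 1.

```python
import numpy as np

# Saaty's random consistency index (RI) for matrices of order n = 1..10
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def ahp_weights(A):
    """Return (weights, lambda_max, CR) for a reciprocal pairwise comparison matrix."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = int(np.argmax(eigvals.real))              # principal eigenvalue
    lam_max = float(eigvals[k].real)
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                               # normalised principal eigenvector
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0  # consistency index
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0               # consistency ratio
    return w, lam_max, cr

# Hypothetical judgements comparing four criteria (not the study's matrix)
A = [[1,   3,   4,   2],
     [1/3, 1,   2,   1/2],
     [1/4, 1/2, 1,   1/3],
     [1/2, 2,   3,   1]]
weights, lam_max, cr = ahp_weights(A)
print(weights.round(3), round(cr, 3))
```

For a perfectly consistent matrix (every entry equal to w_i / w_j), λmax equals n and CR is zero. Departures from that reciprocal-transitive structure raise λmax above n.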

Machine learning approach
Machine learning methods are techniques that classify data, or make predictions for new data, using rules learned from existing data (Kayhan and İşeri 2022). Depending on the data available, machine learning methods are divided into three main categories: supervised (Zhou 2018; Jaiswal et al. 2020), unsupervised (Dike et al. 2018; Sha et al. 2021) and reinforcement learning (Lin 1992).
Machine learning methods vary with both the data set and the goals of the analysis (Odabas et al. 2013). The artificial neural network (ANN) used in this study is a machine learning method in the supervised learning category.
In a network diagram, the arrows between the layers represent the connections between neurons, which are weighted according to the importance of the input they receive. The weights and biases are adjusted during training to improve the accuracy of the ANN's predictions. During training, the network state with the lowest loss on the validation set was recorded. The results were evaluated in terms of the mean squared error (MSE) and the correlation coefficient (R). MSE measures the performance of an estimator. It is always non-negative, and estimators with MSE values closer to zero perform better. The correlation coefficient R measures the strength of the linear relationship between the network outputs and the target values.
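Both metrics can be computed directly from targets and network outputs, as in this minimal NumPy sketch (function names and values are illustrative). The example shows why the two metrics complement each other. Outputs that are exactly twice the targets have a perfect R but a non-zero MSE.

```python
import numpy as np

def mse(targets, outputs):
    """Mean squared error: average of squared differences (always >= 0)."""
    t = np.asarray(targets, dtype=float)
    y = np.asarray(outputs, dtype=float)
    return float(np.mean((t - y) ** 2))

def pearson_r(targets, outputs):
    """Pearson correlation coefficient R between targets and outputs."""
    return float(np.corrcoef(targets, outputs)[0, 1])

# R rewards linear agreement; MSE also penalises any offset or scaling
t = [0.45, 0.52, 0.61, 0.70]
y_scaled = [2 * v for v in t]
print(pearson_r(t, y_scaled), mse(t, y_scaled))
```

Reporting MSE together with R, as in Table 2, therefore guards against a network that tracks the trend of the targets while being systematically biased.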
Figure 1 presents a flowchart of the proposed ANN-based prediction tool, giving an overview of its method of operation and some key rules for its use.
The artificial neural network was built with the Neural Fitting tool (nftool) in MATLAB 7.11.0.584 (R2010b). In this research, 70% of the data were used for training, and the remaining data were split equally into a test set (15%) and a validation set (15%). The maximum number of iterations was set to 1000 and the MSE tolerance to 0.001. The accuracy of the estimate depends heavily on how the data are divided. For this reason, the random division was repeated until all levels of the data were adequately represented in each subset. To prevent overfitting, the algorithm was run iteratively.
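The 70/15/15 division and the repeated randomisation can be sketched as follows. The source does not state what counted as an "acceptable distribution". As a hypothetical acceptance rule, this sketch accepts a split only when each subset's mean target lies within 5% of the overall mean. The 319 target values are synthetic and only span the reported linear SQI range.

```python
import numpy as np

def split_indices(n, train=0.70, val=0.15, rng=None):
    """Randomly divide n sample indices into training, validation and test sets."""
    if rng is None:
        rng = np.random.default_rng()
    idx = rng.permutation(n)
    n_train = int(round(train * n))
    n_val = int(round(val * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def balanced_split(y, tol=0.05, max_tries=1000, seed=0):
    """Repeat the random division until the mean target of every subset lies
    within tol (relative) of the overall mean; a hypothetical acceptance rule."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    overall = y.mean()
    for _ in range(max_tries):
        parts = split_indices(len(y), rng=rng)
        if all(abs(y[p].mean() - overall) <= tol * abs(overall) for p in parts):
            return parts
    raise RuntimeError("no acceptable split found; relax tol or raise max_tries")

# 319 synthetic SQI targets spread over the reported linear range
y = np.linspace(0.444, 0.751, 319)
train_idx, val_idx, test_idx = balanced_split(y)
print(len(train_idx), len(val_idx), len(test_idx))
```

With 319 samples this gives 223 training, 48 validation and 48 test samples. A stricter rule, for example comparing quantiles instead of means, would follow the same loop.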
The most suitable architecture was selected by starting from a minimal architecture, as is common in the literature, and increasing the network's capacity by adding neurons. Based on the MSE values, the optimal number of hidden neurons was 10 (Wu et al. 2014; Qiao et al. 2016; Mehrabi 2021). The resulting ANN architecture was 28:10:2 (28 input, 10 hidden and 2 output neurons). A sigmoid activation function and the Levenberg-Marquardt training algorithm were used.
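The 28:10:2 network and its Levenberg-Marquardt training can be sketched in plain NumPy as below. This is not the MATLAB implementation and rests on several assumptions:
- The hidden layer uses a logistic sigmoid and the output layer is linear (MATLAB's nftool uses a hyperbolic-tangent sigmoid, tansig, by default).
- The weights are initialised randomly and the inputs are not normalised.
- The Jacobian is computed by finite differences, whereas MATLAB's trainlm uses backpropagation.

The damping schedule mirrors the trainlm defaults: μ starts at 0.001, is multiplied by 0.1 after a successful step and by 10 after a failed one, up to 1e10. All function names are hypothetical.

```python
import numpy as np

def n_params(shape):
    """Number of weights and biases in an n_in:n_hidden:n_out network."""
    n_in, n_hid, n_out = shape
    return n_hid * n_in + n_hid + n_out * n_hid + n_out

def unpack(p, shape):
    """Split a flat parameter vector into layer weights and biases."""
    n_in, n_hid, n_out = shape
    i = n_hid * n_in
    W1 = p[:i].reshape(n_hid, n_in)
    b1 = p[i:i + n_hid]
    i += n_hid
    W2 = p[i:i + n_out * n_hid].reshape(n_out, n_hid)
    b2 = p[i + n_out * n_hid:]
    return W1, b1, W2, b2

def forward(p, X, shape):
    """Logistic-sigmoid hidden layer followed by a linear output layer."""
    W1, b1, W2, b2 = unpack(p, shape)
    H = 1.0 / (1.0 + np.exp(-(X @ W1.T + b1)))
    return H @ W2.T + b2

def train_lm(X, Y, shape, epochs=1000, goal=1e-3, mu=1e-3, seed=0):
    """Levenberg-Marquardt training with a forward-difference Jacobian."""
    p = np.random.default_rng(seed).normal(0.0, 0.5, n_params(shape))

    def residuals(q):
        return (forward(q, X, shape) - Y).ravel()

    e = residuals(p)
    for _ in range(epochs):
        if np.mean(e ** 2) <= goal:              # MSE goal reached
            break
        J = np.empty((e.size, p.size))
        for k in range(p.size):                  # numerical Jacobian, column by column
            dq = np.zeros_like(p)
            dq[k] = 1e-6
            J[:, k] = (residuals(p + dq) - e) / 1e-6
        while mu <= 1e10:
            step = np.linalg.solve(J.T @ J + mu * np.eye(p.size), J.T @ e)
            e_new = residuals(p - step)
            if np.mean(e_new ** 2) < np.mean(e ** 2):
                p, e, mu = p - step, e_new, mu * 0.1   # accept: closer to Gauss-Newton
                break
            mu *= 10                                    # reject: closer to gradient descent
        else:
            break                                       # mu exceeded its maximum; stop
    return p
```

For the study's layout, `shape = (28, 10, 2)` gives 312 trainable weights and biases. `X` would hold the 28 indicator values per sample and `Y` the two output columns. Small μ makes each step close to Gauss-Newton, and large μ turns it into a short gradient-descent step, which is what makes the method robust far from a minimum.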
To analyze model uncertainty, the variance of the 15 predicted values was computed for the soil properties, and the standard deviation of the predicted values obtained by iteration was determined. Malone et al. (2017) recommend performing 10-50 iterations. The standard deviation was calculated following Malone et al. (2017) and Sharififar et al. (2019). Li et al. (2023) and Li et al. (2023) performed similar uncertainty calculations.
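A minimal sketch of this uncertainty measure follows. It assumes that each run retrains the network (for example with a new random division and initialisation) and predicts the same samples. Uncertainty is then the spread of the predictions across runs. The sample divisor (n − 1) and the prediction values are assumptions, since the source does not state them.

```python
import numpy as np

def prediction_uncertainty(predictions):
    """Per-sample mean, variance and standard deviation of predictions from
    repeated runs; predictions has shape (n_runs, n_samples)."""
    P = np.asarray(predictions, dtype=float)
    var = P.var(axis=0, ddof=1)
    return P.mean(axis=0), var, np.sqrt(var)

# Hypothetical SQI predictions for two samples from three repeated runs
runs = [[0.50, 0.61],
        [0.70, 0.61],
        [0.60, 0.61]]
mean, var, sd = prediction_uncertainty(runs)
```

A sample whose prediction barely changes between runs, like the second one here, has near-zero uncertainty. This matches the very low values reported when predictive accuracy is high.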

Statistical and geostatistical analysis
In this study, several interpolation techniques, namely inverse distance weighting (IDW), radial basis functions (RBF) and kriging (ordinary, universal and simple), were applied to evaluate the spatial distribution of SQI. Spherical, exponential and Gaussian semivariogram models were used with the kriging methods.
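The three semivariogram models can be written as functions of lag distance h with nugget c0, partial sill c and range a. The sketch uses one common parameterisation in which a is the practical range, so that the exponential and Gaussian models reach about 95% of the sill at h = a. Software packages scale these ranges differently, so the formulas should not be read as ArcGIS's exact implementation. The lag distances and parameter values are hypothetical.

```python
import numpy as np

# Semivariogram models with nugget c0, partial sill c and (practical) range a.
# By definition the semivariance is 0 at h = 0; these return the limit c0 as h -> 0+.

def spherical(h, c0, c, a):
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)   # flat at the sill beyond a
    return c0 + c * (1.5 * r - 0.5 * r ** 3)

def exponential(h, c0, c, a):
    h = np.asarray(h, dtype=float)
    return c0 + c * (1.0 - np.exp(-3.0 * h / a))          # about 95% of sill at h = a

def gaussian(h, c0, c, a):
    h = np.asarray(h, dtype=float)
    return c0 + c * (1.0 - np.exp(-3.0 * (h / a) ** 2))   # parabolic near the origin

lags = np.array([0.5, 1.0, 2.0, 4.0, 8.0])   # hypothetical lag distances, km
for model in (spherical, exponential, gaussian):
    print(model.__name__, model(lags, c0=0.001, c=0.004, a=4.0).round(5))
```

The models differ mainly near the origin. The exponential model rises steeply and implies rough short-range variation, while the Gaussian model rises slowly and implies a very smooth surface.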
For IDW estimation, the commonly used weighting powers of 1, 2 and 3 were evaluated (Keshavarzi and Sarmadian 2012). Radial basis functions are a method for interpolating multidimensional data. The thin plate spline (TPS), spline with tension (SPT) and completely regularized spline (CRS) were chosen to evaluate the distribution of SQI. The root mean square error (RMSE) was used to compare the interpolation techniques. The interpolation of the physico-chemical soil properties was done with the Geostatistical Analyst module in ArcGIS 10.7.
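The IDW comparison across powers 1, 2 and 3 can be sketched as follows. The sketch assumes that RMSE is obtained by leave-one-out cross-validation: each sample is predicted from all the others. This is an assumption about how the Table 4 values were computed. The sketch also uses every sample as a neighbour, whereas ArcGIS restricts IDW to a search neighbourhood. Coordinates and SQI values are hypothetical.

```python
import numpy as np

def idw_predict(xy_known, z_known, xy_new, power=2.0):
    """Inverse distance weighting using all known points (weights 1 / d**power)."""
    xy_known = np.asarray(xy_known, dtype=float)
    z_known = np.asarray(z_known, dtype=float)
    xy_new = np.atleast_2d(np.asarray(xy_new, dtype=float))
    d = np.linalg.norm(xy_new[:, None, :] - xy_known[None, :, :], axis=2)
    out = np.empty(len(xy_new))
    for i, row in enumerate(d):
        exact = row == 0.0
        if exact.any():                      # target coincides with a sample point
            out[i] = z_known[exact][0]
        else:
            w = 1.0 / row ** power
            out[i] = np.sum(w * z_known) / np.sum(w)
    return out

def loocv_rmse(xy, z, power=2.0):
    """Leave-one-out cross-validation RMSE of IDW for a given power."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    errors = []
    for i in range(len(z)):
        keep = np.arange(len(z)) != i
        pred = idw_predict(xy[keep], z[keep], xy[i], power)[0]
        errors.append(pred - z[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical sample coordinates (km) and SQI values; pick the power with lowest RMSE
xy = [[0, 0], [2, 0], [0, 2], [2, 2], [1, 3], [3, 1]]
sqi = [0.52, 0.58, 0.61, 0.66, 0.63, 0.60]
rmse = {p: loocv_rmse(xy, sqi, p) for p in (1, 2, 3)}
best_power = min(rmse, key=rmse.get)
```

Higher powers give more weight to the nearest samples and produce more local surfaces. Lower powers smooth toward the regional mean, so the best power depends on the spatial structure of the data.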
Descriptive statistics and pairwise comparison tests of the soil properties were performed in MINITAB 18. The soil texture triangle was plotted in R using the 'soiltexture' package.

Soil physico-chemical properties
The descriptive statistics of the soil properties are given in Table S3. The distribution of texture classes based on the sand, silt and clay contents of the soils is shown in Figure 2. The study area soils generally belong to the medium-texture group, with clay loam, sandy clay loam, sandy loam and loam classes detected. For soils with a bulk density of 1.3-1.5 g cm⁻³, the field capacity, wilting point and available water content range from 11.8-42.5%, 5.8-30.2% and 5.4-15.5%, respectively. Yeşilsoy and Aydın (1993) stated that the dry bulk density of sandy soils can reach 1.5 g cm⁻³, while in clayey soils it can drop to 1.1 g cm⁻³. High bulk density values in clayey soils may therefore be a result of compaction (Alaboz et al. 2021). The moisture contents at field capacity and wilting point depend on texture, organic matter and structure. The amount of water held in pores increases with decreasing particle size, increasing organic matter content and improving structure (Karahan et al. 2014). Clay soils have a high cation exchange capacity and nutrient-binding potential because of their abundant negative charge sites. Sandy soils, on the other hand, have more macropores, which allow water and air to move faster through the soil profile (Karaman et al. 2007). According to Hazelton and Murphy (2016), soil reaction ranges from strongly acidic to slightly alkaline, and organic matter content ranges from 'low' to 'very high'.
According to Richards (1954), the EC values of the study area soils ranged from the 'non-saline' to the 'very highly saline' class. Soil reaction has a significant effect on the solubility and mobility of nutrient elements, on soil organisms, on plant development and on nutrient fixation. Micronutrients, except molybdenum, are usually more available at lower pH. Soil bacteria thrive at pH 6.0-8.0, while fungi thrive at pH 4.0-5.0. Excess salt in the soil solution hinders the uptake of water and nutrients by plants (Karaman et al. 2007). The organic matter content of soils is also one of the important indicators of quality. Organic matter improves the unfavorable physical conditions of clay soils and increases the water holding capacity of sandy soils. It is also important in erosion control, and it supplies nutrient elements to the soil as it decomposes. The total nitrogen content of the soils ranges from 0.011% to 0.60%, and the available phosphorus content ranges from 0.53 to 118.07 mg kg⁻¹. According to Grewelling and Peech (1960), 62% and 71% of the soils have sufficient total N and available P contents, respectively. The exchangeable Ca, Mg, K and Na contents of the soils were 441-9675, 38.38-965, 29-1812 and 43-457 mg 100 g⁻¹, respectively. Based on the micronutrient criteria of Lindsay and Norvell (1978), all soils had sufficient Cu content (> 0.2 mg kg⁻¹), but 92% had excessive Fe content. The extractable Zn content of 71% of the soils and the Mn content of 35% were classified as 'sufficient'. The availability of nutrient elements depends on soil pH. Micronutrients are more available in slightly acidic soils and macronutrients in slightly alkaline soils (Kacar and Katkat 2009). A high-quality soil is expected to contain a balanced amount of nutrient elements for optimum plant growth. According to the Soil Pollution Control Regulation (2005), the Pb and Cd contents of the soils do not exceed the limit values (300 and 3 mg kg⁻¹, respectively, for pH > 6). For Cu, Zn and Ni, 17%, 22% and 18% of the soils, respectively, exceeded the limit values (140, 300 and 75 mg kg⁻¹ for pH > 6). Heavy metals are divided into two groups according to their role in metabolic functions. The first group comprises microelements such as Fe, Mn, Zn and Cu, which are essential metals involved in many physiological processes and are required for plant growth at certain concentrations in the soil. The second group includes elements such as As, Pb, Cd and Hg, which have no physiological role in cells and are highly toxic even at low concentrations. Heavy metal concentrations in soil above the regulatory limits pose a significant threat to the environment and human health (Yerli et al. 2020). The distributions of the evaluated parameters were also examined (Table S3). A positive skewness coefficient indicates a distribution with a long right tail, in which most values lie below the mean. A negative skewness coefficient indicates a long left tail, with most values above the mean. The physical soil quality indicators generally had negative skewness coefficients and were closer to a normal distribution than the other properties. In particular, the skewness and kurtosis coefficients of the nutrient elements and heavy metal concentrations are high, reflecting their dynamic nature.
The coefficient of variation (CV) is a descriptive statistic that expresses the standard deviation as a percentage of the mean. The CV was generally higher for the productivity parameters. Differences in the nutrient contents of such dynamic properties may be caused by factors such as parent material, agricultural activities and environmental conditions (Şenol et al. 2020).
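The statistics discussed above can be reproduced as in this sketch. It assumes the adjusted sample formulas used by common statistics packages: the adjusted Fisher-Pearson skewness and the sample excess kurtosis, which is 0 for a normal distribution. Whether these match MINITAB's defaults exactly is an assumption, and the nutrient values in the example are hypothetical.

```python
import numpy as np

def describe(x):
    """Mean, SD, CV (%), adjusted skewness and excess kurtosis of a sample."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mean = x.mean()
    sd = x.std(ddof=1)
    z = (x - mean) / sd
    skew = n / ((n - 1) * (n - 2)) * np.sum(z ** 3)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * np.sum(z ** 4)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"mean": mean, "sd": sd, "cv_percent": 100 * sd / mean,
            "skewness": skew, "kurtosis": kurt}

# Hypothetical right-skewed nutrient values: one high value pulls the mean up,
# so most observations lie below the mean and the skewness is positive
print(describe([2.1, 2.4, 2.6, 3.0, 3.2, 9.8]))
```

A few very high nutrient or heavy metal values are enough to produce large positive skewness and kurtosis, as reported for those properties in Table S3.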

Evaluation of soil quality
Weights for the four hierarchies of soil physical properties (C1), soil chemical properties (C2), productivity (C3) and pollution (C4) were determined using the analytic hierarchy process (Table 1). Hierarchy C1 (physical properties) received the highest weight (0.4333), while hierarchy C3 (productivity) received the lowest (0.1062) (Table 1). Soil is an extremely dynamic system influenced by a variety of environmental factors, so many criteria must be considered together when analyzing soil characteristics. Moreover, because the influencing criteria are not equally important, determining the weights and scores of sub-criteria is considered a more accurate way to estimate the desired property. For this reason, AHP has recently been widely used to evaluate soil, a complex system (Şenol et al. 2020).
The indicators with the highest contributions in the C1, C2, C3 and C4 hierarchies were soil texture (0.2960), OM (0.4820), TN (0.2110) and total Ni (0.2938), respectively. Texture is one of the genetic properties of soil. It is relatively stable and cannot easily be changed (Dengiz and Sarıoğlu 2013). Texture strongly affects water and nutrient retention, aeration and root development, which explains its high contribution to soil quality. Similar results have been obtained in other studies (Dengiz and Sarıoğlu 2013; Şenol et al. 2020). Under optimum conditions, the physical properties of the soil significantly affect soil productivity and plant growth. According to expert opinion, organic matter has the highest weight among the chemical properties. The organic matter cycle of the soil is controlled by the activity and size of the microbial biomass, so the soil's biological and biochemical parameters play an important role (Roldán et al. 2005). An increase in soil organic matter content is known to be very important for the soil's water holding capacity and nitrogen cycle (Palmer et al. 2017). Organic matter depletion can cause decreases in cation exchange capacity (Ramos et al. 2018), aggregate stability (Annabi et al. 2007) and yield (Kimetu et al. 2008), and therefore in soil quality (Alaboz et al. 2021). Nitrogen, one of the productivity parameters, is an essential macronutrient. Because plants meet their nitrogen requirements from the soil, it is important that the soil contains nitrogen in a form and amount that plants can use. Soils contaminated with heavy metals have poor biological and physico-chemical properties (Lwin et al. 2018). Heavy metal pollution can deplete soil organic carbon and reduce microbial diversity. This results in poor physical properties such as high bulk density, low porosity and low water holding capacity (Ramamoorthy 2015).
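In a two-level AHP hierarchy, an indicator's overall weight is usually the product of its hierarchy weight and its weight within that hierarchy. The source does not spell this out, so the following is a worked example under that assumption. The C1 indicator with the highest local weight would carry an overall weight of:

```latex
W_{\text{global}} = W_{C1} \times w_{\text{local}} = 0.4333 \times 0.2960 \approx 0.1283
```

Under this convention the overall weights of all indicators still sum to 1, because the local weights within each hierarchy sum to 1 and the hierarchy weights sum to 1.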

Assessment of ANN
An artificial neural network is a computational model that is inspired by the way the human brain works.It is made up of layers of interconnected 'neurons', which process and transmit information.The structure of an ANN can vary, but a common structure consists of an input layer, one or more hidden layers, and an output layer.The input layer receives input data, which can be processed by the hidden layers using weights and biases.The hidden layers use activation functions to determine the output of each neuron, which is then passed on to the next layer.The output layer produces the final output of the ANN, which can be used for prediction or classification.
In this study, soil quality index values were evaluated separately as linear (SQI-L) and non-linear (SQI-NL) using the ANN. The MSE and R values obtained are shown in Table 2. For both the linear and non-linear data, the MSE was low and R was very high, indicating that the ANN structure produces highly accurate predictions. The linear and non-linear values were evaluated for the training, validation and test sets. The predictions for SQI-NL were more accurate than those for SQI-L in terms of both MSE and R. SQI-NL and SQI-L have the same training R value (0.99), but the validation and test R values for SQI-NL are higher. In Figure 3a, the gradient is the slope of the performance (error) surface with respect to the network weights, recorded during training. Its value at epoch 14 was 1.4184e-05. An epoch is one full pass through all of the training data, so by epoch 14 the model had completed 14 full passes through the training data. The training state of an ANN is the current stage of training the network to perform a particular task. ANNs are trained on a data set of input data and corresponding desired outputs. Training adjusts the weights and biases of the network's connections so as to minimize the difference between the network's output and the desired output (Figure 3a).
In ANN training results, the 'best validation performance' is the performance of the ANN on the validation data set during training, usually measured with a metric such as MSE or accuracy. Here the best validation performance was 0.00011074, reached at epoch 8. A low best validation performance is generally a good sign, as it indicates that the ANN makes accurate predictions on the validation data. However, the specific meaning of this value depends on the task performed by the ANN and the metric used to measure performance (Figure 3b).
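Validation-based early stopping, which selects the "best validation" epoch, can be sketched as follows. The sketch assumes MATLAB's default validation-failure limit of six consecutive epochs without improvement (`max_fail`). Under that assumption, a best validation epoch of 8 with training ending at epoch 14 (Figure 3a, 3b) would be consistent with early stopping. The function name and the validation curve are hypothetical.

```python
def early_stopping(val_mse, max_fail=6):
    """Return (best_epoch, best_mse, stop_epoch) for a validation-error curve.

    Training stops once validation error has failed to improve for max_fail
    consecutive epochs; the network from best_epoch is the one kept."""
    best_epoch, best = 0, float("inf")
    for epoch, v in enumerate(val_mse):
        if v < best:
            best_epoch, best = epoch, v
        elif epoch - best_epoch >= max_fail:
            return best_epoch, best, epoch
    return best_epoch, best, len(val_mse) - 1

# Hypothetical validation MSE per epoch: improves, then starts to overfit
curve = [1.0, 0.5, 0.4, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.6]
best_epoch, best_mse, stop_epoch = early_stopping(curve)
```

Keeping the weights from the best validation epoch rather than the last one is what protects the network against overfitting the training data.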
A plot of ANN performance over the training epochs shows whether performance improves as training progresses, which is a good indication of learning. It can also reveal whether the ANN is overfitting or underfitting the training data, which is determined by comparing performance on the training data with performance on the validation or test data (Figure 4). The correlations between the target and output values for the training, validation, test and all data were 0.99, 0.98, 0.96 and 0.98, respectively, indicating that the ANN produced highly accurate values. One reason for the high accuracy of the model is that many of the parameters included in it are closely related to soil quality. Feature selection is also very important for large data sets. Parameters related to the properties to be estimated should be included in the models (Alaboz et al. 2021). Bhavya et al. (2023) likewise stated that most of the water quality parameters they estimated with ANN were highly correlated with the actual data. In addition, it has been reported that ANN performance can be improved by adding data to the data set later. It has also been shown that estimation will not replace the original analysis, but it can be used to monitor the general situation while reducing sampling frequency.
When the non-linear soil quality index (SQI-NL) values were predicted with the ANN, accuracy was higher than for the linear soil quality index (SQI-L), as Figure 5a and 5b show. The best validation performance was reached at epoch 191.
The regression plots of the ANN predicting the non-linear soil quality index (SQI-NL) show R values of 0.99 for the training, validation, test and all data (Figure 6). According to Li et al. (2022), there are significant correlations between soil density, porosity and thermal conductivity, and an ANN performs well in calculating soil thermal conductivity. Alaboz et al. (2021) stated that ANN estimation was more successful than linear regression in estimating crop yield from soil quality values. ANNs acquire knowledge and make decisions based on comparable past cases, and their numerical capability allows them to handle multiple tasks effectively. Le et al. (2020) obtained fast and reliable predictions with ANN. By enabling rapid data estimation, ANN can make substantial contributions to assessing the quality of limited soils, including helping to prevent soil degradation. The structure of an ANN is not governed by a fixed set of rules. Rather, it is developed through accumulated experience and a process of trial and error (Shen 2018).
An artificial neural network (ANN) is a type of machine learning model inspired by the structure and function of the human nervous system. ANNs are trained on large data sets with algorithms that adjust the strengths of the connections between neurons so as to minimize the difference between the network's output and the desired output. In this study, the ANN estimated the actual SQI-L and SQI-NL values with high accuracy (Table S4). Uncertain or ambiguous interactions between variables frequently occur in natural systems such as soil. Models that can generate more suitable patterns should therefore be adopted so that they can adapt to real-world conditions (Keshavarzi et al. 2022).
Uncertainty values ranged from 0.000018 to 0.001147 for the non-linear method and from 0.000013 to 0.012984 for the linear method. The high predictive accuracy of the models resulted in very low uncertainty values.

Comparison of soil quality
The t-test results for the SQI values evaluated with the two methods are presented in Table 3. The soil quality index values obtained with the linear method (SQI-L) using 28 indicators ranged from 0.444 to 0.751, while those obtained with the non-linear method ranged from 0.315 to 0.683. The t-test for the difference between the two data sets gave p = 0.001, so the difference was statistically significant, with the SQI values from the linear method showing lower deviation from the mean. Cluster analysis revealed 98.2% similarity between SQI-L and SQI-L-ANN (Figure 7). The SQIs obtained by the two methods were also compared with the ANN-based predictions using t-tests. Although these data sets differed statistically, cluster analysis showed 99.2% similarity between SQI-NL and SQI-NL-ANN. Sinha et al. (2014) obtained a soil quality index of 71.1 with the weighted average linear index and 77.7 with the weighted non-linear index, indicating that the linear and non-linear scoring methods gave comparable results. Because the non-linear method showed a high coefficient of variation, they suggested testing it under different cropping systems. In another study, the soil quality indices determined by the linear and non-linear methods were highly correlated (r > 0.61), as were those computed from the total data set and the minimum data set (r > 0.79). The non-linear functions were also shown to be more sensitive (Bilgili et al. 2017).
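The source does not state which form of t-test was used. Because both indices are computed at the same 319 sampling points, a paired comparison is one natural choice, sketched below under that assumption. The statistic is the mean of the point-wise differences divided by its standard error, with n − 1 degrees of freedom. The p-value then follows from the t distribution, for example via SciPy's `scipy.stats.ttest_rel`, which computes the same statistic. The SQI values in the example are hypothetical.

```python
import numpy as np

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two indices at the same points."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return float(t), n - 1

# Hypothetical linear and non-linear SQI values at five sampling points
sqi_l = [0.52, 0.61, 0.58, 0.70, 0.47]
sqi_nl = [0.41, 0.55, 0.49, 0.66, 0.35]
t, df = paired_t(sqi_l, sqi_nl)
```

A paired test removes the point-to-point variation shared by both indices. A systematic offset, such as non-linear scores lying consistently below linear ones, can therefore be significant even when the two series are very highly correlated.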

Evaluation and spatial distribution of SQI
Different interpolation methods, which take different approaches to modelling spatial distribution, were examined and their performances evaluated. The best-performing models were selected to produce spatial distribution maps. The RMSE values of the SQI distributions obtained with the different interpolation methods are given in Table 4. For SQI-L and SQI-L-ANN, the lowest error was obtained with the exponential semivariogram model of simple kriging. For the non-linear method, the actual and predicted values were evaluated with the different interpolation methods, and the highest prediction power was obtained with the exponential semivariogram model of ordinary kriging. Among the interpolation methods studied, the kriging models were the most successful for soil quality. In another study on soil quality, kriging was likewise the most successful model for mapping the spatial distribution of actual and decision-tree-predicted values (Şenol et al. 2020). Because the estimates closely reflect reality, the spatial distribution maps also showed low error rates. Prediction maps in the literature that are derived from models with high predictive accuracy likewise show distributions similar to those of the measured data (Alaboz et al. 2021; Alaboz and Dengiz 2023).
The distribution maps of the soil quality index for all approaches are presented in Figure 8. SQI_L and SQI_L-ANN showed similar patterns to each other, as did SQI_NL and SQI_NL-ANN, which is also evident from the cluster analysis dendrogram in Figure 7. In all soil quality models, arable lands with high soil quality were located on alluvial lands in the south of Gümüşhane province, whereas agricultural areas with low soil quality were located on mountainous lands in the northern part of the province.
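The similarity percentages read from a dendrogram such as Figure 7 can be illustrated with a small sketch. The paper does not state which distance or similarity formula its clustering software used. This sketch therefore assumes the correlation distance d = 1 - r between two SQI series, rescaled to a similarity level of 100·(1 - d/2) percent. The four series are made-up values, not the study data. In an agglomerative dendrogram, the pair with the highest similarity is the first to merge.

```python
import math
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similarity_level(x, y):
    """Assumed convention: correlation distance d = 1 - r mapped to 100 * (1 - d / 2) percent."""
    return 100.0 * (1.0 - (1.0 - pearson(x, y)) / 2.0)


# Hypothetical SQI series at six sampling points (not the study data).
series = {
    "SQI_L": [0.45, 0.52, 0.61, 0.66, 0.70, 0.75],
    "SQI_L-ANN": [0.46, 0.51, 0.60, 0.67, 0.69, 0.74],
    "SQI_NL": [0.40, 0.32, 0.55, 0.48, 0.68, 0.50],
    "SQI_NL-ANN": [0.41, 0.33, 0.54, 0.49, 0.66, 0.51],
}

similarity = {
    (a, b): similarity_level(series[a], series[b]) for a, b in combinations(series, 2)
}
first_merge = max(similarity, key=similarity.get)  # closest pair joins first in the dendrogram
for pair, s in sorted(similarity.items(), key=lambda kv: -kv[1]):
    print(pair, round(s, 1))
```

This sketch covers only the pairwise similarity step. To build the full dendrogram from real data, a routine such as `scipy.cluster.hierarchy.linkage` would normally be applied to the distance matrix.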
Spatial maps are very important for monitoring soil status. However, prediction models may not always be as accurate as actual measurements (Bhavya et al. 2023). In soil quality studies, predicting soil quality successfully, particularly from soil physical and chemical properties, will make it possible to anticipate degradation before it occurs.

Conclusion
This study found that the soil quality index values obtained with the linear and nonlinear scoring methods were similar to each other. The ANN estimated the actual SQI_L and SQI_NL values with high accuracy. In addition, the soil quality indices obtained with the linear and nonlinear scoring functions were statistically similar to the values estimated by the ANN. Both the cluster analysis and the spatial distribution maps from the geostatistical analysis showed a high degree of similarity between the SQI values. Kriging was the most successful interpolation model for the spatial distribution of both actual and predicted values.
The study showed that, in addition to the linear method, nonlinear methods can also be used successfully to determine soil quality. It also showed that, with large data sets, an ANN can determine soil quality with very good performance. A further advantage of ANN-based prediction is that model performance can be strengthened by adding data, so the model can be applied over wider areas. The ANN's ability to predict spatial variation over a large area quite well is also very important for sustainable agriculture. Predicted spatial distribution maps produced for the whole region from an expanded data set will provide very important results for sustainable agriculture and the economy.
The present study is limited to agricultural lands under semi-humid ecological conditions. The fact that it was carried out only on agricultural lands limits the study, and this restriction may be one reason why the ANN performed well. Future research should test the model in different study areas and land uses, with various activation functions and training algorithms, and over larger areas.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Figure 1. The flowchart illustrates the proposed prediction tool based on the ANN.

Figure 3. Training state and best validation performance results for linear soil quality index (SQI-L).

Figure 4. Regression of network performance for linear soil quality index (SQI-L) values.
Results are considered to have low uncertainty when the model has high predictive accuracy. Esfandiarpour-Boroujeni et al. (2020) predicted soil classes with different algorithms and found error and uncertainty to be higher for the less accurate model.
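The paper summarizes predictive accuracy of this kind as MSE and R (Table 2; Figures 4 and 6). Below is a minimal sketch of both metrics. It assumes that R is the Pearson correlation between the target (measured) SQI values and the network outputs, as in typical ANN regression plots. The numbers are hypothetical, not the study data.

```python
import math
import statistics


def mse(targets, outputs):
    """Mean squared error between measured and predicted values."""
    return sum((t - y) ** 2 for t, y in zip(targets, outputs)) / len(targets)


def pearson_r(targets, outputs):
    """Pearson correlation coefficient between measured and predicted values."""
    mt, my = statistics.mean(targets), statistics.mean(outputs)
    sty = sum((t - mt) * (y - my) for t, y in zip(targets, outputs))
    stt = sum((t - mt) ** 2 for t in targets)
    syy = sum((y - my) ** 2 for y in outputs)
    return sty / math.sqrt(stt * syy)


# Hypothetical measured SQI values and ANN outputs (not the study data).
measured = [0.45, 0.52, 0.61, 0.66, 0.70, 0.75]
ann_output = [0.46, 0.51, 0.60, 0.67, 0.69, 0.74]
print(round(mse(measured, ann_output), 6), round(pearson_r(measured, ann_output), 4))
```

The two metrics capture different things. A low MSE means predictions lie close to the measured values, while a high R means they follow the same pattern. A model with a systematic offset can therefore have a high R but a poor MSE.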

Figure 5. Training state and best validation performance results for non-linear soil quality index (SQI-NL).

Figure 6. Regression of network performance for non-linear soil quality index (SQI-NL) values.

Figure 8. Maps of the soil quality index for SQI_L, SQI_L-ANN, SQI_NL and SQI_NL-ANN.

Table 1. Contribution weights of the total data set indicators to soil quality, calculated by AHP.

Table 2. MSE and R values calculated for the SQI_L and SQI_NL values.

Table 3. Results of the t-test.

Table 4. Interpolation models and RMSE values of SQI_L, SQI_NL, SQI_L-ANN and SQI_NL-ANN.