Prediction of the soil properties of Malagasy rice soils based on the soil color and magnetic susceptibility

ABSTRACT Accurate assessments of soil properties are required to improve fertilizer management practices for crop production. Conventional chemical analysis in the laboratory is costly and time-consuming. Soil color is related to different soil compositions, while soil magnetic susceptibility (MS) has been found to reflect the abundance of magnetic minerals relevant to soil properties. Improving proximal sensing techniques for the analysis of soil color and MS provides opportunities for affordable and rapid assessments of soil properties. The aim of this study was to evaluate the potential use of soil color parameters and MS values to predict soil properties using stepwise multiple linear regression (SMLR), random forest (RF), and nonlinear regression approaches in lowland and upland fields in the central highlands of Madagascar. The target properties included the contents of soil organic carbon (SOC), total nitrogen (TN), oxalate-extractable phosphorus and iron (Feox), and the soil texture. The model prediction accuracy was assessed using the coefficient of determination (R2), root-mean-square error (RMSE), and the ratio of performance to interquartile distance (RPIQ). The use of soil color parameters yielded an acceptable prediction accuracy of the Feox content (loge Feox) for all rice fields (R2 = 0.54, RMSE = 0.55, RPIQ = 1.70) using the RF algorithm, while the SMLR approach gave the most accurate prediction for upland fields with acceptable reliabilities for SOC, Feox, and clay and sand content prediction, with R2 ranging from 0.43 to 0.67 and RPIQ from 1.63 to 1.77. In lowland fields, TN content was predicted with acceptable accuracy (R2 = 0.34, RMSE = 0.49, RPIQ = 1.71) using SMLR with the color parameter. The combination of the soil color parameters with the MS value as predictor variables increased SOC prediction for lowland fields using the RF approach (R2 = 0.57, RMSE = 6.37, RPIQ = 1.96). 
Use of the soil color and MS parameters was revealed to be a promising way to simplify the assessment of soil properties in upland and lowland ecosystems by using RF and SMLR approaches. A combined use of the soil color and MS parameters improved the prediction accuracy for the SOC content.


Introduction
Accurate soil property assessments are essential for decision-making regarding agricultural practices and environmental management. A lack of knowledge of soil properties leads to inappropriate crop system management, including the inefficient use of fertilizers, which results in high costs, negative effects on the environment (especially on soil quality), and low yields. In addition, conventional laboratory analyses of soil properties are costly and time-consuming (Al-Hamed et al. 2014; Camargo et al. 2016). Rapid and affordable methods are needed to simplify soil property assessments and improve crop management practices. Different proximal soil sensing methods have been developed for soil surveys, including gamma- and X-rays; magnetic, gravity, and seismic sensors; ultraviolet, visible, and infrared spectra; color sensors; and ion-selective potentiometry (Viscarra Rossel and Adamchuck 2013).
Color sensor technology provides a promising simplified approach for the evaluation of soil properties. Soil color is considered a good indicator of soil characteristics because it provides information about the soil composition and other properties (Viscarra Rossel et al. 2006; Ibanez-Asensio et al. 2013). The potential of soil color for a rapid and low-cost assessment of soil properties has been reported in a recent study (Mikhailova et al. 2017). The Munsell color chart is commonly used to determine the soil color because of its ease of use and availability. However, the measured color strongly depends on the light conditions and the observer (Moritsuka et al. 2019). In addition, it is difficult to use the qualitative variables obtained from this method for statistical analyses (Barret 2002; Konen, Burras, and Sandor 2003). With chroma meters, the soil color can be accurately determined in a quantitative color space such as the Commission Internationale de l'Eclairage (CIE) Lab system, developed in 1976 (Viscarra Rossel et al. 2006). The CIE Lab system is based on three coordinates: L* representing the lightness, a* representing the redness, and b* representing the yellowness. This color system is more appropriate for color data processing in multivariate analyses (Yang et al. 2001; Spielvogel, Knicker, and Kögel-Knabner 2004). Previous studies using this color system mostly focused on predictions of soil organic matter contents or on soils in temperate regions (Aitkenhead et al. 2013; Mikhailova et al. 2017). Temperate soils are generally characterized by high organic matter content. The presence of this organic matter and of particular clay minerals in sufficient quantity contributes strongly to the overall soil color, suggesting the potential use of soil color characteristics as proxy indicators of these elemental concentrations (Aitkenhead et al. 2013).
Tropical soils are known as acid soils with low organic carbon and phosphorus contents and high contents of iron and aluminum (hydro)oxides (Uexküll and Mutert 1995; Nishigaki et al. 2019; Rakotonindrina et al. 2020). The value of soil color characteristics as indicators or predictors of these physical and chemical soil characteristics has been insufficiently studied in the tropical context. Among these soil properties, texture is reported to be an important parameter determining crop productivity and responses to fertilizer inputs (Asai, Saito, and Kawamura 2021; Zingore et al. 2007). Nitrogen and phosphorus are the most important elements for plant growth (Leghari et al. 2016; Malhotra et al. 2018). Iron is one of the main adsorption agents for phosphate and plays a crucial role in the bioavailability of phosphorus (Jiang et al. 2015). The soil organic carbon (SOC) content reflects the major nutrients in the soil, while soil texture governs infiltration and water retention (Moral and Rebollo 2017; John et al. 2020). These properties are of particular interest in the rice fields of the central highlands of Madagascar. In this area, the rice yield is low and most farmers use small amounts of fertilizer, particularly in lowland fields.
Magnetic susceptibility (MS) measurement is another potential method to determine soil attributes (Siqueira et al. 2010). The MS is mainly affected by the presence of ferromagnetic and ferrimagnetic minerals in the soil (Mullins 1977; Preetz et al. 2017). In previous studies, the MS was used for different purposes such as the measurement of soil contamination by heavy metals (Cervi, Costa, and Souza 2014), the assessment of soil degradation (Sadiki et al. 2009a), and the evaluation of soil properties (Jiménez et al. 2017; Moritsuka et al. 2021). In general, the MS values obtained from a magnetic sensor were used as predictor variables in prediction models, in which the relationship between the MS value and the target soil properties was analyzed. However, the correlations between the MS and the properties of soils in agricultural areas, particularly in tropical weathered soils, have rarely been studied.
The overall objective of this study was to investigate the performances of soil color sensors and MS in predicting the soil properties in Malagasy highland rice fields using modeling approaches. Lowland fields are characterized by flooding during the growing season, whereas upland soils are rainwater-dependent. The soil properties of these fields vary geospatially from one watershed to another, and also from one plot to another depending on farmers' management practices. The aims of this study were (1) to evaluate the potential performances of soil color parameters and MS values, which were used separately and in combination, in predicting the soil properties in lowland and upland rice fields, and (2) to determine the most accurate modeling approach for each soil property. This study is a preliminary demonstration of the potential of the use of color parameters and MS values to predict soil properties in Malagasy soil.
Soil samples were collected at a depth of 0 to 15 cm in 240 farming plots in lowland (n = 199) and upland (n = 41) rice systems from 2018 to 2019 (Table 1). In each plot, soil samples were obtained from five diagonal sample points after harvest.

Laboratory analysis
The soil samples (n = 240) were air-dried, ground, and sieved to obtain 2 mm and 0.2 mm fractions prior to the soil analysis. Subsequently, the SOC (g kg−1), oxalate-extractable P (Pox, mg kg−1), oxalate-extractable iron (Feox, mg kg−1), and total nitrogen (TN, g kg−1) contents were analyzed using the 0.2 mm sample fraction, while the texture (clay, sand, and silt; %) was determined using the 2 mm sample fraction. The SOC content was determined based on extraction by dichromate oxidation (Walkley and Black 1934). The Pox and Feox contents were determined using oxalate extraction according to Schwertmann (1964). The Pox concentration was determined using the malachite green colorimetric method (Van Veldhoven and Mannaerts 1987), whereas the Feox concentration was analyzed with atomic absorption spectrometry. The TN concentration was determined with a continuous analyzer after acid mineralization based on a method adapted from Rabeharisoa et al. (2012).
The volumetric pipette procedure was used to separate the soil fractions for the particle size analysis. The soil samples were pretreated with heat and H2O2 (35%) to remove organic matter and then dispersed using NaOH.

Soil color and magnetic susceptibility measurements
Soil color parameters were measured with a CR-20 color reader (Konica Minolta, Osaka, Japan) in the CIE L*a*b* color system. The CIE Lab system can be used to express all natural colors (Gunal et al. 2008) and can be applied to numerical and statistical analyses (Chen et al. 2018). The colors specified in the CIE Lab colorimetric space are based on the following three coordinates: L* represents the lightness, ranging from 0 (dark/black) to 100 (light/white); a* represents the redness, ranging from green (−a*) to red (+a*); and b* represents the yellowness, ranging from blue (−b*) to yellow (+b*).
The 2 mm soil sample fraction was spread on petri dishes with a diameter of 85 mm and pressed to form a ~ 1 cm thick layer. The chroma meter was calibrated using a standard white plate (white reference) before the measurement of each sample series. Five measurements were performed on each sample to determine the average value of each parameter (L*, a*, and b*) for a given sample.
The MS of the soil was measured using the KT-10 Terraplus tool (Terraplus Inc., Richmond Hill, Ontario, Canada). The MS values are expressed as 10−3 SI. Prior to the measurement, the accuracy of the sensor tool was tested by measuring an MS standard with a known value (26.3 × 10−3 SI). Subsequently, sieved soil samples (2 mm, 100 g) were placed in a plastic box with a diameter of 8 cm and the soil surfaces were scanned five times. The samples were mixed between measurements (Moritsuka et al. 2021). The MS values obtained for each sample were averaged.

Statistical analysis and the development of the prediction models
All data analyses were performed using R software version 3.6.3 (R Core Team 2020). The soil color parameters (L*, a*, and b*) and MS values were used as explanatory variables for the prediction of the SOC, TN, Pox, Feox, clay, silt, and sand contents.
Descriptive statistics, including the minimum, maximum, mean, standard deviation (SD), and coefficient of variation (CV), were calculated for all data to determine the variability of the dataset. Pearson correlation tests were performed between the response and explanatory variables. Stepwise multiple linear regression (SMLR), random forest (RF), and nonlinear regression (NLR) were used to predict the physicochemical soil properties. The prediction models were built based on the L*, a*, and b* parameters. Subsequently, the MS value was used to improve the model performance. The combination of the color parameters and the MS value involves using them together as explanatory variables in a single model. The objective of this combination is to increase the variance described by the model by adding another explanatory variable to the model based on the color parameters.
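The descriptive statistics and the Pearson correlation mentioned above can be illustrated with a minimal sketch. The analyses in this study were run in R; the Python code below is only an illustration, the function names `describe` and `pearson_r` are hypothetical, and the numbers are invented example values, not data from this study.

```python
import math
import statistics


def describe(values):
    """Return min, max, mean, sample SD, and CV (%) for a list of numbers."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,
    }


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical illustrative values, not measurements from this study.
lightness = [40.0, 45.0, 50.0, 55.0, 60.0]  # L*
soc = [30.0, 27.0, 25.0, 21.0, 18.0]        # SOC, g kg-1

summary = describe(soc)
r = pearson_r(lightness, soc)
print(summary, round(r, 3))
```

In this invented example the SOC content decreases as L* increases, so the correlation coefficient is negative.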
The SMLR is a multivariate regression analysis in which the least significant explanatory variables are removed by a stepwise variable selection procedure based on F- or t-tests (Mohamed et al. 2018). The RF is a machine learning approach for classification or regression based on the construction of a multitude of decision trees using a random selection of variables (Breiman 2001). Each regression tree was grown on a bootstrap sample of the data, and a random subset of predictors was used to fit each tree (Raeesi et al. 2019). The number of trees was set to 500 in this study, and the minimum size of the terminal nodes (5) was the default node size of the 'randomForest' package.
The NLR is a second- or third-degree polynomial equation, with the degree chosen according to the accuracy of the model. Because the Feox data were not normally distributed, they were log-transformed with the natural logarithm (loge) before modeling. The K-fold cross-validation procedure was applied to all models using k = 10 folds. In this procedure, the dataset was divided into 10 subsets; the model was trained using nine subsets and tested with the remaining one, and this was repeated ten times so that each subset served once as the test set. The final model was obtained from the average of all results. Based on this approach, all available soil samples are used to create the model (Shiri et al. 2020).
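The 10-fold cross-validation and the natural-log transformation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: a one-predictor least-squares line stands in for the SMLR, RF, and NLR models, the helper names `fit_line` and `k_fold_rmse` are hypothetical, and the data are synthetic.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_fold_rmse(x, y, k=10, seed=0):
    """Return the RMSE of each of the k held-out folds."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal subsets
    fold_rmse = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        # Train on the other k - 1 folds, then predict the held-out fold.
        intercept, slope = fit_line([x[i] for i in train_idx],
                                    [y[i] for i in train_idx])
        squared_errors = [(y[i] - (intercept + slope * x[i])) ** 2
                          for i in test_idx]
        fold_rmse.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
    return fold_rmse


# Synthetic data (assumption): 50 samples of L* and a skewed Feox-like response.
rng = random.Random(1)
l_star = [rng.uniform(35.0, 65.0) for _ in range(50)]
fe_ox = [math.exp(9.0 - 0.05 * v + rng.gauss(0.0, 0.2)) for v in l_star]
log_fe_ox = [math.log(v) for v in fe_ox]  # natural-log transform

rmse_per_fold = k_fold_rmse(l_star, log_fe_ox, k=10)
mean_rmse = sum(rmse_per_fold) / len(rmse_per_fold)
print(len(rmse_per_fold), round(mean_rmse, 3))
```

Each sample is used for testing exactly once and for training in the other nine runs, and the fold-wise errors are averaged into a single performance value.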
The accuracy of the model was assessed using the coefficient of determination (R2), root-mean-square error (RMSE), and ratio of the performance to the interquartile distance (RPIQ). The RPIQ is the ratio of the interquartile distance (IQ = Q3 − Q1) of the measured data to the RMSE, where Q1 and Q3 are the 1st and 3rd quartiles of the measured data. The RPIQ value is more useful in assessing the quality of the prediction model for soil properties because soil sample sets generally have a skewed distribution (Bellon-Maurel et al. 2010).
A good model has a higher R2 and a lower RMSE. The accuracy of the prediction was categorized based on the R2 and RPIQ values; for the RPIQ, values of 1.6-2.0 were considered acceptable and values ≥ 2.0 excellent. The RPIQ was used to compare the robustness of models constructed from different sample sizes between lowland and upland fields. The RPIQ takes into account both the prediction error and the observed variability, provides a more objective measure of model effectiveness than the RMSE, and can be more easily compared across model validation studies. The higher the RPIQ, the better the predictive power of the model.
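The three accuracy metrics and the acceptable and excellent RPIQ ranges given above can be computed as in the following minimal sketch. Several choices here are assumptions rather than details reported in this study: R2 is computed as 1 − SSres/SStot, the quartiles use the "inclusive" interpolation of Python's `statistics.quantiles`, the label for RPIQ values below 1.6 is a placeholder, and the observed and predicted values are invented.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = statistics.mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpiq(observed, predicted):
    """Interquartile distance (Q3 - Q1) of the observed data over the RMSE."""
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return (q3 - q1) / rmse(observed, predicted)


def rpiq_class(value):
    """Map an RPIQ value to the accuracy classes named in the text."""
    if value >= 2.0:
        return "excellent"
    if value >= 1.6:
        return "acceptable"
    return "below acceptable"  # placeholder label, not from the source


# Invented example values, not data from this study.
observed = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
predicted = [11.0, 11.0, 15.0, 15.0, 19.0, 19.0, 23.0, 23.0]

model_rmse = rmse(observed, predicted)
model_r2 = r_squared(observed, predicted)
model_rpiq = rpiq(observed, predicted)
print(model_rmse, round(model_r2, 3), model_rpiq, rpiq_class(model_rpiq))
```

Because the RPIQ divides the spread of the observed data by the RMSE, it is unitless and can be compared between soil properties measured on different scales.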

General description of data
The physicochemical soil properties, L*, a*, and b* color parameters, and MS values of the studied soil samples are listed in Tables 2 and 3. In general, the coefficient of variation (CV) values of the SOC, TN, clay, silt, and sand contents (20%-39%) were similar, but large variabilities in the Pox and Feox contents (46%-125%) were observed.
The mean values of the soil variables for the lowland fields were slightly higher than those of the samples from the upland fields. The mean value of SOC content was 24.8 g kg−1 in lowland fields and 20.4 g kg−1 in upland fields (Table 2). The TN ranged from 0.5 to 3.6 g kg−1 in lowland fields and from 1.2 to 2.3 g kg−1 in upland fields. The average Pox content in lowland and upland fields was 58.8 mg kg−1 and 42.8 mg kg−1, respectively. The mean Feox value exhibited the largest spatial variation and was six times higher in the lowland than in the upland. The soil texture distribution was similar in both lowland and upland fields. The lowland samples contained 32.2% clay and 40.3% sand on average, whereas the upland samples had clay and sand contents of 39.6% and 41.1%, respectively.
The average MS value of samples from the upland field was almost three times higher than that of the samples from the lowland field. The MS had a greater coefficient of variation (88.3%) than soil color parameters (9.5-26.6%). The CV values for both MS and color parameters were greater for the lowland soils compared with those for the upland soils.
The correlations among the soil properties, color parameters, and MS are shown in Table 4. The SOC and TN contents were negatively correlated with L* (P < 0.001) for both upland and lowland fields. Negative correlations were also observed between the SOC and TN contents and a* and b* (P < 0.001) and MS (P < 0.05) for all combined data and in the lowland fields.
The Pox content was positively correlated with a* (P < 0.05), b* (P < 0.05), and MS (P < 0.05) in the lowland fields and negatively correlated with b* (P < 0.005) in the upland fields. The Feox content showed negative correlations with L* (P < 0.001), a* (P < 0.05), and b* (P < 0.05) for the lowland field soils and a negative correlation with MS (P < 0.001) for the combined data. The clay content was negatively correlated with L* (P < 0.001) and b* (P < 0.05) in the lowland fields and positively correlated with a* (P < 0.001) in the upland fields and with MS (P < 0.001) for the combined data. Regarding the correlations between the soil color parameters and MS values, a strong positive correlation was observed between a* and MS with the combined data and at each land use scale (r < 0.52, P < 0.001).

Models for the prediction of the soil properties
The most accurate predictions of the target soil properties and the performance of the three approaches are summarized in Table 5. The results showed the potential of soil color parameters to predict, with acceptable accuracy, the SOC contents of upland soils, TN contents of lowland soils, Feox contents of the combined lowland and upland soils, and the clay and sand contents of upland soils. Combining soil color and MS parameters increased the prediction accuracy for the SOC content of lowland fields to an acceptable level (Figure 2).
The prediction of soil properties with the combined data (combined lowland and upland data, n = 240) was effective for the prediction of Feox and clay contents, with an acceptable prediction accuracy (RPIQ ≥ 1.6). Building the prediction model at the lowland scale (n = 199) improved the prediction accuracy for SOC and TN with R2 values of 0.57 and 0.34, respectively, and corresponding RPIQ values of 1.96 and 1.71, respectively (Figure 2). In contrast, modeling soil properties using color parameters at the upland scale (n = 41) increased the prediction accuracy for SOC, Feox, clay, and sand contents. Here, the prediction models resolved 67% and 47% of the variation of SOC and Feox contents, respectively, with RPIQ values ranging from 1.63 to 1.74. However, despite this good prediction of Feox content, the relationship between the observed and predicted values of Feox content demonstrated an underestimation of the predicted values for observed values of loge Feox > 7 and an overestimation of the predicted values for observed values of loge Feox < 7 (Figure 3). In addition, the range of the predicted loge Feox values was small compared to the range of the observed loge Feox values. This suggests the need for an improvement in the prediction of the Feox content. The prediction model for clay and sand contents of soils in upland fields explained 43% and 51% of their total variances, respectively, with acceptable reliability of the prediction (RPIQ = 1.67 and 1.77, respectively) (Figure 4). Among the approaches used, the RF approach yielded acceptable prediction accuracy for SOC in lowland fields (RPIQ = 1.96) and for Feox in all data (RPIQ = 1.70). Here, the RF model resolved 57% and 54% of the variations in the SOC and Feox contents, respectively. The SMLR approach improved the model performance to an acceptable level for SOC, Feox, clay, and sand contents in upland fields, and for TN in lowland fields, with RPIQ values ranging from 1.63 to 1.77.

Figure 2. Relationships between observed and predicted values of SOC (g kg−1) and TN (g kg−1) in lowland and upland fields with the best prediction models, random forest (RF) or stepwise multiple linear regression (SMLR), using soil color parameters or the combination of soil color parameters and MS value specified in brackets.

Predictor variables generating the best prediction models
The results showed that the soil color parameters can be used to predict SOC contents, while combining soil color parameters and MS values improved the prediction accuracy of the model (Figure 2). This ability of the predictor variables to estimate SOC contents is related to the significant correlations between SOC, soil color parameters, and MS data in our dataset. The results of previous studies demonstrated that L* is the most important color parameter with respect to organic matter variation, particularly organic carbon (Mikhailova et al. 2017; Vodyanitskii and Savichev 2017; Chen et al. 2018). The organic matter induced the dark color of the soil according to the level of its concentration (Spielvogel, Knicker, and Kögel-Knabner 2004). However, Liles et al. (2013) confirmed that the soil carbon can be more accurately predicted by including a* and b* in the model. The potential of the MS for the improvement of the SOC content prediction is related to the significant negative correlation between the SOC content and MS. This agrees with previous results which reported that the soil total carbon content and SOM negatively correlate with the MS (Sokołowska et al. 2016; Grison et al. 2017). The SOM is a natural factor that induces a decrease in MS (Thompson and Oldfield 1986). Our results were in line with those of the previous studies showing that soil color parameters (L*, a*, b*) and MS values can be used to estimate SOC content both in lowland and upland rice fields in the highlands of Madagascar.

Figure 3. Relationships between observed and predicted values of Pox (mg kg−1) and loge Feox in lowland and upland fields with the best prediction models, random forest (RF) or stepwise multiple linear regression (SMLR), using soil color parameters or the combination of soil color parameters and MS value specified in brackets.
Based on the RPIQ values obtained with the soil color parameters, the prediction of the other targeted soil properties (excluding SOC) yielded acceptable reliabilities for the prediction model (Table 5) depending on the data (all, lowland, and upland data). These results suggested that soil color parameters could be suitable to estimate soil properties such as TN, Feox, clay, and sand contents in the Malagasy highland rice fields. Our results for TN are in line with those of previous studies that showed a strong correlation between TN and the soil color parameters L*, a*, and b*, where the nitrogen content increases with decreasing soil lightness (Ibanez-Asensio et al. 2013; Mikhailova et al. 2017).
With respect to Feox, acceptable reliabilities of prediction were observed for all combined data and upland fields but not for lowland fields. This suggests that the soil color parameters can explain the variation of Feox status for all combined and upland data. Gunal et al. (2008) reported that the parameter +a* is a good indicator of red iron oxide, whereas the parameter +b* is a good indicator of iron oxide of any color. Another study reported that a strong correlation between Feox content and organic matter might hamper the prediction of Feox by soil color parameters (Moritsuka et al. 2014). The low reliability of the Feox prediction using soil color for the soils in lowland fields might therefore be attributed to the influence of organic matter. A significant correlation was observed between the Feox and SOC contents (r = 0.40, p < 0.001) and between the Feox and TN contents (r = 0.41, p < 0.001) (Table A1). The value of the a* parameter demonstrated that upland soil was more reddish than lowland soil (Table 3), with significant differences between the values of a* (p < 0.001, data not shown) for lowland and upland fields. This suggests the predominance in upland compared to lowland fields of hematite (Fe2O3), which is the major source of the red color of the soil (Fritsch et al. 2005).
For clay and sand contents, the most accurate predictions were observed with the use of the L* and a* color parameters in the upland fields. With respect to clay content, the significant correlation between clay and the a* color parameter, which yielded acceptable reliability of the prediction model for the upland dataset, was in agreement with a previous study which reported that clay content significantly correlates with a* and b* (p < 0.01) (Gunal et al. 2008). Considering the correlations between the soil color parameters and sand contents, our results are slightly different from those of previous studies in which the sand content exhibited a weak correlation with the soil color parameters or a good correlation with b* (r = 0.46**) (Gunal et al. 2008; Akbas 2014). The significant positive correlation between the sand content and L* is likely attributable to the predominance of quartz. The predominance of quartz in the sand fraction is generally reported in tropical weathered soils due to its high resistance to weathering (Silva et al. 2018; Ajiboye et al. 2019). The sand fraction in the soil samples of this study was also mostly white, which is presumably associated with the dominance of quartz.
The prediction accuracy of Pox was poor regardless of the model approach and the dataset (Figure 3). The correlations between the Pox contents and soil color parameters or MS are insignificant for all datasets, suggesting that the Pox content is not directly related to the soil color and is difficult to predict with these parameters. Aitkenhead et al. (2013) reported a similar result, showing that the prediction of the total P content in a soil dataset from the National Soils Inventory for Scotland (NSIS) based on the color parameters of the Red, Green, Blue (RGB) and CIE Lab systems was unsuccessful.

Model performance
The most accurate of all predictions was that for the SOC content in lowland fields, where the RF model based on the combination of the soil color data and MS values showed the highest reliability of prediction. This RF model performance agrees with the study of Raeesi et al. (2019), who reported excellent reliability of the RF model for the prediction of soil organic matter (SOM) based on soil color data compared with linear regression. In contrast to the SMLR and NLR approaches, the RF approach can handle both linear and nonlinear relationships in the data (Breiman 2001; Poppiel et al. 2020). Therefore, the potential correlations between the SOC content and the soil color variables and MS (Table 4) were exploited by the RF approach to provide a more accurate prediction. In addition, the more heterogeneous distribution of the SOC content in the lowland fields suggests that these SOC data are more suitable for modeling.
The SMLR approach yielded the most accurate models for the upland field data (Table 5). Regarding the correlations between soil color parameters, MS values, and the target soil properties, the SMLR approach selected the predictor variables that are most strongly correlated with the target soil properties, such as for clay and sand contents (Table 4). This is in line with a previous study which reported that the SMLR model selected the most significant explanatory variables for the prediction (Mahmood, Hoogmoed, and Henten 2012).
The results from these approaches suggest that the RF model considers the potential correlations between all explanatory variables and the target soil properties to yield the most accurate prediction, whereas the SMLR model exploits variables with strong correlations with the target soil properties. The RF algorithm uses data partitioning to find relations between predictor and response variables (Breiman 2001). It tests different combinations of predictor variables as the decision trees are grown. In this way, the initial training set is divided into smaller training sets in which different combinations of variables are tested on random bootstrap samples, and the best combinations in each tree are averaged to yield the final model (Ly, Nguyen, and Pham 2021). This allows the RF algorithm to exploit potential relations between predictor and response variables. In contrast to the RF model, the SMLR approach builds the model based on the linear relationship between predictor and response variables. In this study, the SMLR model was more accurate when the soil color variables strongly correlated with the predicted soil properties. Among the models generated with the different datasets (all data and per land use), the SMLR model yielded the most accurate prediction for the dataset from upland fields regardless of the target soil property, although the amount of data was low (n = 41) compared to the lowland dataset (n = 199) or all combined data (n = 240). This suggests that the SMLR approach fitted better with a smaller amount of data than the RF approach. This is in line with a previous study reporting that the SMLR approach generated a better prediction model with a small number of data than with a larger number of data for the estimation of different soil properties (Silva et al. 2017).
Increasing the number of data can lead to multicollinearity of the covariate data, and the SMLR approach is sensitive to multicollinearity of the variables, which decreases the prediction accuracy (Mahmood, Hoogmoed, and Henten 2012). In contrast, previous studies noted that the RF approach fitted better with a larger than with a smaller amount of data, as in the case of the estimation of soil bulk density and other soil properties (Silva et al. 2017).
This study also showed that the NLR approach yielded the worst prediction model of the target soil properties. This approach is used to highlight the non-linear relationships between predictor and response variables. However, the RF algorithm yields more accurate prediction results compared to this approach with its ability to work on both linear and nonlinear relationships in the data.

Improvement of the prediction of soil properties at land use scale
The results of this study demonstrated an increase in model prediction accuracy with a separate analysis of the datasets of the lowland and upland fields. The largest difference between these two land-use types was the water condition, which affects both the oxygen status in the soil and the status and availability of nutrients. The flooded conditions that prevailed in the lowland fields influenced the availability of soil nutrients and their accessibility to plants by increasing the soil pH and by improving nutrient delivery to rice plant roots through mass flow and diffusion (Sahrawat 2011). In contrast, the upland rice fields in the Malagasy highlands were characterized by water shortage, low soil fertility, and acidity with the prevalence of Al toxicity (Rakotoson 2014). In terms of nutrient availability, a previous study reported that rainfed soil is more P deficient than irrigated rice soil (Rabeharisoa et al. 2012). Flooding conditions increase phosphorus solubility through the release of immobilized P that results from the dissolution of Fe oxyhydroxides (Rabeharisoa et al. 2012). With respect to the values of the target soil properties, lowland and upland fields showed different soil characteristics. The distribution of the values of some soil properties differs between the land use types (Table 2). The analysis of variance (data not shown) indicated that the land use type influences the soil nutrient status (p < 0.001). Similarly, the mean values of the color and MS parameters were significantly different between the lowland and upland data (p < 0.001). These significant differences between the lowland and upland characteristics can explain the improvement in the accuracy of the models upon the separation of the data.
This exploratory study aimed to develop simple prediction methods adapted to the properties of Malagasy rice soils. The results showed that SOC can be predicted from the CIE Lab color parameters and the MS value with acceptable model accuracy for both lowland and upland soils. This suggests that the approach can be used for soil management related to organic matter. Our results also showed that other target soil properties can be predicted from the CIE Lab parameters with acceptable reliability, depending on the dataset (Table 5). However, given the total variance explained (R2 ≤ 0.50), the models need to be improved before they can serve as a simple method for soil management. Soil color parameters alone are insufficient for the prediction of soil variables. A previous study reported that combining soil color with site-specific characteristics, such as parent material, land management, and geographical parameters, improved the prediction accuracy of the model, and that the model improved as more explanatory variables were added (Aitkenhead et al. 2018). For more practical use of this approach in soil management, the prediction models need to be improved through the inclusion of other parameters, such as site characteristics and land management. A further limitation of this study is the small number of samples, particularly from the upland fields. Increasing the number of soil samples from the upland fields and extending the sampling to other areas might increase the variability of the soil data and improve the reliability of the prediction models. In addition, increasing the number of soil samples from different locations is relevant for developing a model for each site and extending it to other areas.
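For reference, the three accuracy measures used in this study can be computed as in the following minimal sketch. It assumes the usual definitions: R2 as one minus the ratio of the residual to the total sum of squares, and RPIQ as the interquartile range of the observed values divided by the RMSE. The quartile convention (here the default of Python's `statistics.quantiles`) and the example values are assumptions for illustration, not the settings or data of this study.

```python
from math import sqrt
from statistics import mean, quantiles

def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    return sqrt(mean([(o - p) ** 2 for o, p in zip(obs, pred)]))

def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

def rpiq(obs, pred):
    """Ratio of performance to interquartile distance: (Q3 - Q1) / RMSE.

    Assumption: quartiles follow the default ('exclusive') method of
    statistics.quantiles; other conventions give slightly different values.
    """
    q1, _, q3 = quantiles(obs, n=4)
    return (q3 - q1) / rmse(obs, pred)

# Hypothetical observed and predicted values (NOT from the study).
observed = [10, 20, 30, 40, 50, 60, 70, 80]
predicted = [o + 5 for o in observed]  # constant error of 5 units

print(rmse(observed, predicted), r2(observed, predicted), rpiq(observed, predicted))
```

Because RPIQ scales the error by the spread of the observed data, it allows models for properties with different units and ranges to be compared, which R2 and RMSE alone do not.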
However, the results of this study remain relevant because, for tropical soils, only a few studies have examined these approaches, especially for the prediction of soil properties other than SOC. This study opens perspectives on the use of these simple tools for determining soil properties and on the application of this approach at a larger scale. In addition, the results of this study can be applied to soil samples from the same region that are characterized by the same climate conditions and the same land-use type.
The potential of remote sensing data for determining soil properties was indicated in previous studies (Tsui, Tsai, and Chen 2013; Zhang et al. 2019). Such data are more useful for predicting soil properties at a large scale, which seems to be associated with the larger spatial variability. At the local scale, such as a farmer's plot, soil color parameters, which may vary from one plot to another, are more appropriate for predicting soil properties and thus for decision-making regarding fertilizer management. With new sensors, such as the CR-20, soil color variables can be easily extracted and then combined with remote sensing data as additional variables to improve the prediction of soil properties at a large scale.

Conclusion
In this study, soil color parameters and MS values were used to estimate soil properties, namely the SOC, TN, Pox, and Feox contents and the soil texture, in lowland and upland fields. Our results show that soil color parameters yielded an acceptable prediction accuracy (RPIQ ≥ 1.60) for the Feox content in the combined data, for SOC and TN in the lowland fields, and for the SOC, Feox, clay, and sand contents in the upland fields. Combining the soil color parameters with the MS value as predictor variables in the same model improved the accuracy of the SOC content prediction in the lowland fields. The RF approach yielded acceptable prediction accuracy for SOC in lowland fields (R2 = 0.57, RMSE = 6.37, RPIQ = 1.96) and for Feox (loge Feox) in the combined data (R2 = 0.54, RMSE = 0.55, RPIQ = 1.70). The SMLR approach yielded the most accurate predictions for the upland field data, with acceptable reliability for the SOC, Feox, clay, and sand contents. The prediction of the Pox concentration remains challenging because Pox has no direct relationship with the soil color parameters. These results demonstrate the potential of soil color parameters and MS values for predicting soil properties in Malagasy highland rice fields with characteristics similar to those of the studied area. However, this study was limited by the small dataset, particularly from upland fields, and the development of a universal model applicable at the national scale requires a large number of soil samples. Overall, this study opens a new perspective for simplifying the determination of the properties of Malagasy soils for decision-making in agricultural management. Further studies using a wide range of rice soil data from different localities are recommended so that this approach can be adopted as an alternative to conventional analyses in Madagascar.