Examining LightGBM and CatBoost models for wadi flash flood susceptibility prediction

Abstract This study presents two machine learning models, the light gradient boosting machine (LightGBM) and categorical boosting (CatBoost), applied for the first time to predict flash flood susceptibility (FFS) in the Wadi System (Hurghada, Egypt). A flood inventory map with 445 flash flood sites was produced and randomly divided into two groups for training (70%) and testing (30%). Fourteen flood-controlling factors were selected and evaluated for their relative importance in flood occurrence prediction. The performance of the two models was assessed using various indexes and compared against the common random forest (RF) method. The results show areas under the receiver operating characteristic curve (AUROC) above 97% for all models, with LightGBM outperforming the other models in terms of classification metrics and processing time. The developed FFS maps demonstrate that highly populated areas are the most susceptible to flash floods. The present study shows that the employed algorithms (LightGBM and CatBoost) can be used efficiently for FFS mapping.


Introduction
The United Nations (UNISDR 2015) stated that approximately 43% of the natural disasters that occurred globally from 1995 to 2015, affecting more than half of all people, were water-related disasters. Flash floods are more catastrophic than any other type of flooding due to their very short lag time (Vinet 2008), especially in arid environments (Negm 2020; Saber et al. 2020; Abdel-Fattah et al. 2018). The devastating impacts of flash floods have been recorded and documented in developing and developed countries alike (Bisht et al. 2018; Martín-Vide and Llasat 2018); however, flood events are more severe in developing countries, such as those in the Middle East and North Africa (MENA) region. The observed increase in flash flood frequency is mainly driven by changes in extreme storm patterns and global climate change (Hirabayashi et al. 2013; Pachauri et al. 2014). The frequency of extreme flood events has increased in the MENA region over the past few decades (Zhang et al. 2005). Flash flood risk mitigation requires precise and accurate flood monitoring measures to support hazard management (Arora et al. 2021). Mapping flash flood susceptibility (FFS) is considered one of the most important of these measures by scientists and governments around the world (Ali et al. 2020). This task is generally difficult, and more so in the MENA region than in other regions, because of difficulties with accessing affected areas; consequently, the performance of hydrological models can be affected, and detailed observational datasets are required for calibration and validation (Abushandi and Merkel 2011; Abdrabo et al. 2020).
Flood susceptibility mapping (FSM) has been conducted with several tools, such as geospatial tools available through geographic information systems (GIS), the analytical hierarchy process (AHP) (Vojtek and Vojteková 2019; Abdrabo et al. 2020), the frequency ratio (FR) method (Samanta et al. 2018; Siahkamari et al. 2018) and the weights-of-evidence approach (Hong et al. 2018). However, these methods have drawbacks in FSM; notably, the AHP yields uncertainties associated with ambiguous judgments based on the expert knowledge used to set the weights of influential factors, and the FR is reliant on the sample size (Samanta et al. 2018; Siahkamari et al. 2018). Additional drawbacks include predefined assumptions related to flood occurrence and the corresponding influential factors (Dodangeh et al. 2020). Moreover, many types of models, including physically based, lumped, and statistical models, have been widely used to simulate hydrological events; however, physical models present various uncertainties (Unduche et al. 2018; Chourushi et al. 2019). Such uncertainties in reliable quantitative flood predictions are due to data limitations (Fawcett and Stone 2010; Arabameri et al. 2020), and these models require extensive and detailed field observations for parametrization (Fenicia et al. 2008). Therefore, alternative tools and methods for assessing FFS in ungauged basins of arid regions are needed.
Globally, the application of machine learning approaches for flood susceptibility prediction has been widely assessed over the past two decades. The recent development of machine learning methods has led to substantial improvements in flood modeling, and such methods have become widespread due to their ability to capture information without predefined assumptions and to process complex datasets with high accuracy in short periods of time (Arabameri et al. 2020; Costache, Popa, et al. 2020). ML models that have been used to predict flood susceptibility include logistic regression (LR), artificial neural networks (ANNs) (Arora et al. 2021; Shahabi et al. 2021), the adaptive neuro-fuzzy inference system (ANFIS) (Costache, Hong, et al. 2020; Arora et al. 2021), genetic algorithms (GAs) (Darabi et al. 2019; Shirzadi et al. 2020), support vector machines (SVMs) (Dodangeh et al. 2020), and random forest (RF) models. The RF model has been widely used for flood risk assessment (Esfandiari et al. 2020). Additionally, various ML algorithms, together with novel ML ensemble methods, have been used to map FFS (Shahabi et al. 2021). ML approaches involve several steps (Arora et al. 2021). First, inventory datasets with reasonable accuracy must be prepared for both training and validation of the employed models. Second, the potential flood conditioning or geoenvironmental factors related to flooding in the study area must be selected. Third, efficient and appropriate ML models are applied, and the model performance is assessed with reliable evaluation indices.
Several studies of arid regions have been performed to develop flash flood hazard maps using GIS and rainfall-runoff-inundation (RRI) model tools, to establish flood susceptibility maps using conventional methods such as the AHP and ArcGIS tools as part of a multicriteria approach (Souissi et al. 2020), and to assess flash flood hazards (Farhan and Anaba 2016; Kumar et al. 2018; Adnan et al. 2019; Abdelkader et al. 2021; Ahmed et al. 2021) using geomorphometric approaches combined with remote sensing data and rainfall-runoff modelling. Most such studies have been limited by a lack of data for model calibration and validation in flash flood modelling and forecasting, leading to uncertainty. Despite these challenges and the attempts made to apply ML approaches to overcome them, the application of ML algorithms remains limited in arid (El-Haddad et al. 2020) and semiarid regions (Janizadeh et al. 2019; Arabameri et al. 2020). Consequently, the use of ML approaches and soft computing methods is crucial and highly recommended for assessing FFS in ungauged basins of arid and semiarid regions.
In this study, we examine two methods, the light gradient boosting machine (LightGBM) and categorical boosting (CatBoost), for FFS mapping for the first time. Both methods have been applied in other domains. For instance, LightGBM has been used in previous studies due to its accurate predictions, short computational time, and outstanding ability to avoid overfitting. The method has been applied in the prediction of protein-protein interactions, price trend forecasting (Sun et al. 2020), wind power forecasting (Ju et al. 2019), flight departure delay prediction (Ye et al. 2020), web searches, miRNA identification in breast cancer, music recommendation challenges (Zhang and Fogelman-Soulié 2018), peer-to-peer (P2P) network credit default prediction (Ma et al. 2018), and audio scene classification systems (Gong et al. 2017). CatBoost has also been applied in many other fields, such as driving style recognition and evapotranspiration prediction in humid climate regions (Huang et al. 2019; Zhang et al. 2020). Accordingly, the main objective of this work is to evaluate how effective two machine learning approaches (LightGBM and CatBoost) are in predicting FFS in Wadi environments (the city of Hurghada and the upstream Wadi catchments along the Red Sea, Egypt) and to compare their performance to that of the common RF method. The subsequent sections of this study are as follows: study area description, introduction to the datasets and methodology, presentation of the results, discussion, and conclusion.

Study area
The city of Hurghada is located along the Red Sea coast, Egypt, bounded by latitudes 27°10′ and 27°30′N and longitudes 33°30′ and 33°52′E, as shown in Figure 1. The watershed area is 138 km², and the maximum and minimum elevations are approximately 2139 m and 13 m above sea level, respectively. The city is one of the areas most vulnerable to urban flash floods, with an increasing rainfall trend from 1983 to 2019. Additionally, several flash flood events have occurred in Hurghada and the surrounding areas, especially in the spring and autumn of 1996, 2014, and 2016; these events caused severe damage to the infrastructure and other facilities of the city (Abdel-Fattah et al. 2015). An overall assessment of flood risk maps in Wadi Qena showed that the El-Hurghada-Ras Gharib national road is located in a basin with one of the highest hazard levels in the surrounding area (Elsadek et al. 2019). Hurghada has experienced rapid tourism development, an immense increase in population and considerable urban growth over the past two decades; accordingly, it has become a tourist and economic hub in the region. Additionally, flash flood events in the region have become more hazardous due to increases in flash flood frequency and intensity; the frequency and magnitude of flash floods have increased in the past two decades based on historical climatic data obtained at the Hurghada weather station (Tutiempo Network, S.L. 2021), as shown in Figure 1c. This finding also corroborates the results of previous research on the frequency and intensity of rainfall in arid and semiarid regions. For instance, the number of rainy days recorded at the Hurghada weather station increased to approximately 19 days over the decade from 2010 to 2020, compared with approximately 12 in the previous period from 1993 to 2009.
The exposure of lives and infrastructure to flash floods affects social wellbeing at the individual and country levels ( Figure A.1, Appendix A).

Methodology
The methodology of this study consists of several steps, as illustrated in the flowchart in Figure 2, and comprises two main parts. First, a flood inventory map was developed based on 445 flooded points. These points were identified from field surveys and records of historical flood events. Additionally, 445 nonflooded points were randomly selected throughout the catchment in a GIS environment. Furthermore, a total of 14 independent FFS factors (FFSFs) were considered for modeling based on the local topographical, hydrological, geological and landform characteristics, following several literature studies. These FFSFs are elevation, slope, aspect, plan curvature, the vertical and horizontal distances from main streams, hillshade, flow accumulation, the topographic wetness index (TWI), rainfall, lithology, land use (LU), the sediment transport index (STI), and the normalized difference vegetation index (NDVI). Spatial maps for each FFSF were produced using ArcGIS with a consistent spatial resolution. Then, two methods, the information gain ratio (IGR) and a multicollinearity test based on the variance inflation factor (VIF), were used to assess the importance of the FFSFs in the study area. Second, for the implementation of the proposed machine learning approaches, the dataset was randomly divided into two groups for training (70%) and validation (30%), and three algorithms were applied: RF, LightGBM, and CatBoost. The results of the models were assessed for accuracy using different measures, including the widely used area under the curve (AUC) metric.
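To make the accuracy-assessment step concrete, the AUC mentioned above can be computed directly from class labels and predicted susceptibility scores via the rank-based (Mann-Whitney) formulation. The following pure-Python sketch is illustrative only; the function name and inputs are hypothetical, not code from this study.

```python
def auc_score(labels, scores):
    """AUC via the Mann-Whitney formulation: the fraction of
    (flooded, nonflooded) pairs in which the flooded point receives
    the higher score; tied scores count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one point of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranking yields an AUC of 1.0, while random scores yield roughly 0.5, which is why AUROC values above 97% indicate strong discrimination.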

Flash flood inventory data
The first step in FFS mapping (FFSM) is to identify flood points or locations based on historical records of previous floods. Such records provide the most important inputs for FFSM (Tehrany et al. 2014). From records of past event occurrences, the locations of future hazard events can be estimated (Devkota et al. 2013). Therefore, conducting an analysis of similar past events and the corresponding influential factors is the first step in flood susceptibility assessment (Masood and Takeuchi 2012). A flood inventory map shows the sites of flooded areas in any flood-prone basin (Bellu et al. 2016). An inventory map can be prepared from several sources, including field surveys, flood forecasting records, and remote sensing data (Esfandiari et al. 2020). Appropriately selecting flood points enhances model accuracy for flood susceptibility prediction (Arora et al. 2019). In this study, 890 ground control points (Figure 1a, b) were identified, comprising flooded (445) and nonflooded (445) points. The flood locations were compiled from historical flood records, inundation maps developed through rainfall-runoff modelling, and field surveys of flooded and affected areas (Figure A.1, Appendix A). Nonflooded points were randomly selected. The flooded and nonflooded points were assigned values of 1 and 0, respectively, and the points were then randomly divided into sets (70% and 30%) for the training and validation of the flash flood prediction models; notably, the models' performance and generalization ability were verified based on this random selection.
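The labeling and random 70/30 split described above can be sketched as follows. This is a minimal illustration with hypothetical names; the study's actual split was performed in a GIS environment.

```python
import random

def split_inventory(points, train_frac=0.7, seed=42):
    """Randomly split labeled inventory points (1 = flooded,
    0 = nonflooded) into training and testing subsets."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = points[:]
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# 445 flooded (label 1) and 445 nonflooded (label 0) points, as in the study
inventory = [("p%d" % i, 1) for i in range(445)] + \
            [("n%d" % i, 0) for i in range(445)]
train, test = split_inventory(inventory)
```

With 890 points, this yields 623 training and 267 testing samples, matching the 70%/30% proportions used for model training and validation.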

Geospatial database (flood-related parameters)
The selection of flood-influencing factors in FFSM is important and impacts modelling accuracy (Kia et al. 2012). During floods in drainage systems, runoff patterns are related to the characteristics of the watershed, catchment area, topography and LU/land cover types (Hölting and Coldewey 2019). There are no standard, universal criteria for selecting the controlling factors for FSM; therefore, according to the review above, the study area characteristics, and data availability, 14 flood-triggering factors, including topographic, hydrological, geological, and landform factors, were selected. The topographic factors are elevation, slope, plan curvature, aspect, hillshade, flow direction, and flow accumulation. The hydrological factors include the TWI, the STI, rainfall, and the vertical and horizontal flow distances. The geological and landform factors include lithology, the NDVI, and LU. All data were resampled and prepared in spatial raster format with a 90-m spatial resolution in ArcGIS (Figure 3). All topographic factors were constructed based on the Multi-Error-Removed Improved-Terrain (MERIT) DEM (Yamazaki et al. 2017). The spatial resolution of the terrain elevation is 3 arcsec (approximately 90 m at the equator). This DEM was developed by eliminating the error components in existing DEMs, such as SRTM3 v2.1 and AW3D30m v1. The data are freely accessible at http://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_DEM/.
3.1.2.1. Topographic factors. Elevation: There is a direct relation between elevation and flooding (Tehrany et al. 2013), which means that low land surfaces are more vulnerable to floods than high land areas (Khosravi et al. 2016); notably, the higher the elevation is, the lower the flood probability (Tehrany et al. 2014;Youssef et al. 2016). The study area has very complex topographic features, with very high elevations ranging from 500 m to more than 2000 m in the upstream zone, moderate elevations ranging from 100 m to 500 m in the middle zone, and low land areas with elevations less than 100 m in the coastal zone, which mainly includes urban and agricultural LU types ( Figure 3a).
Slope: Slope has a significant influence on flooding (Meraj et al. 2018) due to its effects on water velocity and surface runoff (Torabi Haghighi et al. 2018). Steep slopes contribute to a high water velocity and increase the flow volume in downstream areas . Additionally, the slope influences the hydrological features that directly affect runoff (Tehrany et al. 2019). In the study area, the slope varies from 0 to 50 degrees ( Figure 3b).
Aspect: Aspect is important for flooding, and many hydrologic parameters are influenced by it. When a slope aspect receives little direct sunlight, soil moisture increases; consequently, the moist slope generates runoff, contributing to flooding risks downslope (Yariyan et al. 2020). There is an indirect relationship between aspect and flooding due to the corresponding effects on several geoenvironmental factors, including soils, rainfall, and vegetation (Rahmati et al. 2016). In this study, the aspect map was categorized into 9 classes ranging from flat to northwest (Figure 3c).
Plan curvature: Plan curvature is considered an important and essential flood-influencing factor by many researchers (Hong et al. 2018) and affects hyporheic conditions and heterogeneity (Cardenas et al. 2004). Curvature values vary between areas of accelerated runoff and those with decelerated runoff; negative and positive values are associated with increased and decreased runoff, respectively. Runoff is affected by slope shape, as flat (zero curvature) and concave (negative) forms have more potential for flooding than convex (positive) forms (Shahabi et al. 2021). For instance, concave slopes decelerate surface flows and generally increase infiltration losses (Young and Mutchler 1969), but convex slopes accelerate flow discharge, and infiltration is often limited (Cao et al. 2016). A curvature map with three main classes (concave, flat, and convex) was developed from a DEM, as shown in Figure 3d.
Hillshade: Hillshade, or toposhade, is directly related to the shade and length of hillslopes, which may affect the convergence of overland flow (Aryal et al. 2003). The use of toposhade has been limited in previous studies (Bui et al. 2019), even though it was found to be the most important factor in FFSM after slope and elevation; therefore, toposhade is selected as a flood-influencing factor in this study, as shown in Figure 3e.
Flow accumulation: Flow accumulation can be estimated from flow direction parameters to show the accumulation of flows among pixels; thus, this factor is important in FSM and hydrological studies (Kazakis et al. 2015), and it has been used in some previous studies (Alipour et al. 2020). Regions with high flow accumulation have a high probability of flooding (Lehner et al. 2006). The flow accumulation map was estimated from flow direction maps using ArcGIS, as shown in Figure 3f.
3.1.2.2 Hydrological factors. Topographic wetness index (TWI): Proposed by Beven and Kirkby (1979), the TWI expresses the water accumulation quantity per unit (or cell) in a watershed considering the downstream flow trends due to gravitational forces (Eq. 1):

TWI = ln(a / tan(β))  (1)
where a is the cumulative upslope catchment area draining through a point (per unit contour length) and tan(β) represents the steepest downslope direction of a cell surface (β is the slope angle in degrees). The TWI is related to the spatial changes in wetness in a catchment (Rahmati et al. 2016) and describes the location and size of saturated areas subject to overland flow (Wilson and Gallant 2000). This index quantifies the impact of the local topography on surface runoff generation (Qin et al. 2011) and reflects the long-term moisture availability in a landscape (Kopecký and Čížková 2010). The TWI varies from −7.3 to 13.28 in the study area, as demonstrated in Figure 3g.
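The per-cell computation of Eq. 1 can be sketched as below. The function name is illustrative, and the small floor on tan(β) for near-flat cells is our own assumption to keep the logarithm finite; it is not part of the original formulation.

```python
import math

def twi(upslope_area_per_width, slope_deg):
    """Topographic wetness index, TWI = ln(a / tan(beta)) (Eq. 1).
    upslope_area_per_width: cumulative upslope area per unit contour
    length (a); slope_deg: local slope angle beta in degrees.
    A small floor keeps near-flat cells (tan(beta) ~ 0) from diverging."""
    tan_b = max(math.tan(math.radians(slope_deg)), 1e-6)
    return math.log(upslope_area_per_width / tan_b)
```

Large contributing areas on gentle slopes give high TWI values (wet, flood-prone cells), while steep cells with little upslope area give low or negative values, consistent with the −7.3 to 13.28 range reported for the study area.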
Sediment transport index (STI): The processes of erosion and deposition can be reflected by the STI, which was calculated using Eq. (2):

STI = (A_s / 22.13)^0.6 × (sin β / 0.0896)^1.3  (2)

where A_s is the specific catchment area and β is the slope angle. The changes in a riverbed caused by these processes can lead to variations in the water storage capability of the corresponding river and may have a considerable impact on flooding. The STI has been considered a flood-related influential factor (Tehrany and Jones 2017), and pixels with low STI values are associated with high flood potential. The STI in the study area varies from 0 to 1.3, as shown in Figure 3h.

Flow distance: The distance from main rivers or streams has a considerable impact on flood occurrence in a given area (Glenn et al. 2012). The areas adjacent to streams are usually more prone to flooding than other areas (Chapi et al. 2017). Additionally, the risk of flooding is proportionally related to the distance from rivers (Predick and Turner 2008), and floods frequently occur in areas adjacent to rivers (Darabi et al. 2019). The distance from streams signifies the distance from stream networks, which are the main conduits of overland flow (González-Arqueros et al. 2018). In this study, horizontal and vertical flow distances were estimated (Figure 3i, j) using ArcGIS from flow accumulation, flow direction and DEM data.
Rainfall: Precipitation has a considerable effect on flooding; notably, without rainfall, flooding would not occur. The total average rainfall over the period 2001-2019 was estimated from the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN)-Dynamic Infrared Rain Rate in Near-Real Time (PDIR-Now) dataset. The data have a high global resolution (0.04° × 0.04°, approximately 4 km × 4 km), are available in near-real time and can be freely downloaded from https://chrsdata.eng.uci.edu/. The spatial maps show that the average precipitation is highest in the downstream portion of the watershed, as shown in Figure 3k.

Geological and landform factors.
Land use and land cover: An LU and land cover layer was generated from a global land cover map (Kobayashi et al. 2017) developed by the Geospatial Information Authority of Japan (Geospatial Information Authority of Japan 2021) and mainly from the Earth Resources Observation and Science (EROS) Center website (Earth Resources Observation and Science (EROS) Center 2021). LU and land cover types influence infiltration and the runoff velocity. The study area includes approximately 8 classes (Figure 3l): urban, consolidated rock, unconsolidated sand, water bodies, sparse vegetation, shrubs, and croplands. Most of the area is covered by rocks and sand, urban areas are limited to the coast with some vegetation, and other land cover types are very rare. The accuracy of land use/land cover data is very important in hydrological studies. Therefore, high-resolution remote sensing data are important to enhance urban mapping (Raju et al. 2008) and to develop classified land cover maps (Townshend et al. 1991). Such accurate maps are essential for studying the effect of land use changes on the hydrological regime (Sun et al. 2013; Apollonio et al. 2016).
Lithology: Lithology is an important factor due to its effects on infiltration and flow velocity. The lithology layer was derived from the geological map of Egypt 1981 (scale 1:2,000,000); the source of the data is the Ministry of Industry and Mineral Resources, Egyptian Geological Survey and Mining Authority. The lithology was classified into 6 geological types (Figure 3m; Table 1), including the Dokhan volcanics, which are rare in the area. Carbonates are dominant in the northern part of the study area. Plio-marine deposits are sparsely distributed throughout the area, old and young granites are mainly located in the upstream area, and most of the area, especially parts of the middle and downstream regions, is dominated by Quaternary deposits.

NDVI: NDVI data were extracted from the EROS Moderate-Resolution Imaging Spectroradiometer (eMODIS) collection, which was developed from the MODIS data acquired by NASA-EOS. The spatial resolution of the data is 250 m. The data can be freely accessed from the USGS (Earth Resources Observation and Science (EROS) Center 2021). In the study area, the NDVI varies from −0.19 to 0.99 (Figure 3n).

Selection of flood-influencing factors
Feature selection is an important step in machine learning modeling. Removing redundant features prevents a slowdown of the training process caused by (1) an inflated number of features and (2) ill-conditioning of the loss landscape (Öztürk and Akdeniz 2000). The latter occurs when highly correlated features exist in the training dataset, which complicates the choice of the learning rate hyperparameter; an incorrect choice of this hyperparameter value may degrade the model's estimation ability. Therefore, the feature selection process was based on three analyses to detect irrelevant factors: (1) Spearman's rank correlation, (2) a multicollinearity test and (3) the IGR.

Spearman's rank correlation coefficient
Spearman's rank correlation coefficient is a nonparametric measure used to evaluate the strength of the monotonic link between two parameters X and Y. The value of the coefficient varies between −1 and 1, representing perfect negative and positive degrees of association, respectively. The closer the coefficient value is to 0, the weaker the relationship between X and Y. A correlation coefficient with an absolute value larger than 0.7 is indicative of a high level of collinearity (Tien Bui et al. 2016). The correlation coefficient is calculated as follows:

r_s = 1 − (6 Σ d_i²) / (n(n² − 1))  (3)

where r_s is the correlation coefficient, d_i is the difference between the ranks of the i-th observations of the two variables x and y, and n is the length of each variable sequence.
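Spearman's coefficient is equivalent to the Pearson correlation of the ranked data, which also handles tied observations cleanly. A minimal pure-Python sketch (illustrative function names; in practice a library routine such as scipy.stats.spearmanr would typically be used):

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Any strictly monotonic relation between two factors yields rho of exactly ±1, which is what makes the measure suitable for screening nonlinear but monotonic dependence.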

Multicollinearity test
In addition to the pairwise correlations based on Spearman's coefficient, multicollinearity was assessed in this study for all influential factors considered. Multicollinearity analysis aims to detect the interrelatedness of variables and was performed (in this study) based on the VIF. This factor is commonly used in flood susceptibility assessment studies (Khosravi et al. 2019; Rahman et al. 2019), and a threshold of VIF > 5 is recommended for assessing multicollinearity. In other cases, if the VIF value is greater than 10, the corresponding predictors are considered collinear and should be excluded from modeling. Therefore, in this study, we used a value of 5 as the selection threshold. The independent predictors were defined as X = {X_1, X_2, …, X_n}, and R_j² refers to the coefficient of determination when the j-th independent predictor X_j is regressed on all other predictors. The VIF is calculated based on the following equation (Eq. 4):

VIF_j = 1 / (1 − R_j²)  (4)
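For illustration, in the special case of exactly two predictors, R_j² reduces to the squared correlation between them, so the VIF has a closed form. The sketch below covers only this two-predictor case (function names are hypothetical); with more predictors, R_j² must come from regressing X_j on all the others, e.g. via ordinary least squares.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF_j = 1 / (1 - R_j^2): with two predictors, R_j^2 is simply
    the squared Pearson correlation between them."""
    r2 = pearson(x1, x2) ** 2
    return 1.0 / (1.0 - r2)
```

Uncorrelated predictors give a VIF of 1, and the VIF exceeds the threshold of 5 once the correlation magnitude passes about 0.894, i.e. R² > 0.8.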

Information gain ratio
Conditioning factors were evaluated to identify their relative importance in flood occurrence prediction using the IGR test (Quinlan 1986). This feature selection method has been considered in many classification studies. For an input with an IGR equal to zero, no relationship exists between this factor and the output. The use of such an input does not add information to the applied model and generates noise, decreasing the predictive capability of the model. Therefore, removing these factors from the input sets is highly recommended. The IGR was calculated using Eq. 5:

IGR(S, A) = (H(S) − H(S | A)) / SplitInfo(S, A)  (5)

where H(S) is the entropy of the class labels S, H(S | A) is the conditional entropy of S given factor A, and SplitInfo(S, A) is the entropy of the partition induced by A.
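For a factor discretized into classes, Eq. 5 can be computed directly from the standard Quinlan gain-ratio definitions, as in the following sketch (illustrative names; the factor values are assumed already binned):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain_ratio(feature, labels):
    """Gain ratio (Eq. 5): (H(S) - H(S|A)) / SplitInfo(S, A) for a
    discretized conditioning factor A against flood labels S."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n)
                      for g in groups.values())
    if split_info == 0:  # factor takes a single value: no information
        return 0.0
    return (entropy(labels) - cond) / split_info
```

A factor that perfectly separates flooded from nonflooded points scores 1, while a factor statistically independent of the labels scores 0 and can be dropped, as recommended above.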

Machine learning methods
Supervised intelligent classifier methods can be classified into different categories, including quadratic discriminant analysis (QDA), support vector classifiers (linear SVMs), stochastic gradient descent, decision trees, and ANN models. Ensemble learning techniques combine multiple weak learning models to build robust learners, with the aim of increasing the overall accuracy rate. Some of these predictive models rely on feature engineering, whereas others are based mainly on boosting algorithms, which focus on the training samples that are misclassified. Different boosting algorithms have recently been proposed for classification and regression; the most well-known and widely used algorithms include AdaBoost (Rätsch et al. 2001), CatBoost, LightGBM, XGBoost (Chen and Guestrin 2016) and gradient boosting (Friedman 2002). In this study, we selected the RF approach, a well-known classifier algorithm, together with two methods newly applied to FFSM: CatBoost and LightGBM.

Random forests (RFs)
RFs are algorithms that have been adopted to solve many problems involving prediction and multiclass classification (Schoppa et al. 2020). The RF concept was introduced by Breiman (2001) as a combination of the random subspace method and bagging ensemble learning. This classification algorithm falls into the ensemble classifier category; it is based mainly on decision tree models to achieve a high classification rate. Many trees are typically generated, and a bootstrap sample of the training data is drawn for each tree. For a given classification problem, the RF procedure provides each tree in the forest with the input data; each tree then individually classifies the input into the appropriate class. The final classification is obtained by majority voting over all individual classifiers (trees) (Pal 2005). Decision tree classifiers have several advantages over traditional methods; notably, their results are easy to interpret, they can handle both numerical and nominal data, and they are easy to construct. Nonetheless, decision trees are not always competitive with other classification techniques (Malekipirbazari and Aksakalli 2015). The RF algorithm combines straightforward operability with a high computational speed comparable to that of ANNs and SVMs (Mosavi et al. 2018). Furthermore, RFs have outperformed other techniques, including ANN, SVM, and regression models, in some previous hydrological applications (Bachmair et al. 2017). In this study, we selected the RF because it is one of the techniques most widely applied to FSM in previous studies, with acceptable accuracy, and we compared its results with those of the two newly applied methods.
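The two mechanics that define the RF procedure, bootstrap sampling per tree and majority voting over the trees, can be sketched as below. This is a toy illustration of those two steps only, not a full RF implementation (which would also grow the trees with random feature subsets), and the function names are illustrative.

```python
import random

def bootstrap_sample(data, rng):
    """Bagging: sample the training data with replacement for one tree."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Final RF class = the mode of the individual tree votes."""
    return max(set(predictions), key=predictions.count)
```

In practice a library implementation such as scikit-learn's RandomForestClassifier wraps these steps together with randomized tree construction.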

Light gradient boosting machine (LightGBM)
LightGBM is a variant of the gradient boosting decision tree (GBDT) algorithm developed by Microsoft (Ke et al. 2017). The structure of this algorithm is based on weak learners that are combined to form a strong learner (Ju et al. 2019). LightGBM introduces several changes to the standard GBDT, namely, histogram-based splitting, leafwise tree growth, gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB). The tree construction time is dominated by the split-finding operation. Finding the best splitting point (optimal split) is an important step in most GBDT models and is traditionally based on a presorted algorithm: all sample points are scanned for each feature to estimate the information gain at all possible split points, and then the optimal segmentation point is identified. However, this process is time and memory consuming. In LightGBM, another method, the histogram-based algorithm, is adopted. The objective is to group the data into a histogram (bins) and choose the splitting point based on these bins (Figure 4a). This approach considerably reduces the temporal complexity of the algorithm, since the operation is based on bins instead of individual data points. Another difference between this model and other GBDTs is how the decision tree grows. In LightGBM, a leafwise tree growth strategy replaces the levelwise approach. The latter finds the best possible node to split at each level, and splitting occurs across entire levels, resulting in symmetric trees (Figure 4b). In LightGBM, by contrast, the strategy is to find the leaf that will reduce the error the most and split only that leaf, leaving the others unsplit (Figure 4b). To avoid overfitting caused by deep decision trees, a maximum leafwise depth must be set (Ge et al. 2019).
Subsampling is usually performed by drawing a random sample of the training data and constructing a tree on that sample. In LightGBM, the subsampled data are instead weighted. The GOSS method is based on the gradients of the samples, which represent their information gain: data instances with small gradients are already well learned and contribute little to further training, whereas instances with large gradients contribute more to tree construction. GOSS therefore keeps all instances with large gradients and randomly samples the instances with small gradients, reweighting the latter to compensate. In addition, LightGBM uses the EFB method to reduce the dimensionality of the data. EFB bundles mutually exclusive features, i.e., features that are almost never nonzero simultaneously, into a single feature. Together, these algorithms make LightGBM faster than other GBDT models while maintaining high performance, since the information in the data is preserved to the greatest extent possible.
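The GOSS procedure described above can be sketched in a few lines. The sampling rates and gradients are illustrative choices, not LightGBM's internal defaults:

```python
import numpy as np

def goss_sample(grad, top_rate=0.2, other_rate=0.1, rng=None):
    """GOSS sketch: keep the top_rate fraction of instances with the largest
    |gradient|, randomly sample other_rate of the rest, and up-weight the
    sampled small-gradient instances by (1 - top_rate) / other_rate so the
    overall gradient contribution stays approximately unbiased."""
    rng = rng or np.random.default_rng(0)
    n = len(grad)
    order = np.argsort(-np.abs(grad))          # descending |gradient|
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top_idx = order[:n_top]                    # large-gradient: always kept
    other_idx = rng.choice(order[n_top:], size=n_other, replace=False)

    idx = np.concatenate([top_idx, other_idx])
    weights = np.ones(len(idx))
    weights[n_top:] = (1.0 - top_rate) / other_rate  # compensate subsampling
    return idx, weights

rng = np.random.default_rng(1)
grad = rng.normal(size=10_000)
idx, w = goss_sample(grad, rng=rng)
print(len(idx), w.max())   # ~30% of the data retained; small-gradient weight ~8
```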

Categorical boosting (CatBoost)
The CatBoost learning algorithm is another improved GBDT approach; it was proposed by Dorogush, Ershov, and Gulin (2018), uses gradient boosting for regression trees (Friedman 2002), and builds a model in a stagewise fashion through increasingly refined approximations. Several improvements were made to the model to overcome the overfitting problem (Dorogush et al. 2018). Gradient boosting is an efficient machine learning approach that has been applied in many other fields, such as web searching, environmental variable prediction, the spatial analysis of ecological factor distributions (such as the distribution of contaminant concentrations), and weather forecasting (Safarov et al. 2020), with acceptable results. This approach has also achieved high performance in weather forecasting (Kusiak et al. 2009), the prediction of Kickstarter campaigns (Jhaveri et al. 2019), driving style recognition and diabetes prediction (Miao et al. 2019). CatBoost performs well with categorical features, although the efficiency of the model increases in their absence. The approach is mainly based on gradient boosting and binary tree classification. The differences between CatBoost and other boosting techniques can be summarized as follows (Hancock and Khoshgoftaar 2020).
a. An advanced algorithm is automatically integrated into the model to process categorical features as numerical data. According to Prokhorenkova et al. (2019), target statistics can be used for categorical features with minimum information loss.
b. Categorical features are combined to exploit the full advantages of feature interactions.
c. The symmetric tree technique is applied to avoid overfitting and improve the classification accuracy.
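Point (a) can be illustrated with a minimal sketch of ordered target statistics, the encoding technique Prokhorenkova et al. (2019) describe: each example's category is replaced by the smoothed mean target of the examples that precede it in a random permutation, so an example never sees its own label. The prior and smoothing parameter here are illustrative assumptions, not CatBoost's exact internals.

```python
import numpy as np

def ordered_target_stats(categories, targets, prior=0.5, a=1.0, seed=0):
    """CatBoost-style ordered target statistics: encode each example's
    category with the smoothed mean target of earlier examples in a random
    permutation, which limits target leakage and overfitting."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(categories))
    sums, counts = {}, {}
    encoded = np.empty(len(categories))
    for i in perm:                               # walk the random "history"
        c = categories[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + a * prior) / (n + a)   # smoothed running mean
        sums[c] = s + targets[i]
        counts[c] = n + 1
    return encoded

# Hypothetical two-category example (category names are illustrative).
cats = np.array(["wadi", "coast", "wadi", "coast", "wadi", "coast"])
y = np.array([1, 0, 1, 0, 1, 0])
print(ordered_target_stats(cats, y))
```

The first occurrence of each category receives the prior (0.5 here); later occurrences converge toward the category's observed target mean.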
We consider a data sample where $X_j = (x_j^1, x_j^2, \ldots, x_j^n)$ is the feature vector and $y_j \in \mathbb{R}$ represents the label (binary class). Input-output pairs are independently and identically distributed according to an unknown distribution $q(\cdot)$. The main objective of the learning scheme is to train a function $H : \mathbb{R}^n \to \mathbb{R}$ that minimizes the expected loss $L(H) := E\,L(y, H(X))$, where $L(\cdot, \cdot)$ is a smooth loss function and $(X, Y)$ is a test example drawn from the distribution of $D$. The gradient boosting algorithm iteratively builds a sequence of approximations $H^t : \mathbb{R}^n \to \mathbb{R}$ ($t = 0, 1, 2, \ldots$) in a greedy manner. Each function $H^t$ is obtained from the previous approximation by an additive step: $H^t = H^{t-1} + \alpha g^t$, where $\alpha$ is a step size and $g^t$ is a base learner. The minimization problem is generally solved with a greedy approach such as the Newton method, using a second-order approximation of $L(H^{t-1} + g)$ at $H^{t-1}$, or by applying (negative) gradient steps.
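For intuition (a standard gradient boosting identity, not specific to CatBoost): with the squared loss $L(y, s) = \frac{1}{2}(y - s)^2$, the negative gradient reduces to the residual, so each additive step fits the current errors:

```latex
% With squared loss, the base learner approximates the negative gradient,
% which is simply the residual of the current model:
g^t(X) \;\approx\; -\left.\frac{\partial L(y, s)}{\partial s}\right|_{s = H^{t-1}(X)}
  \;=\; y - H^{t-1}(X),
\qquad
H^t \;=\; H^{t-1} + \alpha\, g^t .
```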

Model performance validation
The receiver operating characteristic (ROC) curve is a widely used and accepted technique in geospatial analysis for determining the validity of models (Tehrany et al. 2013; Chen et al. 2020). The ROC is the most commonly applied approach for the verification of flood susceptibility and landslide models, and the prediction accuracy of the examined models has been evaluated with the AUC in many previous studies (Youssef and Hegab 2019). AUC-ROC values range from 0.5 (no better than random) to 1, with higher values indicating better model performance. The accuracy and reliability of a model are maximized when the AUC-ROC value is equal to or close to 1.0, which reflects the capability of the model to predict disaster occurrence without bias (Bui et al. 2012).
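Equivalently, the AUC can be computed as a rank statistic: the probability that a randomly chosen flooded pixel receives a higher susceptibility score than a randomly chosen non-flooded pixel. A minimal sketch on illustrative data:

```python
import numpy as np

def auc_roc(y_true, scores):
    """AUC-ROC via the rank (Mann-Whitney) formulation: the probability that
    a random positive scores above a random negative (ties count one half).
    Pairwise comparison is O(n_pos * n_neg) -- fine for a sketch."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores)
    pos, neg = scores[y_true], scores[~y_true]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(auc_roc(y, s))   # one positive/negative pair is mis-ordered
```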
Other statistical criteria (accuracy, precision, recall, and the F1-score) were used to evaluate the performance of the models and to compare their robustness with that of other models applied in the literature. Accuracy is the ratio of accurately classified observations to the total number of observations (Eq. 8), precision is the ratio of accurately classified positive observations to the total number of observations classified as positive (Eq. 9), recall (also known as sensitivity) is the ratio of accurately classified positive observations to the total number of actual positive observations (Eq. 10), and the F1-score is the harmonic mean of precision and recall (Eq. 11).
where TP (true positive) denotes the number of cases correctly classified as flooded pixels, TN (true negative) denotes the number of cases correctly classified as nonflooded pixels, FP (false positive) denotes the number of cases incorrectly classified as flooded pixels, and FN (false negative) denotes the number of cases incorrectly classified as nonflooded pixels.
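A minimal sketch of Eqs. 8-11 computed from these confusion counts; the counts below are hypothetical values chosen only to exercise the formulas, not the study's results:

```python
def classification_metrics(tp, tn, fp, fn):
    """Eqs. 8-11: accuracy, precision, recall and F1 from the confusion
    counts defined above (flooded = positive class)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # a.k.a. sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical confusion counts for illustration only.
acc, prec, rec, f1 = classification_metrics(tp=90, tn=85, fp=10, fn=15)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} F1={f1:.3f}")
```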

Multicollinearity assessment and feature selection
The analysis of the Spearman correlation coefficients between factors (Table A.1, Appendix A) was based on a threshold of 0.7, following the approach of Chen et al. (2018). Eight conditioning factors were identified as correlated in pairs: vertical distance to rivers with horizontal distance to rivers, slope with TRI, DEM with rainfall, and SPI with STI. A multicollinearity analysis of the controlling factors was conducted using the VIF approach. The VIF values ranged from 1.2 to 5.3, with the highest and lowest values observed for the TRI and plan curvature, respectively (Figure A.2a, Appendix A).
The computed IGR scores for all flood conditioning factors are illustrated in Figure A.2b, Appendix A. All features have an IGR greater than zero, which indicates their relative importance in flood generation. Some factors, such as rainfall, TWI, DEM, and slope, have high IGR scores (greater than 0.2), and others, such as aspect, hillshade, and SPI, have low IGR scores (less than 0.05). Based on IGR, the factors rank from high to low importance as follows: vertical distance to rivers, DEM, TRI, TWI, slope, rainfall, lithology, land cover, flow accumulation, plan curvature, horizontal distance to rivers, STI, NDVI, SPI, hillshade and aspect. Based on the VIF, the factors rank as follows: TRI, SPI, vertical distance to rivers, DEM, STI, TWI, rainfall, flow accumulation, slope, horizontal distance to rivers, aspect, hillshade, NDVI, lithology, land cover, and plan curvature.
Based on the VIF values of the features, only the multicollinearity of the TRI must be addressed because its VIF is greater than 5; therefore, this factor was removed from the training dataset. Additionally, due to (1) the correlation between the STI and SPI and (2) the high VIF value of the SPI (4.9), the SPI factor was also removed. The other factors (vertical and horizontal distances to rivers, rainfall and the STI) were retained despite the Spearman correlation results because of their importance as reflected by their IGR scores (Figure A.2b, Appendix A). Finally, the 14 features selected for flood susceptibility modeling were aspect, vertical and horizontal distances to rivers, hillshade, flow accumulation, slope, the DEM value, plan curvature, the STI, the TWI, land cover, lithology, rainfall and the NDVI.
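The VIF screening used above can be reproduced with a short numpy sketch using the regress-each-feature-on-the-others formulation; the synthetic columns below stand in for the conditioning factors:

```python
import numpy as np

def vif(X):
    """Variance inflation factors: for each feature j, regress column j on
    the remaining columns (plus an intercept) and return 1 / (1 - R^2_j)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        yj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, yj, rcond=None)
        resid = yj - others @ beta
        r2 = 1.0 - resid.var() / yj.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)   # nearly collinear with a -> high VIF
print(np.round(vif(np.column_stack([a, b, c])), 1))
```

A VIF near 1 indicates no collinearity, while the correlated pair produces large VIFs; the study's threshold of 5 would flag them.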

Comparison and evaluation of the models
In this section, we provide a detailed comparison of all studied models in terms of different classification metrics. The comparison was performed based on K-fold cross-validation. The studied data were partitioned into a learning set (60%) and a test set used to evaluate model performance. The learning set was further divided into a training set (80%) and a validation set; the validation data were used for hyperparameter selection. The optimum hyperparameters for each classification model were selected from the learning data using a grid search method over a large range of hyperparameter values. The best architecture for each classifier is given in Table 1. Figure 5 displays the average accuracy of the RF, CatBoost and LightGBM models. All models provide approximately the same classification results in terms of statistical indicators, with a slight improvement for the LightGBM model in convergence speed and classification metrics. Figure 6 shows the ROCs of all developed models on the test set. The simulation results demonstrate that the three selected methods exhibit similar properties and provide equivalent accuracy levels. The highest AUC was obtained by the RF model (98.08%), followed by LightGBM (AUC = 98.05%), and the lowest AUC was obtained by CatBoost (AUC = 97.57%). Moreover, LightGBM displayed the most precise classification performance, with an accuracy of 94.92% and a precision of 95.68%, followed by the RF model, with an accuracy of 94.35% and a precision of 95.14%, and finally the CatBoost model, with the same classification rate as the RF model and a 0.94% decrease in precision. In comparison with the RF models in previous studies (e.g., Chen et al. (2020) with AUC = 0.925, Tang et al. (2020) with AUC = 0.886, Lee et al. (2017) with AUC = 0.7878, and Achour and Pourghasemi (2020) with AUC = 0.972), the RF model in this study is superior. In this paper, two new boosting classification techniques were investigated for FFSM in the Hurghada area. To the best of the authors' knowledge, this is the first work to investigate the use of CatBoost and LightGBM in flash flood classification problems in comparison with the commonly used RF model. The obtained results suggest that LightGBM outperforms its counterpart models in terms of classification metrics and processing time. From the simulation results, we can conclude that LightGBM is efficient in flash flood prediction. The predictive accuracy of LightGBM (AUC = 98.05%) provides reasonable performance in flash flood prediction compared to other models. Fan et al. (2019) found that LightGBM performed better than the RF, M5Tree and other empirical models in calculating daily evapotranspiration in a humid subtropical region in China.
The accuracy of CatBoost (AUC = 97.57%) was also high compared with accuracies reported in studies from other fields. Notably, CatBoost, an RF and an SVM were utilized for modeling evapotranspiration in a humid region of China (Huang et al. 2019), where CatBoost yielded better accuracy and a lower computational cost than the other methods (RF and SVM).
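The grid search with K-fold cross-validation described above can be sketched with scikit-learn; the data, the RF stand-in, and the grid values are illustrative assumptions, not the study's actual search space from Table 1:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Illustrative stand-in data for the study's 445 sites and 14 factors.
X, y = make_classification(n_samples=445, n_features=14, n_informative=8,
                           random_state=0)
# 60% learning set / 40% test set, as in the comparison section.
X_learn, X_test, y_learn, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

# Hypothetical hyperparameter grid; K-fold CV runs on the learning set only.
grid = {"n_estimators": [100, 300], "max_depth": [4, 8, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid,
                      cv=5, scoring="roc_auc")
search.fit(X_learn, y_learn)
print("best params:", search.best_params_)
print(f"test AUC: {search.score(X_test, y_test):.3f}")
```

The same pattern applies unchanged to LightGBM or CatBoost estimators by swapping in their classifiers and grids.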

Flash flood susceptibility modeling
The evaluation metrics of the new boosting methods (CatBoost and LightGBM) and the RF model verify the abilities and high performance of these models in predicting future flash floods in arid regions. Accordingly, the methods were employed to estimate the flood susceptibility of the entire study area. The three FFSMs developed with the three methods (RF, LightGBM, and CatBoost) are shown in Figure 7a, b, and c, respectively. The flood susceptibility values were categorized into five classes: no flooding, low, moderate, high and very high. The areas covered by the different susceptibility categories vary depending on the model. The FFSM results of the three models indicate that the areas of high and very high susceptibility to flooding cover 41% (RF), 42% (LightGBM), and 44% (CatBoost) of the study area. The areas of moderate and low susceptibility (Figure 8) are estimated at 38% (RF), 34% (LightGBM), and 29% (CatBoost) of the research area, and the area that is not susceptible to flash floods varies from 21% to 27% of the total (Figure 8). Although the models thus differ in how they allocate area among the FFS classes, all methods exhibit general agreement in the spatial pattern of the susceptibility classes: the coastal areas, where most of the residential and agricultural regions of the study area are located, are the most prone to flooding.

Discussion
Currently, increased attention is being given to data-driven methods as alternatives to traditional hydrological and hydraulic models. Therefore, the scientific community is attempting to develop new logic-based mathematical approaches to predict flood-susceptible areas at different spatial scales (Arora et al. 2021). In arid and semiarid environments, few studies have applied machine learning approaches for FFSM (El-Haddad et al. 2020). Therefore, testing different methods is highly recommended, especially in hyperarid regions where data are limited and hydrological models are challenging to establish. This study provides an assessment of three applied methods, RF, LightGBM and CatBoost, with the latter two tested for FFSM for the first time. The obtained FFS maps verify that both methods can predict flood-prone areas with acceptable accuracy in comparison with the RF method, which has been widely applied in related studies with varying levels of accuracy (e.g., AUC = 78% for Band et al. (2020), AUC = 99.3% for Li et al. (2019), AUC = 94.5% for Talukdar et al. (2020), AUC = 93.8% for Park and Lee (2020), and AUC = 89.4% for Nguyen et al. (2018)). The AUC of 98.08% obtained for the RF model in this study was higher than that in most previous studies. Additionally, the newly applied LightGBM model outperformed the RF and most previously used approaches. The three methods also showed better average performance than previously applied flood susceptibility mapping methods, based on the approximately 140 previous applications discussed in the more than 30 publications that we reviewed; the performance of those methods, measured by AUC, varies from 64% (Shafizadeh-Moghadam et al. 2018) to 99.3% (Li et al. 2019). LightGBM additionally yields improved classification metrics and a faster processing time.
From the present results, LightGBM and CatBoost were verified as efficient in flash flood prediction in arid regions, and they can potentially be applied in other climatic regions, especially for assessments of water-related disasters such as flash floods and landslides.

Conclusions
Flash flood disasters are a tangible threat to society and the environment and hinder long-term sustainability, especially in arid regions where high-quality observations and flood risk management are lacking. Therefore, the present study focuses on accurately predicting FFS in a hyperarid area of the city of Hurghada along the Red Sea. Three machine learning methods were tested to predict the FFS zones in the study area. The first method is the RF approach, which is well known and widely applied in FFSM; the other two methods (LightGBM and CatBoost) were assessed for FFSM for the first time. The methods were trained and validated based on flood inventory maps and 14 flood-influencing factors considering the topographical, hydrological, geological and landform characteristics of the study area. The conclusions of this research can be summarized as follows:
1. The results of FFSM show that the applied methods can accurately predict flood zones, with an area under the ROC curve above 97%.
2. The FFSM results indicate that the highly populated coastal area is more prone to flooding than other areas and is classified as highly and very highly susceptible.
3. The study verified that the newly applied ML algorithms (LightGBM and CatBoost) can potentially be used for FFSM.
4. The outcomes of this study can be used as a reference to guide flash flood risk management and mitigation in arid regions and consequently assist planners and managers in mitigating flash floods in highly flood-susceptible regions.