Mapping active paddy rice area over monsoon asia using time-series Sentinel – 2 images in Google earth engine; a case study over lower gangetic plain

Abstract We proposed a modification of the existing approach for mapping active paddy rice fields in monsoon-dominated areas. In the existing PPPM approach, LSWI higher than EVI at the transplantation stage enables the identification of rice fields. However, it fails to recognize the fields submerged later due to monsoon floods. In the proposed approach (IPPPM), the submerged fields, at the maximum greenness time, were excluded for better estimation. Sentinel–2A/2B time-series images were used for the year 2018 to map paddy rice over the Lower Gangetic Plain (LGP) using Google earth engine (GEE). The overall accuracy (OA) obtained from IPPPM was 85%. Further comparison with the statistical data reveals the IPPPM underestimates (slope (β1) = 0.77) the total reported paddy rice area, though R2 remains close to 0.9. The findings provide a basis for near real-time mapping of active paddy rice areas for addressing the issues of production and food security.


Introduction
Among the most populous countries of the world, India stands second, residing 1.6 billion people that contribute 17 per cent of the world population with the massive demand for food crops (Lehane 2014). Among the major cereal crops, paddy rice is among the vital staple food grains for over half of the world population and occupies around 12% of the global cropland area (Elert 2014;Dong et al. 2016). The global rice production, as per 2018s FAO estimates, is about 759 million tons (FAO 2018), and the forecast pattern shows it will continue to increase with demand keeping in pace with the increasing global population. The combined share of China, India, Bangladesh and Indonesia to the total global production of rice is 66%. It is expected to increase in the next decade due to the growing population. However, there is an interannual shortfall in the production that is happening due to weather severity, especially during the monsoon season. Periodic occurrence of droughts and floods in major rice-growing areas in the south and south-east Asia due to changing climate poses an impending threat to agriculture (Mendelsohn 2014;Miyan 2015). The IPCC report on climate change in South Asia shows a likely increase of 2 À 4 C in temperature in the coming decades, and that results in an approximate decrease of 30% in crop production (Hijioka et al. 2014). Being an important production area of paddy rice in south Asia, the Gangetic plain -a part of the northern Indian plains may also face a decrease in paddy rice production. As India shares a significant portion of the global rice production and its international marketing, the uncertainty of the paddy rice production in changing climate may lead to future food security issues. In such context, it is imperative to have adequate information on the total cropped area and its dynamics over time in south and south-east Asia with special reference to paddy rice.
In the last two decades, a series of methods have been proposed to mapping cropped areas, especially the irrigated rice area mapping from remotely sensed images. Several remote sensing-based agricultural layers, including Moderate Resolution Imaging Spectroradiometer (MODIS), Medium Resolution Imaging Spectrometer (MERIS), and Finer Resolution Observation and Monitoring of Global Land Cover (FROM-GLC), are available for mapping of paddy cultivation area from local to global scales. A range of different studies was done in the past to map the cropped area under paddy cultivation through phenology-based information, derived from time-series satellite images (Dong et al. 2016;G. Zhang et al. 2017;. Additionally, paddy-rice mapping was done by using supervised Maximum Likelihood (Oguro et al. 2001), Support Vector Machine (Xu et al. 2018), Random Forest (Qadir and Mondal 2020) and unsupervised classifications such as Iso-cluster or K-means (Pan et al. 2010;Nguyen et al. 2012). The use of the time-series spectral vegetation indices for crop mapping, e.g., Enhanced vegetation index (EVI), Normalized difference vegetation index (NDVI), Land surface water index (LSWI), have an added advantage over the multispectral images because timedriven vegetation indices represent phenological events of a crop cycle which could be used effectively in the threshold-based algorithm for mapping paddy-rice (Shew and Ghosh 2019;. The time series of flooding signal (FS), derived from LSWI, was used in several studies for identifying the rice pixels during the transplantation phase. In contrast, time-series of NDVI and EVI were used in the early growing to harvesting stage for mapping paddy-rice Xiao et al. 2006;Teluguntla et al. 2015;Dong et al. 2016;Zhang et al. 2018). All of these studies evidenced effective pixel-based mapping of rice areas using phenology-driven time-series spectral indices. A summary of phenological pixel-based paddy-rice mapping (PPPM) using moderate to fine resolution images, such as MODIS, Landsat 7 and 8, and Sentinel 1 and 2, is given in Table 1. Many of these reported studies, which used coarse spatial (MODIS, 500 m) and low temporal resolution (Landsat,, have faced challenges in classifying and delineating paddy rice pixels in the heterogeneous landscape (Gallego 2004;Zhen et al. 2013;Zhou et al. 2013). As the south and south-east Asian regions are mostly characterized by small and fragmented paddy rice fields, the use of MODIS often results in low classification accuracy emanating from mixed pixels containing many other types of crops (Teluguntla et al. 2015). However, the accuracy of the classification could be improved by using Sentinel-2 dense time-series data that provides better spectral and radiometric resolutions along with the high frequency of revisit cycle and thereby captures spatial diversity in the crop management practices . The PPPM was applied for paddy rice mapping in China (Xiao et al. 2005b;Wang et al. 2015;, South and South-east Asia (Xiao et al. 2006;Dong et al. 2016;Zhang et al. 2020), and in India (Teluguntla et al. 2015). These studies were conducted using varied datasets, starting from high temporal but coarse spatial resolution (MODIS 500 m) to finer spatial with lower temporal (revisiting interval) resolution (Landsat-8 30 m imagery) data. The magnitude of the overall accuracies that these studies were reported was high enough (89 À 98%) to comprehend the model's efficacy in differentiating the paddy-rice pixels from other crop pixels. However, it is unclear whether the PPPM model performs at the same efficacy in the Lower Gangetic Plain (LGP), which is characterized by small landholding sizesmaller than a Landsat-8 pixels size, and frequent submergence of the field due to flood and heavy rain due to monsoon during sowing through physiological maturity stage. For the PPPM approach, the difference of LSWI and EVI during sowing through transplantation stages was used to select either static or dynamic thresholds for mapping paddy rice (Xiao et al. 2005b;Peng et al. 2011). Such condition is useful in mapping paddy rice fields as paddy rice needs stagged water for couple of weeks for growth after transplantation. However, the mapping also includes any surface flooded with the water. This enhances the chances of wrongly classifying the paddy rice fields. As LGP is characterized by the frequent flooding events throughout the monsoon season due to periodic heavy rain, using a threshold approach only during sowing through transplantation stage might yield an inaccurate estimation of active paddy rice fields. Therefore, in this study we propose a modification to the existing PPPM method through incorporation of the difference of LSWI and EVI later in the middle of the crop cycle. We believe that such inclusion improves the estimation of crop area under the active paddy cultivation. The objectives of the present study are (a) to map the paddy rice area during the monsoon using suggested modification of PPPM method, (b) to estimate the efficiency of the proposed method with reference to the existing PPPM and standard machine learning based random forest classification output, and (c) to compare the estimated paddy rice statistics with the national agricultural statistics.

Study area
The LGP shares the international territory between India and Bangladesh. It is one of the world's most extensive flood plain regions (Singh et al. 1998). In this study, we took the Indian part of the LGP that primarily consists of the eastern Indian state, West Bengal ( Figure 1). The state of West Bengal comprises 22 districts, as mentioned in Figure 1. Topographically, the western part of the study area is connected with the plateau fringe with undulating topography (elevation of 400 À 590 m above mean sea level), while the northern part covers Shiwalik Himalaya with elevation ranges up to 3,616 m ( Figure 1a). Three major river systems, i.e., Teesta in the north, Bhagirathi-Hooghly of Ganga River in the south, and Ajay, Damodar, and Dwarkeshwar river system in the west, feed this alluvial floodplain leading to an intense cropping practice. The climate of the area is governed by the monsoon system (Attri and Tyagi 2010;Jin and Wang 2017). The study area is characterized by frequent floods throughout the monsoon season due to the combined effect of heavy rainfall and the release of the water from reservoirs (Jha and Bairagya 2013), leading to crop yield loss because of submergence of the crop fields.
The total geographical extent of the study area is around 8.68 million ha, of which 5.25 million ha is the net sown area, comprising 68% of the geographical area. The study area is known for its intense cropping practices with an average landholding size of 0.76 ha (http://matirkatha.gov.in/agricultural-scenario-in-west-bengal/). Most of the croplands are under a triple cropping pattern, with rice (Oryza sativa) as the predominant crop. The crop calendar starts with monsoon crops (Kharif crops: from July to November) (Prashnani et al. 2014) followed by winter crops (Rabi crops: from December to April) and ends with summer or pre-monsoon crops (Zaid crops: from May to July). In West Bengal, the monsoon crop is dominated by paddy rice followed by redgram, grown as a secondary crop. The winter crops are also dominated by rice, followed by gram, lentil and wheat, and the summer crop is dominated by gram, blackgram and pulses, including the pre-monsoon paddy rice.

Sentinel-2 multispectral data and pre-processing
In this study, Sentinel-2A/B Multi-Spectral Instrument (MSI) Top-of-atmosphere (TOA) reflectance (Level-1C) data was used from July 2018 to December 2018. The Sentinel-2 MSI images with 10 m spatial resolution and a 5-day repetitive coverage over an area around the globe are provided by the European Space Agency (ESA). The TOA reflectance data was used due to the limited access of the Sentinel À 2 surface reflectance product before 2019. Additionally, the choice of the TOA product was guided by the selected timeline for this study which is also steered by the availability of the statistical data -discussed in section 2.4as one of the reference data for this study.
The wavelengths of the Sentinel-2 bands range from 442.3 nm to 2,185.7 nm with 10 m (B2, B3, B4, B8), 20 m (B5, B6, B7, B8A, B11, B12), and 60 m (B1, B9, B10) spatial resolutions. The time series TOA reflectance products were called in Google earth engine (GEE) cloud computing facility -made for planetary-scale analysis (Gorelick et al. 2017), and the corrections were made for reducing the atmospheric effect and cloud cover. The TOA reflection was corrected by applying sensor invariant atmospheric correction (SIAC) module, developed by , in the GEE. The SIAC exploits operational global datasets on bi-directional reflection distribution function (BRDF) and coarse resolution atmospheric data to estimate surface reflection from TOA product. The method uses a Bayesian framework to model surface reflection using standard BRDF product from Moderate Resolution Imaging Spectroradiometer (MODIS) and atmospheric products from Copernicus Atmospheric Monitoring Services. It gives information of aerosol optical depth and total columnar water vapor thickness with the surface reflection from TOA reflection product in due course of solving the inverse radiative transfer problem. After the atmospheric correction, the cloud pixels were removed from the scenes. Since the monsoon cloud varies in brightness over the study area, the cloud pixels were removed based on the cloud-score algorithm (Qadir and Mondal 2020;Nanshan et al. 2021). The total number of image before and after the cloud removal was counted within the selected time interval of July 2018 to December 2018 and represented in Supplement Figure 1. The removal of cloud from the images produces gap. We further processed these atmospherically corrected Sentinel-2 time series images following the gap-filling process by considering the mean of total six consecutive pixels (three pixels below and three above a gap pixel) in the time series which spans over 30 days.
The spectral indices of EVI and LSWI were computed from these corrected time series images using Equation 1 and 2. Prior to computing the spectral indices, the SWIR-1 image layers were resampled from 20 m to 10 m with reference to other spectral channels involved in the analysis. Moreover, a linear fit smoothing function was applied on time series EVI and LSWI to reduce the residual effects of atmosphere emanating from cloud, aerosols and columnar water vapour. Linear fit uses a simple moving window algorithm where a subset of data from the whole data set is used to estimate the coefficients of linear fit (Khanal et al. 2020). The window size of the linear fit smoothing was decided based on the R 2 value. We used a window size of 20-time steps that has an R 2 of 0.93 ( Figure  2).

Lulc data
We used the cropland land layer from a standard global Land Use Land Cover (LULC) classification ( Figure 1(b)), available at the 10 m resolution based on Sentinel-2 data (Gong et al. 2019). This global LULC layer was prepared using the Random Forest (RF) classifier, where the training set contains approximately 340,000 samples across the world, with an overall accuracy of the classification of 72.76%. The cropland pixels from this LULC layer were separated to build a thematic layer of cropland that was further used to mask the time series LSWI and EVI pixels. Prior to masking, the accuracy of the cropland mask was evaluated using the ground truth points (GTP)discussed further in the section 2.5. The accuracy metrices of the cropland layer from the LULC product of Sentinel À 2 is given in Supplement Table 1. The global permanent surface water layerprepared from Landsat 5, 7 and 8 at 30 m resolution -was also used in this study (Pekel et al. 2016). The purpose is to increase the classification accuracy by removing the spectral response from the aquatic weeds which replicates the spectral response of vegetation and cropland in the NIR and Red bands (Silva et al. 2008;Schmidt and Witte 2010;Singh et al. 2020;Akbari et al. 2021).

Regional cropland statistics
We used the district-level net sown area (NSA) and the paddy rice area statistics during monsoon season. The NSA, including the area statistics for different crop types for each cropping season, is assessed by the Directorate of Economics and Statistics (DES) under the Ministry of Agriculture (MoA), Govt. of India (http://eands.dacnet.nic.in/). The MoA conducts yearly field surveys on the nine-fold classification of land use and land cover. It includes the irrigated area (source wise and crop-wise) and total area under the crops at the district level for the whole country. The DES data contain seasonal crop statistics, i.e., NSA of different crops, crop yield and production per year, etc. The agricultural statistics of monsoon paddy rice was taken from DES for the year 2018.

In-situ data
A field survey was conducted in 2018 for collecting adequate training samples for conducting the rice detection from Sentinel À 2. A sum of 1263 ground truth points (GTP), which include rice and non-rice (e.g., other crops except paddy rice, vegetation, fallow land, build-up) fields, was collected by Garmin GPS (Garmin eTrex 10; accuracy $ 3.65 m). All GTPs were collected by a random sampling procedure. Out of 1263 GTPs, a total of 665 (52.9%) GTPs were collected over paddy rice fields during the monsoon season, and 598 (47.1%) GTPs form non-rice fields (Supplement Figure 2). For the purpose of reducing the uncertainty of positional vector, we selected the sample paddy fields which were within the continuous clusters of agricultural land having a size greater than 2500 m 2 . All of these GTPs were further visually checked using high-resolution Google Earth and Sentinel-2 RGB composition to ensure the reliability of the data.

Precipitation data
We used daily precipitation data from Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) in support of upholding the hypothesis we made at the beginning. Since heavy precipitation causes waterlogging situations at least for the low lying or poorly drained areas, precipitation anomaly can reveal the spatial distribution of potential waterlogged areas, and that helped in the detection of active paddy rice fields from Sentinel-2. The CHIRPS-based daily precipitation product is modelled from groundbased observations and satellite data (Funk et al. 2015). The precipitation data towards the end of monsoon season, 20 th September to 10 th October, was called in GEE-interface (Gorelick et al. 2017). We took the mean precipitation for the said time-window for the year 2018. The mean precipitation for 2018 was subtracted from the long-term average from 1988 to 2017 compute the precipitation anomaly.

PPPM Algorithm and its improvement
The PPPM is a well-recognized algorithm to detect paddy rice fields from optical remote sensing data (Xiao et al. 2005b). The paddy rice has three main phenological stages covering a period of approximately 100-120 days. The typical three stages of paddy rice over Asia during the monsoon are a) the flooding/transplanting stage (day of the year (DOY): 213 À 262 [1 st August -19 th September]), b) the growth stage, which is featured by vegetative growth, flowering, and reproductive activity (DOY: 263 À 283 [20 th September -10 th October]), and c) the harvest phase (DOY: 284 À 334 [11 th October -10 th December]) with mixture of bare soil and rice plant residues. In the transplanting phase, land surface pixels are often mixed with water and sporadic green plants (Teluguntla et al. 2015). After 50-60 days from the transplanting date, the standing water in the field, which is approximately 2 À 10 cm deep, is mostly encroached by the rapidly growing canopy . The growth curve, at its peak, is called maximum greenness time (MGT), which is characterized by closure of background soil through expansion of foliar surface. Following this physical maturity stage, the leaf chlorophyll and moisture gradually decrease and become minimum at the harvesting stage.
The spectral reflectance of the water suggests the presence of a strong water absorbance band at SWIR (1630 À 2130 nm), leading to low surface reflectance of water body in SWIR, while it is relatively higher in the NIR channel ($ 856 nm) (Jensen 2014). The spectral response of water also shows a higher reflection in Red and a lower reflection in NIR. Thus, a water surface yields a positive value in the spectral index of LSWI while it gives a negative value in the spectral index of EVI. During the transplantation stage, a high positive value of LSWI (Equation (2)) and a relatively lower EVI (Equation 1) value suggests the presence of stagnant water in the rice field (Gao 1996). After 3 À 4 weeks of transplanting, the paddy rice fields are converted into a mixture of surface water and paddy rice plant, as the growing canopy of the paddy rice reduce the moisture laden background soil surface area and enhance the reflectance in the NIR channel, leading to increase in EVI signal in comparison to LSWI. The PPPM, as given by Xiao et al. (2005b), hypothesized that a temporary inversion of vegetation index leads to mapping of the flooding and transplanting pixels, and thus, it uses a conditional approach combining LSWI, NDVI and EVI (Equation (3)) for identifying the flooded paddy rice pixel during the transplantation stage. For this study, we used a combination of LSWI and EVI at the transplantation stage. A constant of 0.05 was used in Equation (3), as proposed by Xiao et al. (2005b), for considering the fact that the difference of EVI and LSWI must be less than 0.05 or 5% for a range of different statuses of soil-water-vegetation complex at the time of transplantation of paddy rice (EVI -LSWI 0.05). The EVI was chosen over NDVI due to its better sensitivity to the rich biomass content that is generally achieved within 4 À 5 weeks after transplantation (Huete et al. 2002). We further used a modified form of Equation (3) at the MGT stage, which mathematically remains similar, but the condition that is used here (Equation (4)) is different from Equation (3). The paddy rice pixels can be identified during the transplantation stage by selecting a value greater than 0given in Equation (3) -from the time series of the difference of LSWI and EVI. However, as the beginning of the paddy rice during monsoon is associated with the frequent flood, the chance of satisfying the criteria, expressed in Equation (3), by other standing crops is unavoidable. Despite of having differences in the mean vegetation vigore manifested by average EVI for the whole crop cyclethe crops, other than paddy rice in the monsoon season, which do not need waterlogged situation at the beginning of their phenological cycle, may classified under paddy rice due statisfcation of critetion stated in Equation (3). To deal with this issue and correctly classify the rice pixels, we used the following adjustment in the existing PPPM: a condition of [(LSW þ0.05) -EVI] < 0 for mapping the paddy rice pixels was added towards the MGT stage (Equation (4)). Such condition helps to separate any crop fields submerged due to the presence of residual floodwater by the time of MGT from active paddy rice fields. Hereafter, the suggested improvement over the existing PPPM is called improved PPPM (IPPPM). Given the threshold conditions and prior assumption, i.e., LSWI could not be greater than the EVI during the MGT period, the non-rice, including some of the paddy rice crop pixels, which are flooded, could be discriminated from the active paddy fields when [(LSWI þ0.05) -EVI] > 0. A schematic representation of the methodology of this study is given in Figure 3.
MGT : A plot of samples of EVI, LSWI and [(LSWI þ 0.05) -EVI], meeting these criteria, is given in Figure 4. Time series layers were prepared for [(LSWI þ 0.05) -EVI] from EVI and LSWI, and above-stated conditions were run for all layers to identify the potential paddy rice pixels during the transplantation stage that we considered from 4 th July to 10 th September, 2018. The process produced a binary layer of paddy rice and non-paddy rice. This binary layer was later used to mask pixels from time series [(LSWI þ 0.05) -EVI] layer during the MGT (September, 20 th -October, 10 th ), and the second conditiongiven in Equation (4) was used to filter out the candidate pixels which were still in submerged condition, leading to correctly identify the paddy rice pixels. These two sets of date ranges, i.e., timelines for transplantation and MGT, were used as the average timeline based on the information gathered during the field survey, including a detailed examination of smoothed time series curves of EVI and LSWI. To compare the results of the proposed IPPPM, we estimated the paddy rice area during monsoon using PPPM, as  given in Equation 3 using EVI only. The area coverages of the paddy rice -at the district level -estimated from both these methods were then compared with the MoA reported area statistics under paddy rice.

Random Forest classification
We further compared the results of the proposed IPPPM with the machine learningbased Random Forest (RF) classification. The RF is a nonparametric and distributionindependent supervised classifier (Bargiel 2017). Modified from a single decision tree, the RF is featured by an ensemble of decision trees that used bootstrapping and bagging techniques for training the model (Breiman 2001). Because of its superior performances over other standard classification approaches, it was extensively used in studying the LULC and crop type mapping ( Hao et  The time series images of spectral indices such as LSWI and EVIduring transplantation through harvesting stage (July to December) -were used as input variables in the RF model. A sum of 11 image tiles of Sentinel À 2 were used in training and evaluating the RF classification model. Of these 11 image tiles, 3 tilesfrom the central plain of the study area where most of the variations of crop and non-crop categories were observedwere used for training the RF model (Supplement Figure 2). The remaining 8 image tiles were used for testing the classification model. A sum of 72 time series (36 images of LSWI and 36 images of EVI) images per Sentinel-2 image tile were used, and it turned out to be 216 (3 Â 72) time series images for training area and 576 (8 Â 72) images for test area. A sum of 1,257 GTPs used for the RF classification. 70% (880 GTPs in which 478 and 402 GTPs were labelled as paddy and non-paddy rice targets, respectively) of these GTP were used to extract the samples of spectral indices (i.e., LSWI and EVI) from the training image tiles for building the model. Whereas, the sample observations of rest of the 30% GTPs (187 and 190 GTPs were labelled as paddy rice and non-paddy rice, respectively) were used from testing image tiles for model validation. The RF classification was performed on the Jupyter notebook using Python packages such as geemap (v0.8.16) (Wu 2020) and Sci-kit learn (v0.24.4).

Feature selection
A total of 72 temporal chronological feature layers of EVI and LSWI were used as input to build the RF classification model. The model was optimized based on the measure of mean decrease in accuracy (MDA) for selecting ideal number of feature layers. We used hyperparameter tuning with 10-fold cross-validation to get the optimum number of features from a set of 72 chronological features. The best-resulted parameters arenumber of tree (ntree): 150, maximum depth (max_depth): 5, maximum features (max_features): 7, maximum leaf nodes (max_leaf_nodes): 10, minimum samples in the leaf (min_sample-s_leaf): 3, and minimum samples for splitting each decision node (min_samples_split): 3. We have performed a feature selection step to filter the most contributed features based on MDA. The process turned out with a total of 24 important features, which contributed significantly to the classification.

Validation and comparison
The accuracy of the results from the IPPPM was evaluated in three different ways: (1) an error matrix was calculated between GTP and predicted paddy rice pixels, (2) the area statistics derived from both IPPPM and PPPM models were compared with the reported area under paddy rice from DES, Govt. of India, and (3) the performance of IPPPM algorithm-based classification were compared with RF-derived paddy rice map. It is to be noted that, for assessing the accuracy of the outcome from the IPPPM approach, the GTPs were used only from crop fields. Thus, the total number of GTP used for assessing the accuracy is 953.
Two measures, such as Area Difference (AD) and Percentage of Area Difference (PAD), were used to show the magnitude of error between the Sentinel-2-derived paddy rice area using the IPPPM algorithm and DES-reported paddy rice area (Equation 5-6).
where A Sentinel and A DES represent the area statistics from Sentinel and DES. Moreover, to compare the model performances with reference to DES statistics, the measures of root mean square error (RMSE) and Chi-square statistics were used.

Spatial distribution of paddy rice area in the lower gangetic plain
The paddy rice area during the monsoon period through the IPPPM approach is measured at around 2.98 million ha (mha), which is about 56.72% of the net sown area. The spatial distribution of mapped paddy rice area, shown in Figure 5, manifests that the territories in the central plain and south-east part of the West Bengal, i.e., Medinipur (East), Medinipur (West), Hooghly, Bardhaman (East), Bardhaman (West), Bankura, Birbhum and South 24-Parganas, respectively, share a significant amount (> 60%) to the NSA (Table 2). In comparison, the territories in the northern part of West Bengal share less than 50% to the NSA.

Accuracy assessment
The overall accuracy estimated from IPPPM is about 85%, including producer's (PA) and user's accuracy (UA) of 97% and 82%, respectively for paddy rice, and PA and UA of 65% and 93% for non-paddy rice ( Table 3). The RF model, with 24 best-featuring variables and a mean out-of-bag error of less than 0.05 (Figure 6(a) and (b)), shows an overall accuracy of 88% for separating paddy rice and non-paddy rice fields, including PA and UA of 90.3% and 85.5% for paddy rice, and 86.5% and 91%, respectively for non-paddy rice (Table 3). For all optimized featuring variables in the RF classifier, the contribution of LSWI during September to December overwhelmed the others. The ROCs, as derived from the RF model for both paddy rice and non-paddy rice, show high TPR, including AUC more than 0.9 (Figure 6(c)), suggesting a good level of agreement with the in-situ observations.

Comparison among PPPM, IPPPM, and RF models with DES statistics
The paddy rice area, estimated through the PPPM, is about 3.09 mha, while after applying the correction through the IPPPM method, about 2.98 mha paddy rice area is estimated. The difference is around 0.11 mha between the PPPM and IPPPM approaches. The difference is generated from the elimination of paddy fields following Equation (4) due to stagnation of rain water at the MGT stage. At the district level, the estimated paddy rice areas from IPPPM, including the amount of area from the PPPM approach, shows the territories in the south of West Bengal (South 24-Parganas, Medinipur (East), North 24-Parganas, Murshidabad), with average elevation less than 10 m, suffers mostly from waterlogging condition due to monsoon rain leading to loss of active paddy rice area ( Figure  7). The results also demonstrate that, besides the territories mentioned above, a small amount of paddy fields is affected too either by flood or by water logging due to heavy rain in the rest of the districts. Supplement Figure 3 demonstrates evidence of this water logging due to heavy rain during the MGT stage. The estimated paddy rice area from the RF model is 4.93 mha, notably higher than the area estimated from both PPPM and IPPPM methods. The degree of correlation between the modelled paddy rice area, from PPPM, IPPPM and RF, and reported area from DES during the monsoon season demonstrated (Figure 8) that RF gives the best estimate with R 2 of 0.98. While the overall performances of PPPM and IPPPM are not significantly different (R 2 ¼ 0.90 for PPPM and 0.91 for IPPPM) as the model comparison through Chisquare (v 2 ) test reveals a v 2 value of 1.67, which is less than the critical value of 11.59 at the p-value of 0.05 with 21 degrees of freedom. Yet, there is a credible difference in the magnitude of RMSE for PPPM (54885.1 ha) and IPPPM (58854.8 ha) results. The PAD of the estimated paddy rice area from IPPPM and DES shows a difference of about 1.02 mha ( Table 4). The PAD at the district level shows many of the northern territories, including some in the south and south-west of the study area, have more than 30% difference. However, the PAD for the territories -belong to the central-plainexhibit fewer than 25% difference.

Discussion
The PPPM-based algorithm detects the paddy rice pixels work reasonably well at the global scale Xiao et al. 2006). However, it faces some critical issues in identifying the paddy rice areas in monsoon Asia. The major issues related to this are, 1) the time series curve of paddy rice, mapped through LSWI and EVI, during the transplanting stage is similar to wetland or any waterlogged surfaces after precipitation, 2) often, after a short period of the transplantation stage, due to the occurrence of flood, the paddy fields are left untreated due to continued waterlogging leading to loss of production. In dealing with such issues, especially in monsoon dominated areas, the mapping of paddy rice through the PPPM-based approach might fail to estimate actual paddy rice area under active cultivation. In this study, we have modified the existing PPPM approach by allowing an additional criterion during the MGT stage (Equation (4)). The inclusion of the difference of LSWI and EVI at the MGT stage and its conditional criterion has efficiently mapped the active paddy rice area by excluding the submerged pixels. This results in a lower estimation of the total paddy rice area, which remains notably higher in the conventional PPPM approach. To show evidence of such overestimation, four spatial windows were selected, as given in Figure 9. The identified rice pixels through PPPM in those spatial windows are considered paddy rice as they were submerged during the transplantation stage. A notable amount of these pixels is still appeared with a high LSWI value in the MGT, suggesting that they are either destroyed (due to submergence) paddy rice fields or non-paddy rice areas with stagged water (Figure 9(b)-(i)). Further analysis of the precipitation anomaly through CHIRPS data at the MGT during monsoon 2018 suggests that about 74% of the area have faced negative rainfall anomaly, while some of the areas in the northern and southern part of West Bengalamounting to about 26% of the total area -have faced positive rainfall anomaly (supplement material, Figure 4). Many of these areas, as evidenced in Figure 9, show signs of waterlogging during the MGT stage. Therefore, the occurrence of heavy rain, irrespective of rainfall anomaly, during the MGT stage is associated with the waterlogging condition due to the flat alluvial terrain (supplement material, Figure 3). The evaluation of the performance of IPPPM with reference to PPPM and standard RF-based classification model, while comparing the results with DES statistics, suggests the RMSE is higher for IPPPM (58854.8 ha) as compared to PPPM (54885.1 ha) and RF (51064.1 ha) since it excludes the submerged pixels due to their potential loss. A high degree of association of estimated paddy rice area (slope (b 1 ) ¼ 1.2 with R 2 ¼ 0.98) from  the RF-based model, with reference to DES area statistics, suggests RF overestimates the paddy rice area. Such a higher degree of association is also attributed to the large training set with sufficient spatial variation in the rice varieties, including the hyper-tuning parameters optimized during the training stage. The estimated area from PPPM and IPPPM, on the other hand, remains largely underestimated with reference to DES statistics as the estimated slope value for both of them remains below 0.8 (Figure 8a, 8b). One of the reasons for such underestimation is linked to the condition in Equation (4) used to identify the paddy rice pixels. The diurnal cycle of standing water in the paddy rice fields is what makes the fields identifiable through the mathematical condition of PPPM. However, when the satellite passes over an area, the absence of standing water resulting in a negative difference of LSWI and EVI (Equation (4)) shortly after the transplantation stage. Such physical condition resulting in the noted underestimation in the paddy rice area. In conjunction with that, the plausible source of underestimation could be related to the cropland layer used in this study from Sentinel-derived LULC product. Moreover, the inconsistency in season-wise official reporting of the area under paddy rice by DES is contributed to the noted difference. The inclusion of the additional criterion during the MGT stage in the IPPPM approach, which led to the exclusion of the submerged pixels from the target cluster pixels of paddy rice, caused further loss of paddy rice pixels. Nevertheless, it holds a map of the paddy rice area under active cultivation. For PPPM and RF-based output, the traces of such active paddy rice pixels, given the occurrence of flood conditions, are less likely. Figure 10 shows an apparent difference of the mapped paddy rice between IPPPM and RF-model with reference to DES statistics. The comparison of the model outputs shows RF-based classification model performs better than IPPPM model though it needs a large volume of training data with adequate quality. The implication of IPPPM can overcome these constraints through a simple approach of conditional thresholding. A plethora of studies existed on the rice area mapping at different spatial scales in the last decade; nearly all of them used automated to semi-automated methods of machine learning (Nanshan et al. 2021;Chen et al. 2020;Gumma et al. 2020;Htitiou et al. 2019;Zhang et al. 2018), deep learning (Shao et al. 2001; Chen and Mcnairn 2006), spectral matching (Gumma et al. 2014), pixel and object-based method (Belgiu and Csillik 2018) and unsupervised clustering algorithms (Panigrahy et al. 2010). The majority of these worksfor rice area detection from remote sensing imagesare band-dependent with a single or a cluster of images. The phenological advancement with changing moisture conditions was deeply undermined in these works. However, in 2005, Xiao et al. (2005b, Xiao et al. 2006) provided a strong foundation for the PPPM approach for paddy rice area detection. This simple mathematical algorithm worked fairly well for monsoon Asia as well as for the global paddy rice area mapping. Yet, it is not capable enough to map active paddy rice areas during the monsoon season in the regions where floods are an integral part of the agricultural system. The submergence of the cropped area due to monsoon rain and the periodic release of stored water in reservoirs causes a range of flood events that vary in severity. The loss of the paddy rice fields due to such events during the post-transplantation through the MGT stages was not considered in the PPPM approach so far. The inclusion of flood effect in the form of prescribed modification during the MGT stage within the PPPM is what makes IPPPM approach effective in mapping the active paddy rice area during the monsoon season; the problem of underestimation with reference to DES statistics remains unavoidable, though. Dong et al. (2016) showed exclusion of flooded pixels during the monsoon season improved the model's accuracy and produced more reliable results. Teluguntla et al. (2015) also applied the PPPM approach on 8-days MODIS-EVI and LSWI over the Krishna River basin (KRB) in south India to map the paddy rice area. Since there is a difference in monsoon rainfall characteristics between KRB and LGP, the prolific flood pixels are absent in the KRB, whereas they are abundant in the LGP. Therefore, the suggested modification is pertinent for near real-time mapping of the active paddy rice area in this part of the LGP. With the availability of the GEE cloud-based computing facility that allows planetary-scale analysis through parallel processing in multiple GPUs, mapping of the near real-time rice field, Figure 10. A comparative spatial plot of the outcomes from Random Forest model and IPPPM model. especially during the monsoon season, through the IPPPM approach provides up-to-date areal coverage of paddy rice.

Conclusion
The study fosters the strength of the high-resolution Sentinel À 2 time-series images with five days of repetitive coverage to mapping a high-resolution paddy rice area during the monsoon season at a 10 m scale. Since monsoon in the LGP is featured with frequent flooding, mapping paddy rice area during this season from space-based optical images needs special effort to enhance the accuracy for area estimation. The present study addresses this specific issue by bringing modifications to the existing PPPM approach.
The study also deals with the problem of cloud cover that is persistent over the study area during monsoon. After atmospheric correction and screening off the cloud from the scenes, the residual effect was minimized through linear smoothing by selecting the best fit time window. Such smoothing helped to restore the temporal signal of vegetation indices (i.e., EVI and LSWI) which, otherwise, would have lost their identity as crop pixels. The results suggest a wide variation of spatial-scale accuracy from north to the south of the study area with reference to the statistical data. Yet, the validation of the model's outcomein terms of the point-based accuracy -remains fairly similar to the RF-based result. Based on the outcome, we conclude that there is topographic control that led to the higher amount of offsets with respect to statistical data for territories belong to mountainous and residual-hill regions. The findings also led to the conclusion that the IPPPM approach estimates a lesser amount of paddy rice area in comparison to the conventional PPPM approach due to the exclusion of flooded fields during the MGT stage, though the principle of identification of paddy rice remains the same for both the approaches. With prolific flood events in the monsoon period, the proposed IPPPM approach -as manifested in the results -is an appropriate method in mapping active fields under paddy rice cultivation in the LGP. For periodic active rain-spells of the south-west monsoon wind, the areas, elsewhere in the world, featuring with poor drainage condition can face similar waterlogging condition for the paddy fields. Therefore, the present experimental setup could be transferred to other areas, preferably to the rice growing areas in the south and south-east Asia where monsoon rainfall plays a vital role for paddy rice cultivation. The findings of this study also suggest, despite having a quality of underestimation in terms of areal coverage, the outcome of the IPPPM method is not significantly differed from other standard approaches, and such an evidence indicates towards the efficacy of this modified approach in mapping active paddy rice at different spatial scale. The GEEbased cloud computing facility, in this connection, provides an opportunity for near realtime mapping of the active paddy rice area. Therefore, the periodic monitoring of the active paddy rice area through the IPPPM approach has a direct linkage to the total production and yield variation over the years while addressing the issues of food security and availability and price of the rice in the market at the same time.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was work supported by Microsoft Research and University Grants Commission.

Data availability statement
The authors confirm that the data used for preparing this manuscript will be made available upon request.

Code availability
The codes used for the analysis in this paper can be found in the following link: https://github.com/ara-binda90/S2-Rice-area-map