Geographical origin traceability of Sengcu rice using elemental markers and multivariate analysis

ABSTRACT Multi-element analysis combined with chemometric methods was used to distinguish Sengcu rice from other types of rice and to trace its geographical origin in Vietnam. For Sengcu rice, As, Ba, Sr, Pb, Ca and Se were confirmed as the key elements for geographical traceability among three fields of Lao Cai, whereas Al, Ca, Fe, Mg, Ag and As were the major factors for distinguishing Sengcu from other types of rice. Based on linear discriminant analysis and partial least squares-discriminant analysis models, the overall correct identification rates for distinguishing Sengcu from other types of rice were approximately 100% in both the training and validation tests. For distinguishing the geographical origin of Sengcu rice samples, these rates varied from 80% to 99%. These results suggest the presence of food adulteration, as illustrated in the latter case.


Introduction
Rice is one of the most essential food crops, providing energy (carbohydrates), vitamins and minerals in the daily human diet on a global scale (Wang et al. 2020b). In recent years, with the development of international markets and the demand for high-quality agricultural products, the safety and authenticity of foodstuffs have become major concerns for consumers. Although numerous measures have been taken to reduce the adulteration of agricultural products, fraud in the rice trade is increasingly recorded in a number of countries around the world. Therefore, geographical origin labels on rice help to decrease the adulteration of rice products, and thus protect consumers' rights and improve the credibility of producers and traders in global markets.
Sengcu rice is known as a specialty of Lao Cai province, Vietnam. It is grown on terraces that are fed by nearby streams, at normal daily temperatures in the range of 18-25 degrees Celsius. Sengcu rice contains higher levels of nutrients, including vitamin E, vitamin B and fibre, than other types of rice (Bui et al. 2016). Besides, for ethnic minorities in Lao Cai, Sengcu rice is one of the famous local specialities and a profitable crop. In recent years, the price of Sengcu rice has increased, leading to an improvement in the living standards of ethnic minorities (Bui et al. 2018). Because of the high price, many traders have produced adulterated rice by mixing original Sengcu rice with other types and adding flavour (Kusano 2019). Therefore, it is essential to protect rice brands and to determine the geographical origin of Sengcu rice.
In recent years, numerous advanced techniques have been used for food authenticity and traceability, including spectroscopic techniques, isotope ratios, liquid and gas chromatography and elemental analysis (Qian et al. 2020; Wadood et al. 2020). In particular, spectroscopic techniques have become common for verifying the authenticity of foods; they include vibrational (Nardecchia et al. 2020), hyperspectral (Hong et al. 2020) and nuclear magnetic resonance signals (Le Gresley et al. 2019). These techniques are rapid and cost-effective and involve little or no sample preparation (Zhu et al. 2018; Sha et al. 2019; Tian et al. 2020a; Xu et al. 2020). For example, Raman spectroscopy combined with a support vector machine has been used to identify rice-producing areas in China (Tian et al. 2020a), giving a correct rate of up to 90%, and near-infrared spectroscopy combined with multivariate analysis was reported as an approach for classifying rice (Sampaio et al. 2019). However, the main drawback is low accuracy due to lower sensitivity and high noise.
On the other hand, mass spectrometry techniques such as isotope ratio mass spectrometry (IR-MS), liquid and gas chromatography-tandem high-resolution mass spectrometry (LC- and GC-HRMS) and elemental analysis are extremely sensitive and possess high accuracy (Mahne Opatić et al. 2018; Ni et al. 2018; Park et al. 2019; Qian et al. 2020; Wang et al. 2020b). In particular, multi-elemental analysis and isotope ratio methods are easy to implement and more stable than others (Coelho et al. 2017; Katerinopoulou et al. 2020; Qian et al. 2020; Wadood et al. 2020). Samples containing chemically identical compounds can be distinguished by the IR-MS technique based on their isotope content. The most commonly used isotope ratios of elements for the determination of geographical origin and authenticity of food products include 13C/12C, 2H/1H, 15 (Srinuttrakul et al. 2019) and to determine the geographic origin of Chinese mitten crab (Luo et al. 2019). In terms of the multi-elemental method, inductively coupled plasma mass spectrometry (ICP-MS) is a very powerful analytical technique that provides geographical origin information by analysing several inorganic elements, which serve as the fingerprint for geographical traceability. For example, geographical identification of Chianti red wine (Bronzi et al. 2020), Wuchang rice (Qian et al. 2019b) and Asian rice (Chung et al. 2018) was carried out by multi-element analysis combined with multivariate analysis and achieved high correct rates, all above 95%. Hence, the accuracy of both the IR-MS and the multi-elemental methods has been shown to be very high. Nevertheless, compared to the multi-element method, IR-MS is not only costly but also requires modern equipment (Choi et al. 2020; Wadood et al. 2020).
For food authenticity research, chemometric analysis plays an important role in classifying products based on their origin, variety or other properties (Maione and Barbosa 2019). The most common multivariate analysis methods for verification fall into two main types of algorithm: exploration (unsupervised) and regression (supervised) (Gautam et al. 2015). Among the exploration methods, principal component analysis (PCA) is known as a useful tool for reducing and screening preliminary data (Maione and Barbosa 2019; Wadood et al. 2020). In previous studies, PCA results were often used as input data for other regression models; however, this reduces the accuracy of the algorithm. Among the regression methods, linear discriminant analysis (LDA) and partial least squares discriminant analysis (PLS-DA) are the major algorithms (Wadood et al. 2020). These methods have been used in previous publications with high sensitivity and accuracy (Park et al. 2019; Choi et al. 2020; Wang et al. 2020a; Tian et al. 2020b).
Currently, many studies have only carried out traceability among locations separated by large distances, and most of them were evaluated once, so the variation over time was not confirmed. Hence, this study aimed to: (i) measure multiple elements in Sengcu and other kinds of rice and evaluate the levels of heavy metal contamination in rice from Vietnam; (ii) determine the stability of 44 elements in Sengcu rice over three crop periods starting from 2019, which is a premise for geographical traceability and for preventing the mislabelling of Sengcu rice; (iii) construct a model using multivariate analysis to distinguish between Sengcu and other rice types, for the purpose of preventing the adulteration of Sengcu rice, and to authenticate the geographical origin of Sengcu rice among the three fields of Lao Cai province where this variety is cultivated.

Sample collection
A total of 259 Sengcu rice samples were collected from three fields (MuongVi (MV), BanQua (BQ) and BanXen (BX)) of Lao Cai province during three crop periods between 2019 and 2020. In addition, 13 soil samples and 22 water samples were collected in these fields and stored in plastic bags and PE bottles, respectively. A further 86 rice samples from the North (N), Central (C) and South (S) of Vietnam were collected in the same period. Rice samples were gathered from 3 to 4 random plots located diagonally across the field, with the plots 50 m apart. Each soil sample was taken from a 20-cm-thick layer of the cultivation substrate, at the same plot where the rice samples were collected. Water samples were acidified with concentrated HNO3. The sample collection diagram and weather overview are summarised in Figure 1 and Tables S1 and S2 (supplementary information, which can be obtained from the corresponding author).

Rice samples
All rice samples were dried at 40°C to constant mass. After being dried, the grains were pounded and peeled, and the husk was removed. The samples, after being cleaned, were finely ground in a Universal Cutting Mill (PULVERISETTE 19, Idar-Oberstein, Germany), sieved through a 0.1 mm sieve and stored in plastic bags at −80°C. Rice samples (0.25 g) were weighed into a polytetrafluoroethylene (PTFE) vessel. Volumes of 4 mL of H2O, 2 mL of concentrated HNO3 and 2 mL of 30% H2O2 were added. After 30 min, the samples were digested using a Mars X-press plus microwave digestion system (CEM, Matthews, NC, USA). The digestion programme was as follows: the temperature was increased linearly to 120°C over 15 min and held for 10 min, increased from 120°C to 160°C over 10 min and held for 10 min, and finally increased to 180°C within 10 min and held for 30 min. The digestates were cooled to room temperature and accurately diluted to 25 mL with ultrapure water.

Water samples
Water samples were pretreated according to EPA Method 3015A (US EPA 2007a). First, the water samples were filtered through a 0.45 µm cellulose acetate membrane, which had been pre-rinsed with dilute (5%) nitric acid. Then, 20 mL of each water sample and 2 mL of concentrated HNO3 were transferred into the vessel and treated with the microwave digestion system. The digestion programme was as follows: the samples were heated to 170°C over 10 min and then held for 30 min. After being cooled, the samples were transferred to 25-mL volumetric flasks and diluted to the mark with ultrapure water.

Soil samples
Soil samples were treated according to EPA Method 3051A (US EPA 2007b). First, the soil samples were dried in an oven at 40°C to constant mass. Subsequently, they were ground and passed through a 1 mm sieve to remove unwanted material (twigs, rock, etc.) and then through a 0.1 mm sieve to homogenise the samples. They were then dried at 60°C to constant mass and stored in a desiccator. For digestion, 0.1 g of each homogenised sample was treated with 4 mL of concentrated HNO3 in a Teflon vessel for 2 h, and then 2 mL of 30% H2O2 was added to each vessel. The digestion programme was as follows: the temperature was increased from ambient to 180°C over 5 min and held for 20 min. After cooling, the samples were transferred to volumetric flasks and diluted to 25 mL with ultrapure water.

ICP-MS analyses
The concentrations of 44 elements (Table S3) were determined by a quadrupole inductively coupled plasma mass spectrometer (ICP-MS) (iCAP TQ ICP-MS, Thermo Scientific, USA). The optimal operating conditions of the ICP-MS were as follows: RF power, 1550 W; plasma argon flow rate, 14 L min−1; auxiliary Ar flow rate, 0.8 L min−1; carrier Ar flow rate, 1.0 L min−1; sample depth, 9.0 mm; spray interface temperature, 2 °C; sample flow rate, 400 µL min−1. The sampler and skimmer cones were nickel. Additionally, to obtain the best sensitivity (115In, 25.000 cps/ppb) and minimal formation of oxides (156CeO+/140Ce+ < 1%) and doubly charged ions (138Ba2+/138Ba+ < 3%), the source parameters (gas flows and ion lens voltages) were optimised as recommended by Thermo Scientific. An external calibration curve method was used for the quantitation of all elements in the samples. The instrument was run under a linear multipoint calibration over 0.05-50 µg L−1, and analyses were carried out in duplicate to reduce random error.

Method validation
Instrument detection limits (IDLs) were calculated from the intensity signals of the standard and the blank (ultrapure water containing 2% nitric acid) as $\mathrm{IDL} = 3\,SD_{\mathrm{blank}} \times C_x / (S_x - S_{\mathrm{blank}})$, where $SD_{\mathrm{blank}}$ was the standard deviation of the intensity of the blank samples, $C_x$ was the concentration of the standard sample, and $S_x$ and $S_{\mathrm{blank}}$ were the signals of the standard at $C_x$ and of the blank, respectively. Limits of detection (LODs) were calculated as $\mathrm{LOD} = \mathrm{IDL} \times \text{constant volume} / \text{sample weight}$. The NMIJ CRM7502-a sample was digested according to the optimised method to verify the sample preparation procedure. The LOD, limit of quantitation (LOQ), precision (RSD%) and accuracy (% recovery) of 13 elements in the NMIJ CRM7502-a sample are listed in Table 1.
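As an illustration of the two detection-limit formulas, the short Python sketch below evaluates them for one element. The blank intensities and the standard signal are hypothetical numbers chosen only to show the arithmetic; the 25 mL final volume and 0.25 g sample mass are those of the rice digestion described earlier. The function names are illustrative and are not taken from the study.

```python
from statistics import mean, stdev


def instrument_detection_limit(blank_signals, std_signal, std_conc):
    """IDL = 3 * SD_blank * C_x / (S_x - S_blank), in the units of std_conc."""
    sd_blank = stdev(blank_signals)   # sample standard deviation of blank intensities
    s_blank = mean(blank_signals)     # mean blank signal
    return 3 * sd_blank * std_conc / (std_signal - s_blank)


def method_lod(idl_ug_per_l, final_volume_ml, sample_mass_g):
    """LOD = IDL * constant volume / sample weight.

    With the IDL in ug/L, the volume in mL and the mass in g,
    the result is in ng/g (ug/L * mL = ng).
    """
    return idl_ug_per_l * final_volume_ml / sample_mass_g


# Hypothetical blank intensities (cps) and a hypothetical 1 ug/L standard signal.
blanks = [100.0, 110.0, 90.0, 105.0, 95.0]
idl = instrument_detection_limit(blanks, std_signal=25100.0, std_conc=1.0)

# 25 mL final volume and 0.25 g of rice, as in the digestion procedure.
lod = method_lod(idl, final_volume_ml=25.0, sample_mass_g=0.25)

print(f"IDL = {idl:.6f} ug/L, LOD = {lod:.4f} ng/g")
```

For the rice procedure the dilution factor is 25 mL / 0.25 g = 100, so the LOD in ng g−1 is numerically 100 times the IDL in µg L−1.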
Similarly, the CRM-soil was also digested according to the EPA 3051A Method to assess the accuracy of the sample digestion procedure. For water samples, repeatability and recovery were measured by adding standard solutions at three levels of concentrations to ultra-pure water samples before pretreating. Finally, all experiments were carried out in triplicate to determine the standard deviation.

Data analysis
Initially, analysis of variance (ANOVA) was used to evaluate the differences in element contents among rice from different regions. Furthermore, the multivariate statistical analysis of the data obtained from ICP-MS was performed using Origin 2018 pro software (OriginLab Corporation, Northampton, MA, USA). It should be noted that all explanatory variables were centred and scaled by standard normalisation (mean = 0, standard deviation = 1) before PCA, LDA and PLS-DA.
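The two preprocessing steps named here, per-element ANOVA screening and standard normalisation, can be sketched in plain Python as follows. This is a minimal illustration on made-up numbers, not the Origin workflow used in the study; it assumes the sample standard deviation for scaling and returns only the F statistic (a p-value additionally needs the F distribution, which `scipy.stats.f_oneway` provides, for example).

```python
from statistics import mean, stdev


def autoscale(values):
    """Centre to mean 0 and scale to unit (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of floats)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (gm - grand_mean) ** 2
                     for g, gm in zip(groups, group_means))
    ss_within = sum((v - gm) ** 2
                    for g, gm in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical concentrations of one element in three regions.
regions = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = anova_f(regions)
scaled = autoscale([1.0, 2.0, 3.0])

print(f_stat, scaled)
```

An element would be kept as a candidate marker when the p-value associated with its F statistic falls below 0.05.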
The principal component analysis (PCA) algorithm aims to reduce, and subsequently determine, the dimensionality of the common factor space. For partial least squares discriminant analysis (PLS-DA) and linear discriminant analysis (LDA), the rice sample data were divided into training and validation sets. The training set comprised 224 Sengcu rice samples (108 from MV, 44 from BQ and 72 from BX) and 71 other rice samples (23 from the Northern, 25 from the Central and 23 from the Southern regions of Vietnam). The validation sets comprised 35 Sengcu rice samples and 15 samples of other kinds of rice. PLS-DA and LDA were carried out to evaluate whether rice samples from different regions could be mathematically distinguished based on the elements that differed significantly among the regions.
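The set sizes above are consistent with the 259 Sengcu and 86 other rice samples collected (224 + 35 and 71 + 15). The text does not state how samples were assigned to each set; the sketch below assumes a simple seeded random hold-out, with placeholder sample identifiers, purely to illustrate the bookkeeping.

```python
import random


def hold_out(samples, n_validation, seed=0):
    """Randomly hold out n_validation samples; return (training, validation)."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]


# Placeholder identifiers standing in for the real samples.
sengcu = [f"SC{i:03d}" for i in range(259)]
other = [f"OT{i:03d}" for i in range(86)]

sc_train, sc_val = hold_out(sengcu, 35)
ot_train, ot_val = hold_out(other, 15)

print(len(sc_train), len(sc_val), len(ot_train), len(ot_val))
```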

Characteristic multi-element profiles
Mineral compositions in Sengcu and other rice samples were measured by ICP-MS. Average concentrations and standard deviations of 44 elements in rice samples (2019-2020) are given in Table 2. In the present study, the 44 elements were classified into three groups, namely macro-elements (Mg, Ca), micro-elements (Al, Cu, Fe, Zn, Mn) and others, depending on their level in rice. In terms of macro-elements, Mg was the most abundant element, ranging from 531 to 2,140 μg g−1, which accounted for approximately 86% of the total content of the 44 elements in rice samples. Noticeably, concentrations of Mg and Ca in Sengcu rice were higher than in the other rice. The average concentration of Mg in Sengcu rice, which ranged from 1,636 to 2,138 μg g−1, was nearly three times higher than that in rice from the Northern, Central and Southern regions of Vietnam. Similarly, the mean content of Ca in Sengcu rice was almost 150 μg g−1, whereas it fluctuated from 49.4 to 82.7 μg g−1 in other rice. The concentrations of Ca and Mg in Sengcu rice were higher than those reported in previous studies on white rice from other Asian countries such as China (80-100 μg g−1; 289-318 μg g−1 for Wuchang rice; Qian et al. 2019b), Thailand (7-150; 500-800 μg g−1 for Thai Jasmine rice; Cheajesadagul et al. 2013; Kukusamude and Kongsri 2018), Korea (16.1-118; 1008 μg g−1; Chung et al. 2018) and the Philippines (21.9-96.4; 1158 μg g−1; Chung et al. 2015, 2018). Meanwhile, the values for the other kinds of rice from Vietnam were similar to those from other Asian countries (Chung et al. 2018; Liu et al. 2019; Qian et al. 2019b). These results could be explained by the soil in which the rice was cultivated in Lao Cai province, which is known as an apatite ore mining area (of the fluorapatite Ca5(PO4)3F and carbonate-fluorapatite Ca5([PO4]/[CO3])3F types) in Vietnam. Furthermore, the topography of the northwest of Vietnam is dominated by limestone mountains, whose main components are CaCO3 and MgCO3. Therefore, this was possibly the main reason for the highest contents of Mg and Ca being found in Sengcu rice, compared to rice from other cultivation regions.
The concentrations of micro-elements in Sengcu and other rice samples from different regions in Vietnam are given in Table 2. Generally, the total concentration of micro-elements accounted for around 6.3% of the total element concentration, and Cu was detected in all rice samples with mean values ranging from 1.8 to 4.2 μg g−1. Besides, the average concentration of micro-elements in Sengcu rice was higher than in other rice. Regarding rubidium, the mean values in Sengcu rice ranged from 10.0 to 14.2 μg g−1, which was two times higher than in the other kinds of rice.
Average concentrations of Al, Mn, Fe and Zn in Sengcu rice were 6.7-14.0, 16.0-20.9, 21.3-28.3 and 23.5-35.7 μg g−1, respectively, compared to 3.69-5.58, 13.1-16.0, 12.2-16.7 and 13.2-28.3 μg g−1 in other types of rice. The significant difference in the level of micro-elements possibly resulted from soil properties in the Lao Cai area, which contains a high content of chalcopyrite ore (CuFeS2). Concentrations of micro-elements in BX were the lowest compared to those in the other fields, except for Zn. The same pattern was found in the soil and water samples (Tables S4 and S5), indicating that soil and water are the two decisive factors for the mineral element content in rice grains. Concentrations of Mn, Fe and Rb in Sengcu rice were higher than those in rice from China, Japan and Thailand (Chung et al. 2018; Kukusamude and Kongsri 2018; Qian et al. 2019b). This could be the basis for distinguishing Sengcu rice from premium rice varieties in other countries. In terms of trace elements, concentrations of Cr, Ni, As, Sr, Ba and Pb are summarised in Table 2. In Sengcu rice, the average concentration of Cr was the highest in MV (nearly 520 ng g−1) and the lowest in BX (approximately 122 ng g−1). The amount of Ni in rice samples varied from 122 to 710 ng g−1, and there was a significant difference in Ni content between Sengcu and other rice. The average concentration of As in BX samples was the highest, while it ranged from 95.9 to 256 ng g−1 in other rice. Similarly, Sr and Ba were quantified in all rice samples, ranging from 407 to 522 ng g−1 and from 505 ng g−1 to 1.2 μg g−1, respectively, except in BX samples, which contained only 113 ng g−1 and 256 ng g−1, respectively. Moreover, the content of Pb in Sengcu rice was in the range of 110-252 ng g−1, which was higher than in the other kinds of rice (34 to 90 ng g−1). The concentrations of Pb and As in Sengcu and other rice from Vietnam were lower than the maximum limits in rice prescribed by the European Commission (2006), China (2012) and the Codex Alimentarius Commission (1995).
Table 2 (fragment: only the final rows and the unit footnote are present; column headers and the label of the first row are missing)
(label missing)  1.9 ± 0.9  1.0 ± 0.5  2.5 ± 1.4  2.3 ± 0.9  5.0 ± 4.5  5.0 ± 1.1
Th  6.9 ± 2.5  6.2 ± 1.8  6.0 ± 2.1  5.1 ± 1.5  5.9 ± 2.1  4.3 ± 0.4
U  2.1 ± 1.0  2.0 ± 0.8  1.6 ± 1.0  2.2 ± 2.1  4.5 ± 7.0  0.9 ± 0.3
a: μg g−1
In recent years, numerous studies have reported rare earth elements as key markers for authenticating geographical origin in agriculture and food. In this study, these elements were also measured in all rice samples and the results are summarised in Table 2. Among the rare earth elements, Ce had the highest content in all rice samples, with mean concentrations ranging from 5.6 to 20.6 ng g−1. Meanwhile, concentrations of La and of the other rare earth elements varied over 3.3-8.2 ng g−1 and 0.1-2.0 ng g−1, respectively. Notably, the ANOVA test showed no significant difference in rare earth element contents among the rice samples, except for Ce and La.

Stability of elements in Sengcu rice
The fluctuation of mineral element content in Sengcu rice from the MV, BQ and BX areas was evaluated from 2019 to 2020 and the results are shown in Table S6. Overall, the concentrations of macro-, micro- and trace elements differed only slightly. The results of the ANOVA test, summarised in Table S7, indicated that there was no statistically significant difference among the concentrations of macro-, micro- and other elements (except rare earth elements) in rice grains collected from three consecutive crops (p-value > 0.05, at a confidence level of 95%).
On the one hand, the average concentrations of the macro-elements (Mg and Ca) fluctuated by around 100 and 10 μg g−1, respectively, which accounted for just below 6.0% of the total elemental content in rice. The corresponding proportions for other elements such as Li, Al, Sc, V, Mn, Ni, Fe, Cu, Zn, Rb, Sr and Ba were less than 10%. On the other hand, the contents of Cr, Co, As, Se, Ag, Cd, Pb and rare earth elements fluctuated by 14 to 23%. The variation in the content of the mineral elements in rice can be explained by the fact that environmental conditions, cultivation routines, irrigation practices, agricultural practices (i.e. fertilisation conditions) and sample collection can feasibly change the multi-elemental pattern in rice products (Qian et al. 2019a, 2019b). However, it is worth noting that Sengcu rice was mainly cultivated by ethnic minorities such as the Giay, Giao, Nung and Tay. Thus, the use of plant protection chemicals and chemical fertilisers is minimal, which reduces the effects of agricultural practices on discriminating the geographical origin of Sengcu rice in Lao Cai (Qian et al. 2019a). Furthermore, these changes or variations were smaller than the differences caused by the production locations of the rice products when appropriate elements were selected for monitoring (Chung et al. 2015). To find good elemental markers for determining the geographical origin of rice, the results of this study were further evaluated using statistical analyses, namely an unsupervised method (PCA) and supervised methods (LDA and PLS-DA).

Principal component analysis
PCA is helpful for explaining the total variance of the observed variables and for reducing, and subsequently determining, the dimensionality of the common factor space. First of all, marker elements, which may differ among the geographical origins of the Sengcu rice samples, were derived from the 44 elements by the ANOVA test. These were Mg, Al, Ca, Sc, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Rb, Sr, Cd, Cs, Ba and Pb, so 20 out of 44 elements had significant differences among the three areas in Lao Cai, with p-value < 0.05 at a confidence level of 95% (Table S8). The results of PCA (Table S9) indicated that the four components with an eigenvalue larger than 1 explained only approximately 57.5% of the variance of the data set. The first two principal components (PCs) explained most of this variance, with PC1 explaining 31.2% and PC2 explaining 11.9% of the geochemical variance. From Table S10, it is evident that PC1 was significantly governed by As, Ba, Sr, V, Ni, Cr and Al, with absolute loading values of 0.346, 0.344, 0.316, 0.310, 0.294, 0.279 and 0.270, respectively. Moreover, Pb, Cd, Ca, Se, Zn and Co mainly influenced PC2. PC3 had high loadings for Mg, Fe, Rb and Mn. PC4 was mainly determined by Sc and Cd, while PC5 showed the highest loading for Cs.
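The eigenvalue-greater-than-1 rule and the cumulative variance can be illustrated with a few lines of Python. The sketch assumes that PCA was run on the 20 standardised variables, so that the eigenvalues sum to 20 and each percentage of variance corresponds to an eigenvalue of that fraction of 20 (6.24 for PC1 and 2.38 for PC2). The remaining eigenvalues below are hypothetical, chosen only so that four components exceed 1 and together explain 57.5%; the actual values are in Table S9.

```python
def kaiser_summary(eigenvalues):
    """Return (number of PCs with eigenvalue > 1, variance ratios, cumulative ratio)."""
    total = sum(eigenvalues)
    ratios = [ev / total for ev in eigenvalues]
    retained = [ev for ev in eigenvalues if ev > 1.0]
    return len(retained), ratios, sum(retained) / total


# PC1 and PC2 follow from the reported 31.2% and 11.9% of 20 variables;
# all other eigenvalues are hypothetical placeholders.
eigenvalues = [6.24, 2.38, 1.60, 1.28] + [8.5 / 16] * 16

n_pc, ratios, cumulative = kaiser_summary(eigenvalues)
print(n_pc, round(ratios[0], 3), round(ratios[1], 3), round(cumulative, 3))
```

A cumulative share well below 80% means that a model built on the retained component scores alone would discard a large part of the information in the original variables.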
A scatter plot based on the first two PCs is given in Figure 2, classifying the samples from the three fields where Sengcu rice is cultivated in Lao Cai province; only Sengcu rice from BX could be separated. In contrast, rice samples collected from MV and BQ overlapped and were difficult to separate, because the distance between MV and BQ is only 10 km. PC1 divided the rice samples into two groups, BX and MV+BQ, while PC2 helped to separate the rice from MV and BQ. Noticeably, rice from BX has higher concentrations of elements than that from MV and BQ because the terrain of BX is very flat, whereas MV and BQ consist mainly of valleys and terraces.
Secondly, to distinguish between Sengcu and other types of rice from different provinces in Vietnam, a significant difference was found in the content of 23 elements (Mg, Al, Ca, Sc, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Rb, Sr, Cd, Cs, Ba, Pb, Ag, La, Ce) according to the ANOVA test for each element (p-value < 0.05, at a confidence level of 95%; Table S11), and these elements were used as input data for the PCA model. The PCA results (Tables S12 and S13) indicated that the first five components, with eigenvalues greater than 1, explained a cumulative variance of nearly 64% of the data set. The proportions of variance of PC1 and PC2 were each close to 20%, approximately three times higher than those of the next three components (PC3-PC5). From Table S13, Al, Ca, Fe, Mg, Cr, Ag and Ce were determined as the main factors in PC1, with absolute loading values of around 0.3. In contrast, PC2 was governed significantly by Sr, Ni, As and Ba, while PC3 had high loading values for Co, Cd and Pb, and the highest loadings in PC4 and PC5 were for Cr and Cs, respectively. Figure 3 illustrates that rice samples collected from the three regions of Vietnam could be differentiated from Sengcu rice by the PCA model, in which PC1 explained 24.4% and PC2 explained 19% of the variance. Nevertheless, the samples of the three regions were mixed and overlapped. Since the samples spanned a wide range of sources in terms of area properties, elevation, longitude and latitude, it is difficult to distinguish the other regions of Vietnam from one another by PC2.
According to the above results, regression algorithms were necessary to construct discriminant models that identify the geographical origin of rice accurately. Hence, LDA and PLS-DA were carried out in the following sections to obtain accurate classification rates of rice samples. Moreover, because the cumulative variance of the first five components was only approximately 60% (<80%) for both data sets, the results of PCA were not used as input data for LDA and PLS-DA.

Authentication by LDA modelling
LDA is a classification technique that aims to maximise the ratio of between-class variance to within-class variance in order to achieve maximum separability. In this study, the predictive model for separating the three areas cultivating Sengcu rice was constructed based on 20 elements. The results of LDA are shown in Table S14 and indicate that discriminant function 1 (F1) accounted for 83.4% of the total variance, about five times as much as discriminant function 2 (F2). The eigenvalue of F1 was 16.8, compared with 3.3 for F2. Wilks' lambda was used as a criterion for evaluating the statistical significance of each discriminant function (Maione and Barbosa 2019; Wadood et al. 2020), where λ-values range from 1 (no discrimination) to 0 (total discrimination). The values were 0.0129 and 0.230 for F1 and F2, respectively, and hence were considerably small. Combined with p-values close to 0, these parameters indicated good separation among the Sengcu rice cultivation regions in Lao Cai (Table S15). The separation of Sengcu rice from the MV, BQ and BX fields according to the first two discriminant functions is shown in Figure 4a, which clearly shows that function 1 provided the main key to separating Sengcu rice of MV from that of BX, and function 2 provided the discrimination between rice from MV and BQ. Furthermore, 107 out of 108 MV samples were correctly classified, an accuracy of 99.1% in the original test, whereas 100% accuracy was observed for both the BQ and BX samples. Remarkably, a correct rate of 100% was reached for the model validation samples.
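The link between the eigenvalues and the Wilks' lambda values quoted above can be checked with the standard relation Λ = ∏ 1/(1 + eigenvalue), taken over the discriminant functions being tested. The sketch below applies it to the two reported eigenvalues; it is an illustrative check, not the output of the software used in the study, and small differences from the reported figures are presumably due to rounding of the eigenvalues.

```python
from math import prod


def wilks_lambda(eigenvalues, first=0):
    """Wilks' lambda for discriminant functions first..last: prod(1 / (1 + eigenvalue))."""
    return prod(1.0 / (1.0 + ev) for ev in eigenvalues[first:])


eigenvalues = [16.8, 3.3]   # F1 and F2, as reported in the text

lambda_f1_to_f2 = wilks_lambda(eigenvalues, first=0)   # functions 1 through 2
lambda_f2 = wilks_lambda(eigenvalues, first=1)         # function 2 alone
share_f1 = eigenvalues[0] / sum(eigenvalues)           # variance share of F1

print(round(lambda_f1_to_f2, 4), round(lambda_f2, 4), round(share_f1, 3))
```

The computed values, roughly 0.013 and 0.233, agree with the reported 0.0129 and 0.230, and the variance share of F1, about 0.84, agrees with the reported 83.4%.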
Similarly, an LDA model was constructed based on 23 elements to evaluate the ability to distinguish between Sengcu and other rice varieties in Vietnam. The cumulative proportion of variance of the first two functions was approximately 85% (Table S16), where discriminant function 1 (F1') explained nearly 49% of the variance compared to 36% for discriminant function 2 (F2'). The Wilks' lambda values of the first two functions were approximately 0 and 0.011, respectively. The p-value was less than 0.01 (Table S17), which confirmed a good distinction between Sengcu rice and rice from other regions of Vietnam.
The scatter plot of rice from different regions based on the two discriminant functions is shown in Figure 4b. Generally, F1' plays the major role in distinguishing Sengcu rice from other types of rice. In contrast, discrimination among the rice cultivation regions was provided by F2'. The high accuracy of the model for Sengcu rice and for the Northern and Southern areas (discriminant accuracy above 90%) was determined by the original test in Table 3. However, the figure for the Central region was 80%, indicating that rice from this region was separated less accurately than rice from the other regions. A possible explanation might be the large and complex topography of the central region of Vietnam.
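The correct identification rates reported for the LDA models are per-class accuracies, that is, the share of samples of each true class that the model assigns to that class. A minimal Python sketch is given below using the training counts of the Sengcu model (108 MV, 44 BQ and 72 BX samples, with one MV sample misclassified). The class to which the misclassified MV sample was assigned is not stated for LDA, so BQ is used here as a placeholder.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Percentage of samples of each true class that were predicted correctly."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: 100.0 * hits[cls] / totals[cls] for cls in totals}


true = ["MV"] * 108 + ["BQ"] * 44 + ["BX"] * 72
# One MV sample misclassified; "BQ" as its predicted class is a placeholder.
pred = ["MV"] * 107 + ["BQ"] + ["BQ"] * 44 + ["BX"] * 72

accuracy = per_class_accuracy(true, pred)
overall = 100.0 * sum(t == p for t, p in zip(true, pred)) / len(true)

print({cls: round(a, 1) for cls, a in accuracy.items()}, round(overall, 1))
```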

Authentication by PLS-DA modelling
In recent years, PLS-DA has become known as a powerful algorithm for discriminating geographical origin. Instead of finding hyperplanes of maximum variance between the response and independent variables, a linear regression model is constructed by projecting the predicted variables and the observable variables to a new space. Hence, given this difference in nature, it is expected to have a better classification capacity than the LDA method. In the same way as for LDA, two models were constructed, based on 20 and 23 elements, for the geographical traceability of Sengcu rice and for distinguishing Sengcu rice from other kinds of rice, respectively. The results of the first model are shown in Figure 5a and Tables S18 and S19. Ten elements (Mg, Al, V, Cr, Mn, Ni, As, Sr, Cs, Ba), for which the variable importance in the projection (VIP) values were higher than 0.8, were found to be significant for creating a discrimination model for determining the geographical origin of Sengcu rice grains. Noticeably, with a VIP value of nearly 1.8, As had the highest value among the 20 elements, so As is indicated as the most relevant element for discriminating the geographical origin of Sengcu rice.
Verify: distinguishing between Sengcu and other types of rice; Geographical: geographical traceability of Sengcu rice and other rice.
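The VIP screening can be sketched with the commonly used definition VIP_j = sqrt(p · Σ_a SS_a (w_ja / ‖w_a‖)² / Σ_a SS_a), where p is the number of variables, w_a the PLS weight vector of component a and SS_a the response sum of squares explained by that component. The text does not state which variant the software applied, so this formula is an assumption, and the weights and sums of squares below are made-up numbers for four variables and two components.

```python
from math import sqrt


def vip_scores(weights, ss_explained):
    """VIP per variable.

    weights[a][j]   : PLS weight of variable j on component a
    ss_explained[a] : response sum of squares explained by component a
    """
    p = len(weights[0])
    total_ss = sum(ss_explained)
    scores = []
    for j in range(p):
        acc = 0.0
        for w_a, ss_a in zip(weights, ss_explained):
            norm_sq = sum(w * w for w in w_a)       # squared length of w_a
            acc += ss_a * w_a[j] ** 2 / norm_sq
        scores.append(sqrt(p * acc / total_ss))
    return scores


# Hypothetical weights (2 components x 4 variables) and explained sums of squares.
weights = [[0.8, 0.5, 0.3, 0.1],
           [0.2, 0.6, 0.7, 0.3]]
ss_explained = [6.0, 2.0]

vip = vip_scores(weights, ss_explained)
selected = [j for j, v in enumerate(vip) if v > 0.8]   # threshold used in the text

print([round(v, 3) for v in vip], selected)
```

By construction the squared VIP values average to 1, which is why thresholds just below 1, such as the 0.8 used here, are a common choice for retaining variables.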
The cross-validation discriminant accuracy of the training set was up to 100% for BQ and BX, while 1.9% of the MV samples were misclassified as BQ and none as BX. For the validation set, the correct rate for MV was 86.7%, while the accuracy for BQ and BX rice samples was 100%. These results showed that MV rice could be accurately differentiated from BQ and BX rice using a PLS-DA model (Figure 5b).
The PLS-DA method has been used in a variety of recent studies to verify the geographical origin of rice over much larger geographical distances than in this study. However, the accuracy in those studies was lower, about 90%, owing to the use of PCA results as input data with a cumulative variance of the PCs below 80%. In this study, the use of all data has shown advantages in geographical traceability, with an accuracy of up to nearly 100% over a small range. Besides, the influence of year-to-year elemental fluctuations is also restrained, which avoids false results.
The second model was used to distinguish Sengcu from other kinds of rice in Vietnam. Fourteen elements (Mg, Al, Ca, Sc, V, Cr, Mn, Fe, As, Rb, Ag, Cs, Ba, Ce) had a VIP value above 0.8 (Figure 6a and Tables S20 and S21). The main elements responsible for the discrimination among rice samples, such as Ag, Ca, Al and As, exhibited VIP scores of nearly 1.5. Moreover, the total cumulative variance of the first two factors was 86.1% (>80%), indicating that the obtained model can be used to separate rice from different regions with high accuracy. In the two-dimensional PLS-DA scores and loadings plot presented in Figure 6b, clear clustering was observed between sample groups. Both the training and validation set samples were used to validate the accuracy of this model, and the results are shown in Table 4. For distinguishing Sengcu rice from other types of rice, the accuracy was 99.1% in the training data set and 100% in the validation data set. It can be seen that the accuracies of the LDA and PLS-DA methods for distinguishing Sengcu from other types of rice are similar. Nevertheless, concerning geographical traceability, the correct rates for Sengcu samples ranged from approximately 72.7 to 93.1%, which is higher than those for the other rice samples from Northern, Central and Southern Vietnam (from 52.0 to 69.6%). Moreover, there was no misclassification between Sengcu and other kinds of rice. Therefore, this model is particularly suitable for distinguishing Sengcu rice from other kinds of rice, and can also be used for the geographical traceability of Sengcu rice, but with lower accuracy than the first model. Compared with the LDA method, PLS-DA has lower accuracy for the traceability of Sengcu rice, as it does not distinguish one rice from the others. Hence, the LDA model has been shown to possess advantages over the PLS-DA algorithm.

Conclusions
The combination of ICP-MS and multivariate analysis has been demonstrated to be useful for the geographical traceability of Sengcu rice and for distinguishing Sengcu from other rice in Vietnam. According to the applied PCA method, As, Ba, Sr, Pb, Se and Ca were good indicators for identifying the origin of Sengcu rice from the Muong-Vi, Ban-Qua and Ban-Xen fields in Lao Cai. Meanwhile, Al, Ca, Fe, Mg, Ag and As were the major marker elements for discriminating between Sengcu and other types of rice from the northern, central and southern regions of Vietnam. Both the LDA and PLS-DA models gave approximately 100% accuracy in separating Sengcu from other types of rice. However, for the geographical traceability of Sengcu rice, LDA showed more favourable performance than PLS-DA owing to its advantage in maximising the ratio of between-class to within-class distance. To sum up, all results confirmed that multi-element analysis is a promising method, suitable for identifying Sengcu among other types of rice as well as for the geographical traceability of Sengcu rice. These findings will improve product value, protect brands, serve the interests of consumers and prevent commercial fraud. This study also shows the potential application of these techniques to other agricultural products of high economic value.
Verify: distinguishing between Sengcu and other types of rice; Geographical: geographical traceability of Sengcu rice and other kinds of rice.