Chipless RFID Tag Implementation and Machine-Learning Workflow for Robust Identification

In this work, we describe a complete step-by-step workflow to apply machine-learning (ML) classification for chipless radio-frequency identification (RFID) tag identification, covering: 1) the tag implementation criteria for circular ring resonator (CRR) and square ring resonator (SRR) arrays for ML interoperability; 2) the data collection procedure to get a sufficiently representative dataset of real measurements; 3) the ML techniques to visualize the data and reduce its dimensionality; 4) the evaluation of the ML classifier to ensure high-accuracy predictions on new measurements; and 5) a thresholding scheme to increase the certainty of the predictions. The differences in the tags' frequency responses are maximized by optimizing the Hamming distance between the tag identifiers (IDs) and by controlling each resonator array's radar cross section (RCS) level. We show that the proposed workflow achieves perfect accuracy for the identification of four tags at a fixed distance of 160 cm. We also evaluate the performance of the proposed workflow to identify up to 16 tags within a flexible range (up to 140 cm), showcasing the tradeoff between the number of tags that can be correctly classified and the reading range.


I. INTRODUCTION
CHIPLESS radio-frequency identification (RFID) is a fully passive technology based on the backscattered modulation of electromagnetic (EM) signals [1]. Its integration with the Internet of Things (IoT) has attracted attention in many domains, such as agriculture or retail, due to its low cost, flexibility, versatility, and sensing capabilities; it is expected to eventually replace barcode technology [2].

Fig. 1. Diagram of a chipless RFID system. The antenna, connected to a reader device, the vector network analyzer (VNA), transmits (Tx) an interrogation signal that is backscattered by the chipless tag and received (Rx) at the antenna.
As observed in Fig. 1, a reader device transmits an interrogation signal that is backscattered by a chipless RFID tag or transponder placed on an object. The tag embeds information about the identity/state of the object in the backscattered signal without the need for a microchip [3]. Because chipless tags contain no electronics, they can operate in harsh environments (e.g., high temperature, humidity, etc.) [4] at a very low price (<$0.01) [5]. However, for the same reason, they can only encode a limited number of bits and transmit very low backscattered power over a short reading range [1]. In addition, the signals are strongly affected by multipath, the distance from the tag to the reader antenna, and the tag's orientation with respect to the incident wave, among other factors. That is why a high-performance reader is essential to interrogate chipless tags, sample and process the backscattered signal, extract the tag's identifier (ID) and sensing information, and transfer it to the final application for further processing [6].
Current research efforts on chipless RFID systems have mainly focused on improving the performance of the tags [7], [8] with specific and complex advanced signal-processing algorithms [9], [10]. One drawback of this approach is that the algorithms have to be adjusted to the specific characteristics of the considered chipless tag and the encoding method used [11], making their usage cumbersome. In this regard, machine learning (ML), which generally follows a data-driven approach, has proven to offer superior classification capabilities to methods that rely on specialized domain knowledge for a myriad of applications [12]. Nevertheless, the use of ML for chipless RFID identification and authentication has barely been studied or reported in detail (see Section II). Moreover, the few works that consider ML to analyze chipless RFID signals are very selective and focus on a certain application and type of tag [13], [14].

0018-9480 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
In this work, a complete procedure to use ML as a generalized and effective chipless RFID identification technique is described, which would allow for more robust chipless RFID system implementations in real-world applications. Given a set of tags, each with a different response, the goal of the ML model is to learn to discriminate measurements coming from each tag. For a newly received signal, the model can correctly identify which of the tags (among the possible ones) was used to generate it, that is, it can make a correct prediction. This reduces to a multiclass classification problem, in which a measurement is classified as coming from one of the possible tags (classes in the ML literature). Hence, in the following, the terms "identification" and "classification" are used interchangeably. The proposed workflow focuses not only on the ML learning part, but also on the ID and array size selection criteria for the chipless RFID tags.
In the present study, we follow the proposed chipless tag implementation criteria to design a set of four circular ring resonator (CRR) arrays that are used to evaluate the performance of the proposed workflow at a fixed distance of 160 cm, showing perfect accuracy. We then enlarge the number of implemented tags to 16, combining both CRR and square ring resonator (SRR) arrays. These tags are used to study the tradeoff between the distance to the reader and the number of tags that can be correctly classified, and to show that the proposed ML workflow can correctly classify tags with different topologies. For the analysis, we consider flexible distance ranges within the 50-140-cm range.
The rest of the article is structured as follows. Section II includes a detailed review of the state-of-the-art ML implementations for chipless applications and compares this work with previous research. Section III describes the proposed chipless tag implementation criteria followed to implement sets of 4 and 16 chipless tags in consistency with the proposed ML strategy, as well as the whole workflow to preprocess the measurement data and train an ML model. Section IV assesses the performance of the proposed workflow with two different experiments: 1) identification of four tags at a fixed range of 160 cm and 2) identification of up to 16 tags in variable intervals within a flexible range of 50-140 cm. Finally, Section V summarizes the main contributions of this work.

II. STATE-OF-THE-ART: CHIPLESS RFID AND ML
Recent works have shown that ML can be an effective solution for chipless RFID tag identification and authentication. In fact, ML comes as an alternative to other complex signal-processing algorithms such as time gating, the continuous wavelet transform (CWT) [17], and matched filtering [18]. Furthermore, it can eventually overcome the two-measurement background-subtraction calibration method that is currently required for backscattered-signal measurements [16].
Nevertheless, the available literature on the use of ML for chipless detection is currently limited. A selection of the existing contributions is presented in Table I and commented on below.
Nastasiu et al. [13] propose a neural network (NN) with two fully connected layers, one dropout layer, and a softmax output layer for the authentication of 18 tags with the same design. The NN is designed to identify the randomness introduced by the fabrication process of these tags, which operate in the V-band (65-72 GHz). The initial dataset consists of ten measurements of each tag (180 measurements in total) located at a fixed distance of 15 cm from the reader antenna. The dataset is then extended to 3600 instances using a data augmentation process [19] consisting of adding white noise to the original measurements. Using two-thirds of this extended dataset for training and the remaining one-third for testing, the model achieves a recognition rate of 100%.
De Amorim et al. [15] measure 19 different tags (ten measurements per tag) with the same design in the V-band (57-64 GHz) for authentication purposes and preprocess the data by computing the intratag (measurements of the same tag) and intertag (one tag versus the others) cosine similarities [20]. These two groups of cosine similarities are used to define a threshold that minimizes the number of false positives and false negatives. This is done by computing the cumulative mass functions of the probabilities of the false positives and the false negatives. The model achieves an accuracy of 83% with a bi-static setup, time gating, and the tags placed 17 cm from the antennas.
Jeong et al. [16] use a support vector machine (SVM) model for the identification of four different tags. The dataset is composed of 816 measurements of the magnitude and phase response of the four depolarizing tags (204 measurements per tag) between 1 and 10 GHz in a calibrated bi-static setup (the distance between the tags and the antenna is not specified). The dataset is split in an 80/20 ratio for training and testing, respectively. SVM with a linear kernel achieves the best result, 99.3% accuracy.
The reported literature proves that ML is a suitable but still emerging tool for chipless tag identification (in the 1-10-GHz band) and authentication (in the 60-GHz band). At such an early stage of ML application in chipless technology, most of the approaches are limited and specific to a tag or use case. In addition, there is high variability regarding the data collection process and setup. Specifically, the datasets used for the analyses are generally small (fewer than 1000 measurements) or artificially augmented, probably due to the laborious and time-consuming task of manual data acquisition. Data is key for ML, as the algorithms learn from the available training data [21]. If not enough training data is available, or if the data does not represent well the possible variations in the measurements (in this case, the backscattered response of the tags), the models will not correctly generalize to unseen data. The amount of data will also constrain the choice of the ML model, as more complex models require more data to train. Hence, it comes as no surprise that most works use an SVM model for classification, as it performs well on small datasets [15], [16]. Only the works that rely on artificially generated data use more complex models, such as NNs. Furthermore, in the existing studies, the data are generally collected in a bi-static setup and in a controlled environment that does not resemble real-world conditions. There is also a lack of evaluation of the tradeoff between the distance to the reader and the number of tags that can be correctly classified. Moreover, no studies about the dimensionality of the collected data and how this parameter affects the classification accuracy on new instances have been reported. Finally, even though the tag design has a direct effect on the backscattered signals, no implementation criteria have been established.
In order to overcome the limitations of previous studies and set a precedent for the convergence between chipless RFID technology and ML, the aims of this work are as follows.
1) Define tag ID and size selection criteria to improve the performance of classification algorithms at varying distances from the antenna.
2) Propose and describe a detailed step-by-step procedure to build a complete ML model for chipless tag identification, including: a) how to automate the acquisition system to collect sufficiently representative data and b) techniques to ensure that the model generalizes (performs well on unseen data).
3) Prove that ML algorithms perform satisfactorily in extreme real-world conditions: long range (distance between the tag and the antenna), multipath, and so on.
4) Study the robustness of the system to the number of tags, the use of different tag topologies, and the distance to the antenna.

III. CHIPLESS TAG IMPLEMENTATION CRITERIA AND ML WORKFLOW
A. Chipless Tag Topology
In this study, we first explain in detail the implementation of four chipless tags with symmetrical CRR arrays. We then enlarge the population of tags to 16, including both CRR and SRR arrays, to prove that the implementation methodology can be successfully extended to a higher number of tags and different topologies. CRRs and SRRs were chosen to showcase how ML can boost tag identification, even in extreme conditions and regardless of the tag topology.
CRRs consist of concentric circular rings made from good conductors on a dielectric substrate, whereas SRRs are composed of concentric square rings. Each pair of adjacent rings introduces a stopband resonance in the radar cross section (RCS) spectrum, whose frequency and amplitude vary with the dimensions of and spacing between the rings. According to [15], the RCS of chipless tags must be greater than −40 dB, and the resonance peaks (notches) used to code the ID of the tags should have a depth greater than 10 dB from the maximum level of the RCS [22], [23] in order to be correctly identified with methods not involving ML. An enhancement of the RCS can be obtained by periodically arranging the unit cells (each of the CRRs/SRRs) on the substrate, thereby creating a CRR or an SRR array. The responses of CRR and SRR unit cells are relatively similar, but the curvature of CRRs tends to increase mutual coupling when they are placed in an array, thus decreasing the resultant resonant frequency with respect to that of SRR arrays [24].
All the CRRs and SRRs in this work are milled out of FR-4 laminate. The four CRR tags implemented for the first experiment are designed such that they resonate within the 2-3.5-GHz band, subdivided into the four narrower bands shown in Fig. 2 (2-2.3, 2.3-2.55, 2.55-2.8, and 2.8-3.5 GHz). A notch in a band represents a logic "1" of the ID, and the absence of a notch is a logic "0." Therefore, these will be 4-bit ID tags. The CRR in Fig. 2 has five rings and, therefore, the RCS spectrum shows four resonant frequencies (bits), one per subband.

B. ID and CRR/SRR Array Size Criteria
Due to space constraints, in the following, we focus our discussion on the implementation of the 4-bit CRR arrays used in the first experiment and extrapolate the results to the implementation of the sixteen 5-bit CRR/SRR arrays used in the second experiment. The chipless tags' IDs and array sizes should be selected such that the differences in their backscattered signals, measured under the same conditions, are maximized. As explained below, in order to achieve this objective, we consider IDs that maximize the Hamming distance between them, and we also vary the size of the resonator arrays for different tags.
1) Hamming Distance (d_H): Given two bit strings of the same length, d_H is the minimum number of symbol changes needed to transform one string into the other [25]. For example, the bit sequences "110000" and "101000" have a Hamming distance of 2. In this work, the IDs of the tags were chosen so that the Hamming distance between them is as large as possible. Table II shows the d_H between each pair of designed tags. All of them exhibit a d_H between 2 and 4.
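As an illustration, the Hamming distance between two tag IDs can be computed with a few lines of code (a minimal Python sketch; the original work does not specify an implementation):

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("IDs must have the same length")
    return sum(x != y for x, y in zip(a, b))

# The example from the text: "110000" vs. "101000" differ in two positions.
print(hamming_distance("110000", "101000"))  # 2
```

The same function can be used to verify that a candidate set of IDs meets a minimum pairwise d_H before fabrication.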
2) CRR/SRR Array Size: The higher the number of resonators in a CRR or SRR array, the higher the RCS level of the backscattered signal [26]. Two-dimensional CRR and SRR arrays of different sizes were designed following periodic resonant structure design guidelines [27], verifying with simulations that they change the RCS level of each tag so that its magnitude can be used as a discrimination parameter. The distance between the unit cells of the arrays is such that the coupling does not affect the resonant frequencies of the tags. The proposed tags have been simulated and excited as infinite structures using Floquet ports in CST Design Studio software. A new distance, d_rcs, defined as the distance between array sizes, is introduced to quantify the RCS level difference between tags. Table III specifies the CRR array sizes used for the four tags, as well as the distances between them.
Combinations of d_H and d_rcs make the chipless tags more differentiable from one another, and thus easier for the ML algorithm to discriminate. A priori, the best solution is to choose array configurations that lead to a large d_rcs when d_H is small. Because higher frequencies are more sensitive to multipath and noise in the measurement setup [28], tags with the highest resonance frequency (least significant bit, LSB) and without the lowest resonance frequency (most significant bit, MSB) exhibit instability and poor sharpness. This is compensated by increasing the number of unit cells per array, that is, by increasing their RCS level. The overall distance between tags (denoted by d_H&rcs) combines d_H and d_rcs. The IDs and sizes of the final 4-bit ID tags for the first experiment are shown in Table IV.
The response of each array is simulated in CST Design Studio by adjusting the key parameters summarized in Table V, where h is the thickness of the FR-4 substrate. The remaining parameters are illustrated in the inset of Fig. 2.
Following the ID and array size criteria described for the four 4-bit tags, Table VI lists the 5-bit IDs, array sizes, and topologies of the 16 tags implemented for the second experiment. The CRR and SRR topologies were assigned such that there are eight tags of each topology, and each array size has at least one tag of each topology. From now on, these tags are referenced with numbers from 1 to 16 throughout the article.

C. ML Workflow
1) Data Collection: It is fundamental to constitute a sufficiently representative dataset of RCS measurements of the chipless tags. RCS values are measured in a monostatic configuration (a single Tx/Rx antenna). The employed setup is shown in Fig. 3. A QRH11 horn antenna with a typical gain of 9.6 dBi in the considered bandwidth, pointing toward the floor, is connected to an E5071A VNA with an output power of 0 dBm (EIRP = 0.0091 W) and a resolution of 1601 samples per measurement (hereafter referred to as features) in the bandwidth in which the tags are designed.
The VNA is controlled via SCPI commands with MATLAB, which allows us to take continuous measurements and hence efficiently build a large dataset. Each instance of the dataset is a vector whose features are the magnitude values (in dB) of the S11 of the Tx/Rx antenna when irradiating the chipless tags. For each measurement, the ID of the corresponding tag is also stored. A measurement of the environment (i.e., without a tag) is also recorded to allow for background subtraction. An example of the measured S11 responses for the 4-bit CRR arrays is shown in Fig. 4. The final designs of the 4-bit CRR arrays and their corresponding simulated RCS responses are also shown. Additionally, in order to build a sufficiently representative dataset, a broad range of variables that affect the measurements should be considered, such as the distance between the tag and the antenna, the tag alignment with the antenna (axial alignment and inclination), and the presence of clutter-generating elements. In our study, we manually vary these parameters in the setup during the data acquisition process. Regarding the distance between the tag and the antenna, we consider a fixed distance of 160 cm for the first experiment and a 50-140-cm range for the second experiment. We also ensure a minimum distance of 10 cm between the floor and the tag, as even the presence of the floor introduces noise.
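The acquisition itself is driven from MATLAB in the original setup. As an illustrative Python sketch only, the assembly of a labelled dataset with background subtraction could look as follows. The sweeps here are synthetic placeholders, and the pointwise subtraction of dB magnitudes is a simplification: a real calibration would typically subtract the complex S11 values before converting to dB.

```python
import numpy as np

# Assumed dimensionality from the text: 1601 frequency points per sweep.
N_FEATURES = 1601

def subtract_background(s11_tag_db, s11_empty_db):
    """Pointwise subtraction of an empty-room sweep (sketch only, in dB)."""
    s11_tag_db = np.asarray(s11_tag_db)
    s11_empty_db = np.asarray(s11_empty_db)
    assert s11_tag_db.shape == s11_empty_db.shape == (N_FEATURES,)
    return s11_tag_db - s11_empty_db

# Stack labelled sweeps into a dataset matrix X and a label vector y.
rng = np.random.default_rng(0)
background = rng.normal(-40.0, 0.5, N_FEATURES)        # placeholder empty-room sweep
sweeps = [rng.normal(-38.0, 0.5, N_FEATURES) for _ in range(4)]
X = np.stack([subtract_background(s, background) for s in sweeps])
y = np.array([0, 1, 2, 3])                             # tag IDs as class labels
print(X.shape)  # (4, 1601)
```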
2) Preprocessing: Each S11 measurement is composed of 1601 features, a large dimensionality (given the generally limited number of samples) that can lead to the well-known curse of dimensionality: the larger the number of features in the training dataset, the greater the risk of overfitting [29]. Moreover, the small variations between consecutive measured points result in multicollinearity, that is, contiguous features are strongly linearly correlated [30].
To overcome these issues, we perform a data preprocessing step to reduce the dimensionality of the data before it is input into the ML model. Specifically, we employ principal component analysis (PCA) [31], a dimensionality-reduction method that transforms a large set of variables (features) into a smaller one that retains most of the information in the large set. Dimensionality-reduction techniques such as PCA, variational autoencoders (VAEs), or feature extraction, among others, are widely used in the ML field, as they can make the models less complex and more robust [32]. For example, PCA has been used as a feature extraction tool for metal defect characterization using chipless RFID sensor tags [33].

Principal components act as new variables constructed as linear combinations of the initial variables, built in such a way that they are mutually uncorrelated. Thus, projecting the data into the new space generates uncorrelated features, hence resolving the problem of multicollinearity. Moreover, the principal components are ordered so that the first one accounts for the largest possible variance in the dataset; hence, keeping just a few of the first components reduces the dimensionality (and the risk of overfitting) while keeping most of the information represented in the original space.

Hence, after performing PCA, the input to the ML model is no longer the original measurement of dimension 1601, but a smaller set (<1601) of features (see below for the strategy on how to select the new dimension size). Since the use of ML for tag identification is not based on the conventional matching of notches in the subbands of the RCS curve, the model can still learn to discriminate signals from each tag in this reduced representation. In fact, PCA can also be used to visualize the data in two dimensions, by projecting the data onto the first two principal components.
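The PCA step can be sketched with scikit-learn. This is illustrative only: the data below are random placeholders, and the number of components shown is arbitrary rather than the value later selected by cross-validation.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Placeholder dataset: 200 sweeps of 1601 points (stand-ins for S11 magnitudes).
X = rng.normal(size=(200, 1601))

# Keep only the leading components; n_components is a hyperparameter
# that is selected by cross-validation in the described workflow.
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                       # (200, 10)

# A 2-D projection of the same data can be used for visualization.
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)                            # (200, 2)
```

The `explained_variance_ratio_` attribute of the fitted object reports how much variance each retained component accounts for, which is useful when choosing the reduced dimension.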
3) ML Model: The goal of the ML model is to correctly identify the tag (among the considered ones) used to generate a given measurement. This reduces to a multiclass classification problem, in which a given measurement is classified as having been generated by one of the considered tags. In the ML literature, the possible outputs (tag IDs in this case) are referred to as classes. When a trained ML model is used to make predictions on new instances, misclassifications can occur due to: 1) the inability of the model to generalize to unseen data and 2) the poor quality of the measurement itself. Regarding 1), in this work, the generalization ability of the model is checked during training by setting apart a test set that is not used for training, on which the model can be validated. Regarding 2), wrong predictions on S11 measurements are avoided by using a soft ML classifier with a thresholding (filtering) scheme. Soft ML classifiers explicitly estimate the class (tag ID) conditional probabilities, which can then be used to perform classification. In particular, the predicted class of a given instance (data point) is set to the class with the highest probability. Hence, the higher this probability, the more confident the model is about the instance belonging to that particular class. Under this setting, it is, therefore, possible to set a confidence threshold such that instances with a maximum probability lower than this threshold are not classified. For those instances, the model will assign an "unassigned" label rather than a class (tag ID).
The most popular ML algorithms for chipless signal classification are SVMs and NNs [13], [16]. Knowing that the higher the complexity of the model, the higher the risk of overfitting, especially for small datasets, SVM is usually preferred over NNs. However, SVM is a hard classifier whose decision function directly targets the classification decision boundary, and as such it does not produce probability estimates. Although it is possible to estimate probabilities using Platt scaling [34], [35], the estimated probabilities may be inconsistent. For example, in binary classification, an instance could be classified into a class while the estimated probability of belonging to that class is lower than 0.5. Another way to implement soft classification using SVMs could be to evaluate the per-class scores used by the decision function. However, these scores are similar for misclassified and well-classified instances, unlike the probabilities of a soft classifier, which are generally lower for misclassified instances.
Knowing that a dataset made of an S-parameter of chipless tags can be linearly separated (SVM with a linear kernel) [16], logistic regression (LR) is a natural choice for the problem at hand: it is a simple model, reducing the risk of overfitting, and it is a soft classifier (unlike SVM), allowing for a thresholding (filtering) scheme.
An LR model for binary classification works as follows. Consider an input x = [x_1, x_2, ..., x_n] ∈ R^n that belongs to class y ∈ {0, 1} (without loss of generality, we denote class 1 as positive and class 0 as negative). Given input data x, the goal of LR is to make a prediction ŷ ∈ {0, 1} such that ŷ = y. The LR model first computes a weighted sum of the n input features plus a constant term (bias), similar to what is done in linear regression, that is,

$$l(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

where θ_0 denotes the bias term. For ease of notation (and computation), an intercept term is normally included in the input x, such that x = [1, x_1, x_2, ..., x_n] and l(x) = θ^T x. Then, unlike linear regression, which directly outputs l(x), LR applies the logistic (sigmoid) function (denoted by σ(·)) to l(x), with

$$\sigma(z) = \frac{1}{1 + e^{-z}}.$$

The computed value σ(l(x)), denoted as p, takes values between 0 and 1 and is an estimate of the conditional probability of x belonging to the positive class (the probability of belonging to the negative class is 1 − p). The value of p is then used to perform binary classification. In general, if p > 0.5, then ŷ = 1, and if p < 0.5, then ŷ = 0. In addition, we can select a threshold t ≥ 0.5 such that if (1 − t) < p < t, no class is assigned to input x, indicating that the model is not confident enough about the correct class.
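A minimal numerical sketch of these equations (toy parameters, not values from the trained model) is:

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy parameters (bias first) and an input with the intercept term prepended.
theta = np.array([0.5, -1.0, 2.0])        # [theta_0, theta_1, theta_2]
x = np.array([1.0, 0.3, 0.8])             # [1, x_1, x_2]

p = sigmoid(theta @ x)                    # estimate of P(y = 1 | x)
y_hat = int(p > 0.5)

# Abstain when the model is not confident enough (threshold t >= 0.5):
# classify only if p >= t or p <= 1 - t.
t = 0.9
decided = bool(p >= t or p <= 1 - t)
print(p, y_hat, decided)
```

Here l(x) = 0.5 − 0.3 + 1.6 = 1.8, so p = σ(1.8) is above 0.5 (ŷ = 1) but below the confidence threshold t = 0.9, and the instance would be left unassigned.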
The parameter vector θ of the model is sought via an optimization procedure whose goal is to minimize a cost function J(θ) on the training set. Consider a training set with m training examples, {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))}, with x^(i) ∈ R^{n+1} and y^(i) ∈ {0, 1}. The cost function in LR is the binary cross-entropy, given by

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log p^{(i)} + \left(1-y^{(i)}\right)\log\left(1-p^{(i)}\right)\right] \quad (4)$$

where p^(i) is the predicted probability of the positive class of the LR model for input x^(i). Intuitively, the higher (lower) the probability p^(i) of a positive (negative) instance, the lower the cost function will be. The value of θ minimizing J(θ) is found via gradient descent [36], an iterative algorithm.

The generalized form of LR for multiclass classification is called softmax regression. In this case, an instance x = [x_1, ..., x_n] ∈ R^n can belong to one of K classes, and hence y ∈ {1, 2, ..., K}. A softmax regression model computes K probabilities (one per class), denoted as p_1, p_2, ..., p_K, that satisfy Σ_{i=1}^{K} p_i = 1. Here, p_i indicates the conditional probability of input x belonging to class i, and as in the binary case, an instance is classified into the class with the highest probability (denoted as the decisive probability), that is, ŷ = arg max_i p_i.
Next, we describe how these probabilities are computed. In softmax regression, there are K vectors of θ parameters (one per class), denoted as θ_1, θ_2, ..., θ_K, with θ_i ∈ R^{n+1}. Hereafter, we assume that x includes the intercept term, that is, x ∈ R^{n+1}. First, a linear score is computed per class, given by

$$s_i(x) = \theta_i^T x.$$

The probabilities are then estimated using the softmax function as follows:

$$p_i = \frac{e^{s_i(x)}}{\sum_{j=1}^{K} e^{s_j(x)}}$$

where $1 / \sum_{j=1}^{K} e^{s_j(x)}$ is a normalizing constant. Given a training set {(x^(i), y^(i))}_{i=1}^{m}, the parameters θ_1, θ_2, ..., θ_K, jointly represented by Θ, are sought by minimizing the following cost function:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} 1\{y^{(i)} = k\} \log p_k^{(i)} \quad (7)$$

where p_k^(i) is the estimated probability of x^(i) belonging to class k, and 1{A} is the indicator function that takes value 1 if condition A is true and 0 otherwise. This cost function is sometimes referred to as the cross-entropy loss.
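The softmax computation above can be sketched directly in NumPy (toy parameter matrix; subtracting the maximum score before exponentiating is a standard numerical-stability trick, not part of the definition):

```python
import numpy as np

def softmax_probabilities(Theta, x):
    """Theta: (K, n+1) matrix of per-class parameter vectors; x includes the intercept."""
    s = Theta @ x                       # per-class linear scores s_i(x) = theta_i^T x
    s = s - s.max()                     # stabilize the exponentials
    e = np.exp(s)
    return e / e.sum()                  # p_i = e^{s_i} / sum_j e^{s_j}

Theta = np.array([[0.1, 1.0, -0.5],
                  [0.0, -0.2, 0.8],
                  [-0.3, 0.4, 0.4]])    # K = 3 classes, n = 2 features
x = np.array([1.0, 0.5, 1.5])           # intercept term included

p = softmax_probabilities(Theta, x)
print(float(p.sum()))                   # 1.0
print(int(np.argmax(p)))                # predicted class: the decisive probability
```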
LR models, even though they are among the simplest ML models, can still overfit the training data and fail to generalize to unseen data. To help avoid overfitting, in this work, we consider dimensionality-reduction techniques, as already described, and model regularization. Regularization is achieved by adding to the cost function a constraint on the parameters (weights) of the model. Specifically, the cost function becomes

$$J(\Theta)_{\mathrm{reg}} = J(\Theta) + \alpha\,\phi(\Theta)$$

where J(Θ) is as defined in (4) (for LR) and (7) (for softmax regression), α is the regularization parameter (the greater the value of α, the stronger the regularization), and φ is an increasing function of Θ. In this work, φ is given by the squared L2 norm, that is,

$$\phi(\Theta) = \|\Theta\|_2^2.$$

The parameters are sought via gradient descent by minimizing the cost function J(Θ)_reg. The regularization term leads to a simpler model and therefore to a better generalization capacity (better performance in the prediction of new instances).
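In practice, an L2-regularized LR/softmax model is available off the shelf. The sketch below uses scikit-learn on placeholder data; scikit-learn's `C` parameter is the inverse of the regularization strength, so a larger α in the text corresponds to a smaller `C` (this mapping is our assumption about how one would reproduce the setup, not the authors' code).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))          # e.g., PCA-reduced features (placeholder)
y = np.arange(120) % 4                  # 4 balanced tag classes (placeholder labels)

# L2 penalty is the default; C is the inverse regularization strength.
alpha = 10.0
clf = LogisticRegression(C=1.0 / alpha, max_iter=1000)
clf.fit(X, y)

proba = clf.predict_proba(X[:1])        # per-class probabilities p_1, ..., p_K
print(proba.shape)                      # (1, 4)
print(float(proba.sum()))               # 1.0
```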
To summarize, the input to the LR model consists of S11 measurements that are transformed from 1601 features to a smaller subset (given by PCA), and the output is a vector (whose dimension is the number of considered tags) where the jth entry represents the probability of the measurement having been generated by the jth tag. The model then classifies the input measurement into the tag with the highest probability (this is also referred to as making a prediction).

4) Hyperparameter Tuning: We use k-fold cross-validation (CV) on the training data to select the values of the hyperparameters: 1) the number of principal components (from PCA) and 2) the value of the regularization coefficient α. The hyperparameters that obtain the highest accuracy during the CV process are selected for the final model. The accuracy is defined as the number of correct predictions divided by the total number of predictions, where a correct prediction implies ŷ = y, that is, the predicted tag is indeed the one used to generate the measurement. Specifically, for a given regularization coefficient and number of principal components, the accuracy value is computed as the mean of all the accuracies obtained on the left-out sets during the CV process. It should be noted that if the dataset is not balanced, that is, if the number of instances per class varies significantly, the average accuracy per class is sometimes used instead. However, in this work, all the datasets are balanced.
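The joint search over the number of principal components and α via k-fold CV can be sketched with a scikit-learn pipeline (placeholder data and an arbitrary small grid, not the paper's actual search space; recall that `C` is the inverse of the regularization strength):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(160, 100))         # placeholder sweeps (100 points for speed)
y = np.arange(160) % 4                  # 4 balanced tag classes

pipe = Pipeline([
    ("pca", PCA()),
    ("lr", LogisticRegression(max_iter=1000)),
])

# Joint grid over the two hyperparameters described in the text.
param_grid = {
    "pca__n_components": [5, 10, 20],
    "lr__C": [0.01, 0.1, 1.0],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_)
```

`GridSearchCV` reports, for each grid point, the mean accuracy over the left-out folds, which matches the selection rule described above.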
5) Thresholding Scheme: Taking advantage of the probabilities calculated during hyperparameter tuning (for the selected number of components and regularization parameter), a threshold is defined in order to abstain from predicting a new measurement when the certainty of the model is not sufficiently high. A threshold is defined for each class (tag ID), so that an instance is classified (assigned) as belonging to the class with the highest probability only when this probability is higher than the defined threshold, and discarded (unassigned) if it is lower.
For a given class, the threshold is defined as the 10th percentile of the decisive probabilities of the well-classified instances (for the given tag) in the left-out sets during CV. We denote this value as thr_1. It is worth noting that for higher threshold values, the accuracy of the predictions will be higher, but the probability of rejecting measurements will increase as well. This is reflected in the conserved instance ratio, defined as the ratio between the number of actually classified instances in a given test set and the total number of instances. The thresholds can, therefore, be selected depending on the requirements and criticality of the specific application of the chipless RFID system, making a tradeoff between the accuracy and the number of unassigned measurements. In cases in which the model fits the training set very well, the value of thr_1 will be close to 1. This may lead to a low conserved instance ratio on the test set. To avoid this, which may not be desirable in most applications, a value thr_set can be set so that the final threshold thr for a given class is the minimum of thr_1 and thr_set, that is,

$$thr = \min(thr_1, thr_{set}). \quad (10)$$

A unique threshold for all classes could be employed. However, this would not take into account a potentially varying level of difficulty for classifying instances of different classes. In the conducted experiments, we use (10) with thr_set = 0.95 to set the threshold for each class (tag). For comparison, we also show results when a fixed threshold of 0.99 is used for all classes.
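The per-class rule thr = min(thr_1, thr_set) can be sketched as follows (the decisive probabilities below are hypothetical stand-ins for values collected during CV):

```python
import numpy as np

def per_class_threshold(decisive_probs, thr_set=0.95, percentile=10):
    """thr = min(thr_1, thr_set), where thr_1 is the 10th percentile of the
    decisive probabilities of the well-classified CV instances of this class."""
    thr_1 = np.percentile(decisive_probs, percentile)
    return min(thr_1, thr_set)

# Hypothetical decisive probabilities for one tag, collected during CV.
probs_tag_a = np.array([0.97, 0.99, 0.98, 0.96, 0.995,
                        0.97, 0.99, 0.98, 0.97, 0.99])
thr = per_class_threshold(probs_tag_a)
print(thr)  # capped at 0.95 because thr_1 exceeds thr_set

def classify_or_abstain(proba, thresholds):
    """Return the predicted class, or None (unassigned) if the decisive
    probability falls below that class's threshold."""
    k = int(np.argmax(proba))
    return k if proba[k] >= thresholds[k] else None

print(classify_or_abstain(np.array([0.91, 0.05, 0.04]),
                          thresholds=[0.95, 0.95, 0.95]))  # None (abstain)
```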
6) Model Evaluation: Once the hyperparameters are set and the thresholds computed via CV, the resulting final model is trained on the entire training set, so as to make the best use of the available data. The model is then evaluated on the test set. The considered metrics include the accuracies before and after filtering, and the conserved instance ratio (equivalently, the percentage of unclassified instances). As mentioned, accuracy is defined as the percentage of measurements for which a correct tag prediction is made. Additionally, a confusion matrix can be used to get more insights about the misclassifications and the tags that are the most difficult (or easiest) to classify. Once the final model is evaluated, it can be deployed in a real setting to classify newly collected measurements. To imitate this scenario, more data can be collected at different time points, in different rooms, and so on, and then classified with the designed model.
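The evaluation step can be sketched as follows: accuracy before and after filtering, the conserved instance ratio, and a confusion matrix on the retained instances. We assume scikit-learn; the function and variable names are illustrative, not the authors' code.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def evaluate_with_filtering(proba, y_true, thresholds):
    """Metrics described in Section III: accuracies before/after threshold
    filtering, conserved instance ratio, and confusion matrix on kept data."""
    y_pred = proba.argmax(axis=1)        # predicted class per measurement
    decisive = proba.max(axis=1)         # decisive (highest) probability
    acc_before = float((y_pred == y_true).mean())
    # abstain when the decisive probability is below the predicted class's threshold
    keep = decisive >= thresholds[y_pred]
    ratio = float(keep.mean())           # conserved instance ratio
    acc_after = float((y_pred[keep] == y_true[keep]).mean()) if keep.any() else float("nan")
    cm = confusion_matrix(y_true[keep], y_pred[keep])
    return acc_before, acc_after, ratio, cm
```

Instances whose decisive probability falls below the per-class threshold are left unassigned and excluded from the post-filtering accuracy.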
The ML workflow followed for chipless RFID tag identification is summarized in Fig. 5.

IV. MEASUREMENTS AND RESULTS
To assess the performance of the proposed procedure, two experiments were carried out. The first experiment, with the four 4-bit tags, is designed to evaluate the capability of the LR model to correctly classify tags when measurements are taken in extreme conditions: long range (distance between the tags and the antenna), with and without background subtraction. The second experiment is designed to determine the capability of the LR model to make predictions at a variable range and with different tag topologies; it studies the number of tags that can be correctly classified with increasing distance. We use the set of 16 tags described above for this experiment.
A. Experiment 1: Four Tags and Fixed 160-cm Range

Two datasets were created with measurements from the four 4-bit CRR arrays: one with initial background subtraction and another without. In the latter case, there is no visible difference among the S 11 measurements of the tags due to the low signal-to-noise (S/N) ratio [see Fig. 6(a)]. With an initial background subtraction, even though the resonance peaks of the tags are visible in the S 11 spectrum, it is impossible to recognize the tag IDs [see Fig. 6(b)]. This is due to the unpredictable, strong impact of the inclinations (from −30° to 30° with respect to the line of sight) and the long range (160 cm) on the response of the tags.
Each dataset contains 2400 measurements of the four 4-bit tags (600 measurements per tag) at 160 cm. For each dataset, 90% of the instances were used for training and 10% (240 instances) were left aside for testing (they were not used at any stage of the preparation of the models). When splitting the data, we made sure both the training and test sets were balanced (i.e., they contain a similar number of measurements per tag). Data collection took approximately 1 h.
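The balanced 90/10 split can be reproduced with scikit-learn's stratified splitting. This is a sketch with random placeholder data standing in for the real S 11 measurements (2400 traces of 1601 frequency points).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder for the 2400 S11 measurements (600 per tag, 1601 points each)
rng = np.random.default_rng(0)
X = rng.normal(size=(2400, 1601))
y = np.repeat(np.arange(4), 600)  # four tag IDs

# stratify=y keeps the per-tag balance in both the 90% train and 10% test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=42)
```

With stratification and a class-divisible split size, each tag contributes exactly 60 of the 240 test instances.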
A PCA object is fit to each training set, and the data are projected onto the new space. A representation of the projection on the first two principal components makes it possible to gain insights into the structure of the data, initially represented in 1601 dimensions. A 2-D plot of the datasets is depicted in Fig. 7, with each color representing a tag. Without background subtraction, although there are some overlapping areas, there is a significant aggregation of points corresponding to the same class (points of the same color are grouped). With background subtraction, the four classes are more separated. It is also worth noting that measurements from tag1110_1 × 1 are the closest to those of tag1011_2 × 2 and the furthest from those of tag0001_4 × 4. This is in concordance with the increasing distance d_H&RCS shown in Table IV.
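The 2-D projection of Fig. 7 amounts to fitting PCA on the training set and transforming the data. A minimal sketch with placeholder data follows; in practice, X_train would hold the 1601-point S 11 traces.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder training data standing in for the S11 measurements
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 1601))

pca = PCA(n_components=2).fit(X_train)  # fit the projection on the training set only
Z = pca.transform(X_train)              # 2-D coordinates, one point per measurement
var = pca.explained_variance_ratio_     # fraction of variance captured per component
```

A scatter plot of Z colored by tag ID then reveals the class structure, as in Fig. 7.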
Although the first two principal components explain a large percentage of the variability in the data (82.57% without background subtraction and 41.68% with background subtraction), there is no major disadvantage in including more components as long as it helps to improve the model performance. Fig. 8 shows the accuracy of the LR model as a function of the number of principal components used, without applying regularization, where the accuracy is computed as the average of the accuracies obtained during a tenfold CV process. As expected, a gradual improvement is observed as the number of principal components increases. The model achieves 100% accuracy above eight principal components on the data collected with initial background subtraction. Using the data without background subtraction, the model reaches 99.81% accuracy with 18 components.
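The accuracy-versus-components curve of Fig. 8 can be reproduced by cross-validating a PCA + LR pipeline for several component counts. This is a sketch on synthetic placeholder data; a very large C approximates an unregularized LR in scikit-learn's parameterization, since no regularization is applied at this stage.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data: 200 instances, 50 features, 4 balanced classes
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
y = np.repeat(np.arange(4), 50)

scores = {}
for n in (2, 8, 18, 30):
    pipe = Pipeline([
        ("pca", PCA(n_components=n)),
        # C = 1e6 makes the L2 penalty negligible (effectively unregularized)
        ("lr", LogisticRegression(C=1e6, max_iter=2000)),
    ])
    # mean accuracy over a tenfold CV, as plotted in Fig. 8
    scores[n] = cross_val_score(pipe, X, y, cv=10).mean()
```

Fitting PCA inside the pipeline ensures the projection is re-estimated on each CV training fold, avoiding leakage from the left-out fold.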
For each case (with and without background subtraction), the final LR model was trained using the first 30 principal components. For testing, the test set was first projected onto the corresponding PCA basis, and the first 30 principal components were then used to make predictions. In both cases, the model achieved 100% accuracy on the test set. Since the model generalizes well (i.e., no overfitting is observed) and 100% accuracy is obtained in both cases, we did not consider regularization or filtering in this experiment. The effects of regularization and thresholding are instead evaluated in the second experiment.
This preliminary analysis shows that, with dimensionality reduction, an LR model deals well with chipless RFID measurements even under extreme conditions: long range, inclination, reflections due to surrounding objects, and a low S/N ratio (when no background subtraction is performed). However, several trials, not shown for brevity, indicate that a model trained without background subtraction is only valid in the scenario where the training measurements were made. This constraint does not apply to measurements with background subtraction, which can be classified correctly even if they were made in different environments.

B. Experiment 2: 16 Tags and Flexible 50-140-cm Range
In this experiment, the number of tags to be classified is extended to 16 CRR/SRR arrays with 5-bit IDs. The data are collected with an initial background subtraction while changing the tags' inclination angle with respect to the antenna, as in the previous experiment. Measurements for training were collected in the following four subintervals: 50-80, 80-110, 110-130, and 130-140 cm. To evaluate the tradeoff between distance and the number of tags that can be successfully classified, the collected measurements were further grouped as follows: 50-80 cm (set 1); 50-110 cm (set 2); 50-130 cm (set 3); and 50-140 cm (set 4). Data collection for the training step took around 4 h and consists of 1600 instances (100 per tag) per subinterval. We collected additional measurements on different days, at the four described subintervals, to generate the test set. These measurements were then grouped similarly to the training data, generating four test sets of increasing intervals, all starting at 50 cm. Table VII summarizes the generated test sets.
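Grouping the per-subinterval measurements into the four cumulative sets can be sketched as follows. The arrays here are placeholders (16 features instead of the real 1601-point traces); the labels and block sizes mirror the description above.

```python
import numpy as np

order = ["50-80", "80-110", "110-130", "130-140"]
# One placeholder (X, y) block per distance subinterval:
# 1600 instances (100 per tag, 16 tags), 16 features each
blocks = {lbl: (np.zeros((1600, 16)), np.tile(np.arange(16), 100)) for lbl in order}

# Cumulative sets: set k merges the first k subintervals (all start at 50 cm)
sets = {}
for k in range(1, len(order) + 1):
    Xs, ys = zip(*(blocks[lbl] for lbl in order[:k]))
    sets[f"set{k}"] = (np.vstack(Xs), np.concatenate(ys))
```

Set 1 thus covers 50-80 cm only, while set 4 stacks all 6400 instances up to 140 cm.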
For the tradeoff analysis, we start by assessing the model performance with the 16 tags while increasing the range (starting with the shortest one), until the system fails to correctly identify all tags. Larger intervals are then considered while decreasing the number of included tags, depending on the achieved results. In all cases, the models are trained on the training set and evaluated on the corresponding test set.
1) Model Evaluation With 16 Tags: We first perform a tenfold grid-search CV on the training dataset (for each considered subinterval) to find the best set of hyperparameters (number of principal components and regularization parameter α). Fig. 9 shows the accuracy as a function of the number of principal components without regularization (α = 0) and with regularization, for α values of 1, 10, and 100. As expected, regularizing the model generally yields higher accuracies. We also notice that, beyond a certain number of principal components:
1) Increasing the number of principal components does not significantly enhance the results.
2) The difference between the accuracies achieved using different values of the regularization parameter becomes less pronounced (except for the 50-140-cm range).

Based on the presented results, we set the number of principal components to 15, 40, 60, and 100 for cases 1, 2, 3, and 4, respectively, and present results for α values of 1, 10, and 100.
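The tenfold grid-search CV over the number of principal components and the regularization strength can be sketched with scikit-learn's GridSearchCV on placeholder data. Note the assumed mapping: scikit-learn parameterizes L2-regularized LR by C = 1/α, so α ∈ {1, 10, 100} corresponds to C ∈ {1, 0.1, 0.01}.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Placeholder data: 16 balanced classes (tags), 20 instances each, 64 features
rng = np.random.default_rng(2)
X = rng.normal(size=(320, 64))
y = np.repeat(np.arange(16), 20)

pipe = Pipeline([("pca", PCA()), ("lr", LogisticRegression(max_iter=2000))])
param_grid = {
    "pca__n_components": [15, 40, 60],
    "lr__C": [1.0, 0.1, 0.01],  # C = 1/alpha -> alpha = 1, 10, 100
}
# Tenfold CV over the full grid; refits the best model on all training data
search = GridSearchCV(pipe, param_grid, cv=10).fit(X, y)
```

search.best_params_ then gives the component count and regularization strength used for the final model, and search.best_score_ the corresponding mean CV accuracy.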
During the CV process, the decisive per-class probabilities calculated at each split are saved and later used to set the threshold for each tag [see Section III-C; (10)]. As an illustrative example, Fig. 10 shows the decisive probabilities obtained for the fourth case (50-140-cm range), with α = 100 and 100 principal components. We chose this interval because it is the most difficult one and hence exhibits the most variability in the computed probabilities. For tags for which most well-predicted samples have high decisive probabilities, the threshold is set to 0.95, whereas for tags for which the model estimates lower probabilities for the well-predicted samples, the threshold is set to a smaller value (above 0.9 in all cases in the shown example).
Once the hyperparameters and the thresholds are set, the model is evaluated on the test sets. Table VIII shows the accuracy obtained with 16 tags on the first three test sets described in Table VII, for different values of the regularization parameter α. We omit the results for case 4, as the model fails to correctly classify all tags. As expected, the accuracies decrease with increasing distance, both before and after filtering. Before filtering, accuracies are above 79.8% in all cases, reaching 95.9% in the best case (test set 1 and α = 100). With the proposed filtering scheme, accuracies increase in all cases, with a conserved instance ratio within the interval 68.7%-89.6%. With the fixed threshold of 0.99, accuracies are higher, but at the cost of a lower conserved instance ratio. The stronger the regularization, the higher the accuracies, both before and after filtering; the inverse effect is observed for the conserved instance ratio.
To get more insights into the obtained results, we computed the confusion matrices, before and after filtering (with per-class thresholds), for all the considered test sets (see Fig. 11). For conciseness, only the confusion matrices corresponding to α = 10 are represented. For test set 1 (50-80 cm), most measurements are well-classified, and the filtering only misses one of the misclassified measurements. For test sets 2 (50-110 cm) and 3 (50-130 cm), the large majority of the misclassified measurements belong to the tags of size 1 × 1 (tags 7, 12, 15, and 16). A deeper analysis showed that the measurements of those tags at a large range (more than 80 cm) are very unstable due to their very low RCS, regardless of the d_H&RCS criterion, meaning that measurements made on different days do not have the repeatability necessary to make accurate predictions. From the ML point of view, the consequence is a lower conserved instance ratio for those specific tags (almost all the corresponding measurements are dropped after filtering): the probabilities are low for both well-classified and misclassified measurements, so some misclassified instances are not filtered out either. This is more pronounced for test set 3.
2) Model Evaluation With 12 Tags: Based on the presented results, Table IX shows the outcomes excluding the 1 × 1 tags (i.e., with 12 tags) for all considered intervals (up to 140 cm). As expected, the obtained accuracies before and after filtering are better than the values in Table VIII, reaching 100% in almost all instances of test sets 1-3. The conserved instance ratio values are also higher. Although the accuracies in test set 4 are quite high, they are significantly lower than those achieved in the other test sets. Further analysis reveals that the accuracies before and after filtering when considering only the measurements in the 130-140-cm range are even lower (0.760 and 0.943, respectively). Moreover, the conserved instance ratio in this case is very low: 0.554.

3) Model Evaluation With Eight Tags: Based on the metric defined in (1), eight tags were selected among the remaining 12 tags. The left-out tags are tag_00001_4 × 4 (tag 1), tag_00010_2 × 2 (tag 2), tag_01000_2 × 2 (tag 5), and tag_10000_4 × 4 (tag 9). Table X shows the results when considering only the eight selected tags. Only test set 4 is presented, as test sets 1-3 reach 100% accuracy with better conserved instance ratio values. For test set 4, accuracies are above 0.97 both before and after filtering, with a conserved instance ratio above 80% in all cases.
The presented results showcase the benefits of using a thresholding scheme to filter out low-confidence predictions, as these generally correspond to the incorrect ones, yielding improved accuracy. However, it is worth noting that: 1) in most cases, some of the correctly classified instances are also discarded (although they generally represent a very small fraction of the total) and 2) there is no guarantee that all misclassified instances will be discarded. As a result, the conserved instance ratio may be smaller than the one that would be achieved if only the incorrect predictions were filtered out, and the achieved accuracy may be below 100%.
There is also a clear tradeoff between the achieved accuracy and the obtained conserved instance ratio. Increasing the thresholds will enhance the prediction performance to the detriment of the conserved instance ratio. In the same vein, decreasing the thresholds will increase the conserved instance ratio at the cost of lower accuracy. The value of the thresholds should, therefore, be chosen based on the application at hand. For instance, if a threshold equal to 0.99 is used for all classes in the considered test sets, the prediction accuracy reaches 100% in the following cases: 16 tags in the range 50-80 cm, 12 tags in the range 50-130 cm, and eight tags in the range 50-140 cm. However, as expected, the downside is the proportion of conserved instances.
In both experiments, the classification latency is less than 25 ms per measurement, making the proposed framework suitable for real-time applications. All experiments were conducted on an Intel Core i3-5010U CPU running at 2.10 GHz.

V. CONCLUSION
A step-by-step procedure to use ML for robust and reliable chipless RFID tag classification, combined with a tag implementation strategy, has been proposed. To the best of our knowledge, this is the first approach that: 1) optimizes the difference between the tag codes; 2) proposes a complete ML workflow, including a dimensionality-reduction step and a filtering strategy, to work efficiently in real-world conditions; and 3) showcases the tradeoff between the number of tags that can be correctly classified and the reading range.
The proposed workflow includes S 11 measurements in real-world conditions, data preprocessing, dimensionality reduction, regularization, and soft classification with a filtering scheme. The procedure was tested with an LR model with L2 regularization and dimensionality reduction via PCA.
In the first experiment, the complete workflow achieved perfect accuracy (100%) for the identification of four 4-bit CRR array tags under extreme conditions: long range (160 cm from the tag to the antenna), inclination and reflections due to surrounding objects, and a low S/N ratio (when no background subtraction is performed). However, under the proposed setting, a model trained without background subtraction is only valid in the scenario where the training measurements were made, whereas a model trained with background subtraction correctly classifies measurements made in different environments.
In the second experiment, a model was trained with several sets of measurements from 16 5-bit CRR and SRR array tags, taken on different days, at a variable range (50-140 cm), and with initial background subtraction. This experiment shows the tradeoff between the measurement range and the number of tags that can be correctly classified. In particular, the presented results show that it is possible to perfectly classify 16 tags in the 50-80-cm range, 12 tags in the 50-130-cm range, and eight tags in the 50-140-cm range.
In both experiments, the accuracy values obtained with the proposed ML workflow after filtering are comparable to the ones reported in the literature (see Table I). Moreover, it is worth mentioning that the test sets considered in this work are composed of actual measurements of RFID tags in a real-world environment, as opposed to the approach with the highest accuracy, where data instances were artificially generated.
Future work includes the design of ML models that avoid the need to perform background subtraction, the use of additional or alternative input features (e.g., the phase of the signal), and the evaluation of the workflow's performance when using measurements from a low-cost chipless RFID reader.

Fig. 1. Diagram of a chipless RFID system. The antenna, connected to a reader device, the vector network analyzer (VNA), transmits (Tx) an interrogation signal that is backscattered by the chipless tag and received (Rx) at the antenna.

Fig. 2. Four subbands represented with different colors. The simulated 4-bit RCS response of a CRR with five rings and parameters w_1,2 = 0.8 mm, w_3,4,5 = 0.5 mm, g_1 = 1 mm, g_1,2,3 = 0.5 mm, and r_1 = 17 mm, where w_i is the thickness of the ith ring, g_i is the distance between the ith and (i + 1)th rings, and r_i is the radius to the ith ring (subscript 1 corresponds to the outermost ring).

Fig. 3. Setup for the S 11 measurements.

Fig. 6. Representative S 11 responses at 160 cm for the four considered tags (a) without background subtraction and (b) with background subtraction.

Fig. 7. Projection of the data on the first two principal components (a) without background subtraction and (b) with background subtraction.

Fig. 8. Accuracy of the LR model (computed via tenfold CV on the training data) as a function of the number of principal components, with and without background subtraction.

Fig. 9. Accuracy of the LR model on the training dataset as a function of the number of principal components for different regularization parameters (α). To ease visualization, the y-axis spans the interval [0.85, 1].

Fig. 11. Confusion matrices obtained for test sets 1-3 before and after filtering. (a) Test set 1. (b) Test set 2. (c) Test set 3. The number in parentheses indicates the instances that are unassigned after filtering, that is, X (Y) indicates that Y out of X measurements are labeled as unassigned.

TABLE I. Summary of the state-of-the-art on ML implementations for chipless RFID tag identification and authentication.

TABLE II. Hamming distance (d_H) between the IDs of the designed 4-bit CRRs.

TABLE III. d_RCS between the considered 4-bit CRR array sizes.

TABLE IV. d_H&RCS of the 4-bit CRR arrays.

TABLE V. Design parameters of the 4-bit CRR arrays (mm).

TABLE VI. 5-bit IDs, array sizes, and topologies of the implemented 16 tags.

TABLE VII. Description of the employed test sets.

TABLE VIII. Accuracy and conserved instance ratio with 16 tags.