Over-the-air Aggregation-based Federated Learning for Technology Recognition in Multi-RAT Networks

With the continuous evolution of wireless communication and the explosive growth in data traffic, decentralized spectrum sensing has become essential for the optimal utilization of wireless resources. In this direction, we propose an over-the-air aggregation-based Federated Learning (FL) scheme for a technology recognition model that can identify signals from multiple Radio Access Technologies (RATs), including Wi-Fi, Long Term Evolution (LTE), 5G New Radio (NR), Cellular Vehicle-to-Everything PC5 (C-V2X PC5), and Intelligent Transport Systems G5 (ITS-G5). In the proposed FL-based technology recognition framework, we consider edge network elements as clients that train local models and a central server that creates the global model. In each client, a Convolutional Neural Network (CNN)-based model is trained on In-phase and Quadrature (IQ) samples collected from a certain combination of RATs. The combinations of RATs considered in the clients are selected based on the capabilities of the real-world network elements that can be used as clients. In the FL framework, multiple clients periodically send updates derived from local data to a central server, which then integrates these contributions to improve a shared global model. This method keeps the system current with the evolving real-world environment while minimizing the bandwidth required for training data transfer and allowing personalized local models to be maintained on each client.


I. INTRODUCTION
As wireless communication continues to evolve, the demand for wireless connectivity and data traffic has been growing dramatically [1]. This surge in traffic, coupled with the finite nature of the available spectrum, poses significant challenges for meeting the increasing demand. To address this issue, spectrum sharing has emerged as a viable solution to optimize spectrum utilization and accommodate the ever-growing traffic. Spectrum sharing enables multiple Radio Access Technologies (RATs) to share the same spectral bands, leading to more efficient use of available resources.
Effective spectrum-sharing decisions can be made based on accurate spectrum-sensing mechanisms. Technology recognition is a spectrum-sensing mechanism used to identify signal types from different RATs. Technology recognition systems can be classified into conventional and advanced machine learning-based solutions [2]. Conventional technology recognition solutions include less complex systems that use mechanisms such as energy detection, cyclostationary feature detection, and matched filter detection to detect unique features of a signal. Despite their simplicity, these conventional mechanisms lead to poor classification accuracy when they are used to identify multiple wireless signal types that do not have easily distinguishable features. For this reason, there has been a recent surge of work to strengthen the performance of wireless signal recognition using advanced machine-learning-based systems that employ the powerful capabilities of Deep Neural Network (DNN) models [3].
DNN-based wireless technology recognition models follow a series of steps to identify wireless signals [4]. First, a dataset is collected by capturing wireless signals from the devices connected to the considered RATs. Next, the data is pre-processed, and a model is trained to extract key features from these signals, enabling it to recognize patterns that distinguish different RATs. These features are derived from the unique physical layer characteristics of each RAT, including modulation schemes, channel bandwidth, subcarrier spacing, and transmission power. After training and validation, the model is used to analyze incoming signals in real time, making predictions about the technology in use [5].
Recently, many DNN-based technology recognition solutions have been proposed to recognize signal types from different wireless technologies. For example, a technology recognition model can be used to identify LTE and Wi-Fi signals by continuously capturing In-phase and Quadrature (IQ) samples [6] and using them as an input to a DNN-based classifier.
State-of-the-art DNN-based technology recognition approaches have certain limitations that affect their effectiveness in real-world scenarios. The main limitation of state-of-the-art solutions is their reliance on centralized training approaches [7]. In these approaches, the training process is typically conducted on a central server. However, training technology recognition models necessitates data collection from devices in diverse locations and channel conditions. Hence, centralized training methods involve transferring substantial amounts of data from various sources situated at different locations to a central server. This may add a significant load to the network and possibly impact its performance. For this reason, most state-of-the-art technology recognition models rely on static datasets that are collected once from simulations or controlled experimental setups. However, a static dataset often fails to capture the diversity of real-world network environments, leading to suboptimal performance in practical scenarios. Furthermore, centralized training places a heavy burden on the server's processing power and storage capacity, potentially leading to scalability issues and increased infrastructure costs.
Federated Learning (FL) has emerged as a promising solution to address the limitations associated with centralized learning [7]. Instead of sending raw data to a central server, FL allows each device to train a local model on its own data and only share model updates (gradients) with the central server. These updates are aggregated to improve the global model, which is then redistributed to the devices. Its ability to facilitate distributed learning makes it a suitable choice for numerous wireless communication problems. FL allows the training of technology recognition models on distributed data sources, enhancing robustness and generalization capabilities by incorporating diverse scenarios, channel conditions, and user behaviors. Moreover, the distributed nature of FL makes it more scalable, as the training process occurs locally on client devices, utilizing their computational resources.
To advance in this direction, we propose employing FL for technology recognition in multi-RAT networks. A novel FL-based technology recognition model is developed to recognize signal types from multiple RATs, including Wi-Fi, Long Term Evolution (LTE), 5G New Radio (NR), Cellular Vehicle-to-Everything PC5 (C-V2X PC5), and Intelligent Transport Systems G5 (ITS-G5) technologies. For the local models, a Convolutional Neural Network (CNN)-based feature extraction is used to effectively classify the wireless signals. To represent the frequency domain features of each wireless technology, the Fast Fourier Transform (FFT) of the IQ samples captured from each wireless technology is used as an input for the model. Employing a typical Independent and Identically Distributed (IID) FL framework for technology recognition in multi-RAT networks needs a complex receiver at each client to capture, identify, and label samples from each technology. However, our main goal is to create a technology recognition model that does not require such a complex receiver in the clients. Instead, we aim to use a machine learning model-based spectrum sensing technique that can identify the signal type without the need to decode the signal of different wireless technologies from every co-located transmitter.
For this reason, we propose a non-Independent and Identically Distributed (non-IID) FL approach for technology recognition. In the proposed model, each client collects IQ samples from specific combinations of classes. These combinations of classes are determined based on real-world devices used by the considered RATs. For instance, the latest smartphones can be used to capture and label Wi-Fi, LTE, and NR signals. Similarly, an Intelligent Transportation System (ITS) application device can be used to capture and label datasets for C-V2X PC5 and ITS-G5. This approach simplifies the data collection process and improves the feasibility of FL-based technology recognition.
The contributions of this work can be outlined as follows:
• Introducing an approach for distributing a non-IID dataset among clients, taking into account the attributes of actual edge network devices used as clients, and proposing an over-the-air aggregation-based FL framework for a technology recognition model in multi-RAT networks.

II. RELATED WORKS
FL has emerged as a promising approach for various aspects of wireless communication, including wireless technology recognition and cognitive spectrum sensing [8]. Several studies have explored the potential of FL in various wireless communication scenarios. For instance, in [9], a distributed learning approach is proposed to detect the presence of a signal from the Primary User (PU) and estimate the available spectrum in a sensor network, improving overall accuracy while minimizing data exchange between sensors. The results of the study demonstrate that the proposed FL-based approach outperforms separate and autonomous models, leading to higher spectrum occupancy detection accuracy and reduced data exchange, as shown on the validation dataset.
Similarly, the authors in [7] explore FL in spectrum sensing for cognitive radio environments, highlighting its reliability and privacy preservation. An FL approach is proposed to detect the presence of a 5G downlink signal from a PU and estimate the available spectrum. A similar FL-based PU signal detection scheme is also proposed in [10], [11]. These works introduce FL for cooperative spectrum sensing, reducing the transmission burden while ensuring accurate PU signal detection.
Additionally, in [12], a federated adaptive modulation classification method that preserves privacy with minimal performance loss is proposed, offering a secure alternative to conventional centralized approaches. The signal recognition model is trained to identify four modulation types under class imbalance and varying noise conditions. The results in this paper demonstrate that the solution achieves an average performance loss of less than 2% compared to the conventional centralized modulation classification, making it a promising solution for privacy-preserving adaptive modulation classification.
The study in [13] thoroughly investigates FL's applicability in signal modulation recognition. The study's distributed learning approach effectively enhances the accuracy of signal modulation recognition by leveraging an FL-based algorithm. The signal recognition model is trained to identify eleven modulation schemes. Each sample within the dataset consisted of two channels (IQ) of raw data, each with a size of 2 × 128. The study conducted simulations to analyze signal modulation identification across various scenarios.
Table I shows the state-of-the-art FL-based signal-type recognition solutions that focus on the identification of the presence of a PU signal and the identification of modulation schemes. Despite the use of wireless edge devices as clients, these solutions do not provide over-the-air aggregation mechanisms for the proposed FL systems. In contrast, studies in [14], [15] present FL solutions with over-the-air aggregation. However, these solutions focus on image classification applications for handwritten character recognition and cannot be directly applied to wireless technology recognition.
In this work, we propose a novel over-the-air aggregation-based FL scheme for a technology recognition model developed to recognize signal types from multiple RATs, including Wi-Fi, LTE, NR, C-V2X PC5, and ITS-G5 technologies. We use a non-IID dataset with class imbalance among the edge clients and consider varying noise levels to evaluate the performance of the proposed scheme.

III. PROBLEM STATEMENT AND SYSTEM MODEL
Problem Statement: Centralized training in technology recognition leads to challenges like extensive data transfers, compromised generalization, and increased complexity in continuous adaptation to emerging technologies. As a solution to this, the main goal of this paper is to develop an FL-based technology recognition system that can identify wireless signals from multiple RATs using over-the-air gradient aggregation.
In addition, we investigate the impact of wireless channel conditions, the number of clients in the aggregation group, and the number of classes per client on classification accuracy and convergence.
System Model: Fig. 1 shows the framework of the proposed FL-based technology recognition. The figure illustrates that network elements at the edge are represented as clients and are employed to train local models. These clients are commercial user devices that can be connected to Wi-Fi, LTE, NR, ITS-G5, and C-V2X PC5 RATs. Additionally, a central server communicates with all clients and is used for model aggregation. Each client $k$ has a local private dataset $\hat{\mathcal{D}}_k$ with $D_k$ data points, which are obtained by capturing IQ samples. In each round, each client pre-processes the private dataset, uses it to train a local model, and sends the model parameters to the central server. The central server aggregates the model parameters from all clients. The details of the data acquisition, data pre-processing, local model structure, and the proposed federated learning process are presented in Section IV.

In the uplink, the server receives a noisy version of the signal $\chi_k^{(t+1)}$ transmitted by client $k$, which can be given as:
$$\hat{\chi}_k^{(t+1)} = h\,\chi_k^{(t+1)} + \eta^{(t+1)},$$
where $h$ is the channel coefficient for the uplink channel used to send local parameters from client $k$ to the server, and $\eta^{(t+1)}$ is the noise introduced in the uplink channel at communication round $t + 1$.
The server uses this received signal to estimate the model parameters sent from each client $k$, represented as $\hat{w}_k^{(t+1)}$. The estimated model parameters include errors introduced in the uplink communications. At the server end, the estimated local model parameters received from client $k$ can be given as:
$$\hat{w}_k^{(t+1)} = w_k^{(t+1)} + \varepsilon_k^{(t+1)},$$
where $\varepsilon_k^{(t+1)}$ is the effective error vector introduced during the uplink communication of the local model parameters transmitted from client $k$ (at communication round $t + 1$).

IV. PROPOSED FEDERATED LEARNING PROCESS
Data Acquisition: We consider real-world network edge devices used as clients. Hence, each client captures IQ samples from signals transmitted from different RATs, identifies the signal type, and labels it. For practical reasons, we consider non-IID FL, where the dataset of each client belongs to a certain combination of classes based on the RATs that can be decoded/identified by different real-world edge devices. As an example, a smartphone can capture, identify, and label Wi-Fi, LTE, and NR signals. The considered combinations of classes are presented in Section V.
If we assume $K$ clients are involved in the FL process, each client $k$ captures IQ samples stored as a local private raw dataset $\hat{\mathcal{D}}_k$ with $D_k$ signal windows. Throughout the article, a signal window refers to a set of $M$ IQ samples captured in a given Time Resolution Window (TRW) and the corresponding label. Hence, each entry of $\hat{\mathcal{D}}_k$ pairs a signal window $\hat{u}_i^{(k)}$, $i = 1, \ldots, D_k$, with the corresponding label. The number of classes in the local private raw dataset $\hat{\mathcal{D}}_k$ of each client is defined based on the RATs that can be identified by the client. As each signal window $\hat{u}_i^{(k)}$ within $\hat{\mathcal{D}}_k$ stores the $M$ IQ samples captured in the duration of the TRW used, the value of $M$ is determined by the sampling rate and the TRW.
Data Pre-processing: During the pre-processing phase, an $M$-point FFT of the $M$ IQ samples representing each signal window is computed. Hence, the final pre-processed private dataset of each client $k$ used for training can be represented with $\mathcal{D}_k$, which is composed of $D_k$ pre-processed signal windows. The $i$-th pre-processed signal window, represented as $u_i^{(k)}$, is obtained using the $M$-point FFT of the $M$ IQ samples stored in the corresponding signal window before pre-processing, i.e., $\hat{u}_i^{(k)}$. Therefore, the $l$-th value of $u_i^{(k)}$ is obtained using:
$$u_i^{(k)}[l] = \sum_{n=0}^{M-1} \left(I[n] + jQ[n]\right) W_M^{-ln}, \quad l = 0, 1, \ldots, M-1,$$
where $I[n]$ and $Q[n]$ represent the in-phase and quadrature components, respectively, of the $M$ IQ samples stored in $\hat{u}_i^{(k)}$ in the data acquisition stage for the $i$-th signal window of client $k$, and the term $W_M$ is a complex exponential which can be given as $W_M = e^{(2\pi j)/M}$.
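The pre-processing step can be illustrated with a minimal pure-Python sketch that evaluates the $M$-point transform as a direct sum. The function name `fft_window` and the 16-sample test tone are hypothetical and chosen only for illustration; a practical implementation would use an optimized FFT routine rather than the $O(M^2)$ sum shown here.

```python
import cmath
import math

def fft_window(i_samples, q_samples):
    """Direct M-point DFT of one signal window built from I and Q samples.

    Each twiddle factor equals W_M**(-l*n) with W_M = exp(2*pi*j/M).
    """
    m = len(i_samples)
    window = [complex(i, q) for i, q in zip(i_samples, q_samples)]
    spectrum = []
    for l in range(m):
        acc = 0j
        for n, x in enumerate(window):
            acc += x * cmath.exp(-2j * math.pi * l * n / m)
        spectrum.append(acc)
    return spectrum

# Illustrative input (assumption): a complex tone that sits on frequency bin 3.
M = 16
i_part = [math.cos(2 * math.pi * 3 * n / M) for n in range(M)]
q_part = [math.sin(2 * math.pi * 3 * n / M) for n in range(M)]
spectrum = fft_window(i_part, q_part)
peak_bin = max(range(M), key=lambda l: abs(spectrum[l]))
print(peak_bin)
```

For this tone, all of the energy lands in a single bin, which is the kind of frequency-domain signature the CNN is expected to learn from.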
Local ML Model Architecture: A CNN structure designed to efficiently extract features from the FFT of IQ samples representing each signal window is used as a local model. The CNN model comprises three convolutional layers. The first layer has 64 filters, each with dimensions of 1 × 3. The second layer consists of 32 filters, each with dimensions of 2 × 3. The third layer contains 16 filters, also with dimensions of 2 × 3. These layers are designed to refine the feature extraction process. Between consecutive convolutional layers, max pooling layers are used to downsample feature maps, retaining essential information while reducing complexity. The classification phase follows with two Fully Connected (FC) layers. The first FC layer uses ReLU activation, batch normalization, dropout, and regularization, while the second FC layer uses a softmax classifier for probability estimation.
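To give a feel for the size of the convolutional stack, the following sketch counts trainable parameters from the filter counts and kernel sizes given above. It assumes a single-channel input and one bias term per filter; neither is stated in the text, and the helper `conv_params` is hypothetical. The FC layers are omitted because their widths are not given.

```python
def conv_params(num_filters, kernel_h, kernel_w, in_channels, bias=True):
    """Trainable parameters of one 2-D convolutional layer."""
    weights = num_filters * kernel_h * kernel_w * in_channels
    return weights + (num_filters if bias else 0)

# (filters, kernel height, kernel width) as listed in the text.
layers = [(64, 1, 3), (32, 2, 3), (16, 2, 3)]

in_channels = 1  # assumption: single-channel input feature map
counts = []
for filters, kernel_h, kernel_w in layers:
    counts.append(conv_params(filters, kernel_h, kernel_w, in_channels))
    in_channels = filters  # the next layer sees one channel per filter

print(counts, sum(counts))
```

Under these assumptions the three convolutional layers are small, so the size of the model exchanged in each round would be dominated by the FC layers.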
In the global loss function $F(w)$, $w \in \mathbb{R}^d$ is the parameter vector to be optimized. The objective of the FL system is to obtain the optimal global model $w^*$ at the server by minimizing the global loss function.
The minimization of $F(w)$ is carried out iteratively. More specifically, in the $t$-th communication round, the server broadcasts the global parameter vector $w_g^{(t)}$ over a wireless channel, and each client $k$ receives a distorted version $\hat{w}_{gk}^{(t)}$. Client $k$ then computes its local gradient $g_k^{(t)}$. Therefore, for all $k \in \{1, 2, \ldots, K\}$ and all $t \in \{1, 2, \ldots, T\}$, the local gradient $g_k^{(t)}$ is given by:
$$g_k^{(t)} = \nabla f_k\left(\hat{w}_{gk}^{(t)}\right).$$
Subsequently, the local parameter at client $k$, $w_k^{(t+1)}$, is updated as:
$$w_k^{(t+1)} = \hat{w}_{gk}^{(t)} - \eta_t\, g_k^{(t)},$$
where $\eta_t$ is the learning rate of the distributed learning algorithm at iteration $t$. This updated local model parameter is then sent over the air to the server. Hence, the server receives distorted model weights $\hat{w}_k^{(t+1)}$.

For each training round, the training dataset is partitioned among multiple clients. To mimic real-world scenarios, we consider that each client has a unique cluster of IQ samples from two or three classes. The dataset cluster can belong to a certain combination of classes, as shown in Table II. To represent distinct locations and channel conditions, each client introduces noise (considering SNR between -15 and 30 dB) to the data. After that, the FFT of the IQ samples from each client is used for the local training process. The local model is trained for 5 epochs using a learning rate of 0.001 and a batch size of 256.
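The per-client noise injection described above can be sketched as follows. This is a minimal illustration that assumes additive white Gaussian noise scaled to the measured signal power and an SNR drawn uniformly from the stated range; the helper `add_awgn` and the synthetic unit-power tone are hypothetical and not taken from the original setup.

```python
import math
import random

def add_awgn(iq, snr_db, rng):
    """Add complex Gaussian noise so that the window has the requested SNR."""
    signal_power = sum(abs(x) ** 2 for x in iq) / len(iq)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power / 2)  # standard deviation per real dimension
    return [x + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for x in iq]

rng = random.Random(0)
# Synthetic unit-power complex tone standing in for captured IQ samples.
clean = [complex(math.cos(0.1 * n), math.sin(0.1 * n)) for n in range(20000)]
snr_db = rng.uniform(-15, 30)  # one SNR per client, within the stated range
noisy = add_awgn(clean, snr_db, rng)
```

Drawing a different `snr_db` per client is one simple way to emulate clients placed at different locations.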
Subsequently, an aggregate model is computed by averaging the local parameters from each client. This process is repeated for each round and enables the proposed FL-based technology recognition solution to deliver robust performance across varying environments. The over-the-air model aggregation process uses wireless exchange of model parameters, which involves modulation at the sender, introduction of noise in the channel, and demodulation at the receiver side. Considering a wide spectrum of channel conditions, the model parameter communication process involves the introduction of Gaussian noise. This noise is added to the transmission of model parameters in both directions, i.e., on the channel from the client to the server and on the channel from the server to the client [16]. This approach is adopted to simulate and account for the unpredictable and often challenging real-world scenarios in which the data transfer takes place.
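One communication round with noisy parameter exchange and averaging can be sketched as below. This is a simplified illustration, not the evaluated system: the effective channel error is modeled directly as Gaussian noise on the parameters (modulation and demodulation are skipped), and each client uses a toy quadratic loss whose gradient is `w - target`. The names `noisy`, `fl_round`, and `client_targets` are hypothetical.

```python
import random

def noisy(params, sigma, rng):
    """Model the effective channel error as additive Gaussian noise."""
    return [p + rng.gauss(0.0, sigma) for p in params]

def fl_round(w_global, client_targets, lr, sigma, rng):
    """Broadcast, local update at every client, noisy uplink, then averaging."""
    received = []
    for target in client_targets:
        w_hat = noisy(w_global, sigma, rng)                # downlink distortion
        grad = [w - c for w, c in zip(w_hat, target)]      # toy local gradient
        w_local = [w - lr * g for w, g in zip(w_hat, grad)]
        received.append(noisy(w_local, sigma, rng))        # uplink distortion
    k = len(received)
    return [sum(column) / k for column in zip(*received)]  # server average

rng = random.Random(1)
client_targets = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w = [0.0, 0.0]
for _ in range(50):
    w = fl_round(w, client_targets, lr=0.5, sigma=0.01, rng=rng)
print(w)
```

With a small `sigma`, the averaged model settles near the mean of the client optima; increasing `sigma` shows how channel noise perturbs the aggregate.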
The training process uses a patience of 10 consecutive communication rounds to assess convergence. This is done by monitoring the accuracy of the aggregated model. If the accuracy fails to improve by at least 0.1% over these rounds, the process is deemed to have converged.
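The convergence rule can be expressed as a short check on the accuracy history. The sketch below is one possible reading of the rule: it interprets the 0.1% threshold as 0.1 percentage points and compares the best accuracy of the last 10 rounds with the best accuracy seen before them. The function `has_converged` is hypothetical.

```python
def has_converged(accuracies, patience=10, min_delta=0.1):
    """True if the last `patience` rounds improved the best accuracy by < min_delta.

    `accuracies` holds the aggregated-model accuracy (in percent) per round.
    """
    if len(accuracies) <= patience:
        return False  # not enough history to judge yet
    best_before = max(accuracies[:-patience])
    best_recent = max(accuracies[-patience:])
    return best_recent - best_before < min_delta

history_flat = [50.0, 60.0, 70.0, 80.0, 90.0] + [90.05] * 10
history_gain = [50.0, 60.0, 70.0, 80.0, 90.0] + [90.05] * 9 + [90.5]
print(has_converged(history_flat), has_converged(history_gain))
```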
Impact of Number of Clients: Fig. 3 illustrates the loss and accuracy curves for the proposed FL-based technology recognition model for 3, 5, and 10 clients in a group used for computing the aggregated model (for QPSK-based over-the-air aggregation at 10 dB SNR). In this analysis, all the clients are set to have two classes based on the combinations in Table II. The results reveal that increasing the number of clients in the group leads to improved classification accuracy and reduces the number of communication rounds required for convergence. Specifically, the classification accuracy of the optimal aggregated model for 3, 5, and 10 client-based models is 90.27%, 91.61%, and 92.44%, respectively. On the other hand, the communication rounds required for convergence are 61, 56, and 43 for 3, 5, and 10 client-based models, respectively. This beneficial effect stems from the increased dataset diversity, as each client has its own classes and operates under different channel conditions, thereby contributing valuable information to the model training process.
Impact of the Number of Classes per Client: With a larger number of classes, the client models have access to more diverse data samples, allowing them to learn more generalized representations. The higher number of classes per client compels the client models to learn more complex and abstract features to distinguish between a larger set of classes effectively. As a result, the global model's performance is boosted as it leverages these informative representations from various clients.

Table III lists the accuracy percentages at various SNR levels, ranging from -5 dB to 20 dB, for QPSK- and 16-QAM-based over-the-air aggregation. For instance, at 10 dB SNR, QPSK exhibits an accuracy of 91.61%, while 16-QAM achieves 89.44%. This table serves as a valuable reference for evaluating the tradeoffs between these two modulation schemes in the context of over-the-air aggregation.
Comparison between Centralized and Proposed FL: We also provide a comparison between the centralized technology recognition model in [4] and FL-based models, considering scenarios with 3, 5, and 10 clients, each trained with data from 2 distinct classes. For the centralized model, in order to achieve a dataset as extensive and diverse as the one utilized in the case of FL models, we took into account 50 different data collection locations. These locations encompassed a range of SNR, spanning from -15 to 30 dB. Within each of these locations, we utilized 7500 signal windows for every technology (from the dataset used to train the technology recognition model in [4]). Each signal window comprises 1760 values, corresponding to a 20 Msps sampling rate with 44 µs TRW.
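The window size and the volume of the centralized dataset follow from simple arithmetic, sketched below. The 8 bytes per stored value is an assumption (64-bit values), not something stated in the text, and counting five technologies follows the list of RATs considered in the paper.

```python
sampling_rate_sps = 20e6   # 20 Msps
trw_s = 44e-6              # 44 microsecond time resolution window

iq_per_window = round(sampling_rate_sps * trw_s)  # complex IQ samples per TRW
values_per_window = 2 * iq_per_window             # I and Q stored as two values

locations = 50
windows_per_technology = 7500
technologies = 5           # Wi-Fi, LTE, NR, C-V2X PC5, ITS-G5
bytes_per_value = 8        # assumption: 64-bit values

total_windows = locations * windows_per_technology * technologies
total_bytes = total_windows * values_per_window * bytes_per_value
total_gb = total_bytes / 1e9

print(iq_per_window, values_per_window, total_windows, total_gb)
```

Under the 8-byte assumption, the raw dataset amounts to roughly 26.4 GB, which is of the same order as the centralized data transfer reported in Table IV.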
In centralized training, the complete dataset is transmitted to the central server, forming the basis for model training. For the FL-based model, the data transfer values are obtained by accumulating the data transfer associated with the parameters transmitted from each individual client to the server, as well as the data exchanged from the server to the clients, throughout all the rounds needed to achieve convergence. Accordingly, Table IV shows that training the centralized technology recognition model necessitates 26.42 GB of data transfer, while the FL-based model significantly reduces this burden, with data transfers of only 0.54 GB, 0.75 GB, and 1.05 GB for 3, 5, and 10 clients, respectively. In terms of accuracy, the table demonstrates that the centralized model achieves an accuracy of 94.84%, while the FL model (QPSK-based over-the-air aggregation at 10 dB SNR) maintains competitive results with accuracies of 90.27%, 91.61%, and 92.44% for 3, 5, and 10 clients, respectively. Despite utilizing a similar CNN structure in both centralized and FL-based models, the presence of a non-IID dataset distribution among clients results in only a marginal reduction in the classification accuracy of the FL-based model. Generally, the results underscore the potential of FL in minimizing data transfer while preserving high accuracy rates, making it an appealing solution for collaborative machine-learning tasks with distributed data sources.

In this work, the performance of an FL-based technology recognition model with over-the-air aggregation is evaluated under different channel conditions. In the near future, this can be extended by proposing adaptive transmission power and modulation schemes for over-the-air aggregation. Furthermore, the bandwidth requirement and classification accuracy analysis of model quantization can be considered as potential future work.

Fig. 1: Framework of the proposed FL scheme: The parameter server and clients exchange the model updates over the wireless channel.

Downlink Transmission: As shown in Fig. 1, in each communication round $t$, the central server broadcasts the aggregated global parameter vector $w_g^{(t)}$ to all clients. We consider a wireless FL system where the model parameter exchange between the server and clients is done over a wireless channel. The server modulates the global model parameters $w_g^{(t)}$ and transmits signal $\chi_g^{(t)}$. Hence, each client $k$ receives a noisy version of $\chi_g^{(t)}$, and the global parameters recovered at client $k$ can be given as:
$$\hat{w}_{gk}^{(t)} = w_g^{(t)} + \varepsilon_k^{(t)},$$
where $\varepsilon_k^{(t)}$ is the effective error vector introduced in the downlink communication on the model parameters received by client $k$ at communication round $t$.

Uplink Transmission: In the next communication round, each client $k$ updates its local model parameters $w_k^{(t+1)}$ and sends them to the server. For transmission, the client modulates the local model parameters and generates $\chi_k^{(t+1)}$.

Fig. 2 shows how the CNN-based local technology recognition model in each client is used to classify the signal type.

FL Scheme: In each communication round, the FL scheme starts by initializing and broadcasting the global parameters.

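Fig. 2: Technology recognition execution process using the local model in the clients: Each client captures IQ samples in a time resolution window, and the IQ samples are pre-processed using FFT. The CNN-based technology recognition model is used to extract features and classify the signal type.

Algorithm 1: Model aggregation in the proposed FL-based technology recognition. For k = 1 to K: capture IQ samples, receive the global parameters $\hat{w}_{gk}^{(t)}$, compute the local gradient, update the local model, and transmit the updated local parameters to the server.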
After that, each client captures IQ samples for $D_k$ TRWs. Then, the captured IQ samples are pre-processed using FFT to obtain the local pre-processed dataset $\mathcal{D}_k$. Subsequently, the local training is done based on the local datasets and the local CNN-based technology recognition model, and the local loss function $f_k(w)$ at client $k$ is computed over the $D_k$ pre-processed signal windows in $\mathcal{D}_k$. The server aggregates the local model updates from the clients to obtain a new global model $w_g^{(t+1)}$. This process is repeated until the model converges, leading to the optimal global model $w^*$. Algorithm 1 shows the overall process of the model aggregation in the proposed FL-based technology recognition.

Symposium on Dynamic Spectrum Access Networks (DySPAN)

V. NUMERICAL EXPERIMENTS
This section presents the performance evaluation of the proposed technology recognition solution. We used an IQ sample dataset of several RATs, including LTE, NR, Wi-Fi, C-V2X PC5, and ITS-G5 [4]. The dataset was collected using a sampling rate of 20 Msps and 44 µs TRW. To ensure a comprehensive assessment, 80% of the IQ samples are used for training a local model by each client, and the remaining 20% of the IQ samples are used for evaluating the performance of the aggregated model on the server.

TABLE II: Combination of classes considered for local model training in each client for different numbers of clients
PC5 and ITS-G5 LTE and C-V2X PC5 Wi-Fi and ITS-G5 LTE and NR (×2) Wi-Fi and LTE (×2) Wi-Fi and NR (×2) 10 C-V2X PC5 and ITS-G5 LTE and C-V2X PC5 Wi-Fi and ITS-G5 NR and C-V2X PC5 LTE, NR, and Wi-Fi 3 ITS-G5, Wi-Fi, and LTE LTE, NR, and C-V2X PC5 LTE, NR, and Wi-Fi ITS-G5, Wi-Fi, and LTE 3 classes per client 5 LTE, NR, and C-V2X PC5 Wi-Fi, LTE, and C-V2X PC5 Wi-Fi, NR, and C-V2X LTE, NR, and Wi-Fi (×2) ITS-G5, Wi-Fi, and LTE (×2) 10 LTE, NR, and C-V2X PC5 (×2) Wi-Fi, LTE, and C-V2X PC5 (×2) Wi-Fi, NR, and C-V2X (×2)

Fig. 3: Impact of the number of clients in a group for over-the-air aggregation-based FL using QPSK modulation at 10 dB SNR: a) testing accuracy b) training loss

Fig. 4: Impact of the number of classes per client in FL-based technology recognition for over-the-air aggregation using QPSK modulation at 10 dB SNR channel: a) testing accuracy b) training loss

We have proposed a solution to tackle the challenges of centralized training for the technology recognition model used to identify wireless signals in multi-RAT wireless networks. In particular, we have considered a non-IID dataset distribution approach on the clients and an over-the-air aggregation-based FL framework, which offers distributed training on client devices to enhance scalability while minimizing data transfer and maintaining classification accuracy. A local CNN-based technology recognition model was trained on the clients based on a dataset from a combination of Wi-Fi, LTE, 5G NR, C-V2X PC5, and ITS-G5 RATs. The combinations of RATs considered in the data distribution in each client were selected based on the RATs that are used by real-world edge network devices. The evaluation results demonstrate that the proposed FL-based technology recognition greatly reduces the data transfer required for training. The results also show that classification accuracy increases as the number of clients and the number of classes per client increase.

TABLE I: Related work on FL for wireless signal detection showing a) target detection, b) over-the-air aggregation, c) class imbalance considered, and d) applicability on multi-RAT

TABLE III: Comparison of QPSK and 16-QAM based over-the-air aggregation for five clients per group and two classes per client

Impact of Signal Modulation: Table III provides a comparison between the performance of QPSK and 16-QAM modulations used for the over-the-air aggregation. The comparison is based on data obtained using aggregation with five clients, with each client accommodating two classes.

TABLE IV: Comparison of centralized and FL-based models (QPSK-based over-the-air aggregation for 3, 5, and 10 clients with 2 classes per client)