Transfer Learning for the Behavior Prediction of Microwave Structures

Microwave structure behavior prediction is an important research topic in radio frequency (RF) design. In recent years, deep-learning-based techniques have been widely implemented to study microwaves, and they are envisaged to revolutionize this arduous and time-consuming work. However, empirical data collection and neural network training are two significant challenges of applying deep learning techniques to practical RF modeling and design problems. To this end, this letter investigates a transfer-learning-based approach to improve the accuracy and efficiency of predicting microwave structure behaviors. Through experimental comparisons, we validate that the proposed approach can reduce the amount of data required for training while shortening the neural network training time for the behavior prediction of microwave structures.


I. INTRODUCTION
Deep learning has been widely acknowledged as an effective paradigm for the behavior prediction and design of microwave structures due to its ability to discover complicated and nonlinear relationships [1], [2]. To model the electromagnetic (EM) behaviors of microwave structures, radio frequency (RF) engineers often analyze the wave propagation characteristics based on Maxwell's equations through electronic design automation (EDA) software. Deep learning has been well explored in microwave structure modeling to free RF designers from complicated and tedious simulation and optimization processes [3], [4].
However, training a qualified deep-learning model requires a massive amount of empirical data. In addition, both modeling structures through the EDA software and conducting EM simulation are time-consuming. Also, the behavior modeling of microwave structures often depends on many variables, including the operation frequency, material, and component geometries. A single neural network can hardly cover all these variables. As a result, the training data for the behavior modeling of microwave structures must be recollected, and the model needs to be completely retrained from scratch once any of these parameters changes [5]. Training from scratch means training a model from the first layer, starting with initial parameters. Space mapping is a well-known technique to deal with these variations by exploiting the knowledge of the coarse model [6], [7]. However, the performance of space mapping depends on the empirical knowledge of the coarse model, which increases the complexity of the behavior prediction model.
To address these challenges, we introduce transfer learning to improve the efficiency of microwave structure behavior prediction. Motivated by insufficient training data, transfer learning has been applied in bioinformatics, robotics, and communications [8], [9]. Recently, transfer learning has also been successfully used to model the nonlinear features of power amplifiers [10], [11]. We carry out extensive experiments to make comprehensive comparisons in operation frequency and structure size and to examine the scalability of the approach. Systematic simulation results presented in this letter show that transfer learning is able to significantly improve neural network accuracy in predicting microwave structures' behaviors. The results also reveal that the source task benefits a target task, since less data and a shorter time are required for training.
We subsequently validate that the performance of transfer learning depends on the relevancy of the source and target tasks by comparing two different transfer learning tasks.

A. Deep Learning Model and Data Collection
Convolutional neural network (CNN) models have proven effective in the field of image processing because each image pixel is highly correlated with its neighboring pixels [12], [13], [14]. Because microwave structure modeling shares a similar mathematical nature with image processing, CNN models can be used to precisely predict the response of microwave structure geometries [15].
The diagram of the CNN model in this letter is illustrated in Fig. 1. The first convolutional block contains five convolutional layers to capture the spatial information of input structures. Each layer uses a 3 × 3 kernel and applies the ReLU activation function. The dense block with two fully connected layers rearranges the output from the convolutional block. A dropout layer with a ratio of 20% is introduced to prevent the model from overfitting [14]. The aforementioned neural network topology and its hyperparameters were all tuned using domain knowledge. Empirically, a CNN model with any number of convolutional layers and fully connected layers should work, and a deeper network generally performs better [16]. In this letter, a tuned CNN with five convolutional layers is chosen because it predicts microwave structure behavior well while remaining small enough not to run out of GPU memory.
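To make the layer operation concrete, the following is a minimal pure-Python sketch of a single 3 × 3 convolutional layer followed by ReLU, applied to a 7 × 5 binary input and stacked five times. The zero padding that keeps the output at 7 × 5, the single channel, the example kernel, the example structure, and the name `conv3x3_relu` are all assumptions made for illustration; the letter does not state the padding, the channel counts, or the framework used.

```python
def conv3x3_relu(grid, kernel, bias=0.0):
    """Cross-correlate a 2-D grid with a 3x3 kernel, then apply ReLU.

    Cells outside the grid are treated as zero (assumed "same" padding),
    so the output has the same shape as the input.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            acc = bias
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[dr + 1][dc + 1] * grid[rr][cc]
            out_row.append(max(acc, 0.0))  # ReLU
        out.append(out_row)
    return out


# Illustrative 7 x 5 binary structure (values are made up, not from the letter).
structure = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# Hypothetical kernel that sums each cell with its eight neighbors.
neighbor_kernel = [[1.0, 1.0, 1.0] for _ in range(3)]

# Stack five layers, mirroring the five-layer convolutional block.
features = structure
for _ in range(5):
    features = conv3x3_relu(features, neighbor_kernel)
```

In a trained network each layer would have its own learned kernels and several channels; reusing one fixed kernel here only shows how the spatial neighborhood of each cell enters the computation.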
Inspired by the universal mesh solution suited for any given structure in EDA software, we characterize the microwave structures by dividing them into multiple square cells and describing them with binary matrices. Consequently, the topological resolution of the input depends on the number of cells covering the structure area. As shown in Fig. 1, the design space is split into 35 square cells, which can be represented by 7 × 5 binary matrices. The microwave structures are randomly generated within the design space. In total, there are $2^{35} \approx 3.4 \times 10^{10}$ possible structures. The outputs are the estimated complex scattering parameter (S-parameter) vectors. A CNN model is developed to predict any structure's S-parameter behavior quickly and precisely. Each structure is connected to three fixed 50 Ω connectors and labeled by the Keysight ADS software. This characterization method can be easily tailored and extended to application scenarios requiring higher design resolutions and a different number of ports.
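The binary-matrix description and the size of the design space can be sketched in a few lines of Python. The helper names (`random_structure`, `structure_to_index`, `index_to_structure`) and the row-major bit ordering are hypothetical choices for illustration, and the uniform random sampling of cells is an assumption, since the letter only says the structures are randomly generated.

```python
import random

ROWS, COLS = 7, 5  # 35 square cells, as described in the text


def random_structure(rng):
    """Draw a 7 x 5 binary matrix with each cell chosen uniformly (assumed)."""
    return [[rng.randint(0, 1) for _ in range(COLS)] for _ in range(ROWS)]


def structure_to_index(grid):
    """Pack the cells row by row into one integer in [0, 2**35)."""
    index = 0
    for row in grid:
        for cell in row:
            index = (index << 1) | cell
    return index


def index_to_structure(index):
    """Inverse of structure_to_index."""
    total = ROWS * COLS
    bits = [(index >> shift) & 1 for shift in range(total - 1, -1, -1)]
    return [bits[r * COLS:(r + 1) * COLS] for r in range(ROWS)]


# Every cell is either 0 or 1, so the design space holds 2**35 structures.
num_structures = 2 ** (ROWS * COLS)
print(num_structures)  # 34359738368, i.e. about 3.4e10

rng = random.Random(0)
example = random_structure(rng)
assert index_to_structure(structure_to_index(example)) == example
```

Mapping each structure to a unique integer is one simple way to detect duplicates when sampling a small subset of such a large design space.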

B. Transfer Learning for Microwave Behavior Prediction
Transfer learning can initiate the learning process for a new target task by transferring knowledge from a previously well-learned source task. The more related the source task is to the target task, the better performance the transferred model can produce [17]. For most microwave behavior modeling tasks, existing learning-based approaches generally follow specific templates corresponding to different frequencies or sizes [4], [18]. These parameter-varying tasks share an identical nature and similar input-output patterns. This sets the foundation for applying transfer learning for microwave behavior prediction.
The dataset with U groups of input features and output labels is denoted as $X = [x_1, x_2, \ldots, x_U]$ and $Y = [y_1, y_2, \ldots, y_U]$, respectively. The training process is defined to solve the following optimization problem:

$$\min_{\omega, z} \; L\big(f_{\omega,z}(X), Y\big) \tag{1}$$

where $L(\cdot)$ is the loss function, and $f_{\omega,z}(\cdot)$ is the predicting model. The structure behavior prediction can be treated as a regression problem, where we choose the mean absolute error (MAE) as the loss function, which is explicitly given by

$$L\big(f_{\omega,z}(X), Y\big) = \frac{1}{U} \sum_{u=1}^{U} \big| f_{\omega,z}(x_u) - y_u \big| \tag{2}$$

The output $a_{i+1}$ from fully connected layer $i + 1$ is equal to

$$a_{i+1} = A(\omega_i \cdot a_i + z_i) \tag{3}$$

where $A(\cdot)$ is the activation function; $\omega_i$ and $z_i$ are the weight and bias vectors in layer $i$. In the convolutional layers, $\omega$ represents the kernel, and the dot product operation in (3) is replaced by the cross-correlation operation. To apply transfer learning, the first step is to develop a well-trained model for the source task, in which the parameters $[\omega_i, z_i]$, $1 \le i \le m$, are then fixed. This model is then retrained with the target task dataset. The first $m$ layers are called fixed layers, while the remaining layers are called adaptation layers. In this way, the knowledge extracted from the source task is transferred to the model designed for the target task through the fixed layers and the initial network parameters $[\omega_i, z_i]$, $m < i \le n$, in the adaptation layers.
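The MAE loss, the fully connected layer update, and the split into fixed and adaptation layers described above can be sketched as follows. This is a framework-free illustration under stated assumptions, not the authors' implementation: the dictionary layout of a layer, the `trainable` flag, and the function names `mae`, `dense`, and `init_target_model` are hypothetical, and the MAE is averaged over every output entry of every sample.

```python
import copy


def mae(predictions, labels):
    """Mean absolute error over all output entries of all samples."""
    total, count = 0.0, 0
    for y_hat, y in zip(predictions, labels):
        for p, t in zip(y_hat, y):
            total += abs(p - t)  # abs() also handles complex entries
            count += 1
    return total / count


def relu(x):
    return max(x, 0.0)


def dense(a, weights, bias, activation):
    """One fully connected layer: activation(weights . a + bias)."""
    return [
        activation(sum(w * x for w, x in zip(row, a)) + b)
        for row, b in zip(weights, bias)
    ]


def init_target_model(source_layers, m):
    """Copy all source parameters and mark the first m layers as fixed.

    Layers 1..m keep their source values during retraining; layers
    m+1..n start from the source values and remain trainable.
    """
    target = []
    for i, layer in enumerate(source_layers, start=1):
        target.append({
            "weights": copy.deepcopy(layer["weights"]),
            "bias": copy.deepcopy(layer["bias"]),
            "trainable": i > m,
        })
    return target


# Hypothetical four-layer source model with placeholder parameters.
source_model = [{"weights": [[1.0]], "bias": [0.0]} for _ in range(4)]
target_model = init_target_model(source_model, m=2)
print([layer["trainable"] for layer in target_model])
# [False, False, True, True]
```

A training loop for the target task would then update only the layers whose `trainable` flag is set, which is why fewer parameters need to be fitted to the target data.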
III. PERFORMANCE EVALUATION AND DISCUSSION
To comprehensively evaluate the performance of transfer learning for the behavior prediction of microwave structures, we designed and carried out three transfer tasks. As shown in Fig. 2, task 1 is designed to transfer knowledge between two adjacent frequencies; task 2 leverages the knowledge to train a microwave simulation model with a different structure size; in task 3, both the frequency and the size are set to be different from the source task. To label these randomly generated structures, we simulated the S-parameter response of 40 000 structures with a size of 49 × 35 mm and 14 000 structures with a size of 42 × 30 mm using EM simulation tools. Among them, 70% of the data was used for training and the rest for testing. The computing platform comprises an Intel i9-9900X CPU @ 3.50 GHz and an Nvidia RTX2080 GPU with 12 GB memory.
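The 70%/30% partition of the labeled data can be illustrated with a short sketch. Shuffling before the split, the fixed seed, and the function name `split_dataset` are assumptions for illustration; the letter only states the proportions.

```python
import random


def split_dataset(samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices (assumed) and split into train and test sets."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    cut = int(round(train_fraction * len(samples)))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


# Stand-in for the 40 000 labeled 49 x 35 mm structures: integers as sample IDs.
samples = list(range(40000))
train_set, test_set = split_dataset(samples)
print(len(train_set), len(test_set))  # 28000 12000
```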

A. Transfer Knowledge to Different Frequencies
As shown in Fig. 2, the source task is to predict the microwave structures' S-parameter response at an operation frequency of 1.7 GHz. The target task is to predict the response at an operation frequency of 1.5 GHz. Fig. 3 shows the experimental results corresponding to different numbers of fixed layers. Transfer learning performs accurately and efficiently when the first two convolutional layers are fixed for the target training process. Thus, we choose to fix the first two layers in the following experiments. Fig. 4 compares the MAE rates and training times for different amounts of training data. The results show that the model with transferred knowledge is less affected when the amount of training data is reduced. In contrast, the network without prior knowledge (learning from scratch) suffers significantly from data reduction. Specifically, when only 30% of the training data is used to train the model, the testing performance in terms of MAE degrades by 20%, which is still better than the model trained with the full training data. In comparison, the performance of the model without source knowledge drops by 63%. Even when the learning-from-scratch model is fine-tuned with triple the training time, its MAE rate is still higher than that of the transfer learning model.

B. Transfer Knowledge to Different Template Sizes
The source knowledge extracted from one model with a specific size is also expected to be useful for training other models of different sizes. Following this rationale, the knowledge from task 1 can be leveraged for training networks and simulating microwave structures with different sizes in task 2. As shown in Fig. 2, we transfer the template from 49 × 35 mm to 42 × 30 mm. Task 3 is implemented to make comparisons between two different transfer tasks. Based on the results from task 1, 14 000 (30%) data samples are labeled by the S-parameter response of the 42 × 30 mm structures.
The results for the 42 × 30 mm structure are compared in Fig. 5. Tasks 2 and 3 achieve lower MAE rates than the network trained from scratch. Similar to the results in task 1, training a model with source knowledge significantly improves the performance of the target task. Even with fine-tuning and a much longer training time, the MAE rate of the model trained from scratch is still 50% poorer than that of the model derived from task 2.
The comparison between the two transfer tasks shows that task 2 achieves an 18% lower MAE rate than task 3 within the same training time. This comparison substantiates that, in the context of the behavior prediction of microwave structures, the more related the target task is to the source task, the better the performance yielded by the proposed transfer-learning-based approach.
To examine the feasibility of transfer learning in practice, two microstrip line structure prototypes were fabricated on an Isola substrate of 0.762 mm thickness with a dielectric constant of 2.8 to evaluate the prediction results from the transferred models. Fig. 6 shows the fabricated prototypes and the measurement results from the vector network analyzer. Our model achieves a relatively small MAE of 3.7% between the prediction yielded by the transfer learning model and the measurements of the fabricated prototypes, which demonstrates the effectiveness and accuracy of our proposed transfer-learning-based approach. The experimental results show that source knowledge can be leveraged to facilitate model training for the target task, as the source task and target task share similarities. Consequently, only part of the neural network needs to be trained to develop high-performance models for similar tasks of microwave structure behavior prediction.

IV. CONCLUSION
In summary, we proposed a transfer-learning-based CNN model to predict microwave structures' behaviors. This model is generic and thus can be tailored or extended to multiple application scenarios. The proposed model significantly reduces the training time and the amount of training data compared with a model trained from scratch. Through extensive experimental results, we also validated that the performance yielded by transfer learning depends on the similarity between the source task and the target task. The research outcomes in this letter answer several key questions pertaining to automatic RF behavior prediction and open up a new possibility for reaching a compromise between accuracy and efficiency.