Generative Adversarial Networks as an Accommodative Memory for Cognitive Waveform Synthesis

This paper introduces a new concept of Generative Accommodative Memory (GAM) by showcasing a practical example of using Generative Adversarial Networks (GANs) as Accommodative Memory Basic Units (AMBUs). The GAM can memorize and learn the results of any algorithm and adapt its response to new unseen scenarios by exploring the latent space. This memory is a generalization of look-up tables (LUT), where writing and reading operations correspond to the training and inference of an AMBU or traversing its latent space. To demonstrate the practical application of GAM, we use it in cognitive radar waveform synthesis. Here, a Wasserstein GAN is trained as an AMBU for a specific ambiguity function shaping scenario. The memory can retrieve information for frequent basic scenarios (called input basis scenarios) through the inference of the generator, i.e., generative read. For more complex inputs, the memory accommodates the input by optimizing output over the latent space, i.e., accommodation read. In this light, the GAM can accommodate new scenarios much faster than traditional methods, but at the cost of more memory hardware. As an additional result, we show that traditional algorithms can be outperformed in terms of suppression level by penalizing the loss function according to the desired ambiguity function.


I. INTRODUCTION
Generative adversarial networks (GANs) have emerged as a frontrunner in recent machine learning research. Yann LeCun famously noted that they are "the most interesting idea in the last ten years in machine learning" [1]. The explosive interest in this area is partly due to their ability to regenerate complicated data distributions. In this light, many GAN offshoots are being increasingly applied to computer vision and other cognition-based applications. Meanwhile, cognition-inspired systems are widespread, and cognitive radar is a superb example of such a system. Principally, cognitive radars enhance traditional radar systems via the cognition gathered from the environment. This additional knowledge is commonly fed back from the receiver to the transmitter and incorporated into the waveform design. In this light, many researchers have investigated waveform design with some prior information about the target environment. Traditionally, there are two major research trends in radar waveform design: the first focuses on the autocorrelation, while the other focuses on the ambiguity function (AF). In a seminal paper, Stoica et al. [2] first showed that it is possible to suppress a desired part of the autocorrelation to "almost zero". They employed the cyclic algorithm (CA) to minimize the weighted integrated sidelobe level (WISL) and were followed by others employing various optimization approaches, including majorization-minimization (MM) [3], [4]. Their approach has also been generalized to multiple-input multiple-output (MIMO) radar, both using the CA [5], [6] and MM [7], [8]. Inspiringly, the CA has also been applied to fast frequency sidelobe suppression for cognitive radars [9]. Minimizing the peak sidelobe level (PSL) has been investigated using the CA in [10] and MM in [11]. Interestingly, all the above algorithms fall under the category of cognitive radar, even though they were not specifically designed for it, because all need some cognition about the target's range and/or Doppler and design the radar's waveform based on that cognition.
Synthesizing the ambiguity function for radar waveform design dates back to the 1970s [12]. Since then, there have been a few investigations, and the emergence of cognitive radars revitalized the need for AF shaping [13]. The first research employed time-consuming, complex optimization methods [14]. AF shaping for cognitive radars has mainly been investigated by applying MM [15], quadratic gradient descent [16], and adaptive sequential refinements [17]. These researchers tried to present better algorithms in terms of suppression level and time. However, there is a significant obstacle to practically applying any of the above methods in an actual radar: the feedback from the receiver to the transmitter in a cognitive radar demands real-time waveform design, as noted in [9] and [18]. All the above and similar algorithms involve many sequential iterations of time-consuming operations on vectors and matrices. A highly dynamic environment, such as high sea clutter states or high-speed aerial targets, can compound this challenge by requiring a rapid change in the desired time-frequency landscape. The traditional answer to this challenge is memory [18]. In fact, no waveform is designed in real time in traditional radar systems; instead, a pre-designed waveform is stored and used.
Meanwhile, the use of neural networks in radar applications has been investigated by several authors. For fully adaptive radars (FAR), references [19] and [20] investigate using neural networks to replace a constrained optimization for parameter adaptation and decision making. References [21] and [22] use recurrent neural networks and variational auto-encoders (VAEs) to replicate the chirp function, respectively. The latter also suggested using a partially observable Markov decision process as an optimizer over the latent space, although they did not implement it. A simple GAN structure is applied in [23] to regenerate chirp-like waveforms, with the discriminator operating on the ambiguity images of the real and generated data. It was followed by [24] for variable waveform lengths. Reference [25] proposed reinforcement learning for frequency notching in radar waveforms. However, none of these studies proposes a method to suppress a desired time-frequency region. Most of them also use simplistic chirp-like waveforms for their data, while being unable to outperform their dataset waveforms.
Given the dynamic nature of the target environment, there is a need to reconsider the conventional approach to waveform memory. In this paper, we propose using an array of pre-trained generative networks to replicate the AF shaping algorithms in real time. The main contributions of this paper are:

• We introduce a new concept called generative accommodative memory (GAM) as an extension of the concept of a look-up table (LUT). Here, each memory unit comprises a GAN. Writing into the memory block is achieved through training, while reading corresponds to a simple inference for frequent basis scenarios in the generative mode. For non-frequent general inputs, we search the latent space for the best possible output, a process called accommodation. In this light, the GAM can model, or memorize, a time-consuming algorithm that has real-time applications. Although we develop the problem in the context of cognitive radar, this type of memory can apply to other cognitive systems in which the system should rapidly make a relevant decision based on its cognition of the environment.
• We focus on AF shaping for cognitive radars to showcase a practical application of this novel concept. In this regard, we apply the AF shaping formulation to form the training tensors by juxtaposing the signal and its AF in a single two-layer tensor. As a result, the trained model can generate signal/AF pairs in which particular desired regions are suppressed. Here, the AF part is required to facilitate searching the latent space for the best output given an unseen input at a low computational cost. This process is named accommodation because the memory can accommodate unseen inputs or scenarios. The AF part is used to traverse the distribution to find generated AF-signal pairs with the desired specifications in certain delay-Doppler regions.
• Integrated sidelobe level (ISL) and peak sidelobe level (PSL) are the two primary metrics for waveform design in radar systems. The ISL corresponds to the Euclidean norm of the sidelobes and is mathematically more tractable to minimize. On the other hand, the peaks of the sidelobes can correspond to false alarms, giving the PSL special practical meaning. Here, we penalize the training of the WGAN to minimize the PSL in regions of interest (ROIs). Besides its practical meaning, the PSL can also be computed with fewer floating-point operations (FLOPs) than the ISL for a given ROI. Our empirical results confirm that the WGAN can achieve stronger suppression than the algorithm it learns from.

Hereafter, we use regular lowercase for a scalar or a generic variable, bold lowercase for vectors, bold uppercase for matrices, and bold uppercase with superscripts for higher-dimensional tensors. Vector and matrix indices start from one, while the third or higher dimensions of tensors can adopt any integer interval. In this regard, $x$ and $\mathbf{x}$ respectively represent a scalar and a vector, while $\mathbf{X}$ denotes a matrix or a tensor. In particular, $\mathbf{X}^r$ is a three-dimensional tensor, where $r$ is the third index and can also be zero or negative. Furthermore, $(\cdot)^T$, $(\cdot)^H$, $x_{i,j}$, and $\odot$ respectively denote transpose, conjugate transpose (Hermitian), the $(i,j)$-th element of $\mathbf{x}$ ($\mathbf{x}$ being a vector, matrix, or tensor), and the element-wise (Hadamard) product of vectors, matrices, or tensors. The symbols $j$ and $(\cdot)^*$ respectively represent the imaginary unit ($j := \sqrt{-1}$) and the conjugate of a variable, while the notation $:=$ means "is defined by" and $x \sim p(x)$ means $x$ is distributed according to $p(x)$. The $\|\cdot\|$ and $\|\cdot\|_\infty$ calculate the Euclidean and Chebyshev (max) norms of a non-scalar, respectively. Also, $|\cdot|$ represents a variable's absolute value and operates elementwise on non-scalars. The fields of real and complex numbers are denoted by $\mathbb{R}$ and $\mathbb{C}$, respectively, whereas $\mathbb{E}$ is reserved for the expected value. The real and imaginary parts of a complex variable are represented by $\Re\{\cdot\}$ and $\Im\{\cdot\}$, respectively. For a complex variable $z = x + jy = re^{j\theta}$, $\mathrm{dB}(\cdot)$ represents its decibel value, defined by
$$\mathrm{dB}(z) = 10\log_{10}(z) = 10\log_{10}(r) + \frac{10}{\ln(10)}\, j\theta = \mathrm{dB}(r) + \frac{10}{\ln(10)}\, j\theta,$$
where $\ln$ is the natural logarithm. The $\mathrm{dB}(\cdot)$ and other unmentioned functions operate elementwise on non-scalars, and $x_{dB} := \mathrm{dB}(x)$. Finally, we reserve $\hat{x}$ and $\check{x}$ to denote the "generated" and "desired" versions of $x$, respectively.

This paper is organized as follows. Section II is devoted to introducing GANs and the related literature; in particular, the WGAN is presented as a promising high-performance successor of the GAN. The structure of the GAM and the meaning of the write and read operations are given in Section III. Consequently, we address the AF shaping challenge in cognitive radar as an example of a GAM application in Section IV. In particular, we reveal how to form a suitable tensor from the signal and its AF to train the network to learn their relationship. Also, we present the penalization of the loss function to improve suppression and the accommodation operation as a search in the latent space. Section V is devoted to empirical results, where both the achievable suppression and the time gains are discussed. Finally, Section VI gives the conclusion.

II. GENERATIVE ADVERSARIAL NETWORKS

Figure 1 illustrates the structure of a GAN, constituted from two different networks: a generator $G$ and a discriminator $D$.
These two networks are trained together. The purpose of $G$ is to learn the distribution $p_{data}(\cdot)$ of the real data $\mathbf{x}$, while it competes with the discriminator in a zero-sum game. The discriminator tries to distinguish between fake ($G$-generated) and real data. In the original GAN [26] by Ian Goodfellow, the cross-entropy function is adopted as the loss function:
$$\min_{G}\max_{D}\; \mathbb{E}_{\mathbf{x}\sim p_{data}}\left[\log D(\mathbf{x})\right] + \mathbb{E}_{\mathbf{z}\sim p_{\mathbf{z}}}\left[\log\left(1 - D(G(\mathbf{z}))\right)\right], \tag{1}$$
where the random vector $\mathbf{z}$ is the generator's input and is commonly drawn from a uniform or normal distribution. Alternatively, the min-max problem in (1) can be separated into two optimizations, one for the discriminator and one for the generator:
$$\max_{D}\; \mathbb{E}_{\mathbf{x}\sim p_{data}}\left[\log D(\mathbf{x})\right] + \mathbb{E}_{\mathbf{z}\sim p_{\mathbf{z}}}\left[\log\left(1 - D(G(\mathbf{z}))\right)\right], \qquad \min_{G}\; \mathbb{E}_{\mathbf{z}\sim p_{\mathbf{z}}}\left[\log\left(1 - D(G(\mathbf{z}))\right)\right]. \tag{2}$$
Here, $G$ is like a forger trying to produce quality counterfeit samples, while $D$ is the authenticity expert trying to catch the forged items. Astonishingly, the forger ($G$) does not see any real data sample during the training phase. Instead, it learns how to generate data like the original samples only by using feedback from $D$. The primary advantage of GANs is that they can model complicated (non-smooth) data distributions [26]. Besides, they can parallelize generation, something impossible for alternative generative algorithms like PixelCNN [27] or fully visible belief networks (FVBN) [28].
On the other hand, the original GAN was born with challenges. Training a GAN can be difficult for two reasons: 1) vanishing gradients and 2) mode collapse. The first occurs when $D$ is trained much better than $G$, i.e., it becomes a very professional authenticity expert. In that case, the gradients generated by $D$ approach zero, and it cannot provide any guidance for further generator training. The second problem occurs when the distribution learned by $G$ has only a few modes. In that case, the generator cannot generate diverse samples and essentially locks onto a few or, in some cases, only one sample. Many loss functions have been proposed to remedy these shortcomings and provide smoother, more stabilized training; these approaches are surveyed in several review papers [1], [29], [30], [31], [32]. Among them, a few have been more successful. In particular, the Wasserstein GAN (WGAN) [33] is deemed the most promising [1]. The key to the success of the WGAN stems from its use of the Earth Mover's (EM) distance as the basis of its loss function. The EM or Wasserstein-1 distance between $p_{data}$ and $p_g$ is defined as:
$$W(p_{data}, p_g) = \inf_{\gamma \in \Pi(p_{data}, p_g)} \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim \gamma}\left[\|\mathbf{x} - \mathbf{y}\|\right]. \tag{3}$$
Here, $\Pi(p_{data}, p_g)$ is the set of all joint distributions whose marginals are $p_{data}$ and $p_g$, while $\inf$ represents the infimum. Directly working with this distance is not straightforward. In this regard, reference [33] approximated this measure and consequently proposed the following min-max problem as the training objective:
$$\min_{G}\max_{D \in \mathcal{D}}\; \mathbb{E}_{\mathbf{x}\sim p_{data}}\left[D(\mathbf{x})\right] - \mathbb{E}_{\mathbf{z}\sim p_{\mathbf{z}}}\left[D(G(\mathbf{z}))\right], \tag{4}$$
where $\mathcal{D}$ is the set of 1-Lipschitz functions. Generally, the WGAN has smoother training and can provide more meaningful learning curves. Separating the two objectives as in (2) results in:
$$\max_{D \in \mathcal{D}}\; \mathbb{E}_{\mathbf{x}\sim p_{data}}\left[D(\mathbf{x})\right] - \mathbb{E}_{\mathbf{z}\sim p_{\mathbf{z}}}\left[D(G(\mathbf{z}))\right], \qquad \min_{G}\; -\mathbb{E}_{\mathbf{z}\sim p_{\mathbf{z}}}\left[D(G(\mathbf{z}))\right]. \tag{5}$$
Compared to other advanced GANs, the WGAN only changes the loss function of a GAN. Also, it introduces less computational complexity while maintaining superior performance. The primary reason is that the EM distance models the distance between two distributions, i.e., the real and fake datasets, very well and is implemented efficiently in the WGAN.
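For concreteness, the separated WGAN objectives in (5) admit a minimal TensorFlow sketch. The function names are illustrative only; the weight-clipping constant c = 0.01 follows the default suggested in [33].

```python
import tensorflow as tf

def discriminator_loss(d_real, d_fake):
    # Critic maximizes E[D(x)] - E[D(G(z))]; we minimize its negation.
    return tf.reduce_mean(d_fake) - tf.reduce_mean(d_real)

def generator_loss(d_fake):
    # Generator maximizes E[D(G(z))]; we minimize its negation.
    return -tf.reduce_mean(d_fake)

def clip_critic_weights(critic, c=0.01):
    # Weight clipping enforces the Lipschitz constraint in (4), as in [33].
    for w in critic.trainable_weights:
        w.assign(tf.clip_by_value(w, -c, c))
```

In practice, the clipping step runs after each critic update, which is what makes the critic an approximate 1-Lipschitz function.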

III. CONCEPTUALIZING THE ACCOMMODATIVE MEMORY STRUCTURE
We present the concept of GAM as a general concept applicable to a broad range of algorithms. Simultaneously, we present AF shaping as a prime example of the application of such a memory. The GAM is particularly beneficial when a time-consuming algorithm serves a real-time application and researchers compete to lower its computational complexity. An intuitive solution to such a challenge is simply to save the results of the algorithm in a memory. Theoretically, a LUT is constituted by saving every output for every input. However, saving the result for every possible input might not be feasible, which is the case for many practical problems. Here, we propose to save the algorithm's outcome for input basis scenarios (IBSs) in a GAM. The IBSs are input scenarios with two significant properties: 1) all scenarios can be constructed from IBSs through linear combination, and 2) they are frequently observed inputs in practical situations. For a simple example, consider an algorithm that takes a two-dimensional vector and gives a linear transformation of the input. In that case, the IBSs can be scaled versions of any basis vectors of the input plane, which are more frequently observed in practice.
Meanwhile, the GAM should also be able to accommodate other inputs. We propose to use a GAN as the accommodative memory basic unit (AMBU). The GAM comprises an array of GANs, each trained for an IBS. The write operation for each memory unit is equivalent to the training of the corresponding GAN. In contrast, the read operation for IBSs is simply a generated sample of the generator. The accommodation for non-IBS scenarios is achieved by traversing the latent space. In this regard, the write operation is highly time-consuming, while the read and accommodate operations can be implemented efficiently. We define two modes for this concept: writing and deployment. In the writing mode, all AMBUs are trained, while deployment involves sample generation by a generator for IBS inputs or traversing the latent space for non-IBS inputs. Here, traversing is achieved by searching the latent space for the best possible output for the given non-IBS input. We call this process accommodation, as the memory can accommodate unseen inputs. Accommodation is detailed in Subsection IV-B. In summary, building the model for an arbitrary algorithm involves the following four steps:
1) Determine the IBSs of the algorithm.
2) Construct a dataset for each IBS by running the algorithm.
3) Train one AMBU (GAN) per IBS; this constitutes the write operation.
4) Deploy the memory, performing a generative read for IBS inputs and an accommodative read for non-IBS inputs.
To make the concept simpler, let us assume that the algorithm's inputs are mapped into two vectors, $\mathbf{a}$ and $\mathbf{b}$. The first is used to select the IBS, and the second is used for accommodation. By definition, accommodation is the process of searching the latent space to generate the best output, i.e., accommodating new unseen inputs. It is important to note that neither $\mathbf{a}$ nor $\mathbf{b}$ is trainable, and both should be determined before establishing the model's structure. The structure of the GAM in deployment mode is shown in Fig. 2.
Here, $N_T$ is the total number of IBSs. Additionally, $\mathbf{a}$ selects the appropriate branch/generator, and $\mathbf{b}$ gives information that defines how the output should be adjusted. This is done by updating $\mathbf{z}$ via feedback as we traverse the latent space. The data flow is as follows. First, $\mathbf{a}$ determines which generator(s) produce the output. If the input is an IBS, the output is a simple inference of the selected generator; this is a generative read. A non-IBS input can be represented by a linear combination of several IBSs (see the IBS definition above). In this light, $\mathbf{a}$ selects the generators corresponding to those IBSs. Then, we run the accommodation process to find the best output for the non-IBS input; in this case, the memory read is accommodative. Here, $\mathbf{b}$ is an input to the accommodation process and defines the search parameters in the latent spaces. The data path for accommodation is colored red in Fig. 2, and it changes the latent variable ($\mathbf{z}$) to find the best output; a minimal sketch of this dispatch logic is given below. It is observable from Fig. 2 that the memory complexity of this approach scales linearly with $N_T$. However, the presented structure is certainly scalable because all AMBUs have the same, albeit complex, structure. This enables copying and pasting in hardware or software. We are currently working on structures independent of the number of IBSs and intend to present our findings in forthcoming papers.
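As a rough illustration of this dispatch, the following Python sketch assumes each AMBU exposes its trained generator and that an accommodation routine (detailed in Subsection IV-B) is passed in; all names here are illustrative placeholders rather than a definitive implementation.

```python
import tensorflow as tf

def gam_read(a, b, generators, accommodate, z_dim=128):
    """Deployment-mode read: `a` selects the branch(es), `b` parameterizes
    the accommodation search (e.g., ROI boundaries)."""
    g = generators[int(tf.argmax(a))]             # branch selection by `a`
    z = tf.random.normal([1, z_dim])              # latent draw
    n_active = int(tf.reduce_sum(tf.cast(a > 0, tf.int32)))
    if n_active == 1:                             # IBS input: generative read,
        return g(z)                               # a single inference
    return accommodate(g, z, b)                   # non-IBS: accommodative read
```

Here an IBS input activates exactly one branch, while a non-IBS input, being a combination of several IBSs, triggers the latent-space search.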
For the case of AF shaping, the IBSs can be selected as a set of disjoint regions of interest (ROIs) that cover the whole time-frequency landscape. Other ROIs are included among the non-IBS inputs and demand accommodation. The IBSs should be selected to be as basic as possible. In particular, the Doppler-lag plane can be covered by equal-sized rectangular ROIs, $0 \le r_{\min} \le r \le r_{\max}$, $\nu_{\min} \le \nu \le \nu_{\max}$, as the IBSs. Note that only non-negative lags need be considered because of the intrinsic symmetry of the AF. In this light, $\mathbf{a}$ can be expressed as a selection vector over these ROIs, where $K$ is the number of disjoint ROIs, i.e., the IBS cardinality.
The vector $\mathbf{b}$ can include coordinate information for any additional ROI. In brief, $\mathbf{a}$ and $\mathbf{b}$ are determined based on the ROI boundaries (i.e., $r_{i,\min/\max}$ and $\nu_{i,\min/\max}$) in the AF shaping problem. The first is computed from the main ROIs, while the second is deduced from the auxiliary ROI(s).
Remark 1 (a few words on memory usage): We propose the GAM as a faster alternative to the original algorithm, but with increased memory usage compared with the original algorithm. However, it uses less memory than saving all of the algorithm's outputs in a LUT, which might not even be possible because of the output space's high cardinality. Our solution can be applied even when the output space of the algorithm is unlimited, as demonstrated by the example of the ambiguity function shaping algorithm.

IV. AMBIGUITY FUNCTION SYNTHESIS
For a complex-valued signal $s(t)$, the narrowband ambiguity function is defined as
$$\chi(\tau, \nu) = \int_{-\infty}^{\infty} s(t)\, s^*(t - \tau)\, e^{j 2\pi \nu t}\, dt,$$
where $\tau$ and $\nu$ denote the delay and Doppler shift, respectively. This function was introduced by Woodward [34], [35] in a slightly different formulation and is widely applied throughout signal processing [36], radar theory [37], [38], [39], and other fields. Consider that $s(t)$ represents a pulse-coded baseband signal,
$$s(t) = \sum_{n=1}^{N} s_n\, \rho_n(t),$$
where $\mathbf{s} = [s_1, \ldots, s_N]^T \in \mathbb{C}^N$ is the code vector and $\rho_n(t)$ is the rectangular waveform of the $n$-th chip. The power of the ambiguity function for $\mathbf{s}$ can then be given by
$$g_{\mathbf{s}}(r, \nu) = \frac{1}{\|\mathbf{s}\|^2} \left| \mathbf{s}^H \mathbf{J}_r \left( \mathbf{s} \odot \mathbf{p}(\nu) \right) \right|^2, \tag{11}$$
where $\nu \in [-1/2, 1/2]$ is the normalized frequency and $r \in \{1 - N, \ldots, 0, \ldots, N - 1\}$ is the time lag. Without loss of generality, we assume that $N$ is even. Furthermore, $\mathbf{p}(\nu)$ is defined by
$$\mathbf{p}(\nu) := \left[ e^{j 2\pi \nu}, e^{j 2\pi 2\nu}, \ldots, e^{j 2\pi N \nu} \right]^T,$$
and $\mathbf{J}_r$ is the $N \times N$ shift matrix whose $(m, n)$-th element equals one if $m - n = r$ and zero otherwise. The discrete-time AF given in (11) can also be discretized in frequency. Let $N_\nu = 2N$ be the number of frequency bins. Then, by letting $\nu_k = \frac{1}{2} - \frac{k}{N_\nu}$, $k = 0, \ldots, N_\nu - 1$, we have $g_{\mathbf{s}}(r, \nu_k) = \frac{1}{\|\mathbf{s}\|^2} |\mathbf{s}^H \mathbf{J}_r (\mathbf{s} \odot \mathbf{p}(\nu_k))|^2$. In this light, AF shaping is commonly approached by minimizing a weighted sum of $g_{\mathbf{s}}(r, \nu_k)$ (for instance, see [13], [14], [15], [40]):
$$\min_{\mathbf{s}} \sum_{r} \sum_{k} w_{r,k}\, g_{\mathbf{s}}(r, \nu_k),$$
where the $w_{r,k}$ are real-valued weights predetermined by the cognition gathered from the scene. Here, we instead seek a deep network that learns the composition of the AF and its relation to the signal itself. In this regard, we define the complex-valued AF matrix $\mathbf{A}$ as follows. Consider the discretized AF
$$g'(r, k) := \frac{1}{\|\mathbf{s}\|^2}\, \mathbf{s}^H \mathbf{J}_r \left( \mathbf{s} \odot \mathbf{p}(\nu_k) \right), \quad r = 1 - N, \ldots, N - 1.$$
Then, insert the values of $g'(r, k)$ into a matrix, transpose it, and downsample it by a factor of two in both dimensions. This gives an $N \times (N - 1)$ complex-valued AF matrix, where changing the first and second indices traverses frequency and time, respectively. It is worth mentioning that the AF graph can be obtained by depicting $\Re\{\mathbf{A}_{dB}\} = \mathrm{dB}(|\mathbf{A}|^2)$, where $|\cdot|^2$ is an elementwise (Hadamard) operation. Furthermore, note that $0 < |A_{i,j}| \le 1$ and, therefore, all dB values for $\mathbf{A}$ are negative. For the sake of constructing the related tensors, we restrict the range of values in the matrix to some predefined value $d_{\min} \le \Re\{A^{dB}_{i,j}\} \le 0$. We then map each element of $\mathbf{A}_{dB}$ into the $[-1, 1]$ interval by the following affine map, which sends $d_{\min}$ to $-1$ and $0$ to $1$:
$$\hat{A}_{i,j} := 1 - \frac{2}{d_{\min}}\, A^{dB}_{i,j}.$$
Meanwhile, let $\bar{\mathbf{s}}$ be a vector whose members are the normalized members of $\mathbf{s}$. Juxtaposing $\hat{\mathbf{A}}$ with $\bar{\mathbf{s}}$ as its last column yields an $N \times N$ complex-valued matrix that encompasses both $\mathbf{s}$ and its ambiguity function information. However, each element of this matrix is a complex number.
To apply a deep network with real-valued parameters, real-valued tensors are needed. In this regard, we define $\mathbf{X}$ as the $N \times N \times 2$ tensor whose two layers are the real and imaginary parts of this complex matrix:
$$\mathbf{X} := \left[ \Re\{ [\hat{\mathbf{A}}, \bar{\mathbf{s}}] \},\; \Im\{ [\hat{\mathbf{A}}, \bar{\mathbf{s}}] \} \right]. \tag{18}$$
The structure of $\mathbf{X}$ is summarized in Fig. 3. A dataset made up of $\mathbf{X}$ tensors can train a deep generative model.
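A direct, unoptimized NumPy sketch of the discretized AF $g'(r,k)$ and the tensor assembly of Fig. 3 may clarify the construction. The downsampling indices are one choice consistent with the stated $N \times (N-1)$ shape, and the normalization of $\bar{\mathbf{s}}$ is an illustrative assumption for unimodular codes.

```python
import numpy as np

def discrete_af(s, n_nu=None):
    """g'(r, k) = s^H J_r (s .* p(nu_k)) / ||s||^2, for r = 1-N, ..., N-1."""
    N = len(s)
    n_nu = n_nu or 2 * N
    nu = 0.5 - np.arange(n_nu) / n_nu                 # nu_k = 1/2 - k/N_nu
    g = np.zeros((2 * N - 1, n_nu), dtype=complex)
    norm = (np.abs(s) ** 2).sum()                     # ||s||^2
    for i, r in enumerate(range(1 - N, N)):
        for k in range(n_nu):
            p = np.exp(2j * np.pi * nu[k] * np.arange(1, N + 1))
            sp = np.pad(s * p, (N, N))                # zero-padded s .* p(nu_k)
            shifted = np.roll(sp, r)[N:2 * N]         # action of the shift J_r
            g[i, k] = np.conj(s) @ shifted / norm
    return g

def af_tensor(s, d_min=-150.0):
    """Two-layer real tensor of Fig. 3: mapped AF matrix plus the signal."""
    g = discrete_af(s)                                # (2N-1) x 2N, complex
    A = g.T[::2, 1::2]                                # transpose + downsample: N x (N-1)
    r, th = np.maximum(np.abs(A), 10 ** (d_min / 10)), np.angle(A)
    A_db = 10 * np.log10(r) + (10 / np.log(10)) * 1j * th
    A_hat = 1.0 - 2.0 * A_db / d_min                  # affine map of [d_min, 0] to [-1, 1]
    s_bar = (s / np.abs(s))[:, None]                  # unimodular normalization
    X_c = np.concatenate([A_hat, s_bar], axis=1)      # N x N complex matrix
    return np.stack([X_c.real, X_c.imag], axis=-1)    # N x N x 2 real tensor, as in (18)
```

For $N = 32$ this yields the $32 \times 32 \times 2$ tensors used for training in Section V.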

A. Penalizing the Objective Function
Aside from improving the training curve, the objective function can be penalized to obtain more suppression in the regions of interest (ROIs). Since the loss function of the generator is blind to the dataset, this penalty should be reflected in the discriminator's objective:
$$\min_{G}\max_{D \in \mathcal{D}}\; \mathbb{E}_{\mathbf{x}\sim p_{data}}\left[D(\mathbf{x})\right] - \mathbb{E}_{\mathbf{z}}\left[D(G(\mathbf{z}))\right] + \gamma_1\, \mathbb{E}_{\mathbf{z}}\left[f(\hat{\mathbf{s}})\right],$$
where $\gamma_1$ and $f(\cdot)$ are the penalty weight and penalty function, respectively. In AF shaping, the main part of the data is the signal itself, embedded as $\bar{\mathbf{s}}$ in $\mathbf{X}$; the other part is needed for the accommodation process. In this regard, we denote the signal extracted from $G(\mathbf{z})$ as $\hat{\mathbf{s}}$. Therefore, $\hat{\mathbf{s}}$ is the last column of $G(\mathbf{z})$, as depicted in Fig. 3. Finally, we consider the following penalty function:
$$f(\hat{\mathbf{s}}) := \max_{r, k}\; w_{r,k}\, g_{\hat{\mathbf{s}}}(r, \nu_k),$$
where $w_{r,k} \in \{0, 1\}$, $r = 0, \ldots, N - 1$, $k = 0, \ldots, 2N - 1$ are coefficients that determine the ROIs. In this light, we can penalize the AF sidelobes lying in specific ROIs by setting their corresponding coefficients to one. The penalty function is introduced solely to prove that the proposed method can exceed the performance of the task it is imitating. In this regard, the simulations in Section V include both the use and non-use of the penalty function, as specified therein.
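A TensorFlow sketch of this penalized objective could read as follows, assuming a differentiable AF-power routine `g_af` and a zero-one ROI weight tensor `w`; both names are illustrative placeholders.

```python
import tensorflow as tf

def psl_penalty(s_hat, g_af, w):
    # Peak AF power inside the ROIs: f(s_hat) = max_{r,k} w[r,k] * g(r, nu_k).
    return tf.reduce_max(w * g_af(s_hat))

def penalized_critic_objective(d_real, d_fake, s_hat, g_af, w, gamma1=1000.0):
    # WGAN term plus the ROI peak penalty; the penalty depends only on the
    # generated signal, so its gradient steers the generator through G(z).
    wgan = tf.reduce_mean(d_real) - tf.reduce_mean(d_fake)
    return wgan + gamma1 * psl_penalty(s_hat, g_af, w)
```

Choosing the maximum rather than a sum over the ROI mirrors the PSL metric and keeps the per-step FLOP count low, as discussed in the Introduction.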

B. Accommodation
We define accommodation as searching for a latent variable that generates an output suitable for a newly imposed scenario. In this regard, accommodation is the read operation for non-IBSs. In AF shaping, accommodation can be achieved by solving the following optimization problem:
$$\min_{\mathbf{z}}\; \left\| \mathbf{W} \odot \left( G(\mathbf{z}) - \check{\mathbf{X}} \right) \right\|_\infty, \tag{23}$$
where $\mathbf{W}$ is a zero-one windowing matrix determining the ROIs, and $\check{\mathbf{X}}$ contains the desired AF shape. Furthermore, $\|\cdot\|_\infty$ represents the Chebyshev norm, defined by $\|\mathbf{Y}\|_\infty := \max_{i,j} |Y_{i,j}|$. The Chebyshev norm is preferred over the Euclidean norm because it is computationally inexpensive and also represents the PSL, which has practical meaning for radar engineers. To clarify, the peaks of the sidelobes correspond to falsely detected targets; hence, minimizing the PSL often corresponds to minimizing the false alarm rate. As depicted in Fig. 3, the tensors in AF shaping comprise two separate pieces of information. The first part conveys the information of the AF, embedded in the matrix $\hat{\mathbf{A}}$. This part enables accommodation, while the signal information $\hat{\mathbf{s}}$ is what is used in the actual application. In this regard, $\mathbf{W}$ has zeros in the elements corresponding to $\hat{\mathbf{s}}$. Although (23) can seek any desired AF shape, practical applications often involve suppression in certain ROIs, i.e., $\check{\mathbf{X}} = \mathbf{0}$. In this case, (23) seeks to minimize the PSL over the ROIs determined by $\mathbf{W}$. The optimization in (23) can be solved by one of the gradient descent (GD) algorithms. Here, we applied ADAM [41] with its default hyperparameters to find its solution.
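A minimal TensorFlow sketch of the accommodation read in (23) is given below; the trained generator is frozen and Adam descends only on the latent variable $\mathbf{z}$, with `W` and `X_des` denoting the window $\mathbf{W}$ and the desired tensor $\check{\mathbf{X}}$ of (23). The function and argument names are ours.

```python
import tensorflow as tf

def accommodate(generator, W, X_des, z_dim=128, steps=500, lr=0.002):
    z = tf.Variable(tf.random.normal([1, z_dim]))     # only z is trainable here
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            X_hat = generator(z, training=False)      # frozen generator
            # Chebyshev (max) norm over the windowed residual, as in (23)
            loss = tf.reduce_max(tf.abs(W * (X_hat - X_des)))
        grads = tape.gradient(loss, [z])
        opt.apply_gradients(zip(grads, [z]))
    return generator(z, training=False), z
```

In Section V, this routine is exercised with learning rates of 0.002 and 0.0002 and at most 500 iterations.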

V. EMPIRICAL RESULTS
In this section, we present the experiments conducted to showcase the performance metrics of the proposed approaches. The model construction is programmed in Python using TensorFlow 2.3. We extensively used and modified the implementation in [42]. The dataset generation and depictions are performed using Matlab 2019. The training is performed on the ARC computing cluster of the University of Calgary. Combined with Slurm [43] and the Comet model monitoring service,¹ this significantly facilitated the hyperparameter tuning. We used one Tesla V100-PCIE GPU per run, with each Slurm job limited to 128 GB of RAM. We used a Core i7 desktop with 16 GB of RAM for evaluations, comparisons, and depictions. No GPU is used in this case, so that GPU speed-ups do not skew the comparisons. Finally, we used the TFTB Python library² to calculate the complex-valued ambiguity matrix.
¹ Available at https://www.comet.ml/
² Available at https://pypi.org/project/tftb/

A. Dataset, Network Structure, and Hyperparameters
For the dataset, we use the output of several algorithms from [15]. This reference presents several base algorithms and several accelerated versions. The first base algorithm uses the pure MM method and is called Majorized Iteration for Ambiguity Function Iterative Shaping (MIAFIS), while the second is Coordinate Iteration for Ambiguity Function Iterative Shaping (CIAFIS), which combines the coordinate descent method with MM. Since speed is a main concern in this article, we compare our work mainly with the accelerated versions. In this light, the dataset is constituted by juxtaposing the AF and signal samples generated by CIAFIS, Accelerated CIAFIS, and MIAFIS via Local Majorization (MIAFIS LM), all from [15]. The samples for all algorithms are gathered after their convergence is complete; this requires at least 100, 200, and 10000 iterations for MIAFIS LM, Accelerated CIAFIS, and CIAFIS, respectively. All parameters are set according to the suggestions of [15], while the primary ROIs are depicted in Fig. 4. To form the dataset, 5000 unimodular signal samples and their sub-sampled AFs are stacked together. This dataset is then used to train a deep convolutional WGAN with a structure similar to [44], except for the loss function. The network structure is detailed in Table II; a sketch consistent with it is given after this paragraph. Here, the generator includes four layers, with a latent variable of length 128, whereas the size of the generated tensor (i.e., $\hat{\mathbf{X}}$) is $32 \times 32 \times 2$. The discriminator also has four layers. The generator and discriminator respectively use the ReLU and Leaky ReLU ($\alpha = 0.2$) activations. The Adam optimizer [41] is used for both training and accommodation with $\beta_1 = 0.5$, $\beta_2 = 0.999$, and a learning rate of 0.0002 unless specified otherwise. The training is performed for 400 epochs, while the batch size is set to 64. All network parameters are of float32 type, while training benefits from the TensorFlow graph mode [45] and GPU speed-ups [46]; each commonly yields one to two orders of magnitude faster training. Note that no graph mode or GPU acceleration is used for accommodation, as emphasized later. In this implementation, one generator training cycle is performed for each discriminator training cycle. Finally, the tensor's lower bound is set to $d_{\min} = -150$ dB, while the PSL penalty weight $\gamma_1$ is set to 1000 unless specified otherwise. The hyperparameters given here were obtained through modest babysitting, i.e., training the model several times to find the best values.
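For reference, a four-layer pair consistent with the description above (latent size 128, output $32 \times 32 \times 2$, ReLU in the generator, Leaky ReLU with $\alpha = 0.2$ in the discriminator) can be sketched as follows; the exact filter counts are our assumptions, since Table II is not reproduced here.

```python
import tensorflow as tf
from tensorflow.keras import layers

def make_generator(z_dim=128):
    return tf.keras.Sequential([
        layers.Dense(4 * 4 * 256, activation="relu", input_shape=(z_dim,)),
        layers.Reshape((4, 4, 256)),
        layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(2, 4, strides=2, padding="same", activation="tanh"),
    ])  # output: 32 x 32 x 2, matching the tensor of Fig. 3 in [-1, 1]

def make_discriminator():
    return tf.keras.Sequential([
        layers.Conv2D(64, 4, strides=2, padding="same", input_shape=(32, 32, 2)),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv2D(256, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1),  # critic score; no sigmoid in the WGAN
    ])
```

The tanh output layer matches the $[-1, 1]$ range produced by the affine map of Section IV.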

B. Hyperparameter Tuning
We briefly discuss the hyperparameter tuning procedure for the proposed model. First, we consider the effect of the learning rate while all other hyperparameters are held constant. In this regard, the training process was repeated with four different learning rates, $LR \in \{2, 4, 6, 8\} \times 10^{-4}$. The discriminator loss and generator gain are depicted in Figs. 5.a and 5.b, respectively. For these experiments, the batch size, $\gamma_1$, and the number of epochs are set to 64, 0, and 200, respectively; therefore, these models are trained with no penalty. It can be observed that the highest learning rate, i.e., $LR = 0.0008$, corresponds to the fastest convergence in terms of the Wasserstein-1 distance. However, it may not correspond to smooth convergence in terms of overall sidelobe levels. Meanwhile, the model is also trained for various penalty weights, $\gamma_1 \in \{10^2, 10^3, 10^4\}$. Figure 5.c shows the penalty value for these experiments, where a more negative value corresponds to higher suppression. Interestingly, $\gamma_1 = 100$ and $\gamma_1 = 10000$ correspond to the best results, with the latter slightly outperforming all others. We expect that finer tuning would yield only a very slight improvement.

C. Achievable Suppression
The main purpose of this paper is to advocate a type of memory. However, the penalization of the WGAN loss function can also result in significant suppression levels, which can be considered an important achievement in its own right. In this regard, we compare an evaluation of the generator network, trained with the hyperparameters given in the previous subsection, with the MIAFIS LM and CIAFIS benchmarks. We deliberately consider a rather complicated IBS to illustrate that the presented method can handle such scenarios, while still keeping the IBS as basic as possible. The IBS considers suppressing the two primary ROIs, ROI$_1$ and ROI$_2$, depicted in Fig. 4. Here, we set $N = 32$ and $N_\nu = 2N = 64$. Let $\hat{\mathbf{X}}$ be the tensor produced by the generator, where the hat notation ($\hat{\cdot}$) emphasizes being produced by the generator. We extract the signal and AF from $\hat{\mathbf{X}}$ following (18). Then, the AF of the generated waveform, $g_{\hat{\mathbf{s}}}(r, \nu)$, and the generated downsampled AF tensor, $\hat{\mathbf{A}}_{dB}$, are constituted. Note that the actual waveform is $\hat{\mathbf{s}}$, embedded in the last column of $\hat{\mathbf{X}}$. The $\hat{\mathbf{A}}_{dB}$ is only used for the accommodation process; however, omitting $\hat{\mathbf{A}}_{dB}$ would necessitate recalculating the AF in each accommodation step, making the process impractically slow. Figure 6 compares $g_{\hat{\mathbf{s}}}(r, \nu)$, $\hat{\mathbf{A}}_{dB}$, and the AFs of the CIAFIS and MIAFIS LM benchmark algorithms. The parameters of all benchmark algorithms are set to the suggestions of [15], and they are run long enough to ensure convergence. Here, we only depict the AF for positive lags because the other half can be obtained using the AF symmetry rules [47]. Meanwhile, note that $\hat{\mathbf{A}}_{dB}$ is the generated downsampled version of the AF, which justifies its lower resolution and the omission of the last column. Since the penalty in the loss function is on the AF of $\hat{\mathbf{s}}$ itself, we observe better suppression in $g_{\hat{\mathbf{s}}}(r, \nu)$. Figure 7 depicts the $r = 5$ cut of the AF for CIAFIS, MIAFIS LM, and $g_{\hat{\mathbf{s}}}(r, \nu)$.

D. Accommodation
Although this approach can search for any desired imposed output, practical AF shaping only involves suppressed ROIs. In this regard, we consider a secondary ROI where suppressed regions are desired. This ROI, ROI$_3$, is depicted in Fig. 4; its normalized frequency interval lies between $\frac{1}{2} - \frac{13}{64} = 0.2969 \le \nu \le 0.3438 = \frac{1}{2} - \frac{10}{64}$. For the benchmark, the CIAFIS, Accelerated CIAFIS, and MIAFIS algorithms are rerun to suppress ROI$_{1-3}$. In addition, we use the accelerated version of AF Shaping via Iterative Minimization (AFSIM) from [48], targeted to suppress the same ROIs. We previously developed AFSIM as a fast solution for AF shaping with the ability to impose nonzero ROIs. Figure 8 illustrates PSL vs. time when the CIAFIS, Accelerated CIAFIS, MIAFIS, and accelerated AFSIM algorithms are run for the newly imposed scenario, ROI$_{1-3}$, compared to accommodation with learning rates of 0.002 and 0.0002. No TensorFlow speed-up is used for this figure; since one to two orders of magnitude of speed-up are often observed when using GPU or graph-mode implementations, a significantly more efficient implementation is also possible. The total number of iterations for accommodation is 500, while CIAFIS, Accelerated CIAFIS, and MIAFIS are respectively run for 10000, 500, and 10000 iterations to ensure proper convergence. Note that all benchmark algorithms start producing outputs later than the GAM because their first iteration takes over 0.2 seconds. In comparison, the 500 iterations of the GAM take approximately 7 seconds in total, i.e., almost 0.014 seconds per iteration. Interestingly, not all 500 iterations are even needed in this example to generate a near-optimal solution; the figure reveals that this can be achieved in around 0.2 seconds for $LR = 0.002$. Also, note that this process is only needed for non-IBS inputs, which by definition are not frequent. For IBS inputs, only one inference is needed as a generative read, which takes significantly less than 0.01 seconds, as each accommodation iteration includes at least one inference. We did not use the GPU and graph-mode speed-ups in TensorFlow for the sake of fairness. Also, note that the benchmark algorithms are intrinsically designed to optimize the ISL rather than the PSL, which is why the PSL might increase during their evolution. We minimize the PSL in the GAM accommodation mode because computing the PSL is computationally less expensive, which better suits our purpose. Still, comparing PSL and ISL minimization algorithms is fair and has precedent [10], [11], [49], because both metrics measure the suppression level, while the eventual goal is suppressing the sidelobes. In brief, up to two orders of magnitude less time consumption is evidently observable in accommodation compared to rerunning the algorithms for new scenarios, even with a loose TensorFlow implementation. We expect at least one or two orders of magnitude faster execution using the TensorFlow speed-up techniques (i.e., GPU and graph-mode implementations). The overall performance of the GAM compared with rerunning the algorithm should also account for the relative frequency of IBSs vs. non-IBSs; for instance, if IBSs are observed ten times more frequently, the overall speed-up would gain another order of magnitude.

E. ISL Comparison
Although we designed the accommodation process based on the PSL, comparing the ISL can also convey additional information about the waveform. In this regard, we report the ISL of the final waveform from the benchmark algorithms and the accommodation process with exactly the same specification as the previous subsection. To ensure the reproducibility of the experiments, we repeated the waveform generation using the benchmark algorithms and the accommodation process five times. The mean and variance (respectively denoted by $\mu$ and $\sigma$) of the ISL and time are reported in Table III. The best ISL belongs to Accelerated CIAFIS, a benchmark algorithm, while the second and third places belong to the accommodation process. Although not designed to optimize the ISL, the proposed accommodation process ranks second, whereas all the benchmark algorithms are designed by optimizing the ISL.
Meanwhile, the main metric here is the time needed to generate the waveform. It can be observed that the accommodation process needs substantially less time than the other methods. Considering the infrequency of the need for the accommodation process and the optimized implementation options available in deep learning toolboxes, the overall average time can be three to four orders of magnitude lower.

VI. CONCLUSION
In this paper, we introduced generative accommodative memory (GAM) as a smart look-up table (LUT). Each accommodative memory basic unit (AMBU) can adjust its output for unseen inputs through a mechanism called accommodation. We suggested using a generative adversarial structure for each AMBU with three operations. The write operation corresponds to training. The generative read operation corresponds to generator evaluation, used only for frequently seen basis scenarios. The accommodative read operation traverses the latent space to accommodate unseen input scenarios. In this light, the GAM can be applied to replicate the output of time-consuming algorithms in real-time applications, where the memory is written rarely and read frequently. To showcase the application of this concept, we adapted the approach to AF shaping for cognitive radars. This approach was shown to have several advantages over common approaches:
• By penalizing the target function, it is possible to achieve better suppression than the algorithms that the GAM tries to mimic.
• Upon a newly imposed requirement, the output can be accommodated by traversing the latent space of the generator. The requirement can be of any type, including but not restricted to new suppressed regions of interest. Accommodation is, by design, for non-frequent scenarios, while a simple evaluation of an AMBU's generator suffices for frequent scenarios. Even a non-optimized implementation of accommodation (no graph or GPU speed-ups in TensorFlow) was shown to be about two orders of magnitude faster than rerunning the algorithm for new scenarios.
Although we showcased only the AF shaping algorithm, the GAM can be adapted to memorize a wide variety of algorithms in various fields. It can be adopted whenever researchers try to speed up an existing algorithm to meet the real-time requirements of an application. To this end, the input basis scenarios (IBSs) should be determined, a dataset should be constructed for each IBS, and an array of GANs should be trained on them. The downside of the GAM is its time-consuming write operation and the required memory hardware. Optimizing the structure of a GAM to mitigate this weakness is a promising direction for future research.
Hamid Esmaeili Najafabadi (Senior Member, IEEE) received the B.S. degree in electrical engineering from the Isfahan University of Technology in 2007, the M.S. degree in electrical engineering from the Amirkabir University of Technology in 2010, and the Ph.D. degree in electrical engineering from the University of Isfahan in 2017. He has collaborated with several industrial groups, including ICTI (icti.ir) and the Cheetah group, and has been engaged in several national and international projects. He is currently a Research Associate with the University of Calgary. He has published over 30 journal papers and technical reports and has been a frequent reviewer for the IEEE TSP, TVT, and TAES. He has also served as a guest editor for the IEEE JSTARS. His research interests include optimization theories and applications of machine learning, with a focus on generative adversarial networks.
Henry Leung (Fellow, IEEE) is a Professor in the Department of Electrical and Computer Engineering of the University of Calgary. Before joining the University of Calgary, he was with the Department of National Defence (DND) of Canada as a defence scientist. His current research interests include information fusion, machine learning, the IoT, nonlinear dynamics, robotics, and signal and image processing. He is an associate editor of the IEEE Circuits and Systems Magazine and the IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS. He is the topic editor on "Robotic Sensors" of the International Journal of Advanced Robotic Systems and the editor of the Springer book series on "Information Fusion and Data Science". He is a Fellow of SPIE.
Peter W. Moo (Senior Member, IEEE) received the B.Sc. degree in mathematics and engineering from Queen's University, Kingston, in 1993, and the M.S.E. and Ph.D. degrees in electrical engineering: systems from the University of Michigan in 1995 and 1998, respectively. Since 1999, he has been with Defence Research and Development Canada (DRDC), where he is currently an Expert-Level Defence Scientist and Group Leader in the Radar Sensing and Exploitation Section. He is DRDC's lead scientist for naval phased array radar and high-frequency surface wave radar. He is a coauthor of the book Adaptive Radar Resource Management, published by Academic Press in 2015, and has coauthored more than 50 journal papers, conference papers, and scientific reports. His research interests include radar resource management, multiple-input multiple-output radar, and space-time adaptive processing. He serves on the Editorial Board of IET Radar, Sonar and Navigation and is Chair of NATO Research Task Group SET-285 on Multifunction RF Systems.