Low-Complexity VLSI Architecture for OTFS Transceiver Under Multipath Fading Channel

Orthogonal time frequency space (OTFS) modulation has established itself as a dependable technique for high-speed vehicular communication. It multiplexes information symbols in the 2-D delay-Doppler domain. Compared with conventional modulation methods such as orthogonal frequency-division multiplexing (OFDM), OTFS performs markedly better over rapidly time-varying wireless channels. This article first derives the input–output relation of the OTFS signal in the delay-time domain. A comparison with the established OFDM waveform shows that OTFS achieves a notably lower bit error rate (BER) under various conditions when minimum mean square error (MMSE) equalization is used. Finally, we propose a novel low-complexity VLSI architecture for the OTFS transmitter and receiver based on lower–upper (LU) decomposition, for the first time in the literature. Compared with the existing transmitter architecture, our design is 7.394% faster and uses 89.354% fewer lookup tables (LUTs) and 79.984% fewer flip-flops (FFs), which shows that it is better optimized in both latency and resource utilization. Since no OTFS receiver architecture exists in the literature for comparison, we report the resource utilization of our proposed receiver architecture for the first time, followed by timing analysis and functionality testing of the proposed architecture.


I. INTRODUCTION
The demand for high-quality service in fast-moving vehicular scenarios such as vehicle-to-vehicle (V2V) communications, unmanned aerial vehicle communications, and other fifth-generation (5G) applications is steadily rising, as highlighted in [1]. Although OFDM is a widely used transmission technology, it struggles to deliver reliable connections at speeds exceeding 300 km/h because of its susceptibility to inter-carrier interference (ICI) arising from Doppler spread and phase noise. In contrast, OTFS, as demonstrated in [2], has proven to be superior to OFDM in such highly mobile environments, making it an essential component of 5G's operational scenarios.
OTFS is a recently introduced two-dimensional (2D) modulation method that uses the delay-Doppler domain to encode information symbols, as detailed in [2]. OTFS adds pre-processing and post-processing steps to conventional multi-carrier modulation, resulting in better bit error performance than traditional multi-carrier techniques. Furthermore, the channel has been observed to vary more gradually in the delay-Doppler domain than the time-varying multi-path channel does. This simplifies the equalizer design and allows less frequent channel estimation in OTFS, which reduces the overhead associated with channel estimation in rapidly changing channels [3].
In the OTFS framework, data symbols are arranged in the delay-Doppler domain rather than directly on the time-frequency grid used in OFDM. Subsequently, a unitary transformation known as the inverse symplectic finite Fourier transform (ISFFT) spreads the data across the time-frequency grid. Next, an OFDM modulation technique is employed [2]. To cope with delay spread, a cyclic prefix (CP) is added as in OFDM modulation, which is referred to as CP-OTFS. On the other hand, when OTFS is combined with block OFDM modulation and a block CP, it is denoted as reduced-CP OTFS [4].
The OTFS signal, which is spread in time and frequency, suffers from inter-symbol interference and ICI when transmitted through a linear time-varying channel, so the receiver must apply advanced interference cancellation techniques. OTFS receivers can be categorized into two types: (a) linear receivers [5] and (b) nonlinear receivers [6]. Nonlinear receivers offer lower error probabilities but have higher computational complexity than their linear counterparts. Our focus is exclusively on linear receivers because of their practical feasibility.
To cope with the high ICI caused by Doppler spread and phase noise, 5G new radio (NR) has adopted a variant of OFDM known as variable sub-carrier bandwidth OFDM (VSB-OFDM) [7]-[9]. VSB-OFDM, which is characterized by different bandwidths for different sub-carriers, serves to reduce ICI [10]. To select an efficient waveform for environments in which the Doppler spread and the phase noise are very strong, the performance of VSB-OFDM needs to be compared appropriately with that of OTFS.
OTFS modulation was first documented in [11], which demonstrated its remarkable error performance, particularly at vehicle speeds as high as 500 km/h. Since then, numerous works have examined various aspects of OTFS modulation [5], [12]-[17]. For the multiplexing of $MN$ information symbols across $M$ delay bins and $N$ Doppler bins, an analytical upper bound on the peak-to-average power ratio (PAPR) has been established. This bound grows linearly with the number of Doppler bins $N$ and not with the number of delay bins $M$ (or equivalently, the number of sub-carriers in the time-frequency domain). This contrasts with conventional multi-carrier waveforms, where the PAPR increases linearly with $M$. Consequently, OTFS systems with $N < M$ (typically the case) can potentially have a lower PAPR than a multi-carrier system with $M$ sub-carriers [18].
Fig. 1: OTFS modulation and demodulation scheme.
To the best of the authors' knowledge, the VLSI architecture of the OTFS transceiver and its hardware implementation on FPGA have received limited attention in the current body of literature. The primary obstacle is the large matrix inversion required to compute the final equations in OTFS. The proposed architecture has been developed to tackle this challenge, and it makes efficient use of FPGA board resources. This paper introduces an architecture for the OTFS transceiver over the fading channel.
The remainder of the paper is organized as follows. Section II presents the OTFS system models. Section III elaborates on the matrix representation of the OTFS model. Section IV presents the simulation results. Section V describes the proposed architecture of the OTFS transmitter, and Section VI the proposed architecture of the OTFS receiver. The synthesis and implementation reports are given in Section VII. Sections VIII and IX cover the timing analysis and the functionality tests, respectively. Finally, the paper concludes in Section X.

II. SYSTEM MODELS
We consider a single-input single-output (SISO) OTFS system that transmits and receives uncoded quadrature amplitude modulation (QAM) symbols. It can be viewed as a conventional OFDM system augmented with pre- and post-processing modules. The modulation and demodulation procedure for OTFS is depicted in Fig. 1. At the transmitter, the QAM symbols are first arranged in a two-dimensional matrix with $N$ columns in the Doppler domain and $M$ rows in the delay domain. The signal is then transformed into the time-frequency domain by the ISFFT. The Heisenberg transform converts it to the time domain, and it is transmitted through the doubly-dispersive channel. At the receiver, the received signal is processed by the Wigner transform and the symplectic finite Fourier transform (SFFT).
The time-frequency grid is discretized with resolutions of $T$ and $\Delta f$, which define the units of time and sub-carrier frequency spacing, respectively. This grid is the set of discrete time-frequency points $\Phi = \{(nT, m\Delta f),\ n = 0, \ldots, N-1,\ m = 0, \ldots, M-1\}$. Here, $T$ represents the duration of $M$ QAM symbols, and $\Delta f$ represents the spacing between consecutive sub-carriers. After the ISFFT, the signal in the time-frequency domain can be represented as
$$X[n,m] = \frac{1}{\sqrt{NM}} \sum_{k=0}^{N-1} \sum_{l=0}^{M-1} x[k,l]\, e^{j2\pi\left(\frac{nk}{N} - \frac{ml}{M}\right)} \tag{1}$$
where $x[k,l]$ represents the signal in the delay-Doppler domain, with Doppler resolution $\frac{1}{NT}$ and delay resolution $\frac{1}{M\Delta f}$. The delay-Doppler grid is defined as the set
$$\left\{\left(\frac{k}{NT}, \frac{l}{M\Delta f}\right),\ k = 0, \ldots, N-1,\ l = 0, \ldots, M-1\right\} \tag{2}$$
After the matrix $X[n,m]$ is rearranged into a time-domain sequence, the Heisenberg transform (as detailed in [1], [16]) generates the time-domain signal
$$s(t) = \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} X[n,m]\, g_{tx}(t - nT)\, e^{j2\pi m \Delta f (t - nT)} \tag{3}$$
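where $g_{tx}(t)$ represents the window function. The signal $s(t)$ propagates through the delay-Doppler channel, and the received signal can be written as
$$r(t) = \iint h(\tau,\nu)\, s(t-\tau)\, e^{j2\pi\nu(t-\tau)}\, d\tau\, d\nu + n(t) \tag{4}$$
where $h(\tau,\nu)$ is the channel delay-Doppler response and $n(t)$ is the additive white Gaussian noise. $h(\tau,\nu)$ is defined as
$$h(\tau,\nu) = \sum_{i=1}^{P} h_i\, \delta(\tau - \tau_i)\, \delta(\nu - \nu_i) \tag{5}$$
In this context, $h_i$, $\tau_i$, and $\nu_i$ indicate the path gain, delay, and Doppler shift of the $i$th path, respectively, and $P$ is the number of paths.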
Upon reception, a Wigner transform first converts the received signal from the time domain to the time-frequency domain:
$$Y(t,f) = \int g_{rx}^{*}(t' - t)\, r(t')\, e^{-j2\pi f (t' - t)}\, dt', \qquad Y[n,m] = Y(t,f)\big|_{t = nT,\ f = m\Delta f} \tag{6}$$
Here, $t$ and $f$ are sampled as $t = nT$ and $f = m\Delta f$, respectively, and $g_{rx}(t)$ denotes the window function. The signal is then decoded using the SFFT:
$$y[k,l] = \frac{1}{\sqrt{NM}} \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} Y[n,m]\, e^{-j2\pi\left(\frac{nk}{N} - \frac{ml}{M}\right)} \tag{7}$$
Because of Doppler effects and channel delay spread, equalization is necessary to retrieve the data symbols from $y[k,l]$. We use minimum mean square error (MMSE) equalization.
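The ISFFT and SFFT steps described above form a unitary transform pair, which can be checked numerically. The following is a minimal sketch, assuming the commonly used normalised form of the two transforms (a double sum with a $1/\sqrt{NM}$ scale and opposite exponent signs); the function names `isfft` and `sfft` are illustrative and do not come from the paper.

```python
import cmath
import math

def isfft(x, M, N):
    """Delay-Doppler x[k][l] (N Doppler bins, M delay bins) -> time-frequency X[n][m].

    Assumed form: X[n,m] = 1/sqrt(NM) * sum_{k,l} x[k,l] * exp(+j2pi(nk/N - ml/M)).
    """
    scale = 1.0 / math.sqrt(N * M)
    return [[scale * sum(x[k][l] * cmath.exp(2j * math.pi * (n * k / N - m * l / M))
                         for k in range(N) for l in range(M))
             for m in range(M)]
            for n in range(N)]

def sfft(Y, M, N):
    """Time-frequency Y[n][m] -> delay-Doppler y[k][l]; inverse of isfft."""
    scale = 1.0 / math.sqrt(N * M)
    return [[scale * sum(Y[n][m] * cmath.exp(-2j * math.pi * (n * k / N - m * l / M))
                         for n in range(N) for m in range(M))
             for l in range(M)]
            for k in range(N)]
```

Applying `sfft` to the output of `isfft` returns the original symbols, and the total energy is unchanged, which is what one expects from a unitary pre-processing step placed in front of an OFDM modulator.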

III. MATRIX REPRESENTATION
To facilitate the analysis of the OTFS system in fast-fading channels, we rewrite the above formulas in terms of vectors and matrices. We use the symbol $\otimes$ and the function $\mathrm{vec}(\cdot)$ to represent the Kronecker product and vectorization, respectively. Specifically, let $\mathbf{x} = \mathrm{vec}(\mathbf{D})$ denote the signal intended for transmission, where $\mathbf{D} \in \mathbb{C}^{M \times N}$ represents the 2D data matrix. At the transmitter, the ISFFT operation can be expressed as
$$\mathbf{X} = \mathbf{F}_M\, \mathbf{D}\, \mathbf{F}_N^{\dagger} \tag{8}$$
Here, $\mathbf{X} \in \mathbb{C}^{M \times N}$ represents the transmit signal in the time-frequency domain. The matrices $\mathbf{F}_M$ and $\mathbf{F}_N^{\dagger}$ correspond to the FFT and IFFT matrices, respectively. Based on (3), the Heisenberg transform is expressed as
$$\mathbf{S} = \mathbf{G}_{tx}\, \mathbf{F}_M^{\dagger}\, \mathbf{X} = \mathbf{G}_{tx}\, \mathbf{D}\, \mathbf{F}_N^{\dagger} \tag{9}$$
The matrix $\mathbf{S}$ is then converted into the time domain in vector form:
$$\mathbf{s} = \mathrm{vec}(\mathbf{S}) = (\mathbf{F}_N^{\dagger} \otimes \mathbf{G}_{tx})\, \mathbf{x} \tag{10}$$
where $\mathbf{G}_{tx}$ is the identity matrix $\mathbf{I}_M$ of order $M$.
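The vectorised transmitter relation can be sanity-checked on a small grid. The sketch below assumes rectangular pulse shaping (the transmit pulse matrix is the identity), a normalised IDFT matrix, and column-major vectorisation; `otfs_modulate` and `otfs_modulate_kron` are hypothetical helper names. It computes the transmit vector once as vec(D F_N^H) and once through the Kronecker structure (F_N^H kron I_M) acting on vec(D).

```python
import cmath
import math

def idft_matrix(N):
    """Normalised N-point IDFT matrix F_N^H: entry [p][q] = exp(+j2pi*p*q/N)/sqrt(N)."""
    return [[cmath.exp(2j * math.pi * p * q / N) / math.sqrt(N) for q in range(N)]
            for p in range(N)]

def otfs_modulate(D):
    """s = vec(D F_N^H) for an M x N data matrix D (G_tx = I_M, column-major vec)."""
    M, N = len(D), len(D[0])
    FH = idft_matrix(N)
    S = [[sum(D[m][q] * FH[q][n] for q in range(N)) for n in range(N)] for m in range(M)]
    return [S[m][n] for n in range(N) for m in range(M)]

def otfs_modulate_kron(D):
    """Same vector, computed as (F_N^H kron I_M) vec(D) without forming the big matrix."""
    M, N = len(D), len(D[0])
    FH = idft_matrix(N)
    x = [D[m][n] for n in range(N) for m in range(M)]  # column-major vec(D)
    # Row p*M + a of (F_N^H kron I_M) only touches entries q*M + a of x.
    return [sum(FH[p][q] * x[q * M + a] for q in range(N))
            for p in range(N) for a in range(M)]
```

Both routines return the same length-MN vector, which illustrates why the transmitter only needs N-point IDFTs along the Doppler axis rather than a full MN x MN matrix multiplication.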
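After transmission through the delay-Doppler channel, the received signal can be formulated as
$$\mathbf{r} = \mathbf{H}\mathbf{s} + \mathbf{n} \tag{11}$$
where $\mathbf{H}$ is defined as
$$\mathbf{H} = \sum_{i=1}^{P} h_i\, \boldsymbol{\Pi}^{l_i}\, \boldsymbol{\Delta}^{k_i} \tag{12}$$
where $l_i$ and $k_i$ are the delay and Doppler taps of the $i$th path, $\boldsymbol{\Pi}$ is the $MN \times MN$ cyclic-shift permutation matrix
$$\boldsymbol{\Pi} = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 1 & \ddots & 0 & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 \end{bmatrix}_{MN \times MN} \tag{13}$$
and
$$\boldsymbol{\Delta} = \mathrm{diag}\{z^{0}, z^{1}, \ldots, z^{MN-1}\}, \qquad z = e^{j2\pi/(MN)} \tag{14}$$
The received baseband signal subsequently passes through the pulse shaping, Wigner transform, and SFFT processing modules. By reorganizing the signal $\mathbf{r}$ into an $M \times N$ matrix denoted as $\mathbf{R}$, the received signal $\mathbf{Y}$ can be acquired as
$$\mathbf{Y} = \mathbf{F}_M^{\dagger}\,(\mathbf{F}_M\, \mathbf{G}_{rx}\, \mathbf{R})\, \mathbf{F}_N = \mathbf{G}_{rx}\, \mathbf{R}\, \mathbf{F}_N \tag{15}$$
Finally, the signal is transformed back into vector form as shown in equation (16):
$$\mathbf{y} = \mathrm{vec}(\mathbf{Y}) = (\mathbf{F}_N \otimes \mathbf{G}_{rx})\, \mathbf{r} \tag{16}$$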
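Here, $\mathbf{G}_{rx}$ represents the shaping pulse employed at the receiver. For the OTFS system, we consider standard rectangular pulse shaping, in which case both pulse shaping matrices $\mathbf{G}_{tx}$ and $\mathbf{G}_{rx}$ are identity matrices. As a result, the relationship between the input and output of the OTFS signal can be written as
$$\mathbf{y} = \mathbf{H}_{\mathrm{eff}}\, \mathbf{x} + \tilde{\mathbf{n}} \tag{17}$$
Let $\sigma_d^2$ be the transmit signal power and $\sigma_n^2$ the noise variance. $\mathbf{H}_{\mathrm{eff}}$ is given as
$$\mathbf{H}_{\mathrm{eff}} = (\mathbf{F}_N \otimes \mathbf{I}_M)\, \mathbf{H}\, (\mathbf{F}_N^{\dagger} \otimes \mathbf{I}_M) \tag{18}$$
The received signal is equalized using conventional MMSE, assuming that the receiver has accurate channel state information. The output of the MMSE equalizer, which takes the noise variance $\sigma_n^2$ into account, can be represented as
$$\hat{\mathbf{x}} = \mathbf{H}_{\mathrm{eff}}^{\dagger} \left[ \mathbf{H}_{\mathrm{eff}} \mathbf{H}_{\mathrm{eff}}^{\dagger} + \frac{\sigma_n^2}{\sigma_d^2}\, \mathbf{I} \right]^{-1} \mathbf{y} \tag{19}$$
When the signal $\mathbf{r}$ is processed by an LMMSE equalizer, (19) can be written as
$$\hat{\mathbf{x}} = (\mathbf{H}\mathbf{A})^{\dagger} \left[ (\mathbf{H}\mathbf{A})(\mathbf{H}\mathbf{A})^{\dagger} + \frac{\sigma_n^2}{\sigma_d^2}\, \mathbf{I} \right]^{-1} \mathbf{r} \tag{20}$$
where $\mathbf{A} = \mathbf{F}_N^{\dagger} \otimes \mathbf{G}_{tx}$ and $\mathbf{F}_N^{\dagger}$ is the $N$-order normalised inverse discrete Fourier transform (IDFT) matrix. Since $\mathbf{A}$ is unitary, the above equation simplifies to
$$\hat{\mathbf{x}} = \mathbf{A}^{\dagger} \mathbf{H}^{\dagger} \left[ \mathbf{H}\mathbf{H}^{\dagger} + \frac{\sigma_n^2}{\sigma_d^2}\, \mathbf{I} \right]^{-1} \mathbf{r} \tag{21}$$
The bit error rate (BER) can now be obtained by comparing the original delay-Doppler data $\mathbf{x}$ with the equalized delay-Doppler data $\hat{\mathbf{x}}$.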
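As a reference point for the equalization step, the following sketch builds a small dense channel matrix and applies LMMSE equalization in the time domain by direct Gaussian elimination. It is a deliberately naive, cubic-cost baseline and not the low-complexity method proposed later. It assumes the common delay-Doppler channel model in which each path contributes a cyclic shift by its delay tap and a diagonal phase ramp for its Doppler tap; `channel_matrix`, `solve`, `lmmse_equalize`, and the parameter `rho` (noise variance divided by signal power) are illustrative names.

```python
import cmath
import math

def channel_matrix(paths, MN):
    """Dense H = sum_i h_i * Pi^{l_i} * Delta^{k_i}; paths holds (h_i, l_i, k_i) tuples."""
    z = cmath.exp(2j * math.pi / MN)
    H = [[0j] * MN for _ in range(MN)]
    for h, l, k in paths:
        for c in range(MN):
            # Column c is scaled by z^(k*c) and moved down cyclically by l rows.
            H[(c + l) % MN][c] += h * z ** (k * c)
    return H

def solve(A, b):
    """Solve A u = b by Gaussian elimination with partial pivoting (dense, O(n^3))."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    u = [0j] * n
    for i in reversed(range(n)):
        u[i] = (aug[i][n] - sum(aug[i][c] * u[c] for c in range(i + 1, n))) / aug[i][i]
    return u

def lmmse_equalize(H, r, rho):
    """Return H^H (H H^H + rho*I)^{-1} r."""
    n = len(r)
    Hh = [[H[c][a].conjugate() for c in range(n)] for a in range(n)]  # conjugate transpose
    Psi = [[sum(H[a][c] * Hh[c][b] for c in range(n)) + (rho if a == b else 0)
            for b in range(n)] for a in range(n)]
    u = solve(Psi, r)
    return [sum(Hh[a][c] * u[c] for c in range(n)) for a in range(n)]
```

With a very small `rho` and a noiseless received vector, the equalizer output matches the transmitted samples, which is a convenient check before moving to the structured LU-based solver.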

IV. SIMULATION RESULTS
In this section, we conduct simulations for two modulation techniques: traditional OFDM and OTFS. Table I lists the simulation parameters. We assume that the receiver has accurate knowledge of the channel, and no channel coding is applied. Fig. 2(a) demonstrates that OTFS exhibits substantially better BER performance than OFDM, especially in fast-fading channels. Notably, OTFS with the LMMSE receiver can effectively exploit diversity gain. For instance, at a BER of $10^{-3}$, the OTFS-LMMSE receiver shows an improvement of 8 dB over the OFDM-MMSE receiver. From these observations, we deduce that OTFS outperforms OFDM.
Fig. 2(b) shows that OTFS works well across different high-speed scenarios under ideal channel estimation. The BER curves lie close together for almost all speeds, which means that OTFS supports a wide range of speeds. Among the speeds considered, 120 km/h shows slightly better BER performance than the others.
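Referring to equation (21), LMMSE equalization can be executed as a two-step process. In the first step, LMMSE channel equalization is performed to obtain $\mathbf{r}_{ce} = \mathbf{H}_{eq}\mathbf{r}$, where $\mathbf{H}_{eq} = \mathbf{H}^{\dagger}\left[\mathbf{H}\mathbf{H}^{\dagger} + \frac{\sigma_n^2}{\sigma_d^2}\mathbf{I}\right]^{-1}$ represents the equalized channel matrix. The second step is an OTFS matched-filter receiver, which yields $\hat{\mathbf{d}} = \mathbf{A}^{\dagger}\mathbf{r}_{ce}$, where $\mathbf{A}^{\dagger}$ signifies the conjugate transpose of matrix $\mathbf{A}$.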
The execution of $\hat{\mathbf{d}} = \mathbf{A}^{\dagger}\mathbf{r}_{ce}$ is straightforward and demands $\frac{MN}{2}\log_2(N)$ complex multiplications (CMs). However, the direct implementation of $\mathbf{r}_{ce} = \mathbf{H}_{eq}\mathbf{r}$ requires the inversion of the matrix $\boldsymbol{\Psi} = \mathbf{H}\mathbf{H}^{\dagger} + \frac{\sigma_n^2}{\sigma_d^2}\mathbf{I}$ and the multiplication by $\mathbf{H}^{\dagger}$, with a computational complexity of $O(M^3N^3)$ CMs. To address this, the authors of [19] reduced the computational complexity of $\mathbf{r}_{ce} = \mathbf{H}_{eq}\mathbf{r}$ by using LU decomposition. To achieve this, they exploited the structure of the matrices involved in channel equalization, as outlined below.

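Using (19), $\mathbf{H}\mathbf{H}^{\dagger}$ can be expressed as
$$\mathbf{H}\mathbf{H}^{\dagger} = \sum_{i=1}^{P} \sum_{j=1}^{P} h_i\, \bar{h}_j\, \boldsymbol{\Pi}^{l_i}\, \boldsymbol{\Delta}^{k_i - k_j}\, \boldsymbol{\Pi}^{-l_j}$$
Using (12), $\boldsymbol{\Psi}$ becomes
$$\boldsymbol{\Psi} = \sum_{i=1}^{P} \sum_{j=1}^{P} h_i\, \bar{h}_j\, \boldsymbol{\Pi}^{l_i}\, \boldsymbol{\Delta}^{k_i - k_j}\, \boldsymbol{\Pi}^{-l_j} + \frac{\sigma_n^2}{\sigma_d^2}\, \mathbf{I}$$
From these equations, it can be inferred that the maximum displacement of the diagonal elements of $\boldsymbol{\Psi}$ is approximately $\pm(\alpha - 1)$, where $\alpha$ is the channel delay length ($\alpha = \lceil \tau_{\max} M \Delta f \rceil$, where $\tau_{\max}$ is the maximum delay spread). Furthermore, because this displacement is cyclic, $\boldsymbol{\Psi}$ has a quasi-banded structure with a bandwidth of $(2\alpha - 1)$, as illustrated in Fig. 3. Since $\alpha$ is significantly smaller than $MN$, $\boldsymbol{\Psi}$ is also sparse for a typical wireless channel. As $\boldsymbol{\Psi}^{-1}$ must be computed to realize an LMMSE receiver in practice, the authors put forth a computationally efficient method for the LU decomposition of $\boldsymbol{\Psi}$.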
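Partitioning $\boldsymbol{\Psi}$ into blocks and writing its LU factors in the same block form gives
$$\boldsymbol{\Psi} = \begin{bmatrix} \mathbf{T} & \mathbf{B} \\ \mathbf{S} & \mathbf{C} \end{bmatrix} = \begin{bmatrix} \mathbf{L} & \mathbf{0} \\ \mathbf{V} & \mathbf{F} \end{bmatrix} \begin{bmatrix} \mathbf{U} & \mathbf{L}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{G} \end{bmatrix}$$
Using this partition, the following equalities hold:
$$\mathbf{T} = \mathbf{L}\mathbf{U}$$
$$\mathbf{L}\,(\mathbf{L}^{-1}\mathbf{B}) = \mathbf{B} \tag{27}$$
$$\mathbf{V} = \mathbf{S}\mathbf{U}^{-1} \tag{28}$$
$$\mathbf{F}\mathbf{G} = \mathbf{C} - \mathbf{V}\mathbf{L}^{-1}\mathbf{B} \tag{29}$$
• As the matrix $\mathbf{T}$ is a banded matrix, its LU decomposition can be calculated using the low-complexity algorithm detailed in [20].
• The computation of $\mathbf{L}^{-1}\mathbf{B}$ can be achieved through a forward substitution algorithm tailored for lower triangular banded matrices, detailed in Algorithm 1 of [19].
• The calculation of equation (28) involves two sequential steps. Since $\mathbf{U}^{\dagger}$ is a lower triangular banded matrix, the first step computes $\mathbf{V}^{\dagger} = (\mathbf{U}^{\dagger})^{-1}\mathbf{S}^{\dagger}$ using Algorithm 1 of [19]. Subsequently, $\mathbf{V}$ is obtained as the conjugate transpose of $\mathbf{V}^{\dagger}$.
• A direct computation of equation (29) requires $O(\alpha^2 MN)$ calculations. Since $\mathbf{F}$ is lower triangular and $\mathbf{G}$ is upper triangular, both can be computed through the LU decomposition of equation (29). The pivotal Gaussian elimination algorithm [21] can be applied to attain the LU decomposition of equation (31) without introducing significant complexity overhead. Notably, the diagonal elements of matrices $\mathbf{L}$ and $\mathbf{F}$ are all set to unity, which implies that the diagonal elements of the overall lower triangular factor of $\boldsymbol{\Psi}$ are also unity.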
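The saving offered by a banded triangular solve can be illustrated with a generic forward substitution that only visits entries inside the band. This is a minimal sketch for a unit-diagonal lower triangular matrix with a strict band; it is not Algorithm 1 of [19], which additionally handles the quasi-banded structure. The function name and the `bandwidth` parameter are illustrative.

```python
def banded_forward_substitution(L, b, bandwidth):
    """Solve L y = b for a unit-diagonal lower triangular L.

    Entries of L more than `bandwidth` positions below the diagonal are assumed
    to be zero, so each row costs at most `bandwidth` multiplications.
    """
    n = len(b)
    y = [0j] * n
    for i in range(n):
        first = max(0, i - bandwidth)                       # first column inside the band
        acc = sum(L[i][j] * y[j] for j in range(first, i))  # contributions of solved entries
        y[i] = b[i] - acc                                   # unit diagonal: no division
    return y
```

For an n x n system the loop performs at most n times `bandwidth` multiplications instead of roughly n squared over two for a full triangular solve, which is the kind of saving the banded structure of the channel matrices makes possible.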

C. Computation of $\mathbf{r}_{ce}$
Since matrix $\mathbf{L}$ is a quasi-banded lower triangular matrix, $\mathbf{r}^{(1)} = \mathbf{L}^{-1}\mathbf{r}$ can be computed efficiently by forward substitution, as elaborated in Algorithm 2 of [19]. Subsequently, $\mathbf{r}^{(2)} = \mathbf{U}^{-1}\mathbf{r}^{(1)}$ can be computed using Algorithm 3 of [19]. The equalized vector is then
$$\mathbf{r}_{ce} = \mathbf{H}^{\dagger}\mathbf{r}^{(2)} = \sum_{i=1}^{P} \bar{h}_i\, \boldsymbol{\Delta}^{-k_i}\, \boldsymbol{\Pi}^{-l_i}\, \mathbf{r}^{(2)} \tag{31}$$
To compute $\mathbf{r}_{ce}$, the vector $\mathbf{r}^{(2)}$ is first circularly shifted by $-l_i$ and then multiplied element-wise by $\bar{h}_i\,\mathrm{diag}(\boldsymbol{\Delta}^{-k_i})$ for each path $i$. The resulting vectors are summed to yield $\mathbf{r}_{ce}$.
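The shift-and-multiply computation described above avoids any matrix product. The sketch below implements it alongside the matching forward channel so the two can be checked against each other through the adjoint identity. It assumes the usual model in which each path applies a cyclic shift by its delay tap and a phase ramp with root of unity exp(j2pi/MN) for its Doppler tap; both function names are illustrative.

```python
import cmath
import math

def apply_channel(paths, s):
    """Noiseless forward channel: r = sum_i h_i * Pi^{l_i} * Delta^{k_i} * s."""
    MN = len(s)
    z = cmath.exp(2j * math.pi / MN)
    return [sum(h * z ** (k * ((n - l) % MN)) * s[(n - l) % MN] for h, l, k in paths)
            for n in range(MN)]

def apply_channel_adjoint(paths, v):
    """Adjoint channel: sum_i conj(h_i) * Delta^{-k_i} * Pi^{-l_i} * v.

    Per path: circular shift by -l_i, then element-wise multiplication by the
    diagonal of Delta^{-k_i} scaled by conj(h_i); the path results are summed.
    """
    MN = len(v)
    z = cmath.exp(2j * math.pi / MN)
    out = [0j] * MN
    for h, l, k in paths:
        shifted = [v[(n + l) % MN] for n in range(MN)]  # circular shift by -l
        for n in range(MN):
            out[n] += h.conjugate() * z ** (-k * n) * shifted[n]
    return out
```

Each path costs one length-MN element-wise product, so the whole step scales with the number of paths times MN, and the inner products of (channel output, v) and (s, adjoint output) agree for any test vectors.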
Instead of directly calculating $\hat{\mathbf{x}}$ as $\mathbf{A}^{\dagger}\mathbf{r}_{ce}$, $\mathbf{r}_{ce}$ is first reshaped into a matrix $\mathbf{R}$ of size $M \times N$, after which the symbols are detected as
$$\hat{\mathbf{D}} = \mathbf{R}\, \mathbf{F}_N \tag{32}$$
This operation can be implemented using $M$ $N$-point FFT operations.
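The row-wise detection step can be sketched as follows. The example assumes column-major reshaping, rectangular pulses, and a normalised DFT; a naive O(N^2) DFT stands in for the FFT core, and the names `dft`, `idft`, and `detect` are illustrative.

```python
import cmath
import math

def dft(v):
    """Normalised N-point DFT of one row (naive stand-in for an FFT core)."""
    N = len(v)
    return [sum(v[q] * cmath.exp(-2j * math.pi * q * n / N) for q in range(N)) / math.sqrt(N)
            for n in range(N)]

def idft(v):
    """Normalised N-point inverse DFT, used here only to build a test signal."""
    N = len(v)
    return [sum(v[q] * cmath.exp(2j * math.pi * q * n / N) for q in range(N)) / math.sqrt(N)
            for n in range(N)]

def detect(r_ce, M, N):
    """Reshape r_ce (length MN, column-major) into R (M x N) and return R F_N.

    Multiplying R by F_N on the right is one N-point DFT per row, M DFTs in total.
    """
    R = [[r_ce[n * M + m] for n in range(N)] for m in range(M)]
    return [dft(row) for row in R]
```

In an ideal (identity) channel, modulating each row of the data matrix with `idft`, stacking the result column by column, and passing it to `detect` returns the original symbols.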
The low-complexity LMMSE equalization technique requires far fewer complex multiplications than the direct approach; the counts of its individual operations are summarized in Table II. A direct implementation, on the other hand, would involve $\frac{MN}{2}\log_2 N + \frac{8}{6}(MN)^3 + 2(MN)^2$ complex multiplications. Taking the number of sub-carriers $M$ as 2, 4, 8, …, with the other two parameters fixed at 4 and 3, we compared the number of complex multiplications performed with and without LU decomposition (direct), as shown in Fig. 4. After verifying the OTFS transmitter steps and analyzing the complexity of the OTFS receiver, one can now design the VLSI architecture of the OTFS transceiver for the multi-path fading channel in the SISO case. Here, we assume that the channel characteristics are known; in other words, ideal channel estimation is performed at the receiver.
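To get a feel for how quickly the direct approach grows, the count for the direct implementation can be evaluated for a few grid sizes. This sketch assumes the direct count reads MN/2 log2(N) + (8/6)(MN)^3 + 2(MN)^2, which is our reading of the expression in the text; the function names are illustrative.

```python
import math

def direct_lmmse_cms(M, N):
    """Complex multiplications of the direct LMMSE receiver (assumed formula, see lead-in)."""
    mn = M * N
    return mn / 2 * math.log2(N) + 8 / 6 * mn ** 3 + 2 * mn ** 2

def matched_filter_cms(M, N):
    """Complex multiplications of the matched-filter step alone: MN/2 * log2(N)."""
    return M * N / 2 * math.log2(N)

if __name__ == "__main__":
    for M in (2, 4, 8, 16):
        print(M, round(direct_lmmse_cms(M, 4)), round(matched_filter_cms(M, 4)))
```

The cubic term dominates almost immediately: doubling M with N fixed multiplies the direct count by nearly eight, whereas the matched-filter term only doubles.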
V. ARCHITECTURE DESIGN FOR OTFS TRANSMITTER
Fig. 5 shows the architecture of the OTFS transmitter, which is based on the block diagram of the OTFS modulation and demodulation scheme in Fig. 1. This architecture is designed by following equations (8)-(10).
For the coding part, we have employed the HDL coder (Verilog code, Vivado toolbox). We have considered the values of parameters such as $M$ and $N$ and the number of pipeline stages for the CORDIC program. Here we need to compute the values of $\cos(\theta)$ and $\sin(\theta)$ using area- and speed-efficient CORDIC algorithms [22]. There are three types of CORDIC operation, known as circular, hyperbolic, and linear. Each type has two modes of operation, rotation and vectoring, which gives circular rotation, circular vectoring, hyperbolic rotation, hyperbolic vectoring, linear rotation, and linear vectoring. The CORDIC algorithm for the circular rotation mode is derived from the general rotation transform, which leads to the iterations
$$x_{i+1} = x_i - d_i\, y_i\, 2^{-i}, \qquad y_{i+1} = y_i + d_i\, x_i\, 2^{-i}, \qquad z_{i+1} = z_i - d_i\, \alpha_i$$
where $x_0$, $y_0$ are the initial values and $x_n$, $y_n$ are the final values. Here the scaling constant $K = 0.607$ is used as the initial value of $x$, and $y_0 = 0$. $z_0$ is the user-given angle, $z_0 = \theta$. $\alpha_i$ is the pre-defined angle, $\alpha_i = \tan^{-1}(2^{-i})$, and $d_i$ gives the sign of $z_i$. In the CORDIC methodology, a plane rotation by an angle $\theta$ is performed by breaking the desired angle down into multiple elementary angles. Rotations are then executed for each of these elementary angles, which results in the overall rotation. In Fig. 6, the variable $z_0$ corresponds to $\theta$. At the final stage, the quantities $\cos(z_0)$ and $\sin(z_0)$ correspond to $\cos(\theta)$ and $\sin(\theta)$, respectively. A set of $\theta$ values is stored in the block RAM to initiate the procedure. These $\theta$ values are 8 bits wide: 1 bit for the sign, the following bit for the integer part, and the remaining 6 bits for the fractional part. The $\theta$ values are fed sequentially to the CORDIC stages on each positive clock edge. Because a 16-stage pipelined CORDIC module is used, the first output of the CORDIC block becomes available after 16 clock cycles. The outcome is saved as $\mathbf{F}_N$, while the version using the 2's complement of $\sin(\theta)$ is stored as $\mathbf{F}_N^{\dagger}$. Both $\cos$ and $\sin$ are calculated with 8 bits each; thus the width of each element of $\mathbf{F}_N$ and $\mathbf{F}_N^{\dagger}$ is 16 bits. Once these values are entirely available, the subsequent calculations are executed.
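The rotation-mode CORDIC iteration described above can be prototyped in software before committing to hardware. The following is a floating-point sketch of the standard circular rotation-mode recurrence with 16 stages; it models neither the 8-bit fixed-point word lengths nor the pipelining of the proposed design, and the names `cordic_gain` and `cordic_cos_sin` are illustrative.

```python
import math

def cordic_gain(stages):
    """Scaling constant K = prod_i 1/sqrt(1 + 2^(-2i)); about 0.607 for 16 stages."""
    K = 1.0
    for i in range(stages):
        K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return K

def cordic_cos_sin(theta, stages=16):
    """Return (cos(theta), sin(theta)) using only shift-and-add style updates.

    Valid for |theta| up to about 1.74 rad, the sum of the elementary angles.
    """
    x, y, z = cordic_gain(stages), 0.0, theta   # x0 = K, y0 = 0, z0 = theta
    for i in range(stages):
        d = 1.0 if z >= 0 else -1.0             # d_i = sign of the residual angle z_i
        x, y = x - d * y * 2.0 ** (-i), y + d * x * 2.0 ** (-i)
        z -= d * math.atan(2.0 ** (-i))         # subtract the elementary angle atan(2^-i)
    return x, y
```

After 16 iterations the residual angle is bounded by the last elementary angle, roughly 3e-5 rad, so the software model agrees with `math.cos` and `math.sin` to about four decimal places; a fixed-point implementation would add quantisation error on top of this.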
Since the symbols for the OTFS transmitter are normalized quadrature phase shift keying (QPSK) symbols, we can conveniently store them in a block RAM (BRAM) designated as "x" for subsequent computations. This is essential because we intend to calculate "s", which can be expressed as $\mathbf{s} = \mathrm{vec}(\mathbf{S}) = (\mathbf{F}_N^{\dagger} \otimes \mathbf{G}_{tx})\,\mathbf{x}$. In this equation, $\mathbf{G}_{tx}$ is the $M \times M$ identity matrix. We have chosen $M = N = 4$ in this specific context. Having completed the VLSI architecture design of the OTFS transmitter, we now take up the architecture design of the OTFS receiver.

VI. ARCHITECTURE DESIGN FOR OTFS RECEIVER
Prior to the design, we have to calculate the values of the parameters $l_i$, $k_i$ and $h_i$.
If the transmitter and receiver are static, there is no Doppler shift, so the Doppler taps $k_i$ are all zero.
The above calculations are general. Thus, one can calculate the necessary values of $l_i$, $k_i$ and $h_i$ from the values of $M$, $N$, $\Delta f$ and the channel parameters.

B. OTFS receiver architecture
Fig. 7 shows the proposed receiver architecture. We have followed equations (20)-(32) to derive this receiver architecture. As complete knowledge of the channel characteristics is assumed, we first calculate the values of $h_i$, $l_i$ and $k_i$ using the Extended Vehicular A (EVA) model from ETSI TS 136 104 V14.3.0 (2017-04), page 186. Subsequently, we compute $\boldsymbol{\Psi}$ according to equation (25).
Finally, we perform the LU decomposition as described in the previous section and store the values of $\mathbf{L}$ and $\mathbf{U}$ in the BRAM, using addresses of the required depth. All values stored in the BRAM are 24 bits wide, of which 12 bits hold the real part and 12 bits hold the imaginary part. Each 12-bit number is in signed fixed-point notation (1 bit for the sign, 4 for the integer part and 7 for the fractional part).
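The 24-bit word layout described above can be modelled in a few lines, which is handy when comparing hardware output against a floating-point reference. The sketch assumes two's-complement storage, round-to-nearest quantisation, and saturation on overflow, with the real part in the upper 12 bits; the rounding and saturation behaviour and all function names are assumptions, not details taken from the proposed design.

```python
FRAC_BITS = 7    # fractional bits per component
WORD_BITS = 12   # 1 sign + 4 integer + 7 fractional bits

def to_fixed(value):
    """Quantise a real number to a 12-bit two's-complement word (as an unsigned int)."""
    q = round(value * (1 << FRAC_BITS))
    q = max(-(1 << (WORD_BITS - 1)), min((1 << (WORD_BITS - 1)) - 1, q))  # saturate
    return q & ((1 << WORD_BITS) - 1)

def from_fixed(word):
    """Convert a 12-bit two's-complement word back to a float."""
    if word >= 1 << (WORD_BITS - 1):
        word -= 1 << WORD_BITS
    return word / (1 << FRAC_BITS)

def pack_complex(c):
    """Pack a complex number into 24 bits: real part in the upper 12, imaginary in the lower 12."""
    return (to_fixed(c.real) << WORD_BITS) | to_fixed(c.imag)

def unpack_complex(word):
    """Inverse of pack_complex."""
    return complex(from_fixed(word >> WORD_BITS),
                   from_fixed(word & ((1 << WORD_BITS) - 1)))
```

With 7 fractional bits the quantisation step is 2^-7, so a normalised QPSK component such as 0.7071 is stored as 0.7109375; this kind of small offset is expected when fixed-point results are compared with floating-point ones.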
In each clock cycle, we retrieve one value of $\mathbf{L}$ from the BRAM and perform the operation $\mathbf{r}^{(1)} = \mathbf{L}^{-1}\mathbf{r}$ as per Algorithm 2 of [19]. In a similar manner, we take the values of $\mathbf{U}$ from the BRAM and perform the operation $\mathbf{r}^{(2)} = \mathbf{U}^{-1}\mathbf{r}^{(1)}$ following Algorithm 3 of [19]. The multipliers shown in the architecture are power- and area-efficient approximate multipliers following [24].
Once the value of $\mathbf{r}^{(2)}$ becomes available at the output of the second multiplier, we compute $\mathbf{r}_{ce}$ using (31). Here, we have used the low-latency memory-based hard-core IP for FFT [25] ($M$ $N$-point FFTs) to obtain $\hat{\mathbf{x}}$. Alternatively, parallel-processing memory-based fast Fourier transform (FFT) processors can be used for high-throughput applications [26]. Next, we synthesize and implement the proposed architecture, present the utilization and timing analysis reports, and finally perform the functionality testing of the proposed receiver architecture.

VII. SYNTHESIS AND IMPLEMENTATION
Once the Verilog code for the OTFS transmitter and the receiver has been formulated within a fading channel scenario for the Single Input Single Output (SISO) configuration, the next step is the synthesis and implementation using the Vivado tool.The target platform for this endeavor is the ZC706 BASE Board, which features the XC7Z045-2FFG900 device from the Zynq-7000 family.Specifically, we utilized the ffg900 package and set the speed grade to -2 as per the ZC706 Evaluation Kit specifications.The synthesis and implementation phases were executed successfully.For comprehensive insights, Tables III and IV provide an in-depth breakdown of the Synthesis and Implementation report of the transmitter and receiver, respectively.
We have implemented both algorithms in Verilog and report the implementation results, the resource utilization, and the flow chart of the implementation process of the proposed architectures.

VIII. TIMING ANALYSIS
After successfully synthesizing the proposed architecture, we implemented the HDL code with a clock period of 20 ns and input and output delays of 2 ns. All timing constraints are met, as shown in Fig. 8. The detailed timing summary report is given in Table V.
Here, we checked the timing of the max delay paths and the min delay paths of the design. Max delay paths are the longest paths in the design from one flip-flop (register) to another, i.e., the paths that take the maximum time to propagate a signal from the source flip-flop to the destination flip-flop. They are critical for ensuring that the design meets its target clock frequency. Min delay paths, on the other hand, are the shortest paths in the design. For the max delay paths, the timing summary report shows a slack (required time - arrival time) of 9.160 ns, which is positive. The data path delay is 3.955 ns, of which 2.654 ns (67.108%) is logic delay and 1.301 ns (32.898%) is routing delay. The clock path skew is -4.850 ns, given by the formula DCD-SCD+CPR (see Table V). Although the clock path skew is negative, it does not affect our design because the slack remains well above zero. The min delay paths are also shown in Table V, where the slack is 0.159 ns (positive). The data path delay is about 0.246 ns, of which 0.171 ns (69.631%) is logic delay and 0.075 ns (30.369%) is routing delay. The clock path skew is 0 ns, again given by the formula DCD-SCD+CPR. The values of DCD, SCD, and CPR are shown in Table V. All outputs are 24-bit complex numbers: the 12 most significant bits hold the real part and the 12 least significant bits hold the imaginary part. All numbers are in signed fixed-point notation, with the most significant bit for the sign, the next 4 bits for the integer part, and the last 7 bits for the fractional part. The maximum data path delay is 3.955 ns, so the maximum operating clock frequency supported by the design is 1/(3.955 ns), i.e., about 0.2528 GHz.
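The arithmetic behind these timing figures is easy to reproduce. The sketch below recomputes the clock frequency implied by the reported data path delay and the logic and routing shares; it is a first-order estimate that ignores setup time, clock skew, and uncertainty, and the helper names are illustrative.

```python
def max_clock_mhz(data_path_delay_ns):
    """Clock frequency (MHz) implied by a register-to-register data path delay in ns."""
    return 1e3 / data_path_delay_ns

def share_percent(part_ns, total_ns):
    """Percentage of the total path delay contributed by one component."""
    return 100.0 * part_ns / total_ns

if __name__ == "__main__":
    print(max_clock_mhz(3.955))         # frequency implied by the 3.955 ns max data path delay
    print(share_percent(2.654, 3.955))  # logic share of the max delay path
    print(share_percent(1.301, 3.955))  # routing share of the max delay path
```

Percentages recomputed from the rounded delays can differ from the tool's report in the last digits, since the tool works from unrounded values.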

IX. FUNCTIONALITY TESTING
Here, we carry out the functionality test of the proposed architecture. We show the timing diagrams of both the transmitter and the receiver and compare their outputs. To validate our output, we compared the HDL-coded output with the MATLAB output. Fig. 9 shows the original constellation points and the transmitter outputs. Fig. 10 shows the timing diagram of the receiver outputs. Both figures show that the original constellation and the detected symbols are almost identical, indicating that the transmitted symbols are correctly detected at the receiver. The detected symbols are not exactly equal to the original transmitted symbols but are very close to them. This mismatch can be reduced by increasing the word length of the symbols; increasing the width of the fractional part of the number improves the accuracy of detection. In practice, the delay-Doppler grid size is 512 × 128, i.e., $2^{16}$ symbols. However, as this research is at a preliminary stage, we take the delay-Doppler grid size to be 4 × 4, so that one frame contains 16 symbols, as shown in the OTFS transmitter and receiver outputs. In the proposed design, we also checked the intermediate outputs, i.e., the output of every single step at the receiver (the outputs of Algorithms 1, 2, 3 and of equations (31) and (32)). The number of symbols per frame can be increased by adopting an optimized architecture design of the OTFS transmitter and receiver for the multi-path fading channel. With such an optimized transceiver, the resource utilization and power consumption will be reduced drastically, and frames with more symbols can be sent.

X. CONCLUSION
This work derives the input-output relationship of the OTFS signal in a delay-Doppler channel, using the delay-time response of the channel as a foundation. The expression for the received OTFS signal has been established in matrix form. A detailed investigation of the bit error rate performance of the OTFS and OFDM modulation schemes has been conducted. Given the inherent capability of OTFS to harness full diversity, the results indicate its superiority over OFDM across diverse channel scenarios. We have proposed a novel low-complexity VLSI architecture for the OTFS transmitter and receiver using the LU decomposition technique. Through functionality testing and timing analysis of the architecture, we have verified the Verilog code output by cross-checking it against MATLAB simulations with identical parameter values, which yielded complete agreement between the two. With the acceptable performance of the proposed transceiver established above, it is deemed applicable to wireless mobile communication.

Fig. 4: Computation complexity comparison of the direct and LU decomposition.

TABLE I: Simulation Parameters of OTFS and OFDM

TABLE II: Complexity Computation of Various Operations of LU decomposition

TABLE III: Table of Synthesized and Implementation Report of Transmitter

TABLE IV: Table of Synthesized and Implementation Report of Receiver

TABLE V: Table of Timing Analysis Report