Performance Analysis of Multi-User OTFS, OTSM, and Single Carrier in Uplink

In this paper, we develop a multiple access (MA) mechanism in which users transmit upsampled and circularly shifted orthogonal frequency division multiplexing (OFDM) signals in the uplink. These signals pass through a doubly-dispersive multiple access channel and are combined at the base station (BS). We show that the composite signal received at the BS forms an equivalent single-user orthogonal time frequency space (OTFS) system, so that such low-complexity transmission of OFDM symbols from low-capability devices yields the same diversity gain as OTFS. We extend this MA scheme to orthogonal time sequency multiplexing (OTSM) and block-based single carrier (block SC) transmissions, where the composite signal received at the BS forms an equivalent single-user OTSM or block SC signal, depending on the transmission. We also develop receivers based on successive interference cancellation (SIC) and on turbo decoding principles for multi-user OTFS, OTSM, and block SC transmissions, as well as a frequency domain SIC receiver for OFDM, and we analyze their complexity both analytically and through simulations. Further, we compare the uncoded and coded multi-user performance of OTFS, OTSM, and block SC with that of multi-user OFDM. Finally, we evaluate their performance under the effects of nonlinear high-power amplifiers in multi-user scenarios.


I. INTRODUCTION
The dawn of 6G brings several use cases, broadly categorized into the eMBB+, mMTC+, and URLLC+ scenarios [1], [2], [3]. The eMBB+ use cases, such as holographic telepresence, require data rates of up to a few Tbps [4]. The mMTC+ scenario covers use cases such as unmanned aerial vehicles (UAVs), smart cities, healthcare, and smart industries, which require high mobility and massive connectivity with support for reduced-capability Internet of Things (IoT) devices [5]. The URLLC+ scenario covers remote robotic surgery and autonomous driving, which require high reliability and latency on the order of a fraction of a millisecond [6]. For 6G to support these diverse scenarios, it is critical to investigate air-interface waveforms together with an efficient multiple access scheme that supports various mobility conditions, higher reliability, and connectivity to a large number of low-power, low-capability devices.
Waveforms have evolved from Gaussian minimum shift keying (GMSK) in 2G, to spread-spectrum-based wideband code division multiple access (WCDMA) in 3G, to the popular orthogonal frequency division multiplexing (OFDM) in 4G. 5G uses a variant of OFDM with flexible subcarrier bandwidth [7] to handle high Doppler and phase noise, while other contending waveforms such as FBMC, UFMC, and GFDM [8] were explored over the past decade. In recent years, a new waveform, orthogonal time frequency space (OTFS) [9], has been proposed and has attracted researchers' attention worldwide.
In OTFS, information-bearing quadrature amplitude modulation (QAM) symbols are placed on a two-dimensional (2D) delay-Doppler (de-Do) domain grid, rather than in the time-frequency (TF) domain as in OFDM. OTFS is known to outperform OFDM by several dB [10], [11], [12] owing to its resilience to Doppler effects and its ability to extract diversity gain. This promises to meet some of the 6G requirements, such as high mobility and reliability. However, 6G user devices range from high-capability devices to simple low-cost devices that are massive in number and characterized by low-rate, low-energy, and low peak-to-average power ratio (PAPR) requirements; supporting them requires efficient multiple access (MA) schemes for OTFS [13], which is the goal of this work.
The OTFS scheme presented in [14] multiplexes users' data in the de-Do domain using delay division multiplexing (deDM) and Doppler division multiplexing (DoDM) with guard delay bins and guard Doppler bins. In [15], an interleaved delay-Doppler multiple access (IDDMA) scheme is proposed, where users are allocated interleaved de-Do resource blocks without guard bins. Interleaved time-frequency multiple access (ITFMA) is proposed in [16], where users are allocated interleaved TF resource blocks. Angle-de-Do domain resource allocation is considered in [17] for massive MIMO scenarios. Recently, time-reversal precoding-based MA schemes [18], [19] have also been developed to improve spectral efficiency and support high data rates; however, these schemes apply to the downlink and require channel state information (CSI) at the transmitter.
In OTFS, the discrete symplectic Fourier transform (DSFT) is used to transform QAM symbols from the de-Do domain to the TF domain. The DSFT, a unitary transform, acts as an orthogonal precoding (OP) of the QAM symbols before they are placed in the TF domain. In [20], the DSFT is replaced with the Walsh-Hadamard transform (WHT), while in [21], a sparse WHT is used in place of the DSFT; both achieve error performance similar to that of OTFS.
The OFDM operation is performed on the TF symbols obtained after the DSFT to generate the time domain OTFS signal. This two-step procedure for converting de-Do symbols to the time domain can be simplified to an IDFT along the Doppler dimension of the QAM symbols in the de-Do domain, followed by vectorization of the resulting 2D symbol matrix [12], [22]. Since the IDFT is a unitary transform, [23] explored replacing it with the WHT; the resulting scheme, named orthogonal time sequency multiplexing (OTSM), is reported to achieve an error probability similar to that of OTFS. Similarly, the identity matrix may be used in place of the IDFT or WHT, and the resulting waveform becomes block-based single carrier (block SC), which is compared with OTFS in [21], [23], [24], [25], and [26].
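The simplification can be checked numerically. The NumPy sketch below assumes the ISFFT convention $\mathbf{F}_M\mathbf{X}\mathbf{F}_N^H$ and unit-amplitude rectangular pulses. It compares the two-step route (ISFFT, then an $M$-point IDFT per OFDM symbol) with the one-step route (IDFT along the Doppler dimension only, then vectorization). The helper names are illustrative and do not come from the paper.

```python
import numpy as np


def dft_matrix(n):
    """Normalized (unitary) DFT matrix F_n."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)


def otfs_two_step(X):
    """ISFFT to the TF grid, then OFDM (Heisenberg) with a rectangular pulse."""
    M, N = X.shape
    F_M, F_N = dft_matrix(M), dft_matrix(N)
    X_tf = F_M @ X @ F_N.conj().T          # ISFFT: de-Do -> TF
    S = F_M.conj().T @ X_tf                # M-point IDFT of each TF column
    return S.reshape(-1, order="F")        # vec(): stack columns


def otfs_one_step(X):
    """IDFT along the Doppler dimension only, then vectorize."""
    N = X.shape[1]
    return (X @ dft_matrix(N).conj().T).reshape(-1, order="F")


rng = np.random.default_rng(0)
M, N = 8, 4
X = (rng.choice([-1, 1], (M, N)) + 1j * rng.choice([-1, 1], (M, N))) / np.sqrt(2)
print(np.allclose(otfs_two_step(X), otfs_one_step(X)))  # True
```

The two routes agree because $\mathbf{F}_M^H\mathbf{F}_M = \mathbf{I}_M$, so the $M$-point DFT and IDFT cancel. Only the $N$-point IDFT along Doppler remains.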
The peak-to-average power ratio (PAPR) is an important characteristic of waveforms, as it affects the bit error rate (BER), energy efficiency, and link budget in the presence of the non-linear effects of high-power amplifiers (HPAs). In the patent [27], a method to reduce the PAPR of an MA scheme with DoDM-based resource allocation is described. The works [28] and [29] analyze the PAPR for OTFS, while [30] studies the effects of non-linear HPAs on its performance.
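For reference, the PAPR of a discrete-time frame $s[n]$ is commonly computed as $\max_n |s[n]|^2 / \mathrm{mean}_n |s[n]|^2$. The sketch below applies this definition to one OTFS frame and to a block SC frame that carry the same QPSK symbols. It is illustrative only: it uses Nyquist-rate samples without oversampling or pulse shaping, and both of these affect the PAPR of the actual analog waveform.

```python
import numpy as np


def papr_db(s):
    """Peak-to-average power ratio of a sample vector, in dB."""
    p = np.abs(s) ** 2
    return 10 * np.log10(p.max() / p.mean())


def dft_matrix(n):
    """Normalized (unitary) DFT matrix F_n."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)


rng = np.random.default_rng(1)
M, N = 16, 8
X = (rng.choice([-1, 1], (M, N)) + 1j * rng.choice([-1, 1], (M, N))) / np.sqrt(2)

s_otfs = (X @ dft_matrix(N).conj().T).reshape(-1, order="F")   # IDFT along Doppler
s_sc = X.reshape(-1, order="F")                                # identity: block SC
print(f"OTFS: {papr_db(s_otfs):.2f} dB, block SC: {papr_db(s_sc):.2f} dB")
```

Without pulse shaping, the block SC frame of constant-modulus QPSK symbols has a PAPR of 0 dB. Each OTFS sample, by contrast, combines $N$ QAM symbols, so its PAPR can reach up to $10\log_{10}N$ dB.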
Receiver signal processing is an integral part of waveform and air-interface design; hence, we analyze receiver performance for the MA schemes for OTFS, OTSM, and block SC. The work [31] uses an MMSE receiver for multi-user OTFS uplink reception, while [16] employs both MMSE and maximum likelihood (known to be highly complex) detection-based receivers.
The works on MA and multi-user receivers for OTFS and related schemes presented above have several limitations. The use of guard bins in [14] for OTFS MA results in a loss of spectral efficiency. The MA schemes in [15], [16], [17], and [27] require users to generate an entire OTFS frame for transmission, irrespective of the size of their resource allocation, which imposes unnecessarily high complexity on IoT devices. The MA scheme in [17] requires more antennas at the BS than users and is relevant only to massive MIMO scenarios. The OTFS-related waveforms, namely the WHT- and sparse-WHT-based OP schemes in [20], [21] and OTSM in [23], are compared with OTFS only in single-user scenarios; to the best of the authors' knowledge, a performance comparison of their MA schemes is not available in the literature. Although [21], [23], [24], [25], [26] compare the performance of OTFS and block SC, the analysis is limited to single-user scenarios. The PAPR analysis of OTFS and the performance of OTFS under HPA non-linear effects presented in [28], [29], and [30] are also limited to single-user scenarios. The multi-user receivers for OTFS in [16] and [31] were designed assuming ideal transmit and receive pulse shapes, which are unrealizable in practice. These works present only the uncoded error performance and report a degradation in the probability of error as the number of users increases. To the best of the authors' knowledge, the coded performance of OTFS in multi-user scenarios is not yet available in the literature. The works [20], [21], [24], [32], [33], and [34] describe low-complexity OTFS receivers for practical pulse shapes; however, they are designed to process signals from a single user.
Motivated by the potential of OTFS and similar OP-based schemes as contenders for the 6G air interface, and considering the limitations of the available literature, we investigate MA schemes for OTFS and related waveforms that are spectrally efficient and equally applicable to low-capability IoT devices and regular devices. In the mechanisms discussed here, one OTFS frame is split into M discrete-time OFDM symbols [35], and a group of such symbols is allocated to a user in the uplink. Each OFDM symbol carries N QAM symbols. In contrast to [14], this scheme does not require any guard bins. The transmission complexity of a user is limited to the number of OFDM symbols that the user transmits, which is similar to the transmission complexity of 4G, WiFi, and existing 5G systems. We further analyze MA schemes for OTSM and block SC. We develop multi-user receiver structures based on successive interference cancellation (SIC) and on turbo decoding principles for the practical rectangular pulse shapes used for OTFS, OTSM, and block SC. We also develop a mathematical model for multi-user OFDM and a frequency domain SIC receiver for uplink reception, and we compare its error performance with that of the three waveforms. Finally, we compare OTFS, OTSM, and block SC in terms of PAPR from an uplink perspective and analyze their multi-user performance in the presence of HPA non-linearities using the solid-state power amplifier (SSPA) model [36].

A. Paper Plan
In Section II-A, we introduce the system model, including a brief introduction to OTFS. We revisit OTFS signal generation to derive an MA scheme in Section II-B. In Section III-A, we describe an MA scheme for OTFS, and in Section III-B we derive an expression for the multi-user uplink received signal. MA schemes for OTSM and block SC are developed in Section III-C, and MA for OFDM is described in Section III-D. In Section IV, we develop SIC and turbo iterative receivers for multi-user reception of OTFS, OTSM, and block SC, as well as a frequency domain SIC receiver for OFDM, and we then compare their complexity using analytical expressions. In Section V, we discuss the simulation results, including a comparison of receiver performance and complexity, as well as the PAPR and HPA non-linear effects on the performance of OTFS, OTSM, and block SC. Finally, we provide our conclusions in Section VI.

B. Notations
Throughout the paper, we use the following notation. Bold lowercase letters (e.g., $\mathbf{x}$) denote vectors, bold uppercase letters (e.g., $\mathbf{X}$) denote matrices, and non-bold letters (e.g., $x$) denote scalars. $\mathbb{N}[a\ b]$ is the set of natural numbers in the range between $a$ and $b$. $\mathbb{C}^{M\times N}$ denotes the set of all $M\times N$ complex-valued matrices. $x(i,j)$ represents the element in the $i$th row and $j$th column of the matrix $\mathbf{X}$. $x(t)$ ($x[n]$) represents a continuous (discrete) time signal as a function of $t$ ($n$). $\mathbf{I}_N$, $\mathbf{F}_N$, and $\mathbf{W}_N$ denote the identity, normalized discrete Fourier transform (DFT), and normalized Walsh-Hadamard matrices of order $N$, respectively. A zero vector of length $N$ is represented by $\mathbf{0}_N$. The expression $xy$ denotes multiplication between $x$ and $y$. $(\cdot)^T$ and $(\cdot)^H$ represent the transpose and conjugate transpose operations, respectively. $\mathrm{vec}(\mathbf{X})$ vectorizes $\mathbf{X}$ by stacking all its columns serially, while $\mathrm{vec}^{-1}_{M\times N}(\mathbf{x})$ produces an $M\times N$ matrix from a vector $\mathbf{x}$ of length $MN$, i.e., the inverse of vectorization. For $\mathbf{x}\in\mathbb{C}^{N\times 1}$, $\mathrm{diag}\{\mathbf{x}\}$ is the $N\times N$ diagonal matrix with $\mathbf{x}$ on its diagonal. $(\cdot)_N$ represents the modulo-$N$ operation, which results in an integer between $0$ and $N-1$.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

A. Orthogonal Time Frequency Space
The source bits are channel encoded using low-density parity-check (LDPC) codes. The coded bits are mapped to QAM symbols, which are placed in the de-Do domain on a grid $\{(l\Delta\tau, k\Delta\nu): l = 0,1,\ldots,M-1,\ k = 0,1,\ldots,N-1\}$ of size $M\times N$ for transmission in each frame. The grid parameters $M$ and $N$ represent the numbers of delay and Doppler bins, and $\Delta\tau$ and $\Delta\nu$ represent the delay and Doppler resolutions, respectively. The bandwidth $B$ and the frame duration $T_f$ are given as
$$B = M\Delta f = \frac{1}{\Delta\tau}, \qquad T_f = NT = \frac{1}{\Delta\nu}, \tag{1}$$
where $\Delta f = 1/T$. These QAM symbols, denoted as $x(l,k)$, are arranged in matrix form according to their placement on the de-Do grid, with row indices representing delays and column indices representing Dopplers:
$$\mathbf{X} = \begin{bmatrix} x(0,0) & \cdots & x(0,N-1) \\ \vdots & \ddots & \vdots \\ x(M-1,0) & \cdots & x(M-1,N-1) \end{bmatrix} \in \mathbb{C}^{M\times N}. \tag{2}$$
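A quick numerical instance of the grid relations in (1) follows. The values $M = 64$, $N = 16$, and $\Delta f = 15$ kHz are illustrative assumptions, not the paper's simulation parameters, and the function name is hypothetical.

```python
def otfs_grid(M, N, delta_f):
    """Grid relations of (1) for an M x N de-Do grid with subcarrier spacing delta_f."""
    T = 1.0 / delta_f              # block (OFDM-symbol) duration, since delta_f = 1/T
    B = M * delta_f                # bandwidth
    T_f = N * T                    # frame duration
    return {
        "B_Hz": B,
        "T_f_s": T_f,
        "delay_res_s": T / M,          # delta_tau = T/M = 1/B
        "doppler_res_Hz": 1.0 / T_f,   # delta_nu = 1/(N T) = 1/T_f
    }


# Illustrative numbers only (not the paper's simulation parameters)
grid = otfs_grid(M=64, N=16, delta_f=15e3)
print(grid)
```

With these values, $B = 960$ kHz, $T_f \approx 1.07$ ms, $\Delta\tau \approx 1.04$ µs, and $\Delta\nu = 937.5$ Hz. Increasing $N$ refines the Doppler resolution at the cost of a longer frame.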
The generation of the time domain signal from the de-Do symbols, referred to as OTFS modulation, is given in discrete time [37] as
$$\mathbf{s} = \mathrm{vec}\left(\mathbf{G}\mathbf{X}\mathbf{F}_N^H\right), \tag{3}$$
where $\mathbf{s}\in\mathbb{C}^{MN\times 1}$ is the transmission vector whose elements are the samples $s[n]$ of the OTFS signal for $n = 0,1,\ldots,MN-1$, and $\mathbf{G} = \mathrm{diag}\{[g(0), g(\Delta\tau), \ldots, g((M-1)\Delta\tau)]^T\}$ is an $M\times M$ diagonal matrix for a transmit pulse shape $g(t)$ of duration $T = M\Delta\tau$. For practical rectangular pulse shapes with unit amplitude, $\mathbf{G}$ reduces to the identity matrix $\mathbf{I}_M$. Thus, we can write (3) as
$$\mathbf{s} = \mathrm{vec}(\mathbf{S}), \tag{4}$$
where
$$\mathbf{S} = \mathbf{X}\mathbf{F}_N^H \tag{5}$$
is the delay-time matrix. The CP-included OTFS signal is transmitted as an analog waveform $s(t)$ using a pulse shape $c(t)$, as in (6), where $l_{cp}$ is the CP length. The received time domain signal for a doubly-dispersive channel with input delay-spread function $g(t,\tau)$ is then given by (7), following [38], where $v(t)$ is continuous-time white Gaussian noise with power spectral density (PSD) $N_0$. For a finite number $P$ of propagation paths with one Doppler frequency per path, $g(t,\tau)$ takes the form (8), as in [39], where $a_p$, $\tau_p$, and $\nu_p$ are the complex gain, delay, and Doppler frequency of the $p$th path, respectively. At the receiver, the output of the matched filter with impulse response $c^*(-t)$ is sampled at intervals of $\Delta\tau$, which yields the discrete-time received samples in (9), where $g[n,l]$ is given in (91) and $w[n]$ is a sample of the filtered noise with variance $\sigma_w^2 = N_0$. The proof of (9) is given in the appendix. After discarding the CP samples, the received signal $\mathbf{y} = \{y[n]\}_{n=0}^{MN-1}$ can be expressed as
$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{w}, \tag{10}$$
where $\mathbf{w} = \{w[n]\}_{n=0}^{MN-1}$ is the noise vector and $\mathbf{H}\in\mathbb{C}^{MN\times MN}$, given in (11), is the single-user time domain channel matrix, with $\boldsymbol{\Pi}$ a cyclic (forward) permutation matrix of order $MN$ [37].

Now, we examine the OTFS signal to identify the MA scheme that is inherently present in it.
We observe that each $l$th term of (15) is $\mathrm{vec}(\mathbf{e}_l\mathbf{x}_l^T)$, which represents the OFDM symbol $\mathbf{x}_l$ after upsampling and circular shifting, occupying $MN$ samples. Therefore, the OTFS signal can be viewed as an orthogonal aggregation of the non-zero samples of $M$ upsampled OFDM symbols, each of which is circularly shifted by one of the orthogonal delays $l\in[0, M-1]$.
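This decomposition can be checked numerically. The sketch below assumes rectangular pulses and the delay-time form $\mathbf{S} = \mathbf{X}\mathbf{F}_N^H$ of (5). It builds each term by taking the $N$-point IDFT of one delay row, upsampling it by $M$, and circularly shifting it by $l$. It then compares the sum of the $M$ terms with $\mathrm{vec}(\mathbf{X}\mathbf{F}_N^H)$. The function name is hypothetical.

```python
import numpy as np


def dft_matrix(n):
    """Normalized (unitary) DFT matrix F_n."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)


def upsampled_shifted_ofdm(x_row, M, l):
    """vec(e_l x^T) for the OFDM symbol x built from the QAM symbols on delay row l."""
    N = x_row.size
    ofdm = dft_matrix(N).conj().T @ x_row   # N-sample OFDM symbol (N-point IDFT)
    up = np.zeros(M * N, dtype=complex)
    up[::M] = ofdm                          # upsample by M: M-1 zeros after each sample
    return np.roll(up, l)                   # circular shift by l (l < M, so no wrap)


rng = np.random.default_rng(2)
M, N = 4, 8
X = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

s_otfs = (X @ dft_matrix(N).conj().T).reshape(-1, order="F")
s_sum = sum(upsampled_shifted_ofdm(X[l], M, l) for l in range(M))
print(np.allclose(s_otfs, s_sum))  # True
```

Each term occupies only the positions $n$ with $(n)_M = l$, so the $M$ components never overlap. This is the property that the MA scheme in Section III exploits.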
Since OFDM is already used as the signaling mechanism in WiFi, 4G, and 5G, we can consider an MA scheme for OTFS in which each user generates such a term $\mathrm{vec}(\mathbf{e}_l\mathbf{x}_l^T)$ to form one component of the OTFS signal, so that the transition in technology is minimal. Unlike in [15], [16], [17], and [27], a user does not need to generate the complete OTFS signal for transmission. Furthermore, since OFDM technology is now mature and is used in IoT devices, including the NB-IoT devices supported by 5G [5], a similar philosophy can be extended to OTFS.
Each component of the OTFS signal goes through a different channel, corresponding to the link between each user and the base station (BS). This contrasts with (10), where the entire OTFS signal (all components) goes through the same channel. It remains to be seen whether the signals received and combined at the BS form an OTFS signal; this is analyzed in Section III.

III. MULTIPLE ACCESS SCHEMES
In the following subsections, we describe an MA scheme for OTFS in III-A and derive an expression for the combined received signal at the BS in III-B. We further develop an MA scheme for OTSM and block SC in III-C, based on the scheme described for OTFS. Finally, we provide a signal model for an MA scheme with OFDM in III-D.

A. OTFS
The circular delays $l = 0,1,\ldots,M-1$, which are the row indices of $\mathbf{X}$ in (2), are partitioned into $J$ sets and allocated to $J$ users. The set of delay indices allocated to the $j$th user, for $j = 1,2,\ldots,J$, is denoted as $\Omega_j$, and its cardinality as $|\Omega_j|$. The following orthogonality condition is ensured while allocating delay sets to users:
$$\Omega_i \cap \Omega_j = \emptyset, \quad \forall\, i \neq j. \tag{16}$$
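Each $j$th user considers a block of QAM symbols $\mathbf{x}_j\in\mathbb{C}^{|\Omega_j|N\times 1}$ for transmission in a frame. The vector $\mathbf{x}_j$ is divided into $|\Omega_j|$ vectors of length $N$, each of which is assigned a row index $l\in\Omega_j$ using an operator $\Psi_j$, as given in (17), where $\mathbf{x}_{j,l} = [x_j(l,0), x_j(l,1), \ldots, x_j(l,N-1)]^T$ are the QAM symbols of the $j$th user associated with index $l$. The $j$th user's de-Do plane symbol matrix $\mathbf{X}_j$ is then constructed as
$$\mathbf{X}_j = \sum_{l\in\Omega_j}\mathbf{e}_l\,\mathbf{x}_{j,l}^T, \tag{18}$$
where $\mathbf{e}_l$ is the $l$th column of $\mathbf{I}_M$. We define
$$\mathbf{X}_{MU} = \sum_{j=1}^{J}\mathbf{X}_j, \tag{19}$$
which is the de-Do plane symbol matrix formed by aggregating the symbol matrices of all $J$ users. Since the orthogonality condition in (16) is ensured when allocating delays to the users, $\mathbf{X}_j$ can be obtained from $\mathbf{X}_{MU}$ as
$$\mathbf{X}_j = \mathbf{D}_j\mathbf{X}_{MU}, \tag{20}$$
where $\mathbf{D}_j$ is an $M\times M$ diagonal matrix given as
$$\mathbf{D}_j = \mathrm{diag}\{[d_j(0), \ldots, d_j(M-1)]^T\}, \quad d_j(l) = \begin{cases}1, & l\in\Omega_j,\\ 0, & \text{otherwise}.\end{cases} \tag{21}$$
The $j$th user's delay-time matrix is obtained using (5) as
$$\mathbf{S}_j = \mathbf{X}_j\mathbf{F}_N^H, \tag{22}$$
and the $j$th user's transmission vector is given as
$$\mathbf{s}_j = \mathrm{vec}(\mathbf{S}_j). \tag{23}$$
After the CP is added, the samples of $\mathbf{s}_j$ are transmitted as an analog waveform $s_j(t)$, following (6).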
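A per-user transmitter along the lines of (18)-(23) can be sketched as follows. The helper `user_tx` is hypothetical. It assumes that $\Psi_j$ assigns consecutive $N$-symbol chunks of $\mathbf{x}_j$ to the allocated delays in ascending order, which the paper does not specify. The final line checks the selection property (20).

```python
import numpy as np


def dft_matrix(n):
    """Normalized (unitary) DFT matrix F_n."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)


def user_tx(x_j, omega_j, M, P):
    """Hypothetical user transmitter: place N-symbol chunks of x_j on the allocated
    delay rows (ascending order assumed for Psi_j), then s_j = vec(X_j P)."""
    N = P.shape[0]
    X_j = np.zeros((M, N), dtype=complex)
    for i, l in enumerate(sorted(omega_j)):
        X_j[l] = x_j[i * N:(i + 1) * N]        # x_{j,l}, cf. (18)
    s_j = (X_j @ P).reshape(-1, order="F")     # (22)-(23)
    return X_j, s_j


rng = np.random.default_rng(3)
M, N = 4, 4
P = dft_matrix(N).conj().T                      # OTFS: P = F_N^H
omegas = [{0, 2}, {1}, {3}]                     # disjoint delay sets, cf. (16)

X_users, s_users = [], []
for om in omegas:
    x_j = rng.standard_normal(len(om) * N) + 1j * rng.standard_normal(len(om) * N)
    X_j, s_j = user_tx(x_j, om, M, P)
    X_users.append(X_j)
    s_users.append(s_j)

X_MU = sum(X_users)                                                         # (19)
D = [np.diag([1.0 if l in om else 0.0 for l in range(M)]) for om in omegas]  # (21)
print(all(np.allclose(Dj @ X_MU, Xj) for Dj, Xj in zip(D, X_users)))       # True, (20)
```

Each user only computes $N$-point IDFTs for its own $|\Omega_j|$ rows. This is the low-complexity property argued for in Section II.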

B. Multi-User Reception at the BS
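The signal received from the $J$ users at the BS in discrete time, after discarding the CP and assuming ideal synchronization, is
$$\mathbf{y} = \sum_{j=1}^{J}\mathbf{H}_j\mathbf{s}_j + \mathbf{w}, \tag{24}$$
where $\mathbf{H}_j$ is the time domain channel matrix of the $j$th user, as given in (11). This is illustrated in Fig. 1 for a 4-user scenario, in which users U1, U2, U3, and U4, with channel matrices $\mathbf{H}_1$, $\mathbf{H}_2$, $\mathbf{H}_3$, and $\mathbf{H}_4$, transmit $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$, respectively. Using (22) and (23), (24) can be written as
$$\mathbf{y} = \sum_{j=1}^{J}\mathbf{H}_j\,\mathrm{vec}(\mathbf{X}_j\mathbf{F}_N^H) + \mathbf{w}. \tag{25}$$
Using (20),
$$\mathbf{y} = \sum_{j=1}^{J}\mathbf{H}_j\,\mathrm{vec}(\mathbf{D}_j\mathbf{X}_{MU}\mathbf{F}_N^H) + \mathbf{w}. \tag{26}$$
Using the identity $\mathrm{vec}(\mathbf{B}\mathbf{A}) = (\mathbf{I}\otimes\mathbf{B})\,\mathrm{vec}(\mathbf{A})$, we get
$$\mathbf{y} = \sum_{j=1}^{J}\mathbf{H}_j(\mathbf{I}_N\otimes\mathbf{D}_j)\,\mathrm{vec}(\mathbf{X}_{MU}\mathbf{F}_N^H) + \mathbf{w}. \tag{27}$$
We define
$$\mathbf{H}_{\rm eff} = \sum_{j=1}^{J}\mathbf{H}_j(\mathbf{I}_N\otimes\mathbf{D}_j), \tag{28}$$
which combines the $J$ channel matrices. $\mathbf{H}_{\rm eff}$ can be regarded as an effective time domain channel matrix for the multiple access channel. It can alternatively be expressed in terms of its $MN$ column vectors as
$$\mathbf{H}_{\rm eff} = [\mathbf{h}_{{\rm eff},0}, \mathbf{h}_{{\rm eff},1}, \ldots, \mathbf{h}_{{\rm eff},MN-1}], \tag{29}$$
with
$$\mathbf{h}_{{\rm eff},n} = \begin{cases}\mathbf{h}_{j,n}, & \text{if } (n)_M \in \Omega_j,\\ \mathbf{0}_{MN}, & \text{if } (n)_M \text{ is not allocated to any user},\end{cases} \tag{30}$$
where $\mathbf{h}_{j,n}$ is the $n$th column vector of $\mathbf{H}_j$. Further, we define
$$\mathbf{s}_{MU} = \mathrm{vec}(\mathbf{X}_{MU}\mathbf{F}_N^H) = \sum_{j=1}^{J}\mathbf{s}_j. \tag{31}$$
The last two terms of (31) follow from (19) and (23), respectively.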
The vector $\mathbf{s}_{MU}$ describes an OTFS signal for the symbol matrix $\mathbf{X}_{MU}$ and is an aggregation of the individual users' transmission vectors. Using (27), (28), and (31), the signal received in the uplink can be expressed as
$$\mathbf{y} = \mathbf{H}_{\rm eff}\,\mathbf{s}_{MU} + \mathbf{w}. \tag{32}$$
This shows that the upsampled and circularly shifted OFDM symbols from different users, after passing through the multiple access channels $\mathbf{H}_1,\ldots,\mathbf{H}_J$, combine at the receiver to form an equivalent OTFS system, similar to (10). Equations (28) to (32) are illustrated in Fig. 2 for a 2-user scenario with $M = 2$ and $N = 2$, with noise ignored. Here, each user transmits one OFDM symbol of two samples, upsampled and circularly shifted according to its allocated delay. We see that two column vectors from $\mathbf{H}_1$ and two from $\mathbf{H}_2$ form $\mathbf{H}_{\rm eff}$. Since $\mathbf{H}_{\rm eff}$ contains column vectors from several users' channel matrices, which are drawn from different impulse responses, we hypothesize that $\mathbf{H}_{\rm eff}$ is more diversified than a single-user channel matrix $\mathbf{H}_j$. We verify this in the results section by comparing single-user and multi-user error performance.
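The identity $\sum_j \mathbf{H}_j\mathbf{s}_j = \mathbf{H}_{\rm eff}\mathbf{s}_{MU}$ and the column-wise description (30) of $\mathbf{H}_{\rm eff}$ can be checked with random matrices. This sketch makes several simplifying assumptions. Generic dense complex matrices stand in for the structured channel matrices of (11), and noise is omitted. It also uses block SC ($\mathbf{P} = \mathbf{I}_N$), although any unitary $\mathbf{P}$ gives the same result.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 4, 3
MN = M * N
omegas = [{0, 1}, {2, 3}]                  # two users, disjoint delay sets
P = np.eye(N)                              # block SC; any unitary P works


def crandn(*shape):
    """Complex Gaussian samples of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


H = [crandn(MN, MN) for _ in omegas]       # stand-ins for the H_j of (11)
X_MU = crandn(M, N)                        # aggregated symbol matrix
D = [np.diag([1.0 if l in om else 0.0 for l in range(M)]) for om in omegas]

s = [((Dj @ X_MU) @ P).reshape(-1, order="F") for Dj in D]   # s_j via (20), (22), (23)
y = sum(Hj @ sj for Hj, sj in zip(H, s))                     # (24), noiseless

# (28): H_eff = sum_j H_j (I_N kron D_j)
H_eff = sum(Hj @ np.kron(np.eye(N), Dj) for Hj, Dj in zip(H, D))

# (29)-(30): column n of H_eff is column n of the user that owns delay (n)_M
H_cols = np.zeros((MN, MN), dtype=complex)
for n in range(MN):
    j = next(j for j, om in enumerate(omegas) if n % M in om)
    H_cols[:, n] = H[j][:, n]

s_MU = (X_MU @ P).reshape(-1, order="F")   # (31)
print(np.allclose(H_eff, H_cols), np.allclose(y, H_eff @ s_MU))  # True True
```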

C. MA Scheme for OTFS-Related Waveforms
We extend the described MA scheme for OTFS in III-A to OTSM [23] and block SC.These are closely related to OTFS, and have a unified expression for signal generation given below.
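$$\mathbf{s} = \mathrm{vec}(\mathbf{X}\mathbf{P}), \tag{33}$$
where $\mathbf{P}$ is an $N\times N$ matrix given for each waveform in Table I: $\mathbf{F}_N^H$ for OTFS, $\mathbf{W}_N$ for OTSM, and $\mathbf{I}_N$ for block SC. Accordingly, the QAM symbols in $\mathbf{X}$, which are assumed to be placed in the de-Do domain for OTFS, are considered to be in the delay-sequency domain for OTSM [23] and in the delay-time domain for block SC. We refer to the operation in (33) as waveform modulation and to its inverse,
$$\mathbf{X} = \mathrm{vec}^{-1}_{M\times N}(\mathbf{s})\,\mathbf{P}^H, \tag{34}$$
as waveform demodulation.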
Therefore, we can replace the IDFT matrix $\mathbf{F}_N^H$ in (22) with $\mathbf{P}$ to express the $j$th user's transmission vector for all three waveforms. Similarly, $\mathbf{P}$ replaces $\mathbf{F}_N^H$ in the expression for $\mathbf{s}_{MU}$ in (31). The expression for the effective channel matrix $\mathbf{H}_{\rm eff}$ in (28) remains identical for all three waveforms.
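Waveform modulation (33) and demodulation (34) can be written once and parameterized by $\mathbf{P}$. The sketch below assumes $\mathbf{P} = \mathbf{F}_N^H$, $\mathbf{W}_N$, or $\mathbf{I}_N$ for OTFS, OTSM, and block SC, respectively. It builds $\mathbf{W}_N$ with the Sylvester construction, so $N$ must be a power of two. The function names are illustrative.

```python
import numpy as np


def dft_matrix(n):
    """Normalized (unitary) DFT matrix F_n."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)


def wht_matrix(n):
    """Normalized Walsh-Hadamard matrix W_n (Sylvester construction)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    W = np.array([[1.0]])
    while W.shape[0] < n:
        W = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), W)
    return W / np.sqrt(n)


def waveform_matrix(name, n):
    """P for each waveform: F_n^H (OTFS), W_n (OTSM), I_n (block SC)."""
    if name == "otfs":
        return dft_matrix(n).conj().T
    if name == "otsm":
        return wht_matrix(n)
    if name == "blocksc":
        return np.eye(n)
    raise ValueError(f"unknown waveform {name!r}")


def modulate(X, P):
    """Waveform modulation (33): s = vec(X P)."""
    return (X @ P).reshape(-1, order="F")


def demodulate(s, P, M):
    """Waveform demodulation (34): X = vec^{-1}_{M x N}(s) P^H."""
    return s.reshape((M, P.shape[0]), order="F") @ P.conj().T


rng = np.random.default_rng(6)
X = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
for name in ("otfs", "otsm", "blocksc"):
    P = waveform_matrix(name, 8)
    print(name, np.allclose(demodulate(modulate(X, P), P, 4), X))  # True for each
```

Because every $\mathbf{P}$ is unitary, the same demodulation code serves all three waveforms. Only the choice of $\mathbf{P}$ changes.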

D. MA Scheme for OFDM
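An $M$-subcarrier CP-OFDM scheme is considered, in which a CP is added to each OFDM symbol. Each $j$th user is allocated an orthogonal set of subcarriers $\tilde{\Omega}_j$, similar to the allocation of delay sets for OTFS (OTSM and block SC). The time-domain received signal $\tilde{\mathbf{y}}_{\rm ofdm}\in\mathbb{C}^{M\times 1}$ from all $J$ users at the BS, corresponding to one OFDM symbol after discarding the CP, is given as
$$\tilde{\mathbf{y}}_{\rm ofdm} = \sum_{j=1}^{J}\tilde{\mathbf{H}}_j\mathbf{F}_M^H\tilde{\mathbf{D}}_j\mathbf{d}_j + \tilde{\mathbf{w}}_{\rm ofdm}, \tag{35}$$
where $\tilde{\mathbf{H}}_j\in\mathbb{C}^{M\times M}$ is the $j$th user's time domain channel matrix for one OFDM symbol of length $M$ samples, which follows (11), $\mathbf{d}_j\in\mathbb{C}^{M\times 1}$ is the $j$th user's symbol vector, and $\tilde{\mathbf{w}}_{\rm ofdm}\in\mathbb{C}^{M\times 1}$ is the noise vector. After the DFT operation at the receiver, $\mathbf{y}_{\rm ofdm} = \mathbf{F}_M\tilde{\mathbf{y}}_{\rm ofdm}$ is expressed as
$$\mathbf{y}_{\rm ofdm} = \sum_{j=1}^{J}\boldsymbol{\Lambda}_j\tilde{\mathbf{D}}_j\mathbf{d}_j + \mathbf{w}_{\rm ofdm}, \tag{36}$$
where $\boldsymbol{\Lambda}_j = \mathbf{F}_M\tilde{\mathbf{H}}_j\mathbf{F}_M^H$ is a frequency domain channel matrix, $\tilde{\mathbf{D}}_j$ is the diagonal matrix for $\tilde{\Omega}_j$, defined as in (21) for OTFS (OTSM or block SC), and $\mathbf{w}_{\rm ofdm} = \mathbf{F}_M\tilde{\mathbf{w}}_{\rm ofdm}$. Defining
$$\mathbf{d}_{MU} = \sum_{j=1}^{J}\tilde{\mathbf{D}}_j\mathbf{d}_j \tag{37}$$
and
$$\boldsymbol{\Lambda}_{\rm eff} = \sum_{j=1}^{J}\boldsymbol{\Lambda}_j\tilde{\mathbf{D}}_j, \tag{38}$$
we rewrite (36) as
$$\mathbf{y}_{\rm ofdm} = \boldsymbol{\Lambda}_{\rm eff}\,\mathbf{d}_{MU} + \mathbf{w}_{\rm ofdm}. \tag{39}$$
In this work, the frequency domain SIC receiver described in Section IV-F is used to process $\mathbf{y}_{\rm ofdm}$ and detect each $j$th user's symbol vector $\mathbf{d}_j$.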
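The step from (36) to (39) relies on $\tilde{\mathbf{D}}_i\tilde{\mathbf{D}}_j = \mathbf{0}$ for $i \neq j$ and $\tilde{\mathbf{D}}_j^2 = \tilde{\mathbf{D}}_j$. The short numerical check below makes two assumptions. It uses random dense time domain channel matrices, which, like doubly-dispersive channels, make $\boldsymbol{\Lambda}_j$ non-diagonal. It also omits noise.

```python
import numpy as np


def dft_matrix(n):
    """Normalized (unitary) DFT matrix F_n."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)


rng = np.random.default_rng(5)
M = 8
omegas = [{0, 3, 5}, {1, 2}, {4, 6, 7}]           # disjoint subcarrier sets
F = dft_matrix(M)


def crandn(*shape):
    """Complex Gaussian samples of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


Ht = [crandn(M, M) for _ in omegas]                # hypothetical time domain channels
D = [np.diag([1.0 if m in om else 0.0 for m in range(M)]) for om in omegas]
d = [crandn(M) for _ in omegas]                    # per-user symbol vectors d_j

# (35): each user sends F_M^H D_j d_j through its own channel (noiseless)
y_time = sum(Hj @ F.conj().T @ Dj @ dj for Hj, Dj, dj in zip(Ht, D, d))
y_freq = F @ y_time                                # DFT at the BS

Lam = [F @ Hj @ F.conj().T for Hj in Ht]           # Lambda_j
d_MU = sum(Dj @ dj for Dj, dj in zip(D, d))        # (37)
Lam_eff = sum(Lj @ Dj for Lj, Dj in zip(Lam, D))   # (38)
print(np.allclose(y_freq, Lam_eff @ d_MU))         # True, (39)
```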

IV. MULTI-USER ITERATIVE RECEIVERS
We describe two receiver architectures for OTFS, OTSM, and block SC: (1) SIC-based and (2) turbo decoding-based. Before the SIC or turbo iterations begin, both receivers require a common signal processing stage, which we describe using Fig. 3. In Section IV-F, we describe the frequency domain SIC receiver for OFDM. The receiver is assumed here to have perfect knowledge of the channel. In practice, $\mathbf{H}_{\rm eff}$ can be estimated at the BS by placing user-specific pilots during transmission, either in the de-Do domain, as described in [40], or in the time domain, as described in [11].

A. MMSE Equalization
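The received signal $\mathbf{y}$ in (32) is first equalized using
$$\bar{\mathbf{y}} = \mathbf{G}\mathbf{y}, \tag{40}$$
where $\mathbf{G}$ is the MMSE equalization matrix given as
$$\mathbf{G} = \mathbf{H}_{\rm eff}^H\left(\mathbf{H}_{\rm eff}\mathbf{H}_{\rm eff}^H + \sigma_w^2\mathbf{I}_{MN}\right)^{-1}. \tag{41}$$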
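A dense-matrix sketch of (40)-(41) follows. It solves the linear system instead of forming the inverse explicitly. The sparse-matrix methods discussed in Section IV-G are not attempted here, and the function name is illustrative.

```python
import numpy as np


def mmse_equalize(y, H, sigma2):
    """Return G y with G = H^H (H H^H + sigma2 I)^{-1}, cf. (40)-(41)."""
    A = H @ H.conj().T + sigma2 * np.eye(H.shape[0])
    return H.conj().T @ np.linalg.solve(A, y)


rng = np.random.default_rng(9)
n = 16
H = np.eye(n) + 0.2 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
s = rng.standard_normal(n) + 1j * rng.standard_normal(n)
w = 0.01 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
y_bar = mmse_equalize(H @ s + w, H, sigma2=2e-4)
print(np.linalg.norm(y_bar - s) / np.linalg.norm(s))   # relative estimation error
```

As $\sigma_w^2 \to 0$ and for an invertible channel matrix, $\mathbf{G}$ tends to the zero-forcing inverse $\mathbf{H}_{\rm eff}^{-1}$.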

B. Gauss-Seidel (GS) Iterative Detector
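The GS iterative detector requires the computations (42) and (43), where $\mathbf{L}$, $\mathbf{D}$, and $\mathbf{U}$ are the strictly lower triangular, diagonal, and strictly upper triangular components of the matrix $\mathbf{H}_{\rm eff}^H\mathbf{H}_{\rm eff}$, respectively. During each $u$th GS iteration, where $u\in\mathbb{N}[1\ U]$, the following steps are taken to obtain the estimate of $\mathbf{s}_{MU}$, denoted as $\check{\mathbf{s}}^{(u)}_{MU}$. The iteration begins with waveform demodulation of the output of the previous iteration:
$$\check{\mathbf{X}}^{(u)}_{MU} = \mathrm{vec}^{-1}_{M\times N}\left(\check{\mathbf{s}}^{(u-1)}_{MU}\right)\mathbf{P}^H. \tag{44}$$
We set $\check{\mathbf{s}}^{(0)}_{MU} = \bar{\mathbf{y}}$, the MMSE-equalized sequence given in (40). The symbol estimates in $\check{\mathbf{X}}^{(u)}_{MU}$ are mapped to the nearest constellation points using hard decisions, and the resulting QAM symbols are waveform modulated as
$$\check{\mathbf{s}}^{(u)}_{MU} = \mathrm{vec}\left(\mathcal{D}\left(\check{\mathbf{X}}^{(u)}_{MU}\right)\mathbf{P}\right), \tag{45}$$
where $\mathcal{D}(\cdot)$ is the hard-decision operator. The L2 norm of the difference between $\mathbf{y}$ and $\mathbf{H}_{\rm eff}\check{\mathbf{s}}^{(u)}_{MU}$ is calculated as
$$\Delta e^{(u)} = \left\|\mathbf{y} - \mathbf{H}_{\rm eff}\check{\mathbf{s}}^{(u)}_{MU}\right\|_2. \tag{46}$$
If $\Delta e^{(u)} < \Delta e^{(u-1)}$, $\check{\mathbf{s}}^{(u)}_{MU}$ is obtained from the GS update in (47). If $\Delta e^{(u)} \geq \Delta e^{(u-1)}$, the iterations are stopped, and $\check{\mathbf{s}}^{(u-1)}_{MU}$ is taken as the GS detector output $\hat{\mathbf{s}}_{MU}$.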
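The loop structure of (44)-(46) can be sketched as follows. Several details are assumptions rather than the paper's specification. The sketch uses QPSK hard decisions, dense matrices, and $\Delta e^{(0)} = \infty$. As a stand-in for (47), it applies the standard Gauss-Seidel step $(\mathbf{D}+\mathbf{L})^{-1}(\mathbf{H}_{\rm eff}^H\mathbf{y} - \mathbf{U}\check{\mathbf{s}})$ to the hard-decision vector.

```python
import numpy as np


def dft_matrix(n):
    """Normalized (unitary) DFT matrix F_n."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)


def qpsk_hard(X):
    """Hard decision to the nearest unit-energy QPSK point."""
    return (np.sign(X.real) + 1j * np.sign(X.imag)) / np.sqrt(2)


def gs_detector(y, H, P, M, s_init, max_iter=10):
    """Illustrative GS iterative detector with waveform-domain hard decisions."""
    N = P.shape[0]
    R = H.conj().T @ H                 # R = L + D + U
    z = H.conj().T @ y
    DL = np.tril(R)                    # D + L
    U = np.triu(R, k=1)                # strictly upper triangular part
    s_prev, err_prev = s_init, np.inf  # assumed initial error: infinity
    for _ in range(max_iter):
        X_chk = s_prev.reshape((M, N), order="F") @ P.conj().T   # (44)
        s_hd = (qpsk_hard(X_chk) @ P).reshape(-1, order="F")      # (45)
        err = np.linalg.norm(y - H @ s_hd)                        # (46)
        if err >= err_prev:
            break                      # keep the previous iterate as the output
        s_prev = np.linalg.solve(DL, z - U @ s_hd)                # assumed GS step
        err_prev = err
    return s_prev


rng = np.random.default_rng(7)
M, N = 4, 4
P = dft_matrix(N).conj().T
X_true = qpsk_hard(rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
s_true = (X_true @ P).reshape(-1, order="F")
H = np.eye(M * N) + 0.1 * (rng.standard_normal((M * N, M * N))
                           + 1j * rng.standard_normal((M * N, M * N)))
y = H @ s_true                                    # noiseless for the demo
s_init = s_true + 0.05 * (rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N))
s_hat = gs_detector(y, H, P, M, s_init)
print(np.allclose(s_hat, s_true))  # True
```

In practice, `s_init` would be the MMSE output $\bar{\mathbf{y}}$ of (40). A perturbed copy of the true signal is used here only to keep the example self-contained.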
For single-user scenarios, zero padding (ZP) in the transmission has been suggested for OTFS and OTSM to enable low-complexity GS iterative detection [23]. With ZP, the last $l_{\max}$ rows of the matrix $\mathbf{X}$ are set to zero row vectors, which ensures zeros in the last $l_{\max}$ positions of every subblock of $M$ samples in the OTFS signal of (15). This avoids inter-subblock interference at the receiver. To obtain a similar advantage with the MA scheme described for OTFS (OTSM or block SC), the users are not allowed to use the last $l_{\max}$ delays of $[0, M-1]$ for transmission, as shown in Fig. 4. We consider ZP-based transmission in this work. Thus, from (28), the effective channel matrix $\mathbf{H}_{\rm eff}$ becomes lower triangular with a block diagonal structure:
$$\mathbf{H}_{\rm eff} = \mathrm{blkdiag}\{\mathbf{H}_{{\rm eff},0}, \mathbf{H}_{{\rm eff},1}, \ldots, \mathbf{H}_{{\rm eff},N-1}\}, \tag{48}$$
where each $\mathbf{H}_{{\rm eff},p}$ is an $M\times M$ lower triangular matrix, for $p = 0,1,\ldots,N-1$. Using (48), the received vector $\mathbf{y}$ in (32) can be divided into $N$ vectors of length $M$, $\mathbf{y}^T = [\mathbf{y}_{{\rm sub},0}^T, \mathbf{y}_{{\rm sub},1}^T, \ldots, \mathbf{y}_{{\rm sub},N-1}^T]$, where
$$\mathbf{y}_{{\rm sub},p} = \mathbf{H}_{{\rm eff},p}\mathbf{s}_{{\rm sub},p} + \mathbf{w}_{{\rm sub},p}, \tag{49}$$
and $\mathbf{y}_{{\rm sub},p}$, $\mathbf{s}_{{\rm sub},p}$, $\mathbf{w}_{{\rm sub},p}\in\mathbb{C}^{M\times 1}$ are the $p$th subblocks of $\mathbf{y}$, $\mathbf{s}_{MU}$, and $\mathbf{w}$, respectively. After steps (44) and (45), and after verifying the condition $\Delta e^{(u)} < \Delta e^{(u-1)}$, the estimate of each $\mathbf{s}_{{\rm sub},p}$ in the $u$th GS iteration is obtained by the subblock-wise GS update (50). Here, $\mathbf{L}_p$ and $\mathbf{D}_p$ are the strictly lower triangular and diagonal components of $\mathbf{H}_{{\rm eff},p}^H\mathbf{H}_{{\rm eff},p}$, respectively, $\check{\mathbf{s}}^{(u)}_{{\rm sub},p}\in\mathbb{C}^{M\times 1}$ is the $p$th subblock of $\check{\mathbf{s}}^{(u)}_{MU}$ in (45), and $\mathbf{z}_p = \mathbf{H}_{{\rm eff},p}^H\mathbf{y}_{{\rm sub},p}$. The estimate of $\mathbf{s}_{MU}$ for ZP-based transmission in each $u$th GS iteration is then
$$\check{\mathbf{s}}^{(u)}_{MU} = \left[\check{\mathbf{s}}^{(u)T}_{{\rm sub},0}, \check{\mathbf{s}}^{(u)T}_{{\rm sub},1}, \ldots, \check{\mathbf{s}}^{(u)T}_{{\rm sub},N-1}\right]^T. \tag{51}$$
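Because $\mathbf{H}_{\rm eff}$ in (48) is block diagonal, $\mathbf{H}_{\rm eff}^H\mathbf{H}_{\rm eff}$ is block diagonal too. One Gauss-Seidel step on the full $MN$-dimensional system therefore splits into $N$ independent $M$-dimensional steps. The sketch below checks this, using random banded lower-triangular blocks with delay spread $l_{\max}$. It assumes the per-subblock step has the form $(\mathbf{D}_p+\mathbf{L}_p)^{-1}(\mathbf{z}_p - \mathbf{L}_p^H\check{\mathbf{s}}_{{\rm sub},p})$. This is an assumption about (50), based on $\mathbf{U}_p = \mathbf{L}_p^H$ for the Hermitian matrix $\mathbf{H}_{{\rm eff},p}^H\mathbf{H}_{{\rm eff},p}$. The zero-padded (unused) delay positions are not modelled.

```python
import numpy as np

rng = np.random.default_rng(10)
M, N, l_max = 6, 3, 2


def banded_lower(m, bw):
    """Random lower-triangular m x m block with bandwidth bw (delays 0..bw)."""
    A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    return np.triu(np.tril(A), -bw)


blocks = [banded_lower(M, l_max) for _ in range(N)]
H_eff = np.zeros((M * N, M * N), dtype=complex)
for p, Hp in enumerate(blocks):
    H_eff[p * M:(p + 1) * M, p * M:(p + 1) * M] = Hp          # (48)

y = rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N)
s = rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N)

# One GS step on the full MN x MN system
R = H_eff.conj().T @ H_eff
full = np.linalg.solve(np.tril(R), H_eff.conj().T @ y - np.triu(R, 1) @ s)

# The same step done independently on each length-M subblock, cf. (49)-(51)
parts = []
for p, Hp in enumerate(blocks):
    y_p, s_p = y[p * M:(p + 1) * M], s[p * M:(p + 1) * M]
    R_p = Hp.conj().T @ Hp
    L_p = np.tril(R_p, -1)
    D_p = np.diag(np.diag(R_p))
    parts.append(np.linalg.solve(D_p + L_p, Hp.conj().T @ y_p - L_p.conj().T @ s_p))
print(np.allclose(full, np.concatenate(parts)))  # True
```

The subblock form replaces one $MN\times MN$ triangular solve with $N$ solves of size $M\times M$. This is where the complexity saving of ZP comes from.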

C. User-Wise Demodulation and Channel Decoding
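From (20), (22), and (31), the estimate $\hat{\mathbf{s}}_j$ of each $j$th user's transmission vector can be decoupled from $\hat{\mathbf{s}}_{MU}$ as
$$\hat{\mathbf{s}}_j = (\mathbf{I}_N\otimes\mathbf{D}_j)\,\hat{\mathbf{s}}_{MU}. \tag{52}$$
The $j$th user's estimated delay-time matrix becomes
$$\hat{\mathbf{S}}_j = \mathrm{vec}^{-1}_{M\times N}(\hat{\mathbf{s}}_j). \tag{53}$$
The estimate of the $j$th user's symbol matrix is obtained by waveform demodulation as
$$\hat{\mathbf{X}}_j = \hat{\mathbf{S}}_j\mathbf{P}^H. \tag{54}$$
The estimates of the QAM symbols associated with delay $l$ are
$$\hat{\mathbf{x}}_{j,l} = [\hat{x}_j(l,0), \hat{x}_j(l,1), \ldots, \hat{x}_j(l,N-1)]^T, \quad l\in\Omega_j, \tag{55}$$
and the estimates of all the QAM symbols of the $j$th user are gathered into $\hat{\mathbf{x}}_j$ by inverting the operator $\Psi_j$, as in (56). Soft demodulation provides the bit-level log-likelihood ratio (LLR) values for each element $\hat{x}_j(\gamma)$ of $\hat{\mathbf{x}}_j$ as in (57). Here, $b^{\alpha}_{j,\gamma}$ is the LLR value of the $\alpha$th bit of the $R$ bits $b^{R-1}_{j,\gamma}b^{R-2}_{j,\gamma}\ldots b^{0}_{j,\gamma}$, and $\mathcal{S}^1_\alpha$ and $\mathcal{S}^0_\alpha$ are the sets of all constellation points for which the $\alpha$th bit is 1 and 0, respectively. $\sigma^2_\gamma$ is the noise power associated with $\hat{x}_j(\gamma)$. It can be set to 1, as all the symbol estimates have equal noise power.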
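A sketch of the exact (log-sum-exp) LLR computation behind (57) for a labelled constellation follows. Several choices are assumptions for illustration, not the paper's mapping. Positive values favour bit 1, the QPSK labelling is Gray-mapped, and the $\alpha$th bit is bit $\alpha$ of the integer label.

```python
import numpy as np


def bit_llrs(x_hat, constellation, n_bits, sigma2=1.0):
    """Exact per-bit LLRs: ln sum_{S^1} exp(-|x-s|^2/sigma2) - ln sum_{S^0} (...)."""
    labels = np.arange(len(constellation))
    metric = -np.abs(x_hat[:, None] - constellation[None, :]) ** 2 / sigma2
    llr = np.empty((x_hat.size, n_bits))
    for alpha in range(n_bits):
        ones = ((labels >> alpha) & 1) == 1           # points in S^1_alpha
        llr[:, alpha] = (np.logaddexp.reduce(metric[:, ones], axis=1)
                         - np.logaddexp.reduce(metric[:, ~ones], axis=1))
    return llr


# Gray-mapped QPSK: label 2*b1 + b0, unit average energy (assumed mapping)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
x_hat = np.array([(1 + 1j) / np.sqrt(2), (-1 - 1j) / np.sqrt(2)])
print(bit_llrs(x_hat, qpsk, n_bits=2))
```

Take these two noiseless points with $\sigma^2_\gamma = 1$. The LLRs are $-2$ for both bits of the first point (label 0) and $+2$ for both bits of the second (label 3). Using `np.logaddexp.reduce` avoids overflow when $\sigma^2_\gamma$ is small.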
The bit-level LLR values are stored in a vector $\mathbf{b}_j$ of length $|\Omega_j|NR$. If the coded bits are interleaved before being mapped to QAM symbols at the transmitter, the LLR values are de-interleaved at the receiver. In the turbo receiver, the coded bits are interleaved; therefore, the LLR values are de-interleaved before LDPC decoding to obtain $\tilde{\mathbf{b}}_j$, as shown by the dotted block in Fig. 3. In the SIC receivers, on the other hand, the coded bits are not interleaved during transmission. As a result, the de-interleaver block is bypassed for SIC, so that $\tilde{\mathbf{b}}_j = \mathbf{b}_j$. Each $j$th user's LLR values $\tilde{\mathbf{b}}_j$ are reshaped into a $C_l\times L_j$ matrix $\tilde{\mathbf{C}}_j = [\tilde{\mathbf{c}}_{j,0}, \tilde{\mathbf{c}}_{j,1}, \ldots, \tilde{\mathbf{c}}_{j,L_j-1}]$, where $C_l$ is the code block (CB) length and $L_j$ is the number of CBs transmitted by the $j$th user. The total number of CBs from all users is
$$L = \sum_{j=1}^{J}L_j. \tag{58}$$
The LDPC decoder decodes each column of $\tilde{\mathbf{C}}_j$, and the resulting output CBs are stored as the columns of another $C_l\times L_j$ matrix,
$$\mathbf{C}_j = [\mathbf{c}_{j,0}, \mathbf{c}_{j,1}, \ldots, \mathbf{c}_{j,L_j-1}], \tag{59}$$
where $\mathbf{c}_{j,\eta}$ is the $\eta$th CB of the $j$th user obtained from the LDPC decoder, for $\eta = 0,1,\ldots,L_j-1$. The indices of the correctly decoded code blocks (CCBs) and the wrongly decoded code blocks (WCBs) are recorded in an $L_j\times 1$ vector $\mathbf{a}_j$, as given in (60). The total number of CCBs for all users, $L_c$, is given by (61), and the total number of WCBs is $L_w = L - L_c$, as given in (62). In Sections IV-D and IV-E, we describe the multi-user SIC and multi-user turbo iterative receivers, respectively. These receivers use the LDPC output $\mathbf{C}_j$ to iteratively detect the CBs of each user that were decoded in error.

D. Multi-User SIC Iterative Receiver
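In each $q$th SIC iteration, where $q\in\mathbb{N}[1\ Q]$, the bits in the CCBs of the LDPC decoder output $\mathbf{C}^{(q-1)}_j$ from the previous iteration are mapped to QAM symbols, while the bits in the WCBs are mapped to zero symbols, for each $j$th user:
$$\tilde{\mathbf{u}}^{(q)}_{j,\eta} = \begin{cases}\mathcal{M}\left(\mathbf{c}^{(q-1)}_{j,\eta}\right), & \text{if } \mathbf{c}^{(q-1)}_{j,\eta} \text{ is a CCB},\\ \mathbf{0}, & \text{otherwise},\end{cases} \tag{63}$$
where $\mathcal{M}(\cdot)$ is the QAM modulation operator, which converts the bits of a given CB into a vector of QAM symbols. For the first iteration, $\mathbf{C}^{(0)}_j$ and $\mathbf{a}^{(0)}_j$ are the outputs of the first stage of signal processing given in (59) and (60), respectively. Vectorizing the matrix $\tilde{\mathbf{U}}^{(q)}_j = [\tilde{\mathbf{u}}^{(q)}_{j,0}, \ldots, \tilde{\mathbf{u}}^{(q)}_{j,L_j-1}]$ gives
$$\tilde{\mathbf{x}}^{(q)}_j = \mathrm{vec}\left(\tilde{\mathbf{U}}^{(q)}_j\right), \tag{64}$$
which is a reconstruction of the block of QAM symbols $\mathbf{x}_j$ transmitted by the $j$th user. Using (17), $\tilde{\mathbf{x}}^{(q)}_j$ is split into the per-delay vectors $\{\tilde{\mathbf{x}}^{(q)}_{j,l}\}_{l\in\Omega_j}$, as in (65). To cancel the time domain interference generated by the known symbols, the following operation is performed for OTFS and OTSM:
$$\tilde{\mathbf{x}}^{(q)}_{j,l} = \mathbf{0}_N \quad \text{if } \mathbf{P}\neq\mathbf{I}_N \text{ and } \aleph\left(\tilde{\mathbf{x}}^{(q)}_{j,l}\right) < N, \tag{66}$$
where $\aleph(\cdot)$ counts the non-zero elements of a vector. In other words, a delay row is used for cancellation only if all $N$ of its symbols are known, because the spreading by $\mathbf{P}$ makes every time sample of that row depend on all $N$ symbols. Using (18), (22), and (23), the $j$th user's transmission vector is reconstructed as
$$\tilde{\mathbf{s}}^{(q)}_j = \mathrm{vec}\left(\sum_{l\in\Omega_j}\mathbf{e}_l\,\tilde{\mathbf{x}}^{(q)T}_{j,l}\,\mathbf{P}\right). \tag{67}$$
The combined reconstructed vector for all $J$ users is
$$\tilde{\mathbf{s}}^{(q)}_{MU} = \sum_{j=1}^{J}\tilde{\mathbf{s}}^{(q)}_j. \tag{68}$$
Let $\mathcal{A}^{(q)}$ denote the set of indices (positions) of the non-zero elements of $\tilde{\mathbf{s}}^{(q)}_{MU}$, which are expected to equal the elements of $\mathbf{s}_{MU}$ at the same positions. Let $\mathcal{B}^{(q)}$ be the set of indices of its zero elements:
$$\mathcal{A}^{(q)} = \{n : \tilde{s}^{(q)}_{MU}[n] \neq 0\}, \tag{69}$$
$$\mathcal{B}^{(q)} = \{n : \tilde{s}^{(q)}_{MU}[n] = 0\}. \tag{70}$$
The size of $\mathcal{A}^{(q)}$ ($\mathcal{B}^{(q)}$) increases (decreases) with the total number of CCBs obtained by the receiver. The interference from the known samples is cancelled as
$$\mathbf{y}^{(q)} = \mathbf{y} - \mathbf{H}_{\rm eff}\tilde{\mathbf{s}}^{(q)}_{MU}. \tag{71}$$
Substituting (32) into (71) and using (69), we get
$$\mathbf{y}^{(q)} = \mathbf{H}_{\rm eff}\left(\mathbf{s}_{MU} - \tilde{\mathbf{s}}^{(q)}_{MU}\right) + \mathbf{w} = \sum_{n\in\mathcal{B}^{(q)}}\mathbf{h}_{{\rm eff},n}\,s_{MU}[n] + \mathbf{w}. \tag{72}$$
Accordingly, for the channel equalization of $\mathbf{y}^{(q)}$, the channel matrix $\mathbf{H}^{(q)}_{\rm eff}$ is updated following [41], as in (73) and (74), so that only the columns $\mathbf{h}_{{\rm eff},n}$ with $n\in\mathcal{B}^{(q)}$ are retained. The MMSE equalization matrix with the updated channel matrix $\mathbf{H}^{(q)}_{\rm eff}$ is
$$\mathbf{G}^{(q)} = \mathbf{H}^{(q)H}_{\rm eff}\left(\mathbf{H}^{(q)}_{\rm eff}\mathbf{H}^{(q)H}_{\rm eff} + \sigma_w^2\mathbf{I}_{MN}\right)^{-1}, \tag{75}$$
and, from (40),
$$\hat{\mathbf{s}}^{(q)}_{MU} = \mathbf{G}^{(q)}\mathbf{y}^{(q)}. \tag{76}$$
Using (52), each $\hat{\mathbf{s}}^{(q)}_j$ is decoupled from $\hat{\mathbf{s}}^{(q)}_{MU}$. User-wise waveform demodulation is then performed following (53) to (56). The LLR values of the symbol estimates are computed using (57) and reshaped into the matrix $\tilde{\mathbf{C}}^{(q)}_j$. Since each column of $\tilde{\mathbf{C}}^{(q)}_j$ represents a CB, the channel decoder selectively decodes only those columns (CBs) that were decoded incorrectly in the previous iteration. The columns of $\mathbf{C}^{(q)}_j$, which represent the LDPC decoder output in the $q$th iteration, are obtained as
$$\mathbf{c}^{(q)}_{j,\eta} = \begin{cases}\text{LDPC decoder output for } \tilde{\mathbf{c}}^{(q)}_{j,\eta}, & \text{if } \mathbf{c}^{(q-1)}_{j,\eta} \text{ is a WCB},\\ \mathbf{c}^{(q-1)}_{j,\eta}, & \text{otherwise}.\end{cases} \tag{77}$$
Then, $\mathbf{a}^{(q-1)}_j[\eta]$ is updated to $\mathbf{a}^{(q)}_j[\eta]$ following (60), and the updated vector is used in the next iteration. Similarly, the total numbers of CCBs and WCBs for all users, $L^{(q)}_c$ and $L^{(q)}_w$, are updated using (61) and (62). As the iterations progress, $L^{(q)}_c$ increases, and the size of $\mathcal{A}^{(q)}$ ($\mathcal{B}^{(q)}$) increases (decreases). The channel matrix used for equalization in (73) therefore becomes sparser with each SIC iteration, and the complexity of the MMSE operation, using sparse matrix methods, progressively decreases.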
The SIC iterations continue until all the CBs from all users are decoded correctly, a maximum number of iterations is reached, or no new CBs are decoded correctly in the current iteration. The sequence of steps followed by the SIC receiver is given in Fig. 5.
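The reconstruction and cancellation steps (66)-(72) can be illustrated on a toy frame. The pattern of known symbols is hypothetical: rows 0 and 2 are fully recovered from CCBs, row 1 is only partly recovered, and row 3 is not recovered at all. The channel is a random dense matrix and noise is omitted. We read (73)-(74) as zeroing the columns indexed by $\mathcal{A}^{(q)}$; this is an interpretation, not a quotation of the paper.

```python
import numpy as np


def dft_matrix(n):
    """Normalized (unitary) DFT matrix F_n."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)


def crandn(rng, *shape):
    """Complex Gaussian samples of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


rng = np.random.default_rng(8)
M, N = 4, 4
P = dft_matrix(N).conj().T                       # OTFS spreading, P != I_N
H_eff = crandn(rng, M * N, M * N)
X_MU = crandn(rng, M, N)
s_MU = (X_MU @ P).reshape(-1, order="F")
y = H_eff @ s_MU                                  # (32) without noise

# Symbols recovered from CCBs; symbols of WCBs are zero, cf. (63)
known = np.zeros((M, N), dtype=bool)
known[[0, 2], :] = True
known[1, :2] = True
X_rec = np.where(known, X_MU, 0)

# (66): with P != I_N, a partly known delay row cannot be rebuilt in time
if not np.allclose(P, np.eye(N)):
    X_rec[np.count_nonzero(X_rec, axis=1) < N] = 0

s_rec = (X_rec @ P).reshape(-1, order="F")        # (67)-(68)
A = np.flatnonzero(s_rec != 0)                    # (69)
y_q = y - H_eff @ s_rec                           # (71)

H_q = H_eff.copy()                                # updated channel: drop set A
H_q[:, A] = 0
print(np.allclose(y_q, H_q @ s_MU))               # True, cf. (72)
```

Each fully known row removes $N$ columns from the matrix to be equalized. This is why the MMSE step becomes cheaper as more CBs are decoded correctly.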

E. Multi-User Turbo Receiver Using GS Detector
The turbo receiver for multi-user reception, based on [24], is described below with the help of the schematic diagram in Fig. 6. In each $q$th turbo iteration, for $q\in\mathbb{N}[1\ Q]$, the receiver reconstructs $\mathbf{s}_{MU}$ using all the output CBs of the LDPC decoder from the previous iteration. The matrix $\mathbf{C}^{(q-1)}_j$ of each $j$th user is reshaped into vector form as $\mathbf{c}^{(q-1)}_j = \mathrm{vec}(\mathbf{C}^{(q-1)}_j)$. We set $\mathbf{C}^{(0)}_j = \mathbf{C}_j$, obtained from the first stage of signal processing given in (59). As mentioned in Section IV-C, for the turbo receiver, the channel-coded bits are interleaved before being mapped to QAM symbols in each $j$th user's transmission. Therefore, for reconstruction, the elements of $\mathbf{c}^{(q-1)}_j$ are interleaved, and the resulting $\bar{\mathbf{c}}^{(q-1)}_j$ is used for QAM modulation as $\tilde{\mathbf{x}}^{(q-1)}_j = \mathcal{M}(\bar{\mathbf{c}}^{(q-1)}_j)$. Using (65), (67), and (68), we obtain $\tilde{\mathbf{s}}^{(q-1)}_{MU}$, which is applied to the GS iterative detector. After the final iteration of the GS detector, which is described by (44), (45), (50), and (51), each $\hat{\mathbf{s}}^{(q)}_j$ is separated from the output $\hat{\mathbf{s}}^{(q)}_{MU}$ using (52). User-wise waveform demodulation and QAM soft demodulation are performed using (53) to (57). The LLR values $\mathbf{b}^{(q)}_j$ are de-interleaved, and $\tilde{\mathbf{b}}^{(q)}_j$ is applied to LDPC decoding. Unlike in the SIC receiver, the LDPC decoder decodes all the CBs in each turbo iteration. The output CBs $\mathbf{C}^{(q)}_j$ of the LDPC decoder are used to reconstruct $\mathbf{s}_{MU}$ in the next iteration. The turbo iterations continue until all the CBs are correctly decoded or the maximum number of iterations is reached.

F. Frequency Domain SIC Receiver for OFDM
The frequency domain SIC receiver follows the principle of the time domain SIC receiver developed for OTFS (OTSM or block SC) in Section IV-D. It operates on a frame of $N$ received OFDM symbols $\{\mathbf{y}_{ofdm,k}\}_{k=0,1,\ldots,N-1}$. Following (39), each $k$th received OFDM symbol is expressed in terms of $\mathbf{\Lambda}_{eff,k}$, $\mathbf{d}_{MU,k}$, and $\mathbf{w}_{ofdm,k}$, which represent the combined channel matrix given in (38), the symbol vector given in (37), and the noise $\mathbf{F}_M\mathbf{w}_{ofdm,k}$, respectively. In each $q$th SIC iteration, where $q \in \mathbb{N}[1, Q]$, interference cancellation is performed in the frequency domain on the received OFDM symbols using $\{\tilde{\mathbf{d}}^{(q)}_{MU,k}\}$, the reconstructed $\{\mathbf{d}_{MU,k}\}$ in the $q$th iteration, which are obtained from the CCBs of the previous iterations following (63) to (68). The updated channel matrix $\mathbf{\Lambda}^{(q)}_{eff,k}$ is obtained by following the procedures of the time domain SIC receiver given in (73) and (74). The user-wise estimates of the symbol vectors, $\{\hat{\mathbf{d}}^{(q)}_{j,k}\}$, are then used for soft demodulation, and the resulting LLR values are fed to the LDPC decoder. As in the time domain SIC receiver, decoding is selectively performed on those CBs that were incorrectly decoded in the previous iteration. Similarly, the sequence of operations of the frequency domain SIC receiver follows that of the time domain SIC receiver given in Fig. 5.

TABLE II COMPARISON OF PER-ITERATION COMPLEXITY BETWEEN THE SIC AND TURBO RECEIVERS

G. Implementation Complexity of Receivers
Table II compares the per-iteration complexity of the SIC and turbo receivers. The GS detection in the turbo receiver requires $UMN\left(2(l_{max}+1) + 1 + 2\log_2 N\right)$ complex multiplications (CMs) for $U$ GS iterations [23]. On the other hand, the MMSE in SIC involves sparse matrix multiplication and the inversion of the sparse matrix $\mathbf{\Psi} = \mathbf{H}^{(q)}_{eff}\mathbf{H}^{(q)H}_{eff} + \sigma_w^2\mathbf{I}_{MN}$, as given in (75) and (76). These operations can be performed with a complexity of $\mathcal{O}(MN)$ and of the order of the number of non-zero elements (NNZ) of $\mathbf{\Psi}$, using sparse matrix multiplication and direct Cholesky factorization methods, respectively [42]. Since the sparsity of $\mathbf{H}^{(q)}_{eff}$ increases with each iteration, the total complexity of the MMSE operation lies between $\mathcal{O}(MN)$ and $\mathcal{O}(MN(2l_{max}+1))$, which is close to the complexity of a GS iteration. Since in each $q$th iteration the SIC receiver performs detection only for the WCBs of the previous iteration, the waveform demodulation followed by LLR computation is limited to those samples of $\hat{\mathbf{s}}^{(q)}_{MU}$ that correspond to the WCBs. Similarly, LDPC decoding operates on different numbers of code blocks in the two receivers: $L^{(q-1)}_w$ for SIC, versus all $L$ code blocks for the turbo receiver. Consequently, the complexity per LDPC decoding iteration, based on min-sum algorithms, is $\mathcal{O}(d_c L^{(q-1)}_w C_l)$ for the SIC receiver and $\mathcal{O}(d_c L C_l)$ for the turbo receiver, where $d_c$ is the average column weight of the parity check matrix used for LDPC decoding. While the complexity per LDPC decoding iteration of the turbo receiver remains constant, the number of LDPC decoding iterations varies with each iteration of the turbo loop. The number of LDPC decoding iterations for each receiver, obtained from the block error rate (BLER) simulations, is given in Fig. 14 in Section V-C. Similarly, the number of WCBs $L^{(q-1)}_w$ input to LDPC decoding in the $q$th SIC iteration for OTFS is given in Fig. 15. For the reconstruction of $\mathbf{s}_{MU}$, the SIC receiver can reuse the reconstructed vector from the previous iterations and update it with the $L^{(q)}_c - L^{(q-1)}_c$ CCBs newly detected in the current iteration. In contrast, the turbo receiver requires reconstruction of the entire $\mathbf{s}_{MU}$, involving bit interleaving, QAM modulation, and waveform modulation.
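The per-iteration expressions above can be evaluated directly. This minimal sketch, with hypothetical helper names, computes the GS complex-multiplication count and the order of one min-sum LDPC iteration. $C_l$ is treated here as the CB length, which is an assumption since it is not defined in this section, and the parameter values are illustrative rather than those of Table III.

```python
import math

def gs_complex_mults(U: int, M: int, N: int, l_max: int) -> float:
    """CMs of U GS iterations: U*M*N*(2*(l_max + 1) + 1 + 2*log2(N))."""
    return U * M * N * (2 * (l_max + 1) + 1 + 2 * math.log2(N))

def ldpc_iteration_order(d_c: float, num_cbs: int, C_l: int) -> float:
    """Order of one min-sum LDPC iteration, d_c * num_cbs * C_l.

    SIC passes num_cbs = L_w^(q-1) (only the WCBs); turbo passes num_cbs = L.
    """
    return d_c * num_cbs * C_l

# Illustrative (hypothetical) parameters, not the simulation setup of Table III
M, N, l_max, U = 64, 16, 8, 4
print(gs_complex_mults(U, M, N, l_max))  # 110592.0
L, L_w_prev, d_c, C_l = 320, 40, 3.5, 1000
# SIC-to-turbo ratio of one LDPC iteration's cost
print(ldpc_iteration_order(d_c, L_w_prev, C_l) / ldpc_iteration_order(d_c, L, C_l))  # 0.125
```

The ratio simply equals $L^{(q-1)}_w / L$, which is why the SIC receiver's LDPC cost shrinks as WCBs are cleared.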
Complexity of Frequency Domain SIC Receiver for OFDM: The constituent operations of frequency domain SIC are similar to those of time domain SIC listed in Table II, except for the MMSE operation. The matrix inversion in the frequency domain MMSE, as given in (83), can be implemented using $\mathbf{LDL}^H$ factorization. To achieve this, $\mathbf{\Lambda}_{eff}$ is replaced with a banded matrix formed by the main diagonal, $D$ subdiagonals, and $D$ superdiagonals of $\mathbf{\Lambda}_{eff}$, as described in [43]. The resulting complexity of the frequency domain MMSE operation for a frame of $N$ OFDM symbols is $MN(8D^2 + 22D + 4)$ complex operations. We use $D = 5$ for the complexity comparisons made with simulations in Section V-C. This MMSE complexity is higher than that of the sparse MMSE operation and the GS iterations, and it dominates the overall complexity of the frequency domain SIC receiver.
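The banding step and the resulting operation count can be sketched as follows. The helper names are hypothetical, the banded matrix is built exactly as described (main diagonal plus $D$ sub- and superdiagonals), and the count assumes the expression $MN(8D^2 + 22D + 4)$; the $\mathbf{LDL}^H$ solver of [43] itself is not reproduced.

```python
import numpy as np

def band_approx(Lam: np.ndarray, D: int) -> np.ndarray:
    """Keep the main diagonal and D sub-/super-diagonals of Lam; zero the rest."""
    rows, cols = np.indices(Lam.shape)
    return np.where(np.abs(rows - cols) <= D, Lam, 0)

def fd_mmse_ops(M: int, N: int, D: int) -> int:
    """Complex operations of the banded frequency domain MMSE for N OFDM symbols."""
    return M * N * (8 * D**2 + 22 * D + 4)

Lam = np.arange(1, 26, dtype=complex).reshape(5, 5)  # toy 5x5 channel matrix
print(np.count_nonzero(band_approx(Lam, 1)))  # 13: tridiagonal band
print(fd_mmse_ops(1, 1, 5))  # 314 operations per (subcarrier, symbol) pair for D = 5
```

The quadratic growth in $D$ shows why the band must stay narrow: doubling $D$ roughly quadruples the MMSE cost.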

V. SIMULATION RESULTS AND PERFORMANCE COMPARISONS
We begin this section with the performance analysis of the MA schemes in terms of uncoded error probability in Section V-A. In Section V-B, we present the LDPC forward error correction (FEC) coded performance of the receivers in terms of the block error rate (BLER), defined as the ratio of the total number of CBs in error for all users to the total number of CBs transmitted by all users, $L_w/L$. The processing complexity of the three receiver architectures is compared in Section V-C. In Section V-D, we compare the PAPR of OTFS, OTSM, and block SC, and analyze their performance in the presence of non-linear HPA effects.
The configuration used for the Monte Carlo simulations is given in Table III. The Doppler frequency of each path of the EVA channel was generated using the Jakes model, defined as $\nu_p = \nu_{max}\cos\theta_p$, where $\theta_p$ is uniformly distributed in $[-\pi, \pi]$ and $\nu_{max}$ is the maximum Doppler shift. The delays available for allocation are equally shared among users, and the delay indices allocated to each user are adjacent. However, it is also possible to allocate varying numbers of non-adjacent delays to users while satisfying the condition in (16).
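The Jakes-model Doppler assignment above amounts to one line of sampling per path. In this sketch the function name and the value of $\nu_{max}$ are illustrative assumptions; the 9 paths correspond to the standard EVA tap count.

```python
import numpy as np

def jakes_dopplers(num_paths: int, nu_max: float, rng: np.random.Generator) -> np.ndarray:
    """Per-path Doppler shifts nu_p = nu_max * cos(theta_p), theta_p ~ U[-pi, pi]."""
    theta = rng.uniform(-np.pi, np.pi, size=num_paths)
    return nu_max * np.cos(theta)

rng = np.random.default_rng(0)
nu = jakes_dopplers(9, 500.0, rng)  # 9 EVA paths; nu_max = 500 Hz is illustrative
print(np.round(nu, 1))
```

Because each $\theta_p$ is drawn independently, every path gets a Doppler shift in $[-\nu_{max}, \nu_{max}]$, clustered toward the edges as in the classical Jakes spectrum.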

A. Uncoded BER Comparison
The uncoded BER of OTFS, OTSM, and block SC is computed by hard thresholding the LLR values $\mathbf{b}_j$ before LDPC decoding in the first stage of signal processing, before the SIC or turbo iterations begin. For OFDM, the uncoded BER is determined within the first iteration of the frequency domain SIC receiver, where hard thresholding is applied to the LLR values before LDPC decoding. Fig. 7 shows the uncoded BER performance of these waveforms for different numbers of users. In single-user scenarios, OTFS, OTSM, and block SC outperform OFDM. OTFS and OTSM exhibit identical performance, while block SC performs relatively worse. In multi-user scenarios, however, the uncoded BER performance of OFDM remains unaffected. This is due to the allocation of orthogonal subcarriers and the use of a cyclic prefix for each OFDM symbol, which limits interference between users in both the frequency and time domains. In the OTFS, OTSM, and block SC MA schemes, the information-bearing QAM symbols from different users interfere in the time domain, so the multi-user interference (MUI) increases with the number of users. Accordingly, Fig. 7 shows a degradation in the uncoded BER as the number of users grows, similar to the results in [16] for OTFS. Furthermore, OTFS and OTSM perform similarly in multi-user scenarios, as they do in single-user scenarios. The performance gap between block SC and these waveforms narrows as the number of users increases. For 64 users, where MUI is more pronounced, OTFS, OTSM, and block SC have similar uncoded BER performance.
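Hard thresholding the LLRs to obtain the uncoded BER can be sketched in a few lines. The LLR sign convention, $\log(P(b=0)/P(b=1))$, is an assumption here, as is the function name; with the opposite convention the comparison flips.

```python
import numpy as np

def uncoded_ber(llrs, tx_bits) -> float:
    """Hard-threshold LLRs and return the fraction of bit errors.

    Assumed LLR convention: LLR = log(P(b = 0) / P(b = 1)), so LLR >= 0 -> bit 0.
    """
    hard_bits = (np.asarray(llrs) < 0).astype(int)
    return float(np.mean(hard_bits != np.asarray(tx_bits)))

print(uncoded_ber([2.0, -1.0, 0.5, -3.0], [0, 1, 1, 1]))  # 0.25
```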
Since most broadband wireless systems use FEC codes, it is essential to examine the coded performance of these waveforms before drawing any conclusions; this is discussed in the following subsection.

B. Coded Performance
We compare the coded performance of the MA schemes for OTFS, OTSM, and block SC using the two iterative receivers developed in Sections IV-D and IV-E, along with the MA performance of OFDM using the frequency domain SIC receiver described in Section IV-F. For block SC and OFDM with the SIC receiver, the coded QAM symbols are randomly interleaved before being placed in their respective delay-time and time-frequency domains. Correspondingly, at the receiver, for block SC and OFDM with the SIC receiver, the QAM symbol estimates are deinterleaved prior to QAM soft demodulation.
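Random symbol interleaving and the matching deinterleaving can be sketched with a permutation and its inverse. The helper names and the assumption that the receiver knows the permutation (for example, through a shared seed) are illustrative, not taken from the described system.

```python
import numpy as np

def make_interleaver(n: int, seed: int):
    """Random permutation and its inverse (seed assumed known at both ends)."""
    perm = np.random.default_rng(seed).permutation(n)
    inv = np.empty_like(perm)
    inv[perm] = np.arange(n)  # inv[perm[k]] = k
    return perm, inv

def interleave(symbols: np.ndarray, perm: np.ndarray) -> np.ndarray:
    return symbols[perm]  # output position k takes input position perm[k]

def deinterleave(estimates: np.ndarray, inv: np.ndarray) -> np.ndarray:
    return estimates[inv]  # restores the original symbol order

qam = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j]) / np.sqrt(2)
perm, inv = make_interleaver(len(qam), seed=7)
print(np.allclose(deinterleave(interleave(qam, perm), inv), qam))  # True
```

Interleaving the QAM symbols spreads adjacent coded symbols across the delay-time or time-frequency grid, so a deep fade hits scattered bits rather than a contiguous run.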
Fig. 8 shows the BLER performance of the MA schemes with the SIC receiver. The coded single-user performance of OTFS, OTSM, and block SC is nearly identical, and these waveforms outperform OFDM by a significant SNR margin. Interestingly, the multi-user performance of OTFS, OTSM, and block SC is better than their single-user performance, in contrast to their uncoded BER performance shown in Fig. 7. This can be attributed to two factors: the conjecture made in Section III-B that the effective channel matrix $\mathbf{H}_{eff}$ in (28) to (30) has higher diversity than the single-user channel matrix $\mathbf{H}$ of (11), and the MUI cancellation performed by the SIC receiver. On the other hand, the BLER of OFDM degrades in the multi-user scenario, where each user receives only a fraction of the bandwidth, limiting the channel frequency diversity available for coded performance.
The performance of the multi-user turbo receiver for OTFS, OTSM, and block SC is shown in Fig. 9. In the single-user case, the three waveforms have nearly identical performance. In the multi-user scenario, the BLER improves relative to the single-user case for SNRs above 13 dB. This improvement is due to the weighted subtraction between the received vector $\mathbf{y}$ in (50) and the reconstructed $\tilde{\mathbf{s}}^{(q-1)}_{MU}$ in the turbo loop, which cancels interference in a manner similar to the SIC receiver. However, since $\tilde{\mathbf{s}}^{(q-1)}_{MU}$ is also generated from the WCBs of the previous turbo iteration, it carries MUI, and the noise combined with this MUI degrades the turbo receiver performance in noise-dominated (low-SNR) scenarios. This can be observed in Fig. 9 for SNRs from 11 dB to 13 dB.
The coded performance of block SC with both receivers is comparable to that of OTFS and OTSM over time-varying channels. This can be understood by comparing the uncoded and coded error performance of OTFS and block SC under different mobility conditions, as shown in Fig. 10. The uncoded BER of block SC is unaffected by mobility, while the uncoded BER of OTFS improves with mobility. This is because each information-bearing QAM symbol in block SC is spread across the entire bandwidth, so the frequency dispersion caused by mobility does not induce any inter-Doppler interference (IDI) between the QAM symbols. However, block SC lacks time diversity, since these symbols are confined to an interval of $\Delta\tau = 1/B$. In OTFS, by contrast, the QAM symbols are spread across the entire time-frequency domain, resulting in both frequency and time diversity, which lets OTFS outperform block SC in terms of uncoded BER. We also observe that the coded BER of block SC and OTFS under different mobility conditions is nearly identical, in contrast to the uncoded BER results. This is because the FEC-encoded blocks spread in the time domain allow coded block SC to achieve time diversity in addition to its inherent frequency diversity. Fig. 11 compares the coded BLER of OTFS and block SC under different mobility conditions. As with the coded BER, OTFS and block SC have similar BLER performance.
Comparing Fig. 8 and Fig. 9, we observe that the SIC and turbo receivers exhibit similar BLER performance in the presented multi-user scenarios. However, it is necessary to examine their performance in scenarios with a high number of users, where each user transmits very few CBs. In such scenarios, even if the channel condition is good for a user, the receiver cannot cancel much interference using the correctly received CBs of that user. This results in high residual interference that cannot be overcome by the diversity available in $\mathbf{H}_{eff}$. Fig. 12 shows the BLER performance of OTFS and block SC for both the SIC and turbo receivers in scenarios with more than 64 users. For OTFS, both receivers provide improved performance for up to 72 users, where the number of CBs per user is 4. However, the improvement is marginal compared to the 64-user scenario, where the number of CBs per user is 5. A degradation in performance appears for 100 and 160 users, where the number of CBs per user is 3 and 2, respectively, and this degradation is significantly larger for the turbo receiver than for the SIC receiver. For block SC with the SIC receiver, we observe a significant improvement in BLER performance for a high number of users compared to OTFS. Additionally, block SC supports a higher number of users than OTFS: its performance improves for up to 160 users. Even in the 246-user scenario, the degradation is minimal, and the performance remains close to that of OTFS with 160 users. With the turbo receiver, however, block SC is only marginally better than OTFS. Furthermore, for both waveforms, the degradation in performance with a high number of users is significantly smaller for the SIC receiver than for the turbo receiver. The BLER performance of the SIC receiver remains better than that of the single-user scenarios, providing BLERs of $\sim 10^{-5}$.

C. Receivers Complexity Comparison From Simulations
We compare the complexities of the receivers for the scenario with 64 users. Fig. 13 shows the average number of iterations performed by each receiver per frame for different SNRs. The SIC receiver requires fewer iterations than the turbo receiver, and the difference is significant in the low-SNR region. For OTFS at an SNR of 15 dB, the turbo receiver performs 2.3 times as many iterations as the SIC receiver. Fig. 14 shows the average number of LDPC decoding iterations required per CB per frame for the three receivers. Again, the SIC-based receivers require fewer decoding iterations than the turbo receiver, and the difference is significant at low SNRs. At an SNR of 15 dB, the SIC receiver requires 40% fewer LDPC decoding iterations than the turbo receiver. Fig. 15 shows the average number of WCBs $L^{(q-1)}_w$ input to LDPC selective decoding in each SIC iteration for OTFS under varying SNR conditions. As the number of SIC iterations increases, the number of WCBs decreases. These results indicate that 6 iterations are sufficient, beyond which the improvement is negligible. We now compare the total order of complexity per transmit frame for each receiver to process the received signal. This is determined by multiplying the average number of iterations performed by the receiver (Fig. 13) by its average per-iteration complexity. The latter is computed from the expressions given in Table II, together with the average count of LDPC decoding iterations (Fig. 14) and the average count of WCBs $L^{(q-1)}_w$ in each $q$th SIC iteration (Fig. 15). The result is shown in Fig. 16. The overall complexity order of the time domain SIC receiver is approximately ten times lower than that of the turbo receiver at all SNR points. This is because the per-iteration complexity of SIC gradually decreases as iterations progress, and the average number of SIC iterations also decreases with increasing SNR. On the other hand, the frequency domain SIC receiver, which uses non-sparse matrix-based methods for the MMSE operation, has a high complexity, comparable to that of the turbo receiver.
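The total-complexity computation described above (average iteration count times average per-iteration cost) can be sketched as follows. The helper names are hypothetical; lumping detection, demodulation, and reconstruction into one constant `other_cost` is a simplifying assumption, and all numbers are illustrative rather than values read from Figs. 13 to 15.

```python
from typing import List, Sequence

def sic_iteration_costs(wcbs_per_iter: Sequence[float], ldpc_iters: float,
                        d_c: float, C_l: int, other_cost: float) -> List[float]:
    """Per-iteration SIC cost: the LDPC part scales with the WCB count L_w^(q-1).

    other_cost lumps detection, demodulation, and reconstruction into one
    constant (a simplification; in the receiver these also shrink with the WCBs).
    """
    return [other_cost + ldpc_iters * d_c * w * C_l for w in wcbs_per_iter]

def total_complexity(avg_iterations: float, per_iteration_costs: Sequence[float]) -> float:
    """Average number of iterations times the average per-iteration cost."""
    avg_cost = sum(per_iteration_costs) / len(per_iteration_costs)
    return avg_iterations * avg_cost

# Hypothetical WCB counts for two SIC iterations and illustrative parameters
costs = sic_iteration_costs([10, 4], ldpc_iters=5, d_c=3, C_l=100, other_cost=1000)
print(costs, total_complexity(1.5, costs))  # [16000, 7000] 17250.0
```

The decreasing WCB count makes each later SIC iteration cheaper, which is the mechanism behind the lower total complexity of the time domain SIC receiver.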
Overall, the time domain SIC receiver has significantly lower complexity than the turbo receiver while providing similar BLER performance. Having discussed the performance and complexity of the multi-user receivers, we now move on to the next subsection, where we analyze the effects of non-linear HPAs on OTFS, OTSM, and block SC.

D. Effect of HPA Non-Linearities
Fig. 17 shows the complementary cumulative distribution function (CCDF) curves of the PAPR of OTFS, OTSM, and block SC in both single-user and 8-user scenarios. OTFS and OTSM have identical PAPR distributions in both cases. In the single-user scenario, OTFS and OTSM have a PAPR of approximately 11.5 dB at the 90th percentile, while block SC has around 3 dB. For 8 users, each user performs only 61 IDFTs per transmission, resulting in $61N$ non-zero samples out of $MN$ samples. Consequently, the 90th-percentile PAPR of the 8-user OTFS case is approximately 1 dB lower than that of the single-user case. We now look at the BLER performance of the three waveforms when an SSPA [36] based HPA is used in transmission.
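The PAPR and its empirical CCDF can be computed as in the minimal sketch below. The frames here are plain OFDM-like QPSK IFFT outputs, not the OTFS, OTSM, or block SC signals of Fig. 17, and the helper names are assumptions. For a multi-user transmission, one would presumably pass only a user's active (non-zero) samples, since averaging over the zero-valued samples would inflate the PAPR; this choice is also an assumption.

```python
import numpy as np

def papr_db(x: np.ndarray) -> float:
    """PAPR in dB, with the average power taken over the samples in x."""
    p = np.abs(x) ** 2
    return float(10 * np.log10(p.max() / p.mean()))

def ccdf(papr_values, thresholds_db):
    """Empirical P(PAPR > threshold) for each threshold."""
    papr_values = np.asarray(papr_values)
    return np.array([np.mean(papr_values > t) for t in thresholds_db])

# Illustrative OFDM-like frames: 256 QPSK symbols through an IFFT
rng = np.random.default_rng(0)
frames = [np.fft.ifft((rng.choice([-1, 1], 256) + 1j * rng.choice([-1, 1], 256)) / np.sqrt(2))
          for _ in range(1000)]
paprs = [papr_db(f) for f in frames]
print(np.percentile(paprs, 90), ccdf(paprs, [6.0, 8.0, 10.0]))
```

The 90th percentile of the PAPR samples is the threshold at which the CCDF equals 0.1, matching the operating point quoted in the text.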
Fig. 18 shows the BLER performance of the MA schemes with the SIC receiver in the 8-user scenario with a non-linear HPA at an output backoff (OBO) of 4 dB, as well as with an ideal HPA. The BLER performance of block SC with the non-linear HPA is much closer to that with the ideal HPA than is the case for OTFS and OTSM.

Fig. 18. The BLER performance comparison using SIC receiver with and without HPA non-linear effects.
Fig. 19. TD curves for a targeted BLER of $10^{-3}$ with SSPA model [36].
As the OBO increases, the gap between the error performance of the MA schemes with non-ideal and ideal HPAs decreases. However, increasing the OBO also decreases the transmit SNR, which increases the total degradation (TD). TD is defined as the sum (in dB) of the OBO and the SNR gap between the performance with ideal and non-linear HPAs, for a given target performance level [46]:
$$\mathrm{TD} = (\mathrm{SNR})_{\text{non-ideal HPA}} - (\mathrm{SNR})_{\text{ideal HPA}} + \mathrm{OBO} \quad \text{(in dB)} \tag{85}$$
Fig. 19 shows the TD versus OBO curves of the three MA schemes using the SIC receiver in the 8-user scenario, for a target BLER of $10^{-3}$. The TD curves for OTFS and OTSM are similar, as their PAPR distributions are close to each other. For block SC, the minimum TD occurs at an OBO of 1 dB and is approximately 2.4 dB lower than the minimum TD of OTFS and OTSM, which occurs at an OBO of 4 dB. This significant SNR advantage of block SC in practical scenarios makes it a viable alternative to OTFS and OTSM.
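Equation (85) and the search for the OBO that minimizes TD can be sketched as follows. The helper names are hypothetical, and the required-SNR values are made up for illustration; they are not read from Fig. 19.

```python
import numpy as np

def total_degradation(snr_nonideal_db, snr_ideal_db, obo_db):
    """TD in dB as in (85): SNR gap at the target BLER plus the OBO."""
    return np.asarray(snr_nonideal_db) - snr_ideal_db + np.asarray(obo_db)

def min_td(obo_db, snr_nonideal_db, snr_ideal_db):
    """Return (OBO, TD) at the minimum of the TD-versus-OBO curve."""
    td = total_degradation(snr_nonideal_db, snr_ideal_db, obo_db)
    k = int(np.argmin(td))
    return obo_db[k], float(td[k])

# Hypothetical SNRs (dB) needed to reach the target BLER at each OBO
obo = [0, 1, 2, 3]
snr_needed = [20.0, 17.0, 16.5, 16.2]
print(min_td(obo, snr_needed, snr_ideal_db=15.0))  # (1, 3.0)
```

The minimum arises from the trade-off in (85): a small OBO leaves a large SNR gap from distortion, while a large OBO adds directly to TD.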

VI. CONCLUSION
In accordance with the goal of this work, we have developed a multiple access scheme in which each user generates an FEC-coded, QAM-modulated OFDM signal, upsamples it, circularly delays it (as per the user's allocation), and then sends the signal in the uplink to the base station (BS). We have shown through an analytical system model that the composite received signal at the BS has the form of the received signal of an equivalent single-user OTFS system. Further, a user may flexibly use Walsh-Hadamard spreading in place of OFDM, as explained in Section III-C, and the transmitted signals are combined at the BS to produce an effective single-user OTSM received signal. Likewise, within this framework, a user may send the signal directly without any spreading, resulting in an equivalent single-user block SC received signal at the BS. The multi-user SIC and turbo receivers developed in this work extract the diversity gain available in the three waveforms and, by exploiting multi-user channel diversity, provide better coded performance in multi-user scenarios than in single-user scenarios. This novel MA approach enables even low-capability devices to enjoy the diversity gain available with OTFS, OTSM, and block SC, which in existing works was limited to devices with high signal processing capability.
The single-user and multi-user coded BLER of OTFS, OTSM, and block SC are shown to be considerably superior to that of OFDM, with OTFS and OTSM exhibiting identical error performance. The multi-user SIC receiver is shown to require much lower computational complexity than the multi-user turbo receiver. Block SC, in conjunction with the multi-user SIC receiver, supports the highest number of users while achieving the lowest coded BLER among all waveforms under identical operating conditions. It has a simple signal generation structure, the lowest PAPR, and the highest resilience to HPA non-linearities.
Thus, based on these observations, block single carrier stands out as a potential candidate waveform for the next-generation air interface, as it provides support for multiple access, power efficiency, low-capability devices, high throughput, and reliability, which are requirements of future 6G systems. The investigation of multi-antenna signal processing is envisioned as a future extension of this work.
where $l_{max}$ is the channel delay tap corresponding to the maximum excess delay, and $a_p e^{j2\pi\nu_p n'\Delta\tau} c_o(l\Delta\tau - \tau_p)$. (91)

Fig. 3. The first stage of signal processing for both the SIC and turbo receivers.

Fig. 4. The allocation of row indices to users for ZP based transmission.

Fig. 8. BLER performance of the MA schemes for the SIC receiver.

Fig. 9. BLER performance of the MA schemes for the turbo receiver.

Fig. 10. Uncoded BER and coded BER of block SC and OTFS under different mobility conditions for the turbo receiver.

Fig. 11. BLER performance of block SC and OTFS under different mobility conditions for the turbo receiver.

Fig. 12. BLER performance of the receivers for a high number of users.

Fig. 13. Average number of iterations for the turbo and SIC receivers.

Fig. 14. Average number of LDPC decoding iterations per CB per frame for the turbo and SIC receivers.

Fig. 15. SIC iteration-wise average number of CBs input for LDPC decoding at varying SNRs for OTFS in a 64-user scenario.

TABLE I MATRIX P FOR OTFS-RELATED WAVEFORMS