Joint Beamforming and Compressed Sensing for Uplink Grant-Free Access

Compressed sensing (CS)-based techniques have been widely applied in the grant-free non-orthogonal multiple access (NOMA) to a single-antenna base station (BS). In this paper, we consider the multi-antenna reception at the BS for uplink grant-free access for the massive machine type communication (mMTC) with limited channel resources. To enhance the overloading performance of the BS, we develop a general framework for the synergistic amalgamation of the spatial division multiple access (SDMA) technique with the CS-based grant-free NOMA. We derive a closed-form statistical beamforming and a dynamic beamforming scheme for the inter-cluster interference suppression when applying SDMA. Based on this, we further develop a joint adaptive beamforming and subspace pursuit (J-ABF-SP) algorithm for the multiuser detection and data recovery, with a novel sparsity level decision method without the accurate knowledge of the noise level. To further improve the data recovery performance, we propose an interference cancellation-based J-ABF-SP scheme (J-ABF-SP-IC) by using the initial signal estimates generated from the J-ABF-SP algorithm. Illustrative simulations verify the superior user detection and signal recovery performance of our proposed algorithms in comparison with existing CS-based grant-free NOMA techniques.


I. INTRODUCTION
The massive machine type communication (mMTC), e.g., the internet of things (IoT), which emerged in the 5G era, will still play a critical role in the forthcoming beyond-5G and even 6G eras. Non-orthogonal multiple access (NOMA) has been identified as an enabler to support massive connectivity with limited channel resources [1], [2], [3], [4], [5].
Guoqing Xia is with the School of Engineering, University of Leicester, LE1 7RH Leicester, U.K. (e-mail: gx21@leicester.ac.uk).
Bohan Li is with the Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China (e-mail: bohan.li@ouc.edu.cn).
Yue Zhang is with the Institute of Communication Systems and Measurement Technology, Chengdu 610095, China (e-mail: zhangyue@icsmcn.cn).
Huiyu Zhou is with the School of Computing and Mathematical Sciences, University of Leicester, LE1 7RH Leicester, U.K. (e-mail: hz143@leicester.ac.uk).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TWC.2024.3373474.
Another characteristic of mMTC is sporadic data transmission, i.e., at any time only a small fraction of potential users are active and transmit small data packets [6], [7], [8], [9]. In this case, conventional grant-based NOMA techniques incur a large access delay and signalling overhead. Therefore, a paradigm shift towards more efficient communication is necessary to enable low-latency and high-reliability mMTC applications.

A. Related Work
Recently, grant-free NOMA methods have been envisioned as feasible solutions for mMTC. In uplink grant-free access, the active users transmit data via the available channel resources that the BS broadcasts periodically, without going through the complicated channel access request and granting process [9], [10]. Thus, grant-free access is effective in reducing the access delay and signalling overhead associated with the sporadic and small-scale data transmission in the mMTC scenario. However, without the granting process, the BS cannot identify the active users before data transmission. Thus, for reliable uplink communications, blind user activity detection based on the superimposed received signal of the active users is necessary.
Current coherent grant-free access schemes can be classified into two categories according to the method of channel estimation and user activity detection [11]. For the first grant-free access type, the preambles of the active users are transmitted to the BS for channel estimation (CE) and multiple user detection (MUD), and coherent data recovery (DR) is then performed at the BS based on the previously estimated channel state information [12], [13], [14], [15]. For the second grant-free access type, the channel information of all the users is estimated based on pilots in the first stage, and subsequently, within the coherence time, joint MUD and DR is performed at the BS [16], [17], [18], [19]. The frame structures of these two grant-free schemes are shown in Figs. 1 and 2. In addition, some non-coherent grant-free access methods have been proposed for specific applications, e.g., unmanned aerial vehicle (UAV) assisted massive IoT [11] and massive multiple-input-multiple-output (MIMO) [12]. In this paper, we focus on the joint MUD and DR for the second type of grant-free access for mMTC.
The sporadic transmission in mMTC gives rise to a sparse received signal with high probability. Compressed sensing (CS) techniques are promising in recovering sparse signals from far fewer samples than those required by classic Nyquist sampling [20], [21], [22], [23], [24]. Accordingly, the number of resource elements necessary for data transmission can be reduced when considering a CS-based receiver. The CS-based grant-free NOMA necessitates judicious transceiver design. At the transmitter, the active users modulate the information bits into symbols and spread them onto specific subcarriers by using non-orthogonal signatures for transmission. The widely used spreading schemes include low density signature (LDS) [1], sparse code multiple access (SCMA) [2], [3], [25], [26], etc. At the receiver, the received signals on different subcarriers are used for user activity detection and signal recovery by CS techniques. Extensive CS-based sparse signal recovery methods have been proposed, including orthogonal matching pursuit (OMP) [20], compressed sampling matching pursuit (CoSaMP) [22], subspace pursuit (SP) [23] and approximate message passing (AMP) [24], etc. These methods require prior knowledge of the user sparsity level (the number of active users), which is often impractical in engineering applications.
Furthermore, considering the consecutive data transmission in different slots in mMTC scenarios, the temporal correlation of the user activity has been utilised to enhance the communication performance in grant-free NOMA systems [16], [17], [18], [19], [27], [28], [29], [30]. The assumptions on the temporal correlation of the user activity can be classified into two categories. The first one is that the user activity stays unchanged in one frame, called frame-wise (block) sparsity. Based on this assumption, the modified AMP [16], SP [17] and block-coordinate-descent (BCD) [18] methods were developed for the frame-wise user activity detection and data recovery in grant-free NOMA. These methods do not require the prior user sparsity level but need to estimate it based on the prior noise power. To avoid using the prior information of the noise level, the authors in [17] proposed a cross-validation-based method to determine the user sparsity level. The authors in [19] considered an orthogonal approximate message passing (OAMP)-multiple measurement vector (MMV) algorithm with simplified structure learning (SSL) and accurate structure learning (ASL), termed OAMP-MMV-SSL and OAMP-MMV-ASL, respectively. These two methods can iteratively estimate the user sparsity ratio and the noise variance using expectation maximisation [19].
The second is the dynamic user sparsity assumption, i.e., the user activity can be different in consecutive slots. A dynamic CS method [27] and a modified SP method [28] were proposed to improve the active user estimates in consecutive slots based on the temporal correlation between one another. The weighted $\ell_{2,1}$ minimisation model-based method was developed for enhanced performance in detecting the users with dynamic sparsity [29]. In addition, the first bit with value 0 or 1 in the data payload was used to determine whether the active user has data to transmit in the current time slot [30]. All of these methods require the noise level as prior information.
The aforementioned methods are usually developed for the grant-free NOMA system with a single-antenna BS. Recently, [13] demonstrated that both the missed user detection and the false alarm probabilities can always converge to zero by utilising the vector AMP algorithm [24] in the asymptotic massive MIMO regime. A joint spatial-temporal-structured adaptive SP method was proposed for grant-free NOMA to jointly estimate channels and detect users by considering the block sparsity over multiple slots and multiple antennas [31]. Additionally, media-based modulation is employed in grant-free access in multi-antenna BS scenarios by using SP [32], [33] and AMP [34]. However, these spatial modulation methods do not fully exploit the inherent spatial diversity and multiplexing gain of the potential user clustering and thus require a large number of antennas to achieve a satisfactory performance.

B. Motivation
Accurate sparse signal recovery necessitates a large number of spectrum resources or massive antennas for massive connectivity with current CS-based grant-free NOMA techniques, even though they can enable the system to operate in overloaded conditions to some extent [16], [17], [18], [19], [33], [34], [35]. The spatial division multiple access (SDMA) technique characterised by the multi-antenna BS has been proven to be effective in supporting massive connectivity, especially when integrated with power-domain NOMA techniques [36], [37], [38], [39], [40], [41]. As shown in Fig. 3, SDMA can cope with the simultaneous transmissions of multiple users sharing the same spectrum resources, aided by an advanced interference mitigation technique, e.g., digital beamforming. Integrating SDMA with the CS-based grant-free NOMA technique is therefore a promising solution for improved spectral efficiency in mMTC applications. However, to the best of our knowledge, there is no work in the open literature that has taken this into consideration.

C. Our Contribution
In this paper, we concentrate on developing the joint MUD and DR method for the uplink grant-free NOMA to a multi-antenna BS. We consider i) the first temporal correlation assumption, i.e., the frame-wise block sparsity for each user; ii) the second coherent grant-free access type with the channel information estimated using pilots before the data transmission. Massive users are assumed to be clustered according to the channel correlation, based on which the multi-antenna reception can be combined by beamforming to suppress the inter-cluster interference. For users within the same cluster, the CS-based grant-free NOMA method is utilised for the MUD and DR based on the signal combined by beamforming. The main contributions are summarised as follows.
1) We have developed both a closed-form statistical beamforming (SBF) scheme and a dynamic beamforming (DBF) scheme. These beamforming approaches, when combined with appropriate user clustering based on channel correlation, effectively mitigate inter-cluster interference. Even in cases where the total number of users significantly exceeds the number of antenna elements at the base station, these schemes demonstrate effective interference suppression.
2) We have formulated a comprehensive framework for integrating SDMA with grant-free NOMA. This framework enables simultaneous differentiation and service of spatially clustered users using the spatial diversity and multiplexing gain provided by multiple beams. Within this structure, the optimisation of beamforming and signal estimation is jointly and alternately performed. This parallel optimisation process for distinct user clusters can significantly reduce the access latency. Additionally, the utilisation of the same spectrum resources by all user clusters leads to a substantial increase in spectral efficiency.
3) As a practical realisation of the developed framework, we introduce a joint adaptive beamforming and subspace pursuit (J-ABF-SP) algorithm tailored for uplink grant-free access. In each iteration of the J-ABF-SP algorithm, adaptive beamforming and subspace pursuit are performed alternately to jointly achieve user detection and signal recovery. A robust method for determining the user sparsity level is introduced, obviating the need for prior knowledge of noise levels.
4) To further enhance MUD and DR performance, we propose an interference cancellation (IC) scheme denoted as J-ABF-SP-IC. Building upon the results obtained from user activity detection and initial signal estimation via the J-ABF-SP algorithm, this scheme involves the reconstruction of received signals for each cluster. By utilising these reconstructed signals, interference-cancelled received signals for each cluster are derived. Subsequently, similar procedures to those in the J-ABF-SP algorithm are used to alternate between signal estimation and beamforming optimisation.
5) Simulation results verify that the J-ABF-SP algorithm can achieve superior MUD and DR performance in comparison with the benchmark methods at the cost of moderately increased complexity. Moreover, the J-ABF-SP-IC algorithm can further enhance the performance with slightly increased complexity. In addition, compared to the existing methods, the integration of SDMA and grant-free NOMA in this paper can markedly improve the spectral efficiency.
The remainder of this paper is organised as follows. Section II describes the signal model and problem formulation. Section III introduces the proposed beamforming schemes. Section IV details the proposed joint optimisation algorithms for the beamforming and data recovery. Section V gives the computational complexity analysis. Section VI illustrates the simulation results. Section VII concludes this paper.
[Figure: System architecture of the integration of SDMA and grant-free NOMA.]
Notation: $\mathbb{C}$ denotes the field of complex numbers. Scalars are denoted by lower-case letters, and vectors and matrices by lower- and upper-case boldface letters, respectively. The conjugate, transpose, conjugate transpose and Moore-Penrose (M-P) inverse are denoted by $(\cdot)^*$, $(\cdot)^T$, $(\cdot)^H$ and $(\cdot)^\dagger$, respectively. $E\{\cdot\}$ and $|\cdot|$ denote the mathematical expectation and modulus, respectively. $\mathrm{vec}\{\cdot\}$ vectorises a matrix by stacking its columns on top of one another. $\mathrm{vec}^{-1}(\mathbf{c}, T)$ generates a matrix with $T$ rows by inverse vectorisation of the vector $\mathbf{c}$. $\|\cdot\|_2$ denotes the $\ell_2$ norm of a matrix. $\|\cdot\|_0$ denotes the $\ell_0$ norm of a vector, i.e., the number of its non-zero elements. The notations $\min\{\cdot\}$ and $\max\{\cdot\}$ denote the minimum and maximum element of the enclosed set $\{\cdot\}$, respectively. The notation $\otimes$ denotes the Kronecker product.
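The vectorisation operators in the Notation can be illustrated with a minimal sketch. The function names `vec` and `vec_inv` and the list-of-rows matrix representation are assumptions made for illustration only.

```python
def vec(matrix):
    """Stack the columns of a matrix (given as a list of rows) into one flat list."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(cols) for r in range(rows)]


def vec_inv(c, T):
    """Inverse vectorisation: rebuild a matrix with T rows from the vector c."""
    if len(c) % T != 0:
        raise ValueError("len(c) must be a multiple of T")
    cols = len(c) // T
    return [[c[col * T + r] for col in range(cols)] for r in range(T)]


A = [[1, 2, 3],
     [4, 5, 6]]
stacked = vec(A)                  # columns (1, 4), (2, 5), (3, 6) stacked in order
restored = vec_inv(stacked, T=2)  # recovers the original 2 x 3 matrix
```

Applying `vec_inv` with the original number of rows undoes `vec`, which is the property relied upon when a recovered vector is reshaped back into a signal matrix.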

II. SIGNAL MODEL AND PROBLEM FORMULATION
We consider spreading-based grant-free NOMA in a multi-antenna cellular system to support mMTC with limited channel resources. The cellular BS is equipped with a uniform linear array with $M$ antenna elements, while each user has a single antenna. We consider the second coherent grant-free access type with the channel information estimated using pilots before the data transmission, as illustrated in Fig. 2. As shown in Fig. 3, $NQ$ users (devices) are grouped into $N$ clusters¹ according to their channel correlation by using common clustering methods, e.g., K-means [37], [40], [42]. The channel correlation coefficient is defined in Appendix A. Without loss of generality, equal-size clusters are assumed, i.e., $Q$ users in each cluster $n = 1, 2, \cdots, N$. All user clusters employ the same frequency resources, i.e., $K$ subcarriers, for simultaneous communication with the BS. To support mMTC, we consider an overloaded system with $K < NQ$.² To enhance the readability of the signal model and algorithm derivations, we provide a summary of the key variables in Table I, including their definitions and dimensions.
² Please note that the number of user clusters is constrained by the degrees-of-freedom (DoF) of the BS, while the angular distribution range of users in each cluster is limited by the main lobe width of the beampattern. Both the DoF and the main lobe width of the beampattern are determined by the number of antenna elements in a specific array configuration. Consequently, for a given user distribution, the number of user clusters and the angular distribution range of users in each cluster should match the number of antennas. This ensures sufficient utilisation of the spatial resources and helps prevent performance degradation.

A. Signal Model
The $q$th user in cluster $n$ is denoted by $u_{n,q}$. The spreading signature for $u_{n,q}$ is denoted as $\mathbf{s}_{n,q} = [s_{n,q,1}, s_{n,q,2}, \cdots, s_{n,q,K}]^T$, with $s_{n,q,k}$ representing the spreading factor on subcarrier $k$ for user $u_{n,q}$ [18], [19], [29]. Non-orthogonal non-sparse spreading signatures are employed in this paper, e.g., Zadoff-Chu sequences³ [43]. Assuming line-of-sight transmission only, the angle of arrival (AoA) from user $u_{n,q}$ is denoted as $\theta_{n,q}$ and the steering vector is defined as, where $e$ is Euler's number, $\lambda$ is the carrier wavelength and $d$ is the distance between adjacent antenna elements, usually set to half a wavelength $\lambda/2$. The channel gain vector $\mathbf{g}_{n,q,k} \in \mathbb{C}^{M \times 1}$ between the user $u_{n,q}$ and the multi-antenna BS on subcarrier $k$ is modelled as the product of the channel fading and the steering vector, i.e., $\mathbf{g}_{n,q,k} = f_{n,q,k} \mathbf{a}_{n,q}$, where the channel fading $f_{n,q,k} = \rho_{n,q} \eta_{n,q,k}$ consists of the large-scale fading $\rho_{n,q}$, including the path loss and shadow fading, and the small-scale random fading $\eta_{n,q,k}$ following the complex Gaussian distribution. We assume a slow-fading channel which remains unchanged within a coherence time interval (longer than the frame length of the mMTC).
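As an illustration of the channel model above, the following minimal sketch builds a uniform-linear-array steering vector and the channel gain vector as fading times steering vector. The phase convention $e^{-j 2\pi (d/\lambda) m \sin\theta}$ is an assumption (the sign and the sin/cos choice depend on how the AoA is measured), and the function names and numerical values are hypothetical.

```python
import cmath
import math


def steering_vector(theta_rad, num_antennas, spacing_over_wavelength=0.5):
    """ULA steering vector whose first element is 1 (reference antenna).

    Assumed convention: element m has phase -2*pi*(d/lambda)*m*sin(theta).
    """
    phase_step = -2.0 * math.pi * spacing_over_wavelength * math.sin(theta_rad)
    return [cmath.exp(1j * phase_step * m) for m in range(num_antennas)]


def channel_gain(fading, steering):
    """Channel gain vector g = f * a for a scalar complex fading coefficient f."""
    return [fading * a_m for a_m in steering]


# Hypothetical user: AoA of 30 degrees, fading with magnitude 0.8.
a = steering_vector(math.radians(30.0), num_antennas=4)
g = channel_gain(0.8 * cmath.exp(1j * 0.3), a)

# Dividing by the first entry removes the scalar fading and recovers a.
recovered = [g_m / g[0] for g_m in g]
```

Because the reference element equals 1, normalising a channel vector by its first entry returns the steering vector regardless of the fading value, which is what makes the steering-vector average in the problem formulation obtainable from estimated channels.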
The received signal at the BS for subcarrier $k$ and slot $t$ can be formulated as, where $x_{n,q,t}$⁴ is the transmitted signal of user $u_{n,q}$ at the current slot $t$, $\mathbf{x}_{n,t}$ is the transmitted signal vector with its $q$th entry being $x_{n,q,t}$, and $\mathbf{v}_{k,t}$ is the additive Gaussian noise vector. The equivalent channel gain matrix for cluster $n$ is defined as, Since the users are clustered by channel correlation, beamforming can be performed to suppress the inter-cluster interference signals at the BS. For any cluster $n = 1, 2, \cdots, N$, the multi-antenna received signal on subcarrier $k$ is combined by beamforming, i.e., where $\mathcal{N}$ is the index set of all clusters, and $\mathbf{b}_n$ is the beamforming weight vector for cluster $n$.
³ The Zadoff-Chu spreading signatures are detailed in Appendix B.
where $\mathbf{I}_K$ denotes a $K \times K$ identity matrix and the received signal vector $\mathbf{y}_t$ is given by $\mathbf{y}_t = [\mathbf{y}_{1,t}^T, \mathbf{y}_{2,t}^T, \cdots, \mathbf{y}_{K,t}^T]^T$. We define the equivalent beamforming gain matrix, Then, $\mathbf{y}_{n,t}$ can be rewritten as, The first term on the right-hand side of (7) is the desired signal for cluster $n$, the second is the superimposed inter-cluster interference, and the last is the noise term.
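The three-term structure described above can be made explicit per subcarrier. The following is a sketch under stated assumptions rather than the paper's own equation: $\mathbf{G}_{l,k}$ is used here as a stand-in symbol for the equivalent channel gain matrix of cluster $l$ on subcarrier $k$ (its exact decoration in the original may differ), and the received signal is taken to be the sum of all clusters' contributions plus noise.

```latex
\begin{align}
y_{n,k,t} &= \mathbf{b}_n^H \mathbf{y}_{k,t}
  = \mathbf{b}_n^H \Big( \sum_{l \in \mathcal{N}} \mathbf{G}_{l,k}\,\mathbf{x}_{l,t} + \mathbf{v}_{k,t} \Big) \\
 &= \underbrace{\mathbf{b}_n^H \mathbf{G}_{n,k}\,\mathbf{x}_{n,t}}_{\text{desired signal}}
  + \underbrace{\sum_{l \in \mathcal{N} \setminus n} \mathbf{b}_n^H \mathbf{G}_{l,k}\,\mathbf{x}_{l,t}}_{\text{inter-cluster interference}}
  + \underbrace{\mathbf{b}_n^H \mathbf{v}_{k,t}}_{\text{noise}} .
\end{align}
```

Stacking these scalars over the $K$ subcarriers gives the combined vector $\mathbf{y}_{n,t}$ with the same desired-signal, interference and noise split.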

B. Problem Formulation
As stated in Section I-A, we consider the second grant-free access type, i.e., the channel gains are a priori estimated in the first stage [16], [17], [18], [19]. In this context, we consider non-sparse spreading signatures, such as Zadoff-Chu sequences. With the channel information and spreading signatures, one can obtain the equivalent channel gain matrix $\mathbf{G}_l$. Our objective is to develop an algorithm that optimises both the beamforming weights and the signal estimates concurrently at the BS.
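To make the notion of a non-sparse, non-orthogonal signature concrete, the following sketch generates root Zadoff-Chu sequences using the standard odd-length closed form. It is only an illustration: the signature construction actually used is the one in Appendix B and may differ, and the length `K = 7` and the roots are hypothetical values.

```python
import cmath
import math


def zadoff_chu(root, length):
    """Root Zadoff-Chu sequence of odd length: exp(-j*pi*root*n*(n+1)/length)."""
    if length % 2 == 0:
        raise ValueError("this closed form assumes an odd sequence length")
    if math.gcd(root, length) != 1:
        raise ValueError("root and length must be coprime")
    return [cmath.exp(-1j * math.pi * root * n * (n + 1) / length)
            for n in range(length)]


def cyclic_correlation(x, y, lag):
    """Periodic correlation sum_n x[n] * conj(y[(n + lag) mod N])."""
    N = len(x)
    return sum(x[n] * y[(n + lag) % N].conjugate() for n in range(N))


K = 7                                             # hypothetical number of subcarriers
s1 = zadoff_chu(root=1, length=K)
s2 = zadoff_chu(root=2, length=K)
auto_lag1 = abs(cyclic_correlation(s1, s1, 1))    # vanishes for a Zadoff-Chu sequence
cross_lag0 = abs(cyclic_correlation(s1, s2, 0))   # non-zero: roots are not orthogonal
```

Every entry has unit modulus, so the signature occupies all subcarriers (non-sparse), while signatures with different roots have a non-zero inner product (non-orthogonal).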
Define the transmitted signal matrix for cluster $n$ as $\mathbf{X}_n = [\mathbf{x}_{n,1}, \mathbf{x}_{n,2}, \cdots, \mathbf{x}_{n,T}]$, with $T$ denoting the number of slots in one frame. According to (7), the least-squares (LS) error function for MUD and DR is given by, where $(\cdot)_t$ denotes the random realisation at time slot $t$, e.g., $\mathbf{y}_{n,t}$, $\mathbf{y}_{k,t}$ and $\mathbf{x}_{n,t}$.
To optimise the signal estimation, we need to constrain the beamforming main lobe towards the desired user cluster by the constraint $\mathbf{b}_n^H \bar{\mathbf{a}}_n = 1$, where $\bar{\mathbf{a}}_n \triangleq \frac{1}{Q}\sum_{q=1}^{Q} \mathbf{a}_{n,q}$ is the average of the steering vectors of the users in cluster $n$. Herein we use the steering vectors rather than the original channel gain vectors to alleviate the impact of the random channel fading. The joint optimisation problem can be formulated as, where $\Gamma_{n,t}$ denotes the support set of user cluster $n$ at time slot $t$ and $s$ is the maximum user sparsity level. For a slow-fading channel, $\bar{\mathbf{a}}_n$ can be obtained by $\bar{\mathbf{a}}_n = \frac{1}{Q}\sum_{q=1}^{Q} \mathbf{g}_{n,q,k}/g_{n,q,k}(1)$ for any $k$, where $g_{n,q,k}(1)$ is the first entry of $\mathbf{g}_{n,q,k}$.
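The averaging rule and the unit-response constraint above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names, the ULA phase convention in `ula`, the AoAs and the fading values are all hypothetical, and the matched weight is used only to show how a weight is rescaled to satisfy $\mathbf{b}_n^H \bar{\mathbf{a}}_n = 1$.

```python
import cmath
import math


def ula(theta_deg, M):
    """Assumed half-wavelength ULA steering vector with first element 1."""
    step = -math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * step * m) for m in range(M)]


def average_steering_from_channels(channel_gains):
    """a_bar = (1/Q) * sum_q g_q / g_q(1), computed from per-user channel vectors."""
    num_users = len(channel_gains)
    num_antennas = len(channel_gains[0])
    a_bar = [0j] * num_antennas
    for g in channel_gains:
        for m in range(num_antennas):
            a_bar[m] += g[m] / g[0] / num_users
    return a_bar


def inner(b, a):
    """Hermitian inner product b^H a."""
    return sum(b_m.conjugate() * a_m for b_m, a_m in zip(b, a))


def scale_to_unit_response(b, a_bar):
    """Rescale b so that b^H a_bar = 1 (assumes b^H a_bar is non-zero)."""
    response = inner(b, a_bar)
    return [b_m / response.conjugate() for b_m in b]


M = 4
fadings = [0.9 * cmath.exp(0.4j), 0.5 * cmath.exp(-1.1j)]   # hypothetical fading
gains = [[f * a for a in ula(theta, M)]
         for f, theta in zip(fadings, (20.0, 25.0))]         # two users, one cluster
a_bar = average_steering_from_channels(gains)
b = scale_to_unit_response(a_bar, a_bar)   # matched weight, rescaled to the constraint
```

The fading coefficients drop out of `a_bar` entirely, which reflects the stated motivation for using steering vectors instead of raw channel gains in the constraint.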

III. BEAMFORMING SCHEMES
The problem in (9) is a multivariate high-order nonlinear constrained optimisation problem, which is generally non-deterministic polynomial-time hard (NP-hard) to solve. In this paper, we consider the joint alternating optimisation of the beamforming weight and the signal estimate. To this end, we first design effective beamforming schemes for inter-cluster interference suppression.

A. Statistical Beamforming Scheme
Ideally, the LS error in (8) can be converted into the mean squared error (MSE) when three conditions are satisfied, i.e., 1) the number of slots (samples) is large enough, 2) the transmitted signals follow stationary distributions and 3) the channel states stay unchanged within a frame. Based on this, we substitute $\mathbf{y}_{n,t}$ in (7) into (8) and present the MSE cost function, With the transmission power of the individual active user in each cluster $l$ denoted as $\sigma_l^2$, user activity probability $\alpha_l$ and noise power $\sigma_v^2$, (10) can be simplified as, Eq. (12) describes a constrained quadratic convex optimisation problem, and its closed-form solution for each cluster $n$ is given by, Here, $K\sigma_v^2$ denotes the total noise power, involving the suppression of the additive noise by beamforming. It also acts as a diagonal loading factor to enable the matrix inversion in (13). $\alpha_l \sigma_l^2$ involves the suppression of the interference signals. Notably, the balance between noise and interference suppression hinges on the interplay between the signal-to-noise ratio (SNR) $\delta_l \triangleq \sigma_l^2/\sigma_v^2$ and $\alpha_l$, relating to the interfering clusters $l \in \mathcal{N} \setminus n$. Thus, we can pragmatically select an empirical SNR (ESNR) $\delta_l$ and a rough $\alpha_l$ from the interval $(0, 1]$ without requiring precise values. The solution (13) is referred to as statistical beamforming (SBF), capable of effectively curbing interference even when the number of antenna elements significantly falls short of the number of users.
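The general shape of such a closed-form solution can be sketched with the method of Lagrange multipliers. In the sketch below, $\mathbf{R}_n$ is a placeholder (an assumption, not the paper's notation) for the Hermitian positive-definite matrix that collects the weighted interference terms and the diagonal loading in (12); the exact expression of that matrix is not reproduced here.

```latex
\begin{align}
&\min_{\mathbf{b}_n}\ \mathbf{b}_n^H \mathbf{R}_n \mathbf{b}_n
 \quad \text{s.t.} \quad \mathbf{b}_n^H \bar{\mathbf{a}}_n = 1, \\
&\mathcal{L}(\mathbf{b}_n, \mu) = \mathbf{b}_n^H \mathbf{R}_n \mathbf{b}_n
 + \mu \big(1 - \mathbf{b}_n^H \bar{\mathbf{a}}_n\big)
 + \mu^* \big(1 - \bar{\mathbf{a}}_n^H \mathbf{b}_n\big), \\
&\frac{\partial \mathcal{L}}{\partial \mathbf{b}_n^*}
 = \mathbf{R}_n \mathbf{b}_n - \mu\, \bar{\mathbf{a}}_n = \mathbf{0}
 \ \Rightarrow\ \mathbf{b}_n = \mu\, \mathbf{R}_n^{-1} \bar{\mathbf{a}}_n, \\
&\mathbf{b}_n^H \bar{\mathbf{a}}_n = 1
 \ \Rightarrow\ \mathbf{b}_n
 = \frac{\mathbf{R}_n^{-1} \bar{\mathbf{a}}_n}{\bar{\mathbf{a}}_n^H \mathbf{R}_n^{-1} \bar{\mathbf{a}}_n}.
\end{align}
```

The diagonal loading contributed by the noise power keeps $\mathbf{R}_n$ invertible even when the interference terms alone are rank deficient, which is consistent with the role attributed to $K\sigma_v^2$ above.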
In practical mMTC scenarios, the small data sample per user is insufficient to represent the statistics in (12) by using the sample variance. In addition, the inaccurate ESNRs and user activity probabilities also influence the tradeoff between the interference and noise suppression to some extent. Thus, it is better to use the LS cost function rather than the MSE.

B. Dynamic Beamforming Scheme
We now develop the beamforming scheme based on the LS criterion. In light of Eqs. (3)-(6), the LS error function in (8) can be further expanded as follows, Thus, the LS-based beamforming optimisation problem can be further expressed as where $\mathbf{i}_{n,k,t}$ is the interference plus the noise component (IpNC), defined as, Similar to the SBF, the dynamic beamforming (DBF) solution to (15) is derived, i.e., where $\sum_{t=1}^{T} \mathbf{i}_{n,k,t}\mathbf{i}_{n,k,t}^H$ can be seen as the auto-correlation matrix⁵ of the IpNC, and $\epsilon$ is a diagonal loading factor.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
The measurement signal $\mathbf{y}_{k,t}$ and the transmitted signal $\mathbf{x}_{n,t}$ are not prerequisites for SBF. Likewise, DBF does not demand prior knowledge of the equivalent channel matrices of the interfering user clusters. The SBF and DBF approaches are readily applicable to prevailing receive beamforming scenarios, particularly for receivers featuring a limited number of antennas. The DBF simplifies to the conventional constrained least squares (LS) beamforming method when dealing with only one desired user and one subcarrier [44].
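A sample-based constrained beamformer of the kind described in this subsection can be sketched as below. This is an illustration under stated assumptions, not the paper's DBF expression: the weight is taken as $\mathbf{R}^{-1}\bar{\mathbf{a}} / (\bar{\mathbf{a}}^H \mathbf{R}^{-1} \bar{\mathbf{a}})$ with $\mathbf{R}$ the sum of IpNC outer products plus a diagonal loading term, and the array geometry, snapshots and loading value are hypothetical.

```python
import cmath
import math


def ula(theta_deg, M):
    """Assumed half-wavelength ULA steering vector with first element 1."""
    step = -math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * step * m) for m in range(M)]


def inner(b, a):
    """Hermitian inner product b^H a."""
    return sum(b_m.conjugate() * a_m for b_m, a_m in zip(b, a))


def solve(A, rhs):
    """Solve A x = rhs for a small complex system (Gauss-Jordan, partial pivoting)."""
    n = len(A)
    aug = [list(A[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def constrained_beamformer(snapshots, a_bar, loading):
    """b = R^{-1} a_bar / (a_bar^H R^{-1} a_bar), R = sum_t i_t i_t^H + loading * I."""
    M = len(a_bar)
    R = [[loading if r == c else 0j for c in range(M)] for r in range(M)]
    for i_t in snapshots:
        for r in range(M):
            for c in range(M):
                R[r][c] += i_t[r] * i_t[c].conjugate()
    w = solve(R, a_bar)
    scale = inner(a_bar, w)            # a_bar^H R^{-1} a_bar
    return [w_m / scale for w_m in w]


M, T = 4, 8
a_desired = ula(0.0, M)
a_interf = ula(40.0, M)
# Hypothetical IpNC snapshots: a single interferer sending unit-modulus symbols.
snapshots = [[cmath.exp(1j * 0.7 * t) * a for a in a_interf] for t in range(T)]
b = constrained_beamformer(snapshots, a_desired, loading=1e-2)
gain_desired = abs(inner(b, a_desired))    # held at 1 by the constraint
gain_interf = abs(inner(b, a_interf))      # strongly attenuated
```

The diagonal loading keeps the rank-one sample matrix invertible, mirroring the role of $\epsilon$; a larger loading trades interference rejection for robustness to noise.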

IV. THE INTEGRATION OF BEAMFORMING AND COMPRESSED SENSING
The DBF algorithm necessitates prior knowledge of $\mathbf{x}_{n,t}$ for $n = 1, 2, \cdots, N$ and $t = 1, 2, \cdots, T$, which paradoxically are the signals under estimation. Consequently, we turn our attention towards the joint optimisation of signal estimation and beamforming.
In light of (5), the received signal over a frame can be represented in matrix form by, where the $t$th column vector of $\mathbf{X}_n$ is $\mathbf{x}_{n,t}$ and the $t$th column of $\mathbf{V}$ is $\mathbf{v}_t$. Similarly, extending $\mathbf{y}_n$ in (4) in one frame yields, To utilise the block sparsity, i.e., constant user activity in a frame, (19) is vectorised as, where $\mathbf{z}_n$ is regarded as the IpNC under beamforming. Therefore, the joint optimisation problem for any cluster $n$ is rewritten as, The squared $\ell_2$ norm in the objective is referred to as the residual energy of cluster $n$ in the following sections.
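The vectorisation step can be sketched with the standard identity $\mathrm{vec}\{\mathbf{A}\mathbf{B}\mathbf{C}\} = (\mathbf{C}^T \otimes \mathbf{A})\,\mathrm{vec}\{\mathbf{B}\}$. In the sketch below, $\mathbf{A}_n \in \mathbb{C}^{K \times Q}$, $\mathbf{Y}_n$ and $\mathbf{Z}_n$ are placeholder symbols (assumptions) for the matrix multiplying $\mathbf{X}_n$, the combined signal and the IpNC in the frame-level model; the stacking order $\mathbf{c}_n = \mathrm{vec}\{\mathbf{X}_n^T\}$ is inferred from the block notation used later, where each user occupies one $T \times 1$ block.

```latex
\begin{align}
\mathbf{Y}_n &= \mathbf{A}_n \mathbf{X}_n + \mathbf{Z}_n
 \ \Rightarrow\ \mathbf{Y}_n^T = \mathbf{I}_T\, \mathbf{X}_n^T \mathbf{A}_n^T + \mathbf{Z}_n^T, \\
\mathrm{vec}\{\mathbf{Y}_n^T\}
 &= (\mathbf{A}_n \otimes \mathbf{I}_T)\, \mathrm{vec}\{\mathbf{X}_n^T\} + \mathrm{vec}\{\mathbf{Z}_n^T\}
 = \mathbf{D}_n \mathbf{c}_n + \mathbf{z}_n .
\end{align}
```

With this ordering, the columns $(q-1)T+1$ to $qT$ of $\mathbf{D}_n = \mathbf{A}_n \otimes \mathbf{I}_T$ act only on the $T$ symbols of user $q$, so an inactive user zeroes an entire block of $\mathbf{c}_n$, which is the block sparsity exploited by the recovery algorithm.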

A. General Framework for the Joint Optimisation
As mentioned in Section I-A, CS-based methods can be employed for MUD, such as CoSaMP [22] and SP [17], [23].⁶ Before delving into the specifics, we provide a brief overview of the design principles behind the joint optimisation system. For any cluster $n$, given the known beamforming weight and user sparsity level, the sparse signal recovery problem (21) can be efficiently solved using CS methods. Subsequently, the signal estimate is used to update the adaptive beamforming (ABF) module, generating new measurements for the CS module. Fig. 4 illustrates a general framework that integrates SDMA and CS for uplink grant-free access for any user cluster $n$. In this paper, we focus on the block-sparsity based adaptive SP (ASP) method in the CS module.

B. Algorithm Design for the Joint Adaptive Beamforming and Subspace Pursuit
Based on the beamforming weight $\mathbf{b}_n$, which is initialised by the SBF weight $\mathbf{b}_n^{\mathrm{SBF}}$ before the first iteration, the measurements (combined signals) for the ASP are generated by, We also have,
⁶ Other existing multiple user detection methods can also be extended and applied to this framework.
Step 4: Compute the initial signal estimates $\mathbf{w}[q, T]$ for all the candidate users in the support set of Step 3.
Step 5: Estimate the support set $\Gamma_{n,\iota+1}$ for sparsity level $s$ by selecting the $s$ largest values of the $\ell_2$ norms (magnitudes) of $\mathbf{w}[q, T]$ over all users in one cluster.
Step 6: With the support set estimate $\Gamma_{n,\iota+1}$ at the $\iota$th iteration, the signal is estimated by, where $\mathcal{Q}$ is the set of user indices for any cluster.
We denote the vector $\mathbf{c}_n[q, T]$ as the $q$th $T \times 1$ vector block of $\mathbf{c}_n$ and the matrix $\mathbf{D}_n[q, T]$ as the matrix block of $\mathbf{D}_n$ constituted by the consecutive columns with indices from $(q-1)T+1$ to $qT$. Furthermore, $\mathbf{c}_n[\Lambda, T]$ and $\mathbf{D}_n[\Lambda, T]$ denote the sub-vector and sub-matrix obtained by selecting their respective blocks according to the indices in the set $\Lambda$. Subsequently, with the output $\hat{\mathbf{X}}_n = [\mathrm{vec}^{-1}(\hat{\mathbf{c}}_n, T)]^T$ of the ASP, the IpNC is estimated by $\hat{\mathbf{i}}_{n,k,t} = \mathbf{y}_{k,t} - \mathbf{G}_{n,k}\hat{\mathbf{x}}_{n,t}$, with $\hat{\mathbf{x}}_{n,t}$ being the $t$th column of $\hat{\mathbf{X}}_n$. The beamforming weight is accordingly updated by, with the estimation of the auto-correlation matrix $\mathbf{R}_n$, To sum up, a joint adaptive beamforming and subspace pursuit algorithm (J-ABF-SP) is presented in Algorithm 2. Considering the potential small fluctuation of the sparsity level due to the empirical user activity rate $\alpha_l$, the upper bound $s$ for sparsity level searching is selected within a range, e.g., $(\alpha_l Q, 2\alpha_l Q]$.
Algorithm 2 (excerpt):
8: (Residual and support initialisation) $\mathbf{r}_{n,1} = \mathbf{r}_z$, $\Gamma_{n,1} = \Gamma_z$.
9: Invoke the ASP algorithm.
11: (Beamforming weight) $\hat{\mathbf{X}}_n = [\mathrm{vec}^{-1}(\hat{\mathbf{c}}_z, T)]^T$, compute $\hat{\mathbf{i}}_{n,k,t}$ by (25), and compute $\mathbf{b}_{n,z}$ by (26).
12: (Measurement update) Compute $\boldsymbol{\eta}_n$ and $\mathbf{D}_n$ via (22) and (23) by using $\mathbf{b}_{n,z}$ to replace $\mathbf{b}_n$.
21: (Signal recovery) $\mathbf{X}_{n,1} = [\mathrm{vec}^{-1}(\mathbf{c}_{s_o}, T)]^T$.
22: end for
We now detail the main steps of Algorithm 2.
1) Parallel Computation: The iteration process (the steps between 2 and 21) can be performed in parallel for all clusters in $\mathcal{N}$. This guarantees fairness in terms of the access delay for different user clusters and thus reduces the total latency in comparison to serial computation.
2) Parameter Passing: The outputs of ASP encompass the estimate of the support set (active user set), residual, and signal estimate (step 9), with the latter employed for beamforming updates (step 11). The updated beamforming weight contributes to generating new measurements (step 12). These, along with the support set and residual, are then fed back into ASP (steps 8 and 9). Upon fulfilling the stopping condition of adaptive beamforming (step 13), the signal estimates, residual energy, and support set estimate are preserved (step 14). Notably, only the support set estimate proceeds to the next iteration at a fresh sparsity level (step 6). These parameter passing processes ensure the continuity of the entire iteration.
3) Important Initialisation: We initialise the beamforming for each sparsity level using the SBF weight (step 3).The SBF offers effective channel utilisation for both the desired user cluster and the interfering user clusters, even without precise SNR values.However, the adaptive beamforming weight at the current sparsity level cannot be directly applied in the next sparsity level iteration.This is because the beamformer treats the signals of undetected active users (UDAUs) as interferences (steps 5 and 12) when the given sparsity is smaller than the actual sparsity level.This aspect is explained in more detail in Appendix D. Consequently, the residual at each sparsity level is initialised using the measurement vector generated through the SBF weight (step 6).
4) Stopping Condition: For the ASP (step 9), the stopping condition is that the current residual energy (norm) is larger than the previous one (step 8 in Algorithm 1), which indicates the current and subsequent iterations tend to deteriorate the user detection and signal recovery performance.For the beamforming update (step 13), we employ a threshold related to the change in residual energy as the stopping criterion.This helps prevent unnecessary beamforming updates.
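The block-wise support selection in Step 5 of the ASP and the residual-based stopping rule described above can be sketched as follows. The function names, the zero-based user indices and the numerical values are assumptions made for illustration; the sketch is not the paper's Algorithm 1.

```python
import math


def select_support(blocks, s):
    """Return the indices of the s user blocks with the largest l2 norms.

    blocks[q] holds the T-sample initial estimate w[q, T] of candidate user q.
    """
    norms = [math.sqrt(sum(abs(v) ** 2 for v in block)) for block in blocks]
    ranked = sorted(range(len(blocks)), key=lambda q: norms[q], reverse=True)
    return sorted(ranked[:s])


def should_stop(current_residual_energy, previous_residual_energy):
    """Stop once the residual energy increases from one iteration to the next."""
    return current_residual_energy > previous_residual_energy


# Hypothetical initial estimates for Q = 4 users over T = 3 slots.
w = [
    [0.1, 0.0, 0.1j],       # weak block, likely inactive
    [1 + 1j, -1, 0.5],      # strong block
    [0.05, 0.02, 0.0],      # weak block, likely inactive
    [0.9j, 1.1, -0.8],      # strong block
]
support = select_support(w, s=2)
```

Ranking whole blocks rather than individual entries enforces the frame-wise sparsity assumption: a user is either selected for all $T$ slots or for none.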

C. Error Analysis
We now analyse the signal estimation error when using the J-ABF-SP algorithm. The combined signal (20) for cluster $n$ is expressed in a sparse matrix form, i.e., where $\Gamma_n$ is the index set of the active users in cluster $n$ and $\mathbf{z}_n$ is the IpNC under beamforming. With the support set estimate $\Gamma_s$, the transmitted signals are estimated via (24), i.e., Considering that $\mathbf{D}_n[\Gamma_s, T]$ has full column rank, we have Thus, we have We now simplify (29) as, where $\Gamma_{n,s} = \Gamma_n \cap \Gamma_s$ denotes the index set of the detected active users (DAUs). We have $\Gamma_{n,s} \subseteq \Gamma_s$ and $\Gamma_{n,s} \subseteq \Gamma_n$. In the following, we analyse the signal estimation error under two cases, i.e., no falsely detected inactive users (FDIUs) exist with $\Gamma_{n,s} = \Gamma_s$, and FDIUs exist with $\Gamma_{n,s} \subset \Gamma_s$.
Firstly, for $\Gamma_{n,s} = \Gamma_s$, we have $\Gamma_s \setminus \Gamma_{n,s} = \emptyset$ and the signal estimates of DAUs in (31) can be rewritten as If $\Gamma_s \subset \Gamma_n$, we can find that the signal estimates of DAUs are contaminated by the received signals from the undetected active users (UDAUs) and the IpNC simultaneously. The existence of the UDAUs indicates information loss. When $\Gamma_s = \Gamma_n$, there is no UDAU and (32) can be simplified as, It can be seen that more accurate signal estimates are generated in (33) than those in (32) since they are impacted solely by the IpNC.
Secondly, when FDIUs exist with $\Gamma_{n,s} \subset \Gamma_s$, (31) can be rewritten as, where Note that the relevant matrix inversion is detailed in Appendix C. Based on the property , T ] = 0, we have from (34), and
On one hand, when UDAUs exist with $\Gamma_{n,s} \subset \Gamma_n$, the signal estimates $\hat{\mathbf{c}}_n[\Gamma_{n,s}, T]$ for DAUs in (36) face contamination from both the received signals emanating from UDAUs and the IpNC, similar to (32) with $\Gamma_s \subset \Gamma_n$. However, due to the unit non-zero eigenvalues of $\mathbf{D}_n[\Gamma_s \setminus \Gamma_{n,s}, T]\mathbf{W}_{n,s}^H$, the overall interference power, stemming from both the UDAUs and the IpNC, is anticipated to be lower than that in (32). This results in more precise signal estimates.
The signal estimates for FDIUs in (37) encompass contributions from both the received signals from the UDAUs and the IpNC, weighted by $\mathbf{W}_{n,s}^H$, different from (39) with $\Gamma_{n,s} = \Gamma_n$. Specifically, the signal estimate magnitudes for FDIUs typically fall short of those attributed to DAUs in (36). The degradation of the magnitudes is due to the channel differences between various users, as exemplified by $\mathbf{W}_{n,s}^H \mathbf{D}_n[\Gamma_n \setminus \Gamma_{n,s}, T]$ in (37), where $\mathbf{W}_{n,s}$ involves the channels of DAUs and FDIUs while $\mathbf{D}_n[\Gamma_n \setminus \Gamma_{n,s}, T]$ involves the channels of UDAUs.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
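On the other hand, when all active users are detected with $\Gamma_{n,s} = \Gamma_n$, we have the signal estimates as follows, in light of (36) and (37), where It can be seen that the signal estimates $\hat{c}_n[\Gamma_n, T]$ of the active users suffer from the additive weighted IpNC, while the signal estimates for FDIUs are constituted by the IpNC weighted by $W^H$. Since $D_n[\Gamma_s \setminus \Gamma_n, T]W^H$ has unit non-zero eigenvalues as $W^H D_n[\Gamma_s \setminus \Gamma_n, T] = I$, (38) is subject to relatively minor interference from the IpNC and may yield more accurate signal estimates than those from (33). Nonetheless, the $\Gamma_s \supset \Gamma_n$ scenario inevitably leads to false alarms.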
For simplicity, we have considered the same beamforming weight for the above analysis, indicating the same IpNC under beamforming.In fact, as detailed in Appendix D, the beamforming weight varies in different sparsity levels, leading to distinct IpNCs under beamforming.

D. Sparsity Level Decision
Ideally, the support set estimate $\Gamma_s$ satisfies $\Gamma_s = \Gamma_n$ with $s$ equal to the actual sparsity level $s_o$. We now study the sparsity level decision method via the signal estimate $\hat{c}_n[\Gamma_s, T]$ above.
Define the temporal power ratio (TPR) as, where $x_{n,q}$, the transmitted signal vector of the user $u_{n,q}$ in one sampling duration, is the transpose of the $q$th row of the transmitted signal matrix $X_n$. Similarly, the TPR of $\hat{x}_{n,q}$ with a given sparsity level $s$ is defined as, where $\hat{x}_{n,q} = \hat{c}_n[q, T]$ is a block vector of the above signal estimate $\hat{c}_n[\Gamma_s, T]$.
2) The sparsity level is given by $s_o = \arg\min_{s \in S_c} \varepsilon_s$.
We analyse the feasibility of this method in the following.
The TPR within a given sampling duration $T$ generally remains below a specific threshold. In particular, when $T$ is suitably large, the temporal power of the transmitted signal approaches its actual transmission power. Assuming uniform transmission power among active users within the same cluster, $\gamma_n$ tends to converge towards 1. As inferred from (32), (33), (36) and (38), the signal estimates of DAUs are affected by the IpNC and may even be adversely influenced by UDAUs. In contrast, the TPR is a relative metric and is less susceptible to such effects. Considering the randomness due to the limited number of samples, it is reasonable to empirically set a threshold $\bar{\gamma}_n$ greater than 1.
As discussed in (39), if inactive users are mistakenly identified as active, their signal estimates are dominated by the IpNC, which is notably suppressed by beamforming. This results in $\hat{\gamma}_{n,s} > \bar{\gamma}_n$. Even when UDAUs and FDIUs coexist with $\Gamma_{n,s} \subset \Gamma_s$ and $\Gamma_{n,s} \subset \Gamma_n$, the signal estimate magnitudes of FDIUs in (37) are generally lower than those of DAUs in (36). Consequently, step 1) is employed to eliminate sparsity levels at which FDIUs probably exist.
Step 2) aims to ascertain the user sparsity level via the fact that the residual energy decreases as the sparsity level s approaches the true value.This verification is presented in Appendix D.
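The two-step decision can be illustrated with a minimal sketch. Since the TPR definition is not reproduced here, the sketch assumes (as a labelled assumption, not the paper's exact formula) that the TPR is the ratio of the largest to the smallest per-user temporal power within the estimated support. The function names, the toy data and the default threshold of 3 (the value used later in the simulations) are illustrative choices.

```python
import numpy as np

def temporal_power_ratio(c_hat):
    """Ratio of the largest to the smallest per-user temporal power.

    c_hat: (s, T) array whose rows are per-user signal estimates.
    ASSUMPTION: this max/min form is used only for illustration.
    """
    power = np.sum(np.abs(c_hat) ** 2, axis=1)
    if power.min() == 0:
        return np.inf
    return power.max() / power.min()

def decide_sparsity(estimates, residuals, threshold=3.0):
    """Step 1): keep sparsity levels whose TPR does not exceed the threshold.
    Step 2): among the kept levels, return the one with minimum residual energy."""
    candidates = [s for s, c_hat in estimates.items()
                  if temporal_power_ratio(c_hat) <= threshold]
    if not candidates:
        return None
    return min(candidates, key=lambda s: residuals[s])

def qpsk(rng, shape):
    """Unit-power QPSK symbols (toy stand-in for the transmitted data)."""
    return (rng.choice([-1, 1], shape) + 1j * rng.choice([-1, 1], shape)) / np.sqrt(2)

# Toy data: the true sparsity level is 4; any extra (falsely detected) user
# has a much weaker estimate, mimicking an IpNC-dominated FDIU estimate.
rng = np.random.default_rng(0)
T, s_true = 7, 4
estimates, residuals = {}, {}
for s in range(1, 7):
    rows = qpsk(rng, (min(s, s_true), T))
    if s > s_true:
        rows = np.vstack([rows, 0.1 * qpsk(rng, (s - s_true, T))])
    estimates[s] = rows
    # Hypothetical residual energies that shrink as s approaches s_true.
    residuals[s] = max(s_true - s, 0) + 0.1 / s

chosen = decide_sparsity(estimates, residuals)
print("decided sparsity level:", chosen)
```

In this toy setting the levels above the true one are rejected in step 1) because the weak rows inflate the TPR, so step 2) only compares residual energies over levels that are free of FDIUs.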

E. Interference Cancellation
As analysed earlier, the transmitted signal is estimated by (24) via the measurements generated by beamforming for the received signal in (22).However, the IpNC suppression solely relying on beamforming may be limited, especially with the number of antennas comparable to the number of user clusters.We propose an interference cancellation (IC) scheme to further improve the signal estimation based on the support set and initial signal estimates from the J-ABF-SP algorithm.
With the active user set and initial signal estimates from the J-ABF-SP algorithm, we can reconstruct the received signal from each cluster $n$ as $G_n X_{n,\iota}$, where $X_{n,\iota}$ is the signal estimate after the $(\iota - 1)$th IC. Then, we can obtain the interference-cancelled received signal for cluster $n$, i.e., where $Y_{i,n} = \sum_{l=1, l \neq n}^{N} G_l X_{l,\iota}$ is the reconstructed interference signal for cluster $n$. Then, the new measurements are generated by, Note that $\hat{b}_n$ is computed by (26) based on the signal estimate $\hat{X}_n$, which is initialised by $X_{n,1}$ before the first IC. In addition, the parameter matrix $\hat{D}_n$ is computed by (23).
Based on the measurements (44), the transmitted signals can be estimated by using (24). The detailed steps of the IC-enhanced signal recovery are summarised in Algorithm 3, which mainly consists of three loops. Loop 1 sets the number $L_2$ of IC rounds, which is generally small since the performance enhancement by (43) typically reaches its peak quickly. The steps in loop 2 can be performed in parallel for all clusters. This parallel computation property, similar to Algorithm 2, ensures fairness among different user clusters in terms of access delay and computational resources. Loop 3 is used to iterate the signal estimation and beamforming based on the constructed interference-cancelled received signal, with major procedures outlined in Fig. 5. Similar to the ASP algorithm, the stopping condition for loop 3 is that the current residual energy is larger than the previous one. The residual energy, signal estimate, and beamforming weight in loop 3 will be conveyed to loop 1 as initial values. Algorithms 2 and 3 together are referred to as the IC-enhanced joint adaptive beamforming and subspace pursuit algorithm (J-ABF-SP-IC).

Algorithm 3 (steps recoverable from the listing)
    for cluster $n = 1$ to $N$ do
5:      (Interference reconstruction) Construct the received interference signal $Y_{i,n} = \sum_{l=1, l \neq n}^{N} G_l X_{l,\iota_2}$.
        for iteration $\iota_3 = 1$ to $L_3$ do
8:          (Measurement update) Compute $\hat{\eta}_n$ and $\hat{D}_n$ using $\hat{b}_n$ via (44) and (23).
            (Signal update) $X_{n,\iota_2+1} = \hat{X}_n$.
19:     end for
20: end for
21: (Signal recovery) $X_n = X_{n,L_2+1}$.
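The interference reconstruction and cancellation step can be sketched as follows. This is a simplified illustration rather than the full Algorithm 3: it assumes a single subcarrier, so that each cluster has one equivalent channel matrix of size $M \times Q$, and it omits the beamforming and measurement updates. The function name `cancel_interference` and the toy dimensions are hypothetical.

```python
import numpy as np

def cancel_interference(Y, G, X, n):
    """Subtract the reconstructed signals of all clusters except cluster n.

    Y: (M, T) received signal at the BS antennas.
    G: list of (M, Q) equivalent channel matrices, one per cluster.
    X: list of (Q, T) current signal estimates, one per cluster.
    Returns Y - sum_{l != n} G[l] @ X[l].
    """
    interference = np.zeros_like(Y)
    for l in range(len(G)):
        if l != n:
            interference = interference + G[l] @ X[l]
    return Y - interference

# Toy check: with perfect estimates and no noise, only cluster n remains.
rng = np.random.default_rng(1)
M, Q, T, N = 5, 40, 7, 3
G = [rng.standard_normal((M, Q)) + 1j * rng.standard_normal((M, Q)) for _ in range(N)]
X = []
for _ in range(N):
    x = np.zeros((Q, T), dtype=complex)
    active = rng.choice(Q, size=4, replace=False)  # 4 active users per cluster
    x[active] = rng.standard_normal((4, T)) + 1j * rng.standard_normal((4, T))
    X.append(x)
Y = sum(G[l] @ X[l] for l in range(N))
Y_clean = cancel_interference(Y, G, X, n=1)
print(np.allclose(Y_clean, G[1] @ X[1]))
```

In practice the estimates are imperfect, so the residual interference shrinks over successive IC rounds rather than vanishing in one step.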

V. COMPUTATIONAL COMPLEXITY ANALYSIS
In this section, we compare the computational complexity of the proposed algorithms with benchmark methods, including the TA-BSASP [17], OAMP-MMV-SSL [19], OAMP-MMV-ASL [19], and DS-AMP [34] methods. The complexity is measured by the number of complex-valued multiplications needed for the whole algorithm implementation. The number of complex-valued multiplications for the various algorithms is listed in Table II. For ease of analysis, we assume the same maximum number of iterations for all methods, i.e., $L_1$. For the OAMP-MMV-SSL, OAMP-MMV-ASL and DS-AMP, the letter $P$ denotes the dimension of the signal constellation, e.g., $P = 2$ for binary phase shift keying (BPSK). For the DS-AMP, $M_D$ is the number of BS antennas, and $M_t = 2^{M_{RF}}$ is the number of mirror activation patterns when using media modulation, with $M_{RF}$ denoting the number of radio frequency (RF) mirrors.
We now detail the computational complexity of our proposed algorithms for one cluster since the algorithms can be performed in parallel for all clusters. Given the number of alternating iterations as $L_b$, the computational complexity of the J-ABF-SP algorithm is expressed as, where $C_{\mathrm{ASP}}$ is the complexity for the ASP in Algorithm 1 and $C_{\mathrm{BF}} = M^3 + (KT + 1)M^2 + (Q + 1)M$ denotes the complexity for the beamforming update. Given the actual user sparsity level $s_o$, the complexity for the IC-enhanced method in Algorithm 3 is, Consequently, the total computational complexity of the J-ABF-SP-IC is As mentioned in Section II, the number of user clusters and the angular distribution range of users within each cluster should match the number of antennas. Therefore, we assume $M$ is of the same order of magnitude as $N$. Additionally, the signal recovery by the subspace pursuit method requires the number of measurements $K$ to be no less than $2s_o$ [23]. Thus, for the J-ABF-SP algorithm, the complexity can be finally denoted by the $O$ notation,

TABLE II THE NUMBER OF COMPLEX-VALUED MULTIPLICATIONS
where the first two terms are directly related to the beamforming and the last term is associated with the ASP algorithm. Similarly, we have the order of the complexity of performing IC, i.e., Consequently, the total complexity of the J-ABF-SP-IC algorithm is given by For ease of analysis, we assume $M = \varsigma + N$, where $\varsigma$ denotes a non-negative integer that allows the number of user clusters $N$ and the angular distribution range of users within each cluster to be matched to the number of antennas $M$. We further assume with $Q_{all} = NQ$ denoting the total number of users. Then, (47) can be converted into, It is evident that the complexity first decreases and then increases with the number of antennas. A minimum value can be obtained simply by letting the sum of the first two terms equal the third term in (48).
In fact, $L_b$ denotes the number of beamforming updates, which is generally small. For the proposed algorithms, the increased complexity due to beamforming is modest compared to the TA-BSASP algorithm when utilising a small number of BS antennas. However, the complexity is comparatively high when compared to the OAMP-MMV-SSL and OAMP-MMV-ASL methods because they employ complexity reduction schemes, while the proposed algorithms still leverage the block-sparsity-based ASP method (Algorithm 1) for the MUD.
Additionally, it may seem that the proposed algorithms entail higher complexity than the DS-AMP algorithm.However, the latter relies on a massive number of antennas, whereas our methods can achieve satisfactory performance even with a small number of antennas, provided that the number of user clusters and the angular distribution range of users within each cluster match the number of antennas.The complexity of integrating SDMA and grant-free access is expected to be reduced by using specially designed MUD schemes.This aspect will be investigated in our future work.
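As a quick numerical illustration of the beamforming-update cost $C_{\mathrm{BF}} = M^3 + (KT + 1)M^2 + (Q + 1)M$ stated above, the following sketch evaluates it for the default simulation parameters used later ($K = 20$, $T = 7$, $Q = 40$). Only the beamforming term is evaluated; the ASP and IC terms are not reproduced here, so this is not the total complexity. The function name is illustrative.

```python
def beamforming_update_cost(M, K, T, Q):
    """Complex multiplications of one beamforming update:
    C_BF = M^3 + (K*T + 1) * M^2 + (Q + 1) * M."""
    return M**3 + (K * T + 1) * M**2 + (Q + 1) * M

# Cost for a few antenna counts with K = 20 subcarriers, T = 7 slots, Q = 40 users.
costs = {M: beamforming_update_cost(M, K=20, T=7, Q=40) for M in (4, 5, 6, 8)}
for M, c in costs.items():
    print(f"M = {M}: {c} complex multiplications per beamforming update")
```

For the default $M = 5$ this gives 3855 multiplications per update, dominated by the $(KT + 1)M^2$ term.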

VI. SIMULATION RESULTS
We now assess the MUD and DR performance of the proposed J-ABF-SP algorithms through simulations. A BS with $M$ antenna elements is considered, serving massive users simultaneously. The users are assumed to be grouped based on the channel correlation into $N \leq M$ clusters with $Q$ users in each cluster $n$, $n = 1, 2, \cdots, N$. Without loss of generality, we consider $N = 3$ and $Q = 40$. Assume the AoAs of the users in each cluster are randomly distributed over an angle range with a width of 5 degrees, with the central angles being −30, −10 and 10 degrees, respectively.
All users employ the common K = 20 subcarriers, unless specified otherwise.The same spreading signatures, generated in Appendix B, are utilised in all clusters.In this case, the frequency-domain system overloading factor is N Q/K = 600%, which increases linearly with the number of user clusters.We consider the user activity rate to be α n = 10%.Without loss of generality, we consider a typical value s o = 4 or s o = 5 for the number of active users in each cluster, which is far less than the number of the total users.Each data frame consists of T = 7 continuous symbol durations, following the LTE-Advanced standard [45].
We consider the detection error rate (DER) and the symbol error rate (SER) as performance metrics. For any cluster $n$, the DER is defined as $p_{d,n} = (f_n + m_n)/Q$, where $f_n$ and $m_n$ denote the number of FDIUs and the number of UDAUs, respectively. The SER is defined as $p_{s,n} = p_{d,n} + S_{e,n}/(QT)$, where $S_{e,n}$ denotes the number of erroneous symbols of DAUs. Both the DER and SER are calculated over a large number of independent trials. In the following, we consider the same input SNR $\delta_n$ for each user cluster $n \in \mathcal{N}$ and present the average values of the DERs or SERs of the $N$ clusters, unless noted otherwise.
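The two metrics can be computed for a single trial as in the sketch below. It assumes that the recovered signals have already been demapped to constellation symbols, so that symbol errors can be counted by direct comparison; the function name `der_ser` and the toy data are illustrative.

```python
import numpy as np

def der_ser(true_support, est_support, X_true, X_hat, Q):
    """Single-trial DER and SER of one cluster.

    true_support / est_support: iterables of active-user indices.
    X_true / X_hat: (Q, T) arrays of transmitted / decided symbols.
    DER = (f_n + m_n) / Q,  SER = DER + S_e / (Q * T).
    """
    true_support, est_support = set(true_support), set(est_support)
    f_n = len(est_support - true_support)  # falsely detected inactive users (FDIUs)
    m_n = len(true_support - est_support)  # undetected active users (UDAUs)
    der = (f_n + m_n) / Q
    T = X_true.shape[1]
    daus = sorted(true_support & est_support)  # detected active users (DAUs)
    s_e = int(np.sum(X_true[daus] != X_hat[daus])) if daus else 0
    ser = der + s_e / (Q * T)
    return der, ser

# Toy trial: Q = 40 users, T = 7 slots, one FDIU, one UDAU, two symbol errors.
Q, T = 40, 7
X_true = np.arange(Q * T).reshape(Q, T)  # placeholder symbol labels
X_hat = X_true.copy()
X_hat[5, 0] = -1
X_hat[5, 1] = -1
der, ser = der_ser({1, 5, 9, 20}, {1, 5, 9, 33}, X_true, X_hat, Q)
print(der, ser)
```

Averaging these single-trial values over many independent trials gives the DER and SER curves reported in the figures.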
We evaluate the performance of the proposed J-ABF-SP and J-ABF-SP-IC methods for the MUD and DR, in comparison with some benchmark methods, including the Oracle-BSASP [17], the OAMP-MMV-SSL [19], the OAMP-MMV-ASL [19] and the DS-AMP [34] methods. Without loss of generality, the transmitted symbols are randomly generated from the 16QAM constellation for all the users. In particular, the Oracle-BSASP method is evaluated with known user sparsity levels. The DS-AMP [34] algorithm relies on the number of BS antennas, and we consider a BS setup with $M_D = 150$ antennas for its simulation. The number of radio frequency (RF) mirrors is denoted as $M_{RF}$, e.g., $M_{RF} = 0$ or $M_{RF} = 2$. For the single-antenna benchmark algorithms, including the Oracle-BSASP [17], the OAMP-MMV-SSL [19] and the OAMP-MMV-ASL [19], we consider the single-antenna (e.g., the first antenna) reception of any one user cluster, without the interference from the other two clusters. For the proposed algorithms, $\bar{\gamma}_n = 3$ is selected as the sparsity decision threshold for each cluster $n$. We also consider ESNR = 13 dB for the SBF, an SNR of 2 dB and $M = 5$ antennas, unless specified otherwise. For clarity, Table III details the parameter settings for the different figures.
Fig. 6 shows the DERs versus the input SNR for different MUD methods. The proposed J-ABF-SP algorithm performs better in user detection than the Oracle-BSASP algorithm even though the latter knows the user sparsity level a priori. This is because both the SBF and ABF used in the J-ABF-SP can suppress the IpNC contained in the received signal, leading to a higher receiver signal-to-interference-plus-noise ratio (SINR) than that of the Oracle-BSASP. With increasing input SNR for each cluster, the power of the corresponding inter-cluster interference rises uniformly, leading to an SINR floor that induces a DER floor beyond a certain input SNR level, e.g., 1 dB. From another perspective, the J-ABF-SP algorithm can achieve extremely low DERs even at low SNRs, e.g., a DER of −60 dB at an SNR of 1 dB. In this regard, it is of little consequence that the J-ABF-SP presents a slightly higher DER than the OAMP-MMV algorithms once the SNR increases to a certain value, e.g., 4 dB. Additionally, the results show that the J-ABF-SP always outperforms the DS-AMP algorithms over the given SNR range.
Fig. 7 depicts the SERs across various input SNRs. Notably, the proposed J-ABF-SP algorithm achieves an SER gain of over 8 dB compared to the OAMP-MMV algorithms and performs notably better than the other benchmark algorithms. Furthermore, the J-ABF-SP-IC algorithm outperforms the J-ABF-SP algorithm. This improvement can be attributed to IC enhancing the SINR at the receiver.
Figs. 8 and 9 illustrate the DERs and the SERs with respect to the number of slots. The proposed algorithms achieve significantly lower DERs and SERs than the benchmark algorithms, even with only one slot in a frame. Moreover, the SER advantage of the proposed algorithms tends to grow with the number of slots and eventually converges. In particular, compared with the OAMP-MMV algorithms, the J-ABF-SP algorithm shows slightly inferior DER performance when the number of slots increases to 9, but demonstrates remarkable superiority in SER performance. This indicates that the SER for the DAUs achieved by the proposed algorithms is far lower than that of the OAMP-MMV algorithms.
We now study the impact of the number of antennas $M$ on the performance of the proposed algorithms. Fig. 10 illustrates the DER and SER of each cluster with respect to the number of antennas. Note that c1 is the abbreviation of cluster 1, similarly for c2 and c3, and ave. denotes the average value over the three clusters. The DERs of all clusters gradually decrease with the number of antennas. Specifically, the DER of cluster 2 is initially higher than those of the other two clusters with a small number of antennas, but approaches a similar value as the number of antennas increases. This is because cluster 2 is located spatially between the other two clusters and thus suffers from larger interference, but this impact is mitigated by the enhanced beamforming gain and spatial resolution provided by the increased number of antennas. Similarly, more antennas result in better SERs and smaller SER differences among different clusters. In addition, J-ABF-SP-IC outperforms J-ABF-SP in SER performance. Specifically, the SER performance is enhanced by more than 20 dB by increasing the number of antennas from 4 to 6, indicating a promising prospect for the integration of SDMA and CS for uplink grant-free communication.
We now study the importance of the dynamic update of beamforming weights for the MUD and DR performance. The zero-forcing beamforming (ZFBF) is used as a benchmark [46]. We compare the ZFBF-ASP, SBF-ASP, ZFBF-ASP-IC, and SBF-ASP-IC methods, which are obtained by selecting the initial beamforming (ZFBF or SBF) and skipping the beamforming and measurement updates in each iteration in both J-ABF-SP and J-ABF-SP-IC. Specifically, for the SBF-ASP and SBF-ASP-IC, two ESNRs are considered, i.e., 13 dB and 20 dB. We also consider unbalanced SNRs in distinct clusters, e.g., SNR = {2, 5, 3} dB for the corresponding clusters $n = \{1, 2, 3\}$, but with the same ESNR of 13 dB.
Figs. 11 and 12 show the DER and SER performance of individual clusters, respectively. The SBF-ASP achieves a DER or SER similar to that of the J-ABF-SP at ESNR = 13 dB, but degraded performance at ESNR = 20 dB, while the J-ABF-SP is insensitive to the ESNR. This indicates the importance of dynamic beamforming updates when the SNR is unknown a priori. In addition, when compared to the scenario with the same SNR (5 dB) in all clusters (red line), cluster 2 has a lower DER (SER) while the other two clusters have higher DERs (SERs) in the scenario with different SNRs in different clusters (blue line). This is because the inter-cluster interference for cluster 2 is weakened since the other two clusters have lower SNRs, while for clusters 1 and 3, their lower SNRs result in higher DERs (SERs). We also observe from Fig. 12 that enhanced SER performance can be obtained for all the methods when using IC.
The non-orthogonal ZC spreading sequences are considered for simulations, as in Appendix B. We now study the impact of the number of subcarriers $K$ (the length of the ZC sequences) on the performance of the proposed algorithms. It is evident from Fig. 13 that the performance improves gradually with an increasing number of subcarriers, irrespective of whether $K$ is prime. Furthermore, one can also observe the performance enhancement with a decreased user sparsity level $s_o$.
We now assess the performance of the proposed algorithms in scenarios where clusters have varying numbers of active users. Without loss of generality, we consider two cases for unbalanced clusters. One is that $s_o = \{5, 4, 6\}$ active users in clusters $n = \{1, 2, 3\}$, respectively. The other is that $s_o = \{6, 3, 6\}$ active users in clusters $n = \{1, 2, 3\}$, respectively. For comparison, we also examine the case with $s_o = 5$ active users in each cluster $n \in \{1, 2, 3\}$. Figs. 14 and 15 illustrate the DER and SER with respect to the input SNR and the number of slots, respectively. We observe that similar performance is obtained for both unbalanced and balanced user clusters.
We now explore the effects of channel state information (CSI) errors on the MUD and DR performance. Assume there is a random error on the small-scale random fading $\eta_{n,q,k}$, modelled as $\hat{\eta}_{n,q,k} \sim U(\eta_{n,q,k} - \delta_{\eta_{n,q,k}}, \eta_{n,q,k} + \delta_{\eta_{n,q,k}})$ with the half perturbation range $\delta_{\eta_{n,q,k}}$. Similarly, assume a random error on each element of the steering vector $a_{n,q,m}$ in the channel measurement, modelled as $\hat{a}_{n,q,m} \sim U(a_{n,q,m} - \delta_{a_{n,q,m}}, a_{n,q,m} + \delta_{a_{n,q,m}})$ with the half perturbation range $\delta_{a_{n,q,m}}$. Without loss of generality, we herein assume that the errors on both the small-scale fading and the steering vector elements follow the uniform distribution, with $U(a, b)$ denoting the uniform distribution on the range $[a, b]$. We consider the perturbation magnitudes $\delta_{\eta_{n,q,k}} = \eta_{n,q,k}\, p\%$ and $\delta_{a_{n,q,m}} = a_{n,q,m}\, p\%$ with the percentage $p$ given by 5 or 10. The simulation results are shown in Figs. 16 and 17. The legend 'DER, 5, 5' denotes the DER performance with 5% perturbation of the random fading and 5% perturbation of the steering vector elements, respectively. For the proposed J-ABF-SP algorithm, the CSI error causes a negligible DER degradation and a comparatively larger SER loss. In addition, the CSI error degrades the SER performance of the interference cancellation-based scheme (J-ABF-SP-IC) because the involved interference reconstruction relies on the CSI estimate. Overall, the performance degradation remains at an acceptable level, even with relatively large CSI errors.
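The uniform CSI perturbation model can be sketched as follows. The text does not spell out how the uniform distribution is applied to complex-valued quantities, so this sketch assumes (as a labelled assumption) that the real and imaginary parts are perturbed independently, each within $\pm p\%$ of its own magnitude. The function name `perturb_csi` is hypothetical.

```python
import numpy as np

def perturb_csi(h, p, rng):
    """Draw a perturbed copy of h, element-wise uniform within +/- p percent.

    ASSUMPTION: for complex h, the real and imaginary parts are perturbed
    independently, each within p percent of its own magnitude.
    """
    def _perturb(x):
        delta = np.abs(x) * p / 100.0  # half perturbation range
        return rng.uniform(x - delta, x + delta)

    h = np.asarray(h)
    if np.iscomplexobj(h):
        return _perturb(h.real) + 1j * _perturb(h.imag)
    return _perturb(h)

# Example: 5% perturbation of a small-scale fading vector over K = 20 subcarriers.
rng = np.random.default_rng(2)
eta = (rng.standard_normal(20) + 1j * rng.standard_normal(20)) / np.sqrt(2)
eta_hat = perturb_csi(eta, p=5, rng=rng)
print(np.max(np.abs(eta_hat - eta) / np.abs(eta)))  # relative error stays small
```

The same function can be applied to the steering vector elements with its own percentage, matching legends such as 'DER, 5, 5'.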

VII. CONCLUSION AND FUTURE WORK
In this paper, we presented a general framework for the integration of the SDMA with the CS-based grant-free NOMA for the mMTC.Two beamforming schemes were proposed for the realisation of SDMA.In particular, we developed a joint adaptive beamforming and subspace pursuit algorithm for the user detection and data recovery, with a novel user sparsity decision method without knowing the noise level.We also devised an interference cancellation scheme to further enhance the data recovery performance.
In the future, we will study the amalgamation of the SDMA and CS for the dynamic user sparsity-based grant-free NOMA. To reduce the complexity, we will also study the computationally efficient CS method for the user detection and data recovery.

APPENDIX A CHANNEL CORRELATION COEFFICIENT
The channel correlation between any two users is defined by the Pearson correlation coefficient, i.e.,

$$\rho_{q,p} \triangleq \frac{\left|(g_q - \bar{g}_q)^H (g_p - \bar{g}_p)\right|}{\|g_q - \bar{g}_q\|_2 \, \|g_p - \bar{g}_p\|_2}, \tag{49}$$

where $\bar{g}_q$ and $\bar{g}_p$ are the average values of all the elements in the vectors $g_q$ and $g_p$, respectively. In our work, the channel gain vector is defined as the product of the channel fading and the steering vector, i.e., $g_{n,q,k} = f_{n,q,k} a_{n,q}$. In fact, we can approximately substitute the average values in (49) with zeros since the channel fading factor $f_{n,q,k} = \rho_{n,q} \eta_{n,q,k}$ follows the complex Gaussian distribution with zero mean. Therefore, the channel correlation coefficient between users $u_{n,q}$ and $u_{l,p}$ can be given by,

$$\rho_{q,p} = \frac{\left|a_{n,q}^H a_{l,p}\right|}{\|a_{n,q}\|_2 \, \|a_{l,p}\|_2}. \tag{50}$$

It can be viewed as the correlation between the corresponding steering vectors. Note that the channel fading factors in (50) have been removed because they appear in both the denominator and the numerator. With the steering vector defined in (1), the channel correlation coefficient (50) can be further written as,

$$\rho_{q,p} = \frac{1}{M}\left|\sum_{m=0}^{M-1} e^{-j\pi m(\phi_{l,p} - \phi_{n,q})}\right| \tag{51}$$

$$= \left|\frac{\sin\!\left(\pi M(\phi_{l,p} - \phi_{n,q})/2\right)}{M \sin\!\left(\pi(\phi_{l,p} - \phi_{n,q})/2\right)}\right|, \tag{52}$$

where $\phi_{n,q} \triangleq 2d \sin(\theta_{n,q})/\lambda$. Note that (52) is the magnitude of a normalised Dirichlet kernel, whose square equals the Fejér kernel scaled by $1/M$, and it decays quickly as $|\phi_{l,p} - \phi_{n,q}|$ increases. This means that the correlation of two users' channel vectors is determined by the difference between their normalised directions $\phi_{l,p}$ and $\phi_{n,q}$. Therefore, the user clustering can be performed based on the a priori estimated channel information by the K-means method.
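The closed form can be checked numerically against the direct inner product of two steering vectors. The sketch below assumes a uniform linear array with half-wavelength spacing ($d/\lambda = 0.5$) and antenna index $m = 0, \ldots, M-1$; both are assumptions, since the steering vector definition (1) is not reproduced in this appendix. The function names are illustrative.

```python
import numpy as np

def steering_vector(theta_deg, M, d_over_lambda=0.5):
    """ULA steering vector with elements exp(-j*pi*m*phi), phi = 2*d*sin(theta)/lambda."""
    phi = 2 * d_over_lambda * np.sin(np.deg2rad(theta_deg))
    return np.exp(-1j * np.pi * np.arange(M) * phi)

def correlation_direct(theta1, theta2, M, d_over_lambda=0.5):
    """Normalised inner-product magnitude of two steering vectors."""
    a1 = steering_vector(theta1, M, d_over_lambda)
    a2 = steering_vector(theta2, M, d_over_lambda)
    return np.abs(np.vdot(a1, a2)) / (np.linalg.norm(a1) * np.linalg.norm(a2))

def correlation_closed_form(theta1, theta2, M, d_over_lambda=0.5):
    """|sin(pi*M*dphi/2) / (M*sin(pi*dphi/2))| with dphi the direction difference."""
    dphi = 2 * d_over_lambda * (np.sin(np.deg2rad(theta2)) - np.sin(np.deg2rad(theta1)))
    den = M * np.sin(np.pi * dphi / 2)
    if np.isclose(den, 0.0):
        return 1.0  # identical (or aliased) directions
    return np.abs(np.sin(np.pi * M * dphi / 2) / den)

M = 5
for t1, t2 in [(-30, -28), (-30, -10), (-10, 10)]:
    print(t1, t2, correlation_direct(t1, t2, M), correlation_closed_form(t1, t2, M))
```

With $M = 5$, two users 2 degrees apart (same cluster) are almost fully correlated, whereas users from clusters centred at −30 and −10 degrees are only weakly correlated, which is the property the clustering relies on.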

APPENDIX B ZADOFF-CHU SPREADING SEQUENCES
A ZC sequence of length $K$, consisting of $K$ complex numbers, can be denoted as $z_q = [z_{q,0}, z_{q,1}, \ldots, z_{q,K-1}]^T$. Each element of the $\beta$-root ZC sequence is given by [43] and [47],

$$s_{q,k} = \begin{cases} \exp\!\left(-j\pi\beta k(k + 1 + 2q)/K\right), & K \text{ is odd}, \\ \exp\!\left(-j\pi\beta k(k + 2q)/K\right), & K \text{ is even}, \end{cases}$$

where $K$ is the length of the sequence, $k = 0, 1, \cdots, K - 1$ is the index of the element in the sequence, the root index $\beta$, coprime to $K$, satisfies $0 < \beta < K$, and the shift index $q$ can be any integer. In our work, we formulate the ZC spreading signature for user $u_{n,q}$ by $s_{n,q,k} = s_{q,k}$, where $n = 1, 2, \cdots, N$ is the user cluster index and $q = 1, 2, \cdots, Q$ denotes the user index. We have the spreading signature vector $s_{n,q} = [s_{n,q,1}, s_{n,q,2}, \cdots, s_{n,q,K}]^T$. For simplicity, we have expressed the spreading signature vector of each user by its index $q$, while in fact, the $Q$ spreading vectors can be randomly assigned to the $Q$ users according to a permutation of the user indexes.
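The element-wise definition above translates directly into code. The following sketch generates one sequence for a given length $K$, root $\beta$ and shift $q$, and checks the constant-modulus property. The function name `zc_sequence` is illustrative, and how the $Q$ signatures are actually assigned across roots and shifts in the simulations is not specified here.

```python
import math
import numpy as np

def zc_sequence(K, beta, q):
    """Length-K ZC sequence with root beta (coprime to K) and shift index q."""
    if not (0 < beta < K) or math.gcd(beta, K) != 1:
        raise ValueError("beta must satisfy 0 < beta < K and be coprime to K")
    k = np.arange(K)
    if K % 2 == 1:
        phase = beta * k * (k + 1 + 2 * q) / K  # odd length
    else:
        phase = beta * k * (k + 2 * q) / K      # even length
    return np.exp(-1j * np.pi * phase)

# K = 20 subcarriers as in the default simulation setting, root beta = 1.
s1 = zc_sequence(K=20, beta=1, q=1)
s2 = zc_sequence(K=20, beta=1, q=2)
print(np.allclose(np.abs(s1), 1.0))  # every element has unit magnitude
print(abs(np.vdot(s1, s2)))          # inner product of two shifts of the same root
```

Because the shift $q$ only adds a linear phase $-2\pi\beta kq/K$, shifts that differ by a multiple of $K$ produce the same sequence for a fixed root.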

APPENDIX C THE MOORE-PENROSE INVERSE OF A BLOCK MATRIX WITH A FULL COLUMN RANK
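We now present a method for computing the Moore-Penrose (M-P) inverse of a block matrix with full column rank. We first consider a complex-valued block matrix with full column rank, i.e., $C = [A \;\; B]$, where both $A \in \mathbb{C}^{M \times n}$ and $B \in \mathbb{C}^{M \times q}$ have full column rank. Define the M-P inverse of $C$ as

$$C^\dagger = \begin{bmatrix} F \\ W^H \end{bmatrix},$$

where $F \in \mathbb{C}^{n \times M}$ and $W \in \mathbb{C}^{M \times q}$ are matrices to be determined by using the known $A$ and $B$. According to $C^\dagger C = I$, we have

$$FA = I, \tag{54}$$
$$FB = 0, \tag{55}$$
$$W^H A = 0, \tag{56}$$
$$W^H B = I. \tag{57}$$

We define $F = A^\dagger - GW^H$ with any matrix $G \in \mathbb{C}^{n \times q}$. In this case, (56) leads to (54), since $A^\dagger A = I$ for a full-column-rank $A$. Then, according to (55) and (57), we have $G = A^\dagger B$ and thus $F = A^\dagger - A^\dagger BW^H$. Subsequently, we need to solve $W$ from (56) and (57). From (56), we can find a matrix $U = (D + B) - AA^\dagger(D + B) \in \mathbb{C}^{M \times q}$ satisfying $U^H A = 0$, where $D$ is any matrix with matching dimensions and we have used $(AA^\dagger)^H = AA^\dagger$ and $AA^\dagger A = A$. We define $W = UJ$ with unknown $J$. According to (57), we have,

$$J^H U^H B = I. \tag{58}$$

We can easily find that $D = 0$ and $J = (U^H U)^{-1}$ are solutions, since $U^H B = U^H U$ when $D = 0$. Thus, we have $W = U(U^H U)^{-1}$ with $U = B - AA^\dagger B$.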
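The block construction can be verified numerically against a general-purpose pseudo-inverse. The sketch below uses $W = U(U^H U)^{-1}$ with $U = B - AA^\dagger B$ and the top block $F = A^\dagger - A^\dagger B W^H$, which is the form that satisfies $C^\dagger C = I$; the function name `block_pinv` and the random test matrices are illustrative.

```python
import numpy as np

def block_pinv(A, B):
    """M-P inverse of C = [A B] for full-column-rank C, built block-wise."""
    A_pinv = np.linalg.pinv(A)
    U = B - A @ A_pinv @ B                  # part of B orthogonal to range(A)
    W = U @ np.linalg.inv(U.conj().T @ U)   # W = U (U^H U)^{-1}
    F = A_pinv - A_pinv @ B @ W.conj().T    # top block of the inverse
    return np.vstack([F, W.conj().T])

rng = np.random.default_rng(3)
M, n, q = 8, 3, 2
A = rng.standard_normal((M, n)) + 1j * rng.standard_normal((M, n))
B = rng.standard_normal((M, q)) + 1j * rng.standard_normal((M, q))
C = np.hstack([A, B])

C_pinv = block_pinv(A, B)
print(np.allclose(C_pinv, np.linalg.pinv(C)))   # matches the generic pseudo-inverse
print(np.allclose(C_pinv @ C, np.eye(n + q)))   # left-inverse property
```

In the error analysis, $A$ plays the role of the columns of the detected active users and $B$ that of the falsely detected inactive users, so the bottom block $W^H$ is the weight applied to the IpNC in the FDIU estimates.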

APPENDIX D THE MONOTONIC DECREASE OF THE RESIDUAL ENERGY WITH RESPECT TO THE SPARSITY LEVEL
We now verify the monotonic decrease of the residual energy as the sparsity level increases up to the real one. With the stopping condition for the beamforming update reached, the residual energy for the sparsity $s$ can be derived in light of (15), (25 = 0. Thus, the IpNC estimate $\hat{i}_{n,k,t}$ in (60) contains the residual signal component of the DAUs, the signal component of the UDAUs and the real IpNC. The suppression of the signal component of the UDAUs in $\hat{i}_{n,k,t}$ is much weaker than that of the IpNC due to the beam constraint $\hat{b}_n^H \bar{a}_n = 1$. Thus, the residual energy $\varepsilon_s$ in (59) with $s < s_o$ mainly consists of the signal component of the UDAUs, followed by the suppressed IpNC.
As $s$ increases, the number of UDAUs decreases. Hence, the signal component of the UDAUs in $\hat{i}_{n,k,t}$ is weakened. Meanwhile, the suppression of $i_{n,k,t}$ by beamforming can be enhanced. Therefore, the residual energy $\varepsilon_s$ gradually decreases as the given sparsity $s$ increases up to $s_o$.

Manuscript received 20 April 2023; revised 3 September 2023 and 20 January 2024; accepted 28 February 2024. Date of publication 18 March 2024; date of current version 12 September 2024. This work was supported in part by the EU Horizon 2020 Project 6G BRAINS under Grant 101017226. The associate editor coordinating the review of this article and approving it for publication was K. Cohen. (Corresponding author: Yue Zhang.)

Fig. 2. Frame structure of the second grant-free access type.

Fig. 3. System architecture of the integration of SDMA and grant-free NOMA.

Fig. 4. A general framework of the integration of SDMA and CS-based grant-free NOMA.
With the measurement $\hat{\eta}_n$ and the parameter matrix $\hat{D}_n$ of cluster $n$, we can use the ASP algorithm in Algorithm 1 to estimate the user support set and the transmitted signals. The finding function $F(V, \zeta)$ in Algorithm 1 selects the indices of the $\zeta$ largest elements of an ordered set/vector $V$. The main steps in Algorithm 1 are detailed as follows:
Step 3: Estimate the support set $\Lambda$ by adding the currently selected $s$ users with larger residual energy to the previously estimated support set $\hat{\Gamma}_{n,\iota}$.

Algorithm 2 Joint Adaptive Beamforming and Subspace Pursuit Algorithm: User Detection
Input: The received signals $Y$, equivalent channel matrices $G_n$, number of time slots $T$, upper bound $\bar{s}$ for the user sparsity level, SBF weight $b_n^{SBF}$ in (13), diagonal loading factor $\epsilon$, stopping factor $\vartheta_1$, average steering vector $\bar{a}_n$, and the maximum iteration $L_1$ for user detection.
Output: Reconstructed sparse signal $X_{n,1}$, support set $\Gamma_n$ and residual energy $e_n$ for each $n \in \mathcal{N}$.
1: for each cluster $n \in \mathcal{N}$ do
2:   (Support initialisation) Null initial support set $\Gamma_0 = \emptyset$.
3:   (Measurement initialisation) Compute $\eta_n$ and $D_n$ via (22) and (23) by using $b_n^{SBF}$ to replace $\hat{b}_n$.
4:   for sparsity $s = 1$ to $\bar{s}$ do
5:     (Measurement initialisation) The iterative index $z = 1$, $\hat{\eta}_n = \eta_n$ and $\hat{D}_n = D_n$.
6:     (Residual and support initialisation) $r_z = \hat{\eta}_n$ and $\Gamma_z = \Gamma_{s-1}$.

Fig. 8. The DER regarding the number of slots.

Fig. 9. The SER regarding the number of slots.

Fig. 10. The DER and SER regarding the number of antennas.

Fig. 13. The DER and SER regarding the number of subcarriers.

Fig. 14. DER and SER regarding the input SNR with unbalanced clusters.

Fig. 15. DER and SER regarding the frame length with unbalanced clusters.

Fig. 16. The performance robustness to the CSI error under different input SNRs.

Fig. 17. The performance robustness to the CSI error under different frame lengths.

TABLE I A SUMMARY OF THE KEY VARIABLES IN THIS PAPER

while the signal estimates for FDIUs are constituted by the IpNC weighted by $W^H$. Since $D_n[\Gamma_s \setminus \Gamma_n, T]W^H$ has unit non-zero eigenvalues as $W^H D_n[\Gamma_s \setminus \Gamma_n, T] = I$, (38) is subject to relatively minor interference from the IpNC and may yield more accurate signal estimates than those from (33).

Algorithm 3 Interference Cancellation Enhanced Signal Recovery
Input: The received signals $Y$, equivalent channel matrices $G_n$, number of consecutive time slots $T$, diagonal loading factor $\epsilon$, average steering vector $\bar{a}_n$, maximum numbers of iterations $L_2$ and $L_3$, active user set $\Gamma_n$, initial error $e_n$ and initial signal estimate $X_{n,1}$.
(Weight initialisation) For each cluster $n$, $\hat{X}_n = X_{n,1}$, $\hat{i}_{n,k,t} = y_{k,t} - G_{n,k}\hat{x}_{n,t}$, compute $\hat{b}_n$ by (26).

TABLE III THE PARAMETERS FOR DIFFERENT SIMULATIONS

Fig. 6. The DER with respect to SNR.