Double Deep Learning for Joint Phase-Shift and Beamforming Based on Cascaded Channels in RIS-Assisted MIMO Networks

This letter investigates a machine learning approach to the joint optimization of the phase shift and beamforming in a reconfigurable intelligent surface (RIS) assisted multiple-input multiple-output (MIMO) network consisting of one source node, one RIS panel, and one destination node. If the individual source-to-RIS and RIS-to-destination channels are known, the joint optimization is similar to that in a traditional MIMO network, which has been well studied. However, estimating the individual channels is complicated and often inaccurate. On the other hand, while the cascaded source-RIS-destination channels are easier to estimate, the corresponding joint optimization is complicated. In this letter, we propose a novel double deep learning network model that outperforms conventional reinforcement learning in the RIS joint optimization. Numerical simulations are given to verify the proposed algorithm.


I. PROBLEM STATEMENT
The reconfigurable intelligent surface (RIS) has attracted much attention [1]. It has various applications such as unmanned aerial vehicles [2], [3], energy efficiency [4], and secure communications [5]. A typical RIS-assisted MIMO network is shown in Fig. 1. When the source node S has multiple antennas, the phase shift at the RIS and the beamforming at the source must be jointly optimized, which requires the channel state information (CSI) of all links. There are two types of CSI: the individual channels of the S → RIS and RIS → D links, and the cascaded channels of the end-to-end S → RIS → D links.
The individual S → RIS and RIS → D channels are difficult to estimate (e.g., [6]). In [7], the individual S → RIS and RIS → D channels are estimated up to some ambiguity. In [8], a tensor modeling approach was used for individual channel estimation in the MIMO RIS network. In [9], an iterative approach was applied to estimate the individual channels. On the other hand, the cascaded S → RIS → D channels are easier to estimate. In [10] and [11], cascaded channel estimation was proposed for the SISO (i.e., a single antenna at both the source and destination nodes) double-RIS-panel network and the MISO RIS-assisted network, respectively. In [12], deep learning was used for cascaded channel estimation in MISO OFDM. However, relying on the cascaded channels often leads to a complicated joint phase shift and beamforming design, because it is difficult to express the end-to-end channel capacity explicitly in terms of the cascaded channels, particularly when multiple antennas are applied at both the source and destination nodes. Conversely, while the individual channels are difficult to estimate, the corresponding joint phase shift and beamforming design is more conventional. Joint phase shift and beamforming schemes based on both individual and cascaded channels have been proposed. For example, in [8] and [9], the joint beamforming and phase adjustment were based on individual channel estimation for the RIS-assisted MIMO network. In [13], the phase adjustment was optimized based on the cascaded channel in the SISO OFDM RIS network, in which no beamforming at the source node is necessary. Following that, in [14], the authors further considered RIS-assisted MIMO beamforming based on the cascaded channel, where the beamforming and phase adjustment are carried out iteratively. The RIS phases are also optimized iteratively: at each step, only the phase of one RIS element is optimized while all other phases are fixed.
The aforementioned approaches based on mathematical optimization involve heavy online computation. Particularly for the MIMO RIS network, iterative optimization of the beamforming and phase shift is necessary. This not only imposes an even heavier online computational burden but also raises convergence issues, making it hard to implement. Moreover, the optimizations are usually based on simplified models, for example, by assuming constant amplitude gains for different phase shifts. When this is not the case [15], [16], the optimization becomes more complicated.
Recent progress in machine learning provides attractive alternatives for RIS phase shift design (e.g., [17], [18], [19], [20], [21]). Most of these approaches apply reinforcement learning (RL) or related algorithms, which are, however, not ideal for the RIS system. This is because RL is intended for correlated data samples, in which the state transitions from one to another depending on the 'actions'. This is not the case in the RIS phase shift problem, as we explain in Section III. Particularly for an RIS with a large number of elements, it would be hard for RL to converge, as will be verified in the simulations. In this letter, we propose a novel model consisting of two deep neural networks. Unlike RL, the proposed double deep neural network works well for a large number of RIS elements.

II. SYSTEM MODEL
The system model of the RIS-assisted MIMO network is shown in Fig. 1, where there are one source node S with N antennas, one destination node D with M antennas, and an RIS with K reflecting elements. We assume there is no direct link between S and D due to severe blocking or deep fading. The channel coefficient between the nth antenna at S and the kth RIS element follows Rician fading as

$$h_{k,n} = \sqrt{\frac{K_{k,n}}{K_{k,n}+1}}\, h^{\mathrm{LOS}}_{k,n} + \sqrt{\frac{1}{K_{k,n}+1}}\, h^{\mathrm{NLOS}}_{k,n}, \tag{1}$$

where $K_{k,n}$ is the Rician factor for the corresponding link. The line-of-sight (LOS) component $h^{\mathrm{LOS}}_{k,n}$ is determined by θ, the angle of arrival at the RIS, and $\beta_0$, the path loss at the reference distance of one meter. The non-line-of-sight (NLOS) component $h^{\mathrm{NLOS}}_{k,n}$ is expressed in terms of $\tilde{h}_{k,n}$, which models the complex-Gaussian small-scale fading with zero mean and unit variance. The channel between the kth RIS element and the mth antenna at D is denoted as $g_{m,k}$ and is modeled similarly. The received signal at the destination is given by

$$\mathbf{d} = \mathbf{G}\boldsymbol{\Theta}\mathbf{H}\mathbf{x} + \boldsymbol{\eta}, \tag{3}$$

where $\mathbf{d} \in \mathbb{C}^{M\times 1}$, $\mathbf{x} \in \mathbb{C}^{N\times 1}$ is the transmit signal vector, $\mathbf{H} \in \mathbb{C}^{K\times N}$ is the channel matrix between S and the RIS, $\mathbf{G} \in \mathbb{C}^{M\times K}$ is the channel matrix between the RIS and D, $\boldsymbol{\eta} \in \mathbb{C}^{M\times 1}$ is the additive white Gaussian noise (AWGN) vector with variance $\sigma^2$, and $\boldsymbol{\Theta}$ is the phase shift matrix at the RIS, given by

$$\boldsymbol{\Theta} = \mathrm{diag}\left(a_1 e^{j\theta_1}, \ldots, a_K e^{j\theta_K}\right), \tag{4}$$

where $a_k$ and $\theta_k$ are the amplitude and phase shift at the kth reflecting element, respectively. While $a_k$ is assumed to be constant in many existing approaches, it can also be nonconstant [15], [16]. In this letter, we assume discrete phase shifts with R-bit quantization, so that there are $2^R$ possible phase shifts for every RIS element.
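A minimal NumPy sketch of the signal model (1), (3), (4) may help fix the dimensions. It assumes unit amplitudes $a_k = 1$, phases on a uniform R-bit grid $2\pi i/2^R$, and no path loss. The LOS term is a hypothetical placeholder (unit-modulus entries with random phases), because the letter's exact LOS expression is not reproduced here; the helper names `rician_channel` and `ris_matrix` are illustrative only.

```python
import numpy as np


def rician_channel(shape, kappa_db, rng):
    """Rician coefficients as in (1), normalized to unit average power.

    Hypothetical simplification: path loss is omitted and the LOS part is a
    unit-modulus placeholder with random phase, not the letter's LOS model.
    """
    kappa = 10.0 ** (kappa_db / 10.0)
    los = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=shape))
    # NLOS part: zero-mean, unit-variance circularly symmetric complex Gaussian.
    nlos = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)
    return np.sqrt(kappa / (kappa + 1.0)) * los + np.sqrt(1.0 / (kappa + 1.0)) * nlos


def ris_matrix(phase_idx, R, amplitude=1.0):
    """Theta in (4) for R-bit phases theta_k = 2*pi*i_k / 2^R (assumed uniform grid)."""
    theta = 2.0 * np.pi * np.asarray(phase_idx) / 2 ** R
    return np.diag(amplitude * np.exp(1j * theta))


rng = np.random.default_rng(0)
N, M, K, R = 2, 2, 8, 3                  # antennas at S, antennas at D, RIS elements, bits
sigma2 = 0.1                             # noise variance (illustrative value)
H = rician_channel((K, N), 10.0, rng)    # S -> RIS channel, K x N
G = rician_channel((M, K), 10.0, rng)    # RIS -> D channel, M x K
Theta = ris_matrix(rng.integers(0, 2 ** R, size=K), R)
x = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2.0)
eta = np.sqrt(sigma2 / 2.0) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
d = G @ Theta @ H @ x + eta              # received signal (3), length M
```

The LOS/NLOS split is kept only so that the Rician factor has the same role as in (1); swapping in a geometric LOS model would change `los` and nothing else.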

A. Cascaded Channel Estimation
We assume that the destination D estimates the channels and sends the estimates to the source S via backhaul links. In some applications, S directly estimates the D → S channels and obtains the S → D coefficients through channel reciprocity. Because multiple antennas are assumed at both S and D, the channel estimation described in this section can be readily applied in both cases.
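Defining the cascaded channel coefficient as $f_{n,k,m} = h_{k,n} g_{m,k}$, we have the cascaded channel vector between the nth antenna at S and the mth antenna at D as

$$\mathbf{f}_{n,m} = \left[f_{n,1,m}, \ldots, f_{n,K,m}\right]^T \in \mathbb{C}^{K\times 1}. \tag{5}$$

The cascaded channel vector for the mth receiving antenna at D is then defined as

$$\mathbf{f}_m = \left[\mathbf{f}_{1,m}^T, \ldots, \mathbf{f}_{N,m}^T\right]^T \in \mathbb{C}^{NK\times 1}. \tag{6}$$

Then (3) is expressed as

$$\mathbf{d} = \mathbf{F}^T \mathbf{x}_\phi + \boldsymbol{\eta}, \tag{7}$$

where $\mathbf{F} = [\mathbf{f}_1, \ldots, \mathbf{f}_M] \in \mathbb{C}^{NK\times M}$, $\mathbf{x}_\phi = \mathbf{x} \otimes \boldsymbol{\phi} \in \mathbb{C}^{NK\times 1}$, and $\boldsymbol{\phi} = [\phi_1, \ldots, \phi_K]^T$ with $\phi_k = a_k e^{j\theta_k}$. Regarding (7) as an NK-by-M MIMO model and assuming there are P pilot snapshots for the channel estimation, we have

$$\mathbf{D} = \mathbf{F}^T \mathbf{X}_\phi + \mathbf{Z}, \tag{8}$$

where $\mathbf{D} \in \mathbb{C}^{M\times P}$ collects the received pilots, the pth column of $\mathbf{X}_\phi \in \mathbb{C}^{NK\times P}$ is $\mathbf{x}^{(p)} \otimes \boldsymbol{\phi}^{(p)}$, and $\mathbf{Z}$ is the noise matrix. The pilots for the cascaded channel estimation involve both the transmit signals $x_n$ and the phase shifts $\phi_k$, which need to be jointly designed such that $P \geq NK$ (so that the NK unknowns in each column of F can be resolved) and $\mathbf{X}_\phi \mathbf{X}_\phi^H = P_s/N \cdot \mathbf{I}$, where $P_s$ is the power constraint at S.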
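The estimation model (8) can be illustrated with a least-squares sketch. The pilot design below is hypothetical: P = NK snapshots whose columns are Kronecker products of DFT columns (scaled transmit pilots) and DFT columns (RIS phase patterns with $a_k = 1$), which makes $\mathbf{X}_\phi\mathbf{X}_\phi^H$ a scaled identity. Its scale ($P_s K$) follows from this particular normalization and is not meant to reproduce the letter's constant. The function names are illustrative.

```python
import numpy as np


def cascaded_channel(H, G):
    """F in C^{NK x M}: F[n*K + k, m] = h_{k,n} * g_{m,k}, stacked as in (5)-(6)."""
    K, N = H.shape
    M = G.shape[0]
    return np.einsum("kn,mk->nkm", H, G).reshape(N * K, M)


def dft(n):
    """n x n DFT matrix: unit-modulus entries, mutually orthogonal columns."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)


def pilot_matrix(N, K, Ps):
    """Hypothetical pilot design with P = NK snapshots.

    Column i*K + j of X_phi is x^(i) kron phi^(j), where x^(i) is a scaled DFT
    column (transmit pilot) and phi^(j) is a DFT column (RIS pattern, a_k = 1).
    """
    return np.kron(np.sqrt(Ps / N) * dft(N), dft(K))


def ls_estimate(D, X_phi):
    """Least-squares estimate of F from D = F^T X_phi + Z, with D of size M x P."""
    F_T = D @ X_phi.conj().T @ np.linalg.inv(X_phi @ X_phi.conj().T)
    return F_T.T


rng = np.random.default_rng(1)
N, M, K, Ps, sigma2 = 2, 2, 8, 1.0, 1e-3


def cn(*shape):
    """Zero-mean, unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)


H, G = cn(K, N), cn(M, K)
F = cascaded_channel(H, G)
X_phi = pilot_matrix(N, K, Ps)                              # NK x P, here P = NK
D = F.T @ X_phi + np.sqrt(sigma2) * cn(M, X_phi.shape[1])   # received pilots (8)
F_hat = ls_estimate(D, X_phi)
```

Because the pilots are orthogonal, the matrix inverse reduces to a scalar division; `np.linalg.inv` is kept so the sketch also works for any full-rank $\mathbf{X}_\phi$ with $P \geq NK$.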

B. Phase Shift and Beamforming Based on Cascaded Channels
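In order to jointly obtain the beamforming at the source and the phase shift at the RIS, (3) can be rewritten as

$$\mathbf{d} = \mathbf{F}^T \boldsymbol{\Phi}_v \mathbf{1}_F \mathbf{x} + \boldsymbol{\eta}, \tag{10}$$

where $\boldsymbol{\Phi}_v = \mathbf{I}_N \otimes \mathrm{diag}(\boldsymbol{\phi}) \in \mathbb{C}^{NK\times NK}$, $\mathbf{1}_F = \left[\mathbf{1}_1^T, \ldots, \mathbf{1}_N^T\right]^T \in \mathbb{R}^{NK\times N}$, and $\mathbf{1}_n$ is the K-by-N matrix whose nth column consists of ones and whose other elements are zeros.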
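The rewriting in (10) rests on the identity $\mathbf{F}^T\boldsymbol{\Phi}_v\mathbf{1}_F = \mathbf{G}\boldsymbol{\Theta}\mathbf{H}$, since $\boldsymbol{\Phi}_v\mathbf{1}_F\mathbf{x} = \mathbf{x}\otimes\boldsymbol{\phi}$. A short numerical check under the definitions above, with random Gaussian channels and $a_k = 1$ as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, K = 3, 2, 4


def cn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)


H, G, x = cn(K, N), cn(M, K), cn(N)
phi = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=K))   # a_k = 1 assumed

F = np.einsum("kn,mk->nkm", H, G).reshape(N * K, M)   # cascaded channel (7), NK x M
Phi_v = np.kron(np.eye(N), np.diag(phi))              # I_N kron diag(phi), NK x NK
ones_F = np.kron(np.eye(N), np.ones((K, 1)))          # [1_1; ...; 1_N], NK x N

A = F.T @ Phi_v @ ones_F                              # effective M x N channel in (10)
A_direct = G @ np.diag(phi) @ H                       # G Theta H from (3)
print(np.allclose(A, A_direct))                       # prints True
```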
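Letting Q = min(M, N), the transmit signal at S can be expressed as

$$\mathbf{x} = \mathbf{W}\mathbf{s}, \tag{11}$$

where $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_Q]$, $\mathbf{w}_q$ is the qth beamforming vector, $\mathbf{s} = [s_1, \ldots, s_Q]^T$, $s_q$ is the qth transmit data symbol, and we assume $E[|s_q|^2] = 1$ for all q. Substituting (11) into (10) gives

$$\mathbf{d} = \mathbf{F}^T \boldsymbol{\Phi}_v \mathbf{1}_F \mathbf{W}\mathbf{s} + \boldsymbol{\eta}. \tag{12}$$

The joint beamforming and phase shift design aims to maximize the channel capacity between S and D as

$$\max_{\mathbf{W}, \boldsymbol{\Phi}_v} \log_2 \det\left(\mathbf{I}_M + \frac{1}{\sigma^2}\mathbf{A}\mathbf{W}\mathbf{W}^H\mathbf{A}^H\right) \quad \text{s.t.} \quad \mathrm{tr}\left(\mathbf{W}\mathbf{W}^H\right) \leq P_s, \tag{13}$$

where $\mathbf{A} = \mathbf{F}^T\boldsymbol{\Phi}_v\mathbf{1}_F \in \mathbb{C}^{M\times N}$. The singular value decomposition (SVD) of A is given by $\mathbf{A} = \mathbf{U}\tilde{\boldsymbol{\Lambda}}\mathbf{V}^H$, where $\mathbf{U} \in \mathbb{C}^{M\times M}$ and $\mathbf{V} \in \mathbb{C}^{N\times N}$ contain the left and right singular vectors, respectively, and $\tilde{\boldsymbol{\Lambda}} \in \mathbb{C}^{M\times N}$ is the singular value matrix.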
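For a given $\boldsymbol{\Phi}_v$, the optimum beamforming vectors are

$$\mathbf{w}_q = \sqrt{P_q}\,\mathbf{v}_q, \quad q = 1, \ldots, Q, \tag{14}$$

where $\mathbf{v}_q$ is the qth right singular vector in V and $P_q$ is the power allocated to the qth data stream, obtained through water-filling to satisfy the transmit power constraint $\sum_{q=1}^{Q} P_q \leq P_s$. Substituting (14) into (13) gives

$$\max_{\boldsymbol{\Phi}_v} \log_2\det\left(\mathbf{I}_M + \frac{1}{\sigma^2}\tilde{\boldsymbol{\Lambda}}\mathbf{P}\tilde{\boldsymbol{\Lambda}}^H\right), \tag{15}$$

where $\mathbf{P} = \mathrm{diag}[P_1, \ldots, P_Q, 0, \ldots, 0] \in \mathbb{R}^{N\times N}$. Solving (15) is complicated because it is hard to express P and $\tilde{\boldsymbol{\Lambda}}$ explicitly in terms of the phase shifts.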
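For a fixed phase shift, the inner problem of (13) is the standard SVD-plus-water-filling solution in (14) and (15). A minimal sketch, assuming the usual water-filling rule $P_q = \max(\mu - \sigma^2/\lambda_q^2, 0)$ with the water level $\mu$ chosen so that the powers sum to $P_s$; the function names are illustrative:

```python
import numpy as np


def water_filling(gains, Ps, sigma2):
    """Water-filling powers P_q = max(mu - sigma2 / gain_q, 0), sum P_q = Ps.

    gains are the squared singular values lambda_q^2 of A.
    """
    gains = np.asarray(gains, dtype=float)
    inv = np.full_like(gains, np.inf)
    pos = gains > 0
    inv[pos] = sigma2 / gains[pos]          # "floor" heights sigma2 / lambda_q^2
    order = np.argsort(inv)                 # strongest eigenmodes first
    P = np.zeros_like(gains)
    for n_active in range(len(gains), 0, -1):
        active = order[:n_active]
        mu = (Ps + inv[active].sum()) / n_active    # candidate water level
        if mu > inv[active].max():          # every active mode gets positive power
            P[active] = mu - inv[active]
            break
    return P


def optimal_capacity(A, Ps, sigma2):
    """Capacity (15) for a given A, with beamformers w_q = sqrt(P_q) v_q from (14)."""
    _, s, Vh = np.linalg.svd(A)             # A = U Lambda V^H; s has min(M, N) entries
    gains = s ** 2
    P = water_filling(gains, Ps, sigma2)
    W = Vh.conj().T[:, : len(s)] * np.sqrt(P)   # column q is sqrt(P_q) v_q
    C = float(np.sum(np.log2(1.0 + P * gains / sigma2)))
    return C, W


rng = np.random.default_rng(3)
M, N, Ps, sigma2 = 2, 3, 1.0, 0.1
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2.0)
C, W = optimal_capacity(A, Ps, sigma2)
```

Evaluating the sum of per-mode rates avoids forming the determinant in (15); both give the same value because $\mathbf{A}\mathbf{W}\mathbf{W}^H\mathbf{A}^H = \mathbf{U}\tilde{\boldsymbol{\Lambda}}\mathbf{P}\tilde{\boldsymbol{\Lambda}}^H\mathbf{U}^H$.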

III. DOUBLE DEEP LEARNING FOR THE RIS PHASE SHIFT
Two machine learning methods are widely used in communications: supervised learning and reinforcement learning. Unfortunately, neither is ideal in this case. If supervised learning is applied, the labeled samples need to be generated by iterative optimization (e.g., [14]), so the learning performance is upper bounded by that of the iterative optimization, which is not only complicated but also inaccurate, even when it converges. Reinforcement learning, on the other hand, does not rely on labeled data, but it is intended for correlated data samples. In this case, however, the data samples for reinforcement learning would be the channel coefficients, which fade independently from one realization to the next. A new learning model, which neither relies on mathematical optimization nor is a form of reinforcement learning, is proposed below.

A. Double Deep Learning Structure
The new learning model consists of two deep neural networks, $\mathcal{N}_1$ and $\mathcal{N}_2$, shown in Fig. 2(a) and 2(b), respectively. Network $\mathcal{N}_1$ is the main network. It has M × N × K inputs, corresponding to the entries of the cascaded channel matrix F defined in (7). The output of $\mathcal{N}_1$ consists of K groups of $2^R$-dimensional categorical vectors, where each group corresponds to the R-bit discrete phase shift at one RIS element.
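How the K groups of $2^R$ outputs map to phase shifts can be sketched as follows. The element-by-element output ordering and the uniform phase grid $2\pi i/2^R$ are assumptions for illustration, and `decode_phase_groups` is a hypothetical helper name.

```python
import numpy as np


def decode_phase_groups(outputs, K, R):
    """Turn K*2^R output scores into K one-hot vectors and phase shifts.

    Assumptions: outputs are ordered element by element, and level i maps to
    theta = 2*pi*i / 2^R (uniform R-bit grid).
    """
    levels = 2 ** R
    scores = np.asarray(outputs, dtype=float).reshape(K, levels)  # one row per element
    idx = scores.argmax(axis=1)              # most likely level for each element
    onehot = np.eye(levels)[idx]             # categorical output vectors
    theta = 2.0 * np.pi * idx / levels       # discrete phase shifts in radians
    return onehot, theta


scores = [0.1, 0.7, 0.1, 0.1,    # element 1 -> level 1
          0.2, 0.1, 0.1, 0.6]    # element 2 -> level 3
onehot, theta = decode_phase_groups(scores, K=2, R=2)
```

Here `theta` comes out as $[\pi/2, 3\pi/2]$, one phase per RIS element.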
It is hard to apply conventional supervised learning to train $\mathcal{N}_1$, because the required labeled data would be the optimum phase shifts, which are difficult to obtain, as shown earlier. On the other hand, for a reasonably large number of RIS elements K and phase quantization bits R, a brute-force search for the optimal phase shifts quickly becomes computationally prohibitive, because there are $2^{KR}$ possible phase shift configurations for every channel realization.
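The $2^{KR}$ growth can be made concrete with a brute-force sketch for the simplest case N = M = 1, where the capacity is $\log_2(1 + P_s|\sum_k f_k\phi_k|^2/\sigma^2)$. The uniform phase grid and $a_k = 1$ are assumptions; the search is only feasible for very small K.

```python
import itertools

import numpy as np


def search_space_size(K, R):
    """Number of phase configurations: (2^R)^K = 2^(K*R)."""
    return 2 ** (K * R)


def brute_force_siso(f, Ps, sigma2, R):
    """Exhaustive search over all 2^(K*R) configurations for N = M = 1.

    f holds the K cascaded coefficients f_k = h_k g_k; with one antenna at each
    end, the capacity is log2(1 + Ps |sum_k f_k phi_k|^2 / sigma2).
    """
    levels = np.exp(2j * np.pi * np.arange(2 ** R) / 2 ** R)   # assumed uniform grid, a_k = 1
    best_C, best_idx = -np.inf, None
    for idx in itertools.product(range(2 ** R), repeat=len(f)):
        gain = abs(np.dot(f, levels[list(idx)])) ** 2
        C = np.log2(1.0 + Ps * gain / sigma2)
        if C > best_C:
            best_C, best_idx = C, idx
    return best_C, best_idx


print(search_space_size(4, 3))              # prints 4096
print(search_space_size(32, 3) == 2 ** 96)  # prints True
```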
To solve this problem, a complementary neural network $\mathcal{N}_2$ is introduced. The inputs to $\mathcal{N}_2$ are F and $\boldsymbol{\Phi}_v$, and the output is the corresponding capacity $C_{F,\Phi}$.
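The mapping that $\mathcal{N}_2$ approximates, $(\mathbf{F}, \boldsymbol{\Phi}_v) \mapsto C_{F,\Phi}$, can be evaluated directly with the SVD and water-filling of Section II-B, which is what makes its labeled data cheap (the generation procedure is given in Section III-B). A minimal sketch of one input/label pair follows; the real-valued input encoding (real parts, imaginary parts, one-hot phases) is a hypothetical choice, as are the helper names and $a_k = 1$.

```python
import numpy as np


def water_filling(gains, Ps, sigma2):
    """P_q = max(mu - sigma2 / gain_q, 0) with sum P_q = Ps (gains > 0 assumed)."""
    inv = sigma2 / gains
    order = np.argsort(inv)
    P = np.zeros_like(gains)
    for n in range(len(gains), 0, -1):
        active = order[:n]
        mu = (Ps + inv[active].sum()) / n
        if mu > inv[active].max():
            P[active] = mu - inv[active]
            break
    return P


def capacity(F, phase_idx, R, Ps, sigma2):
    """C_{F,Phi} for cascaded channel F (NK x M) and R-bit phase indices."""
    K = len(phase_idx)
    N = F.shape[0] // K
    phi = np.exp(2j * np.pi * np.asarray(phase_idx) / 2 ** R)   # a_k = 1 assumed
    Phi_v = np.kron(np.eye(N), np.diag(phi))
    ones_F = np.kron(np.eye(N), np.ones((K, 1)))
    s = np.linalg.svd(F.T @ Phi_v @ ones_F, compute_uv=False)
    P = water_filling(s ** 2, Ps, sigma2)
    return float(np.sum(np.log2(1.0 + P * s ** 2 / sigma2)))


def n2_input(F, phase_idx, R):
    """Hypothetical real-valued encoding of (F, Phi_v): [Re F, Im F, one-hot phases]."""
    onehot = np.eye(2 ** R)[np.asarray(phase_idx)]
    return np.concatenate([F.real.ravel(), F.imag.ravel(), onehot.ravel()])


rng = np.random.default_rng(5)
N, M, K, R, Ps, sigma2 = 2, 2, 4, 3, 1.0, 0.1
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2.0)
G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2.0)
F = np.einsum("kn,mk->nkm", H, G).reshape(N * K, M)   # cascaded channel (7)
phase_idx = rng.integers(0, 2 ** R, size=K)           # randomly generated phase shift
features = n2_input(F, phase_idx, R)                  # input to N_2
label = capacity(F, phase_idx, R, Ps, sigma2)         # labeled output C_{F,Phi}
```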

B. Training the Double Deep Neural Networks
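The complementary network $\mathcal{N}_2$ is first trained as a conventional supervised deep neural network. Suppose there are L pairs of labeled data; the lth pair is generated as follows:
• The lth labeled inputs: randomly generate one realization of the S → RIS and RIS → D channels, from which the cascaded channel $\mathbf{F}^{(l)}$ is obtained. Then randomly generate the phase shift $\boldsymbol{\Phi}_v^{(l)}$.
• Obtain the corresponding labeled output (i.e., the capacity) as

$$C^{(l)} = \log_2\det\left(\mathbf{I}_M + \frac{1}{\sigma^2}\tilde{\boldsymbol{\Lambda}}^{(l)}\mathbf{P}^{(l)}\left(\tilde{\boldsymbol{\Lambda}}^{(l)}\right)^H\right), \tag{16}$$

where $\tilde{\boldsymbol{\Lambda}}^{(l)}$ and $\mathbf{P}^{(l)}$ are obtained from the SVD of $\mathbf{A}^{(l)} = (\mathbf{F}^{(l)})^T\boldsymbol{\Phi}_v^{(l)}\mathbf{1}_F$ and water-filling, as in Section II-B.
• The lth pair of labeled data is then obtained as

$$\left\{\left(\mathbf{F}^{(l)}, \boldsymbol{\Phi}_v^{(l)}\right), C^{(l)}\right\}. \tag{17}$$

Network $\mathcal{N}_2$ is trained to minimize a loss function between its predicted capacity $\mathcal{N}_2(\mathbf{F}^{(l)}, \boldsymbol{\Phi}_v^{(l)})$ and the labeled capacity $C^{(l)}$ over the L pairs. After $\mathcal{N}_2$ is trained, it is cascaded with $\mathcal{N}_1$ as shown in Fig. 2(c). The cascaded network is then trained to maximize the system capacity or, equivalently, to minimize the negative of the cascaded network output,

$$\mathcal{L} = -\frac{1}{L}\sum_{l=1}^{L}\mathcal{N}_2\left(\mathbf{F}^{(l)}, \mathcal{N}_1\left(\mathbf{F}^{(l)}\right)\right), \tag{19}$$

with the coefficients of $\mathcal{N}_2$ fixed. The lth input to the cascaded network is $\mathbf{F}^{(l)}$, generated in the same way as for $\mathcal{N}_2$. No labeled outputs are necessary, as they are not included in the loss function (19). The coefficients of $\mathcal{N}_1$ are updated by backpropagation and gradient descent to minimize (19). Because the coefficients of $\mathcal{N}_2$ are fixed, $\mathcal{N}_1$ is prevented from converging to trivial solutions such as those leading to infinitely large outputs. The sigmoid activation function is applied at the $\mathcal{N}_1$ output layer to normalize the values into probabilities, from which the corresponding categorical vectors are obtained as the final output of $\mathcal{N}_1$.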
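The cascaded training step can be illustrated with a deliberately tiny NumPy toy. It is not the letter's architecture: $\mathcal{N}_2$ here is a randomly initialized network frozen in place of a trained one, the inputs are random placeholder features rather than encoded cascaded channels, and all sizes and the learning rate are hypothetical. What it shows is the mechanism of (19): the loss is the negative $\mathcal{N}_2$ output, gradients are backpropagated through the frozen $\mathcal{N}_2$ into $\mathcal{N}_1$, and only the parameters of $\mathcal{N}_1$ change.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, K, levels, hidden, batch = 16, 4, 8, 32, 256   # hypothetical sizes
d_out = K * levels                                   # N_1 emits K groups of 2^R scores

# Frozen stand-in for a trained N_2: tanh hidden layer over [channel features, N_1 output].
A2 = rng.standard_normal((hidden, d_in + d_out)) / np.sqrt(d_in + d_out)
c2 = np.zeros(hidden)
v2 = rng.standard_normal(hidden) / np.sqrt(hidden)

# Trainable N_1: one linear layer followed by a sigmoid output layer.
W1 = 0.01 * rng.standard_normal((d_out, d_in))
b1 = np.zeros(d_out)


def forward(f):
    """Cascaded pass N_2(f, N_1(f)); returns N_1 output, N_2 hidden state, N_2 output."""
    p = 1.0 / (1.0 + np.exp(-(f @ W1.T + b1)))          # N_1 output in (0, 1)
    h = np.tanh(np.concatenate([f, p], axis=1) @ A2.T + c2)
    return p, h, h @ v2


def loss_and_grads(f):
    """Loss (19) = -mean N_2 output, with gradients w.r.t. N_1's parameters only."""
    p, h, out = forward(f)
    g_out = -np.ones_like(out) / len(out)                # d loss / d out
    g_pre = (g_out[:, None] * v2) * (1.0 - h ** 2)       # back through tanh
    g_p = g_pre @ A2[:, d_in:]                           # N_2 input slice fed by N_1
    g_z = g_p * p * (1.0 - p)                            # back through sigmoid
    return -out.mean(), g_z.T @ f, g_z.sum(axis=0)


features = rng.standard_normal((batch, d_in))   # placeholder for encoded F^(l)
A2_before = A2.copy()
loss_start = loss_and_grads(features)[0]
for _ in range(300):
    _, g_W1, g_b1 = loss_and_grads(features)
    W1 -= 0.2 * g_W1                             # only N_1 is updated
    b1 -= 0.2 * g_b1
loss_end = loss_and_grads(features)[0]
```

In practice an autodiff framework would compute these gradients; the manual backpropagation is written out only to make explicit that nothing flows into `A2`, `c2`, or `v2`.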
In summary, for the complementary network $\mathcal{N}_2$, the labeled data, which are the capacities for given channels and phase shifts, can be easily obtained as in (16). On the other hand, the main network $\mathcal{N}_1$ is not trained directly but through the cascaded network combining $\mathcal{N}_1$ and $\mathcal{N}_2$. The training of the cascaded network is based on minimizing the negative of the network output, which does not involve any labeled outputs. Therefore, neither mathematical optimization nor brute-force search is necessary for training $\mathcal{N}_1$ or $\mathcal{N}_2$.
After training, $\mathcal{N}_1$ outputs the phase shifts that maximize the network capacity for the corresponding cascaded channel F, and the optimum beamforming at the source is then obtained as in (14).

IV. NUMERICAL SIMULATIONS
In all simulations below, all channels are Rician fading with Rician factor $K_{k,n}$ = 10 dB, and three bits are used to quantize the phase shift at every RIS element. A total of 2 × 10^5 training samples are generated to train the neural networks $\mathcal{N}_1$ and $\mathcal{N}_2$, following the procedures in Section III. After training, a set of 1000 independent channel realizations is used to obtain the average capacities.
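The averaging protocol (1000 independent realizations, Rician factor 10 dB, 3-bit phases) can be sketched for the random phase shift baseline. Assumptions: the LOS part is the same random-phase placeholder used earlier, path loss is omitted, $a_k = 1$, and $P_s/\sigma^2 = 1$ is an arbitrary placeholder because the SNR is not stated in this section. The printed value is whatever this toy setup produces; it is not a result from the letter.

```python
import numpy as np


def rician(shape, kappa_db, rng):
    """Unit-power Rician samples; the LOS part is a hypothetical random-phase placeholder."""
    kappa = 10.0 ** (kappa_db / 10.0)
    los = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=shape))
    nlos = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)
    return np.sqrt(kappa / (kappa + 1.0)) * los + np.sqrt(1.0 / (kappa + 1.0)) * nlos


def water_filling(gains, Ps, sigma2):
    """P_q = max(mu - sigma2 / gain_q, 0) with sum P_q = Ps (gains > 0 assumed)."""
    inv = sigma2 / gains
    order = np.argsort(inv)
    P = np.zeros_like(gains)
    for n in range(len(gains), 0, -1):
        active = order[:n]
        mu = (Ps + inv[active].sum()) / n
        if mu > inv[active].max():
            P[active] = mu - inv[active]
            break
    return P


def capacity(H, G, theta, Ps, sigma2):
    """Capacity with optimum beamforming (14) for the effective channel G Theta H."""
    s = np.linalg.svd(G @ np.diag(np.exp(1j * theta)) @ H, compute_uv=False)
    P = water_filling(s ** 2, Ps, sigma2)
    return float(np.sum(np.log2(1.0 + P * s ** 2 / sigma2)))


def random_phase_baseline(N, M, K, R=3, kappa_db=10.0, Ps=1.0, sigma2=1.0,
                          trials=1000, seed=0):
    """Average capacity over independent realizations with random R-bit RIS phases."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        H, G = rician((K, N), kappa_db, rng), rician((M, K), kappa_db, rng)
        theta = 2.0 * np.pi * rng.integers(0, 2 ** R, size=K) / 2 ** R
        total += capacity(H, G, theta, Ps, sigma2)
    return total / trials


print(random_phase_baseline(N=2, M=2, K=4))
```

Replacing the random `theta` with the phases decoded from a trained network (or from a brute-force search for small K) gives the other curves under the same averaging loop.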
For comparison, the results of both the proposed double deep learning (DDL) and random phase shifts at the RIS are shown in Fig. 3. The proposed DDL achieves higher capacities than the random phase shift in all cases. For K = 4, we also show the result of the optimal phase shift obtained by brute-force search; the proposed DDL achieves performance close to it. Note that the brute-force search results for K = 32 and 64 are difficult (if not impossible) to obtain, because there would be $2^{3\times32} = 2^{96}$ or $2^{3\times64} = 2^{192}$ possible phase shift configurations, respectively, to search for every channel realization. Fig. 4 shows the capacities for the MIMO case in which both the source and destination nodes have two antennas, with all other simulation parameters set similarly. The proposed DDL approach again achieves higher capacity than the random phase shift. We also show the results for the deep Q-network (DQN), a widely used reinforcement learning algorithm. For K = 4, the DQN and the proposed DDL achieve similar performance. However, for K = 32 and K = 64, the DQN performs no better than the random phase shift approach. This is consistent with our earlier argument that reinforcement learning is not suitable in this case: when the number of RIS elements is large, the action space becomes too large for the DQN to converge. The proposed DDL, on the other hand, still performs well.

V. CONCLUSION
This letter proposed a novel double deep learning network for joint beamforming and phase shift based on cascaded channels in the RIS-assisted MIMO network. Numerical simulations have been provided to verify the proposed approach. The proposed approach achieves good performance without relying on complicated optimization or on reinforcement learning, making it attractive not only for RIS phase shift design but also for other systems with similar scenarios.