Kalman Filter Based Channel Tracking for RIS-Assisted Multi-User Networks

In this paper, we investigate channel estimation in a reconfigurable intelligent surface (RIS) assisted multi-user network while taking the mobility of users into consideration. Based on a time-varying channel model, we utilize Kalman filter (KF) that is able to exploit temporal correlation to track cascaded channel. In order to maintain a relatively low pilot overhead, we present a multiple sub-phases based transmission protocol where the number of pilot sequences in each sub-phase is less than the number of users, i.e., pilot contamination exists. For the sake of practicality, we directly utilize discrete Fourier transform (DFT) matrix as phase shift matrix. We analyze normalized mean square error and provide some asymptotic results. A more practical scenario with hardware impairments (HWI) at the transceiver and the RIS is also considered. Since HWI is also part of the measurement matrix and is unknown to the base station, we propose a joint estimation of the channel and HWI. Under this joint estimation framework, the underlying state space model becomes nonlinear. We develop an extended KF (EKF) algorithm to tackle the nonlinearity through which the model can be linearized. Numerical results show that the proposed KF and EKF algorithms outperform benchmark schemes under various scenarios.


I. INTRODUCTION
T HE rapid development of the fifth generation (5G) wire- less network was underpinned by a variety of new techniques, including massive multiple-input multiple-output (MIMO) [1] and millimeter wave (mmWave) communication [2], with the aim of providing ultra-reliable low latency and high data rate communications to massive number of users [3].However, due to high frequency bands, the transmission range of mmWave is limited due to the short wavelength and high path loss [4].In addition, in massive MIMO systems, the pilot contamination problem arises due to the limited orthogonal pilot sequences [5].Moreover, manufacturing small-size mmWave equipments and large-scale antenna arrays in massive MIMO systems incurs high costs.
Recent works have reported that reconfigurable intelligent surface (RIS) is able to reconfigure wireless propagation environment by careful selection of the phase shifts of a large number of passive reflecting elements with low cost and power consumption [6], [7].Specifically, RIS is controlled by a smart controller which is able to tune the phase shifts of the incident waves before reflecting them out.Since RIS can strengthen the signal power of intended users and suppress the interference from unintended users, it achieves the reconfiguration of wireless channels.Moreover, different from conventional fullduplex (FD) relays, RIS can achieve passive beamforming in FD mode without processing delay, self-interference, and noise amplification [8].With all the advantages mentioned above, RIS has attracted significant interest as a means of enhancing network capacity with lower implementation.
Achieving reflecting beamforming gain of RIS highly relies on obtaining accurate channel state information (CSI) [8].Otherwise, the estimated channel would be outdated for the passive beamformer design.Generally, reflecting elements are passive and there is no active radio-frequency (RF) transceiver at the RIS, which makes channel estimation very challenging.Furthermore, with a large number of passive elements, the pilot overhead can be extremely high.Recently, numerous works have been devoted to the study of channel estimation in RIS systems [9], [10], [11], [12].Some prior works propose to equip the passive RIS with active antennas or RF chains to enable active channel estimation [9], but the potential advantages of RIS would be impaired.Since users can send pilot sequences, the CSI can also be obtained at the base station (BS) by analyzing the received pilot signal that is reflected by RIS.In this way, the cascaded BS-RIS-user channel can be obtained with the fully passive RIS, instead of estimating the BS-RIS and RIS-user channels separately.A simple ON/OFF method is developed to sequentially estimate the cascaded channel related to each element by switching on only one element of RIS and turning off all the other elements at a time [10].However, the full RIS aperture gain is not exploited to improve estimation accuracy.In order to fully exploit the RIS aperture gain, the authors of [11] and [12] designed the phase shift matrix based on the discrete Fourier transform (DFT) matrix.With the DFT-based reflection pattern, minimum estimation error can be achieved.On the other hand, there are some other works aiming to reduce pilot overhead [13], [14].
Nonetheless, none of the works above have considered the mobility of users.When the users are mobile, the channel between RIS and users is time-varying.It is more challenging to accurately track the cascaded channel.Kalman filter (KF) is a useful tool to track unknown states based on a state space model.In a wireless communication system, the prior information of channels and observed measurements can be exploited by KF to improve estimation accuracy [15].Several recent works have investigated channel estimation in RIS networks by utilizing KF [16], [17], [18], [19].However, all the works in [16], [17], [18], and [19] consider estimation in the case of a single user.When considering multiple users and pilot contamination, the estimation method derived from the single-user case cannot be directly applied.Moreover, the authors of [16] and [17] assume Rayleigh fading for the BS-RIS and RIS-user channels, however, Rician fading is more relevant due to the ability of RIS to provide line-of-sight (LoS) links.
In addition, the aforementioned works are under the assumption of perfect hardware.While in practical systems, the inherent hardware impairments, e.g., phase noise, distortion noise, and phase drift at various kinds of devices, are not negligible, and they could significantly degrade the system performance [20], [21].The authors of [22], [23], and [24] have studied the channel estimation in RIS networks, taking hardware impairments into consideration.In [22], [23], and [24], the impairments at transceiver and RIS are modeled as distortion noise and phase error, respectively.The authors of [22] adopted the linear minimum mean square error (LMMSE) method to estimate the overall channel which is the sum of the direct channel and the aggregate channel embedded with phase shift.Similarly, the authors of [23] proposed a deep-learning-based estimator to estimate the overall channel.However, the overall channel that is integrated with the fixed phase shift matrix cannot be utilized when the phase shift is changed.An ON/OFF protocol is adopted in [24] and the LMMSE estimation method is utilized to estimate the direct and the cascaded channels.
For non-RIS systems with hardware impairments (HWI), some works have investigated joint estimation of channel and HWI to improve estimation accuracy.For example, an earlier work proposed an approximate maximum-likelihood estimator for frequency offset and channel parameters [25], [26].A joint estimation of frequency offset and channel is proposed in [27], where the frequency offset and channel are stacked into one vector to be estimated.A KF-based algorithm is proposed to jointly estimate carrier frequency offset (CFO) and time-varying channel in a MIMO system without RIS [15].Since the CFO and channel are jointly estimated, the measurement equation is nonlinear, and thus the extended KF (EKF) that deals with the nonlinear case is adopted.The first-order Taylor approximation needs to be applied to linearize the nonlinear function and then the standard KF can be adopted.However, the joint estimation of channel and HWI has never been studied in RIS systems.
In order to fill the gaps mentioned above, in this paper, we investigate channel estimation in an RIS-assisted multi-user network where pilot contamination and HWI are taken into account.Specifically, considering the mobility of users, we use KF to track the time-varying cascaded channel.We consider a pilot contamination situation where the number of orthogonal pilot sequences is less than the number of users.Furthermore, we assume Rician fading for BS-RIS and RIS-user channels.We adopt the DFT matrix as the phase shift and provide some asymptotic results of normalized mean square error (NMSE).We then extend the work to the case where various kinds of HWI at transceiver and RIS exist.We adopt EKF to jointly estimate the HWI and the cascaded channel.The main contributions of this paper are summarized as follows.
• In the RIS system with mobile users, it is extremely challenging to acquire accurate channel state information of the cascaded channel.We propose a KF-based algorithm to track the time-varying cascaded channel for its simplicity and efficiency.To cater for the practical situation, we assume Rician fading for the BS-RIS and RIS-user channels in the considered RIS network.
To reduce the pilot overhead, we present a transmission protocol through which less frequent pilots are used to complete the estimation of cascaded channels.• In order to improve the estimation accuracy, the phase shift should be well designed.However, the mean square error (MSE) minimization problem is highly nonconvex due to the unit-modulus constraint.To avoid high computational complexity, we adopt the DFT matrix as the phase shift matrix.We analyze the NMSE and provide some asymptotic results under the DFT phase shift.We also prove that the optimal phase shift under Rayleigh fading and orthogonal pilot allocation can be reduced to any unitary matrix.• We extend the channel estimation to the case of HWI.
To the best of our knowledge, this is the first time that additive distortion noise and phase noise as well as multiplicative phase drift are taken into consideration simultaneously in an RIS system.The phase drift is part of the measurement matrix which is unknown to the BS, making the tracking process difficult.Fortunately, the phase drift can be modeled as a discrete-time Wiener process.Therefore, we propose to consider the phase drift as part of the state vector and jointly estimate the phase drift and cascaded channel.The measurement equation is nonlinear due to the HWI.We propose EKF to solve this problem by applying the first-order Taylor approximation to linearize it and then applying the standard KF.• Finally, we validate the effectiveness of the proposed KF and EKF algorithms.The NMSE achieved by the proposed algorithms is superior to the benchmark schemes under various scenarios.Numerical results demonstrate that when there is no HWI, the NMSE performance can be improved by creating a stronger LoS link, improving the transmit power of pilot sequences, and increasing the number of RIS elements.When the HWI at transceiver and RIS is considered, the estimation accuracy is degraded particularly as the number of BS antennas and RIS elements increases.The rest of this paper is organized as follows.In Section II, we present a system model.In Section III, a KF-based algorithm is proposed to track the cascaded channel.The DFT  matrix is adopted as the phase shift matrix during the estimation process and the NMSE analysis is presented.We extend the channel estimation to the case of various kinds of HWI and present an EKF-based algorithm in Section IV.Numerical results are shown in Section V followed by conclusions in Section VI.
Notations: C x×y denotes the space of x × y complexvalued matrices.For a complex-valued vector x, diag(x) denotes a diagonal matrix with each diagonal element being the corresponding element in x.For a matrix M, [M] i,j denotes the (i, j)th element and tr{M} is the trace operator.The superscripts (•) T and (•) H stand for transpose and conjugate transpose, respectively.The distribution of a circularly symmetric complex Gaussian (CSCG) random variable with mean x and covariance σ is denoted by CN (x, σ), and ∼ stands for distributed as.U(•) stands for uniform distribution.'s.t.' denotes 'subject to'.⊙ denotes Hadamard product.⊗ denotes the Kronecker product.ℜ{•} stands for taking the real part.E{•} and V{•} denote the expectation and variance of a matrix, respectively.

II. SYSTEM MODEL
In this paper, we investigate an RIS-assisted multi-user network, where K single-antenna users communicate with a BS with the aid of an RIS, as shown in Fig. 1.The BS is equipped with M antennas and the RIS consists of S passive reflecting elements.The RIS is connected to a smart controller that dynamically configures the phase shift of RIS by the BS.We assume that the direct link between the BS and users is blocked by obstacles, and the users move with low to medium speed.

A. Channel Model
We denote H n 1 ∈ C M ×S and h n 2,k ∈ C S×1 as the channel from the RIS to the BS as well as the channel from the kth user to the RIS at the nth time slot, respectively.We consider a time-varying channel model that remains constant within its coherence time.In Fig. 2, T H and T k h denote the coherence time of H n 1 and h n 2,k , respectively.Since the BS and RIS have fixed locations, we assume that the BS-RIS channel changes slower than the RIS-user channel, i.e., T H > T k h .Specifically, we assume T H = N T k h , where N is an integer.For simplicity, we assume We assume Rician fading for the BS-RIS and RIS-user channels.Within the coherence time T H , the BS-RIS and RIS-user channels can be modeled as where κ denotes the Rician factor.β BR and β Ru (k) are the large-scale fading coefficients of the BS-RIS and RIS-user channels.Ĥ1 and ĥn 2,k represent the normalized non-LoS (NLoS) components, with elements independently and identically distributed according to CN (0, 1).H1 and h2,k represent the normalized LoS components.Here, we drop the time index n of H n 1 since H 1 does not change within its coherence time.Moreover, we assume the LoS component hn 2,k does not change within the coherence time T H and drop the time index n, because it is more reasonable to assume a stable LoS component. 1The LoS components are given by H1 = a r ω AoA a H t ϑ AoD,a , ϑ AoD,e and h2,k = a r ϑ AoA,a k , ϑ AoA,e k , respectively.We assume a uniform linear array and a uniform planar array (UPA) for the BS and the RIS, respectively.The steering vectors a r ω AoA and a t ϑ AoD,a , ϑ AoD,e are given by (3) and ( 4), shown at the bottom of the next page, where d denotes the antenna or element separation distance and λ denotes the wavelength.For simplicity, we assume d = λ/2.S H and S V denote the number of elements in the horizontal and vertical directions, respectively.The number of RIS elements is equal to S = S H ×S V .ω AoA denotes the angle of arrival from the RIS to the BS.ϑ AoD,a and ϑ AoD,e are the azimuth and elevation angles of departure from the RIS to the BS.ϑ AoA,a k and ϑ AoA,e k are the azimuth and elevation angles of arrival from the kth user to the RIS.We can use (4) Considering possible different velocities of different users, the channel from the RIS to the kth user h n 2,k can be modeled by the first-order auto-regressive (AR) model as follows [17], [28]: where ) ∈ C S×S represents the AR parameter matrix of the kth user and u n k denotes the innovation process.Due to that h n 2,k is Rician distributed, the elements of u n k are complex Gaussian distributed with mean The element in A n k , i.e., a n k,s , denotes the time-correlation coefficient.For simplicity, we assume that different elements in the same RIS have similar time-correlation characteristics [16], [17], [29], i.e., a n k,s = a n k .Using the standard Jakes' model [17], [30], the time-correlation coefficient is determined by the symbol interval between the adjacent time slots, i.e., J 0 2πf k D |n − (n − 1)|T k h , where J 0 (•) denotes the zero-order Bessel function and f k D denotes the maximum Doppler shift in frequency of the kth user.Therefore, the time-correlation coefficient a k can be determined by

B. Signal Model
In the RIS-assisted network, we estimate the BS-RIS-user cascaded channel.The estimated CSI can be obtained by allowing users to send pilots to BS.To begin with, we should figure out how many training symbols are needed to complete the estimation of the cascaded channel in a multi-user network.Assume that there is only one element in the RIS and each user needs to send a pilot with length τ .If τ = K and all the pilots are orthogonal to each other, there is no pilot contamination.If τ < K, the pilot contamination exists.Since each RIS has S elements, in order to estimate the channel accurately, we need to either consecutively estimate the cascaded channel of each element by the ON/OFF method, or turn on all the elements but extend the training period to obtain more observations [31].In this case, the total length of the pilot should be τ × S. From [8], we know that the benefit brought by RIS could be deteriorated by using the first method while the full RIS aperture can be fully utilized by using the second method.As a result, we present a pilot transmission protocol as depicted in Fig. 2, where the pilot transmission period of each time slot is further divided into multiple sub-phases.Specifically, each time slot is separated into J = S sub-phases and each sub-phase consists of τ training symbols. 2  Considering the large number of elements in an RIS, a large amount of pilots is needed to estimate the cascaded channel.In order to reduce pilot overhead, we assume that in each sub-phase the length of pilots τ is shorter than the number of users, i.e., pilot contamination exists.In this case, it is possible that multiple users transmit the same pilot sequence to the BS.Define as the pilot sequence used by the kth user at the jth sub-phase.In each sub-phase, s n k,j is randomly selected from the set of orthogonal pilots.We define the set C k,j = {i|s n i,j = s n k,j , ∀i ̸ = k} as the set of indices of the users that have the same pilot sequence 2 According to [32], the least square (LS) estimator requires J ≥ S. We adopt J = S to maintain a relatively low training overhead.
as the kth user.Therefore, we have When the kth user sends the pilot sequence s n k,j to the BS, the received signal at the BS at the jth sub-phase of the nth time slot is given by where ) represents the phase shift matrix of the RIS at the jth sub-phase in the nth slot.ϕ n j,s denotes the phase shift of the sth element at the jth sub-phase in the nth slot.p represents the transmit power of training sequences.w n j ∈ C M ×τ denotes the Gaussian noise with zero mean and variance σ 2 .Define as the cascaded channel, ( 7) can be rewritten as In order to estimate the cascaded channel of the kth user, the LS estimation is performed and we have where the second term on the right side is the contaminated signal of the kth user.By defining v n k,j ≜ i∈C k,j G n i ϕ n j + w n j (s n k,j ) H / √ p and applying transpose to (9), we have By stacking the received signal in (10) for all sub-phases, we have where a t ϑ AoD,a , ϑ AoD,e = 1, e j 2πd λ sin ϑ AoD,a sin ϑ AoD,e , • • • , e j 2πd(S H −1) Finally, by stacking the received signal in (11) for all the users, we have where

III. KALMAN FILTER BASED CHANNEL PREDICTION A. Kalman Filter Based Tracking
Considering the mobility of users and the dynamic characteristic of channels, the required time for the channel estimation might exceed the coherence time of channels and thus the obtained CSI might be outdated.This will hinder the beamforming design that is based on the estimated CSI.One simple method to mitigate this issue is to transmit pilot sequences more frequently.However, the pilot overhead could be overwhelming since RIS has a large number of passive elements.Therefore, performing channel estimation only without consideration of its evolution over time is inefficient.Kalman filter is the Bayesian solution to the problem of sequentially estimating the states of a dynamical system.It can provide predictions of unknown states.Meanwhile, the estimation accuracy can be improved given the measurements observed over time.Therefore, it is appropriate for us to adopt the Kalman filter to track the cascaded channel.In the previous section, we have already obtained the dynamic model of the channel between RIS and users.Next, we need to obtain the dynamic model of the cascaded channel.
By applying diagonalization to both sides of (5) and multiplying it with H 1 , we have 3 16) According to the definition of the cascaded channel, ( 16) can be reexpressed as where By stacking the state equation ( 17) for all the users, we have where Based on the measurement equation ( 15) and the state equation (18), we obtain the overall state space model as follows: (19b) Based on the state space model, we show the procedures of predicting the cascaded channel in Algorithm 1, where 3 If H 1 changes when one coherence time T H ends and a new time slot in the next coherence time begins, ( 16) is not satisfied.Therefore, we assume H 1 does not change during the estimation process for simplicity. 5: and P n|n = P n|n−1 − K n Fn P n|n−1 .

6:
Set n = n + 1. 7: until n = N P n|n−1 and P n|n denote the priori and posteriori covariance matrices, respectively.The superscript notation 'i|j' is used to indicate the association with the probability density function of the state vector G computed at the ith time slot using the measurements up to the jth time slot.Q n represents the state noise covariance matrix U n .R n is the measurement noise covariance matrix V n .The expressions of Q n and R n are given in Lemma 1.
Specifically, the overall prediction process consists of four major steps.At the beginning, the estimates of the channel matrix and the MSE matrix are initialized.Then, the prediction results are derived according to step 3. Next, the Kalman gain is calculated by step 4. Kalman gain matrix is used to determine how much of a new measurement should be used for the next update of the cascaded channel.Finally, after receiving the new measurements, the estimated channel and MSE matrices can be corrected by utilizing Kalman gain.Steps 3-5 are repeated until the end of transmission.
Lemma 1: Under Rician fading and pilot contamination, the expression of Q n k is given as where where the definition of G cov i is given in Appendix A. Proof: See in Appendix A.

B. Complexity Analysis
Algorithm 1 is an iterative algorithm and the computational complexity of each iteration is determined by the prediction, Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
update of Kalman gain, and the correction steps.The prediction step involves three matrix multiplications, and the number of real-valued multiplications is 8K 3 S 3 + 2K 2 S 2 M [33].The update of Kalman gain involves four matrix multiplications and one matrix inverse, and thus the number of real-valued multiplications is 17K 3 S 3 .The correction step involves three matrix multiplications, and thus the number of real-valued multiplications is 8K 3 S 3 + 4K 2 S 2 M .In step 5, the computational complexity of the LS method is O(KSM τ ).Overall, the total computational complexity of the Algorithm 1 is We also analyze the complexity of the existing methods introduced in section I. Since [17] adopts the same LS method and KF algorithm, the complexity can be obtained by setting K = 1 as they assume a single user case.In [13] and [14], the LMMSE estimator is adopted.In [13], the main complexity is the calculation of the second moment of the aggregated channel and is given by O(12KM S 3 ).The LMMSE estimator for each user and each time slot in [14] involves seven matrix multiplications and seven matrix inverse.The number of real-valued multiplications is approximately , where ∆ = S − ( S M − 1)M .Based on the above analysis, the complexity of the proposed algorithm is in the same order of S as the benchmarks in [13], [14], and [17].

C. Phase Shift Design
From Algorithm 1, we know that the MSE matrix depends on the phase shift.Therefore, we can optimize the phase shift during the prediction process to minimize MSE. 4 Let us first calculate the MSE of the kth user.According to Algorithm 1, the value of the prior covariance at the nth time slot is The posterior covariance at the nth time slot is calculated as Then the MSE of the kth user can be calculated as By summing the MSE of the kth user for all the users, the MSE minimization problem at the nth slot can be equivalently formulated as the following maximization problem.
where the constraint (25b) is derived from the unit-modulus constraint of phase shift ϕ n j and the definition of F in (14).
[F] j,s denotes the element at the jth row and the sth column of the matrix F.
The problem ( 25) is difficult to solve directly due to the nonconvex unit-modulus constraint.From the literature, 4 The BS uses a smart controller to configure the phase shift.
we know that a popular method of solving the phase shift optimization problem is to approximate it into a semidefinite programming problem and iteratively solve it [34].However, it is neither simple nor practical because of its high computational complexity.In order to reduce the computational complexity, a simple and effective method of designing phase shift is needed.Another general way to design phase shifts is to adopt the DFT matrix [35].The validity of the DFT matrix has been verified in [11], [12], [16], and [36].The objective function in (25a) has a similar form to the equation (22) in [36].The empirical study in [36] reveals that the DFT matrix is a stationary point and the MSE of local optima are close to the DFT-based solution.Although [36] considers a single user case, the DFT-based phase shift matrix is also a stationary point of our problem (25) because all the users share the same RIS.Since the DFT-based phase shift matrix is determined prior to the channel estimation process, the computational complexity is largely reduced.
In the following proposition, we show that the optimal phase shift under Rayleigh fading and orthogonal pilot sequences could be any unitary matrix that satisfies the unit-modulus constraint.
Proposition 1: When κ = 0 and τ = K, i.e., in the case of Rayleigh fading channel and orthogonal pilot allocation, the optimal phase shift that minimizes MSE is any unitary matrix that satisfies the unit-modulus constraint [17]. Proof: k and R n k are scaled identity matrices.As a result, the optimal phase shift is reduced to any unitary matrix that satisfies the unit-modulus constraint.Detailed derivations can be found in [17].

D. NMSE Analysis Under DFT Phase Shift
In this subsection, we analyze the NMSE obtained under the DFT phase shift.The NMSE of the system is defined as The denominator in ( 26) can be rewritten as Therefore, the denominator can be further rewritten as Here, we omit the details and directly present the results in the following.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
From section III-C, the MSE of all the users, i.e., the numerator in ( 26) is given by With DFT phase shift, we have FF H = SI S .MSE can be reduced to Note that in (28), the second term is still difficult to handle due to the matrices Qn k and R k .However, we can obtain some asymptotic results under a special case, i.e., an orthogonal pilot allocation case.From Proposition 1, R n k is reduced to M σ 2 /pI S under orthogonal pilot allocation.The second term in (28) can be rewritten as Remark: When considering orthogonal pilot allocation, we have NMSE → 0 in high pilot transmit SNR region and large S region.This is because ( 29) is reduced to Then the MSE derived in ( 28) is equal to zero.Combining ( 26)-( 27), we have zero NMSE.

IV. EXTENDED KALMAN FILTER UNDER HARDWARE IMPAIRMENTS A. Hardware Impairments
In practical communication scenarios, the hardware is generally non-ideal, and therefore the received signal is affected by the HWI.In this section, we consider three different types of hardware impairments in the following, including HWI at transceiver and RIS. 5 The distortion noise at each user and the BS: They can be assumed as Gaussian distributed variables with average power being proportional to the power of transmit and received signals.
• The multiplicative phase drifts at each user and the BS: They are caused by local oscillators and can be modeled by the Wiener process [21].• The phase error at RIS: This type of hardware impairment originates from finite precision of configuration of RIS elements and can be modeled as phase noise.
We assume that the phase noise is uniformly distributed in where k r denotes the severity of the residual impairments6 at the RIS [22].Since the phase shift could be interfered by the phase error, the optimization of phase shift is ineffective.Therefore, we do not consider the optimization of phase shifts in this section.During the prediction process, we adopt random phase shift at RIS. Considering the additive distortion noise and multiplicative phase drift at the transceiver as well as the phase noise at the RIS [21], [37], [38], the received pilot signal at the BS can be written as In (30), δ n k,j ∈ C 1×τ denotes the distortion noise at each user with each element distributed according to CN (0, κ U E p). κ U E denotes the proportionality coefficient that characterizes the level of impairment at each user.The distortion noise at the BS is denoted as [21], [22], [37].κ BS denotes the proportionality coefficient that characterizes the level of impairment at the BS.The phase shift matrix with phase noise at the RIS is given as ) , e j(θ n j,2 +∆θ n j,2 ) , • • • , e j(θ n j,S +∆θ n j,S ) ]), where θ n j,s ∈ [0, 2π) denotes the phase shift of the sth element at the jth sub-phase of the nth time slot.The phase noise of the sth element at RIS is denoted as ∆θ n j,s .The multiplicative phase drift is defined as Ψ n j = diag([e jψ n j,1 , e jψ n j,2 , • • • , e jψ n j,M ]), where ψ n j,m = ϵ n j + ν n j,m .ϵ n j denotes the phase drift of each user at the jth sub-phase of the nth time slot.ν n j,m denotes the phase drift of the mth BS antenna at the jth sub-phase of the nth time slot.From [21], [37], [38], ϵ n j and ν n j,m can be modeled as a discrete-time independent Wiener process as follows: where ∆ε n j ∼ N (0, σ 2 ϵ ) and ∆ν n j,m ∼ N (0, σ 2 ν ) denote the random phase increment caused by the imperfect local oscillator at each user and the mth BS antenna, respectively.According to [21], [37], and [38], the phase drifts at the transceiver update every sub-phase.

B. Joint Estimation Based on Extended Kalman Filter
By employing the LS estimation method, we have Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
where φn j ≜[e j(θ n j,1 +∆θ n j,1 ) , e j(θ n j,2 +∆θ n j,2 ) , • • • , e j(θ n j,S +∆θ n j,S ) ] T and d n k,j ≜ . In order to facilitate the derivations later, we define a new variable as g n k ≜ vec(G n k ).vec(•) is the vectorization operation.By utilizing g n k , we can rewrite G n k φn j as (( φn j ) T ⊗ I M )g n k .Therefore, (32) can be rewritten as where F n j ≜ ( φn j ) T ⊗ I M .Following the same procedure in the previous section, we stack the vectorized signal for all the sub-phases as follows: where By stacking (34) for all users into a large vector, we derive the measurement equation as follows: where Ŷn = [( Ỹn , and (39), the phase drift is also integrated into the measurement matrix.Normally, the phase drift is unknown to the BS, which makes the estimation challenging. 7Fortunately, we observe that the phase drift in (31) has a similar update rule as the cascaded channel.This motivates us to jointly estimate the cascaded channel and the phase drift.From [39], [40], the phase drift updates every sub-phase, i.e., every pilot sequence length.While in state equation (18), the cascaded channel updates every time slot, which means the phase drift at the transceiver updates more frequently than the channel.In order to describe the overall state equation in a unified manner, we first present the following update equations of the phase drifts by recursively updating the equations in (31).8 where ∆ϵ n j ∼ N (0, Jσ 2 ϵ ) and ∆ν n j,m ∼ N (0, Jσ 2 ν ) [39], [40].
We obtain the update equation of the phase drift in a vector form as follows: where ∆ϵ Next, we present the new state equation of the vectorized channel vector g n k as follows: where Ãk ≜ a k I SM and z n k ≜ rvec(U n k ).rvec(•) denotes the row vectorization of a matrix.
Combining ( 41)-( 42), we rearrange the overall state equation as Υ n = ÃΥ n−1 + Z n , where the new state vector Υ n is defined by stacking the phase drift and channel variables into one vector as follows: T , (∆ϵ) T , (∆ν) T T .(45) By utilizing the new state vector Υ n and defining φ (Υ n ) = Ψn Fn g n , the measurement equation ( 39) can be rewritten as Ŷn = φ (Υ n ) + D n .Since the phase drifts need to be jointly estimated with the cascaded channel, φ (Υ n ) is a nonlinear function of the state vector Υ n .The nonlinearity of the measurement equation is caused by the multiplicative phase shift matrix Ψn .Specifically, the overall state vector Υ n in (43) consists of the channel vector g n and the phase drift vector [(ϵ n ) T , (ν n ) T ] T .While the function φ (Υ n ) = Ψn Fn g n is the product of the channel vector g n and the phase drift matrix Ψn .Since the measurement equation is nonlinear, the Kalman filter cannot be directly applied.One approach to solve the nonlinear problem is the extended Kalman filter.Through this approach, we first need to linearize the nonlinear equation by applying the first-Taylor approximation.Then we can apply KF to the linearized model.
With the derived Jacobian matrix, we can apply extended Kalman filter to jointly estimate the phase drift and the cascaded channel, as shown in Algorithm 2 below, where Qn and Rn denote the covariance matrices of the state noise Z n and the measurement noise D n , respectively.The expressions of Qn and Rn can be derived following a similar method as in Appendix A.

V. SIMULATION RESULTS
In this section, we evaluate the NMSE performance of the proposed algorithms through simulations.Numerical results are obtained using MATLAB, and the results are averaged over 100 independent channel realizations.The total number of time slots is N = 200.We consider a two-dimensional (2D) plane network where the BS and RIS are located at (−10, 0) and (0, 10), respectively.In addition, K single-antenna users are uniformly distributed in a 20m × 10m rectangle area.The path loss models from the BS to RIS and from RIS to users are 30 + 22 log 10 (d(m)), where d denotes the distance.The AoAs/AoDs are uniformly distributed within [0, 1  2 π] for the elevation angle and within [0, 2π] for the azimuth angle.The Rician factor is 10 dB.The coherence time of the BS-RIS channel is 10 times that of the RIS-user channel, i.e., T H = 10T h .The Doppler shift for different users is uniformly distributed as 0.01 + 0.001 × U(0, 1).The noise power is set as σ 2 = −120 dBm.The number of users K, the number of BS antennas M , the number of RIS elements S, and the severity of different types of HWI will be specified in each experiment.The NMSE of KF and EKF algorithms can be computed as , where ∥ • ∥ F and ∥ • ∥ represent the Frobenius norm and the l 2 norm, respectively.
We denote the proposed Algorithm 1 and Algorithm 2 as "Proposed KF algorithm" and "Proposed EKF algorithm", respectively.The "Proposed KF algorithm" adopts the DFT matrix as the phase shift.We evaluate the performance of the proposed algorithms by comparing them with the following benchmark schemes: • Unitary phase [17]: In this scheme, the authors consider a single-user case and do not consider pilot contamination.
Kalman filter is used to estimate the cascaded channel.Therefore, we iteratively estimate the cascaded channel for each user.Also, Rayleigh fading is considered in this paper.The optimal phase shift in this case is any unitary matrix that satisfies the power constraint.
• LMMSE [13]: In [13], the authors consider multiple users, pilot contamination, and Rayleigh fading.Since the optimal phase shift design in the scheme is equal phase shifts, we replace the equal phase shifts with varying random phase shifts for a fair comparison, and thus the BS can obtain sufficient observations.Particularly, instead of estimating the cascaded channel of the BS-RIS-user link, the authors estimate the aggregate channel that includes phase shift.The authors adopt the LMMSE estimator to estimate the aggregate channel.• Random phase: This scheme adopts the same Kalman filter as the "Proposed KF algorithm", but the RIS phase shifts are randomly generated from [0, 2π] during the whole estimation process.• LMMSE, three phase [14]: The authors of [14] proposed a three-phase channel estimation framework, in which the direct link and the cascaded channel of a typical user are estimated in phases I and II, respectively.The cascaded channels of the rest of users are sequentially estimated in phase III by utilizing the common BS-RIS channel.
When each user transmits a pilot symbol during phase III, only selected RIS elements are switched on to reflect the pilot symbol and thus multiple time slots are needed to complete the estimation.When comparing with this framework, the estimation of the direct link is neglected to ensure consistency.• LMMSE, ON/OFF [24]: When considering hardware impairments, we compare our proposed EKF algorithm with this scheme to show the advantages.In this scheme, the distortion noise and phase noise are considered while the phase drift is not considered.We will show that this benchmark scheme is inferior to the proposed scheme even though the phase drift is not taken into consideration.Particularly, the authors use the ON/OFF switch method and the LMMSE method to consecutively estimate the cascaded channel of each RIS element.
A. Without HWI Fig. 3 shows the NMSE performance versus the Rician factor.The RIS adopts a 5 × 4 UPA.The NMSE of the proposed KF algorithm and the random phase scheme improve when the Rician factor increases.This is because when κ becomes larger, the LoS components of BS-RIS and RIS-user channels become more dominant, thus enhancing the received signal power at the BS.The NMSE of the Unitary phase scheme slightly decreases when the Rician factor increases.This is because [17] assumes Rayleigh fading, and thus the Rician factor is not involved in the KF algorithm and its phase shift design.On the contrary, the NMSE of the LMMSE scheme and the "LMMSE, three phase" scheme increases with Rician factor.This can be explained as follows.Both two schemes adopt the LMMSE estimator in which the covariance matrices of the BS-RIS and RIS-user channels are utilized to compute the second moment of the cascaded channel.To adapt to our scenario, the Rician factor is involved in Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.the covariance matrices.However, the two schemes consider Rayleigh fading assumption.The LMMSE estimator obtained under Rayleigh fading does not work well in the case of Rician fading.Furthermore, the LMMSE-based schemes are inferior to all the Kalman-based schemes.This is reasonable because the LMMSE-based schemes do not have the information on time correlation that can be exploited to improve the channel estimation.
Fig. 4 demonstrates the NMSE performance versus the transmit power of pilot sequences.It can be observed that the NMSE performance of all the schemes improves with the increase in p.The proposed KF algorithm achieves a much better NMSE performance over other benchmark schemes even when suffering from pilot contamination.This is because the time-varying characteristic is exploited by the proposed algorithm.The performance gaps between the proposed algorithm, the benchmark [17], and the random phase scheme show the superiority of the DFT matrix over unitary phase shift and random phase shift.The "LMMSE, three phase" scheme achieves the worst performance.The reason is that only selected RIS elements are switched on during estimation and the full RIS aperture gain is not utilized.
In Fig. 5, we compare the NMSE performance in terms of the number of elements.The RIS is an S H × 10 UPA, where S H increases from 2 to 6.The proposed KF algorithm outperforms all the other benchmarks, which indicates the benefit of the DFT matrix and the utilization of time-varying  characteristics.The NMSE performance of the proposed KF algorithm, the "Unitary phase", and the LMMSE scheme improve with the number of elements, especially for the proposed KF algorithm.The NMSE performance of the "Random phase" scheme degrades when the number of elements increases.The reason is that the observations obtained with random phase shift are not sufficient for the estimation, and the contaminated signal significantly increases when S increases.It is seen that the NMSE of the "LMMSE, three phase" scheme first slightly increases, and then varies a little after N > 40.This is reasonable because when S increases, each user needs more time slots to complete the estimation of the cascaded channels of all the RIS elements.As the estimation at each time slot is independent, the total estimation error of each user accumulates when the number of time slots increases.
Fig. 6 illustrates the NMSE versus the normalized Doppler shift.A higher Doppler shift means a higher moving speed of users.When the speed of users increases, as expected, the channel tracking becomes difficult and the estimation accuracy decreases.From Fig. 6, the NMSE of all the schemes except the "LMMSE" scheme increases with the increase in the normalized Doppler shift.Although both the "LMMSE" scheme and the "LMMSE, three phase" scheme assume Rayleigh fading and use the same LMMSE estimator, the two schemes show different trends.The reason is that the "LMMSE" scheme only utilizes the variance matrices of the BS-RIS and RIS-user channels.As a result, the NMSE of the "LMMSE" Both algorithms can achieve a better NMSE performance when the transmit power increases.Both algorithms reach error floors around p = 10 dB due to the HWI-induced interference.The NMSE performance degrades when the severity of the distortion noise increases.The error floors go higher when increasing the severity of the distortion noise.It is observed that there is a large performance gap between the two algorithms.The proposed algorithm achieves a much better performance under different values of κ BS κ U E .This is reasonable because the "LMMSE, ON/OFF" scheme adopts the ON/OFF scheme without fully utilizing the RIS aperture.The figures above indicate that the RIS-assisted system has poor NMSE performance under lower hardware quality.

VI. CONCLUSION
In this paper, we have investigated channel estimation in an RIS-assisted multi-user network.Considering users are mobile, we utilized KF to track the cascaded channel on time correlation.We derived the expressions of state and measurement covariance matrices in case of Rician fading and pilot contamination.We adopted the DFT matrix as phase shift matrix for the sake of practicality and presented NMSE analysis and some asymptotic results.We then extended the channel estimation problem to a more practical scenario where three different kinds of hardware impairments are considered.Due to the non-linearity of HWI, we proposed to jointly estimate the cascaded channel and HWI by EKF.Extensive numerical results demonstrated the superiority of the proposed algorithms over benchmark schemes and provided valuable insights into the effect of various HWI on estimation accuracy.Without HWI, it can be observed that the NMSE can be improved by increasing the transmit power of pilot sequences and the number of elements.When the LoS of RIS-BS and RIS-user links are strengthened, the estimation accuracy can be enhanced.When the moving speed of users increases, the ability to track the channel deteriorates.The proposed EKF algorithm is able to track the cascaded channel well even with severe HWI.When the number of users, RIS elements, and BS antennas increase, the HWI at the transceiver and RIS becomes more severe resulting in the degradation of estimation accuracy.

APPENDIX A PROOF OF LEMMA 1
Based on the definition of Q n , it can be calculated as Q n = Cov(U n , U n ) and Cov(•) takes the covariance.For simplicity, we first calculate the covariance matrix for the kth user, i.e., Q n k .Q n can be obtained by stacking the covariance matrices of all the users.For the kth user, Q n k can be calculated as where The element at the ith row and the jth column of the matrix X cov k is given by R n can be calculated as R n = Cov(V n , V n ).Similarly, we calculate the covariance matrix for the kth user as n k = Cov(V n k , V n k ).Apparently, R n k ∈ C J×J is a diagonal matrix.Considering the same structure of v n k,j , ∀j, we start with the element at the jth row and the jth column of R n k as follows: The first term in (51) can be calculated as  (53) As w n has a zero mean, we only need to calculate T as follows: We define , and it can be calculated following the similar way as the first term in (47).Therefore, R n k is a diagonal with its diagonal elements being given by combining (51)-(54).

Algorithm 1
Channel Prediction Based on Kalman Filter 1: Initialization: G 0 = 0 KS×M , P 0 = M I KS , n = 1, and the total number of time slots N .2: repeat 3: