Deep Learning based Channel Estimation Algorithm over Time Selective Fading Channels

Research on deep learning applications for the physical layer has received much attention in recent years. In this paper, we propose a Deep Learning (DL) based channel estimator for the time varying Rayleigh fading channel. We build, train and test the channel estimator using a Neural Network (NN). The proposed DL-based estimator can dynamically track the channel status without any prior knowledge of the channel model or its statistical characteristics. The simulation results show that the proposed NN estimator has better Mean Square Error (MSE) performance than the traditional algorithms and some other DL-based architectures. Furthermore, the proposed DL-based estimator is robust to different pilot densities.


I. INTRODUCTION
As machine learning technology and hardware performance have developed rapidly in recent years, Deep Learning (DL) has been successfully applied to many fields, especially Computer Vision and Natural Language Processing (NLP). Such technology has been applied to the physical layer processing of communication systems in [1]. Since then, more research has focused on applying learning algorithms to different communication scenarios.
A traditional communication system consists of separate modules such as source coding, channel coding, modulation, demodulation, estimation and equalization. An end-to-end communication system over the AWGN channel is designed in [1]. Using a fully connected NN whose behavior is similar to an autoencoder, it achieves performance similar to that of a traditional system with the (7,4) Hamming code and BPSK modulation. Such an autoencoder learns a low-dimensional representation of the data and how to restore the data from it. A Convolutional Neural Network (CNN) based model [2] has been developed to solve the dimension explosion problem in the autoencoder, and it achieves better performance than traditional methods (64QAM+MMSE) under both AWGN and static fading channels. Besides, a communication system with Software Defined Radio (SDR) built only from NNs is used to show that over-the-air transmission with deep learning technology is possible [3]. In Orthogonal Frequency Division Multiplexing [4] systems, a deep learning algorithm for joint channel estimation and signal detection has been studied in [5]. To overcome the back-propagation problem in the NN transmitter when the channel is unknown, different methods have been proposed. The policy gradient algorithm from reinforcement learning is used in [6]. A newer deep learning technique, Conditional Generative Adversarial Nets [7], is introduced in [8] to emulate the unknown channel. The Simultaneous Perturbation Stochastic Approximation [9] algorithm is utilized in [10] to give a direct estimate of the channel gradient.
Qinbo Bai, Jintao Wang, and Jian Song are with the Electronic Engineering Department, Tsinghua University, and Beijing National Research Center for Information Science and Technology (BNRist), Beijing 100084, China (e-mail: wangjintao@tsinghua.edu.cn).
Yue Zhang is with the Department of Engineering, University of Leicester, Leicester, LE1 7RH, United Kingdom (e-mail: yue.zhang@leicester.ac.uk).
However, for DL-based communication systems to be meaningful in practice, complex channels need to be considered. One kind of complex channel that is difficult to handle with traditional algorithms is the time selective channel. Due to the movement of the receiver, the channel status changes in the time domain. Research on such channels using deep learning remains limited. The Sliding Bidirectional Recurrent Neural Network (SBRNN) has been put forward in [11], where it works as a detector to learn rapidly varying optical and molecular channels. A simple application of a neural network to the Rayleigh fading channel is given in [12]. The Multi-Layer Perceptron (MLP) is used to perform channel estimation for the time selective channel [13] and the doubly selective channel [14], respectively.
However, the MLP is a memoryless structure, so it cannot learn the relation of data in the time domain well. Besides, the linear layers in an MLP cause the number of neurons to grow as the input length increases. Although the data can be divided into blocks to avoid this problem, the division may lead to discontinuities in the channel estimate. Considering the similarity of this problem to sequence problems in the NLP field, it is better to use a Recurrent Neural Network (RNN) to estimate the channel. In this article, the time varying Rayleigh fading channel is explored using deep learning technology, and our contributions are summarized below.
• Based on a deep learning algorithm, the Sliding Bidirectional GRU (SBGRU) channel estimator is proposed to learn the time varying Rayleigh fading channel. Using the RNN structure and the sliding idea, SBGRU can handle transmitted symbol sequences of arbitrary length and provide the result as soon as a symbol arrives.
The rest of this article is arranged as follows. Section II describes the basic channel model, data structure and signal flow model. Section III gives the deep learning based algorithm for the NN channel estimator in detail. Section IV uses extensive simulation results to demonstrate the performance of the NN estimator. Finally, Section V concludes the paper and gives some directions for future work.
Notation: Bold lower-case letters and upper-case letters denote vectors and matrices, respectively. The subscript on a lower-case letter, as in $x_i$, denotes the $i$-th element of the vector $\mathbf{x}$.

A. Time Varying Rayleigh Fading Channel Model
Typically, the wireless communication environment is modeled as a Rayleigh fading channel. Multipath causes frequency selective fading and Doppler shift results in time selective fading. However, in this paper, only time selective fading is considered, as a first exploration of rapidly varying channels. The influence of multipath will be studied in future work.
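Clarke's model [15] is used in this paper to describe the time varying channel. To describe the time varying characteristic, the Jakes Doppler spectrum [16] is adopted here:

$$S(f) = \frac{1}{\pi f_d \sqrt{1 - (f/f_d)^2}}, \quad |f| < f_d \tag{2}$$

where $f_d$ is the maximum Doppler shift. Given a speed $v$ (m/s) and carrier frequency $f_c$ (Hz), $f_d = v f_c / c$ ($c$ is the speed of light in free space). The autocorrelation of the Jakes Doppler spectrum is:

$$R(\tau) = J_0(2\pi f_d \tau) \tag{3}$$

where $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind, and the discrete form of the autocorrelation is:

$$R[n] = J_0(2\pi f_d T_s |n|) \tag{4}$$

where $T_s$ denotes the symbol period.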
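The channel statistics above can be illustrated with a short sketch. It is not the simulation code used in this paper: the function names, the midpoint-rule evaluation of $J_0$, and the sum-of-sinusoids generator with `num_paths` scatterers are assumptions made for illustration, using only the Python standard library.

```python
import cmath
import math
import random

SPEED_OF_LIGHT = 299_792_458.0  # speed of light in free space, m/s


def max_doppler_shift(speed_mps, carrier_hz):
    """Maximum Doppler shift f_d = v * f_c / c, in Hz."""
    return speed_mps * carrier_hz / SPEED_OF_LIGHT


def bessel_j0(x, steps=200):
    """J_0(x) via (1/pi) * integral_0^pi cos(x sin(theta)) d(theta), midpoint rule."""
    width = math.pi / steps
    total = sum(math.cos(x * math.sin((k + 0.5) * width)) for k in range(steps))
    return total / steps  # (1/pi) * width * total


def jakes_autocorrelation(lag, doppler_hz, symbol_period_s):
    """Discrete autocorrelation R[n] = J_0(2 pi f_d T_s |n|)."""
    return bessel_j0(2.0 * math.pi * doppler_hz * symbol_period_s * abs(lag))


def clarke_channel(length, doppler_hz, symbol_period_s, num_paths=64, rng=None):
    """Sum-of-sinusoids approximation of a flat Rayleigh fading channel.

    Each of the num_paths scatterers (an illustrative choice) gets a random
    arrival angle and a random initial phase.
    """
    rng = rng or random.Random()
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_paths)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_paths)]
    scale = 1.0 / math.sqrt(num_paths)
    channel = []
    for n in range(length):
        t = n * symbol_period_s
        tap = sum(
            cmath.exp(1j * (2.0 * math.pi * doppler_hz * t * math.cos(a) + p))
            for a, p in zip(angles, phases)
        )
        channel.append(scale * tap)
    return channel
```

For example, `max_doppler_shift(30.0, 3e9)` gives a Doppler shift of roughly 300 Hz, and `jakes_autocorrelation(n, f_d, T_s)` can be used to fill the entries of a correlation matrix.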

C. Signal Flow Model
The signal flow model is shown in Fig. 2. At the transmitter side, no deep learning technology is introduced. Information bits and pilot bits are combined to generate the original signal. After modulation, the transmitted signal $\mathbf{x}$ is sent to the channel and the modulated pilots $\mathbf{p}$ are sent to the NN estimator. At the receiver side, the NN channel estimator uses $\mathbf{p}$ and the received signal $\mathbf{y}$ (the channel distorted signal plus noise) to estimate the channel $\mathbf{h}$.
Two things should be noted. First, since no NN is introduced at the transmitter, it is easy to add any traditional channel coding, such as Low Density Parity Check (LDPC) codes [17], to improve robustness against noise. Second, the NN channel estimator does not need any information about the channel, which means that the communication system is model free.

D. Traditional Algorithms For Channel Estimation
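In channel estimation, the most common estimators are the Least Square (LS) [18] estimator and the Minimum Mean Square Error (MMSE) [19] estimator. According to (1), the LS estimator under the time varying channel is:

$$\hat{h}_{LS,i} = \frac{y_i}{x_i} \tag{5}$$

For positions where pilots are inserted, the above equation can be used directly to obtain the estimate. For other positions, linear interpolation is necessary. Denote $j, k$ ($j < k$) as the pilot positions nearest to position $i$. The interpolated channel is then:

$$\hat{h}_{LS,i} = \hat{h}_{LS,j} + \frac{i - j}{k - j}\left(\hat{h}_{LS,k} - \hat{h}_{LS,j}\right) \tag{6}$$

Due to the noise, and omitting the influence of interpolation, the expected Mean Square Error (MSE) of the LS estimator is:

$$E\left(|\hat{h}_{LS,i} - h_i|^2\right) = E\left(\left|\frac{n_i}{x_i}\right|^2\right) = \frac{\sigma_n^2}{\sigma_x^2} = \frac{1}{\mathrm{SNR}} \tag{7}$$

where $\sigma_n^2$ and $\sigma_x^2$ are the noise power and the signal power, respectively. Another traditional estimator is the MMSE estimator:

$$\hat{\mathbf{h}}_{MMSE} = \mathbf{R}_{hh}\left(\mathbf{R}_{hh} + \frac{\sigma_n^2}{\sigma_x^2}\mathbf{I}\right)^{-1}\hat{\mathbf{h}}_{LS} \tag{8}$$

where $\mathbf{I}$ represents the identity matrix and $\mathbf{R}_{hh} = E(\mathbf{h}\mathbf{h}^H)$ represents the correlation matrix:

$$\mathbf{R}_{hh} = \begin{bmatrix} R[0] & R[1] & \cdots & R[L-1] \\ R[1] & R[0] & \cdots & R[L-2] \\ \vdots & \vdots & \ddots & \vdots \\ R[L-1] & R[L-2] & \cdots & R[0] \end{bmatrix}$$

where $R[\cdot]$ can be calculated according to (4). Note that the form of the channel autocorrelation function and the Doppler speed need to be given in advance in order to perform the MMSE estimation. However, the real channel model and accurate statistical characteristics (the Doppler speed here) are hard to know in practical applications. Thus, two methods for MMSE estimation are used in the simulation.
First, assuming the above information is known, $\hat{\mathbf{h}}_{MMSE}$ can be calculated directly according to (3) and (8). We call this method "MMSE theory". Second, after obtaining the LS estimate, $\hat{\mathbf{h}}_{LS}$ can be used to estimate the autocorrelation matrix, which is then used in (8). We call this method "MMSE sim" because the computation is based on simulation results.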
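As an illustration of the LS estimator with linear interpolation described in this subsection, a minimal sketch is given below. The function name `ls_interpolated`, the convention that the pilot sequence is zero at information positions, and the choice to hold the nearest pilot estimate outside the pilot span are assumptions of this sketch. The MMSE estimator is not covered.

```python
def ls_interpolated(received, pilot_seq):
    """LS channel estimate at pilots, linearly interpolated elsewhere.

    received:  list of complex received samples y_i.
    pilot_seq: list of the same length holding the pilot symbol at pilot
               positions and 0 at information positions (assumed layout).
    """
    positions = [i for i, p in enumerate(pilot_seq) if p != 0]
    if not positions:
        raise ValueError("at least one pilot is required")
    # LS estimate y_i / x_i at each pilot position.
    at_pilots = {i: received[i] / pilot_seq[i] for i in positions}

    estimate = []
    for i in range(len(received)):
        if i in at_pilots:
            estimate.append(at_pilots[i])
            continue
        left = [j for j in positions if j < i]
        right = [k for k in positions if k > i]
        if left and right:
            j, k = left[-1], right[0]  # nearest pilots on each side
            weight = (i - j) / (k - j)
            estimate.append(at_pilots[j] + weight * (at_pilots[k] - at_pilots[j]))
        else:
            # Outside the pilot span: hold the nearest pilot estimate (assumption).
            nearest = left[-1] if left else right[0]
            estimate.append(at_pilots[nearest])
    return estimate
```

On a noiseless channel that varies linearly between pilots the interpolation is exact. With noise, the error at pilot positions is governed by the noise-to-signal power ratio.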

III. DL-BASED NN CHANNEL ESTIMATOR
To track a time varying channel, the neural network must be able to learn the correlation in the time domain. Thus, an RNN is a good choice for handling sequence data.

A. RNN structure
A simple example of a 1-layer RNN is given in Fig. 3a. In this structure, the output at the previous time step becomes part of the input at the current time step. In this way, the RNN can capture past information.
The basic RNN cell computes the following function:

$$h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh}) \tag{9}$$

where $\tanh$ is the hyperbolic tangent function and $h_t$, $h_{t-1}$ are the hidden states at time $t$ and $t-1$, respectively. $x_t$ is the input at time $t$. $W_{ih}$, $W_{hh}$ and $b_{ih}$, $b_{hh}$ are weights and biases, which need to be learned. However, the time varying channel $h(t)$ is related to both past and future channel states, while the basic RNN cell processes data in the forward direction only. Thus, a bidirectional structure, as shown in Fig. 3b, would have better performance. Blue blocks are forward cells and red blocks are backward cells. The data is fed not only in the forward direction but also in the backward direction. The hidden states $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are combined to form the input of a linear layer that gives the final results.
Another problem is that the basic RNN cell in (9) cannot capture long-term information. To solve this problem, the Long Short-Term Memory (LSTM) [20] cell has been put forward. In this paper, the Gated Recurrent Unit (GRU) [21], a variation of LSTM, is used to replace the basic RNN cell. The GRU computes the following functions ([21], (5), (6), (7), (8)):

$$\begin{aligned} z_t &= \sigma(W_z [h_{t-1}, x_t]) \\ r_t &= \sigma(W_r [h_{t-1}, x_t]) \\ \tilde{h}_t &= \tanh(W [r_t \odot h_{t-1}, x_t]) \\ h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \end{aligned} \tag{10}$$

where $\sigma(\cdot)$ refers to the sigmoid function $f_s(x) = \frac{1}{1+e^{-x}}$, $\odot$ denotes the element-wise product, $W_z$, $W_r$, $W$ are weights, and $h_t$, $h_{t-1}$, $x_t$ have the same meaning as in (9). Compared with the basic RNN cell, the GRU introduces 2 gates, the update gate $z_t$ and the reset gate $r_t$, to control the information flow. The GRU has been shown to have performance similar to LSTM on many tasks [22] and runs faster because it has fewer gates.
Based on the above discussion, the bidirectional GRU (BGRU) cell is used in the NN channel estimator. However, the result of a simple BGRU is not good enough. The idea of the Sliding BRNN (SBRNN) [11] is adopted to improve the performance further, and the comparison between BGRU and SBGRU is given in Section IV.C.
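To make the gated update and the bidirectional pass concrete, the following sketch implements one GRU step and a two-direction run in plain Python. It is an illustration only: it assumes the concatenated-input form of the GRU with the bias terms omitted, and the helper names (`gru_step`, `run_bidirectional`, `matvec`) are hypothetical. A practical implementation would use a deep learning framework instead.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def matvec(matrix, vector):
    """Matrix-vector product for lists of lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gru_step(x_t, h_prev, w_z, w_r, w_h):
    """One GRU update. Each weight matrix has shape (hidden, hidden + input)
    and multiplies the concatenation [h, x]; biases are omitted (assumption)."""
    concat = h_prev + x_t
    z = [sigmoid(a) for a in matvec(w_z, concat)]  # update gate
    r = [sigmoid(a) for a in matvec(w_r, concat)]  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t
    candidate = [math.tanh(a) for a in matvec(w_h, gated)]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, candidate)]


def run_bidirectional(inputs, hidden_size, fwd_weights, bwd_weights):
    """Run a forward and a backward GRU pass over the sequence and
    concatenate the two hidden states at every time step."""
    def run(sequence, weights):
        h = [0.0] * hidden_size
        states = []
        for x_t in sequence:
            h = gru_step(x_t, h, *weights)
            states.append(h)
        return states

    forward = run(inputs, fwd_weights)
    backward = run(inputs[::-1], bwd_weights)[::-1]
    return [f + b for f, b in zip(forward, backward)]
```

With all weights set to zero, both gates equal 0.5 and the candidate state is zero, so each step simply halves the previous hidden state. This makes a convenient sanity check.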

B. SBGRU structure
SBRNN is put forward in [11] to work as a detector for optical and molecular channels. Here, this structure is used for the estimation task under the time varying Rayleigh fading channel. A simple example of the sliding structure is given in Fig. 4. Each BGRU block in the figure has a fixed window length $W_L$. The choice of window length is related to the channel characteristics. Because any two moments of the channel $h$ are correlated, it is reasonable to expect that the longer the window is, the better the performance will be. The simulation of window length is given in Section IV.D.
SBGRU takes $W_L$ symbols for each computation and slides by 1 symbol after each computation. Due to the sliding operation, most symbols in the sequence are estimated several times. We take the average of all estimates as the final result. Denote $h_t = f_{BGRU}(x_t, h_{t-1}, h_{t+1})$ as the operation defined in (10), in its bidirectional version, in the BGRU layer. Denote $S = \{ j \mid j \in \mathbb{Z}, \max(0, t - W_L + 1) \le j \le \min(t, L-1)\}$ as the set of all starting positions of BGRU windows covering symbol $x_t$. The final output of SBGRU for $x_t$ is:

$$f_{SBGRU}(x_t) = \frac{1}{|S|}\sum_{j \in S} f_{BGRU}(x_t, h^j_{t-1}, h^j_{t+1}) \tag{11}$$

where $h^j_{t-1}$ and $h^j_{t+1}$ are the hidden states of the BGRU starting from the $j$-th symbol, at time $t-1$ for the forward direction and at time $t+1$ for the backward direction.
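The sliding-and-averaging step can be sketched independently of the network itself. In the sketch below, `window_estimator` is a hypothetical callable standing in for one BGRU block, and the window start positions are restricted to `0 .. L - W_L` so that every window fits inside the sequence. That restriction is an assumption of this sketch, not a statement about the authors' implementation.

```python
def sliding_average(sequence, window_length, window_estimator):
    """Slide a window by one symbol and average the overlapping estimates.

    window_estimator: hypothetical callable that maps a list of
    window_length inputs to window_length per-symbol estimates
    (for example, one BGRU block followed by a linear layer).
    """
    length = len(sequence)
    if not 1 <= window_length <= length:
        raise ValueError("window_length must be between 1 and len(sequence)")

    sums = [0j] * length   # accumulated estimates per symbol
    counts = [0] * length  # number of windows covering each symbol
    for start in range(length - window_length + 1):
        window = sequence[start:start + window_length]
        for offset, value in enumerate(window_estimator(window)):
            sums[start + offset] += value
            counts[start + offset] += 1
    return [s / c for s, c in zip(sums, counts)]
```

Symbols near the two ends of the sequence are covered by fewer windows than symbols in the middle, which is why the average divides by a per-symbol count rather than by the window length.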

C. Train and test NN estimator
A final implementation of the SBGRU neural network is given in Fig. 5. The input data to SBGRU consists of the channel distorted signal $\mathbf{y}$ and the original pilot information $\mathbf{p} = [\mathbf{p}_1, \mathbf{p}_2, ..., \mathbf{p}_K]$, which includes $K$ identical pilot blocks $\mathbf{p}_i$ of length $N$, and

$$p_n = \begin{cases} x_n, & n \text{ is a pilot position} \\ 0, & \text{otherwise} \end{cases}$$

This means that the pilot sequence has the same symbols as $\mathbf{x}$ in pilot positions and zero symbols in information positions. Because current deep learning platforms accept only real numbers, the real part and imaginary part of the complex signal need to be separated first. Thus, the input data of the SBGRU is given as:

$$\mathbf{x}_{in} = [\mathrm{real}(\mathbf{y}), \mathrm{image}(\mathbf{y}), \mathrm{real}(\mathbf{p}), \mathrm{image}(\mathbf{p})]$$

Considering the balance between accuracy and training time, 2 BGRU layers are adopted to construct the SBGRU layer. Denote the function $f_{SBGRU}$ as the operation in the SBGRU layer defined by (11) and the function $f_{Linear}$ as the operation in the linear layer, defined as:

$$f_{Linear}(\mathbf{x}) = \mathbf{W}\mathbf{x} + \mathbf{b}$$

where $\mathbf{W}$, $\mathbf{b}$ are the weight and bias of the linear layer, respectively. The final estimate of the channel, denoted as $\hat{\mathbf{h}}$, can be expressed as:

$$\hat{\mathbf{h}} = f_{Linear}(f_{SBGRU}(\mathbf{x}_{in}; \theta_S); \theta_L)$$

where $\theta_S$ are the parameters of SBGRU and $\theta_L$ are the parameters of the linear layer. Denote $\theta = \{\theta_S, \theta_L\}$ to simplify the notation.
To train the NN estimator, a loss function that represents the system performance needs to be constructed, and the parameters $\theta$ need to be optimized to minimize it. Because MSE is the usual criterion in estimation problems, the MSE loss function is adopted, which can be expressed as

$$\mathcal{L}(\theta) = \frac{1}{L}\sum_{i=0}^{L-1} |\hat{h}_i - h_i|^2$$

Minimizing the loss function is done by updating $\theta$ iteratively. The most classical algorithm is Stochastic Gradient Descent (SGD). The Adam [23] optimization algorithm, which performs well across many tasks, is adopted here.
Testing data has the same structure and statistical characteristics as the training data. The trained parameters $\theta$ are loaded to process the testing data and obtain the estimated channel.
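The data preparation and the loss described above can be sketched as follows. The per-symbol feature layout `[real(y), imag(y), real(p), imag(p)]` and the function names are assumptions made for illustration. The actual tensor layout fed to the network is not specified here.

```python
def pilot_sequence(transmitted, pilot_positions):
    """Copy the transmitted symbols at pilot positions, zero elsewhere."""
    positions = set(pilot_positions)
    return [x if i in positions else 0j for i, x in enumerate(transmitted)]


def build_input(received, pilots):
    """Split complex samples into real features, one row per symbol:
    [real(y), imag(y), real(p), imag(p)] (assumed layout)."""
    if len(received) != len(pilots):
        raise ValueError("received and pilots must have the same length")
    return [[y.real, y.imag, p.real, p.imag] for y, p in zip(received, pilots)]


def mse_loss(estimate, target):
    """Mean squared error between estimated and true complex channels."""
    if not target or len(estimate) != len(target):
        raise ValueError("estimate and target must be non-empty and equal in length")
    return sum(abs(e - t) ** 2 for e, t in zip(estimate, target)) / len(target)
```

In a framework-based implementation the rows returned by `build_input` would be stacked into a real-valued tensor, and the loss would typically be computed on the separated real and imaginary parts of the output.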

IV. SIMULATION RESULTS
In this section, we demonstrate the performance of the NN channel estimator under the time varying Rayleigh fading channel and explain the performance improvement through simulation results. The simulation setting for the NN estimator is described first. Then, four groups of simulation results for the NN estimator are presented and analyzed.

A. Simulation Setting
In the following simulations, i.i.d. bit sequences are randomly generated, and QPSK modulation is used to map bits to symbols. According to the channel model given in Section II.A and the channel parameters given in Table I, 1200 channels are generated: 800 for training, 200 for validation and 200 for testing. The channel parameters and pilot density are the same as in [13] to allow the comparison in Section IV.C. Also, based on the data structure in Fig. 1 and the data parameters in Table I, 120000 sequences are generated: 100000 for training, 10000 for validation and 10000 for testing. When calculating the channel distorted signal, each symbol sequence is sent over one randomly chosen channel.
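A minimal sketch of this data generation step is given below. The Gray-style bit-to-symbol mapping, the unit-power normalization, and the function names are assumptions of the sketch. The exact mapping and noise convention used in the simulations are not stated in the text.

```python
import math
import random


def qpsk_modulate(bits):
    """Map pairs of bits to unit-power QPSK symbols (assumed mapping:
    bit 0 -> +1, bit 1 -> -1 on the in-phase and quadrature axes)."""
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    scale = 1.0 / math.sqrt(2.0)
    return [
        complex(1 - 2 * b0, 1 - 2 * b1) * scale
        for b0, b1 in zip(bits[0::2], bits[1::2])
    ]


def apply_channel(symbols, channel, snr_db, rng=None):
    """Compute y_i = h_i * x_i + n_i with complex Gaussian noise.

    The noise variance is set from snr_db assuming unit signal power.
    """
    rng = rng or random.Random()
    noise_var = 10.0 ** (-snr_db / 10.0)
    sigma = math.sqrt(noise_var / 2.0)  # per real dimension
    return [
        h * x + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        for h, x in zip(channel, symbols)
    ]
```

A typical use would draw random bits with `random.Random(seed).getrandbits(1)`, modulate them, pick one of the pre-generated channels at random, and pass both to `apply_channel`.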
The default data and NN parameters of the estimator, detector and system are shown in Table II.
The proposed DL-based algorithm is implemented on a computer with an Intel(R) Core(TM) i7-6700K CPU @ 4.0 GHz, an NVIDIA GeForce GTX 1080 GPU and 16 GB of memory. PyTorch 1.0.0 and Python 3.6 are used for the estimation.

B. Performance Comparison with the traditional algorithm
Here the proposed NN channel estimator is compared with the traditional algorithms, namely the LS estimator and the MMSE estimator. The performance comparison is shown in Fig. 6. "MMSE theory" achieves the best performance within the tested SNR range. The LS estimation is the worst because it does not consider the influence of noise, and its simulation result matches the expected performance stated in (7). The "MMSE sim" estimation, described in Section II.D, shows a small improvement over the LS estimator, and the improvement shrinks at high SNR. The SBGRU estimator reaches performance similar to that of the "MMSE theory" estimator without needing any channel knowledge. Besides, the SBGRU estimator greatly outperforms both LS and "MMSE sim". These results indicate that the SBGRU estimator is a good solution under the time varying channel.
To visualize how the SBGRU estimator works, the channel tracking performance of the SBGRU and traditional estimators is given in Fig. 7. To make the channel variation significant in the time domain, the channel length is extended to 4000 symbols and the SNR is set to 20 dB. The SBGRU estimator tracks the channel very well in most linear parts and has slight oscillation in non-linear parts. However, in Fig. 7b, where the white line represents the real channel, both the LS estimator and the "MMSE sim" estimator fluctuate heavily.

C. Performance Comparison with different structures of NN
When deep learning algorithms are used for channel estimation, different neural network structures achieve different performance. First, the enhancement from the sliding operation in SBGRU is demonstrated in Fig. 8. All settings are the same except that BGRU computes block by block. Compared with SBGRU, the performance of BGRU degrades rapidly as the SNR increases, because the sliding operation allows SBGRU to use the averaged channel information within a time window. Besides, channel estimation under a similar time varying channel has been studied in [13] using an MLP neural network. Its basic idea is to include not only the channel distorted data and the pilot data but also the estimated channel from the previous block to obtain better channel estimation performance. In its simulations, the estimation block length is set equal to the data block length. However, this estimation block length can be different. To compare the performance fairly, the NN architecture in [13] is reconstructed, trained and tested using the same settings and simulation parameters as the SBGRU simulation. Besides, three different values, 16, 32 and 40, are used to fully explore the influence of the estimation block length.
The performance comparison between MLP and SBGRU is given in Fig. 9. The MLP with estimation block length 16 (the same design as [13]) does not work very well. A possible reason is that the NN model has too few parameters, so its ability to learn the nonlinear channel is weak. When the estimation block length increases to 32, the performance improves slightly. However, an estimation block length of 40 degrades the performance. This is because an estimation block length of 40 is not an integer multiple of the original data block length 16, so the MLP cannot fully exploit the pilot information repeated in the time domain. In contrast, the SBGRU estimator outperforms all of the above MLP estimators when the SNR is above 5 dB. Besides, thanks to the recurrent structure of the RNN, the previous channel estimate does not need to be fed into the neural network; it is captured by SBGRU automatically.

D. Performance vs window length
Here the influence of the sliding window length is explored. The performance for different window lengths is shown in Fig. 10. The performance improves monotonically as the window length increases. Except for the window length of 16 symbols, the 3 other window lengths have nearly the same performance. This shows that the window cannot be too short if it is to contain enough information for the estimation. However, an overly long window does not bring much additional improvement. Thus, selecting a suitable window length balances accuracy against the speed of training and testing. Overall, the choice of window length depends on the channel characteristics.
Fig. 8. Performance comparison between sliding BGRU and non-sliding BGRU
Fig. 9. Performance comparison between the SBGRU estimator and the MLP estimator

E. Performance vs pilot density
Finally, the influence of pilot density is examined to show the robustness of the SBGRU estimator. The performance is shown in Fig. 11. As the pilot density decreases, the MSE performance degrades slightly but not seriously. The result is still much better than LS estimation and "MMSE sim" estimation. Thus, the SBGRU estimator is robust to different pilot densities.

V. CONCLUSION
In this paper, a DL-based channel estimator is designed for the time varying Rayleigh fading channel. The proposed DL-based channel estimator achieves better performance than traditional algorithms and some NN estimators with different structures. Besides, the proposed NN channel estimator shows its ability to dynamically track the channel and its robustness to pilot density.
In traditional communication, there are many more complex algorithms for channel estimation. However, deep learning algorithms offer some unique advantages compared with the traditional algorithms.
• Although many estimation methods have been developed for traditional communication systems, most of them assume the channel to be invariant within the coherence time. With a deep learning algorithm, however, neither prior knowledge of the channel model nor the assumption of channel invariance within the coherence time is needed during training and testing, which shows the potential of DL-based algorithms under the time varying channel.
• The channel estimator designed in this paper can easily be combined with traditional algorithms. For example, it is convenient to insert high performance channel coding before the modulation to protect against Gaussian noise. Thus, the MSE performance can be further improved.
In addition, there is still a lot of work to do in applying deep learning or machine learning technology to the physical layer under the time varying channel. Some possible directions are as follows.
• Besides channel estimation, it is also feasible to construct a detector that performs equalization and demodulation jointly using a deep learning algorithm. Thus, by connecting the NN estimator and the NN detector, a wireless communication system can be constructed.
It is worth exploring whether such a DL-based system can achieve better bit error rate (BER) performance than a traditional system under the time varying channel while remaining robust to different pilot densities.
This work was supported in part by the National Key R&D Program of China under Grant 2017YFE0112300 and Beijing National Research Center for Information Science and Technology under Grant BNR2019RC01014 and BNR2019TD01001. (Corresponding author: Jintao Wang.)

$E(\cdot)$ refers to the expectation. $(\cdot)^T$ and $(\cdot)^H$ refer to the transpose and Hermitian transpose of a vector. $|\cdot|$ represents the absolute value of a real number or the amplitude of a complex number. For two vectors or matrices $\mathbf{a}$ and $\mathbf{b}$, $[\mathbf{a}, \mathbf{b}]$ is the matrix combining $\mathbf{a}$ and $\mathbf{b}$. For two real numbers $a \le b$, $[a, b]$ is the set of all real numbers in the range from $a$ to $b$. real(·) and image(·) are the functions giving the element-wise real and imaginary parts of a complex vector.

II. SYSTEM MODEL
In this section, the signal architecture and the time varying Rayleigh fading channel model are presented first. Then, a signal flow model is introduced. Denote the transmitted signal and received signal as $\mathbf{x}$ and $\mathbf{y}$, respectively. Denote the Rayleigh time varying channel as $\mathbf{h}$. Considering a Linear Time Variant (LTV) model, the relation between the input and output of the channel is:

$$y_i = h_i x_i + n_i \tag{1}$$

where $n_i$ is the additive noise.

Fig. 3. The structure of RNN. (a) The structure of forward-only RNN (b) The structure of bidirectional RNN

Fig. 7. Simulation results for channel tracking. (a) Tracking performance of the SBGRU estimator (b) Tracking performance of the LS and MMSE estimators

Fig. 10. The influence of sliding window length on the SBGRU estimator