Asynchronous baseband processor design for cooperative MIMO satellite communication

The challenges in satellite communication (SatCom) include, but are not limited to, the customary complications of telecommunication such as channel condition and signal-to-noise ratio (SNR). A SatCom system is also prone to transient and permanent radiation hazards. Hence, in spite of harsh environmental factors (weather phenomena, solar events, etc.), a SatCom system must maintain reliable and predictable communication functions with a limited source of power. This paper presents a SatCom system design for achieving both low-power and high-fidelity communication. The design uses cooperative multiple-input multiple-output (MIMO) for spectral efficiency and diversity, low-density parity-check (LDPC) decoding for near-Shannon-limit gain, and dynamic voltage and frequency scaling (DVFS)-assisted asynchronous circuit designs to achieve low power and fault tolerance. The MIMO system permits uninterrupted service in the event of temporary or permanent link or unit failures. The results show resilience against injected radiation charge levels of up to about 25 femtocoulombs on the critical path. This is more than 600 times the minimum charge required to logically flip a gate output in an ordinary static CMOS gate.


I. INTRODUCTION
MIMO has been widely acclaimed for its high throughput, spatial gain, diversity, and interference reduction at no additional cost in transmission power or bandwidth [1], [2]. It has already been adopted by many terrestrial standards, such as IEEE 802.11n, IEEE 802.16e, and Long Term Evolution (LTE). In an effort to keep pace with terrestrial systems, research in SatCom systems also tends to incorporate MIMO for higher data rate and bandwidth efficiency.
The general setup of MIMO satellite uplinks and downlinks is proposed in [3], where the optimization for maximum achievable channel capacity is obtained from the line-of-sight (LOS) signal component, which is the backbone of SatCom. A number of MIMO SatCom application examples, including the common case of satellites with transparent communication payloads, are discussed in [4]. It shows that the construction of MIMO SatCom with optimal capacity is practically possible under the assumption of undisturbed LOS propagation. However, under severe weather conditions (such as rain and wet snow) the MIMO satellite channel capacity degrades significantly [5]. To resolve this issue, several techniques, such as dual polarization and power allocation with linear precoding, have been proposed in [6]. Cooperative satellite communication, presented in [7], is possibly a better solution to this problem, offering extended satellite coverage in uncovered areas. Generally, cooperative systems have a source node multicasting a message to a number of cooperative nodes, which in turn retransmit a processed version to the intended destination node. Along with this classical concept of repetitive forwarding, several techniques have been discussed in [8], [9] for cooperative MIMO SatCom. However, most of these cooperative proposals require symbol-level synchronization between cooperative nodes. The lack of synchronization may result in inter-symbol interference and dispersive channels. To address this problem, asynchronous cooperative MIMO communication techniques are proposed in [8].
The use of MIMO can save a significant amount of transmission power and can increase the bandwidth, even considering the local energy cost of exchanging joint information among the cooperating nodes [3]-[8]. In [9], the power efficiency of cooperative MIMO in terms of transmission robustness has been studied. The result shows that if the target distance exceeds a threshold or if the channel condition is poor, the use of MIMO is inevitable. Although a MIMO system with increased circuit complexity can achieve better power performance owing to its diversity and power gain, the energy consumption of the decoding circuit itself opens another avenue of exploration. However, most MIMO decoding circuits are clocked, i.e., synchronous. As a result, they are highly sensitive to antenna placement and radiation effects, leading to low reliability and high bit error rate (BER).
With the emphasis on low power, we developed a fault-tolerant MIMO receiver for satellite transmission using an asynchronous circuit design approach. The key attribute of an asynchronous circuit is remarkably high reliability even at very low power consumption. These circuits perform on-demand computation until correct completion. Moreover, in terms of stability, asynchronous circuits are inherently more robust [10]. Research on asynchronous circuits shows that they can mask 100% of the non-permanent faults for wireless communication applications. Several ultra-low-power design techniques have been reported in [11], [12].
The cooperative communication scenario is illustrated in Fig. 1, which includes, as an example, three cooperative satellites along with three antennas on earth. The cloud represents a severe weather condition that degrades the MIMO satellite channel capacity. The proposed combination of cooperative MIMO SatCom with an asynchronous decoding circuit is a suitable solution to such a situation. Cooperative MIMO SatCom can ensure coverage even when one satellite fails. Moreover, the use of asynchronous circuits [12] can achieve near-complete fault coverage with low power consumption and higher reliability even at very low voltage levels. We have synthesized the MIMO receiver and conducted SPICE simulation on the critical path to show that error-free performance is achievable even in worst-case scenarios. This holds even with a fault injection charge level that is 625 times greater than the minimum charge required for logically flipping a gate output in CMOS circuits. The effect of reducing the supply voltage in both the presence and absence of radiation is also studied. The results show that, with an appropriate power management method, the effect of radiation can be eliminated while still keeping the power consumption to a minimum. The cooperative asynchronous MIMO design can save up to 90% of the power for baseband processing while yielding bandwidth similar to that of a synchronous design.
The rest of this paper is organized as follows. An overview of the system is given in Section II. The modeling and simulation setup are presented in Section III. After the simulation results are analyzed in Section III, the paper is concluded in Section IV.

II. OVERVIEW OF SYSTEM
Here, we present an asynchronous MIMO processor for cooperative satellite communication using the LORD MIMO detection algorithm [13]. We have used a fully parallel LDPC decoder in this study, although the authors believe that a similar result can be achieved using a more advanced architecture [14]. Both of these architectures are individually optimized for maximum achievable performance. The receiver not only uses the inherent advantages of MIMO that increase spectral power efficiency, but also uses the properties of asynchronous circuits to overcome the faults introduced into the circuit by radiation. The DVFS method is used to increase the system flexibility in different environments while keeping the power consumption at the minimum level. In order to meet the real-time requirement of the communication system, a deadline is applied to the processing time of each frame. The impact of faults and the resulting increase in delay is negligible due to the error tolerance of the receiver and the iterative properties of the decoder. The target throughput of the system is 200 Mbps.
Fig. 2 shows the timing and block diagram of both the LDPC decoder and the LORD MIMO detector. The detector starts right after a Hard deadline, and as soon as the detector finishes the first frame, the decoder starts decoding. The Hard deadline is determined by the synchronous interfacing required by the SatCom system application. The process ends when the LDPC decoder reaches the maximum allowed number of iterations or all the check nodes are satisfied. The other factor that can terminate the LDPC routine is reaching the Hard deadline. If a Hard deadline (upward arrows in the timing diagram of Fig. 2) is reached, the LDPC decoder stops and delivers the last updated values as output.
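The three stop conditions described above can be sketched as a small control loop. This is a minimal illustration, not the authors' implementation: the function name `decode_until_stop`, the callback `step`, and the abstract time units are assumptions introduced here, and the sketch treats the Hard deadline as a time budget that is checked before each iteration starts.

```python
from typing import Callable, Tuple


def decode_until_stop(
    step: Callable[[], bool],
    max_iterations: int,
    time_budget: float,
    iteration_time: Callable[[int], float],
) -> Tuple[int, str]:
    """Run decoder iterations until one of three stop conditions fires.

    step() performs one (hypothetical) decoding iteration and returns True
    when all check nodes are satisfied. iteration_time(i) gives the assumed
    duration of iteration i; time_budget is the time left to the deadline.
    Returns (completed iterations, reason for stopping).
    """
    elapsed = 0.0
    for iteration in range(1, max_iterations + 1):
        cost = iteration_time(iteration)
        # Stop condition 3: the Hard deadline would be hit mid-iteration,
        # so the last completed iteration's values are delivered.
        if elapsed + cost > time_budget:
            return iteration - 1, "deadline"
        elapsed += cost
        # Stop condition 2: all check nodes are satisfied.
        if step():
            return iteration, "checks_satisfied"
    # Stop condition 1: maximum allowed number of iterations reached.
    return max_iterations, "max_iterations"
```

With a constant iteration time of one unit and a budget of 4.5 units, for example, a decoder that never converges completes four iterations before the deadline stops it.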
As depicted in Fig. 2, the LORD detector is the first step in the baseband processing and is subsequently connected to the LDPC decoder. The buffers that interface the detector and decoder are not shown in the figure. The decoder uses an iterative decoding method and, after each iteration, determines whether all the check nodes are satisfied. Just before the decoding process ends, a hard decision is made from the log-likelihood ratio (LLR) values. The DVFS control unit uses several pieces of state information from the LDPC decoder to determine whether the voltage should be increased or decreased. In Section II-E, we describe the function of the DVFS control unit in more detail.
The ratios in the timing diagram shown in the figure are for illustration purposes. Because the timing of asynchronous runs is unpredictable, the third detection may complete before the first LDPC process. One possible resolution of this problem is to include additional buffers. However, that would incur more power and space overheads. Another solution is simply to prevent the third MIMO process from commencing until the first decoding process is completed.
Another important feature of the proposed architecture is that the detector needs to generate the multiple symbols that constitute a frame before the LDPC decoder can start decoding that frame. Synchronous-to-asynchronous (STA) and asynchronous-to-synchronous (ATS) buffers are used before the reordering unit and after the hard decision unit, respectively. The clock of the ATS buffer is determined by the external hard deadline associated with the SatCom system throughput.

A. MIMO System
The transmission model of a MIMO system with N transmit antennas and M receive antennas can be represented as $y = Hs + n$, where $s$ is the $N \times 1$ transmitted symbol vector, $H$ is the $M \times N$ complex channel matrix, $y$ is the $M \times 1$ received vector, and $n$ is an $M \times 1$ vector of additive complex symmetric Gaussian noise. The entries of $s$ are chosen from a complex constellation $\Omega$. This paper works with a 4 × 4 MIMO arrangement in which the receiver knows the channel matrix and the noise variance. For MIMO detection, one approach, known as maximum likelihood (ML) detection, is to search exhaustively among all the constellation points by calculating $\hat{s} = \arg\min_{s \in \Omega^N} \| y - Hs \|^2$, where $\| \cdot \|$ denotes the 2-norm. However, sphere decoding can reduce the search space by evaluating only those points that fit inside a sphere of radius $r$ around the received signal, and it can be formulated as $\hat{s} = \arg\min_{s \in \Omega^N} \| y - Hs \|^2$ subject to $d(s) < r^2$, where $d(s)$ is the distance metric of candidate $s$. This decoding problem can also be formulated as a tree search problem, where each branch is one of the possible transmitted symbols.
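The exhaustive ML search and the sphere constraint can be illustrated with a short sketch. It assumes a 2 × 2 system with QPSK symbols to keep the search small (the paper's design is 4 × 4 with 16-QAM), takes $d(s) = \| y - Hs \|^2$, and uses hypothetical helper names (`matvec`, `distance_sq`, `ml_detect`, `sphere_detect`). The sphere variant here simply filters the exhaustive list; a practical sphere decoder prunes the search tree instead of enumerating every candidate.

```python
import itertools

# Assumed example constellation (QPSK); the paper's design uses 16-QAM.
QPSK = [complex(re, im) for re in (-1, 1) for im in (-1, 1)]


def matvec(H, s):
    """Multiply channel matrix H (list of rows) by symbol vector s."""
    return [sum(h * x for h, x in zip(row, s)) for row in H]


def distance_sq(y, H, s):
    """Squared 2-norm of y - Hs."""
    return sum(abs(yi - zi) ** 2 for yi, zi in zip(y, matvec(H, s)))


def ml_detect(y, H, constellation):
    """Exhaustive ML detection: try every symbol vector, keep the closest."""
    n_tx = len(H[0])
    candidates = itertools.product(constellation, repeat=n_tx)
    return min(candidates, key=lambda s: distance_sq(y, H, s))


def sphere_detect(y, H, constellation, r_sq):
    """Same minimization restricted to candidates with d(s) < r^2.

    Returns None when no candidate lies inside the sphere.
    """
    n_tx = len(H[0])
    inside = [
        s
        for s in itertools.product(constellation, repeat=n_tx)
        if distance_sq(y, H, s) < r_sq
    ]
    return min(inside, key=lambda s: distance_sq(y, H, s)) if inside else None


# Noise-free usage example with an arbitrary, made-up channel matrix.
H = [[1 + 0.5j, 0.2 - 0.1j], [0.3j, 0.9 + 0.2j]]
sent = (complex(1, 1), complex(-1, -1))
y = matvec(H, sent)
detected = ml_detect(y, H, QPSK)
```

In the noise-free example the ML search recovers the transmitted vector, and the sphere search returns the same vector whenever the radius is large enough to contain it.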

B. LORD Detector and LLR Update Unit
The LORD algorithm is one of the best available MIMO detection methods in terms of performance-to-power ratio [13]. This detector provides soft output while keeping the computational complexity to a minimum. Unlike those of many other detectors (e.g., depth-first search algorithms), the throughput and completion delay of this detector are constant. The detector in this study is designed for a 4 × 4 MIMO system with 16-QAM modulation. The detector searches all branches of the tree at the first layer and takes the best child of each branch for the rest of the layers. To provide a reasonable list of candidates from which the log-likelihood ratio (LLR) values for soft decoding can be estimated, LORD has to reorder the channel matrix for all levels of the tree and find the best candidate each time. The detector processes at 230 Mbps while using 6.8 mW at a 0.9 V supply voltage. This is sufficient to accommodate the 15 percent delay increase resulting from a radiation event. The output of the LORD tree search unit is delivered to the LLR calculation unit, which calculates the soft values for the LDPC decoder.
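One way to read the 15 percent figure, assuming it is measured against the 200 Mbps system target stated in Section II, is as the ratio of the detector throughput to the target throughput:

```latex
\[
\frac{R_{\text{detector}}}{R_{\text{target}}} - 1
  = \frac{230\ \text{Mbps}}{200\ \text{Mbps}} - 1
  = 0.15
\]
```

Under that assumption, a detector whose processing time per frame grows by 15 percent still delivers 230 / 1.15 = 200 Mbps.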

C. LDPC
An LDPC code is defined by a parity-check matrix H. Each row of the matrix is a parity-check equation, and the columns are associated with the received bits. In the Tanner graph representation, the parity-check equations are called check nodes and the coded bits are represented by variable nodes. A variable node is connected to a check node if the associated entry in the H matrix is one. Decoding is done by passing information iteratively along the edges of the graph. The LDPC decoder used for this experiment is a fully parallel soft decoder. It uses an H matrix for a code length of 2304 with a rate-1/2 coding scheme. Although no more than 11 decoding iterations are necessary to achieve maximum gain, with an increase of supply voltage it can accommodate 32 iterations in a radiation-free environment, and it uses 38.3 mW at 200 Mbps throughput. This decoder has two timers, which are the only synchronous circuits of the design. The first timer keeps track of the time from the start of the LDPC iterations until their end. The other measures the duration from the end of the iterations until the next Hard deadline. This measures the time slack of each detection/decoding frame in order to adjust the DVFS controller.
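The relationship between H, the check nodes, and the variable nodes can be made concrete with a toy example. The sketch below uses the small (7,4) Hamming parity-check matrix as a stand-in, since the paper's 2304-bit LDPC matrix is not given; the function names are hypothetical.

```python
# Toy parity-check matrix (a (7,4) Hamming code), used only for illustration.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(H, bits):
    """Evaluate every parity-check equation (one per row of H) modulo 2."""
    return [sum(h & b for h, b in zip(row, bits)) % 2 for row in H]


def all_checks_satisfied(H, bits):
    """True when every check node is satisfied, i.e. the syndrome is zero."""
    return not any(syndrome(H, bits))


def tanner_edges(H):
    """List (check node, variable node) pairs wherever H has a one."""
    return [(c, v) for c, row in enumerate(H) for v, h in enumerate(row) if h == 1]


codeword = [1, 1, 1, 0, 0, 0, 0]   # satisfies all three checks
corrupted = [1, 1, 1, 0, 1, 0, 0]  # same word with bit 4 flipped
```

Here `all_checks_satisfied(H, codeword)` holds, while the corrupted word leaves the first and third check nodes unsatisfied, which is the condition an iterative decoder tests after each iteration.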

D. Asynchronous Design
The receiver with pre-charged static logic (PCSL) is presented in [15]. Here, a transistor is added to each static gate to enable the pre-charging sequence, and the gates of those transistors are connected to the request signal (Req). In the evaluation period, each gate works as a static logic gate, which has two inverters that indicate whether the result of the gate is correct. At each stage of the design, the Req signal is received from the previous stage, and the acknowledgement signal (Ack) is sent back after the processing ends. In this design, instead of using sub-threshold operation, the supply voltage is varied from 0.7 V to 1.1 V, and 0.83 V is the maximum source voltage for the system in the absence of radiation. The maximum charge applied to specific nodes of the critical path, expected to be 25 fC, is 625 times the minimum charge (40 aC) required to flip a bit in 45 nm technology.
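The factor of 625 follows directly from the two charge values quoted above, taking the maximum injected charge as 25 femtocoulombs and the minimum flipping charge as 40 attocoulombs:

```latex
\[
\frac{Q_{\max}}{Q_{\min}}
  = \frac{25 \times 10^{-15}\ \text{C}}{40 \times 10^{-18}\ \text{C}}
  = 625
\]
```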

E. DVFS Control Unit
The DVFS control unit uses a simple algorithm to determine whether the voltage should be increased or decreased. The system, as shown in Fig. 2, uses continuous on-chip voltage regulators driven by the commands received from the control unit, and their voltage levels range from 0.4 V to 1.1 V. This range of supply voltage can theoretically provide up to about four times speed scaling for the circuit. The control unit uses four signals to make its decision: $CH^T = 0$, Iteration#, LDPC timer, and Done timer. $CH^T = 0$ is the signal that specifies whether all the check nodes are satisfied in the last iteration of decoding, where C is the matrix of codes and H is the parity-check matrix. If $CH^T = 0$, all the outputs are valid codewords (this does not ensure the correctness of the transmitted data). Iteration# is an eight-bit bus that specifies the number of LDPC iterations. LDPC timer measures the time from the start of LDPC decoding until its end. Done timer indicates the time from the termination of the LDPC iterations until the next Hard deadline. The control unit starts by checking whether $CH^T = 0$ is satisfied; if so, all the necessary calculations have been done in time. In this case, the control unit only has to check whether there is room for reducing the supply voltage. If $CH^T \neq 0$, the control unit checks Iteration#, and a request for an increase in voltage is made unless Iteration# is 11 (the maximum allowed number of iterations). If the number of iterations is already 11, the control unit checks for the possibility of a voltage decrease. This basically means checking whether the available time is longer than one LDPC iteration, compared with the last received data operating in the same scenario.
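The decision flow described above can be sketched as follows. This is one possible reading, not the authors' circuit: the names `DecoderState`, `has_room_to_slow_down`, and `dvfs_decision` are hypothetical, and the test for "room to reduce the voltage" is assumed here to mean that the slack reported by the Done timer exceeds the average duration of one LDPC iteration.

```python
from dataclasses import dataclass

MAX_ITERATIONS = 11  # maximum allowed LDPC iterations, as stated in the text


@dataclass
class DecoderState:
    checks_satisfied: bool  # the "CH^T = 0" signal
    iterations: int         # the "Iteration#" bus
    ldpc_time: float        # "LDPC timer": start to end of decoding
    done_time: float        # "Done timer": end of decoding to next deadline


def has_room_to_slow_down(state: DecoderState) -> bool:
    """Assumed criterion: slack is longer than one average LDPC iteration."""
    if state.iterations == 0:
        return False
    return state.done_time > state.ldpc_time / state.iterations


def dvfs_decision(state: DecoderState) -> str:
    """Return 'increase', 'decrease', or 'hold' for the supply voltage."""
    if state.checks_satisfied:
        # Decoding finished in time; only look for room to lower the voltage.
        return "decrease" if has_room_to_slow_down(state) else "hold"
    if state.iterations < MAX_ITERATIONS:
        # Checks failed before the iteration limit: request more speed.
        return "increase"
    # Iteration limit already reached: check for a possible decrease.
    return "decrease" if has_room_to_slow_down(state) else "hold"
```

For instance, a frame that converged in 5 iterations over 10 time units with 3 units of slack yields a decrease request, whereas a frame that failed its checks after 7 iterations yields an increase request.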

III. SETUP AND SIMULATIONS
The radiation to which satellites are subjected can impact the receiver circuit and introduce charge. If the charge is high, it will cause permanent damage. This will cause a satellite node outage, but the cooperative MIMO system can continue to function with the remaining satellite units. Charges that do not cause permanent damage may cause faults in a synchronous circuit. Certain asynchronous circuits are designed to eliminate 100% of the faults at the cost of increased delay. The receiver design is able to tolerate charges that cause around 60% delay while the supply voltage is 1.1 V. The effect of charges applied to the critical path is studied in Section III-A.
For a specific throughput, the time to process one MIMO symbol is predetermined and depends on the delay caused by radiation and on the supply voltage. As a result, the number of LDPC iterations is affected by the increase in computation delay of both the MIMO detector and the LDPC decoder. The decoding fidelity over SNR under different radiation levels and supply voltages is simulated using Matlab, as presented in Section III-B.

A. Study of Critical Path
The design can tolerate any change in delay of less than 50% without degradation in performance while the supply voltage is 1.1 V. To explore the relationship between error injection and delay, the RTL code of the receiver is synthesized using Synopsys Design Compiler. The critical path is extracted as a SPICE netlist by Synopsys PrimeTime. The error model presented in [16], including the double-exponential current injection model, is used in our experiments, where the radiation is modeled as a current source connected to nets. For a certain level of radiation, a quantity of electric charge is injected at a random rate into a random net during the transition time. Two constraints are applied to the delay simulator. First, the fault injection time points are picked so that the resulting delay propagates to the output of the circuit. Second, the whole process is done in such a way that the delay never decreases, thereby simulating the worst-case scenario. To obtain the effect of radiation on delay, SPICE simulations are run on the extracted critical paths of both the detector and the decoder to examine the delays at different levels of electric charge. Fig. 3 shows the average delay caused by different charges in the described simulation settings. The results are presented as percentages. The effect of the charge on each circuit is different because of their differences in architecture and gate sizes. The maximum charge applied to the circuit causes around 12% delay, and it is more than 600 times the minimum required to flip a gate value in 45 nm technologies. The system will lose performance only if the supply voltage is less than the necessary value. Next, we study the power and performance of the system for different voltages and radiation effects.
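A commonly used form of the double-exponential injection pulse is $I(t) = \frac{Q}{\tau_f - \tau_r}\left(e^{-t/\tau_f} - e^{-t/\tau_r}\right)$ for $t \ge 0$, whose integral over time equals the injected charge $Q$. Whether [16] uses exactly this parameterization is an assumption, and the time constants in the sketch below (200 ps fall, 50 ps rise) are illustrative values, not taken from the paper. The code evaluates the pulse and checks numerically that it delivers the intended charge.

```python
import math


def injected_current(t, charge, tau_fall, tau_rise):
    """Double-exponential current pulse (amperes) at time t (seconds)."""
    if t < 0:
        return 0.0
    scale = charge / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def total_charge(charge, tau_fall, tau_rise, t_end, steps=100_000):
    """Integrate the pulse from 0 to t_end with the trapezoidal rule."""
    dt = t_end / steps
    total = 0.0
    previous = injected_current(0.0, charge, tau_fall, tau_rise)
    for k in range(1, steps + 1):
        current = injected_current(k * dt, charge, tau_fall, tau_rise)
        total += 0.5 * (previous + current) * dt
        previous = current
    return total


Q_MAX = 25e-15      # 25 fC, the maximum charge quoted in the text
TAU_FALL = 200e-12  # assumed collection time constant
TAU_RISE = 50e-12   # assumed rise time constant

delivered = total_charge(Q_MAX, TAU_FALL, TAU_RISE, t_end=5e-9)
```

In a SPICE deck, a pulse of this shape would be attached as a current source to the chosen net; the sketch only confirms that the waveform integrates to the intended charge.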

B. Results
This system is able to cope with very harsh environments and can also effectively reduce the power consumed in a noise- and radiation-free environment. At very high SNR and in a low-radiation environment, the power usage of the system can be reduced to 3.2 mW with a 0.7 V supply and a very slow system. In this case, the LDPC decoder uses only one iteration to decode the received data, and most of the available calculation time is dedicated to the detector. To obtain exact power numbers, the designs are synthesized using TSMC 45 nm CMOS technology and Synopsys Design Compiler. Moreover, Matlab simulation is performed to calculate the BER of the system. These simulations run for 100000 frames (around 230 Mb) or 100 errors, whichever comes first. The results of these simulations are presented in Fig. 4. The left axis shows the BER, while the right one shows the power in mW. The power curves are dotted lines for different supply voltages and different charges (representing different radiation situations). The BER curves, drawn as solid lines, have the same markers as their paired power curves. The presented curves are chosen to demonstrate the ability of the system to reduce power consumption while maintaining performance. The figure shows that even with the supply set to 0.9 V, the system can tolerate the maximum delay caused by radiation. This can be inferred by comparing the 0.9 V, 25 fC performance curve with that of 1.1 V, 0 fC, as both have the same shape. The power consumption for the 1.1 V curve is higher, but the time taken by the LDPC decoder is less, since in both situations the decoder has enough time to accommodate the maximum of 11 iterations.
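The stopping rule for the BER simulations (100000 frames or 100 errors, whichever comes first) can be sketched as a short loop. The paper's simulations were run in Matlab; this Python version is only an illustration, the function name `estimate_ber` and the callback `simulate_frame` are hypothetical, and the sketch assumes that "errors" are counted as bit errors and that a frame carries 2304 bits.

```python
BITS_PER_FRAME = 2304  # assumed frame size, consistent with ~230 Mb per 100000 frames


def estimate_ber(simulate_frame, bits_per_frame=BITS_PER_FRAME,
                 max_frames=100_000, max_errors=100):
    """Accumulate bit errors frame by frame until either limit is reached.

    simulate_frame() is a placeholder for one detect-and-decode run and
    must return the number of bit errors observed in that frame.
    Returns (estimated BER, frames simulated, errors counted).
    """
    frames = 0
    errors = 0
    while frames < max_frames and errors < max_errors:
        errors += simulate_frame()
        frames += 1
    if frames == 0:
        return 0.0, 0, 0
    return errors / (frames * bits_per_frame), frames, errors
```

At low SNR the error limit ends the run after a few frames, while at high SNR the frame limit bounds the simulation time at the cost of a coarser BER estimate.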
Fig. 4 also shows the effect of different charges on the performance of the system when the supply voltage is 0.8 V. While the performance of the system at 0.8 V almost matches the maximum expected performance (0.1 dB difference), injected charge can cause losses of more than 2 dB and 3 dB for 10 fC and 25 fC, respectively. This shows that mismanagement of the supply voltage can result in serious performance loss.

IV. CONCLUSION
Deploying a SatCom system involves overcoming communication challenges such as poor channel conditions and low SNR, as well as maintaining reliable and predictable communication despite bad weather conditions, even with a limited source of power. This paper presents a cooperative MIMO system with a low-power, fault-tolerant asynchronous circuit design that ensures uninterrupted service even when one unit fails. The proposed MIMO approach exploits both spectral efficiency and diversity to achieve near-Shannon-limit gain, and the use of DVFS-assisted asynchronous circuits provides low power consumption and 100% fault tolerance. The simulation results show complete tolerance of radiation that injects up to 25 fC on the critical path. This is more than 600 times the minimum charge (40 aC) necessary for flipping an output. The results also show that mismanagement of the power supply can cause more than 3 dB of performance loss.

Fig. 2. Timing and block diagram of asynchronous MIMO receiver

Fig. 4. Power consumption and BER performance in different radiation and voltage scaling for SNRs range of 1 to 13 dB.