An Enhancement of Decimation Process using Fast Cascaded Integrator Comb (CIC) Filter

The oversampling technique has been shown to increase the SNR and is used in many high-performance systems, such as the ADCs in audio and DAT systems. This paper presents the design of the decimation stage, a sub-component of the oversampling technique, and its VLSI implementation. The design of the three main units of the decimation stage, namely the Cascaded Integrator Comb (CIC) filter, the associated half-band filters and the droop correction filter, is also described. Verilog HDL code describing the CIC filter was developed in the Xilinx ISE environment and downloaded to a Virtex II FPGA board. In the design of these units, we focus on the trade-off between speed improvement on the one hand and power consumption and silicon area for chip implementation on the other.


I. INTRODUCTION
The most popular A/D converters for audio applications are based on oversampling and sigma-delta (∑Δ) modulation followed by a decimation process [1]. An oversampled sigma-delta (∑Δ) modulator provides high-resolution output samples, in contrast to the standard Nyquist-rate sampling technique. However, a decimation process is needed at the modulator output to bring down the high sampling frequency and obtain high-resolution samples. The CIC filter is a preferred technique for this purpose. In 1981, Eugene Hogenauer [2] invented a new class of economical digital filter for decimation called the Cascaded Integrator Comb (CIC) filter, or recursive comb filter. This filter worked with a sampling frequency of 5 MHz. Additionally, the CIC filter requires no multipliers and no storage for filter coefficients, since all coefficients are unity [3]. Furthermore, its on-chip implementation is efficient: its regular structure consists of three basic building blocks, it needs minimal external control and less complicated local timing, and its rate change factor can be reconfigured by adding a scaling circuit with minimal changes to the filter timing. It is also used to filter out-of-band quantization noise and to prevent excessive aliasing during sample-rate reduction. Hence, high speed is a key issue in the chip implementation of CIC decimators. In 1998, Garcia [4] applied the Residue Number System (RNS) to a pipelined Hogenauer CIC filter. Compared with the two's complement design, the RNS-based Hogenauer filter achieves a speed improvement of approximately 54%. A similar structure by Meyer-Baese [5] was implemented to reduce the cost of the Hogenauer CIC filter; it can operate at a maximum clock frequency of 164.1 MHz on an Altera FPLD and 82.64 MHz in a Synopsys cell-based IC design.
This paper presents the implementation of a high-speed CIC filter consisting of three parts: the integrator, the comb and the down-sampler. The CIC filter is considered a recursive filter because of the feedback loop in the integrator circuit, and the implemented filter achieves a maximum throughput of 190 MHz.
The next section describes the mathematical formulation and block diagram of the CIC filter in detail. The enhanced high-speed architecture is explained in Section III. Section IV briefly presents the implementation and design results. Finally, Section V presents the conclusion.

II. DEVELOPMENT OF A DECIMATION FILTER
The purpose of the CIC filter is twofold: first, to remove out-of-band noise that could otherwise alias back into the baseband; and second, to convert the high-sample-rate m-bit data stream at the output of the sigma-delta modulator into an n-bit data stream at a lower sample rate. This process, known as decimation, essentially performs averaging and rate reduction simultaneously.
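To make the "averaging and rate reduction" point concrete, here is a minimal Python sketch (an illustration, not the authors' Verilog) of a one-stage CIC decimator with differential delay M = 1. The integrator runs at the input rate, the down-sampler keeps every R-th value, and the comb subtracts consecutive decimated values; the result equals the sum of each block of R input samples, i.e. R times the block average. The function names `cic_one_stage` and `block_sums` and the choice of down-sampling phase are assumptions made for this example.

```python
def cic_one_stage(x, R):
    """One-stage CIC decimator (N = 1, M = 1): integrate, keep every R-th sample, comb."""
    # Integrator at the high input rate: running sum of all inputs so far.
    integ, acc = [], 0
    for s in x:
        acc += s
        integ.append(acc)
    # Down-sampler: keep samples R-1, 2R-1, ... so each output closes a full block.
    down = integ[R - 1::R]
    # Comb at the low output rate: difference of consecutive decimated samples.
    out, prev = [], 0
    for s in down:
        out.append(s - prev)
        prev = s
    return out


def block_sums(x, R):
    """Reference: sum of each complete block of R input samples."""
    return [sum(x[i:i + R]) for i in range(0, len(x) - R + 1, R)]


print(cic_one_stage([1, 0, 1, 1, 0, 0, 1, 0], 4))  # [3, 1]
```

Dividing each output by R gives the block average, which is why a CIC stage is often described as a moving-average filter followed by decimation.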
Figure 1 shows the decimation process using the CIC filter. Two half-band filters [6] perform the remaining sample-rate reduction down to the Nyquist output rate. The first and second half-band filters flatten the passband and sharpen the transition band, bringing the overall frequency response closer to that of an ideal filter. The droop correction filter compensates for the passband attenuation (droop) introduced by the CIC filter. The frequency response of the overall system is shown in Section V.
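The passband droop that the correction filter must undo can be estimated from the CIC magnitude response, $|H(e^{j\omega})| = |\sin(\omega RM/2)/\sin(\omega/2)|^N$. The Python sketch below normalises this to the DC gain and evaluates it for N = 5, M = 1, R = 16 at the 6.144 MHz input rate quoted with Figure 6. The 20 kHz audio passband edge is an assumption for illustration, not a value from this paper, and `cic_gain`/`droop_db` are hypothetical helper names.

```python
import math


def cic_gain(f_hz, fs_in_hz, N=5, M=1, R=16):
    """|H(e^jw)| of an N-stage CIC, normalised so that the DC gain is 1."""
    if f_hz == 0:
        return 1.0
    w = 2 * math.pi * f_hz / fs_in_hz        # normalised angular frequency
    num = math.sin(w * R * M / 2)
    den = R * M * math.sin(w / 2)            # factor R*M normalises by (RM)^N after **N
    return abs(num / den) ** N


def droop_db(f_hz, fs_in_hz, **params):
    """Passband attenuation in dB relative to DC (negative means droop)."""
    return 20 * math.log10(cic_gain(f_hz, fs_in_hz, **params))


FS_IN = 6.144e6   # input sampling rate quoted in the text
F_EDGE = 20e3     # assumed audio passband edge (not from the paper)
print(f"droop at {F_EDGE / 1e3:g} kHz: {droop_db(F_EDGE, FS_IN):.3f} dB")
print(f"gain at first null f_in/(RM): {cic_gain(FS_IN / 16, FS_IN):.1e}")
```

With these values the droop at 20 kHz is only about -0.2 dB, and the response has its first null at $f_{in}/(RM) = 384$ kHz; an ideal droop corrector would apply the reciprocal of `cic_gain` across the passband.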
Table 1 shows the filter specifications of the decimation process. The CIC filter consists of N integrator stages and N comb stages connected by a down-sampler, as shown in z-domain form in Figure 1. The CIC filter has the following transfer function:

$$H(z) = \left( \frac{1 - z^{-RM}}{1 - z^{-1}} \right)^{N}$$

where N is the number of stages, M is the differential delay and R is the decimation factor. In this paper, N, M and R have been chosen to be 5, 1 and 16, respectively, to avoid overflow in each stage. From this equation, the maximum register growth, $G_{\max}$, can be expressed as:

$$G_{\max} = (RM)^{N}$$

In other words, $G_{\max}$ is the maximum register growth, a function of the maximum output magnitude under the worst-case input conditions [2].
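A direct behavioural model of this structure (N integrators at the input rate, down-sampling by R, then N combs of differential delay M at the output rate) can be used to check the maximum register growth $G_{\max} = (RM)^N$: with a constant input of 1, the output settles to $(RM)^N = 16^5 = 2^{20}$. This Python sketch uses unbounded integers, so it says nothing about word-length effects; `cic_decimate` is a hypothetical name.

```python
def cic_decimate(x, N=5, M=1, R=16):
    """Behavioural N-stage CIC decimator (unbounded integers, no truncation)."""
    y = list(x)
    # N cascaded integrators at the input rate: y[n] = y[n-1] + x[n].
    for _ in range(N):
        acc, out = 0, []
        for s in y:
            acc += s
            out.append(acc)
        y = out
    # Down-sampler: keep every R-th sample.
    y = y[R - 1::R]
    # N cascaded combs at the output rate: y[n] = x[n] - x[n-M].
    for _ in range(N):
        y = [y[n] - (y[n - M] if n >= M else 0) for n in range(len(y))]
    return y


out = cic_decimate([1] * 320)   # constant (DC) input of 1
print(out[-1])                  # 1048576
```

The first few outputs are smaller because the filter's impulse response, $N(RM-1)+1 = 76$ input samples long, has not yet filled; after that every output equals $G_{\max}$ times the input.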
If the input data word length is $B_{\text{in}}$, the most significant bit (MSB) at the filter output, $B_{\max}$, is given by:

$$B_{\max} = \left\lceil N \log_2 (RM) + B_{\text{in}} - 1 \right\rceil$$

To reduce data loss, the first stage of the CIC filter normally has the largest word length of all the stages. Since the integrator stages operate at the highest (oversampled) rate with a large internal word length, increasing the decimation ratio or the filter order results in higher power consumption and a lower attainable speed.
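The MSB bound $B_{\max} = \lceil N \log_2(RM) + B_{\text{in}} - 1 \rceil$ is easy to evaluate numerically. The sketch below follows Hogenauer's convention that bits are numbered from 0 at the LSB, so the register width is $B_{\max} + 1$. The 5-bit input width in the example is hypothetical, since the input width is not given in this section, and `cic_register_bits` is an illustrative name.

```python
import math


def cic_register_bits(n_stages, diff_delay, decim, in_bits):
    """Return (B_max, width): B_max is the output MSB index, width = B_max + 1."""
    growth_bits = n_stages * math.log2(decim * diff_delay)   # log2(G_max)
    b_max = math.ceil(growth_bits + in_bits - 1)
    return b_max, b_max + 1


# N = 5, M = 1, R = 16 as chosen in the text; the 5-bit input is an assumption.
print(cic_register_bits(5, 1, 16, 5))   # (24, 25)
```

For power-of-two $RM$ the logarithm is exact in floating point; for other values the ceiling still rounds the fractional growth up to a whole bit.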

A. Truncation for low power & high speed
Truncation means discarding least significant bits (LSBs) to reduce chip area and power consumption and to increase computation speed. Although discarding bits introduces additional error, the error can be made small enough to be acceptable for DSP applications. Figure 3 illustrates the five stages of the CIC filter when $B_{\max}$ is 25 bits, so truncation is applied to reduce the register widths. MATLAB was used to determine the word lengths in the integrator and comb sections.
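The sketch below illustrates what discarding LSBs does to a two's-complement word, using the adder widths of 25, 22, 20, 18 and 16 bits that the text gives for the five integrator adders. It assumes, as in Hogenauer's register pruning, that the MSB stays aligned across stages, so each stage drops `25 - width` LSBs. It models plain truncation (an arithmetic right shift, i.e. rounding toward minus infinity) and is not the MATLAB word-length procedure the authors used; Hogenauer's rule for choosing how many bits each stage may drop is not reproduced. The names are hypothetical.

```python
FULL_WIDTH = 25
STAGE_WIDTHS = [25, 22, 20, 18, 16]   # integrator adder widths quoted in the text


def truncate_lsbs(value, discard):
    """Drop `discard` LSBs; Python's >> is an arithmetic (sign-preserving) shift."""
    return value >> discard


def truncation_error(value, discard):
    """Error in units of the full-width LSB; always in [0, 2**discard - 1]."""
    return value - (truncate_lsbs(value, discard) << discard)


# Assumes the MSB is aligned across stages, so only LSBs are removed.
discards = [FULL_WIDTH - w for w in STAGE_WIDTHS]   # [0, 3, 5, 7, 9]
for w, d in zip(STAGE_WIDTHS, discards):
    print(f"{w}-bit adder: drops {d} LSBs, max error {2 ** d - 1} LSBs")
```

Because truncation always rounds down, its error has a small non-zero mean; rounding would centre it at zero at the cost of an extra adder per stage.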

B. Pipeline structure
One way to obtain a high-speed CIC filter is to implement a pipelined filter structure. Figure 4 shows the pipelined CIC filter structure with truncation also applied. In the pipelined structure, no additional pipeline registers are used in the integrator section, so the hardware requirement is the same as in the non-pipelined structure [7]. The clock rate of the CIC decimation filter is determined by the first integrator stage, which has a longer propagation delay than any other stage because it has the largest word length. It is therefore possible to use a higher clock rate for a CIC decimation filter if a pipeline structure is used in the integrator stages, compared with non-pipelined integrator stages. The clock rate in the integrator section is R times higher than in the comb section. Previously, the pipeline structure was applied only to the integrator section of the CIC filter, since the maximum clock rate is determined by the integrator. In the architecture presented here, the maximum throughput increased by 20 MHz when the pipeline structure was applied to all the CIC parts, namely the integrator, the comb and the down-sampler.
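The point that pipelining the integrators changes only latency, not the computed values, can be checked with a cycle-level Python model. In the non-pipelined cascade every new sample passes through all N adders within one clock cycle; in the pipelined cascade each stage adds the previous stage's registered output from the last cycle, so only one adder sits between registers and the final output appears N - 1 cycles later. Both functions are hypothetical models written for this illustration (they ignore word lengths), not the paper's RTL.

```python
def integrators_combinational(x, n_stages):
    """Non-pipelined cascade: each new sample passes through all adders in one cycle."""
    regs = [0] * n_stages
    out = []
    for s in x:
        stage_in = s
        for j in range(n_stages):
            regs[j] += stage_in      # uses stage j-1's value from THIS cycle
            stage_in = regs[j]
        out.append(regs[-1])
    return out


def integrators_pipelined(x, n_stages):
    """Pipelined cascade: each stage adds the previous stage's registered value,
    so only one adder lies between any two registers."""
    regs = [0] * n_stages
    out = []
    for s in x:
        prev = regs[:]               # register contents before the clock edge
        regs[0] = prev[0] + s
        for j in range(1, n_stages):
            regs[j] = prev[j] + prev[j - 1]
        out.append(regs[-1])
    return out


x = [1, 2, 3, 4, 5, 6, 7, 8]
print(integrators_combinational(x, 3))
print(integrators_pipelined(x, 3))   # same values, delayed by 2 cycles
```

For the five-stage filter used here, the pipelined integrator output equals the non-pipelined output delayed by four input-rate cycles, which is harmless for a decimator.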

C. Modified Carry Look-ahead Adder (MCLA)
Another technique to increase speed is to use a Modified Carry Look-ahead Adder (MCLA). The carry look-ahead adder (CLA) is among the fastest adder structures, but its carry logic becomes quite complicated for more than 4 bits, so the MCLA is used instead. The speed improvement comes from the way the MCLA computes carries: in a ripple-carry adder, the most significant bit addition must wait for the carry to ripple through from the least significant bit addition, whereas a look-ahead adder computes the carries in parallel. Therefore, carry computation has become a focus of study in speeding up adder circuits [8]. The 8-bit MCLA structure is shown in Figure 5.
The remaining three stages, carried out by the first half-band filter, the droop correction filter and the second half-band filter respectively, each reduce the sampling frequency by a decimation factor of only 2. Figure 6 illustrates the frequency response of the overall decimation filter for a sampling frequency of 6.144 MHz. The CIC filter Verilog code was written, and the filter was simulated in MATLAB. The signal-to-noise ratio is 141.56 dB at the sigma-delta modulator output and increases to 145.35 dB after the decimation stages. Increasing the word length of the recursive CIC filter improves the signal-to-noise ratio, but it also reduces the speed of the filter computation.
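Multiplying out the decimation factors in the text gives the overall rate plan: R = 16 in the CIC filter, then a factor of 2 in each of the first half-band, droop correction and second half-band stages. The short Python sketch below applies them to the 6.144 MHz input rate, giving an overall factor of 128 and a 48 kHz output; that output rate is arithmetic derived from the quoted factors, not a figure stated in the paper, and `rate_plan` is a hypothetical helper.

```python
F_IN_HZ = 6_144_000
STAGES = [
    ("CIC", 16),
    ("first half-band", 2),
    ("droop correction", 2),
    ("second half-band", 2),
]


def rate_plan(f_in_hz, stages):
    """Return (stage name, output rate in Hz) after each decimating stage."""
    rates = []
    f = f_in_hz
    for name, factor in stages:
        if f % factor:
            raise ValueError(f"{name}: rate {f} Hz is not divisible by {factor}")
        f //= factor
        rates.append((name, f))
    return rates


for name, f in rate_plan(F_IN_HZ, STAGES):
    print(f"{name:>17}: {f / 1e3:g} kHz")
```

Keeping every intermediate rate an exact integer is what lets the half-band stages decimate by exactly 2 without any fractional resampling.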
The chip layout on the Virtex II FPGA board is shown in Figure 9.
Fig. 9 The core layout on the FPGA board

IV. CONCLUSION
Recursive CIC filters have been designed and investigated. An enhanced high-speed CIC filter was obtained in three ways: the pipeline structure, the modified carry look-ahead adder (MCLA) and truncation together yield a high-speed CIC filter with a maximum throughput of 190 MHz. The evaluation indicates that the pipelined CIC filter with MCLA adders is attractive for its high speed when neither the decimation ratio nor the filter order is high, as noted by Hogenauer [2]. Because the first stage of the CIC filter requires the maximum word length and contains a recursive loop, the reduction in power consumption is limited by the throughput. Truncation is therefore used to reduce the power consumption and the amount of computation. Power estimation using CAD tools (Cadence and Synopsys) with the 0.18 μm Silterra technology library gives a power consumption of 3.5 mW at the maximum clock frequency.

Fig. 2 Block diagram of a one-stage CIC filter

Fig. 4 Five-stage truncated pipelined CIC filter

Fig. 5 8-bit MCLA structure
The 8-bit MCLA block diagram consists of two connected 4-bit modules, in which the lower 4-bit module computes the carry-out that is passed to the next module. The CIC filter in this paper uses five MCLAs in the integrator section. The maximum word length is 25 bits and it decreases in the following stages, so the adders are truncated to 25, 22, 20, 18 and 16 bits, respectively, from left to right. Each 4-bit adder provides a group propagate signal and a group generate signal, which are used by the MCLA logic block. With the bit-level generate and propagate signals $g_i = a_i b_i$ and $p_i = a_i \oplus b_i$ (where + denotes OR and juxtaposition denotes AND), the group propagate $P_G$ and group generate $G_G$ of a 4-bit adder are:

$$P_G = p_3 p_2 p_1 p_0$$
$$G_G = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$$

The most important equations for obtaining the carry of each stage are:

$$c_1 = g_0 + p_0 c_0$$
$$c_2 = g_1 + p_1 g_0 + p_1 p_0 c_0$$
$$c_3 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$$
$$c_4 = G_G + P_G c_0$$

The MCLA is computed from these equations. An 8-bit MCLA adder can be constructed by continuing the same logic pattern, with the MSB carry-out produced by OR and AND gates; for a lower group with signals $(P_{G,0}, G_{G,0})$ and an upper group with $(P_{G,1}, G_{G,1})$, $c_8 = G_{G,1} + P_{G,1} G_{G,0} + P_{G,1} P_{G,0} c_0$. The Verilog code was written to implement the addition and downloaded to the Xilinx chip. The Xilinx ISE synthesis report gives a minimum clock period of 3.701 ns (a maximum frequency of 270 MHz).
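A small functional model can confirm that the standard look-ahead equations actually add: bit generate $g_i = a_i b_i$, bit propagate $p_i = a_i \oplus b_i$, group signals $P_G = p_3 p_2 p_1 p_0$ and $G_G = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$, and inter-group carries $c_4 = G_{G,0} + P_{G,0} c_0$ and $c_8 = G_{G,1} + P_{G,1} G_{G,0} + P_{G,1} P_{G,0} c_0$. The Python sketch below builds an 8-bit MCLA from two 4-bit blocks in this way and can be checked against integer addition for every input pair. It is a logic-level illustration, not the authors' Verilog, and it says nothing about gate delay; `cla4` and `mcla8` are hypothetical names.

```python
def _bits(v, n):
    """LSB-first list of the n low bits of v."""
    return [(v >> i) & 1 for i in range(n)]


def cla4(a, b, c0):
    """4-bit look-ahead block: returns (4-bit sum, group propagate P_G, group generate G_G)."""
    A, B = _bits(a, 4), _bits(b, 4)
    g = [A[i] & B[i] for i in range(4)]   # bit generate
    p = [A[i] ^ B[i] for i in range(4)]   # bit propagate
    # Every carry is computed directly from g, p and c0 (no rippling).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))
    P_G = p[3] & p[2] & p[1] & p[0]
    G_G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return s, P_G, G_G


def mcla8(a, b, c0=0):
    """8-bit adder from two 4-bit blocks; group carries use only P_G and G_G."""
    s_lo, P0, G0 = cla4(a & 0xF, b & 0xF, c0)
    c4 = G0 | (P0 & c0)
    s_hi, P1, G1 = cla4((a >> 4) & 0xF, (b >> 4) & 0xF, c4)
    c8 = G1 | (P1 & G0) | (P1 & P0 & c0)   # MSB carry-out from AND/OR terms only
    return (s_hi << 4) | s_lo, c8


s, cout = mcla8(200, 100)
print(s, cout)   # 44 1  (200 + 100 = 300 = 256 + 44)
```

Wider adders such as the 25-bit integrator adders follow the same pattern, either by adding more 4-bit groups or by a second level of look-ahead over the group signals.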