On Error Probability of CRC/Polar Codes with List Decoding

Abstract—Error performance of polar codes on the AWGN channel is discussed in conjunction with list decoding, with or without an outer CRC code. The two mechanisms for block error are formulated, called type I (list misses) and type II (list hits, but the wrong vector is selected). Two slightly different ways of utilizing the CRC code are described, having identical type I error probability but differing type II error probability, dependent primarily on CRC code length. Results are presented for rate 1/2 polar codes and antipodal modulation with blocklengths 256 and 2048. The cases presented illuminate design tradeoffs in the selection of list size, CRC length, and especially the issue of final list sorting.


I. INTRODUCTION
Polar codes, a class of linear block codes invented by Arikan [1], are the first provably capacity-achieving codes on a binary-input memoryless channel with an explicit construction and low decoding complexity. Though capacity-achieving as blocklength grows, even under simple successive cancellation decoding, their finite-blocklength performance is inferior to that of turbo and LDPC (low-density parity-check) codes of the same blocklength.
It was shown in [2] that the performance of polar codes can be improved by using a successive cancellation list (SCL) decoder. The SCL decoder with list size L tracks a list of L candidate codewords, and at the final stage of decoding picks the most probable path as its decision. By adding an outer CRC code, and using it to select from the inner decoder's list, performance can be improved to be competitive with the best comparable codes, at modest complexity. The outer CRC code can also have the benefit of increasing the minimum distance of the concatenated code [3], [4].
In this work we study performance on the binary antipodal AWGN channel for rate 1/2 coding approaches, with short and medium blocklengths. The emphasis is on the particular contributions of various events leading to block error and their dependence on signal-to-noise ratio, decoder list size, CRC length, if any, and the role of list sorting. While we focus on classic polar codes, the presentation generalizes to variations, including Reed-Muller/polar codes [5] and others that follow the polarization and successive decoding paradigm.
The paper is organized as follows. In Section II, polar codes are reviewed. In Section III we study polar codes without CRC, formulating the error types, and in Section IV the results are extended to the use of CRC outer codes. Conclusions are provided in Section V.

II. BACKGROUND

A. Polar Codes
We assume familiarity with the literature on polar codes, first proposed by Arikan [1], including code construction and successive cancellation decoding. For notation, we let N = 2^n denote the blocklength of the code, R_in = K_in/N the code rate of the polar code, and L the decoder's list size. In polar coding an underlying message vector u contains K_in unfrozen bits and N − K_in frozen bits, commonly set to 0, although message-dependent choice of the frozen bit state can offer some gains [6]. Unfrozen (message-carrying) bit locations are chosen by a selection process in the design of the code, typically based on the mutual information of the N bit channels, conditioned on previous correct bit positions.
The transmitted codeword x is generated by x = u F^⊗n, where F = [1 0; 1 1] is the kernel of the polarizing transformation, and F^⊗n denotes the n-fold Kronecker product of F.
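As a concrete illustration, the transform is a few lines of NumPy (a minimal sketch; the bit-reversal permutation used in some formulations of the encoder is omitted):

```python
import numpy as np

def polar_encode(u, n):
    """Polar transform x = u F^(kron n) over GF(2) for blocklength N = 2^n.
    The vector u is the full N-tuple, with frozen bits already set to 0."""
    F = np.array([[1, 0], [1, 1]], dtype=np.uint8)  # polarization kernel
    G = F
    for _ in range(n - 1):
        G = np.kron(G, F)  # n-fold Kronecker power of the kernel
    return (u @ G) % 2

# The last row of F^(kron n) is all-ones, so the last unit vector
# maps to the all-ones codeword.
u = np.zeros(8, dtype=np.uint8)
u[7] = 1
x = polar_encode(u, 3)
```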
In contrast with the original bit-by-bit successive cancellation decoding of Arikan, list decoding [2] is normally supported by an error-detecting outer code, typically a low-overhead CRC code, that can usually locate the correct message vector if it is on the list produced by the inner polar list decoder. While the simplest decoder in such situations conventionally checks list candidates for a zero CRC remainder, superior performance at a given CRC overhead is obtained by adding list sorting to CRC checking, as seen below. Such a decoder is capable of approaching the performance of maximum likelihood decoding for the concatenated code, as the list size becomes sufficiently large and the CRC code has sufficiently small undetected error probability.

B. Related work on list decoding
Though polar coding was shown to be capacity-achieving for binary symmetric memoryless channels, the approach to capacity with blocklength is relatively slow, and bit-by-bit decoding is inferior to good LDPC codes, say, with equal rate and blocklength. First, [2] showed that list decoding could improve the situation so that performance is competitive with other well-known error control techniques, while providing an attractive complexity/performance tradeoff. Murata et al. [3] built upon this by studying the effect of CRC code selection on block error probability, for a fixed list size of 32. There the tradeoff on CRC length becomes evident: too short means the undetected error probability is large and false positives degrade the performance, but too long a CRC code implies an energy penalty, shifting the performance curve to the right in SNR. Murata et al. showed there is a weaker dependence on the particular code polynomial of a given length as well. Reference [4] has also examined the impact of CRC selection.
A study in [7] is also relevant to this work, focusing on the coding gain versus the length of the CRC code and the tradeoffs treated in this paper, without specific separation of the decoding error mechanisms.
More broadly, Seshadri and Sundberg [8] have studied the benefits of list Viterbi decoding of convolutional codes, when supplemented with a reliable error-detecting outer code. Their analysis shows that in the low-P_e region, the asymptotic gain in SNR relative to the case of L = 1 on the same code is 10 log_10(2L/(L+1)) dB, which approaches 3 dB for large L. Even small values of L provide remarkable gain, e.g. L = 4 projects about 2 dB asymptotic gain.
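Assuming the gain takes the form 10 log_10(2L/(L+1)) dB, consistent with the 3 dB large-L limit and the roughly 2 dB figure at L = 4, the projected gains are easily tabulated:

```python
import math

def list_gain_db(L):
    # Assumed asymptotic SNR gain of list-of-L over L = 1 decoding;
    # approaches 10*log10(2) ~ 3 dB as L grows.
    return 10 * math.log10(2 * L / (L + 1))

gains = {L: round(list_gain_db(L), 2) for L in (1, 2, 4, 16, 64)}
```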

III. LIST DECODING WITHOUT CRC CODE
To establish the framework for the rest of the paper, we first consider list decoding, but without a CRC code.
Decoding of polar codes is based on efficient sequential computation of the posterior probability of unfrozen bits in the underlying message vector, derived from all the measurements of the noisy bits under the assumption that previously-decided bits are correct. This is referred to as successive cancellation (SC) decoding. The computation operates on a highly-regular graph of 2 × 2 butterflies having breadth N and depth n = log_2 N, and can be efficiently implemented in a depth-first traversal of a binary tree of depth n, providing O(N log_2 N) complexity.
In contrast to the original bit-by-bit successive cancellation decoder of Arikan, the list decoder introduced in [2] sequentially builds a size-L list of candidate message vectors having the largest cumulative metrics to the current bit position in the estimate of u, with L being a key design variable. Decoding begins with deciding the first unfrozen bit among the K_in = N R_in unfrozen bits contained in u. As the decoding proceeds, skipping over known frozen bit positions, the decoder forms two extensions of each previous list member, computes the cumulative metric to date for these 2L candidates, and by a sort on metrics preserves the best L vectors for the next cycle. This continues until the message vector estimates have length N. Reference [9] provides a comprehensive description of efficient list decoding in the LLR domain.
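One extend-and-sort cycle can be sketched as follows. This is schematic only: `llr` is a hypothetical per-path callback standing in for the butterfly computations, and the metric accrues a penalty of |LLR| when a bit choice disagrees with the LLR sign, in the spirit of the LLR-domain decoder of [9]:

```python
import heapq

def extend_and_prune(paths, llr, L):
    """One unfrozen-bit step of list decoding (schematic): each surviving
    path forks on bit in {0,1}, accumulates a penalty when the bit
    disagrees with the sign of its LLR, and only the L best candidates
    (smallest cumulative metric) are kept."""
    candidates = []
    for prefix, metric in paths:
        l = llr(prefix)  # LLR for the next bit given this prefix
        for bit in (0, 1):
            penalty = abs(l) if (l >= 0) != (bit == 0) else 0.0
            candidates.append((prefix + (bit,), metric + penalty))
    # sort on cumulative metric and preserve the best L for the next cycle
    return heapq.nsmallest(L, candidates, key=lambda c: c[1])

# Starting from the empty path, two steps with a constant positive LLR
# (favoring bit 0) keep the all-zero prefix at the top of the list.
paths = [((), 0.0)]
for _ in range(2):
    paths = extend_and_prune(paths, lambda p: 2.0, L=2)
```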
The performance advantage of list decoding rests on avoiding the early dismissal of the eventually-correct codeword that a bit-by-bit decoder may commit. Since polar codes are linear, on the binary Gaussian channel we can assume the message vector u is the all-0's vector, an N-tuple including frozen bit positions set to 0. The list decoder will fail to include the all-0 vector on its final list if at some decision stage k the (all-0's) prefix vector does not survive the extend-and-sort cycle, for once the all-0 prefix is flushed from the candidate list, it can never reappear later.
Once the list is built at the end of decoding, we have L N-tuples as candidate codewords. All are valid codewords from among the 2^(N R_in) codewords of the polar code. The final choice of the decoded vector (again, no CRC help as yet) is based on a final sort of the L likelihoods, i.e. we output the best in the list as the decided codeword.
We denote by P_e the block error probability for the decoder, implicitly a function of N, L and SNR, as well as the finer details of frozen bit selection. There are two ways a block error can occur at the end of decoding, which we denote type I and type II. A type I error occurs when the correct vector (all-0's) is not on the list, and a type II error event occurs when the correct vector is on the list, but one of the other L − 1 vectors has a better overall likelihood, or in other words the correct vector is not at the top of the list after the final metric sort. When we refer to SNR we mean E_b/N_0, as usual.
Letting miss denote the event that the correct word is not on the list, we can write

P_e = P(miss)P(error|miss) + (1 − P(miss))P(error|no miss)
    = P(miss) + (1 − P(miss))P(error|no miss),   (3)

since P(error|miss) = 1. Another expression for the block error probability is obtained by alternate expansion of the above:

P_e = P(miss)(1 − P(error|no miss)) + P(error|no miss).   (4)

These two expressions give an immediate lower bound on the error probability,

P_e ≥ max{P(miss), P(error|no miss)},   (5)

and an upper bound also readily follows:

P_e ≤ P(miss) + P(error|no miss).   (6)

The two key probabilities influencing the error probability are thus P(miss) and P(error|no miss). Both are functions of N, L and SNR, though they appear difficult to obtain analytically, given the sequential nature of decoding and the need for order statistics. We resort to simulation to obtain these probabilities in what follows. P_I = P(miss) is clearly a monotonically-decreasing function of SNR for any given L, and likewise of L for any fixed SNR. In the other direction, P(miss) → 1 as SNR or L decreases. It is worth noting that P(miss) → 0 for sufficiently large L at any non-zero SNR, though very large L is required as SNR drops near or below the Shannon limit.
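These relations are easy to check numerically; a small sketch with illustrative (not simulated) input probabilities:

```python
def block_error_bounds(p_miss, p_err_no_miss):
    """Exact P_e from the two component probabilities, together with the
    lower and upper bounds implied by the two expansions in the text."""
    p_e = p_miss + (1 - p_miss) * p_err_no_miss
    lower = max(p_miss, p_err_no_miss)
    upper = min(1.0, p_miss + p_err_no_miss)
    return lower, p_e, upper

# Illustrative values only: P(miss) = 1e-3, P(error|no miss) = 2e-4
lower, p_e, upper = block_error_bounds(1e-3, 2e-4)
```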
The second term of interest, P(error|no miss), is the conditional probability that the transmitted code vector resides on the final list, but that some other list member has better likelihood. We compute this empirically as well by Monte Carlo methods, checking whether the all-0 vector is on the list but not the best of the list, and normalizing by the number of list hits.
While P_I steadily decreases for any SNR as L grows, P_II saturates in L and eventually dominates the overall block error probability; the choice of L involves a subtle balancing of these under the constraint of computational complexity.
Though unrealizable for typical blocklengths, an ML decoder's block error probability is also of interest, as a standard of comparison. ML decoding can be viewed as letting the list size become arbitrarily large, so that all codewords have their likelihoods evaluated; then P(miss) vanishes. For any finite L we have the lower bound P_e,ML ≥ P(error|no miss), and a good approximation to the ML decoder performance is simply P(error|no miss) for L = 64, say.

A. Numerical results: no CRC case
Simulation results for N = 256, R = R_in = 1/2 are now shown, versus SNR in the range of −1 dB (slightly beyond the Shannon limit for R = 1/2) to 5 dB, and for L ∈ {1, 4, 16, 64}. E_b/N_0 is the energy-per-information-bit to noise power density ratio, as usual.
First we show P(miss) = P(type I error) (Fig. 1), then P(error|no miss) (Fig. 2). As discussed above, P(miss) steadily diminishes with L, though slowly beyond L = 16, and all curves approach 1 as SNR drops. P(error|no miss), on the other hand, is basically insensitive to L. The simulated probability of block error is shown in Figure 3, and is a weighted sum of the first two by (3) above. Note that with no CRC assistance, P_e saturates versus L quickly, around L = 4, as P_II dominates thereafter. This behavior is tied to the relatively poor minimum distance of the polar code, 8 for this blocklength. (Codes with larger blocklength will not saturate until a larger L.) Moreover, for this case, list decoding provides little or no benefit over SC decoding, except at low SNR (and commensurately high error probability).
List decoding (with sufficient list size) is a means of achieving ML decoding for the overall code as SNR increases, and this remains true whether an outer CRC code is present or not. To illustrate, we calculate the first term of a union bound for the case of N = 256 and no CRC. Since the minimum Hamming distance for this rate-1/2 code is 8, and there are 96 nearest-neighbor codewords (found by use of a list decoder), the first term of the union bound is

P_e ≈ 96 Q(√(2 d_min R E_b/N_0)) = 96 Q(√(8 E_b/N_0)),

which agrees well with the simulated performance in Figure 3.
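Assuming the standard first-term form A_dmin Q(√(2 d_min R E_b/N_0)) for antipodal signaling on the AWGN channel, the calculation is:

```python
import math

def q_func(x):
    # Gaussian tail probability Q(x) via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2))

def union_bound_first_term(A_dmin, d_min, R, ebno_db):
    """First term of the union bound, A_dmin * Q(sqrt(2 d_min R Eb/N0)),
    for antipodal signaling on the AWGN channel."""
    ebno = 10 ** (ebno_db / 10)
    return A_dmin * q_func(math.sqrt(2 * d_min * R * ebno))

# N = 256, R = 1/2 polar code: d_min = 8 with 96 nearest neighbors,
# evaluated at Eb/N0 = 4 dB
pe_approx = union_bound_first_term(96, 8, 0.5, 4.0)
```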

IV. LIST DECODING WITH CRC CODE
Now we add an outer code based on a simple CRC code imposing C check bits. The purpose is not to do concatenated, or joint, decoding, but to aid in selecting the correct code vector from the list. A side benefit of such a concatenation is that the overall minimum distance of the code may be increased, see [3,4]; however the inner decoder does not utilize the constraints imposed by the outer code.
A perfect outer code would point to the correct code vector, assuming it is on the list, and only to this vector. In this case P_e = P(miss), and as seen earlier, increasing the list size can make this probability small, at the expense of decoding complexity. However, a finite-length CRC code can produce false positives, i.e. can give a CRC remainder of all-0's for an incorrect vector, with probability 2^-C under a random-vector model, where C is the number of check bits appended by the CRC code [10]. There emerges a design tradeoff here: a larger C gives a more reliable CRC check, but at the cost of an energy penalty and rate loss.
In the following, we assume that the overall code rate R is fixed, and so the inner polar code rate becomes

R_in = (K + C)/N = R + C/N,

where K = NR is the number of information bits. If K ≫ C then the rate change is negligible, but for finite parameters this energy and bandwidth penalty needs to be acknowledged. In this regard, when we plot versus E_b/N_0, the small energy penalty on polar code bits is properly incorporated. Two ways of applying the CRC check are the following. In case 1, the final list is left unsorted, and the candidates are subjected to the CRC check in natural order, with the first candidate passing the check output as the decision. In case 2, the final list is first sorted by likelihood metric, and the most likely candidate passing the CRC check is output. Obviously case 2 imposes additional final sorting computation, increasing with L, and the procedure of [2] seems better in this regard.
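The two selection rules can be sketched as follows, with `paths` a list of (word, metric) pairs (smaller metric meaning more likely) and `crc_ok` a hypothetical predicate that is True when a candidate's CRC remainder is zero:

```python
def select_case1(paths, crc_ok):
    """Case 1: scan the (unsorted) final list in natural order and output
    the first candidate passing the CRC check; fall back to the best
    metric if no candidate passes."""
    for word, _metric in paths:
        if crc_ok(word):
            return word
    return min(paths, key=lambda p: p[1])[0]

def select_case2(paths, crc_ok):
    """Case 2: sort the final list by likelihood metric first, then output
    the most likely candidate passing the CRC check."""
    ranked = sorted(paths, key=lambda p: p[1])
    for word, _metric in ranked:
        if crc_ok(word):
            return word
    return ranked[0][0]
```

When several candidates pass the CRC, the two rules can disagree: case 1 returns the first passer in list order, case 2 the most likely passer.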
As before, the block error probability in either case is P_e = P(miss) + (1 − P(miss))P(error|no miss), which again establishes P(miss) as a lower bound on the error probability. If the CRC code were a perfect pointer to the correct vector when it is on the list, then we would have P(error|no miss) = 0 and P_e = P(miss). But as argued earlier, P(miss) goes to zero for sufficiently large L at any SNR, including SNRs beyond the Shannon limit. The necessary reconciliation is to recognize that as L becomes large, the undetected error probability increases, and type II events dominate. A general design strategy is to make L reasonably large to nearly ensure the correct vector is on the list, and to use a sufficiently strong CRC to reach or approach the ML bound.
Before showing results, we approximate P (error|no miss) for the two cases.
Case 1: Assume the final list of candidate codewords contains the correct codeword. (Note the complete list is normally not sorted according to the likelihood metric.) Let the correct vector be in location i, i = 1, 2, …, L, in the decoder's list. These candidates are subjected to the CRC check in natural order, with the first sequence passing the check selected as the transmitted codeword. Again, given a random binary vector, the probability of a false positive is 2^-C [10], and given that the correct vector is in location i, any of i − 1 false positives can occur. We can upper-bound this conditional false-positive probability using a union bound as

P(error|no miss, correct vector at i) ≤ (i − 1) 2^-C.   (12)

Thus the unconditional probability of a false positive is bounded by

P(error|no miss) ≤ 2^-C Σ_{i=1}^{L} (i − 1) P(correct vector at i).   (13)

This can be further simplified to

P(error|no miss) ≤ 2^-C (E[i] − 1).   (14)

We have studied via simulation the conditional p.m.f. of the correct vector location, and found that it is uniform on 1, 2, …, L, as expected for random message selection. Thus the conditional expected value of i is approximately (L + 1)/2, and the approximate value

P(error|no miss) ≈ (L − 1) 2^-C / 2   (15)

is independent of SNR.
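The resulting case-1 approximation is trivially evaluated (a sketch under the random-selection model described above):

```python
def p_type2_case1(L, C):
    """Approximate case-1 type II probability: the correct word is uniform
    on list positions 1..L, each earlier candidate falsely passes an
    independent C-bit CRC with probability 2^-C, union-bounded and then
    averaged (the expected number of earlier candidates is (L-1)/2)."""
    return (L - 1) / 2 * 2 ** (-C)
```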
Simulation results show this somewhat overbounds the correct probability, and the conditional probability does decrease slowly with SNR. This is because vectors on the list tend to be codewords close in Hamming distance to the correct vector, and the random selection model above is less valid. Similar effects are seen in the undetected error probability of CRC codes on binary channels as the channel error probability decreases [10].
A reasonable design approach is to make the type II error probability well below the targeted block error probability. If this target is P_e, then the CRC length C should obey

(L − 1) 2^-C / 2 ≤ P_e, i.e. C ≥ log_2((L − 1)/(2 P_e)),   (16)

for whatever L is selected in the design to meet P_e. Thus a P_e = 10^-4 target dictates C ≥ 18 with L = 64, for example. By the argument above, this is a conservative estimate. Figure 4 illustrates for N = 256 and C = 8 the steady drop in P_I with list size and SNR (identical for cases 1 and 2), while Figure 5 shows P(error|no miss) (case 1) for L = 4, 16, and 64. Note the increase with L in Figure 5; this is the source of the high error 'floors' in P_e for this case (Figure 6). Clearly C = 8 is a poor choice for this decoding option.
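The design rule can be sketched as follows; the ceiling makes the result at most one bit more conservative than a rounded estimate:

```python
import math

def min_crc_length(L, p_target):
    """Smallest integer C keeping the case-1 type II approximation
    (L-1)/2 * 2^-C at or below the block error target."""
    return math.ceil(math.log2((L - 1) / (2 * p_target)))

c = min_crc_length(64, 1e-4)
```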
Case 2: With the final list sorted by likelihood, a type II error requires that an incorrect candidate both rank above the correct vector and pass the CRC check, suggesting the approximation

P(error|no miss) ≈ (L − 1) 2^-C Q(√(2 R_in d_min E_b/N_0)),   (17)

where d_min is the minimum Hamming distance of the inner code. The extra Q-function leverage compared to case 1 reduces the type II error probabilities typically by several orders of magnitude, and explains the improved performance in P_e seen below. We also note that (17) collapses to (15) as SNR drops, since Q(0) = 1/2.
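A sketch of this case-2 approximation, assuming the pairwise-error form above; as E_b/N_0 drops, the Q-function approaches 1/2 and the case-1 value is recovered:

```python
import math

def q_func(x):
    # Gaussian tail probability Q(x) via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2))

def p_type2_case2(L, C, R_in, d_min, ebno_db):
    """Assumed case-2 type II approximation: a wrong word must both beat
    the correct one (pairwise error ~ Q(sqrt(2 R_in d_min Eb/N0))) and
    pass the C-bit CRC."""
    ebno = 10 ** (ebno_db / 10)
    return (L - 1) * 2 ** (-C) * q_func(math.sqrt(2 * R_in * d_min * ebno))
```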
A. Numerical results: N = 256, (Figures 4, 7)

To illustrate, we begin with N = 256, R = 1/2, and C = 8, so the rate loss is small. The false-positive probability for CRC checks is now proportional to 2^-8 under the random-selection model, so for L = 64 the approximation for case 1 decoding would be P(error|no miss) = 0.00048. Figure 7 (case 2), on the other hand, has P_e performance that mimics P_I, tantamount to saying that P_II events have negligible probability, even with C = 8.
The net gain offered by the outer code with N = 256 can be found by comparing Figures 3 and 7. At P_e = 10^-3, for example, L = 64 with CRC-8 provides a net 1.4 dB gain over L = 1 without CRC. (With no CRC, a smaller L is sufficient, as shown earlier.) This gain fairly incorporates the small overhead attached to the CRC code. The energy gain grows as one moves to even smaller P_e. We assert that for this concatenated CRC/polar code, performance with L = 64 and case 2 processing is close to ML performance. Turning to N = 2048, the curves are steeper and shifted to the left, as expected. Notice that case 1 (Figure 9) exhibits flaring of the performance curve at lower P_e, whereas this disappears in case 2 (Figure 10) due to the much smaller type II error probability.
Case 1 performance can easily be made to match that of case 2 by increasing the CRC length to C = 24, which for N = 2048 represents a negligible overhead, and simplifies decoder processing. Finally, we note that the case 2 results are consistent with those of [2] for this blocklength; there the CRC length was actually 8.

V. CONCLUSION
We have made explicit the two types of error leading to decoding failure with list decoding of polar codes, for both simple CRC checking (case 1) and list sorting followed by CRC checking (case 2). The latter can work well with a shorter CRC code, while the former works well for only a minor increase in CRC length, which is often a negligible overhead penalty. The type II error probability falls as 2^-C in both cases, so a small increase in C can repair any 'floors' in performance due to type II error dominance. In fact the floor with L = 64 is roughly equivalent for both blocklengths, as suggested by the approximate analysis earlier in the paper. Again, a longer CRC code can repair this degradation, and for N = 2048, say, the overhead penalty is negligible.