An error-locating signature analyzer to identify faulty units in digital systems

In conventional signature analysis, faulty units are identified by a mismatch between the actual and reference signatures. The number of reference signatures can be quite large for a complex system that requires high testing resolution, which complicates the diagnosis procedure. We show how, under certain restrictions, this number can be reduced while preserving the diagnosis resolution and the aliasing rate. We construct a signature analyzer that is capable of locating faulty units. If the analyzer is implemented in external automated test equipment, its throughput requirements can be diminished.


INTRODUCTION
The density and complexity of integrated circuits have grown as modern digital technology has developed. These circuits are used in advanced applications whose downtime due to failure can be critical. Therefore, it is very important to detect improper functioning of these applications as efficiently as possible. Here efficiency means that (i) only a small number of faults can escape detection and (ii) the test hardware overhead is low.
One of the popular methods satisfying the above conditions has been signature analysis [1]. A sufficiently large number of test stimuli are applied to the inputs of a unit under test (UUT) in order to detect as many faults as possible. The output response is compacted into a short signature. If the actual signature does not match the expected error-free signature, the UUT is considered to be faulty.
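As a concrete illustration, response compaction by an LFSR can be modeled as polynomial division over GF(2). The following is a minimal sketch, not the analyzer constructed later in this paper; the function name, the sample response, and the characteristic polynomial 0b100011101 are illustrative choices:

```python
def lfsr_signature(response_bits, poly=0b100011101):
    """Compact a bit stream into a short signature by shifting it through
    an LFSR that divides by the polynomial `poly` over GF(2). The final
    register contents (the remainder) is the signature."""
    width = poly.bit_length() - 1          # signature length in bits
    reg = 0
    for b in response_bits:
        reg = (reg << 1) | b               # shift the next response bit in
        if reg >> width:                   # degree reached: reduce mod poly
            reg ^= poly
    return reg

# A UUT is declared faulty when the actual signature differs from the
# reference signature computed from the fault-free response.
good = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
bad = good[:]
bad[5] ^= 1                                # single-bit error in the response
reference = lfsr_signature(good)
```

Because division by poly is linear over GF(2) and poly has a nonzero constant term, any single-bit error changes the remainder and is therefore always detected; aliasing requires an error pattern that is itself a multiple of poly.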
A UUT is often partitioned into a number of replaceable units (RUs), each of which can be characterized by a unique signature. The hardware complexity of a diagnostic system that is based on signature analysis and is able to locate faulty RUs is proportional to the number of RUs. This is particularly significant for self-test systems built into complex UUTs. In such systems, test response compaction is conducted on the same chip as the UUT [2]. Any reduction of the number of reference signatures required to diagnose a given number of RUs, without deterioration of the probability of error escape, would increase testing efficiency. In this paper we demonstrate how to solve this problem, provided that the maximum number of RUs that may fail simultaneously is known a priori.
The problem of extracting diagnostic information from signatures has attracted the attention of test researchers for a long time. In [3] the authors use modular and burst error-correcting codes to identify erroneous bits within the output response of the UUT. In [4] and [5] a technique is developed to locate single errors in the output sequence, using a linear feedback shift register (LFSR) and multiple-input signature registers (MISRs), respectively. In [6] the authors propose to use a multiple-error-correcting BCH code for diagnostic purposes. The techniques developed in [3]–[6] are oriented to a relatively small number of errors, since the test hardware complexity rapidly increases with the number of potential errors. Reference [7] considers an X-tolerant convolutional compactor with a low aliasing rate, which is used to diagnose faulty cells of scan chains. This approach also deals with a small number of errors, and the size of the signatures is greater compared to the traditional (deconvolutional) approach. The diagnosis procedure is complex for built-in implementation; it is based on heuristics, and a significant amount of time is spent on its completion.
The methods developed in [8] and [9] aim to identify an arbitrary number of errors in the output responses. They are based on a two-level procedure that uses two different classes of error-control codes. This procedure resembles a technique used in error-control theory to construct an error-locating code (ELC) [10]. As we will see below, ELCs require less redundancy than the schemes considered in [8] and [9]. The encoding/decoding procedures for ELCs are quite simple as well, but their application to testing is not straightforward. In this paper we consider how ELCs can be employed for the diagnosis of digital circuits. The approach can be used at the chip, board or system level.

PROBLEM STATEMENT
A general form of a UUT partitioned into n''×q replaceable units RU_00, ..., RU_n''q is represented in Fig. 1. To prevent propagation of errors through the feedback paths, the UUT comprises tri-state buffers T. While testing, all the feedback paths are broken. Test stimuli are applied to the UUT, and the output responses coming from the RUs are shifted into an LFSR or MISR. It is important to probe the RUs in level order. The level of an RU is the maximum distance of its inputs from the inputs of the UUT. Testing starts from the lowest-level RUs. If signatures of RUs in this level are erroneous, those RUs are repaired. Afterwards, testing proceeds to the next level of RUs. This procedure ensures that errors in the RUs are independent. We will refer to the RUs of level i simply as slice i. If the diagnosis procedure is automated, the size of the memory needed to store error-free signatures for the entire UUT is n''×q×σ bits, where n'' is the number of RUs in a slice, q is the number of slices, and σ is the bit length of the signature analyzer (coinciding with the number of outputs of an RU when a MISR is used). These requirements are normally satisfied in scan-chained UUTs; in this case the scan outputs are directly connected to the signature analyzer. Distinct portions of the data sequences obtained through those chains characterize distinct RUs.
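The level-order probing described above amounts to a longest-path computation over the feedback-free interconnect graph of the RUs. The sketch below illustrates this; the graph `preds` and the RU names are a hypothetical example, not taken from Fig. 1:

```python
from functools import lru_cache

# DAG of RU interconnections with feedback paths already broken:
# preds[u] lists the RUs feeding RU u; RUs fed only by UUT inputs
# have no predecessors and sit at level 0.
preds = {
    "RU00": [], "RU01": [],
    "RU10": ["RU00"], "RU11": ["RU00", "RU01"],
    "RU20": ["RU10", "RU11"],
}

@lru_cache(maxsize=None)
def level(ru):
    """Level = maximum distance of the RU's inputs from the UUT inputs."""
    if not preds[ru]:
        return 0
    return 1 + max(level(p) for p in preds[ru])

# Probe slices in increasing level order: repair level-i RUs before
# testing level i+1, which makes errors in the RUs independent.
slices = {}
for ru in preds:
    slices.setdefault(level(ru), []).append(ru)
```

Testing then visits slices[0], slices[1], ... in order, so an erroneous signature at level i can never be the echo of an unrepaired fault at a lower level.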
Our objective is to locate erroneous RUs within a slice as efficiently as possible (i.e., with maximum fault coverage and minimum hardware overhead), provided that any number of errors can occur in faulty RUs and the number of RUs that can fail simultaneously does not exceed t.
There are a few available classes of codes that could be exercised here, but none of them can be directly applied to attain the objective. Indeed, error-detecting codes (EDCs) only detect errors. Error-correcting codes (ECCs) can, in addition, locate and correct errors in data sequences. However, the complexity of the decoder readily becomes impractical for long word lengths and/or for the large numbers of errors that may occur in the UUT. In fact, error correction is not always necessary; it is often quite enough to identify faulty RUs, which can be repaired afterwards. In communication, such situations are remedied by ELCs, which lie between EDCs and ECCs. ELCs can identify erroneous sub-words within a word. Their great advantage is that they require less redundancy than ECCs. However, in the presence of multiple errors an ELC's redundancy may become quite large. We will show how to reduce the redundancy of ELCs to practical levels at the expense of some deterioration of the code capabilities. As we will see below, this deterioration is fully acceptable for testing. Therefore, the most favourable choice toward achieving the main objective is ELCs. The codec for an ELC will be able to locate faulty RUs and will be referred to as an error-locating signature analyzer.

DIAGNOSTIC METHOD
ELCs were introduced by Wolf and Elspas in [10]. Goethals constructed a sub-class of ELCs that is equivalent to cyclic codes under a certain permutation of coordinates [11]. Since cyclic codes admit simple practical realization, we will concentrate on encoding/decoding procedures for that particular class.
Existence of cyclic ELCs is stated in the following theorem [11].
Let C', of block length n', be a cyclic d-error-detecting code generated by the product g(x) = g_1(x)...g_r(x) of r distinct irreducible factors over GF(q), and let GF(q^m) be the smallest extension field of GF(q) containing all roots of g(x).
Let C'', of block length n'' relatively prime to n', be a cyclic t-error-correcting code generated by the product G(x) = G_1(x)...G_s(x) of s distinct irreducible factors over GF(q^m), and let GF(q^mp) be the smallest extension field of GF(q^m) containing the n'n''-th roots of unity.
Then there exists an ELC C of length n = n'n'' having not more than rspm check digits over GF(q), which locates up to t erroneous sub-words, each sub-word having not more than d errors. Furthermore, C is equivalent to a cyclic code.
Example 1. The redundancy of the code C is 6×4 = 24 bits. C is a (63,39) code over GF(2), which locates 2 out of 9 sub-blocks of length 7 distorted by not more than 6 errors each. A conventional analyzer would require 9×6 = 54 bits.
The encoding procedure for this code uses the following generator polynomial: (1+x+x^2+x^4+x^6)(1+x^2+x^4+x^5+x^6)(1+x+x^3+x^4+x^6)(1+x+x^6). The procedure is simple; however, the bits coming from distinct sub-words must be permuted. This does not cause a problem for communication applications, but it is inconvenient for diagnostic purposes: there must be a multiplexer in the test system that works in compliance with the permutation table.
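The degree of this generator can be checked by carry-less polynomial multiplication over GF(2), with bit i of an integer holding the coefficient of x^i. The encodings below follow that convention; the helper name is illustrative:

```python
from functools import reduce

def gf2_mul(a, b):
    """Multiply two polynomials over GF(2), represented as integers."""
    r, i = 0, 0
    while b:
        if b & 1:
            r ^= a << i
        b >>= 1
        i += 1
    return r

# The four irreducible factors of the generator polynomial:
factors = [
    0b1010111,  # 1 + x + x^2 + x^4 + x^6
    0b1110101,  # 1 + x^2 + x^4 + x^5 + x^6
    0b1011011,  # 1 + x + x^3 + x^4 + x^6
    0b1000011,  # 1 + x + x^6
]
g = reduce(gf2_mul, factors)
degree = g.bit_length() - 1   # four degree-6 factors -> degree 24
```

The product has degree 24, matching the 63 − 39 = 24 check bits of the (63,39) code from Example 1.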
Another problem that appears when ELCs are applied to testing is that conventional ELCs deal with relatively small numbers of errors in sub-words.
The decoding procedure coincides with the known decoding procedure for the t-error-correcting code C''.
In order to alleviate the second problem mentioned above, we will extend the length of the code C' while keeping its redundancy at the same level. If the parameters of the original and extended codes are (n', k') and (N', K') respectively, where n' − k' = N' − K' = r', the extended parity-check matrix will be [H H ... H], in which the standard matrix H is repeated N'/n' times. Such a manipulation makes the resulting ELC non-optimal, and some errors may now escape detection. If all combinations of independent errors are equally likely to occur and N' is quite large, the probability of undetected error P_nd (or the aliasing probability P_al) can be estimated as P_nd = P_al ≤ 1/2^r'. For example, if r' = 16, then P_nd ≈ 0.000015. A minor increase of P_nd versus 0 is fully justified by the reduction of hardware for the decoder.
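Because the extended check matrix [H H ... H] is just H repeated, the syndrome of the extended word equals the syndrome of the bitwise XOR of its n'-bit sub-blocks; errors escape exactly when that XOR folds to a codeword of C'. A small sketch, using an arbitrary degree-4 polynomial (r' = 4) and made-up blocks:

```python
def gf2_mod(a, m):
    """Remainder of polynomial a modulo m over GF(2), integers as bit vectors."""
    d = m.bit_length() - 1
    while a.bit_length() - 1 >= d:
        a ^= m << (a.bit_length() - 1 - d)
    return a

g = 0b10011                     # x^4 + x + 1 (r' = 4 check bits)
blocks = [0b1011001, 0b0110110, 0b1110001]

# Syndrome under [H H H]: the XOR of per-block syndromes, which by
# linearity equals the syndrome of the XOR-folded block.
fold = 0
for b in blocks:
    fold ^= b
syndrome = gf2_mod(fold, g)

# Aliasing: error patterns whose fold is a multiple of g(x) go
# undetected -- for random independent errors this happens with
# probability about 1/2^r'.
e1 = 0b0110
e2 = e1 ^ g                     # e1 XOR e2 = g(x)  ->  zero syndrome
```

The two erroneous blocks e1 and e2 fold to g(x) itself, so the extended code misses them, illustrating why P_nd rises from 0 to about 1/2^r'.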
To solve the first problem of encoding we will slightly modify the encoding procedure. Let A be a code vector of the code C, where c_i^(l) and a_i^(l) are respectively the i-th check and data characters of the l-th sub-block. Since S(A) = A·H^T = 0 for any code vector, equating the coefficients at the same powers of γ_j yields a system of equations from which the check characters can be found.
According to the above, the standard encoding procedure, which requires the check bits to be part of the sub-words, can be modified. First, we compute the modulo-g(x) remainders S_i, i = 0, ..., n''−1, of all output responses of the fault-free RUs. Next, we compute the modulo-G(x) remainder S_ref of the polynomial with the coefficients S_0, ..., S_n''−1. We will refer to S_ref as the generalized signature (of the slice). The procedure is independent of the type of signature analyzer employed (LFSR or MISR).
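The two-stage computation can be sketched as follows. This is a simplified illustration: the field polynomial x^8+x^4+x^3+x+1 (a known-irreducible stand-in, not the paper's g(x)) is used for both the first-stage remainder and the GF(2^8) arithmetic, G(x) = x^2 + 3x + 1 is an arbitrary monic degree-2 polynomial, and all names are illustrative:

```python
G_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible: defines GF(2^8)

def crc8(bits):
    """First stage: remainder of a response polynomial modulo g(x)."""
    reg = 0
    for b in bits:
        reg = (reg << 1) | b
        if reg & 0x100:
            reg ^= G_POLY
    return reg

def gf_mul(a, b):
    """Multiplication in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= G_POLY
        b >>= 1
    return r

def generalized_signature(signatures, Gpoly):
    """Second stage: remainder of S(x) = sum S_i x^i modulo the monic
    G(x) = x^2 + g1*x + g0, given as the pair (g0, g1)."""
    g0, g1 = Gpoly
    r0 = r1 = 0                       # register holds r1*x + r0
    for s in signatures:              # highest-index symbol shifted in first
        t = r1                        # x^2 = g1*x + g0  (mod G)
        r1 = r0 ^ gf_mul(t, g1)
        r0 = s ^ gf_mul(t, g0)
    return (r1, r0)

# Per-RU signatures, then the generalized signature of the slice:
responses = [[1,0,1,1,0,1,0,1,1,0], [0,1,1,0,1,0,0,1,1,1], [1,1,0,0,1,0,1,1,0,0]]
sigs = [crc8(r) for r in responses]
S_ref = generalized_signature(sigs, (1, 3))
```

Only the pair S_ref (two 8-bit symbols) needs to be stored per slice, instead of one σ-bit signature per RU, which is the source of the memory reduction claimed in Example 2.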
When decoding, we perform a similar procedure with the actual output responses. If, for a given slice, the actual generalized signature S_a equals S_ref, we assume the RUs in the slice are fault-free. Otherwise, we use the decoding procedure of the code C'' to locate the faulty RUs.
Example 2. Let us apply 5100 test stimuli to a UUT with 257 RUs in a slice. Only a single RU may fail at a time, although the number of faults that may occur in the RU is not limited. We construct the ELC on the basis of two codes: an EDC C' generated by g(x) = x^8+x^5+x^3+x+1, and a (257,255) Hamming code C'' correcting single 2^8-ary errors and generated by G(x) = x^2+βx+1 over GF(2^8), where β is a primitive element of GF(2^8). The actual error-locating signature analyzer will consist of two LFSRs described by g(x) and G(x), respectively. The latter is the decoder of the (257,255) Hamming code over GF(2^8); it is represented in Fig. 2. The size of the memory of error-free signatures is reduced by a factor of (257×8)/(2×8) = 128.5 per slice. If there are 100 slices, the memory size will change from 257×8×100 = 205600 bits to 2×8×100 = 1600 bits. The aliasing rate will then be 1/2^8 ≈ 0.004. The hardware complexity of the ELC decoder only slightly exceeds that of a conventional LFSR.
After shifting all digits and obtaining the sum S_a ⊕ S_ref = (β^8, β^9) = (11010100, 01101010), the decoder will shift forward until its left stage R_0 holds all zeros. This occurs at the 255th shift. Therefore, the number of the faulty RU is 257 − 255 = 2.
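The location mechanics can be emulated in software: for a degree-2 G(x) over GF(2^8), the syndrome of a single error e at position j is e·(x^j mod G(x)), so the faulty position is the one whose column x^j mod G(x) is GF(2^8)-proportional to the observed difference S_a ⊕ S_ref. The sketch below uses the field polynomial x^8+x^4+x^3+x+1 and a toy G(x) = x^2 + 3x + 1, not the paper's exact (257,255) code, so it only demonstrates the mechanics; for a true Hamming code the columns are pairwise non-proportional and the candidate is unique:

```python
FIELD = 0x11B  # x^8 + x^4 + x^3 + x + 1, defines GF(2^8)

def gf_mul(a, b):
    """Multiplication in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= FIELD
        b >>= 1
    return r

def columns(Gpoly, n):
    """h_j = x^j mod G(x) for G(x) = x^2 + g1*x + g0, as pairs (r1, r0)."""
    g0, g1 = Gpoly
    r1, r0 = 0, 1                       # x^0 = 1
    cols = []
    for _ in range(n):
        cols.append((r1, r0))
        r1, r0 = r0 ^ gf_mul(r1, g1), gf_mul(r1, g0)  # multiply by x mod G
    return cols

def locate(delta, Gpoly, n):
    """Positions j whose column is proportional to delta = S_a ^ S_ref."""
    d1, d0 = delta
    cand = []
    for j, (h1, h0) in enumerate(columns(Gpoly, n)):
        # (d1, d0) = e*(h1, h0) for some e  <=>  d1*h0 == d0*h1
        if gf_mul(d1, h0) == gf_mul(d0, h1) and (h1, h0) != (0, 0):
            cand.append(j)
    return cand

# Inject a single error e at a known position and recover its location.
G = (1, 3)                              # toy G(x) = x^2 + 3x + 1
j_true, e = 7, 0x5A
h1, h0 = columns(G, 10)[j_true]
delta = (gf_mul(e, h1), gf_mul(e, h0))
```

This mirrors the shift-until-zero hardware decoder of Fig. 2: each forward shift multiplies the syndrome by x modulo G(x), and the shift count at which the register reaches a recognizable pattern reveals the faulty position.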

CONCLUSION
We have adapted error-locating codes for the off-line diagnosis of digital systems that are partitioned into replaceable units. If the number of units that may fail simultaneously is small, the hardware of a tester based on error-locating codes is reduced compared with conventional compression schemes, while the diagnostic resolution and aliasing rate are preserved. The gain is proportional to the number of replaceable units in the system.
We developed encoding/decoding procedures for error-locating codes that have simple practical implementations. We introduced an error-locating signature analyzer and showed how to use it to identify faulty units in digital systems.
The method considered can use any conventional compacting scheme.The type of the scheme determines the overall aliasing rate.