Power Optimization In Digital Circuits Using Scan-Based BIST

Ramakrishna Porandla1, Gella Ravikanth2, Podili Ramu3
1,2 Asst. Professor, 3 Associate Professor
Laqshya Institute of Technology & Sciences, Khammam.
Electronics and Communication Engineering
Ramakrishnaporandla@gmail.com 1, Gellaravikanth@gmail.com 2, Podiliram68@gmail.com 3

ABSTRACT
Technology scaling provides smaller, faster, and lower-energy devices, which allow more powerful and compact circuitry. However, thermal and shot-noise estimations alone suggest that the fault rate of an individual nanoscale device may be orders of magnitude higher than that of today’s devices. This motivates the use of built-in self-test (BIST). In BIST, test patterns are generated and applied to the circuit under test (CUT) by on-chip hardware, so minimizing hardware overhead is a major concern of BIST implementation. This paper presents a low-hardware-overhead test pattern generator (TPG) for scan-based BIST. The proposed bit-swapping LFSR (BS-LFSR) for test-per-scan BISTs is based upon some new observations concerning the number of transitions produced at the output of an LFSR. The BS-LFSR is combined with a scan-chain-ordering algorithm that orders the cells in a way that reduces the average and peak power in the test cycle or while scanning out a response to a signature analyzer. The problem of the capture power is solved by a novel algorithm that reorders some cells in the scan chain in such a way that the Hamming distance between the applied test vector and the captured response in the test cycle is minimized. The effect of the proposed design on fault coverage, test-application time, and hardware area overhead is negligible.

Keywords: BIST (built-in self-test), TPG (test pattern generator), CUT (circuit under test), Hamming distance, scan-chain ordering

I. INTRODUCTION

In recent years, design for low power has become one of the greatest challenges in high-performance very large scale integration (VLSI) design. As a consequence, many techniques have been introduced to minimize the power consumption of new VLSI systems. Most of these methods, however, focus on the power consumption during normal mode operation, while test mode operation has not normally been a predominant concern. Yet it has been found that the power consumed during test mode operation is often much higher than during normal mode operation [1]. This is because most of the consumed power results from the switching activity in the nodes of the circuit under test (CUT), which is much higher during test mode than during normal mode operation [1]–[3].

Several techniques that have been developed to reduce the peak and average power dissipated during scan-based tests can be found in [4] and [5]. A direct technique to reduce power consumption is to run the test at a slower frequency than that used in normal mode. This technique, while easy to implement, significantly increases the test application time [6]. Furthermore, it fails to reduce peak-power consumption, since peak power is independent of clock frequency.

Another category of techniques for reducing the power consumption in scan-based built-in self-tests (BISTs) uses scan-chain ordering [7]–[13]. These techniques aim to reduce the average-power consumption when scanning in test vectors and scanning out captured responses. Although these algorithms target average-power consumption, they can also reduce the peak power that may occur in the CUT during the scanning cycles, but not the capture power that may result during the test cycle (i.e., between launch and capture).

The design of low-transition test-pattern generators (TPGs) is one of the most common and efficient techniques for low-power tests [14]–[20]. These algorithms modify the test vectors generated by the LFSR to get test vectors with a low number of transitions. The main drawback of these algorithms is that they aim only to reduce the average-power consumption while loading a new test vector, and they ignore the power consumption that results while scanning out the captured response or during the test cycle. Furthermore, some of these techniques may result in lower fault coverage and higher test-application time. Other techniques to reduce average-power consumption during scan-based tests include scan segmentation into multiple scan chains [6], [21], test-scheduling techniques [22], [23], static-compaction techniques [24], and multiple scan chains with many scan enable inputs to activate one scan chain at a time [25]. The latter technique also reduces the peak power in the CUT.

On the other hand, in addition to the techniques mentioned earlier, there are some new approaches that aim to reduce peak-power consumption during tests, particularly the capture power in the test cycle. One of the common techniques for this purpose is to modify patterns using an X-filling technique to assign values to the don’t care bits of a deterministic set of test vectors in such a way as to reduce the peak power in the test vectors that have a peak-power violation.

This paper presents a new TPG, called the bit-swapping linear feedback shift register (BS-LFSR), that is based on a simple bit-swapping technique applied to the output sequence of a conventional LFSR and designed using a conventional LFSR and a 2 × 1 multiplexer. The proposed BS-LFSR reduces the average and instantaneous weighted switching activity (WSA) during test operation by reducing the number of transitions in the scan input of the CUT. The BS-LFSR is combined with a scan-chain-ordering algorithm that reduces the switching activity in both the test cycle (capture power) and the scanning cycles (scanning power).

II. PROPOSED APPROACH TO DESIGN THE BS-LFSR

The proposed BS-LFSR for test-per-scan BISTs is based upon some new observations concerning the number of transitions produced at the output of an LFSR.

Definition: Two cells in an n-bit LFSR are considered to be adjacent if the output of one cell feeds the input of the second directly (i.e., without an intervening XOR gate).

Lemma 1: Each cell in a maximal-length n-stage LFSR (internal or external) will produce a number of transitions equal to \( 2^{n-1} \) after going through a sequence of \( 2^n \) clock cycles.

Proof: The sequence of 1s and 0s that is followed by one bit position of a maximal-length LFSR is commonly referred to as an m-sequence. Each bit within the LFSR will follow the same m-sequence with a one-time-step delay. The m-sequence generated by an LFSR of length n has a periodicity of \( 2^n - 1 \). It is a well-known standard property of an m-sequence of length n that the total number of runs of consecutive occurrences of the same binary digit is \( 2^{n-1} \) [3]. The beginning of each run is marked by a transition; therefore, the total number of transitions for each stage of the LFSR is \( 2^{n-1} \). This lemma can also be proved by using the toggle property of the XOR gates used in the feedback of the LFSR.
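Lemma 1 can be checked numerically on a small case. The sketch below is only an illustration: it assumes a 4-stage external LFSR whose feedback is c1(next) = c1 XOR c4 (the \( x^n + x + 1 \) arrangement with n = 4), the function names are hypothetical, and transitions are counted cyclically over one full period of \( 2^n - 1 \) states.

```python
def lfsr_period(n):
    """Return one full period of states (c1, ..., cn) of an external LFSR
    with c1(next) = c1 XOR cn and c(i+1)(next) = c(i).
    Assumption: x^n + x + 1 is primitive for the chosen n (true for n = 3, 4)."""
    start = (1,) + (0,) * (n - 1)
    states, state = [], start
    while True:
        states.append(state)
        state = (state[0] ^ state[-1],) + state[:-1]
        if state == start:
            return states


def transitions(bits):
    """Count value changes in a cyclic bit sequence."""
    return sum(a != b for a, b in zip(bits, bits[1:] + bits[:1]))


n = 4
states = lfsr_period(n)
per_cell = [transitions([s[i] for s in states]) for i in range(n)]
print(len(states), per_cell)  # 15 [8, 8, 8, 8]
```

With n = 4, every cell produces \( 2^{n-1} = 8 \) transitions per period, as the lemma predicts.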

Lemma 2: Consider a maximal-length n-stage internal or external LFSR (n > 2). We choose one of the cells and swap its value with its adjacent cell if the current value of a third cell in the LFSR is 0 (or 1) and leave the cells un-swapped if the third cell has a value of 1 (or 0). Fig. 1 shows this arrangement for an external LFSR (the same is valid for an internal LFSR). In this arrangement, the output of the two cells will have its transition count reduced by \( T_{saved} = 2^{n-2} \) transitions. Since the two cells originally produce \( 2 \times 2^{n-1} \) transitions, the resulting percentage saving is \( T_{saved}\% = 25\% \).

In Lemma 2, the total percentage of transition savings after swapping is 25% [31]. In the case where cell x (the cell that drives the selection line) is not directly linked to cell m or cell m + 1 (the swapped cells) through an XOR gate, each of the cells has the same share of savings (i.e., 25%).
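The 25% figure follows directly from the counts stated in Lemmas 1 and 2. Written out, using only the quantities defined above:

\[
T_{saved}\% = \frac{2^{n-2}}{2 \times 2^{n-1}} \times 100\% = \frac{2^{n-2}}{2^{n}} \times 100\% = 25\%.
\]

As an illustrative instance, for n = 4 the two swapped cells originally produce \( 2 \times 2^{3} = 16 \) transitions per period, and swapping removes \( 2^{2} = 4 \) of them.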

Lemmas 3–10 cover the special cases where the cell that drives the selection line is linked to one of the swapped cells through an XOR gate. In these configurations, a single multiplexer output can save 50% of the transitions that were originally produced by an LFSR cell. Lemma 3 and its proof are given; the other lemmas can be proved in the same way.

Lemma 3: For an external n-bit maximal-length LFSR that implements the prime polynomial \( x^n + x + 1 \) as shown in Fig. 2, if the first two cells (c1 and c2) have been chosen for swapping and cell n as a selection line, then o2 (the output of MUX2) will produce a total transition savings of \( 2^{n-2} \) compared to the number of transitions produced by each LFSR cell, while o1 has no savings (i.e., the savings in transitions is concentrated in one multiplexer output, which means that o2 will save 50% of the original transitions produced by each LFSR cell).

Proof: There are eight possible combinations for the initial state of the cells c1, c2, and cn. For each of them, the following state has only two possible combinations (not eight, because the value of c2 in the next state is determined by the value of c1 in the present state, and the value of c1 in the next state is determined by \( c_1 \oplus c_n \)). Table I shows all possible present and subsequent states.

It is important to note that the overall savings of 25% is not equally distributed between the outputs of the multiplexers as in Lemma 2. This is because the value of c1 in the present state affects both the value of c2 and its own value in the next state (c2(Next) = c1 and c1(Next) = \( c_1 \oplus c_n \)). To see the effect of each cell on the transition savings, Table I shows that o1 saves one transition when moving from state (0,0,1) to (1,0,0), from (0,1,1) to (1,0,0), from (1,0,1) to (0,1,0), or from (1,1,1) to (0,1,0). At the same time, o1 gains one transition when moving from (0,1,0) to (0,0,0), from (0,1,0) to (0,0,1), from (1,0,0) to (1,1,0), or from (1,0,0) to (1,1,1). Since o1 gains a transition in four scenarios and saves one in four other scenarios, and all the scenarios have the same probability, it has a neutral overall effect.

For o2, one transition is saved when moving from (0,1,0) to (0,0,0), from (0,1,0) to (0,0,1), from (0,1,1) to (1,0,0), from (1,0,0) to (1,1,0), from (1,0,0) to (1,1,1), or from (1,0,1) to (0,1,0). Against these six savings, o2 gains one transition in two scenarios (from (0,0,1) to (1,0,0) and from (1,1,1) to (0,1,0)). This gives o2 a net saving of one transition in four scenarios, each with an initial state of probability 1/8 and a final state of probability 1/2; hence, \( P_{save} \) is given by

\[
P_{save} = \frac{1}{8} \times \frac{1}{2} + \frac{1}{8} \times \frac{1}{2} + \frac{1}{8} \times \frac{1}{2} + \frac{1}{8} \times \frac{1}{2} = \frac{1}{4}.
\]
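The concentration of savings in o2 can be reproduced with a short simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses n = 4 (for which \( x^4 + x + 1 \) is primitive), assumes that the multiplexers swap c1 and c2 when cn = 0 (the polarity consistent with the scenarios listed above), and uses hypothetical function names.

```python
def lfsr_period(n):
    """One period of an external LFSR with c1(next) = c1 XOR cn, c(i+1)(next) = c(i)."""
    start = (1,) + (0,) * (n - 1)
    states, state = [], start
    while True:
        states.append(state)
        state = (state[0] ^ state[-1],) + state[:-1]
        if state == start:
            return states


def transitions(bits):
    """Count value changes in a cyclic bit sequence."""
    return sum(a != b for a, b in zip(bits, bits[1:] + bits[:1]))


def bit_swap(states):
    """Multiplexer outputs o1 and o2: c1 and c2 are swapped whenever cn == 0
    (assumed select polarity)."""
    o1 = [s[1] if s[-1] == 0 else s[0] for s in states]
    o2 = [s[0] if s[-1] == 0 else s[1] for s in states]
    return o1, o2


n = 4
states = lfsr_period(n)
c1 = [s[0] for s in states]
o1, o2 = bit_swap(states)
print(transitions(c1), transitions(o1), transitions(o2))  # 8 8 4
```

For this case, o1 keeps the original \( 2^{n-1} = 8 \) transitions while o2 drops to 4, i.e., a saving of \( 2^{n-2} = 4 \) transitions (50%) concentrated in one output.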

In the special configurations shown in Table II (i.e. Lemmas 3–10), if the cell that saves 50% of the transitions is connected to feed the scan-chain input, then it saves 50% of the transitions inside the scan-chain cells, which directly reduces the average power and also the peak power that may result while scanning in a new test vector.

Table III shows that there are 104 LFSRs (internal and external), with sizes in the range of 3–168 stages, that can be configured to satisfy one or more of the special cases in Table II so as to concentrate the transition savings in one multiplexer output.

TABLE II
SPECIAL CASES WHERE ONE CELL SAVES 50% OF THE TRANSITIONS

<table>
<thead>
<tr>
<th>n of LFSR Stages</th>
<th>LFSR set</th>
<th>50% Save</th>
<th>LFSR set</th>
<th>50% Save</th>
</tr>
</thead>
<tbody>
<tr>
<td>2-20</td>
<td>3, 7, 5, 6, 7, 8, 11, 2, 25</td>
<td>38, 19</td>
<td>3, 7, 5, 6, 7, 8, 11, 2, 25</td>
<td>38, 19</td>
</tr>
<tr>
<td>2-40</td>
<td>21, 22, 24, 26, 27, 29, 25, 26</td>
<td>38, 49</td>
<td>21, 22, 24, 26, 27, 29, 25, 26</td>
<td>38, 49</td>
</tr>
<tr>
<td>2-60</td>
<td>32, 43, 44, 45, 46, 48, 22, 51, 52, 53, 54, 55, 56, 49, 60</td>
<td>38, 59</td>
<td>32, 43, 44, 45, 46, 48, 22, 51, 52, 53, 54, 55, 56, 49, 60</td>
<td>38, 59</td>
</tr>
<tr>
<td>6-80</td>
<td>61, 62, 63, 65, 66, 67, 68, 69, 70, 71, 74, 75, 76, 77, 78, 89</td>
<td>38, 79</td>
<td>61, 62, 63, 65, 66, 67, 68, 69, 70, 71, 74, 75, 76, 77, 78, 89</td>
<td>38, 79</td>
</tr>
<tr>
<td>8-100</td>
<td>83, 84, 85, 86, 87, 88, 91, 92, 93, 94, 95, 96, 97, 98, 99</td>
<td>38, 99</td>
<td>83, 84, 85, 86, 87, 88, 91, 92, 93, 94, 95, 96, 97, 98, 99</td>
<td>38, 99</td>
</tr>
<tr>
<td>14-160</td>
<td>141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159</td>
<td>38, 159</td>
<td>141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159</td>
<td>38, 159</td>
</tr>
<tr>
<td>Total</td>
<td>104</td>
<td></td>
<td>104</td>
<td></td>
</tr>
</tbody>
</table>

TABLE III
LFSR SETS THAT SATISFY ONE OR MORE OF LEMMAS 3–10

III. IMPORTANT PROPERTIES OF THE BS-LFSR

There are some important features of the proposed BS-LFSR that make it equivalent to a conventional LFSR. The most important properties of the BS-LFSR are the following.

1) The proposed BS-LFSR generates the same number of 1s and 0s at the output of multiplexers after swapping of two adjacent cells; hence, the probabilities of having a 0 or 1 at a certain cell of the scan chain before applying the test vectors are equal. Hence, the proposed design retains an important feature of any random TPG. Furthermore, the output of the multiplexer depends on three different cells of the LFSR, each of which contains a pseudorandom value. Hence, the expected value at the output can also be considered to be a pseudorandom value.

2) If the BS-LFSR is used to generate test patterns for either test-per-clock BIST or for the primary inputs of a scan-based sequential circuit (assuming that they are directly accessible) as shown in Fig. 3, then consider the case that c1 will be swapped with c2 and c3 with c4, ..., cn–2 with cn–1 according to the value of cn, which is connected to the selection line of the multiplexers (see Fig. 3). In this case, we have the same exhaustive set of test vectors as would be generated by the conventional LFSR, but their order will be different and the overall transitions in the primary inputs of the CUT will be reduced by 25%.
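The claim in property 2 that the set of test vectors is unchanged can be illustrated with a small experiment. The sketch below assumes a 5-stage external LFSR with feedback c1(next) = c3 XOR c5 (taken here to realise the primitive polynomial \( x^5 + x^2 + 1 \)) and assumes that the pairs are swapped when cn = 0; the names are hypothetical. It checks that the swapped outputs are a reordering of the LFSR states and prints the cyclic transition totals for comparison.

```python
def lfsr5_period():
    """One period of a 5-stage external LFSR with c1(next) = c3 XOR c5."""
    start = (1, 0, 0, 0, 0)
    states, s = [], start
    while True:
        states.append(s)
        s = (s[2] ^ s[4],) + s[:-1]
        if s == start:
            return states


def swap_pairs(s):
    """Swap (c1,c2), (c3,c4), ... when cn == 0 (assumed polarity); cn passes through."""
    if s[-1] == 1:
        return s
    out = list(s)
    for i in range(0, len(s) - 2, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return tuple(out)


def total_transitions(vectors):
    """Sum of bit changes between consecutive vectors, counted cyclically."""
    nxt = vectors[1:] + vectors[:1]
    return sum(a != b for u, v in zip(vectors, nxt) for a, b in zip(u, v))


states = lfsr5_period()
swapped = [swap_pairs(s) for s in states]
same_set = set(swapped) == set(states)
print(len(states), same_set)
print(total_transitions(states), total_transitions(swapped))
```

The set equality holds because, for a fixed value of cn, the pairwise swap is its own inverse, so distinct LFSR states always map to distinct output vectors.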

IV. CELL REORDERING ALGORITHM

Although the proposed BS-LFSR achieves good results in reducing the average power consumed during test and in minimizing the peak power that may result while scanning in a new test vector, it cannot reduce the overall peak power, because some peak-power components occur while scanning out the captured response or while applying a test vector and capturing a response in the test cycle. To solve these problems, the proposed BS-LFSR is first combined with the cell-ordering algorithm presented in [11], which reduces the number of transitions in the scan chain while scanning out the captured response. This reduces the overall average power and also the peak power that may arise while scanning out a captured response.

The problem of the capture power (peak power in the test cycle) is solved by a novel algorithm that reorders some cells in the scan chain in such a way that the Hamming distance between the applied test vector and the captured response in the test cycle is minimized, hence reducing the test-cycle peak power (capture power).

In this scan-chain-ordering algorithm, some cells of the scan chain already ordered by the algorithm in [11] are reordered again to reduce the peak power that may result during the test cycle. This phase mainly depends on an important property of the BS-LFSR: if two cells are connected with each other, then the probability that they have the same value at any clock cycle is 0.75. (In a conventional LFSR, where the transition probability is 0.5, two adjacent cells will have the same value in 50% of the clock cycles and different values in the other 50%; for a BS-LFSR that reduces the number of transitions of an LFSR by 50%, the transition probability is 0.25, and hence two adjacent cells will have the same value in 75% of the clock cycles.) Thus, for two connected cells (cells j and k), if we apply a sufficient number of test vectors to the CUT, the values of cells j and k are the same in 75% of the applied vectors. Now assume that cell x is a function of cells y and z. If the value that cell x has in the captured response is the same as its value in the applied test vector (i.e., no transition happens for this cell in the test cycle) in the majority of cases where cells y and z have the same value, then we connect cells y and z together on the scan chain, since they will have the same value in 75% of the cases. This reduces the possibility that cell x will undergo a transition in the test cycle. The steps in this algorithm are as follows.

1) Simulate the CUT for the test patterns generated by the BS-LFSR.
2) Identify the group of vectors and responses that violate the peak power.
3) In these vectors, identify the cells that mostly change their values in the test cycle and cause the peak-power violation.
4) For each cell found in step 3), identify the cells that play the key role in the value of this cell in the test cycle.
5) If it is found that, when two cells have a similar value in the applied test vector, the concerned cell will most probably have no transition in the test cycle, then connect these cells together. If it is found that, when two cells have a different value, the cell under consideration will most probably have no transitions in the test cycle, then connect these cells together through an inverter.
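The decision made in steps 4) and 5) can be sketched in code. This is a hedged illustration rather than the authors' tool: the function names, the data layout (lists of applied-vector and captured-response tuples taken from the violating group of step 2), and the 0.5 threshold used to interpret "most probably" are all assumptions.

```python
def hamming(a, b):
    """Number of positions in which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def link_decision(pairs, x, y, z, threshold=0.5):
    """Decide how scan cells y and z should be connected to keep cell x quiet.

    pairs: list of (applied_vector, captured_response) tuples (assumed layout).
    Returns 'direct', 'inverter', or None when neither case dominates.
    """
    same_total = same_quiet = diff_total = diff_quiet = 0
    for applied, captured in pairs:
        quiet = applied[x] == captured[x]  # no transition on x in the test cycle
        if applied[y] == applied[z]:
            same_total += 1
            same_quiet += quiet
        else:
            diff_total += 1
            diff_quiet += quiet
    same_rate = same_quiet / same_total if same_total else 0.0
    diff_rate = diff_quiet / diff_total if diff_total else 0.0
    if same_rate > threshold and same_rate >= diff_rate:
        return "direct"      # y and z equal keeps x quiet: connect them together
    if diff_rate > threshold:
        return "inverter"    # y and z different keeps x quiet: connect via inverter
    return None


# Hypothetical violating vectors: cell 0 stays quiet whenever cells 1 and 2 agree.
pairs = [
    ((0, 0, 0), (0, 1, 1)),
    ((1, 1, 1), (1, 0, 0)),
    ((0, 1, 0), (1, 1, 0)),
    ((1, 0, 1), (0, 0, 1)),
]
decision = link_decision(pairs, x=0, y=1, z=2)
distance = hamming(*pairs[0])
print(decision, distance)  # direct 2
```

The Hamming distance between an applied vector and its captured response is the number of scan cells that toggle in the test cycle, which is the quantity the reordering tries to keep small.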

It is important to note that this phase of ordering is done only when necessary: as stated in step 2 of the algorithm description, the group of test vectors that violates the peak power must be identified first. Hence, if no vector violates the peak power, this phase is skipped. In the worst case, this phase is performed on a few subsets of the cells, because applying it to all cells of the scan chain would destroy the effect of the algorithm in [11] and would substantially increase the computation time.

<table>
<thead>
<tr>
<th>Circuit</th>
<th>TL</th>
<th>LFSR</th>
<th>BS-LFSR with cell ordering</th>
<th>%Savings of BS-LFSR with cell ordering</th>
</tr>
</thead>
<tbody>
<tr>
<td>S641</td>
<td>3000</td>
<td>97.64</td>
<td>97.78</td>
<td>97.64</td>
</tr>
<tr>
<td>S3378</td>
<td>7000</td>
<td>96.15</td>
<td>94.01</td>
<td>96.15</td>
</tr>
<tr>
<td>S2196</td>
<td>2000</td>
<td>95.30</td>
<td>95.38</td>
<td>95.30</td>
</tr>
<tr>
<td>S1258</td>
<td>1000</td>
<td>94.11</td>
<td>94.20</td>
<td>94.11</td>
</tr>
<tr>
<td>S5378</td>
<td>4000</td>
<td>93.45</td>
<td>93.43</td>
<td>93.45</td>
</tr>
<tr>
<td>S1074</td>
<td>1000</td>
<td>93.14</td>
<td>93.17</td>
<td>93.14</td>
</tr>
<tr>
<td>S5092</td>
<td>2000</td>
<td>92.88</td>
<td>92.91</td>
<td>92.88</td>
</tr>
<tr>
<td>S3547</td>
<td>5000</td>
<td>92.59</td>
<td>92.60</td>
<td>92.59</td>
</tr>
</tbody>
</table>

V. EXPERIMENTAL RESULTS

A group of experiments was performed on full-scan ISCAS’89 benchmark circuits. In the first set of experiments, the BS-LFSR is evaluated regarding the length of the test sequence needed to achieve a certain fault coverage with and without the scan-chain-ordering algorithm. Table IV shows the results for a set of ten benchmark circuits. The columns labeled n, m, and PI refer to the size of the LFSR, the number of flip-flops in the scan chain, and the number of primary inputs of the CUT, respectively. The column labeled RF indicates the percentage of redundant faults in the CUT, and the column labeled FC indicates the target fault coverage where redundant faults are included. The last four columns show the test length needed by a deterministic test (i.e., the optimal test vector set is stored in a ROM), a conventional LFSR, a BS-LFSR with no scan-chain ordering, and the BS-LFSR with scan-chain ordering, respectively. The results in Table IV show that the BS-LFSR needs a shorter test length than a conventional LFSR for many circuits even without using the scan-chain-ordering algorithm. They also show that using the scan-chain-ordering algorithm with the BS-LFSR shortens the required test length. The second set of experiments is used to evaluate the BS-LFSR together with the proposed scan-chain-ordering algorithm in reducing average and peak power. For each benchmark ... using the BS-LFSR with the scan-chain-ordering algorithm.

In order to provide a comparison with the techniques published previously by other authors, Table VI compares the results obtained by the proposed technique with those obtained in [15]. Table VI compares the TL, FC, and average-power reduction (WSAavg). It is clear that the proposed method is much better for most of the circuits, not only in average-power reduction but also in the test length needed to obtain good fault coverage.

Finally, Table VII compares the results obtained by the proposed technique for peak-power reduction with those obtained in [25]. It is clear from the table that the proposed method has better results for most of the benchmark circuits.

TABLE VI

<table>
<thead>
<tr>
<th>Circuit</th>
<th>Results in [15]</th>
<th>Proposed Method</th>
</tr>
</thead>
<tbody>
<tr>
<td>S5378</td>
<td>36.6%</td>
<td>39%</td>
</tr>
<tr>
<td>S9234</td>
<td>38.9%</td>
<td>45%</td>
</tr>
<tr>
<td>S13207</td>
<td>46.1%</td>
<td>41%</td>
</tr>
<tr>
<td>S38417</td>
<td>40.1%</td>
<td>39%</td>
</tr>
<tr>
<td>S38584</td>
<td>35.9%</td>
<td>51%</td>
</tr>
<tr>
<td>AVG</td>
<td>39.5%</td>
<td>43.0%</td>
</tr>
</tbody>
</table>

VI. CONCLUSION

A low-transition TPG that is based on some observations about transition counts at the output sequence of LFSRs has been presented. The proposed TPG is used to generate test vectors for test-per-scan BISTs in order to reduce the switching activity while scanning test vectors into the scan chain. Furthermore, a novel algorithm for scan-chain ordering has been presented. When the BS-LFSR is used together with the proposed scan-chain-ordering algorithm, the average and peak power are substantially reduced. The effect of the proposed design on the fault coverage, test-application time, and hardware area overhead is negligible. Comparisons between the proposed design and other previously published methods show that the proposed design can achieve better results for most tested benchmark circuits.

REFERENCES


AUTHORS PROFILE:

Mr. Rama Krishna P. received his B.Tech degree in Electronics and Communication Engineering and his M.Tech degree in Digital Electronics and Communication Systems from JNTU, Hyderabad, A.P., India. He is currently working as an Assistant Professor in the Department of ECE at Laqshya Institute of Technology and Sciences, Khammam, A.P., India. His research interests include digital circuits, image processing, and communication systems. He has 3 years of teaching experience and is a life member of ISTE.

Mr. P. Ramu received his B.Tech degree in Electronics and Communication Engineering and his M.Tech degree in VLSI System Design from JNTU, Hyderabad, A.P., India. He is currently working as an Associate Professor in the Department of ECE at Laqshya Institute of Technology and Sciences, Khammam, A.P., India. He has 9 years of teaching experience. His research interests include digital circuits, deep submicron processes, electromagnetic fields, microwave engineering, and radar systems.

Mr. Gella Ravikanth received his B.Tech degree in Electronics and Communication Engineering and his M.Tech degree in VLSI System Design from JNTU, Hyderabad. He is currently working as an Assistant Professor in the Department of ECE at Laqshya Institute of Technology and Sciences, Khammam, A.P., India. His research interests include digital circuits, embedded systems, and deep submicron processes. He has 7 years of teaching experience.