Pipelined Accumulators

In digital design, the ultimate bandwidth is determined by a feedback path.  There are a few tricks that can be used to help the situation, but it quickly becomes a losing battle.  This article shows one way to allow pipelining of the feedback path by exploiting the system’s mathematical properties.  This method is actually not the best way to solve this specific problem, but is slightly more general.  The better solution is presented here.

Accumulator

Accumulator

The basic accumulator circuit is shown on the left.  This system is a technically stable (BIBO) IIR filter.  The equation is: y_{n+1} = y_n + x_n.  This is multiplierless, which is good for implementations.  In practice, the size of the inputs are very large, and the math is done modulo 2^{b}.  This type of accumulator is common in CIC filters, as well as DDS applications.

In both of these applications, very large accumulators might be needed.  16b is generally too small, and sizes might grow beyond 48b.  Large rate changes in CIC filters lead to very large bit widths, and the frequency resolution and of a DDS is limited by the bit widths.

Even though addition is a low complexity operation, large additions can require a large amount of time.  This is more true of LUT4-based FPGAs, and the Virtex-6’s LUT6 based design allows high clock rates even with 64b addition.  In some cases, a large addition would require more time than one cycle.

Pipelined Addition

Pipelined Addition

Pipelining is commonly done by breaking the addition into two smaller additions.  This is shown on the right.  The issue is that the system no longer functions as expected.  The IIR filter is now defined by y_{n+1} = y_{n-1} + x_{n-1}.  It would seem that there is no way to pipeline the addition as the result appears to be needed on the very next cycle.

Pipelined Accumulator

Pipelined Accumulator

One solution is to calculate the value the accumulator would have two cycles later.  Two such accumulators are then run, each loading a value only on every other cycle.  This allows two cycles for the addition, and still generates the correct value.  This concept is shown on the left.

Preconditioning

Preconditioning

This type of system can be implemented by pre-conditioning the input.  The value an accumulator will have two cycles in the future is given by y_{n+2} = y_n + (x_{n} + x_{n+1}).  This means that the preconditioned input will simply be the sum x_n + x_{n+1}.  In general, this will also be a large addition, but there is no feedback, so the pipelined adder will work just fine.

On the next cycle, the other accumulator will calculate y_{n+3} = y_{n+1} + (x_{n+1} = x_{n+2}).  Notice that the x_{n+1} affects both of the accumulators as expected.

Conclusions

This specific problem actually has a much easier solution, again due to the properties of some addition circuits.  This better solution will be posted in another article.  The improved solution relies on the ability to split the logic of an addition between two clock cycles cleanly.  With addition, this is easy because the MSB’s don’t affect the LSB’s, thus the lower word can be calculated in advance of the upper word.

The above method does work, but might be more of a curiosity.  It can be used effectively when there are logic operations that require two cycles to perform, that can’t be broken across two cycles easily.  An example might be ROM lookups.

Honestly, I haven’t come up with a good example of a situation where this method would be preferred.  Such a case would need logic operations that can be computed two cycles in advance AND are difficult to break-up to be performed in multiple cycles AND such operation are in a feedback path.

This entry was posted in FPGA, Math. Bookmark the permalink.

Comments are closed.