Forum Discussion

Altera_Forum's avatar
Altera_Forum
Icon for Honored Contributor rankHonored Contributor
16 years ago

Pipelining a recursive equation

Dear Forum,

I am working on a polyphase filter implementation, which is built up from a number of recursive equations such as:

A(n) = alpha*[X(n-1)-A(n-1)] + beta*[X(n)-A(n-2)] + X(n-2)

Such an expression has to be pipelined (current depth is 5) given the bit width of A and X (which is 26) and the given clock speed. However as A depends on the previous value of A, and the previous value won't be available for another 4 time steps, I'm beginning to think that pipelining such an expression is impossible.

So, to make it as simple as possible, my question is:

In principle, is it impossible to pipeline the following seemingly simple recursive expression:

A(n) = X(n) - A(n-1)

?

I'm beginning to think that the following limitations must be enforced if recursive relationships are to be pipelined, i.e.,

A(n) = X(n) - A(n-y)

can be pipelined as long as y >= the pipeline depth.

Any thoughts we be greatly appreciated.

Kind regards,

Kurt

10 Replies

  • Altera_Forum's avatar
    Altera_Forum
    Icon for Honored Contributor rankHonored Contributor

    Your refering to a case, where the filter time step equals a clock cycle and the arithmetic won't work without pipelining. Yes, you can't build an IIR filter in this situation.

    However, looking at the above equation, only the term alpha*A(n-1) must be calculated in one filter time step. Reordering the equation may help.
  • Altera_Forum's avatar
    Altera_Forum
    Icon for Honored Contributor rankHonored Contributor

    You will need to make a distinction between pipeline register and filter delay unit(= a register). A filter's delay element is fundamental to its operation but can serve as pipeline by itself. You cannot add any further registers unless very carefully designed.

    Your filter has already some delay elements which can serve as pipeline.

    Adding any more registers will certainly lead to filter malfunction unless you re-evaluate the functionality.

    on the other hand, FIR filters can be pipelined readily in an adder tree but not in the cascaded adders(semi systolic structure). In fact, the systolic structure is an extension of semi that allows pipelining. Similarly, the transposed structure is favoured because the delay elements serve as efficient pipeline.
  • Altera_Forum's avatar
    Altera_Forum
    Icon for Honored Contributor rankHonored Contributor

    Many thanks FvM and kaz for your suggestions and for confirming my suspiscions... once I realized the expression could not be implemented as written your reorganization comment makes a lot of sense (although again the same restrictions regarding pipelining apply)... so after sleeping on it this is what I tried this morning (for anyone whose interested):

    First I rewrote:

    A(n) = alpha*[X(n-1)-A(n-1)] + beta*[X(n)-A(n-2)] + X(n-2)

    as

    A(n) = {alpha*[X(n-1)]} - {alpha*[A(n-1)]} + {beta*[X(n)]} + {1*[Y(n-2)]}

    where Y(n-2) = {1*[X(n-2)]} - {beta*[A(n-2)]}

    - which can of course be pre-computed as long as it is done in 1 time step.

    A(n) can be completed in one time step by configuring a DSP Builder DSP Block with 4 multipliers and by deselecting the 'Register Output of the Multiplier' and also the 'Register Output of the Adder'. Y(n-2) can be implemented the same way, but with only 2 multipliers.

    The good news is that the output is now correct (as least according to the Simulink DSP Builder model). The downside is that is uses 24 DSP 18-bit Elements (remember that the bit width is 26).

    So does this revised implementation seem reasonable? If so, any ideas on how to reduced DSP resources used without reducing the bit width?

    Thanks again, Kurt
  • Altera_Forum's avatar
    Altera_Forum
    Icon for Honored Contributor rankHonored Contributor

    what is your sample rate relative to clock rate? you can clock the design at multiples of the sample rate to "pipeline" the IIR and even more to "share" the multipliers. if you have Simulink check out DSP Builder Advanced Blockset, it does these sorts of optimizations.

  • Altera_Forum's avatar
    Altera_Forum
    Icon for Honored Contributor rankHonored Contributor

    HI pancake

    Unfortunately for this implementation the sample rate is the system clock rate, i.e., 100MHz on an SIII. Would creating a clock sub-domain running at 300MHz, say, for the filters be a way forward?

    Bye for now, Kurt
  • Altera_Forum's avatar
    Altera_Forum
    Icon for Honored Contributor rankHonored Contributor

    --- Quote Start ---

    Would creating a clock sub-domain running at 300MHz, say, for the filters be a way forward?

    --- Quote End ---

    I fear, it isn't. 300 MHz is too fast for complex arithmetic. I guess, you can hardly avoid spending hardware multipliers in this place. There may be a different situation with fixed coefficients, that allow more efficient implementation of logic style multipliers.
  • Altera_Forum's avatar
    Altera_Forum
    Icon for Honored Contributor rankHonored Contributor

    Apparently on the SIII the DSP Blocks can be clocked as high as 550MHz... or am I misunderstanding?

    Thanks, Kurt
  • Altera_Forum's avatar
    Altera_Forum
    Icon for Honored Contributor rankHonored Contributor

    According to my experience. Stratix iii can hardly achieve 300MHz for a general design.

    practical speeds of 500MHz are for the future. Virtx devices are better for speed but less resourceful than stratix.
  • Altera_Forum's avatar
    Altera_Forum
    Icon for Honored Contributor rankHonored Contributor

    500 Mhz is possible only with a short data path register - mult/add - register. Practical designs typically require more, e.g. a filter with saturated arithmetic. In the present case, I don't expect an advantage from a higher clock than 100 MHz, as long as you can't multiplex arithmetic functions to be used sequentially.

  • Altera_Forum's avatar
    Altera_Forum
    Icon for Honored Contributor rankHonored Contributor

    Actually, rereading the suggestions above I didn't appreciate at first that thepancake's comment about sampling also amounts to this necessarily being a decimating filter. So I can build a 3-stage pipeline as long as the input data is downsampled by a factor of 3... (and if I decimate more than that I can even reuse the mulitpliers as suggested and get to a very small very fast design) and perhaps the affect that will have on the filtering characteristics can be compensated for in the coefficient calculation... so perhaps decimation will save me!

    Thanks all once again for your input.

    Kurt