Thanks for responding.
I didn't mention that this was an adaptive filter.
I'm doing the convolution with a circular input buffer and a linear coefficient buffer. The state machine grabs the topmost input sample and the lowest coefficient and, together with the error output, calculates the new coefficient. It then uses this new coefficient for the next FIR MAC.
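To make the ordering concrete, here is a rough software sketch of what the state machine does for one output sample. It assumes an LMS-style update; the step size `mu` and all the names are only for illustration and are not taken from my actual design.

```python
def adaptive_fir_step(x_buf, head, w, err, mu):
    """Update each coefficient, then MAC with the updated value.

    x_buf : circular input buffer
    head  : index of the newest sample in x_buf
    w     : linear coefficient buffer, updated in place
    err   : error output used for the update
    mu    : step size (assumed, LMS-style update)
    """
    acc = 0.0
    for k in range(len(w)):
        # Walk backwards through the circular buffer, newest sample first.
        x = x_buf[(head - k) % len(x_buf)]
        # New coefficient from the old one, the error and the input sample.
        w[k] = w[k] + mu * err * x
        # The MAC for this tap uses the coefficient that was just computed.
        acc += w[k] * x
    return acc
```

The point is the last two statements in the loop: the multiply-accumulate for a tap cannot start until the update for that same tap has finished.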
Once the coefficient update is pipelined, each coefficient calculation takes several cycles to complete.
The next FIR MAC needs the latest coefficient, so that coefficient has to be calculated first. That is the problem I'm facing. If I implement a 5-stage pipeline, for example, I have to wait 5 cycles before I can perform the MAC with the newly calculated coefficient.
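Here is the cycle count as I am currently picturing it, as a small sketch. It assumes the MAC itself takes one cycle and that nothing overlaps between taps, which may well be the wrong way to look at it; the function name and the 64-tap figure are made up for the example.

```python
def stalled_cycles(num_taps, pipeline_depth, mac_cycles=1):
    """Cycles per output sample if every tap waits for its own update.

    Assumes each tap waits pipeline_depth cycles for its new coefficient
    and then spends mac_cycles on the MAC, with no overlap between taps.
    """
    return num_taps * (pipeline_depth + mac_cycles)


# Example: a hypothetical 64-tap filter with a 5-stage update pipeline.
print(stalled_cycles(64, 5))
```

With those assumptions the pipeline latency is paid once per tap rather than once per output sample, which is what I'm trying to avoid.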
I still don't understand how I can get around this.