--- Quote Start ---
DWH, are you saying that it is a waste of time to split the filter up? I agree. I don't understand why we are to split it up if we are working in an FPGA. Do you think there would be any performance decrease if we did just brute force it and implement the whole filter?
--- Quote End ---
I'm not saying it's a waste of time; I'm just questioning whether it is the correct solution.
Take the following example: let's say I have a signal sampled at 1 MHz that I need to filter with a 100-tap filter and decimate by 8. If you look in a multirate signal processing book, you will find lots of nice math showing how to take your filter taps, split them into 8 parallel filters, and run them at 1/8th the input sample rate. It's very cool, but for this specific implementation it's a complete waste of time (and possibly of resources). The filter can be implemented with one MAC and a RAM, run at 100 MHz (or whatever higher rate is appropriate for the filter and FPGA). Or you can get slightly trickier and implement the 8 parallel filters using that same MAC and RAM. The only difference then might be that you can run the logic slower, but if you were going to run your other logic at 100 MHz anyway, what difference does this 'savings' make?
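To make the comparison concrete, here is a minimal Python sketch (a software model, not FPGA code) of the two approaches described above: one multiply-accumulate loop over all 100 taps for each kept output, versus the textbook split into 8 sub-filters. The function names, the random integer test data, and the 400-sample input length are assumptions made purely for illustration; only the 1 MHz rate, the 100 taps, and the decimation by 8 come from the example.

```python
import random

def direct_decimate(x, h, M):
    """Single-MAC style: for every M-th input sample, loop over all taps."""
    y = []
    for n in range(0, len(x), M):
        acc = 0
        for k, tap in enumerate(h):
            if n - k >= 0:          # samples before the start count as zero
                acc += tap * x[n - k]
        y.append(acc)
    return y

def polyphase_decimate(x, h, M):
    """Textbook style: M sub-filters, each fed at 1/M of the input rate."""
    y = []
    n_out = (len(x) + M - 1) // M
    for m in range(n_out):
        acc = 0
        for p in range(M):
            sub = h[p::M]           # sub-filter p holds taps h[p], h[p+M], ...
            for j, tap in enumerate(sub):
                idx = (m - j) * M - p
                if idx >= 0:
                    acc += tap * x[idx]
        y.append(acc)
    return y

# Illustrative data only (integers, so the comparison is exact).
random.seed(0)
M = 8
h = [random.randint(-10, 10) for _ in range(100)]    # stand-in 100-tap filter
x = [random.randint(-100, 100) for _ in range(400)]  # stand-in input samples

same = direct_decimate(x, h, M) == polyphase_decimate(x, h, M)

# MAC operations per second when only the kept outputs are computed:
# 100 taps * (1 MHz / 8) output rate.
macs_per_second = 100 * 1_000_000 // M
print(same, macs_per_second)
```

Both functions produce the same output samples; the split only changes the order in which the products are summed. Computing just the kept outputs takes 12.5 million MACs per second, which a single MAC clocked at 100 MHz covers with room to spare.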
'Optimal' is in the eye of the beholder.
Cheers,
Dave