Thanks for the help, Kaz. Do you mean to create 8 separate filters and add all of their outputs? Then shift each block of 128 samples on to the next filter in the chain?
Each filter could consist of two buffers, one for the coefficients and one for the inputs, each 128 in length. Each pair of buffers would feed its own MAC. I would then add the outputs of all of the MACs, while at the same time shifting each previous block of 128 values into the next input buffer. Is this what you are saying?
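To check my understanding, here is a rough software model of that structure, written in Python rather than HDL. It is only a sketch under my own assumptions: a 1024-tap filter (8 x 128), integer samples, and the names `fir_direct` and `fir_partitioned` are made up for illustration. It also assumes the input buffers are chained sample by sample, so the oldest sample leaving one buffer enters the next, which is one way to read "shifting each block to the next buffer".

```python
BLOCK = 128  # assumed length of each sub-filter


def fir_direct(coeffs, samples):
    """Brute-force FIR: one long delay line and one big sum per output."""
    delay = [0] * len(coeffs)
    out = []
    for x in samples:
        delay = [x] + delay[:-1]  # shift the newest sample in
        out.append(sum(c * d for c, d in zip(coeffs, delay)))
    return out


def fir_partitioned(coeffs, samples, block=BLOCK):
    """Same filter split into sub-filters of `block` taps each.

    Assumes len(coeffs) is a multiple of `block`.
    """
    n_blocks = len(coeffs) // block
    # One coefficient buffer and one input buffer per sub-filter.
    coeff_bufs = [coeffs[i * block:(i + 1) * block] for i in range(n_blocks)]
    input_bufs = [[0] * block for _ in range(n_blocks)]
    out = []
    for x in samples:
        carry = x
        for buf in input_bufs:
            buf.insert(0, carry)  # newest sample enters this buffer
            carry = buf.pop()     # oldest sample moves on to the next buffer
        # Each sub-filter has its own MAC; the partial sums are then added.
        partials = [
            sum(c * d for c, d in zip(cb, ib))
            for cb, ib in zip(coeff_bufs, input_bufs)
        ]
        out.append(sum(partials))
    return out
```

With integer data the two functions should give identical outputs, which is the point: splitting the filter changes how the work is organised, not the result.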
DWH, are you saying that it is a waste of time to split the filter up? I agree. I don't understand why we would split it up if we are working in an FPGA. Do you think there would be any performance decrease if we just brute-forced it and implemented the whole filter?