Forum Discussion
Altera_Forum
Honored Contributor
8 years ago
Each DSP has two 18x18 multipliers. Since your data is char, two multiplications fit in one DSP, so 512 multiplications require only 256 DSPs. As for the adders: because you have a reduction on result_buffer, and each index j in this buffer is written to in every iteration of i, you need extra adders between the iterations of the i loop, in addition to the additions inside the j loop, to get the final value of result_buffer at each index. Based on my calculations, the total number of adders should be 64 x 8 + 64. For some reason, though, the compiler is instantiating 64 extra adders.
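To make the arithmetic above easy to re-check for other sizes, here is a small sketch in Python. The function names and parameters are hypothetical and only evaluate the expressions quoted in this post. Nothing here comes from the compiler or its reports, and the post does not say what the 64 and the 8 each stand for, so they are passed through as plain numbers.

```python
def dsps_needed(num_mults: int, mults_per_dsp: int = 2) -> int:
    """DSP blocks needed when each block performs `mults_per_dsp` multiplications.

    Assumption: every multiplication can be paired, as with char operands
    on a DSP that has two 18x18 multipliers.
    """
    # Ceiling division, so an odd leftover multiplication still costs one DSP.
    return -(-num_mults // mults_per_dsp)


def expected_adders(a: int = 64, b: int = 8) -> int:
    """Evaluate the post's estimate a x b + a (64 x 8 + 64 by default)."""
    return a * b + a


if __name__ == "__main__":
    print("DSPs for 512 multiplications:", dsps_needed(512))
    print("Estimated adders:", expected_adders())
    # The post reports 64 adders beyond the estimate.
    print("Estimate plus the 64 extra:", expected_adders() + 64)
```

Comparing the last two numbers against the adder count in the compilation report is one way to confirm whether the mismatch is exactly the 64 adders mentioned above.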
Regarding the excessive usage with FILTER_PARALLEL=16, I am not sure what is happening there. Maybe there is some device limitation with respect to the carry chains that is increasing the usage.