Altera_Forum
Honored Contributor
8 years ago
Will this coding style produce a tree reduction?
    #pragma unroll
    for (int i = 0; i < FILTER_PARALLEL; i++) {
        #pragma unroll
        for (int j = 0; j < KERNEL_PARALLEL; j++) {
            result_buffer[j] += w_in.ff.kk[j] * data_in.ff.kk[j];
        }
    }

Is there any way to further reduce resource usage?

Also, is it possible that the compiler builds multipliers out of ALUTs once the number of MACs per clock exceeds a certain threshold? I ask because 64*8 uses 256 DSPs, so 64*16 should use 512 DSPs, but it only uses 305.

For KERNEL_PARALLEL=64 and FILTER_PARALLEL=16, I changed the data type of result_buffer from int to short, and resource usage went down (although the result will be wrong when a short is used to store the MAC of two chars). Total DSP usage is 512, but it does not look like two multiplications are sharing one DSP. I still cannot explain how the resource usage is calculated:

    16-bit Integer Add (x959)    8196    0    0    319
    16-bit Integer Mul (x193)       0    0    0    193

I followed the Best Practices Guide and tried using a mask to save resources when result_buffer is an int. I only need 25 bits to fully hold my data:

    result_buffer[j] += 0x01ffffff & w_in.ff.kk[j] * data_in.ff.kk[j];

report.html still says I am using 32-bit Add and 32-bit Mul, even when I change the mask to 0x0000ffff. The mask also removes the sign bit. Do you have more information about how to do this?
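The arithmetic side of the question can be checked on the host before looking at the resource report. The following is a minimal sketch in plain C (not kernel code) of three points raised above: a chained accumulation and a tree-shaped reduction give the same integer result, a short accumulator can overflow when summing char products, and a plain 25-bit mask drops the sign unless the value is sign-extended again. All function names and the 64-element bound are illustrative assumptions and are not part of the original kernel. The sketch says nothing about how the offline compiler maps these operations onto DSPs or ALUTs.

```c
#include <stdint.h>

/* Chained multiply-accumulate with a 32-bit accumulator (hypothetical helper). */
int32_t mac_wide(const int8_t *w, const int8_t *d, int n) {
    int32_t acc = 0;
    for (int j = 0; j < n; j++)
        acc += (int32_t)w[j] * (int32_t)d[j];
    return acc;
}

/* Same products, summed pairwise as a balanced tree.
   Assumes n is a power of two and n <= 64. */
int32_t dot_tree(const int8_t *w, const int8_t *d, int n) {
    int32_t p[64];
    for (int j = 0; j < n; j++)
        p[j] = (int32_t)w[j] * (int32_t)d[j];
    for (int stride = n / 2; stride > 0; stride /= 2)
        for (int j = 0; j < stride; j++)
            p[j] += p[j + stride];
    return p[0];
}

/* Chained MAC with a 16-bit accumulator. One char*char product fits in
   16 bits, but the running sum may not; the narrowing conversion is
   implementation-defined once the sum leaves the int16_t range. */
int16_t mac_narrow(const int8_t *w, const int8_t *d, int n) {
    int16_t acc = 0;
    for (int j = 0; j < n; j++)
        acc = (int16_t)(acc + w[j] * d[j]);
    return acc;
}

/* Plain mask to the low 25 bits: a negative input comes back positive. */
int32_t mask_25(int32_t x) {
    return x & 0x01ffffff;
}

/* Keep the low 25 bits, then restore the sign from bit 24. */
int32_t sign_extend_25(int32_t x) {
    int32_t low = x & 0x01ffffff;
    return (low ^ 0x01000000) - 0x01000000;
}
```

For example, three products of 127 * 127 sum to 48387, which fits the 32-bit accumulator but not a short, and mask_25(-5) returns 33554427 while sign_extend_25(-5) returns -5.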