Forum Discussion
Altera_Forum
Honored Contributor
7 years ago
I am using the Quartus OpenCL SDK, and versions 17.0 and 17.1 both give the same result.
The file 32bit_256 is:

int partial_sum = 0;
char w_in, d_in;
#pragma unroll
for (int j = 0; j < 256; j++) {
    partial_sum += (w_in * d_in);
}
The compiler automatically promotes w_in and d_in to the same type as partial_sum, so report.html shows 32-bit Mul and Add operations. DSP usage is reported as half of 256, i.e. 128 DSPs.

The file 32bit_800 increases the parallelism to 800, which by that accounting should use 400 DSPs. However, report.html shows only 759 Muls at a DSP usage of 379.5, and the remaining multipliers are implemented in LEs. In other words, the maximum number of 32-bit Muls is only half of the total 1518 DSPs: while the report charges me 0.5 DSP per 32-bit Mul, the real cost is 2 DSPs, so report.html underestimates the DSP usage.

I modified the code to force 16-bit Muls with parallelism 800. That works, and the report shows 800 Muls at a DSP usage of 400. But when I further increase the parallelism to 1600, the same thing happens: I can get at most 1518 Muls, and the rest are implemented in LEs. While the report charges me 0.5 DSP per 16-bit Mul, the real cost is 1 DSP.
int partial_sum = 0;
char w_in, d_in;
#pragma unroll
for (int j = 0; j < 256; j++) {
    partial_sum += (short)(w_in * d_in);
}
My question is: if my data type is char or short, can I consider the DSP resources to be twice what the device provides? I have seen papers using the Arria 10 GX 1150 where, with a float data type, the DSP count is 1518, but with FP16 or FP8 it is 3036. How do I use the DSPs correctly?