Altera_Forum
Honored Contributor
7 years ago
How to share DSP correctly?
A 16-bit or 8-bit multiply should use only 0.5 DSP.
I tried to implement a MAC on two char operands, with the result stored in an int. The DSP usage should be 64x16/2 = 1024/2 = 512, and report.html also reports that I use 512 DSPs. However, after compiling to aocx, the DSP usage is 1024, so there is no DSP sharing.

Also, although report.html reports that the kernel has DSP sharing, when I increase the parallelism to 64x32 the DSP usage should be 64x32/2 = 1024. That should fit on an A10 1150, which has 1518 DSPs. However, report.html reports that I use 759 DSPs, which is half of 1518, plus a lot of "add" and "and" logic. I guess the compiler uses logic to implement the remaining (1024-759) DSPs' worth of multiplies. So although report.html knows that I want DSP sharing, the compiler is not able to do it, and the report underestimates the DSP usage.

I use Quartus 17.0 and also 17.1; the result is the same. How do I share DSPs correctly?

    __kernel __attribute__((task)) void mul(){
        int sum = 0;
        int partial_sum;
        char w_in[64][16];
        char d_in[16];

        for(/* loop bounds omitted */){
            partial_sum = 0;
            #pragma unroll
            for(int i=0 ; i<64 ; i++){
                #pragma unroll
                for(int j=0 ; j<16 ; j++){
                    partial_sum += (short) (w_in[i][j] * d_in[j]);
                }
            }
            sum += partial_sum;
        }
    }
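
For reference, the DSP arithmetic above can be sketched in plain C. This is only a back-of-the-envelope check, assuming two small multiplies pack into one DSP block (the 0.5 DSP per multiply figure from this post). The function names dsp_blocks_needed and fits_on_device are made up for illustration and are not part of any Intel tool or API.

```c
#include <assert.h>

/* Hypothetical helper: number of DSP blocks needed for a fully
   unrolled rows x cols multiply array, assuming `mults_per_dsp`
   multiplies share one DSP block (2 = sharing, 1 = no sharing). */
int dsp_blocks_needed(int rows, int cols, int mults_per_dsp)
{
    assert(mults_per_dsp > 0);
    int mults = rows * cols;
    /* Round up so a leftover multiply still takes a whole block. */
    return (mults + mults_per_dsp - 1) / mults_per_dsp;
}

/* Hypothetical helper: returns 1 if the estimate fits within the
   device's DSP count (1518 for the A10 1150 per the post), else 0. */
int fits_on_device(int blocks_needed, int device_dsps)
{
    return blocks_needed <= device_dsps;
}
```

With sharing, dsp_blocks_needed(64, 16, 2) gives 512 and dsp_blocks_needed(64, 32, 2) gives 1024, both below 1518. Without sharing, dsp_blocks_needed(64, 16, 1) gives 1024, which matches the usage seen after compiling to aocx.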