Forum Discussion
Altera_Forum
Honored Contributor
7 years ago
This is an interesting observation. However, it seems there is some issue with resource estimation in Quartus v17/17.1. Using 16.1.2, I get 512 DSPs for 32bit_256 in the report, and full DSP utilization for 32bit_800. Also I get 800 DSPs for 16bit_800, and full DSP utilization for 16bit_1600.
Based on Intel's documentation, each DSP on Arria 10 can do at most one 27-bit x 27-bit multiplication, or two independent 18-bit x 18-bit multiplications. A 32-bit operand exceeds the 27-bit limit, so multiplying two 32-bit integers requires two DSPs. It should, however, be possible to perform two 16-bit x 16-bit multiplications per DSP, but for some reason the compiler is failing to infer this correctly and uses one DSP per multiplication instead.
Intel has added a new extension in v17 for Arbitrary Precision Integers (Programming Guide, Section 5.6). You might be able to do what you want using that extension. Just make sure to follow the instructions in the documentation for casting the variables.
Finally, regarding the case with 3036 MULs, are you talking about Intel's own paper here? https://arxiv.org/abs/1701.03534 That paper uses fixed-point with a shared exponent, and inferring that data type probably requires complex bit masking. In fact, it is possible that they also used undocumented features of the compiler to achieve that behavior.
P.S. It is also possible that the "printf" in your kernel is preventing the mapper from correctly packing the DSPs. For the sake of completeness, you should try removing the printf and see what happens.
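To make the DSP arithmetic above easier to check, here is a rough sketch in plain C (not kernel code). It only encodes the rules stated in this post: one DSP per multiplication up to 27 bits, two DSPs per 32-bit multiplication, and optionally two multiplications of up to 18 bits packed into one DSP. The helper names `dsps_needed` and `fits_on_device` are hypothetical, and the device total of 1518 DSPs is my assumption (the figure for an Arria 10 GX 1150), since the board was not named here. It is an estimate under those assumptions, not what the Quartus mapper actually does.

```c
#include <assert.h>

/* ASSUMPTION: total DSP blocks on the target device.
 * 1518 is the figure for an Arria 10 GX 1150; adjust for your board. */
#define DEVICE_DSPS 1518

/* Hypothetical helper: estimate the DSP blocks needed for `muls`
 * independent bits x bits integer multiplications.
 * pack18 != 0 models the ideal case where two multiplications of
 * up to 18 bits share one DSP; pack18 == 0 models one DSP each,
 * which is what the 16.1.2 report shows for the 16-bit kernels. */
static int dsps_needed(int bits, int muls, int pack18)
{
    assert(bits > 0 && bits <= 32); /* wider operands are not modeled */
    assert(muls >= 0);

    if (bits <= 18 && pack18)
        return (muls + 1) / 2;      /* two multiplications per DSP */
    if (bits <= 27)
        return muls;                /* one multiplication per DSP */
    return 2 * muls;                /* 32-bit: two DSPs each */
}

/* Hypothetical helper: does the estimate fit on the assumed device? */
static int fits_on_device(int dsps)
{
    return dsps <= DEVICE_DSPS;
}
```

With these rules, `dsps_needed(32, 256, 0)` gives 512 and `dsps_needed(16, 800, 0)` gives 800, matching the 16.1.2 report. `dsps_needed(32, 800, 0)` and `dsps_needed(16, 1600, 0)` both give 1600, which exceeds the assumed 1518 and is consistent with full DSP utilization. With packing enabled, `dsps_needed(16, 1600, 1)` would drop to 800.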