Forum Discussion
The DSPs in Stratix 10 do not natively support FP16 (half-precision) computation, while they natively support FP32. When you use FP32, all the computation is performed within the DSP and little logic is used outside of the DSP. In fact, for FP32, a full FMA (a.k.a. MAC) operation can be performed with each DSP. For FP16, however, only the multiplication of the mantissa is offloaded to the DSP and every other operation has to be done using logic. For the particular case of FP16 addition, DSPs are not used at all and the operation is completely offloaded to logic. This is shown clearly in the are report.
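To make the work split described above more concrete, here is a small Python model of an FP16 multiply. It is a hypothetical illustration, not what the Intel compiler actually generates. Only the 11-bit x 11-bit significand product corresponds to the part the reply says is offloaded to a DSP. The sign XOR, the exponent addition and the final rounding/normalization (omitted here) are the parts that end up in soft logic. The names `fp16_fields` and `fp16_mul_exact` are made up for this sketch, and only normal FP16 values are handled.

```python
import struct


def fp16_fields(x):
    """Round x to FP16 and split it into (sign, biased exponent, 10-bit fraction)."""
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF


def fp16_mul_exact(x, y):
    """Multiply two normal FP16 values, split the way the post describes.

    Returns the exact (unrounded) product; rounding and renormalizing the
    result back to FP16 (also soft logic) are left out of this sketch.
    """
    sx, ex, fx = fp16_fields(x)
    sy, ey, fy = fp16_fields(y)
    # Sketch limitation: zeros, subnormals, infinities and NaNs are rejected.
    for e in (ex, ey):
        if e == 0 or e == 0x1F:
            raise ValueError("only normal FP16 values are handled")
    # Soft logic: the result sign is a single XOR.
    sign = sx ^ sy
    # "DSP" part: integer multiply of the 11-bit significands
    # (implicit leading 1 restored), giving a product of at most 22 bits.
    mant = (0x400 | fx) * (0x400 | fy)
    # Soft logic: add the unbiased exponents; each significand carries
    # a 2^-10 scale, hence the extra -20.
    exp = (ex - 15) + (ey - 15) - 20
    return (-1.0) ** sign * mant * 2.0 ** exp


if __name__ == "__main__":
    print(fp16_mul_exact(1.5, -2.25))  # -3.375
```

The product of two 11-bit significands always fits in a Python float exactly, so the result can be checked directly against ordinary multiplication of the FP16-rounded inputs.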
Thanks for the quick reply.
>>> This is shown clearly in the are report.
I guess that is what "Implemented using inlined soft-IP" means.
Therefore, should I conclude that if I have a compute-intensive half-precision routine (e.g. GEMM), it is better, in terms of performance and resource usage, to implement it in classical FP32 (or to cast the loaded values to FP32)?
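A minimal sketch of the "cast to FP32" option, assuming the data is stored in memory as FP16 and widened on load. The buffers keep 2 bytes per element, so memory traffic stays at the FP16 level, while every multiply and add is rounded to FP32. This is a host-side Python model with hypothetical names (`to_f32`, `dot_fp16_storage_fp32_math`), not OpenCL kernel code.

```python
import struct


def to_f32(v):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def dot_fp16_storage_fp32_math(a_buf, b_buf):
    """Dot product of two little-endian FP16 buffers, computed in FP32.

    Each element is widened from FP16 to FP32 on load (exact), and the
    product and running sum are rounded to FP32 after every operation.
    """
    n = len(a_buf) // 2
    if len(b_buf) // 2 != n:
        raise ValueError("buffers must hold the same number of FP16 values")
    a = struct.unpack(f"<{n}e", a_buf[: 2 * n])
    b = struct.unpack(f"<{n}e", b_buf[: 2 * n])
    acc = 0.0
    for x, y in zip(a, b):
        # Separate FP32 rounding of the multiply and the add (not a fused FMA).
        acc = to_f32(acc + to_f32(x * y))
    return acc


if __name__ == "__main__":
    a = struct.pack("<3e", 1.0, 2.0, 3.0)
    b = struct.pack("<3e", 4.0, 5.0, 6.0)
    print(len(a), dot_fp16_storage_fp32_math(a, b))  # 6 32.0
```

Widening is always exact because every FP16 value is representable in FP32, so only the arithmetic precision changes, not the stored data.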
- HRZ (Frequent Contributor), 6 years ago
Sorry, I had a typo in my original post; I meant the "area report".
On Arria 10 and Stratix 10, FP32 performance will likely be higher than FP16 performance in general, unless your application is memory-bound (which is actually quite likely, considering how low the external memory bandwidth of typical FPGA boards is). If you don't need floating point and fixed-point or integer arithmetic is enough for your application, you can compute a full (a * b) + (c * d) with one DSP as long as your operands are 18 bits wide or narrower, and achieve higher computational performance than with FP32. The next-generation Intel Agilex will have native support for FP16 in its DSPs.
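The integer alternative can be modeled as follows. The helper names are hypothetical, and the choice of 12 fractional bits is an arbitrary assumption for the example. Values are quantized to signed 18-bit fixed point, and the sum of two products is computed exactly in integer arithmetic, which is the pattern the reply says fits in a single DSP. The exact operand widths and signedness a given DSP mode accepts should be checked against Intel's DSP documentation.

```python
# Signed 18-bit range (the width limit is taken from the post above).
INT18_MIN, INT18_MAX = -(1 << 17), (1 << 17) - 1


def to_fixed18(x, frac_bits):
    """Quantize x to a signed 18-bit integer with frac_bits fractional bits.

    Uses Python's round(), i.e. round to nearest with ties to even.
    """
    q = round(x * (1 << frac_bits))
    if not INT18_MIN <= q <= INT18_MAX:
        raise OverflowError(f"{x} does not fit in 18 bits with {frac_bits} fractional bits")
    return q


def sum_of_two_products(a, b, c, d):
    """Exact (a * b) + (c * d) on signed 18-bit integer operands."""
    for v in (a, b, c, d):
        if not INT18_MIN <= v <= INT18_MAX:
            raise OverflowError("operand wider than 18 bits")
    return a * b + c * d


if __name__ == "__main__":
    F = 12  # hypothetical number of fractional bits
    a, b, c, d = (to_fixed18(v, F) for v in (1.5, -2.25, 0.5, 4.0))
    r = sum_of_two_products(a, b, c, d)
    # Each product carries 2 * F fractional bits, so rescale by 2^(2F).
    print(r / (1 << (2 * F)))  # -1.375
```

The trade-off is range and precision: with 12 fractional bits, the representable values only span roughly -32 to +32, so the scaling has to be chosen per application.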