Forum Discussion
Thanks for the quick reply.
>>> This is shown clearly in the are report.
I guess it is what "Implemented using inlined soft-IP" means.
Therefore, should I conclude that if I have a compute intense half-precision routine (e.g. GEMM), it is better in terms of performance/resource usage to implement it using classical FP32 (or to cast the loaded numbers to fp32)?
Sorry, I had a typo in my original post, I meant the "area report":
On Arria 10 and Stratix 10, FP32 performance will likely be higher than FP16 in general, unless your application is memory-bound (which is actually quite likely considering how low the external memory bandwidth of typical FPGA boards is). If you don't need floating-point and fixed-point/integer can be enough for your application, then you can do a full (a * b) + (c * d) with one DSP if your data type size is 18 bits or less and achieve higher computational performance than FP32. Next generation Intel Agilex will have native support for FP16 in the DSPs.