Forum Discussion

Occasional Contributor

6 years ago

High resource usage when using half precision

Hello, I started investigating the possibility of using half precision for some mathematical routines. However, I found suspiciously high resource utilization when compared with classical floating po...

HRZ

Frequent Contributor

6 years ago

The DSPs in Stratix 10 do not natively support FP16 (half-precision) computation, while they natively support FP32. When you use FP32, all the computation is performed within the DSP and little logic is used outside of the DSP. In fact, for FP32, a full FMA (a.k.a. MAC) operation can be performed with each DSP. For FP16, however, only the multiplication of the mantissa is offloaded to the DSP and every other operation has to be done using logic. For the particular case of FP16 addition, DSPs are not used at all and the operation is completely offloaded to logic. This is shown clearly in the are report.
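To make the split between multiplier work and surrounding logic concrete, here is a minimal Python sketch of an FP16 multiply broken into its IEEE 754 binary16 fields. It is only an arithmetic model, not the compiler's actual soft-IP: the function names `fp16_fields` and `fp16_mul_exact` are hypothetical, only normal (non-zero, non-subnormal, finite) inputs are handled, and the final normalization and rounding back to FP16 are left out. Those omitted steps, together with the exponent addition and sign handling, are the kind of work that would land in logic rather than in the multiplier.

```python
import struct


def fp16_fields(x):
    """Round x to FP16 and return (sign, biased exponent, 10-bit fraction)."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF


def fp16_mul_exact(a, b):
    """Exact product of two normal FP16 values, before rounding back to FP16.

    Assumption: inputs are normal numbers; zeros, subnormals, infinities
    and NaNs are rejected to keep the sketch short.
    """
    sa, ea, fa = fp16_fields(a)
    sb, eb, fb = fp16_fields(b)
    if not (0 < ea < 31 and 0 < eb < 31):
        raise ValueError("sketch handles normal FP16 inputs only")

    # 11-bit significands with the implicit leading 1 restored.
    ma = (1 << 10) | fa
    mb = (1 << 10) | fb

    # The only step that maps onto an integer multiplier: 11 x 11 bits.
    significand_product = ma * mb

    # Everything below is additions, XORs and shifts.
    exponent = (ea - 15) + (eb - 15)  # remove the bias of 15 from each operand
    sign = sa ^ sb

    # Each significand carries 10 fractional bits, hence the extra 2**-20.
    value = significand_product * 2.0 ** (exponent - 20)
    return -value if sign else value
```

For example, `fp16_mul_exact(1.5, -2.25)` multiplies the significands 1536 and 1152 as integers and then applies the sign and exponent separately, giving `-3.375`.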

tde_m

Occasional Contributor

6 years ago

Thanks for the quick reply.

>>> This is shown clearly in the are report.

I guess that is what "Implemented using inlined soft-IP" means.

Therefore, should I conclude that if I have a compute-intensive half-precision routine (e.g. GEMM), it is better in terms of performance and resource usage to implement it using classical FP32 (or to cast the loaded numbers to FP32)?

HRZ

Frequent Contributor

6 years ago

Sorry, I had a typo in my original post; I meant the "area report".

On Arria 10 and Stratix 10, FP32 performance will likely be higher than FP16 in general, unless your application is memory-bound (which is actually quite likely considering how low the external memory bandwidth of typical FPGA boards is). If you don't need floating point and fixed-point or integer arithmetic is enough for your application, then you can do a full (a * b) + (c * d) with one DSP if your data type size is 18 bits or less, and achieve higher computational performance than FP32. Next-generation Intel Agilex will have native support for FP16 in the DSPs.
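
As a rough illustration of the fixed-point alternative, the following Python sketch models the (a * b) + (c * d) operation on signed operands of at most 18 bits. It is a software model of the arithmetic only, not device code: the helper names (`fits_signed`, `to_fixed`, `sum_of_two_products`) and the choice of 12 fractional bits are assumptions made for the example, not something taken from the Intel tools.

```python
def fits_signed(value, bits):
    """True if value is representable as a two's-complement integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def to_fixed(x, frac_bits=12, bits=18):
    """Quantize x to a signed fixed-point integer (assumed format: 12 fractional bits)."""
    q = round(x * (1 << frac_bits))
    if not fits_signed(q, bits):
        raise OverflowError(f"{x} does not fit in {bits}-bit fixed point")
    return q


def sum_of_two_products(a, b, c, d, bits=18):
    """Model (a * b) + (c * d) on operands limited to `bits` bits each."""
    for operand in (a, b, c, d):
        if not fits_signed(operand, bits):
            raise ValueError("operand exceeds the assumed 18-bit limit")
    # Two 18 x 18 products summed; the result needs at most 37 bits.
    return a * b + c * d


# Usage: 1.5 * -2.0 + 0.25 * 4.0, computed entirely in integers.
acc = sum_of_two_products(to_fixed(1.5), to_fixed(-2.0), to_fixed(0.25), to_fixed(4.0))
result = acc / (1 << 24)  # each product carries 12 + 12 fractional bits
```

Here `result` comes out as `-2.0`, matching the real-valued computation. The point of the range checks is that the single-DSP claim only applies while every operand stays within 18 bits; wider types would need more multiplier resources.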
