Altera_Forum
8 years ago
Have you checked the final post-place-and-route resource utilization? The computation may well have been optimized out during synthesis. The OpenCL compiler is indeed generating FMA operations, but it seems to be ignoring the SIMD attribute: the number of DSPs used is exactly equal to the number of FMA operations in the code, rather than that number multiplied by the SIMD factor. Assuming nothing is being optimized out, you have probably made a mistake in calculating the FLOPS value.
Arria 10 GX 1150 has 1518 DSPs, each capable of performing one single-precision FMA operation per clock cycle. An FMA counts as two floating-point operations (a multiply and an add), so with the DSPs running at their maximum frequency of 482 MHz the peak is 1518 × 2 × 482 MHz ≈ 1.46 TFLOPS. It is not possible to exceed this number.