Altera_Forum
Honored Contributor
7 years ago
vector_add example - measuring the performance
Hello,
I ran the vector_add example on the DE10-Standard board and got the output below. The kernel took 6.9 ms to perform the floating-point add on 1M elements, so the performance is around 145 MFLOPS. I expected much higher performance, on the order of 100 GFLOPS. Is there a way to achieve better performance?

------------------------------------------------------------
Initializing OpenCL
Platform: Intel(R) FPGA SDK for OpenCL(TM)
Using 1 device(s)
  de10_standard_sharedonly : Cyclone V SoC Development Kit
Using AOCX: vector_add.aocx
Reprogramming device [0] with handle 1
Launching for device 0 (1000000 elements)
Time: 108.505 ms
Kernel time (device 0): 6.931 ms
Verification: PASS
--------------------------------------------------

Thanks,
Pavan
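
For reference, the throughput figure quoted above can be reproduced with a short calculation. This is only a sketch. It assumes one floating-point add per element, and for the memory-traffic estimate it further assumes 4-byte floats with two input reads and one output write per element. Neither assumption is confirmed by the log itself, and the variable names are illustrative.

```python
# Figures taken from the log output in the post.
num_elements = 1_000_000   # "Launching for device 0 (1000000 elements)"
kernel_time_s = 6.931e-3   # "Kernel time (device 0): 6.931 ms"

# Assumption: exactly one floating-point add per element.
flops = num_elements / kernel_time_s

# Assumption: 4-byte floats, two reads and one write per element.
bytes_per_element = 3 * 4
bandwidth_bytes_per_s = num_elements * bytes_per_element / kernel_time_s

print(f"Throughput: {flops / 1e6:.1f} MFLOPS")
print(f"Effective memory traffic: {bandwidth_bytes_per_s / 1e9:.2f} GB/s")
```

Under these assumptions each add moves 12 bytes, so the measured rate may say more about memory bandwidth than about the arithmetic capacity of the device.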