Unexplained performance difference for same kernels

Honored Contributor

9 years ago

That calculation looks correct to me. However, since you only write "temp.s0" to memory, the computations for "temp.s1" through "temp.sF" may be optimized out during synthesis because their results are never used. In that case, you would be overestimating the number of operations by a factor of 16. Again, I recommend comparing the OpenCL compiler's area usage estimate with the final area utilization to see whether anything is being optimized out.
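To make the factor of 16 concrete, here is a minimal sketch of the arithmetic in C. The helper names `effective_ops` and `gflops` are hypothetical, not part of any SDK, and the sketch assumes every vector lane would otherwise perform the same number of operations.

```c
#include <assert.h>

/* Hypothetical helper: number of operations that actually survive synthesis
 * when only `lanes_used` of `lanes_total` vector lanes feed a stored result.
 * Assumes each lane performs an equal share of `nominal_ops`. */
double effective_ops(double nominal_ops, int lanes_used, int lanes_total)
{
    assert(lanes_total > 0 && lanes_used >= 0 && lanes_used <= lanes_total);
    return nominal_ops * (double)lanes_used / (double)lanes_total;
}

/* Throughput in GFLOP/s for a given operation count and run time in seconds. */
double gflops(double ops, double seconds)
{
    assert(seconds > 0.0);
    return ops / seconds / 1e9;
}
```

For example, if the nominal count is 16e9 operations but only "temp.s0" is kept, `effective_ops(16e9, 1, 16)` gives 1e9, so a throughput figure computed from the nominal count would be 16 times too high.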

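Can you post a snippet of your host code where you are timing the kernel? Specifically, have you put a clFlush() or clFinish() after clEnqueueNDRangeKernel() and before reading the end time of the operation? Note that clEnqueueNDRangeKernel() returns as soon as the command is queued, and only clFinish() blocks until the kernel has completed; clFlush() just submits the queued commands, so the end time could still be read while the kernel is running.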
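The effect of a missing blocking call can be illustrated with a small sketch in C. It deliberately uses no OpenCL headers so that it compiles anywhere with POSIX timers: `fake_enqueue` and `fake_finish` are hypothetical stand-ins for wrappers around clEnqueueNDRangeKernel() and clFinish(), and `time_launch_ms` is an illustrative name, not the original poster's host code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*step_fn)(void *ctx);

/* Wall-clock time in milliseconds from a monotonic clock. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Times `enqueue` followed by an optional `wait_done`.
 * In real host code, `enqueue` would wrap clEnqueueNDRangeKernel() and
 * `wait_done` would wrap clFinish() on the same command queue.
 * Passing NULL for `wait_done` reproduces the timing mistake. */
double time_launch_ms(step_fn enqueue, step_fn wait_done, void *ctx)
{
    assert(enqueue != NULL);
    double start = now_ms();
    enqueue(ctx);            /* returns immediately, like an enqueue call */
    if (wait_done != NULL)
        wait_done(ctx);      /* blocks until the work is complete */
    return now_ms() - start;
}

/* Stand-in for an asynchronous launch: does nothing on the host side. */
void fake_enqueue(void *ctx)
{
    (void)ctx;
}

/* Stand-in for a blocking wait: sleeps for *ctx nanoseconds (must be < 1e9). */
void fake_finish(void *ctx)
{
    struct timespec delay = { 0, *(long *)ctx };
    nanosleep(&delay, NULL);
}
```

With a simulated 20 ms kernel (`long delay_ns = 20000000L;`), `time_launch_ms(fake_enqueue, fake_finish, &delay_ns)` reports roughly the kernel duration, while `time_launch_ms(fake_enqueue, NULL, &delay_ns)` reports close to zero. That is the kind of discrepancy a missing clFinish() would produce in the measured run time.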
