Forum Discussion
Altera_Forum
Honored Contributor
8 years ago

I am not sure about CUDA, but with OpenCL on GPUs you can still have multiple queues and try to run multiple kernels in parallel, and they can actually run in parallel on the hardware as long as the first kernel leaves some shader blocks unused. You can also always have such races between work-items of the same kernel that run in different work-groups.
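The multi-queue idea above can be sketched on the host side roughly like this. This is a minimal sketch assuming an already-created context, device, and built kernels (kernelA/kernelB are hypothetical names); error checking is omitted for brevity:

```c
/* Sketch: launching two kernels on separate in-order queues so the
 * runtime MAY overlap them on the hardware. Whether they actually
 * run in parallel depends on free compute resources on the device. */
#include <CL/cl.h>

void launch_concurrently(cl_context ctx, cl_device_id dev,
                         cl_kernel kernelA, cl_kernel kernelB,
                         size_t global_size)
{
    /* Two independent command queues: kernels enqueued on different
     * queues have no implicit ordering between them. */
    cl_command_queue q1 = clCreateCommandQueue(ctx, dev, 0, NULL);
    cl_command_queue q2 = clCreateCommandQueue(ctx, dev, 0, NULL);

    clEnqueueNDRangeKernel(q1, kernelA, 1, NULL, &global_size, NULL,
                           0, NULL, NULL);
    clEnqueueNDRangeKernel(q2, kernelB, 1, NULL, &global_size, NULL,
                           0, NULL, NULL);

    /* Both kernels are now in flight; wait for both to drain. */
    clFinish(q1);
    clFinish(q2);

    clReleaseCommandQueue(q1);
    clReleaseCommandQueue(q2);
}
```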
Regarding NDRange kernels: without SIMD, all work-items from all work-groups are pipelined on the actual hardware and no two threads are ever issued in the same clock cycle (hence you don't need to recompile the kernel if you change the local or global size). However, if you use SIMD, up to as many threads as your SIMD width can be issued in the same clock cycle. With num_compute_units, multiple work-groups can be issued concurrently in different compute units.
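As a concrete illustration of those two attributes, here is a sketch using the Intel/Altera FPGA SDK for OpenCL attribute syntax; the kernel body itself is a made-up example:

```c
/* Sketch: up to 8 work-items enter the pipeline per clock (SIMD),
 * and 2 replicated compute units each process their own work-groups.
 * Note that num_simd_work_items requires a fixed work-group size,
 * which is why reqd_work_group_size is also specified. */
__attribute__((num_simd_work_items(8)))
__attribute__((reqd_work_group_size(64, 1, 1)))
__attribute__((num_compute_units(2)))
__kernel void scale(__global const float *in, __global float *out)
{
    size_t gid = get_global_id(0);
    out[gid] = 2.0f * in[gid];
}
```

Changing the SIMD width or the required work-group size means the pipeline structure changes, so unlike the plain pipelined case, such a kernel does need to be recompiled.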