Forum Discussion
I may have stumbled across a solution, but I'm not sure if it's correct. Wrapping accesses to global memory from different concurrent kernels with atomic_fence removes the hang. I updated my example repo here: https://github.com/AustinKnutsonSprint/oneapi-timer-kernel-hang/tree/fences
Hopefully someone from Intel can confirm what the underlying problem is and whether this is an appropriate fix.
Kernel hangs with pipes pretty much always happen due to one of the following two reasons:
- The amount of data written to (or read from) a pipe is not equal to the amount read (or written) on the other side. This would result in a hang during software emulation, too.
- There is a cycle of pipes in the kernel, and the compiler reorders the pipe read/write operations in a way that results in a kernel hang. This will not show up in software emulation.
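To make the two cases above concrete, here is a minimal sketch in plain C++ that models blocking pipes as bounded counters and steps each kernel through a fixed list of reads and writes. It is only a toy model, not the oneAPI pipe API: the names `Op`, `read_from`, `write_to`, `runs_to_completion` and the three scenario functions are made up for illustration, and the pipe capacity of 1 is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Toy model only: none of this is the oneAPI/SYCL pipe API.
enum class OpKind { Read, Write };

struct Op {
    OpKind kind;
    std::size_t pipe;  // index of the pipe this operation touches
};

using Kernel = std::vector<Op>;  // a kernel is just an ordered list of pipe ops

inline Op read_from(std::size_t pipe) { return Op{OpKind::Read, pipe}; }
inline Op write_to(std::size_t pipe) { return Op{OpKind::Write, pipe}; }

// Steps all kernels until nobody can make progress.
// Returns true if every kernel finished, false if the system is stuck (a hang).
bool runs_to_completion(const std::vector<Kernel>& kernels,
                        std::size_t num_pipes, std::size_t capacity) {
    std::vector<std::size_t> fill(num_pipes, 0);     // items currently in each pipe
    std::vector<std::size_t> next(kernels.size(), 0); // next op index per kernel
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (std::size_t k = 0; k < kernels.size(); ++k) {
            if (next[k] == kernels[k].size()) continue;  // this kernel is done
            const Op& op = kernels[k][next[k]];
            if (op.kind == OpKind::Read && fill[op.pipe] > 0) {
                --fill[op.pipe];  // blocking read succeeds only if data is present
                ++next[k];
                progressed = true;
            } else if (op.kind == OpKind::Write && fill[op.pipe] < capacity) {
                ++fill[op.pipe];  // blocking write succeeds only if there is room
                ++next[k];
                progressed = true;
            }
        }
    }
    for (std::size_t k = 0; k < kernels.size(); ++k) {
        if (next[k] != kernels[k].size()) return false;  // someone is still blocked
    }
    return true;
}

// Case 1: the consumer expects one more item than the producer ever writes.
bool count_mismatch_hangs() {
    Kernel producer = {write_to(0), write_to(0), write_to(0)};
    Kernel consumer = {read_from(0), read_from(0), read_from(0), read_from(0)};
    return !runs_to_completion({producer, consumer}, 1, 1);
}

// Case 2a: a cycle A -> pipe 0 -> B -> pipe 1 -> A, in source order.
bool cycle_in_source_order_finishes() {
    Kernel a = {write_to(0), read_from(1)};
    Kernel b = {read_from(0), write_to(1)};
    return runs_to_completion({a, b}, 2, 1);
}

// Case 2b: same cycle, but A's read has been moved ahead of its write.
bool reordered_cycle_hangs() {
    Kernel a = {read_from(1), write_to(0)};
    Kernel b = {read_from(0), write_to(1)};
    return !runs_to_completion({a, b}, 2, 1);
}
```

In this model all three scenario functions return true: the mismatched counts leave the consumer blocked on an empty pipe, the cycle completes in source order, and the same cycle gets stuck once kernel A's read is hoisted above its write, because each kernel is then waiting for data the other has not produced yet.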
I am not familiar with oneAPI, but the backend compiler is supposedly the same as the OpenCL compiler. I would assume that, just like in OpenCL, there is some barrier pragma or similar mechanism that forces the ordering of pipe operations and prevents the compiler from reordering them. The first debugging step in your case would probably be to add such a barrier after every pipe read and write operation in your code, to see whether the hang is the result of the compiler reordering operations.
P.S. Pipes will never "drop" data; that is why they can cause hangs.