Runtime hang with pipes and global memory access
I'm trying to develop a oneAPI application built around a long-running persistent kernel that runs independently of the host process. Short-lived kernels coordinate messages between the host and the persistent kernel, which I hope will form the basis of a network packet processing kernel. Until host-to-kernel pipes are implemented, I am using kernel-to-kernel pipes and kernel lifetimes to signal events to and from the host process.

I have a timer kernel that runs an exact number of iterations to generate events at a deterministic time interval for use on the FPGA. Currently, all it does is signal an event back to the host, and the host prints the time it took to run.

The whole setup works as expected, except that when I introduce a global memory access inside the persistent kernel, I get random hangs and it looks like some pipe messages are getting dropped.

I am a software engineer and have no experience with HDL or FPGA-specific constructs, but I have done some research to try to understand what might be going on. My best guess is that this has something to do with stallable instructions on the FPGA when accessing memory. The compiler is generating burst-coalesced LSUs, which are described as aggregating memory operations to make memory access more efficient. In this case it looks like the pipes are dropping messages, but I don't fully understand the behavior, and I'm hoping someone with more experience can explain it to me, since I'm unable to debug the design at the FPGA simulation level.

I'm not sure whether this is a bug or whether my code is violating some assumption, but I don't see anything obviously wrong. I put the code in a Git repo to reproduce the issue: https://github.com/AustinKnutsonSprint/oneapi-timer-kernel-hang

To remove the hang, comment out line 101, which is the problematic global memory access. I have been running my tests on DevCloud with an Arria 10 FPGA.
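
To make the suspected failure mode concrete, here is a minimal host-side C++ model of it. This is only a sketch under assumptions, not the code from the repo and not SYCL: `BoundedPipe`, `try_write`, `try_read`, and `simulate` are hypothetical names, and the "stall" simply stands in for a consumer kernel waiting on a global memory access. It shows how a fixed-capacity pipe with non-blocking writes can lose messages if the producer does not check whether the write succeeded, and how retrying avoids the loss. Whether this matches what happens on the FPGA depends on whether the pipe operations involved are blocking or non-blocking.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a fixed-capacity kernel-to-kernel pipe.
class BoundedPipe {
public:
    explicit BoundedPipe(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Non-blocking write: returns false and stores nothing when the pipe is full.
    bool try_write(int value) {
        if (fifo_.size() >= capacity_) return false;
        fifo_.push_back(value);
        return true;
    }

    // Non-blocking read: returns false when the pipe is empty.
    bool try_read(int& value) {
        if (fifo_.empty()) return false;
        value = fifo_.front();
        fifo_.pop_front();
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<int> fifo_;
};

// Models a producer that tries to send one message per cycle and a consumer
// that is stalled (for example on a memory access) for the first
// `stall_cycles` cycles, then reads one message per cycle.
// If `retry` is false, a failed write is ignored and the message is lost.
// Returns the number of messages the consumer actually received.
int simulate(std::size_t capacity, int messages, int stall_cycles, bool retry) {
    BoundedPipe pipe(capacity);
    int next = 0;       // next message the producer wants to send
    int delivered = 0;  // messages read by the consumer

    for (int cycle = 0;; ++cycle) {
        if (next < messages) {
            bool ok = pipe.try_write(next);
            // Advance on success, or on failure when the result is ignored.
            if (ok || !retry) ++next;
        }
        if (cycle >= stall_cycles) {
            int value = 0;
            if (pipe.try_read(value)) {
                ++delivered;
            } else if (next >= messages) {
                break;  // pipe drained and producer finished
            }
        }
    }
    return delivered;
}
```

In this model, calling `simulate(2, 10, 5, false)` delivers fewer than 10 messages because writes that fail during the stall are silently skipped, while `simulate(2, 10, 5, true)` delivers all 10 because the producer keeps retrying the same message until the pipe has room.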