Forum Discussion

GSing13 · New Contributor
7 years ago

Waiting for IOs to complete

Hello everyone,

I am implementing OpenCL-based kernels on an Intel Stratix 10 FPGA for a high-performance application.

I would like to know the best way to guarantee that the current data write from a kernel to global memory has completed before the next iteration of the kernel executes.

I first thought of waiting in the kernel for a fixed number of cycles, but I don't see any defined way of achieving this in OpenCL.

I hope someone will be able to guide me and suggest a way to achieve this.

Regards,

Gaurav

7 Replies

  • HRZ · Frequent Contributor

    If I understand your question correctly, this is not at all required. Kernel execution only finishes after all data is written to device memory; this is required by the OpenCL standard. Needless to say, kernel enqueue functions are non-blocking; hence, you need to use clFinish() or clWaitForEvents() to determine when the kernel execution has actually completed. If you enqueue two or more kernels back to back in the same queue, it is again guaranteed that each kernel starts only after the previous one finishes. Please note that the OpenCL standard does not guarantee global memory consistency during kernel execution.
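
    A minimal host-side sketch of the pattern described above (the queue, kernel, and size names are placeholders, not from this thread). This is an illustrative fragment, not a complete program:

    ```c
    /* Sketch: two dependent kernels enqueued in the same in-order queue.
     * queue, kernelA, kernelB, and gsize are hypothetical names. */
    cl_event done;
    clEnqueueNDRangeKernel(queue, kernelA, 1, NULL, &gsize, NULL, 0, NULL, NULL);
    /* In an in-order queue, kernelB starts only after kernelA has finished,
     * at which point all of kernelA's global-memory writes are committed. */
    clEnqueueNDRangeKernel(queue, kernelB, 1, NULL, &gsize, NULL, 0, NULL, &done);
    /* The enqueue calls return immediately; block the host explicitly: */
    clWaitForEvents(1, &done);   /* or: clFinish(queue); */
    ```

    The key point is that no in-kernel waiting is needed for this case: ordering between whole kernel executions comes for free from an in-order queue, and the host only needs clFinish()/clWaitForEvents() to know when the results are ready.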

    • GSing13 · New Contributor

      Thanks for the response.

      I do, however, require synchronization of device memory during kernel execution, between kernels that execute simultaneously.

      Can you suggest a way to achieve this?

      • HRZ · Frequent Contributor

        If you have multiple kernels running in parallel in different queues and updating the same global buffer, the output will always be undefined because, as I said, the OpenCL standard ensures global memory consistency only after kernel execution has finished.

        I once tried something like what you want by using channels between two kernels running in parallel, sending messages from one to the other to synchronize them. That didn't work: channel operations and memory operations have different latencies, so there is no guarantee that by the time the message reaches the second kernel, the memory operation in the first kernel has finished. Intel also provides a global memory barrier that should supposedly help in such cases, but it didn't seem to make any difference in my case.

        You can try using channels in conjunction with the global memory barrier to see if it works for you, but note that if it doesn't, that is completely normal, since such functionality is not expected to be supported by the OpenCL standard. Needless to say, there will always be alternative designs that do not require sharing global memory buffers.
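
        For reference, a sketch of the channel-plus-fence attempt described above, in OpenCL C with Intel's channels extension (kernel, channel, and buffer names are hypothetical). As explained, even with the fence this ordering is NOT guaranteed, which is exactly why the scheme failed:

        ```c
        #pragma OPENCL EXTENSION cl_intel_channels : enable
        channel int sync_ch;

        __kernel void producer(__global volatile int *restrict buf) {
            buf[0] = 42;                       /* global-memory write */
            mem_fence(CLK_GLOBAL_MEM_FENCE);   /* attempt to order write before token */
            write_channel_intel(sync_ch, 1);   /* signal the consumer kernel */
        }

        __kernel void consumer(__global volatile int *restrict buf,
                               __global int *restrict out) {
            int token = read_channel_intel(sync_ch);
            /* The token can arrive before buf[0] is actually committed to
             * global memory, because channel and memory operations have
             * different latencies; hence the undefined result. */
            out[0] = buf[0] + token;
        }
        ```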

  • HRZ · Frequent Contributor

    Volatile is certainly required: it disables the private cache for global memory accesses and forces all accesses to actually go to global memory, making sure all updates by each kernel are propagated to the others.

    I am not sure what you mean by "write-ack LSU". I think Intel's OpenCL compiler also supports atomic memory operations, which might solve your problem; however, performance will be very poor because atomics essentially serialize memory accesses and stall the pipeline until each memory operation has finished (maybe this is what you call a write-ack LSU?).
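
    A small OpenCL C sketch combining the two points above, volatile and atomics (the kernel and buffer names are hypothetical). This is device code and only a fragment, not a complete host program:

    ```c
    __kernel void updater(__global volatile int *restrict counter) {
        /* volatile forces each access to bypass the private cache and go
         * to global memory, so other kernels can observe the update. */
        /* atomic_add stalls the pipeline until the read-modify-write
         * completes, serializing accesses and hurting throughput. */
        atomic_add(&counter[0], 1);
    }
    ```

    The built-in atomic_add takes a pointer to volatile __global int, so the two mechanisms compose naturally; the cost is that every access becomes a full round trip to memory.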

      • HRZ · Frequent Contributor

        I see; that was added in newer versions of the compiler and didn't exist in the older ones. However, it seems to be something the compiler decides on based on the characteristics of the memory accesses, rather than something the programmer can explicitly control. Furthermore, the compiler will never analyze global memory access dependencies between two separate kernels, and hence such an LSU will never be created by the compiler for your case. Based on the example in the guide, this LSU is created for cases where a write-after-write dependency exists in the code; needless to say, such a dependency is a false dependency, and any sane compiler will optimize out the first write and keep only the second one. I fail to see why Intel even needed to add support for this LSU type...

        What you are looking for is most likely the atomic memory read/write I mentioned earlier.