What causes OpenCL to insert arbitration for local memory accesses?
I know FPGA OpenCL is deprecated in favor of oneAPI, but I hope you can help me anyway. I've created an MWE of a kernel for which the compiler inserts arbitration:
    __attribute__((uses_global_work_offset(0)))
    __attribute__((max_global_work_dim(0)))
    __kernel void
    kmain(uint n_tics, __global const volatile uint * restrict dsts) {
        float frontier[100];
        #pragma disable_loop_pipelining
        for (uint i = 0; i < 100; i++) {
            frontier[i] = 0;
        }
        uint nqueue[100];
        uint nqueue_n = 20;
        for (uint t = 0; t < n_tics; t++) {
            for (uint i = 0; i < 100; i++) {
                float tmp = frontier[i];
                frontier[i] = 0;
            }
            for (uint j = 0; j < nqueue_n; j++) {
                uint src = nqueue[j];
                frontier[dsts[src]] += 50;
            }
        }
    }
First I reset all elements of frontier. Then the simulation loop starts: in each tick I read every element of frontier and clear it, and then I add 50 to the elements at the indexes given by dsts[nqueue[j]]. I know the kernel reads from uninitialized memory (nqueue is never written), but that is beside the point (I think). In the report, aoc complains about a "Potentially inefficient configuration", and I can see that it has inserted arbitration circuits (see screenshot).
So the question is why? And how can I fix this memory access pattern to be arbitration-free?
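For reference, the access pattern described above can be sketched as a plain C model that runs on the host. This is only an illustration under stated assumptions, not the kernel itself: kmain_model, FRONTIER_SIZE and the extra parameters are hypothetical names, and nqueue is passed in already initialized so that the model has defined behavior. The comments number the distinct load and store sites that touch frontier, since those are the accesses the report is about. The sketch does not by itself explain why arbitration is inserted.

```c
#include <assert.h>
#include <stdint.h>

#define FRONTIER_SIZE 100 /* same size as frontier[] in the kernel */

/* Hypothetical host-side model of kmain. Unlike the kernel, nqueue and
 * nqueue_n are parameters (assumption) so that no uninitialized memory
 * is read. The result is left in the caller-provided frontier array. */
static void kmain_model(uint32_t n_tics,
                        const uint32_t *dsts,
                        const uint32_t *nqueue,
                        uint32_t nqueue_n,
                        float frontier[FRONTIER_SIZE])
{
    /* Site 1: store, sequential index. */
    for (uint32_t i = 0; i < FRONTIER_SIZE; i++) {
        frontier[i] = 0.0f;
    }

    for (uint32_t t = 0; t < n_tics; t++) {
        /* Sites 2 and 3: load, then store, sequential index. */
        for (uint32_t i = 0; i < FRONTIER_SIZE; i++) {
            float tmp = frontier[i];
            (void)tmp; /* value is unused, as in the MWE */
            frontier[i] = 0.0f;
        }

        /* Sites 4 and 5: load, then store, at a data-dependent index. */
        for (uint32_t j = 0; j < nqueue_n; j++) {
            uint32_t src = nqueue[j];
            uint32_t dst = dsts[src];
            assert(dst < FRONTIER_SIZE); /* guard against bad indexes */
            frontier[dst] += 50.0f;
        }
    }
}
```

So the source contains five separate access sites on the same array, two of which use an index that is only known at run time.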
Hi Björne2,
You may want to refer to this video on OpenCL optimization:
https://www.youtube.com/watch?v=1zGpN28mXN4
I will check with the team to see whether there is any further information that may help.
Thanks.
Regards,
Aik Eu