Altera_Forum
Honored Contributor
8 years ago
Having one write per branch will most likely give you even lower performance, because it adds even more contention on the memory bus. Remember that with the kernel running at the same operating frequency as the memory controller (266 MHz in the case of 2133 MHz memory), the FPGA external memory bandwidth is already saturated by two 512-bit accesses (read or write) per clock. If you have two 1024-bit accesses just for writing, the writes alone ask for twice what the memory can deliver per clock; add some reads on top and you are going to get a huge amount of contention on the memory bus.
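To put rough numbers on that, here is a small back-of-the-envelope sketch in Python. It only uses the figures quoted above (266 MHz, two 512-bit accesses per clock at saturation); the function names are made up for illustration, and it assumes every access is issued on every clock cycle, which a real kernel may not do.

```python
# Rough arithmetic only; not a model of the actual memory controller.
MEM_CLOCK_MHZ = 266        # controller clock for 2133 MHz memory (from the post)
SATURATING_ACCESSES = 2    # accesses per clock that saturate the memory (from the post)
ACCESS_WIDTH_BITS = 512    # width of each of those accesses (from the post)


def peak_bits_per_clock():
    """Bits the external memory can move per kernel clock at saturation."""
    return SATURATING_ACCESSES * ACCESS_WIDTH_BITS


def peak_bandwidth_gb_s(clock_mhz=MEM_CLOCK_MHZ):
    """Peak bandwidth in GB/s (10^9 bytes per second)."""
    return peak_bits_per_clock() / 8 * clock_mhz * 1e6 / 1e9


def demand_ratio(access_widths_bits):
    """Requested bits per clock divided by what the memory can deliver.

    Assumes (hypothetically) that every listed access fires on every clock.
    """
    return sum(access_widths_bits) / peak_bits_per_clock()


writes_only = demand_ratio([1024, 1024])
print(f"peak: {peak_bandwidth_gb_s():.1f} GB/s")
print(f"two 1024-bit writes per clock: {writes_only:.1f}x the peak")
```

Any ratio above 1.0 means the accesses have to stall and take turns on the bus, and any reads you add to the list push the ratio higher still.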
In NDRange kernels, the order in which threads execute is not guaranteed, but the operations within each thread are guaranteed to execute in order.
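If it helps to picture that guarantee, here is a toy Python model. It is not OpenCL code, and the random scheduler is purely hypothetical (real hardware interleaves threads in its own way); it only illustrates that the interleaving between threads can change from run to run while each thread still sees its own operations in program order.

```python
import random


def run_ndrange(num_threads, ops_per_thread, seed):
    """Toy scheduler: repeatedly pick any unfinished thread and run its next op."""
    rng = random.Random(seed)
    next_op = [0] * num_threads        # per-thread program counter
    pending = list(range(num_threads)) # threads that still have work left
    trace = []                         # global order of (thread, op) events
    while pending:
        t = rng.choice(pending)        # which thread runs next: arbitrary
        trace.append((t, next_op[t]))  # which op it runs: always its next one
        next_op[t] += 1
        if next_op[t] == ops_per_thread:
            pending.remove(t)
    return trace


def per_thread_ops(trace, thread):
    """Operations of one thread, in the order they appear in the global trace."""
    return [op for t, op in trace if t == thread]


trace_a = run_ndrange(num_threads=4, ops_per_thread=3, seed=1)
trace_b = run_ndrange(num_threads=4, ops_per_thread=3, seed=2)

for thread in range(4):
    # Whatever the interleaving, each thread's own ops stay in order.
    print(thread, per_thread_ops(trace_a, thread), per_thread_ops(trace_b, thread))
```

So code that depends only on a thread's own earlier operations is safe, while anything that depends on another thread having already run needs explicit synchronization.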