Forum Discussion
Altera_Forum
Honored Contributor
7 years ago
GPUs have complex, efficient memory controllers and rely mostly on run-time coalescing of consecutive accesses made by the threads in a warp. On FPGAs there is little, if any, support for run-time coalescing, so accesses must be coalesced at compile time instead. You can achieve this by unrolling the memory access loop in single work-item kernels, or by using SIMD in NDRange kernels. If you check the system viewer section of the area report, you will see that loop unrolling/SIMD increases the width of the ports going from the kernel to memory.
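To make the two options concrete, here is a minimal sketch using a vector add. The kernel names, the vector-add workload, the factor of 4, and the work-group size of 64 are all made up for illustration and are not from any particular design. The shim at the top is also my own addition, so the single work-item kernel can be compiled as plain C for a quick functional check. Whether the compiler actually merges the accesses into one wide port depends on them being consecutive, so confirm it in the report.

```c
/* Hypothetical shim (not part of OpenCL): lets this file build as plain C. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Single work-item kernel: unrolling the loop by 4 exposes four consecutive
 * accesses per array to the compiler, which it can then coalesce at compile
 * time into one wider access. The factor 4 is an arbitrary example value. */
__kernel void vec_add_swi(__global const float *restrict a,
                          __global const float *restrict b,
                          __global float *restrict c,
                          const int n)
{
    #pragma unroll 4
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

#ifdef __OPENCL_VERSION__
/* NDRange kernel: num_simd_work_items(4) asks the compiler to vectorize the
 * datapath across four work-items. It needs a fixed work-group size that is
 * divisible by the SIMD factor (64 here, an example value). */
__attribute__((reqd_work_group_size(64, 1, 1)))
__attribute__((num_simd_work_items(4)))
__kernel void vec_add_ndr(__global const float *restrict a,
                          __global const float *restrict b,
                          __global float *restrict c)
{
    int i = get_global_id(0);
    c[i] = a[i] + b[i];
}
#endif
```

With either variant, the load and store ports for `a`, `b` and `c` should show up wider in the system viewer than in the non-unrolled/non-SIMD version, assuming the compiler was able to coalesce the accesses.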