Altera_Forum
11 years ago
Thank you! I'll try that. What if I use 2D work-groups instead of for loops, with each work-item copying one item from global memory to local memory? Would the compiler automatically coalesce the memory accesses? And is using "num_simd_work_items" the only way to optimize the kernel?