WWP00
New Contributor
6 years ago
How many clock cycles to transfer between global and local memory?
For example, suppose we have a local memory array:
float local[10];
And a much larger global memory array, globalMem. Would we copy from it like this:
int memStart = 50;
for (int i = 0; i < 10; ++i)
    local[i] = globalMem[memStart + i];
Or should we put #pragma unroll on this loop, so that the copy does not take one clock cycle per element? Or is there some other recommended way to move array data between local and global memory?
Does this transfer take 10 clock cycles, or fewer than that?
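
For reference, here is a minimal sketch of the unrolled variant I have in mind. The names copy_window, local_buf, global_mem, mem_start and LOCAL_SIZE are placeholders I made up for this post, not part of any vendor API. I am assuming the compiler honors #pragma unroll on a loop with a constant trip count; a compiler that does not recognize the pragma may simply ignore it. Whether unrolling actually reduces the cycle count presumably depends on the toolchain and the memory interface, which is exactly what I am asking about.

```c
#include <stddef.h>

#define LOCAL_SIZE 10  /* matches float local[10] above */

/* Copy LOCAL_SIZE consecutive floats from global_mem, starting at index
 * mem_start, into local_buf. The trip count is a compile-time constant,
 * so a compiler that honors the pragma can fully unroll the loop. */
void copy_window(float local_buf[LOCAL_SIZE],
                 const float *global_mem,
                 size_t mem_start)
{
    #pragma unroll
    for (int i = 0; i < LOCAL_SIZE; ++i) {
        local_buf[i] = global_mem[mem_start + i];
    }
}
```

With memStart = 50 as above, the call would be copy_window(local, globalMem, 50), which fills local[0..9] from globalMem[50..59].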