Forum Discussion
Altera_Forum
Honored Contributor
11 years ago
--- Quote Start ---
Thank you so much for the help! I am just wondering: what if the number of iterations of the for loop cannot be determined at compilation time (either the operation is conditional or the size N changes between kernel invocations)? Am I forced to access local memory sequentially, or is there any other optimization that can be done?
--- Quote End ---
You could try manually unrolling the inner loop as shown below. The compiler will again merge the stores in the inner loop into a single wide access, and the if-statements determine which bytes within that wide access are active. Although this gives efficient memory accesses, the kernel may not be as efficient as the earlier example, where the loop bounds are known, because of the doubly nested loop.
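// Note: the array subscripts were lost in the posted text; A[row][4*col+k]
// is the indexing assumed here, with M = ceil(N/4) inner iterations.
for(row = 0; row < N; row++) {
    for(col = 0; col < M; col++) {
        // Four guarded stores to consecutive columns; each guard masks
        // out the elements that fall beyond the run-time size N.
        if(4*col+0 < N) A[row][4*col+0] = row + col;
        if(4*col+1 < N) A[row][4*col+1] = row + col;
        if(4*col+2 < N) A[row][4*col+2] = row + col;
        if(4*col+3 < N) A[row][4*col+3] = row + col;
    }
}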
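
To check that the guarded, manually unrolled form writes the same elements as a plain loop when N is only known at run time, here is a minimal host-side sketch in plain C. It is not the kernel from this thread: the names `MAX_N`, `fill_plain`, `fill_unrolled` and `variants_agree` are hypothetical, the stores are assumed to target `A[row][4*col+k]`, and the inner trip count is assumed to be M = ceil(N/4). It only demonstrates the guard logic; whether the stores are merged into one wide access is up to the OpenCL compiler and is not shown here.

```c
#include <assert.h>
#include <string.h>

#define MAX_N 16 /* hypothetical upper bound on the run-time size N */

/* Reference version: one store per iteration over an n x n region. */
static void fill_plain(int a[MAX_N][MAX_N], int n) {
    assert(n >= 0 && n <= MAX_N);
    for (int row = 0; row < n; row++) {
        for (int c = 0; c < n; c++) {
            a[row][c] = row + c / 4; /* c / 4 is the unrolled "col" index */
        }
    }
}

/* Manually unrolled by 4: each inner iteration covers four consecutive
 * columns, and the guards skip columns at or beyond n. */
static void fill_unrolled(int a[MAX_N][MAX_N], int n) {
    assert(n >= 0 && n <= MAX_N);
    int m = (n + 3) / 4; /* assumed: M = ceil(N / 4) */
    for (int row = 0; row < n; row++) {
        for (int col = 0; col < m; col++) {
            if (4 * col + 0 < n) a[row][4 * col + 0] = row + col;
            if (4 * col + 1 < n) a[row][4 * col + 1] = row + col;
            if (4 * col + 2 < n) a[row][4 * col + 2] = row + col;
            if (4 * col + 3 < n) a[row][4 * col + 3] = row + col;
        }
    }
}

/* Returns 1 when both versions produce identical arrays for size n,
 * including the untouched elements outside the n x n region. */
static int variants_agree(int n) {
    int p[MAX_N][MAX_N];
    int u[MAX_N][MAX_N];
    memset(p, -1, sizeof p); /* all bytes 0xFF, so every int reads as -1 */
    memset(u, -1, sizeof u);
    fill_plain(p, n);
    fill_unrolled(u, n);
    return memcmp(p, u, sizeof p) == 0;
}
```

Calling `variants_agree(n)` for sizes that are not a multiple of 4 (for example 5) exercises the guards on the last partial group of four columns.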