Forum Discussion
Altera_Forum
Honored Contributor
7 years ago
--- Quote Start ---
You can pipeline the loop like this:
__kernel void order(__global unsigned* restrict input,
                    __global unsigned* restrict output, int N) {
    unsigned sum = 0;
    for (unsigned i = 0; i < N; i++) {
        for (unsigned j = 0; j < N; j++)
            if (j < i) sum += input;
    }
    output = sum;
}
However, because both loops now run N times, this code could, depending on N, actually be slower than the original because of the redundant computation. For such unpipelineable loops, NDRange kernels are actually preferred.
--- Quote End ---
Thanks very much. My code is more complex, so it is hard to give every inner loop the same number of iterations... Yes, it is actually preferred to use NDRange kernels...
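For anyone reading along, here is a rough sketch in plain C (not OpenCL) of the trade-off described in the quote: a triangular loop whose inner trip count depends on i, next to the padded form whose inner loop always runs n times and masks the extra iterations with `if (j < i)`. The names `loop_result`, `triangular_sum` and `padded_sum` are made up for illustration. The array indices in the quoted kernel appear to have been lost in the post, so `input[j]` here is an assumption, not necessarily what the original code used.

```c
/* Result plus an inner-iteration counter, used only to compare the two forms.
   (Hypothetical helper type, not part of any OpenCL API.) */
typedef struct {
    unsigned sum;
    unsigned long inner_iterations;
} loop_result;

/* Triangular form: the inner loop bound depends on i. */
loop_result triangular_sum(const unsigned *input, unsigned n) {
    loop_result r = {0u, 0ul};
    for (unsigned i = 0; i < n; i++) {
        for (unsigned j = 0; j < i; j++) {
            r.sum += input[j];      /* assumed index: input[j] */
            r.inner_iterations++;
        }
    }
    return r;
}

/* Padded form: the inner loop always runs n times; the predicate
   discards the iterations the triangular form would not execute. */
loop_result padded_sum(const unsigned *input, unsigned n) {
    loop_result r = {0u, 0ul};
    for (unsigned i = 0; i < n; i++) {
        for (unsigned j = 0; j < n; j++) {
            if (j < i) {
                r.sum += input[j];  /* assumed index: input[j] */
            }
            r.inner_iterations++;   /* counted even when masked out */
        }
    }
    return r;
}
```

Both functions return the same sum, but the padded one does n * n inner iterations against n * (n - 1) / 2 for the triangular one. With n = 4 that is 16 against 6, which is the redundant computation the quote warns about.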