Forum Discussion

PRavi7
New Contributor
6 years ago

Concurrent execution of two loops in OpenCL

Hi,

I'm implementing an OpenCL kernel that has two loops, Loop_a and Loop_b. Their operations are completely independent of each other and can therefore be executed concurrently. I have merged them into a single loop as follows:

int loop_limit = max(loop_a_num_iterations, loop_b_num_iterations);
for(int i = 0; i < loop_limit; i++)
{
    if(i < loop_a_num_iterations)
    {
        // loop_a operation
    }
    if(i < loop_b_num_iterations)
    {
        // loop_b operation
    }
}

Both if statements are executed concurrently. The loop_a operation has a higher latency than the loop_b operation, but loop_a performs fewer iterations than loop_b. For the first loop_a_num_iterations iterations, both operations execute together at the higher latency of loop_a. The remaining iterations then execute only the loop_b operation.
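
To put rough numbers on the behaviour described above, here is a minimal sketch in plain C (not OpenCL). It assumes a deliberately simple, non-pipelined cost model in which every merged iteration costs as much as its slowest active body. The function names and the per-iteration latencies are hypothetical and only serve the illustration; they are not taken from the actual kernel.

```c
/* Simple cost model, ASSUMING no loop pipelining: an iteration of the
   merged loop takes as long as the slowest body that is active in it.
   All names and latency units here are hypothetical. */

/* Cost of the merged loop: shared iterations run at the slower latency,
   the leftover iterations run at the latency of the longer loop's body. */
long merged_cost(long n_a, long lat_a, long n_b, long lat_b)
{
    long shared = n_a < n_b ? n_a : n_b;          /* iterations running both bodies */
    long slow   = lat_a > lat_b ? lat_a : lat_b;  /* latency of a shared iteration  */
    long tail   = n_a > n_b ? (n_a - n_b) * lat_a /* only loop_a is left            */
                            : (n_b - n_a) * lat_b; /* only loop_b is left (or none) */
    return shared * slow + tail;
}

/* Cost if the two loops could run fully independently, side by side. */
long ideal_overlap_cost(long n_a, long lat_a, long n_b, long lat_b)
{
    long a = n_a * lat_a;
    long b = n_b * lat_b;
    return a > b ? a : b;
}
```

With 10 iterations of loop_a at 8 units each and 100 iterations of loop_b at 2 units each, `merged_cost(10, 8, 100, 2)` gives 260, while `ideal_overlap_cost(10, 8, 100, 2)` gives 200. The gap comes from loop_b being held to loop_a's pace during the shared iterations.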

Is there a better way to overlap the execution of two loops?

Thanks in advance

2 Replies

  • HRZ
    Frequent Contributor

    The best way to overlap the execution of two different blocks of code in single work-item kernels is to put them into two different kernels, create two queues on the host, and enqueue the kernels concurrently. That said, the compiler is expected to implement two independent blocks of code within the same kernel in parallel anyway. May I ask why you care about the "latency" of the operations? As long as you have a fully pipelined loop with an initiation interval of 1 and your input size (loop trip count) is large enough, the latency of the loop will have a negligible effect on performance/run time.
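
A minimal sketch of the two-kernel split suggested above might look like the following. The kernel names, arguments, and loop bodies are hypothetical placeholders, since the original loop_a and loop_b operations were not posted. The small macro block at the top is only there so the same source can also be compiled as plain C for a quick functional check; an OpenCL compiler defines `__OPENCL_VERSION__` and skips it.

```c
/* Allow a plain C compiler to build this file for host-side checking.
   An OpenCL C compiler defines __OPENCL_VERSION__ and keeps the qualifiers. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Single work-item kernel holding only loop_a.
   The body (squaring) is a placeholder for the real loop_a operation. */
__kernel void loop_a_kernel(__global const int *in, __global int *out,
                            int loop_a_num_iterations)
{
    for (int i = 0; i < loop_a_num_iterations; i++) {
        out[i] = in[i] * in[i];
    }
}

/* Single work-item kernel holding only loop_b.
   The body (increment) is a placeholder for the real loop_b operation. */
__kernel void loop_b_kernel(__global const int *in, __global int *out,
                            int loop_b_num_iterations)
{
    for (int i = 0; i < loop_b_num_iterations; i++) {
        out[i] = in[i] + 1;
    }
}
```

On the host, each kernel would then go on its own command queue: for example, two queues created with `clCreateCommandQueue`, one `clEnqueueTask` (or `clEnqueueNDRangeKernel` with a single work-item) per queue, followed by `clFinish` on both. With separate queues the runtime is free to run the two kernels at the same time, and each loop keeps its own trip count and pipeline instead of sharing one merged loop.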