[QUOTE=HRZ;239754]
This should also be somewhere in their OpenCL documentation but I didn't find it.
--- Quote End ---
Not much is said about conditional execution, but there is some information in the Best Practices Guide, p. 162, Chapter 9, "Strategies for Optimizing Intel Stratix® 10 OpenCL Designs". Do you think Kernel 2, with logic replication, would give a different latency?
__kernel
__attribute__((task))
void dummy_kernel(
    uchar switch_loop,
    __global float *restrict bottom,
    __local float *restrict top,
    __global float *restrict final)
{
    float privatee;

    if (switch_loop == 0) {
        // Branch 1: copy from global memory.
        for (unsigned i = 0; i < 20; i++) {
            privatee = bottom[i];
            final[i] = privatee;
        }
    } else {
        // Branch 2: same loop, replicated, reading from local memory.
        for (unsigned i = 0; i < 20; i++) {
            privatee = top[i];
            final[i] = privatee;
        }
    }
}