The effect of conditional branch on performance

Question

Hi all,

According to the Altera AOCL optimization guide, a forward branch (if-then-else) should not hurt performance. I am trying to measure its actual effect using a vectorAdd kernel.

The two versions of the code are:

version 1 (no branch):

__kernel void
__attribute((reqd_work_group_size(256,1,1)))
__attribute((num_simd_work_items(4)))
vectorAdd(__global const uint *x,
          __global const uint *y,
          __global uint *restrict z)
{
    // get index of the work item
    int index = get_global_id(0);

    // add the vector elements
    z[index] = x[index] + y[index];
}

version 2 (conditional branch):

__kernel void
__attribute((reqd_work_group_size(256,1,1)))
vectorAdd(__global const uint *x,
          __global const uint *y,
          __global uint *restrict z)
{
    // get the local index of the work item and the index of its work-group
    int index = get_local_id(0);
    int block_id = get_group_id(0);

    // add the vector elements: only the first 64 work-items of each
    // work-group do any work, and each one handles four elements
    if (index < 64) {
        z[4*index+256*block_id] = x[4*index+256*block_id] + y[4*index+256*block_id];
        z[4*index+1+256*block_id] = x[4*index+1+256*block_id] + y[4*index+1+256*block_id];
        z[4*index+2+256*block_id] = x[4*index+2+256*block_id] + y[4*index+2+256*block_id];
        z[4*index+3+256*block_id] = x[4*index+3+256*block_id] + y[4*index+3+256*block_id];
    }
}

I manually disable part of the work-items in each work-group by adding a condition. If a forward branch does not hurt performance, then these two versions should give similar results, right?

But experiments show that version 1 is much faster than version 2 (about 5x). Am I misunderstanding what the optimization guide means by "forwarding branch will not impact the performance negatively"? Or is something else in the code causing the difference?

Thanks.

altera_forum · Answer

What happens if you also specify num_simd_work_items for version 2?

    __attribute((num_simd_work_items(4)))

altera_forum · Answer

Also, have you changed the NDRange in your host code?
