Altera_Forum
Honored Contributor
10 years ago
The effect of conditional branch on performance
Hi all,
According to the Altera AOCL optimization guide, a forward branch (if-then-else) does not impact performance negatively. I am trying to see how it affects actual performance using a vectorAdd kernel. The two versions of the code are:

version 1 (no branch):

```c
__kernel void
__attribute__((reqd_work_group_size(256,1,1)))
__attribute__((num_simd_work_items(4)))
vectorAdd(__global const uint *x,
          __global const uint *y,
          __global uint *restrict z)
{
    // get index of the work item
    int index = get_global_id(0);

    // add the vector elements
    z[index] = x[index] + y[index];
}
```

version 2 (conditional branch):

```c
__kernel void
__attribute__((reqd_work_group_size(256,1,1)))
vectorAdd(__global const uint *x,
          __global const uint *y,
          __global uint *restrict z)
{
    // get index of the work item within its work-group
    int index = get_local_id(0);
    int block_id = get_group_id(0);

    // add the vector elements: work-items 0..63 each handle
    // 4 consecutive elements of this group's 256-element block
    if (index < 64) {
        z[4*index+256*block_id]   = x[4*index+256*block_id]   + y[4*index+256*block_id];
        z[4*index+1+256*block_id] = x[4*index+1+256*block_id] + y[4*index+1+256*block_id];
        z[4*index+2+256*block_id] = x[4*index+2+256*block_id] + y[4*index+2+256*block_id];
        z[4*index+3+256*block_id] = x[4*index+3+256*block_id] + y[4*index+3+256*block_id];
    }
}
```

I manually disable part of the work-items in each work-group by adding a condition: only the first 64 of the 256 work-items are active, and each of them handles four elements. If a forward branch does not hurt performance, these two versions should give similar results, right? But my experiments show that version 1 is much faster than version 2 (about 5x). Am I misunderstanding what the optimization guide means when it says that a forward branch "will not impact the performance negatively"? Or is there something else in the code that leads to the different results?

Thanks.