Forum Discussion

Altera_Forum
10 years ago

Kernel Vectorization

Hi,

==========================================================

void tempA( ...) {...};
void tempB( ...) {...};

void processing(global int *a) {
    if (*a == 0)    // compare the pointed-to value, not the pointer itself
        tempA( a );
    else
        tempB( a );
}

__attribute__((num_simd_work_items(2)))
__attribute__((reqd_work_group_size(256,1,1)))
kernel void test(__global int *a)  // NDRange, global size = a / 2, initially a[0..N] = 1
{
    int gid = get_global_id(0);

    for (int i = 0; i < 2; i++) {
        while (a[gid + i] == 0)
            processing(&a[gid + i]);
    }
}

===========================================================

The code above is what I was trying.

The compiler reported: "Compiler Warning: Kernel Vectorization: branching is thread ID dependent ... cannot vectorize."

How can I fix this, or what explains it?

Also, a while loop with an unpredictable exit condition is unfriendly to vectorization and very inefficient, right?

Thanks.

3 Replies

  • Altera_Forum

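    It means that one of your branches depends on the thread ID. The following section

    while (a[gid + i] == 0)
      processing(&a[gid + i]);

    is thread-ID dependent, because its exit condition is read from an element selected by the work-item ID. The Best Practices Guide says to avoid work-item-ID-dependent backward branching.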
  • Altera_Forum

    Thanks okebz,

    So if I write it as follows, is it the same thing?

    ===========================================

    void tempA( ...) {...};
    void tempB( ...) {...};

    void processing(global int *a, int *b) {
        if (*a == 0)    // compare the pointed-to value, not the pointer itself
            tempA( a, b );
        else
            tempB( a, b );
    }

    __attribute__((num_simd_work_items(2)))
    __attribute__((reqd_work_group_size(256,1,1)))
    kernel void test(__global int *a)  // NDRange
    {
        int gid = get_global_id(0);
        int b = 0;  // initialized so the first loop test is well defined

        while (b == 0)
            processing(&a[gid], &b);
    }

    =================================

    But if my program flow is as described earlier, how can I optimize this code?

    Each work-item stays in the while loop until its condition is met.

    Is it better to use a task (single work-item kernel) instead of an NDRange kernel?

    Regards,
  • Altera_Forum

    It is the same thing unless b does not depend on the work-item ID. And yes, depending on what you're trying to do, a single task seems like the better fit. If your problem data set cannot be divided into independent sections and depends on other work-items, then a single work-item kernel might be a good choice.
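
To make the two replies above concrete, here is a minimal sketch in plain C (not OpenCL C, so it can be compiled and run on a host). It assumes a hypothetical `processing()` that does one unit of work per call and raises a flag when the element is finished; `work_item_trip_count` and `run_as_single_task` are illustrative names, not part of any SDK. The first function shows that the loop exit depends on the element a work-item owns, so neighbouring work-items can need different iteration counts. The second shows the single work-item shape suggested above, where one thread of control walks all elements in turn.

```c
#include <stddef.h>

/* Hypothetical stand-in for processing(): *elem holds the remaining units of
 * work. Each call does one unit and sets *done once nothing is left. */
static void processing(int *elem, int *done)
{
    *elem -= 1;
    if (*elem <= 0)
        *done = 1;
}

/* Number of iterations one work-item would spend in its while loop.
 * The exit condition is driven by a[gid], so it depends on the work-item ID. */
static int work_item_trip_count(int *a, size_t gid)
{
    int done = 0;
    int trips = 0;

    while (done == 0) {
        processing(&a[gid], &done);
        ++trips;
    }
    return trips;
}

/* Single work-item ("task") shape: one loop visits every element, so there is
 * no second work-item that has to stay in step with this one. Returns the
 * total number of inner-loop iterations executed. */
static int run_as_single_task(int *a, size_t n)
{
    int total = 0;

    for (size_t gid = 0; gid < n; ++gid)
        total += work_item_trip_count(a, gid);
    return total;
}
```

With the input `{3, 1}`, the two neighbouring elements need 3 and 1 iterations respectively, which is one way to read the "branching is thread ID dependent" warning: two work-items packed together by `num_simd_work_items(2)` would not take the backward branch the same number of times. Whether a single work-item kernel is actually faster on a given device depends on the real `tempA`/`tempB` work, which the thread does not show.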