Forum Discussion

Altera_Forum
8 years ago

[Kernel vectorization] Loads/Stores cannot be vectorized

Hi guys!

The compiler reports the following warning for the kernel quoted below:

Compiler Warning: Vectorized kernel contains loads/stores that cannot be vectorized. This might reduce performance.

Which loads/stores are causing this?

--- Quote Start ---

__attribute__((num_simd_work_items(16)))
__attribute__((reqd_work_group_size(32,1,1)))
__kernel void int_loop(__global const short * restrict a,
                       __constant const bool * restrict b,
                       __global int * restrict group_counters,
                       __local int * restrict local_counter,
                       const int base, const int base2)
{
    __private uint local_id = get_local_id(0);
    __private uint group_size = get_local_size(0);
    __private uint global_id = get_global_id(0);
    __private int g1 = a[base + global_id];
    __private int comparison;
    __private int wi_counter;

    comparison = (g1 == a[base2 + global_id]);
    wi_counter = comparison & b[global_id] & (g1 != 0);
    local_counter[local_id] = wi_counter;

    for(uint stride = group_size >> 1; stride > 0; stride = stride >> 1)
    {
        barrier(CLK_LOCAL_MEM_FENCE);
        if(local_id < stride)
            local_counter[local_id] += local_counter[local_id + stride];
    }

    if(local_id == 0)
        group_counters[get_group_id(0)] = local_counter[0];
}

--- Quote End ---

2 Replies

  • Altera_Forum

    The likely cause is the following part:

    if(local_id == 0)
        group_counters[get_group_id(0)] = local_counter[0];

    SIMD vectorization operates at the work-item level and coalesces accesses that are consecutive across adjacent work-items. This write is performed by only one work-item per work-group, so it cannot be vectorized or coalesced.
  • Altera_Forum

    Always right mate :)

    Thanks HRZ
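
The point made in the first reply can be illustrated with a small host-side model in plain C. This is only a sketch and not how the offline compiler actually decides what to vectorize: it treats an access as coalescable when all 16 SIMD lanes are active and touch consecutive element indices. The type `lane_access` and the functions `is_coalescable`, `load_a_is_coalescable`, and `store_group_counter_is_coalescable` are hypothetical names introduced for illustration, and the model assumes a one-dimensional range with no global offset, so that `local_id` equals `global_id % 32` and the group ID equals `global_id / 32`.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIMD_WIDTH 16  /* mirrors num_simd_work_items(16) in the kernel */
#define GROUP_SIZE 32  /* mirrors reqd_work_group_size(32,1,1) */

/* One SIMD lane's view of a memory access (hypothetical helper type). */
typedef struct {
    bool   active;  /* does this lane perform the access at all? */
    size_t index;   /* element index the lane would touch */
} lane_access;

/* Simplified criterion (an assumption, not the compiler's real rule):
 * every lane is active and lane i touches index[0] + i. */
static bool is_coalescable(const lane_access lanes[SIMD_WIDTH])
{
    for (int i = 0; i < SIMD_WIDTH; ++i) {
        if (!lanes[i].active)
            return false;
        if (lanes[i].index != lanes[0].index + (size_t)i)
            return false;
    }
    return true;
}

/* Models a[base + global_id] for SIMD_WIDTH adjacent work-items. */
static bool load_a_is_coalescable(size_t base, size_t first_global_id)
{
    lane_access lanes[SIMD_WIDTH];
    for (int i = 0; i < SIMD_WIDTH; ++i) {
        lanes[i].active = true;  /* unconditional load in the kernel */
        lanes[i].index  = base + first_global_id + (size_t)i;
    }
    return is_coalescable(lanes);
}

/* Models: if (local_id == 0) group_counters[get_group_id(0)] = ...; */
static bool store_group_counter_is_coalescable(size_t first_global_id)
{
    lane_access lanes[SIMD_WIDTH];
    for (int i = 0; i < SIMD_WIDTH; ++i) {
        size_t gid = first_global_id + (size_t)i;
        lanes[i].active = (gid % GROUP_SIZE) == 0;  /* local_id == 0 */
        lanes[i].index  = gid / GROUP_SIZE;         /* get_group_id(0) */
    }
    return is_coalescable(lanes);
}
```

Under this model, `load_a_is_coalescable(0, 0)` returns true because the 16 lanes read 16 consecutive elements, while `store_group_counter_is_coalescable(0)` returns false because only lane 0 is active. That matches the reply: the guarded store is the access the warning refers to.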