Forum Discussion

Altera_Forum
12 years ago

Kernel Vectorization query

Hi,

I am trying to apply the kernel vectorization optimization, but I get the following compiler warning:

Compiler Warning: Kernel is vectorized but there exist loads/stores that cannot be vectorized. This may reduce performance.

The following are the details:

Global work size: 240 x 540

Local work-group size: 240 x 1

Input dimensions: 1920 x 1080

I used the following attributes:

__attribute__((num_simd_work_items(4)))

__attribute__((reqd_work_group_size(240,1,1)))

Input loading code snippet:

for (UInt32 i = 0; i < 8; i++)
{
    tempin[lidx + i * 240] = input[lidx + i * 240];
}

where

lidx is the local work-item ID in the x direction, with a maximum value of 239 (since the local work-group size is 240 x 1), and

tempin is a local memory buffer used for per-work-group computation.
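The access pattern described above can be checked on the host with a small plain C model (not kernel code; all helper names are hypothetical). It assumes tempin holds at least 8 x 240 = 1920 elements, which matches the input width, and that a SIMD width of 4 means four adjacent work-items are processed together. Under those assumptions, each work-group copies every element of a 1920-element row exactly once, and four adjacent work-items touch four consecutive elements in any given iteration:

```c
#include <stdbool.h>
#include <stddef.h>

enum {
    GROUP_SIZE   = 240,                      /* reqd_work_group_size in x      */
    ITERATIONS   = 8,                        /* trip count of the posted loop  */
    SIMD_WIDTH   = 4,                        /* num_simd_work_items            */
    ROW_ELEMENTS = GROUP_SIZE * ITERATIONS   /* 1920, the input width          */
};

/* Index used by work-item lidx in iteration i, as in the posted loop. */
static size_t element_index(size_t lidx, size_t i)
{
    return lidx + i * GROUP_SIZE;
}

/* True if one work-group copies each of the ROW_ELEMENTS elements exactly once. */
static bool covers_row_exactly_once(void)
{
    unsigned char hits[ROW_ELEMENTS] = {0};

    for (size_t lidx = 0; lidx < GROUP_SIZE; lidx++)
        for (size_t i = 0; i < ITERATIONS; i++)
            hits[element_index(lidx, i)]++;

    for (size_t k = 0; k < ROW_ELEMENTS; k++)
        if (hits[k] != 1)
            return false;
    return true;
}

/* True if, for every iteration, each group of SIMD_WIDTH adjacent
   work-items accesses SIMD_WIDTH consecutive elements. */
static bool simd_lanes_are_consecutive(void)
{
    for (size_t i = 0; i < ITERATIONS; i++)
        for (size_t first = 0; first < GROUP_SIZE; first += SIMD_WIDTH)
            for (size_t lane = 1; lane < SIMD_WIDTH; lane++)
                if (element_index(first + lane, i) !=
                    element_index(first, i) + lane)
                    return false;
    return true;
}
```

Both checks hold for these sizes, so the pattern itself is regular; the model says nothing about what the offline compiler is able to prove about it.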

Can anyone suggest a way to avoid this warning?

Let me know if I need to provide any more details.

Thanks

Neelakandan

3 Replies

  • Altera_Forum

    reqd_work_group_size should probably be set to a power of 2; 240 is an unusual value.

  • Altera_Forum

    Hi

    Even after specifying the required work-group size as a power of 2 (256 instead of 240), I get the same warning message.

    Could there be another reason?

    Thanks
  • Altera_Forum

    The cause is the index expression "lidx + i * 240": AOC cannot analyze it effectively, so those loads and stores are not vectorized, which leads to suboptimal performance.

    You could try adding "#pragma unroll" before the for loop.
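A minimal sketch of the unroll suggestion, assuming the load loop sits at the top of the kernel. The kernel name load_tile, the argument list, the unsigned int element type, and the KERNEL_QUALIFIERS / GLOBAL_MEM / LOCAL_MEM macros are hypothetical, since the original post does not show the full kernel. The host-side stand-ins exist only so the loop can be checked with an ordinary C compiler. Whether the warning actually goes away may depend on the compiler version, so treat this as something to try, not a confirmed fix:

```c
#ifndef __OPENCL_VERSION__
/* Host-only stand-ins (hypothetical): they let an ordinary C compiler build
   this file. An OpenCL compiler defines __OPENCL_VERSION__ and skips them. */
#include <stddef.h>
#define KERNEL_QUALIFIERS
#define GLOBAL_MEM
#define LOCAL_MEM
static size_t host_local_id;   /* emulated local ID, set by the caller */
static size_t get_local_id(unsigned int dim) { (void)dim; return host_local_id; }
#else
#define KERNEL_QUALIFIERS                                   \
    __attribute__((num_simd_work_items(4)))                 \
    __attribute__((reqd_work_group_size(240, 1, 1))) __kernel
#define GLOBAL_MEM __global
#define LOCAL_MEM  __local
#endif

KERNEL_QUALIFIERS
void load_tile(GLOBAL_MEM const unsigned int *input,  /* assumed element type   */
               LOCAL_MEM unsigned int *tempin)        /* assumed >= 1920 elements */
{
    const size_t lidx = get_local_id(0);              /* 0..239 */

    /* The trip count is the constant 8, so the loop can be fully unrolled.
       Each unrolled copy then indexes with lidx plus a constant. */
#ifdef __OPENCL_VERSION__
#pragma unroll
#endif
    for (unsigned int i = 0; i < 8; i++) {
        tempin[lidx + i * 240] = input[lidx + i * 240];
    }
}
```

On the host, setting host_local_id to each value from 0 to 239 and calling load_tile once per value copies all 1920 elements, which gives a quick sanity check of the indexing before an FPGA compile.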