Kernel Vectorization query

Question

Hi,

I am trying to apply the kernel vectorization optimization, but I get the following compiler warning:

Compiler Warning: Kernel is vectorized but there exist loads/stores that cannot be vectorized. This may reduce performance.

The following are the details:

Global work size: 240 x 540

Local work-group size: 240 x 1

Input dimensions: 1920 x 1080

I used the following attributes:

__attribute__((num_simd_work_items(4)))

__attribute__((reqd_work_group_size(240,1,1)))

Input loading code snippet:

for (UInt32 i = 0; i < 8; i++)
{
    tempin[lidx + i * 240] = input[lidx + i * 240];
}

where

lidx: the local work-item ID in the x direction, with a maximum value of 239 (since the local work-group size is 240 x 1)

tempin: a local memory buffer used for per-work-group computation
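To make the indexing concrete, here is a small host-side C sketch (not kernel code) of the arithmetic in the loop above. It assumes one work-group of 240 work-items, each doing 8 loads, so that 240 x 8 = 1920 elements are touched; reading that as one input row is an inference from the sizes given, not something stated in the post. The names load_index, covers_tile_once and simd_groups_contiguous are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes taken from the post: 240 work-items per group, 8 loads each,
   and num_simd_work_items(4). TILE = 1920 elements per work-group. */
enum { LOCAL_SIZE = 240, LOADS_PER_ITEM = 8, TILE = LOCAL_SIZE * LOADS_PER_ITEM, SIMD = 4 };

/* Same index expression as the kernel loop: lidx + i * 240. */
size_t load_index(size_t lidx, size_t i)
{
    assert(lidx < LOCAL_SIZE && i < LOADS_PER_ITEM);
    return lidx + i * LOCAL_SIZE;
}

/* Returns 1 if all work-items together touch each index in [0, TILE)
   exactly once, otherwise 0. */
int covers_tile_once(void)
{
    unsigned char hits[TILE] = {0};
    for (size_t lidx = 0; lidx < LOCAL_SIZE; lidx++) {
        for (size_t i = 0; i < LOADS_PER_ITEM; i++) {
            size_t idx = load_index(lidx, i);
            if (idx >= TILE)
                return 0;
            hits[idx]++;
        }
    }
    for (size_t k = 0; k < TILE; k++)
        if (hits[k] != 1)
            return 0;
    return 1;
}

/* Returns 1 if, on every iteration, each group of SIMD adjacent
   work-items accesses SIMD consecutive addresses, otherwise 0. */
int simd_groups_contiguous(void)
{
    for (size_t i = 0; i < LOADS_PER_ITEM; i++)
        for (size_t base = 0; base < LOCAL_SIZE; base += SIMD)
            for (size_t lane = 0; lane < SIMD; lane++)
                if (load_index(base + lane, i) != load_index(base, i) + lane)
                    return 0;
    return 1;
}
```

Both checks pass for these sizes, so the access pattern itself is contiguous across adjacent work-items; the sketch says nothing about how the compiler analyzes the expression.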

Can anyone suggest a way to avoid this warning?

Let me know if I need to provide any more details.

Thanks

Neelakandan

altera_forum · Answer

reqd_work_group_size should probably be set to a power of 2; 240 is an unusual value.

altera_forum · Answer

Hi. Even after specifying the required work-group size as a power of 2 (256 instead of 240), I get the same warning message. Could there be another reason? Thanks

altera_forum · Answer

The issue is the index expression "lidx + i * 240": AOC cannot analyze it effectively, which leads to suboptimal performance. You could try adding "#pragma unroll" before the for loop.
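As a rough illustration of what that suggestion changes, the host-side C sketch below (not kernel code) compares the posted loop with its fully unrolled equivalent. It assumes the sizes from the question (240 work-items, 8 loads each); the function names are hypothetical. In the actual OpenCL kernel the only change would be a "#pragma unroll" line directly above the for statement. Whether that removes the warning is not confirmed anywhere in this thread.

```c
#include <assert.h>

/* Sizes assumed from the question: 240 work-items, 8 loads per work-item. */
enum { LOCAL_SIZE = 240, LOADS_PER_ITEM = 8, TILE = LOCAL_SIZE * LOADS_PER_ITEM };

/* Loop form, as posted. In the kernel, "#pragma unroll" would go on the
   line directly above this for statement. */
void load_looped(unsigned *tempin, const unsigned *input, unsigned lidx)
{
    assert(lidx < LOCAL_SIZE);
    for (unsigned i = 0; i < LOADS_PER_ITEM; i++)
        tempin[lidx + i * LOCAL_SIZE] = input[lidx + i * LOCAL_SIZE];
}

/* What a full unroll amounts to: eight copies, each at a
   compile-time-constant offset from lidx. */
void load_unrolled(unsigned *tempin, const unsigned *input, unsigned lidx)
{
    assert(lidx < LOCAL_SIZE);
    tempin[lidx + 0 * LOCAL_SIZE] = input[lidx + 0 * LOCAL_SIZE];
    tempin[lidx + 1 * LOCAL_SIZE] = input[lidx + 1 * LOCAL_SIZE];
    tempin[lidx + 2 * LOCAL_SIZE] = input[lidx + 2 * LOCAL_SIZE];
    tempin[lidx + 3 * LOCAL_SIZE] = input[lidx + 3 * LOCAL_SIZE];
    tempin[lidx + 4 * LOCAL_SIZE] = input[lidx + 4 * LOCAL_SIZE];
    tempin[lidx + 5 * LOCAL_SIZE] = input[lidx + 5 * LOCAL_SIZE];
    tempin[lidx + 6 * LOCAL_SIZE] = input[lidx + 6 * LOCAL_SIZE];
    tempin[lidx + 7 * LOCAL_SIZE] = input[lidx + 7 * LOCAL_SIZE];
}

/* Runs both forms for every work-item of one group; returns 1 if both
   destination buffers end up equal to the input, otherwise 0. */
int forms_agree(void)
{
    static unsigned input[TILE], a[TILE], b[TILE];
    for (unsigned k = 0; k < TILE; k++) {
        input[k] = k * 7u + 3u; /* arbitrary test pattern */
        a[k] = 0;
        b[k] = 0;
    }
    for (unsigned lidx = 0; lidx < LOCAL_SIZE; lidx++) {
        load_looped(a, input, lidx);
        load_unrolled(b, input, lidx);
    }
    for (unsigned k = 0; k < TILE; k++)
        if (a[k] != input[k] || b[k] != input[k])
            return 0;
    return 1;
}
```

The unrolled form removes the loop variable from the address, leaving eight accesses at fixed offsets from lidx, which is presumably the property the reply expects the compiler to handle better.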
