Altera_Forum
Honored Contributor
11 years ago
Kernel load-store vectorization
Hello all,
I am optimizing an OpenCL kernel that performs a very simple task. The core of the kernel is about 10 lines long and contains no complex branching or the like. When I add the "num_simd_work_items" attribute, aoc outputs the following warning:
"Compiler Warning: Vectorized kernel contains loads/stores that cannot be vectorized. This might reduce performance."
The result is higher resource usage and lower throughput, limited by global memory. I'm trying to understand what prevents the compiler from vectorizing the loads/stores.
- My code looks like this:
#ifdef ALTERA_CL
__attribute__((num_simd_work_items(4)))
__attribute__((reqd_work_group_size(256,1,1)))
#endif
__kernel void melate( __global const uchar * restrict input, __global uchar * restrict output) {
unsigned int gx = get_global_id(0);
// Load to private mem
__private float tempIn = (float) input[gx];
__private uchar tempOut;
// Do stuff
tempOut = f(tempIn);
// Store to global mem
output[gx] = tempOut;
}
I also tried to vectorize by hand (see the code below), but while the application works on the CPU, invoking aoc produces a wall of errors from LLVM.
- Manually vectorized code, works on CPU:
#ifdef ALTERA_CL
__attribute__((reqd_work_group_size(256,1,1)))
#endif
__kernel void melate( __global const uchar * restrict input, __global uchar * restrict output) {
unsigned int gx = get_global_id(0)*4;
// Private mem
__private float4 tempIn = convert_float4( vload4 (0, input+gx));
__private uchar4 tempOut;
// Do stuff
tempOut = f(tempIn);
// Store to global mem
vstore4(tempOut, 0, output+gx);
}
- The part I suspect is causing the LLVM errors:
__private float4 tempVar;
__private float4 ot = (float4)(255, 255, 255, 255);
__private float4 ut = (float4)(0, 0, 0, 0);
tempVar = (tempIn > TRS) ? ot : ut;
Any help with these two problems will be greatly appreciated. :)
koper
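A host-side C sketch of what the suspect threshold snippet is meant to compute, lane by lane. This is not OpenCL code and says nothing about what aoc generates. It assumes TRS is a scalar float threshold, and the struct types and helper names (vec_float4, vec_int4, greater_than, select_lanes, threshold) are hypothetical stand-ins for OpenCL's float4/int4 and built-ins. It models two OpenCL C rules. First, a vector comparison such as tempIn > TRS yields -1 (all bits set) in each true lane and 0 otherwise. Second, the built-in select(a, b, c) takes b in lanes where the most significant bit of c is set, and a elsewhere. So select(ut, ot, tempIn > TRS) is a branch-free spelling of the vector ternary and may be worth trying if aoc rejects the ?: form. Also note that OpenCL C vector literals need the cast, as in (float4)(255, 255, 255, 255). Without the cast, the parenthesized list is a plain comma expression.

```c
#include <stdint.h>

/* Hypothetical stand-ins for OpenCL's float4 / int4 (not OpenCL types). */
typedef struct { float s[4]; } vec_float4;
typedef struct { int32_t s[4]; } vec_int4;

/* Models the OpenCL C vector comparison v > t: each lane becomes
   -1 (all bits set) when true and 0 when false. */
static vec_int4 greater_than(vec_float4 v, float t) {
    vec_int4 r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = (v.s[i] > t) ? -1 : 0;
    return r;
}

/* Models OpenCL select(a, b, c) with a vector c: each lane takes b when
   the most significant bit of c is set, and a otherwise. */
static vec_float4 select_lanes(vec_float4 a, vec_float4 b, vec_int4 c) {
    vec_float4 r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = ((uint32_t)c.s[i] & 0x80000000u) ? b.s[i] : a.s[i];
    return r;
}

/* Host model of: tempVar = select(ut, ot, tempIn > TRS);
   trs plays the role of the (assumed scalar float) TRS threshold. */
vec_float4 threshold(vec_float4 temp_in, float trs) {
    const vec_float4 ot = {{255.0f, 255.0f, 255.0f, 255.0f}};
    const vec_float4 ut = {{0.0f, 0.0f, 0.0f, 0.0f}};
    return select_lanes(ut, ot, greater_than(temp_in, trs));
}
```

For example, with trs = 128.0f the input lanes {10, 200, 128, 129} map to {0, 255, 0, 255}; lane 2 stays 0 because the comparison is strict.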