Knowledge Base Article
Why does the vector_add example design perform poorly when compiled with the Intel® FPGA SDK for OpenCL™?
Description
Due to a problem in the Intel® FPGA SDK for OpenCL™ version 18.1 and later, the vector_add example design may perform significantly worse than the same code compiled with earlier versions. The performance for each version is as follows.
Intel® FPGA SDK for OpenCL™ version | Performance |
V16.1 | ~3 ms |
V18.0 | ~3 ms |
V18.1 | ~170 ms |
V19.1 | ~170 ms |
Resolution
To work around this problem, add the reqd_work_group_size attribute to the kernel in vector_add.cl to set the required work-group size:
__attribute__((reqd_work_group_size(1, 1, 1)))
__kernel void vector_add(__global const float *x,
                         __global const float *y,
                         __global float *restrict z)
{
    // get index of the work item
    int index = get_global_id(0);
    // add the vector elements
    z[index] = x[index] + y[index];
}
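When a kernel declares reqd_work_group_size, the OpenCL specification requires the local_work_size passed to clEnqueueNDRangeKernel to match the declared size, so the host code may also need attention. The following is a minimal sketch in plain C of how a host program might derive a matching local size. The helper name fill_local_size and the REQD_WG_* macros are hypothetical and are not part of the SDK or of the example design's host code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: these values mirror
 * __attribute__((reqd_work_group_size(1, 1, 1))) in vector_add.cl. */
#define REQD_WG_X 1
#define REQD_WG_Y 1
#define REQD_WG_Z 1

/* Hypothetical helper: fills local_size with the required work-group size
 * and returns the total number of work-groups for the given global size.
 * Returns 0 if any global dimension is zero or is not a multiple of the
 * required work-group size in that dimension. */
size_t fill_local_size(const size_t global_size[3], size_t local_size[3])
{
    const size_t reqd[3] = { REQD_WG_X, REQD_WG_Y, REQD_WG_Z };
    size_t groups = 1;

    assert(global_size != NULL && local_size != NULL);

    for (int d = 0; d < 3; ++d) {
        if (global_size[d] == 0 || global_size[d] % reqd[d] != 0) {
            return 0; /* launch geometry incompatible with the attribute */
        }
        local_size[d] = reqd[d];
        groups *= global_size[d] / reqd[d];
    }
    return groups;
}
```

With a required size of (1, 1, 1), every work-item forms its own work-group, so the number of work-groups equals the number of vector elements. In a real host program, the filled local_size array would be passed as the local_work_size argument of clEnqueueNDRangeKernel.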
This problem is scheduled to be fixed in a future release of the Intel® FPGA SDK for OpenCL™.