Knowledge Base Article
Why does #pragma ivdep not work correctly in aocl version 17.0?
Description
In version 16.1, the following code behaved as expected: the outer loop was serialized because of its dependencies, and #pragma ivdep removed the dependencies of the inner loop only.
// This outer loop is serialized because of true dependencies with the inner loop
for (unsigned char x = 0; x < 4; x++) {
    // The inner loop has no inter-iteration dependencies, but depends on the outer loop
    #pragma ivdep
    for (unsigned char y = 0; y < 64; y++) {
        // ... loop body (not shown) ...
    }
}
In 17.0, #pragma ivdep is applied to both the inner and the outer loop, so the compiler ignores the dependencies of the outer loop. As a result, similar code may produce incorrect results in hardware even though it works correctly in emulation.
Resolution
Workaround:
1. Add an extra argument "dummy" to the kernel. On the host side, always pass 1 for this dummy argument.
BEFORE
__kernel void my_kernel(
__global cpx_t* restrict input,
__global cpx_t* restrict result)
AFTER
__kernel void my_kernel(
__global cpx_t* restrict input,
__global cpx_t* restrict result,
int dummy)
2. In the loop nest, wrap the inner loop in "if (dummy)":
// This outer loop is serialized because of true dependencies
for (unsigned char x = 0; x < 4; x++) {
    if (dummy) {
        // No dependencies within each set of 64 iterations
        #pragma ivdep
        for (unsigned char y = 0; y < 64; y++) {
            // ... loop body (not shown) ...
        }
    }
}
This issue is scheduled to be fixed in a future version of the Intel® FPGA SDK for OpenCL™.