Is HyperFlex Optimization disabled by Non-Aligned LSU?
Hello,
I'm trying to understand why HyperFlex optimization is disabled in the following toy example.
#pragma OPENCL EXTENSION cl_intel_channels : enable

#define BLOCK_ROWS 8
#define BLOCK_COLS 8

channel float channel_A[BLOCK_COLS] __attribute__((depth(32)));

__kernel void readA(__global volatile const float * restrict A, const unsigned int N, const unsigned int M)
{
    const uint BlocksN = 1 + (int)((N-1) / BLOCK_ROWS);
    const uint BlocksM = 1 + (int)((M-1) / BLOCK_COLS);

    for(int bi=0; bi < BlocksN; bi++){
        for(int bj=0; bj < BlocksM; bj++){
            for(int i=0; i < BLOCK_ROWS; i++){
                // reading offset
                const int offset = (bi*BLOCK_ROWS+i)*M+bj*BLOCK_COLS;
                // dummy offset that will let hyper opt ON
                // const int offset = 0;
                #pragma unroll
                for(int j=0; j < BLOCK_COLS; j++){
                    float value = A[offset + j];
                    write_channel_intel(channel_A[j], value);
                }
            }
        }
    }
}

__kernel void dummy(__global volatile float * restrict A, const unsigned int N){
    for(int i=0; i < N; i++){
        #pragma unroll
        for(int j=0; j < BLOCK_COLS; j++){
            A[i] = read_channel_intel(channel_A[j]);
        }
    }
}

In the example, I want to read a matrix block by block, with multiple parallel reads (the innermost loop is unrolled). A dummy kernel receives the data, only so that a report can be produced.
If I compile this with aoc 19.1 targeting a Stratix 10, the report says that Hyper Optimization is disabled "because of the Load Operation that does not support it". The corresponding LSU is generated as bursted and non-aligned.
If I simplify accesses to matrix A (for example by using an offset equal to zero), the LSU is generated as bursted and Hyper Optimization is ON.
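To make the difference between the two offsets concrete, here is a small host-side C sketch of the index arithmetic used in readA. It only checks whether every row start index is a multiple of BLOCK_COLS for a given N and M. The helper name all_row_starts_aligned is hypothetical, and whether aoc uses this criterion to classify the LSU as aligned is an assumption on my side, not something I found in the report.

```c
#define BLOCK_ROWS 8
#define BLOCK_COLS 8

/* Hypothetical helper: returns 1 if every starting index that readA
 * computes for an N x M matrix is a multiple of BLOCK_COLS, else 0.
 * Assumes N >= 1 and M >= 1 (N-1 and M-1 would wrap around otherwise). */
static int all_row_starts_aligned(unsigned int N, unsigned int M)
{
    const unsigned int blocksN = 1 + (N - 1) / BLOCK_ROWS;
    const unsigned int blocksM = 1 + (M - 1) / BLOCK_COLS;

    for (unsigned int bi = 0; bi < blocksN; bi++) {
        for (unsigned int bj = 0; bj < blocksM; bj++) {
            for (unsigned int i = 0; i < BLOCK_ROWS; i++) {
                /* Same expression as the "reading offset" in readA. */
                const unsigned int offset =
                    (bi * BLOCK_ROWS + i) * M + bj * BLOCK_COLS;
                if (offset % BLOCK_COLS != 0)
                    return 0; /* this row starts in the middle of an 8-float group */
            }
        }
    }
    return 1;
}
```

With M = 16 every start index is a multiple of 8, while with M = 12 the second row already starts at index 12. Since M is a runtime kernel argument, the offset cannot be shown to be a multiple of BLOCK_COLS at compile time, whereas the constant offset 0 trivially is.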
Could it be because of the non-aligned LSU? And if so, is there some way to prevent the compiler from inferring the accesses as non-aligned?
Thanks,