OpenCL private_copies attribute does not seem to work in 20.1
Hello OpenCL FPGA developers,
I have an OpenCL NDRange (64,1,1) kernel with several local memories that are each replicated 8 times, which makes the kernel limited by on-chip memory (>100% of the M20Ks on an A10). I have tried to limit the replication factor by applying the newly introduced attribute described in UG-OCL002 | 2020.04.13 20.1 aocl_programming_guide.pdf, page 41.
Example for one of the buffers:
__local float __attribute__((private_copies(4))) x[M][N];
However, this attribute does not seem to have the intended effect, and I am still stuck with 8 private copies of each buffer. I know that reducing the replication by a factor of 2 will make my kernel slower, but I am willing to trade a slightly slower kernel for lower memory usage. Moreover, the part of the schedule in which all these buffers are in use is only a small percentage of the overall kernel schedule, so the slowdown should be minor.
Thank you for your input.