I would like to understand in more depth whether the problem is caused by an issue in my code or is really a compiler bug. Do you think I should open a technical support ticket with Intel to report it?
Regarding run-time performance: if I understood correctly, I can calculate the FPGA external memory bandwidth as:
kernel_frequency x number_of_banks x bus_width
In my case I have 2 banks of DDR3 @ 933 MHz, so the effective memory frequency is 933 x 2 = 1866 MHz. According to your answer in the thread you pointed me to, the memory controller on the FPGA runs at 1866 / 8 = 233 MHz, so the maximum achievable kernel frequency is 233 MHz. I have to read/write 512 x 16 = 8192 bits from/to global memory. Assuming the kernel operating frequency is 233 MHz, the maximum FPGA external memory bandwidth is 233 MHz x 2 x 64 bits = 29.8 Gbps, which gives a lower bound of 8192 bits / 29.8 Gbps = 275 nanoseconds on the time needed to transfer the data from/to the FPGA.
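To make the arithmetic easier to check, here is a minimal Python sketch of the same calculation. It only reproduces my numbers and does not confirm that the formula itself is right. The variable names are my own, the 64-bit bus width per bank is the value I assumed above, and the controller clock is truncated to 233 MHz as in the text.

```python
# Sketch of the bandwidth estimate from this post (not a verified model).
# All names are illustrative; the values are the ones quoted in the post.

DDR_CLOCK_MHZ = 933        # DDR3 clock
NUM_BANKS = 2              # number of DDR3 banks on the board
BUS_WIDTH_BITS = 64        # assumed bus width per bank
PAYLOAD_BITS = 512 * 16    # data to read/write from/to global memory

# Double data rate: two transfers per clock cycle.
data_rate_mhz = DDR_CLOCK_MHZ * 2

# Memory controller clock, taken as data rate / 8 and truncated to whole MHz.
controller_mhz = data_rate_mhz // 8

# Formula from the post: kernel_frequency x number_of_banks x bus_width,
# with the kernel assumed to run at the controller clock.
bandwidth_bps = controller_mhz * 1e6 * NUM_BANKS * BUS_WIDTH_BITS

# Time to move the payload at that bandwidth, in nanoseconds.
transfer_ns = PAYLOAD_BITS / bandwidth_bps * 1e9

print(f"data rate:        {data_rate_mhz} MHz")
print(f"controller clock: {controller_mhz} MHz")
print(f"bandwidth:        {bandwidth_bps / 1e9:.1f} Gbps")
print(f"transfer time:    {transfer_ns:.0f} ns")
```

If the bus width or the frequency that belongs in the formula is different from what I assumed, only the corresponding constant needs to change.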
Are these calculations correct, or am I making a mistake somewhere?
Thank you very much.