You can try reporting your issue to Intel, but unless you have Premier Support access, I don’t think it is possible to open tickets with Intel anymore. Support for users without Premier Support access has been offloaded to the forums. Maybe you can send a PM to one of the Intel-affiliated moderators in the forum, and they can then open a ticket with the engineering team on your behalf.
Regarding the memory performance, the memory of the DE5-Net board operates at 1600 MHz (800 MHz double data-rate) and the memory controller operates at 200 MHz. The peak external memory bandwidth of the board is 2 x 64 bit x 1600 MHz = 25.6 GB/s = 23.8 GiB/s. Assuming that your OpenCL kernel is running at 200 MHz, you will need to read/write a minimum of 128 bytes per clock to saturate the external memory bandwidth. Since the kernel and the memory controller operate at different clocks, and the clock of the memory controller is fixed, you can saturate the memory bandwidth using fewer bytes per clock at a higher kernel operating frequency, or more bytes per clock at a lower one. The compiler places buffers between the kernel and the memory interface to allow this.
If the total amount of data you need to transfer between the FPGA and its external memory is only 8192 bits (1 KiB), then you would probably be better off just running your code on the host, since your bottleneck will definitely be the PCI-E transfer. The computation on the FPGA in this case will be latency-bound since you will not be saturating the pipeline, and your run time will then depend on the depth of the pipeline and the kernel operating frequency - parameters which the user has very little control over in a high-level design.
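If you want to redo the bandwidth arithmetic above for a different kernel frequency, here is a rough Python sketch. The constants are just the numbers quoted in this post (I am reading the "2 x" factor as two 64-bit memory banks), and the function names are made up for illustration; none of this comes from an Intel tool or API.

```python
# Back-of-the-envelope check of the numbers in this post.
# Assumptions: two 64-bit memory banks, 1600 MHz effective data rate.
BANKS = 2                   # the "2 x" factor, assumed to mean two banks
BUS_WIDTH_BITS = 64         # data width per bank
TRANSFER_RATE_HZ = 1600e6   # 800 MHz clock, double data-rate


def peak_bandwidth_bytes_per_s():
    """Theoretical peak external memory bandwidth in bytes per second."""
    return BANKS * (BUS_WIDTH_BITS // 8) * TRANSFER_RATE_HZ


def bytes_per_clock_to_saturate(kernel_mhz):
    """Bytes the kernel must read/write per cycle to reach the peak."""
    return peak_bandwidth_bytes_per_s() / (kernel_mhz * 1e6)


if __name__ == "__main__":
    peak = peak_bandwidth_bytes_per_s()
    print(f"peak: {peak / 1e9:.1f} GB/s = {peak / 2**30:.1f} GiB/s")
    # Hypothetical kernel frequencies, chosen only to show the trend.
    for mhz in (150, 200, 300):
        need = bytes_per_clock_to_saturate(mhz)
        print(f"{mhz} MHz kernel: {need:.1f} bytes per clock")
```

At 200 MHz this gives the same 128 bytes per clock as above; a faster kernel needs fewer bytes per clock and a slower one needs more. For scale, 8192 bits is 1024 bytes, which is only eight clocks' worth of data at 128 bytes per clock.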