OK, I will try to explain the issue to one of the Intel-affiliated moderators on the forums. Thank you for your suggestion.
Regarding the processing time, I tried reading/writing fewer bits per clock cycle. I now read/write 1024 bits 8 times instead of 8192 bits in a single access. With this change the kernel operating frequency is higher: it reaches 242 MHz, according to the profiler. Unfortunately, the kernel execution time remains the same.
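As a rough sanity check on the numbers above, here is a back-of-the-envelope sketch of the ideal data-movement time. It assumes that one 1024-bit access is issued per clock cycle, which is my assumption and not something I have confirmed in the report; the function name is just for illustration.

```python
# Ideal time to move the data, ignoring every fixed overhead.
# Assumption: one access of `width_bits` is issued per kernel clock cycle.
TOTAL_BITS = 8192
ACCESS_WIDTH_BITS = 1024
KERNEL_FREQ_HZ = 242e6  # frequency reported by the profiler


def ideal_transfer_time_ns(total_bits, width_bits, freq_hz):
    """Return the number of nanoseconds needed for ceil(total/width) cycles."""
    cycles = -(-total_bits // width_bits)  # ceiling division
    return cycles / freq_hz * 1e9


t_ns = ideal_transfer_time_ns(TOTAL_BITS, ACCESS_WIDTH_BITS, KERNEL_FREQ_HZ)
print(f"{t_ns:.1f} ns")  # 8 cycles at 242 MHz -> 33.1 ns
```

If this estimate is in the right range, the data movement itself takes only tens of nanoseconds, so a measured execution time that does not change with the access width would probably be dominated by fixed overheads rather than by the accesses themselves.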
Regarding the PCIe bottleneck, do you mean that it is not worth doing the processing in hardware for only 8192 bits, because the PCIe transfer dominates?
Moreover, I would like to be sure that I have correctly understood the flow from the host to the FPGA and vice versa. As an example, consider the kernel read operation. The flow consists of:
- The data is written to the DDR over PCIe.
- The kernel then reads the data from the DDR.
Thus, the bottlenecks are the PCIe write and the DDR access. Is that correct?
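To make my understanding of the flow concrete, here is a minimal sketch that models the two steps as a fixed latency plus a bandwidth term. All bandwidth and latency values below are hypothetical placeholders, not measurements from my board, and the function name is only illustrative.

```python
# Simple model of the host-to-kernel flow: PCIe write to DDR, then DDR read.
# Every number passed in below is a hypothetical placeholder, not a measurement.

def step_time_us(n_bytes, bandwidth_gb_per_s, fixed_latency_us):
    """Time for one step: fixed latency plus size / bandwidth."""
    bytes_per_us = bandwidth_gb_per_s * 1e3  # 1 GB/s = 1e3 bytes per microsecond
    return fixed_latency_us + n_bytes / bytes_per_us


N_BYTES = 8192 // 8  # 8192 bits = 1024 bytes

# Placeholder figures (assumptions): replace with values for the real platform.
pcie_us = step_time_us(N_BYTES, bandwidth_gb_per_s=8.0, fixed_latency_us=10.0)
ddr_us = step_time_us(N_BYTES, bandwidth_gb_per_s=19.2, fixed_latency_us=0.2)

print(f"PCIe write to DDR: {pcie_us:.3f} us")
print(f"Kernel read from DDR: {ddr_us:.3f} us")
print(f"Total: {pcie_us + ddr_us:.3f} us")
```

With placeholder values like these, the fixed latency of each step is much larger than the bandwidth term for only 1024 bytes, which is how I interpret the remark about the PCIe bottleneck. Please correct me if the model itself is wrong.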
Thank you for your support.