FPGA OpenCL: caching for CPU access to shared memory through the ACP port
I am working on a project on a Cyclone V SoC that involves transferring a large amount of data from memory to the FPGA. Because DMA needs physically contiguous memory, I first copy the data into shared memory (allocated by clEnqueueMapBuffer), and the FPGA then consumes the data from there.
The problem is that copying data into the shared memory from user space is very slow. I think this is because the Intel OpenCL library disables caching for CPU accesses to the shared memory (as shown in the following picture). I understand that managing cache coherence is hard in this case, but it is not impossible.
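To put a number on the slowdown, a minimal sketch like the one below could time the same copy into two destinations: an ordinary malloc'd buffer and the pointer returned by clEnqueueMapBuffer. The helper names time_copy_seconds and throughput_mib_per_s are hypothetical, not part of any Intel or OpenCL API, and the sketch assumes a POSIX system with clock_gettime. It only measures the copy; it does not change how the memory is mapped.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copies `bytes` bytes from src to dst with memcpy
   and returns the elapsed wall-clock time in seconds. */
double time_copy_seconds(void *dst, const void *src, size_t bytes)
{
    struct timespec t0, t1;

    assert(dst != NULL && src != NULL && bytes > 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, bytes);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Hypothetical helper: converts a byte count and a duration in seconds
   into a throughput figure in MiB/s. */
double throughput_mib_per_s(size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}
```

Calling time_copy_seconds once with a malloc'd destination and once with the mapped shared-memory pointer, using the same source and size, should show how much of the cost comes from the mapping rather than from the copy itself. A large gap between the two throughput figures would be consistent with the mapped region being uncached for the CPU, although it would not prove it.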
Because the Intel OpenCL library is not open source, it is hard for us to change it and enable the cache. Can anyone suggest a way around this problem, please?