Forum Discussion
FawazJ_Altera
Frequent Contributor
6 years ago
Hello,
This approach should work well when the producer and consumer kernels operate on large chunks of contiguous memory, so that the high DDR access latency is amortized over long sequential transfers. It looks like you just want to read some data from global memory, compute some results with it, and then write the results back into global memory for the host to read. If that is the case, I think it would be best to create a channel between the producer and consumer kernels to minimize latency, since the intermediate data then stays on chip instead of making a round trip through DDR.
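As a rough illustration of the channel suggestion, here is a minimal sketch, assuming the Intel FPGA SDK for OpenCL channels extension. The kernel names, the channel name `data_ch`, the depth of 64, and the "multiply by 2" computation are all placeholders rather than anything from your design, and older Altera-branded SDK releases use `cl_altera_channels` with `read_channel_altera` / `write_channel_altera` instead.

```c
// Sketch only: names, channel depth, and the computation are hypothetical.
#pragma OPENCL EXTENSION cl_intel_channels : enable

// On-chip FIFO connecting the two kernels (depth chosen arbitrarily here).
channel float data_ch __attribute__((depth(64)));

// Producer: streams input from global memory into the channel.
// The sequential index keeps the DDR accesses contiguous.
__kernel void producer(__global const float *restrict in, const int n)
{
    for (int i = 0; i < n; i++) {
        write_channel_intel(data_ch, in[i]);
    }
}

// Consumer: reads from the channel, computes, and writes the result
// back to global memory for the host to read.
__kernel void consumer(__global float *restrict out, const int n)
{
    for (int i = 0; i < n; i++) {
        float v = read_channel_intel(data_ch);
        out[i] = v * 2.0f;  // placeholder for the real computation
    }
}
```

With a layout like this, the host would normally enqueue the two kernels on separate command queues so that they run concurrently. If they are serialized on one in-order queue, the producer blocks as soon as the channel fills and the consumer never starts.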
To debug the stalling issue, I suggest compiling and running the design with the profiler enabled. Knowing the memory access pattern (sequential, random, etc.) would also help.
Thanks