Forum Discussion
Have you checked further down the pipeline? Especially the memory write at the very end? The stalls could be propagating from there all the way up the pipeline. Also, are the IIs of all your loops the same?
P.S. Fantastic comments in your code. :D
I have checked the other PEs and also the final stage, which receives the final output and writes it back to memory. They suffer from the same stall.
I'm kinda afraid that I'm not capturing the PEs' counter values properly.
BTW, about the comments, that's how a software engineer survives FPGA programming :D
- HRZ, 6 years ago
Frequent Contributor
Then it sounds like the stalls are propagating from the bottom of the pipeline. I am afraid I have never profiled autorun kernels, so I cannot comment on whether you are capturing the counters correctly. However, I find it very unlikely that regular compute PEs or on-chip channels would become a performance bottleneck. As a test, you can remove all your PEs from the kernel and keep just the memory read and write kernels, connected directly to each other through a channel. If you get similar stalling on this simplified kernel, the problem is coming from memory. Note that if you are exhausting the external memory bandwidth, seeing such stalls on the channels is completely normal.
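A minimal sketch of that simplified test, assuming the Intel FPGA SDK for OpenCL channels extension. The kernel names (`mem_read`, `mem_write`), the channel name `ch_data`, the `float` element type, the depth of 64 and the argument `n` are all placeholders for illustration, not taken from the original code:

```c
// Enable the Intel FPGA channels extension.
#pragma OPENCL EXTENSION cl_intel_channels : enable

// Hypothetical channel linking the two memory kernels directly;
// the depth of 64 is an arbitrary placeholder.
channel float ch_data __attribute__((depth(64)));

// Reads n elements from external memory and pushes them into the channel.
__kernel void mem_read(__global const float* restrict in, const int n)
{
    for (int i = 0; i < n; i++) {
        write_channel_intel(ch_data, in[i]);
    }
}

// Pops n elements from the channel and writes them back to external memory.
__kernel void mem_write(__global float* restrict out, const int n)
{
    for (int i = 0; i < n; i++) {
        out[i] = read_channel_intel(ch_data);
    }
}
```

Both kernels need to be launched from the host on separate command queues so they run concurrently. If the profiler still reports comparable stalls on `ch_data` with no PEs in between, that points at external memory rather than at the compute pipeline.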
- SBioo6 years ago
Occasional Contributor
Alright,
Thanks much. Will try the new approach and see how things will change. Will update you here :)