Forum Discussion
Hi @AUT,
Noted on the version and device you are using; thanks for the explanation.
Based on your explanation and request, I would recommend looking at the burst-coalesced LSU type.
These allow larger, more robust bursts, which use the memory bandwidth more efficiently. You can find more explanation of the coalesced LSU type in the link below:
The HLS installation also includes best-practice tutorials and sample code that demonstrate the LSUs. You can find them under the following path:
- <quartus install directory>\hls\examples\tutorials\best_practices\lsu_control
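To picture what coalescing buys, here is a minimal C++ sketch under simplifying assumptions. It is a software model of the idea, not the Intel HLS LSU implementation, and the `Burst` struct, the `coalesce` function and the `max_burst` limit are hypothetical. Runs of consecutive word addresses are merged into single bursts, so fewer, larger transactions reach memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One memory burst: start address and length, both measured in words.
struct Burst {
    std::uint64_t start;
    std::size_t length;
};

// Simplified coalescing model (illustrative assumption, not the real LSU logic):
// each address extends the current burst if it directly follows it and the
// hypothetical max_burst limit has not been reached; otherwise it starts a new burst.
std::vector<Burst> coalesce(const std::vector<std::uint64_t>& addrs,
                            std::size_t max_burst) {
    std::vector<Burst> bursts;
    for (std::uint64_t a : addrs) {
        if (!bursts.empty()) {
            Burst& last = bursts.back();
            if (a == last.start + last.length && last.length < max_burst) {
                ++last.length;  // contiguous: grow the current burst
                continue;
            }
        }
        bursts.push_back({a, 1});  // gap or burst full: open a new burst
    }
    return bursts;
}
```

For example, eight consecutive addresses with `max_burst = 4` become two bursts of four words, while `{0, 1, 2, 10, 11}` becomes a three-word burst at 0 and a two-word burst at 10. A real coalescer must also decide how long to wait for more requests before issuing a burst, which this model ignores.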
Hope that clarifies.
Best Wishes
BB
- AUT, 1 year ago
New Contributor
Having to instantiate a significantly more complex LSU isn't a helpful solution. Doing the math, we won't be able to complete our design with HLS if we have to use burst-coalesced LSUs, and our design would use none of the features of the burst-coalesced unit. It seems like there should be a way to tell HLS that I want a more deeply pipelined LSU without wasting copious amounts of resources.
We are starting to switch to Verilog at this point, since there doesn't seem to be an answer to this question. If anyone has a solution, we would greatly appreciate it, as having to rewrite our design in Verilog is significantly affecting our timeline.
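Why LSU depth matters can be seen in a toy cycle-level model. This is a hedged C++ sketch, not the generated Verilog: it assumes at most one read issued per cycle, a fixed read latency, and a cap on reads in flight that stands in for the LSU's FIFO capacity. `simulate_reads` and its parameters are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy model (assumption for illustration): issue at most one read per cycle,
// each read completes `latency` cycles after issue, and no more than
// `max_outstanding` reads may be in flight. Returns the cycle count needed
// to complete `num_requests` reads.
std::uint64_t simulate_reads(std::uint64_t num_requests,
                             std::uint64_t latency,
                             std::size_t max_outstanding) {
    std::deque<std::uint64_t> in_flight;  // completion cycle of each pending read
    std::uint64_t issued = 0, completed = 0, cycle = 0;
    while (completed < num_requests) {
        // Retire every read whose data has returned by this cycle.
        while (!in_flight.empty() && in_flight.front() <= cycle) {
            in_flight.pop_front();
            ++completed;
        }
        // Issue one new read if a slot is free; otherwise the LSU stalls.
        if (issued < num_requests && in_flight.size() < max_outstanding) {
            in_flight.push_back(cycle + latency);
            ++issued;
        }
        ++cycle;
    }
    return cycle;
}
```

With a 100-cycle latency, 1000 reads take 1100 cycles when 100 reads may be in flight, but 4025 cycles when only 25 may be, because the LSU stalls once its slots are full. In this model, throughput scales roughly as max_outstanding / latency until the cap reaches the latency.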
- justin-rosner, 1 year ago
New Contributor
Hi @AUT,
Unfortunately, at this time there is no way to increase the capacity of the FIFO associated with the pipelined LSU without specifically modifying the generated Verilog (i.e., updating the KERNEL_SIDE_MEM_LATENCY value so that the instantiated FIFO is larger). What number of dispatched requests are you trying to achieve?
- AUT, 1 year ago
New Contributor
Thank you for letting me know; we will continue developing our Verilog-based design. Given the high latency of HBM, I currently need around 128 outstanding requests. The best I can get, even after raising KERNEL_SIDE_MEM_LATENCY, is 41 requests, so I think other, more involved changes to the generated Verilog would be needed to exploit the full depth.
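The 128-request figure is consistent with Little's law, which relates the number of requests in flight N to the issue rate λ (requests per cycle) and the memory latency L (cycles). As a rough worked example, assuming the goal is one request per cycle (an assumption, since the thread states neither the target rate nor the exact HBM latency):

```latex
\begin{aligned}
N &= \lambda L
  \quad\Rightarrow\quad
  L \approx \frac{N}{\lambda} = \frac{128\ \text{requests}}{1\ \text{request/cycle}} = 128\ \text{cycles} \\
\lambda_{\max} &= \frac{N_{\max}}{L} \approx \frac{41}{128} \approx 0.32\ \text{requests/cycle}
\end{aligned}
```

Under those assumptions, capping the LSU at 41 outstanding requests would limit it to roughly a third of the target request rate.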
If possible, I think a feature similar to Xilinx's num_read_outstanding would be a useful addition in a future release, especially for HBM/DDR designs.
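For reference, this is roughly what the Xilinx feature looks like in Vitis HLS: the `num_read_outstanding` option of the `m_axi` INTERFACE pragma sets how many read requests the AXI master adapter may keep in flight. The `sum_words` function, the `in` port, the `gmem` bundle name and the value 128 are illustrative choices, not taken from the thread. A standard C++ compiler ignores (and may warn about) the unknown pragma, so the sketch also runs as ordinary software.

```cpp
#include <cstddef>

// Illustrative Vitis HLS kernel (hypothetical names). The pragma requests an
// AXI master interface for `in` that allows up to 128 reads in flight; when
// compiled as plain C++ the pragma is ignored and the loop simply sums words.
int sum_words(const int* in, std::size_t n) {
#pragma HLS INTERFACE m_axi port=in bundle=gmem num_read_outstanding=128
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += in[i];  // each iteration issues one read from `in`
    }
    return total;
}
```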
Best,
Austin