User Profile

AUT

New Contributor

Joined 2 years ago

5 Posts

Contributions

Re: Intel HLS pipeline::lsu won't dispatch more than 8 request
Hi @justin-rosner, thank you for letting me know. We will continue developing our Verilog-based design. Given the high latency of HBM, I currently need around 128 outstanding requests. The best I can get, even after raising KERNEL_SIDE_MEM_LATENCY, is 41 requests, so I think other, more involved changes to the generated Verilog would be needed to exploit the full depth. If possible, I think a feature similar to Xilinx's num_read_outstanding would be useful in a future release, especially for HBM/DDR designs. Best, Austin
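The depth figures above follow from Little's law: requests in flight equal issue rate times round-trip latency. Below is a minimal sketch of that arithmetic. The function names are hypothetical, and the 128-cycle latency used in the usage note is an assumption inferred from the "around 128 outstanding requests" figure at one request per cycle, not a measured value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Little's law: in-flight requests = issue rate x round-trip latency.
// Returns the number of outstanding requests needed to issue
// `requests_per_cycle` every cycle without stalling.
// Both inputs are illustrative assumptions, not values from the tool.
constexpr std::uint32_t required_outstanding(std::uint32_t latency_cycles,
                                             std::uint32_t requests_per_cycle) {
    return latency_cycles * requests_per_cycle;
}

// Fraction of the peak request rate that can be sustained when the LSU
// allows at most `cap` requests in flight and `required` are needed.
double sustained_fraction(std::uint32_t cap, std::uint32_t required) {
    assert(required > 0);
    return std::min(1.0, static_cast<double>(cap) / required);
}
```

Under these assumptions, required_outstanding(128, 1) is 128, and sustained_fraction(41, 128) is about 0.32, so a cap of 41 in-flight requests would reach roughly a third of the peak request rate.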
1 year ago Place Acceleration
Re: Intel HLS pipeline::lsu won't dispatch more than 8 request
Having to instantiate a significantly more complex LSU isn't a helpful solution. Doing the math, we won't be able to complete our design with HLS if we have to use burst-coalesced LSUs, and our design would use none of the features of the burst-coalesced unit. It seems like there should be a way to tell HLS that I want a deeper pipelined LSU without wasting copious amounts of resources. We are starting to switch to Verilog at this point, since there doesn't seem to be an answer to this question. If anyone has a solution, we would greatly appreciate it, as having to rewrite our design in Verilog is significantly affecting our timeline.
1 year ago Place Acceleration
Re: Intel HLS pipeline::lsu won't dispatch more than 8 request
Hi @BoonBengT_Altera, I am using HLS 21.1 with a Stratix 10 MX developer kit. Sorry if I misspoke: it isn't an example design, it is an example of the LSU that HLS generates for my design. The module I posted is from the Verilog that HLS generates. It is generated from a standard mm_master interface with pipelined LSU transfers specified.
1 year ago Place Acceleration
Re: Intel HLS pipeline::lsu won't dispatch more than 8 request
Some more follow-up information: I tried increasing KERNEL_SIDE_MEM_LATENCY to 63, and the number of dispatched requests did increase. However, it increased to 41, not 64. Again, this limit is internal to the HLS module, as there is no incoming waitrequest signal. This really confuses me, as I figured any internal limit would be a power of 2. Additionally, this doesn't seem consistent: someone I am working with reported that they could only get 10 requests to dispatch when modifying a slightly different design with a pipelined LSU. I would really appreciate some help getting the pipelined LSU to work, as using a burst interface increases usage by ~2-4x for most resources and by ~40x for M20K blocks. As the number of channels scales to saturate all the HBM channels, this will waste a non-negligible amount of resources and hurt our final design performance. Given the Stratix's already limited number of M20K blocks, that waste is really making things difficult.
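To illustrate how an internal cap on in-flight requests can stall dispatch even when waitrequest is never asserted, here is a toy cycle-stepped model. It is a sketch under stated assumptions and not the logic of the generated LSU: at most one read issues per cycle, every response returns a fixed number of cycles after issue, and the function name and the 128-cycle latency in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Toy model (assumption, not the generated LSU): one read may issue per
// cycle while fewer than `cap` reads are in flight; each response arrives
// exactly `latency` cycles after issue; the memory never backpressures.
// Returns how many reads complete within `cycles` cycles.
std::uint64_t reads_completed(std::uint32_t cap, std::uint32_t latency,
                              std::uint64_t cycles) {
    assert(cap > 0 && latency > 0);
    std::deque<std::uint64_t> return_cycle;  // arrival times of in-flight reads
    std::uint64_t done = 0;
    for (std::uint64_t now = 0; now < cycles; ++now) {
        // Retire every response whose data has arrived by this cycle.
        while (!return_cycle.empty() && return_cycle.front() <= now) {
            return_cycle.pop_front();
            ++done;
        }
        // Issue a new read only if the tracking FIFO still has room.
        if (return_cycle.size() < cap) {
            return_cycle.push_back(now + latency);
        }
    }
    return done;
}
```

With an assumed latency of 128 cycles, comparing reads_completed(8, 128, n), reads_completed(41, 128, n), and reads_completed(128, 128, n) for the same n shows the model issuing in bursts of `cap` reads and then idling until the first response returns, which matches the stall pattern described in this thread.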
1 year ago Place Acceleration
Intel HLS pipeline::lsu won't dispatch more than 8 request
I'm creating a design to load data from HBM using an Avalon memory-mapped interface. Because HBM does not support typical bursting, I need to dispatch as many individual requests as possible to get the full bandwidth. Since I use the full data width natively and can't burst, I decided to use a pipelined LSU to reduce resource utilization, as the burst-coalesced LSU instantiates many features I can't use, wasting space:

lsu<style<PIPELINED>, static_coalescing<false>>;

However, whenever I run my component it will only dispatch 8 read requests before stalling until it receives a response. I have checked using Signal Tap, and the incoming waitrequest signal is never asserted. This clearly indicates that the behavior is internal and caused by the HLS compiler. I cannot figure out how to change this behavior, and there doesn't seem to be anything in the documentation about it.

From what I can gather, it seems like HLS is arbitrarily instantiating the LSU with a FIFO size of 8. This behavior appears to be controlled by the Verilog parameter KERNEL_SIDE_MEM_LATENCY when creating an LSU. The value is hard-coded in one of the generated files, and I don't want to have to change it manually every time I resynthesize my design. I also do not know whether other modules will behave undesirably if I increase this FIFO size.

Is there an easy way for me to tell the LSU to make its FIFO bigger without having to modify the underlying Verilog? I know I could use bursting with a burst adapter, which would allow me to use a typical burst-coalesced LSU, but I would like to avoid adding unnecessary components and adapters to my design.
Below is an example of the LSU that HLS instantiates:

lsu_top #(
    .ABITS_PER_LMEM_BANK(0),
    .ADDRSPACE(1025),
    .ALIGNMENT_BYTES(64),
    .ALLOW_HIGH_SPEED_FIFO_USAGE(0),
    .ASYNC_RESET(0),
    .ATOMIC(0),
    .ATOMIC_WIDTH(3),
    .AVM_READ_DATA_LATENESS(0),
    .AVM_WRITE_DATA_LATENESS(0),
    .AWIDTH(32),
    .BURSTCOUNT_WIDTH(1),
    .ENABLE_BANKED_MEMORY(0),
    .FORCE_NOP_SUPPORT(0),
    .HIGH_FMAX(1),
    .INPUTFIFO_USEDW_MAXBITS(5),
    .KERNEL_SIDE_MEM_LATENCY(7),
    .LMEM_ADDR_PERMUTATION_STYLE(0),
    .MEMORY_SIDE_MEM_LATENCY(0),
    .MWIDTH_BYTES(64),
    .NUMBER_BANKS(1),
    .PROFILE_ADDR_TOGGLE(0),
    .READ(1),
    .STALLFREE(0),
    .STYLE("PIPELINED"),
    .SYNCHRONIZE_RESET(0),
    .USECACHING(0),
    .USEINPUTFIFO(0),
    .USEOUTPUTFIFO(1),
    .USE_BYTE_EN(0),
    .USE_STALL_LATENCY(0),
    .USE_WRITE_ACK(0),
    .WIDE_DATA_SLICING(0),
    .WIDTH_BYTES(64),
    .WRITEDATAWIDTH_BYTES(64)
) thei_llvm_fpga_mem_a1_all_buff_sroa_0_0_copyload1_ld_unit5121 (
1 year ago Place Acceleration
High-level Design Tools