Intel HLS pipeline::lsu won't dispatch more than 8 requests
I'm creating a design that loads data from HBM through an Avalon memory-mapped interface. Because HBM doesn't support typical bursting, I need to dispatch as many individual requests as possible to reach the full bandwidth. My accesses already use the full data width and can't burst, so I decided to use a pipelined LSU to reduce resource utilization. The burst-coalesced LSU instantiates many features I can't use, which wastes area.
lsu<style<PIPELINED>, static_coalescing<false>>;
However, whenever I run my component, it dispatches only 8 read requests and then stalls until it receives a response. I have checked with Signal Tap, and the incoming waitrequest signal is never asserted. This indicates that the stall originates inside the generated hardware rather than from memory backpressure, i.e. it is caused by how the HLS compiler built the LSU. I cannot figure out how to change this behavior, and the documentation doesn't seem to mention it.
From what I can gather, HLS instantiates the LSU with a seemingly arbitrary FIFO size of 8. This appears to be controlled by the Verilog parameter KERNEL_SIDE_MEM_LATENCY passed to the LSU (7 in the instance below). The value is hard-coded in one of the generated files, and I don't want to change it by hand every time I resynthesize my design. I also don't know whether other modules will behave undesirably if I increase this FIFO size.
Is there an easy way to tell the LSU to make its FIFO bigger without modifying the underlying Verilog? I know I could use bursting with a burst adapter, which would let me use a typical burst-coalesced LSU, but I would like to avoid adding unnecessary components and adapters to my design.
Below is an example of the LSU that HLS instantiates:
lsu_top #(
    .ABITS_PER_LMEM_BANK(0),
    .ADDRSPACE(1025),
    .ALIGNMENT_BYTES(64),
    .ALLOW_HIGH_SPEED_FIFO_USAGE(0),
    .ASYNC_RESET(0),
    .ATOMIC(0),
    .ATOMIC_WIDTH(3),
    .AVM_READ_DATA_LATENESS(0),
    .AVM_WRITE_DATA_LATENESS(0),
    .AWIDTH(32),
    .BURSTCOUNT_WIDTH(1),
    .ENABLE_BANKED_MEMORY(0),
    .FORCE_NOP_SUPPORT(0),
    .HIGH_FMAX(1),
    .INPUTFIFO_USEDW_MAXBITS(5),
    .KERNEL_SIDE_MEM_LATENCY(7),
    .LMEM_ADDR_PERMUTATION_STYLE(0),
    .MEMORY_SIDE_MEM_LATENCY(0),
    .MWIDTH_BYTES(64),
    .NUMBER_BANKS(1),
    .PROFILE_ADDR_TOGGLE(0),
    .READ(1),
    .STALLFREE(0),
    .STYLE("PIPELINED"),
    .SYNCHRONIZE_RESET(0),
    .USECACHING(0),
    .USEINPUTFIFO(0),
    .USEOUTPUTFIFO(1),
    .USE_BYTE_EN(0),
    .USE_STALL_LATENCY(0),
    .USE_WRITE_ACK(0),
    .WIDE_DATA_SLICING(0),
    .WIDTH_BYTES(64),
    .WRITEDATAWIDTH_BYTES(64)
) thei_llvm_fpga_mem_a1_all_buff_sroa_0_0_copyload1_ld_unit5121 (