LWH2F Throughput
Dear all,
The throughput of the LWH2F (lightweight HPS-to-FPGA) interface on Agilex 5 appears to be lower than on Cyclone 5.
- I have two custom boards, one is based on Cyclone 5, the other on Agilex 5.
- I use a Linux OS. The code runs in a Linux kernel driver, and the driver code is the same for both devices.
- The memory is mapped using ioremap(), i.e. it is mapped as device memory.
- I measure performance by taking a timestamp with ktime_get_ns(), reading or writing 10000 uint32 values, and then taking another timestamp.
- I've measured the following values (1 MB = 10^6 bytes):
  - Cyclone 5 board read: 47.2 MB/s (i.e. 11.8 million 4-byte words per second)
  - Cyclone 5 board write: 73.1 MB/s
  - Agilex 5 board read: 24.8 MB/s
  - Agilex 5 board write: 18.8 MB/s
- I've noticed that performance varies depending on the CPU that the process is running on. When measuring in userspace I can explicitly set the CPU affinity; for the kernel driver I've noticed that it is sometimes slower than above, presumably because it's running on a different CPU.
- There is a slight difference in the QSYS design:
  - The Cyclone 5 based board:
    - drives the AXI bus with a 64 MHz clock.
    - uses the Avalon MM Slave Translator.
  - The Agilex 5 based board:
    - drives the AXI bus with a 200 MHz clock.
    - uses the Avalon Memory Mapped Pipeline Bridge Intel FPGA IP.
- Our FPGA logic takes one bus clock cycle (at 64 MHz and 200 MHz respectively) to process a read (readdatavalid). For a write, our FPGA logic doesn't generate writeresponsevalid; this is handled by the IP block.
- I'm using Quartus 25.1.1 for the Agilex 5 design.
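A minimal userspace sketch of the measurement described above, for anyone who wants to reproduce it. This is not the actual driver code: the helper names (now_ns, time_reads, time_writes, mb_per_s) are made up for illustration, clock_gettime() stands in for ktime_get_ns(), and `base` is assumed to point at the mapped LWH2F window (any uint32 buffer works for a dry run).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in ns (userspace stand-in for ktime_get_ns()). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Read `count` 32-bit words one by one. The volatile qualifier keeps the
 * compiler from merging or dropping loads. Returns the elapsed time in ns;
 * the sum is handed back through `sink` so the loads have a visible use. */
static uint64_t time_reads(const volatile uint32_t *base, size_t count,
                           uint32_t *sink)
{
    uint32_t acc = 0;
    uint64_t start = now_ns();
    for (size_t i = 0; i < count; i++)
        acc += base[i];
    uint64_t end = now_ns();
    *sink = acc;
    return end - start;
}

/* Write `count` 32-bit words one by one. Returns the elapsed time in ns. */
static uint64_t time_writes(volatile uint32_t *base, size_t count,
                            uint32_t value)
{
    uint64_t start = now_ns();
    for (size_t i = 0; i < count; i++)
        base[i] = value;
    return now_ns() - start;
}

/* Throughput in MB/s (1 MB = 10^6 bytes): bytes per ns times 1000. */
static double mb_per_s(size_t count, uint64_t elapsed_ns)
{
    assert(elapsed_ns > 0);
    return (double)count * sizeof(uint32_t) * 1000.0 / (double)elapsed_ns;
}
```

With count = 10000, an elapsed time of 1 ms corresponds to 40 MB/s. In a kernel driver the same loop would presumably use ktime_get_ns() around ioread32()/iowrite32() (or readl()/writel()) on the pointer returned by ioremap().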
I'm aware that the LWH2F interface is not intended for high throughput. Also, since the memory is mapped as "Device Memory", every load/store is processed separately and we're not taking advantage of AXI bursts, etc.
I'm aware that we could improve performance by using the H2F interface and mapping the memory as normal memory.
That said, we have a proven design and are reluctant to change it unless absolutely necessary.
So I have the following questions:
- Is a higher latency expected on Agilex 5 (e.g. due to a different interconnect architecture)?
- Have you measured the performance of the LWH2F interface? Can you give me a number on how many transactions per second we can expect?
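To put the second question in concrete terms, the throughput figures above can be converted into transactions per second, time per transaction, and FPGA-side bus clock cycles per transaction. The following is only a sketch of that arithmetic, assuming one 4-byte word per transaction and 1 MB = 10^6 bytes; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: every LWH2F access transfers exactly one 32-bit word. */
#define BYTES_PER_ACCESS 4.0

/* MB/s (1 MB = 10^6 bytes) -> million accesses per second. */
static double maccesses_per_s(double mb_per_s)
{
    assert(mb_per_s > 0.0);
    return mb_per_s / BYTES_PER_ACCESS;
}

/* Average time per access in nanoseconds. */
static double ns_per_access(double mb_per_s)
{
    return 1000.0 / maccesses_per_s(mb_per_s);
}

/* Average number of bus clock cycles that elapse per access. */
static double cycles_per_access(double mb_per_s, double bus_mhz)
{
    return bus_mhz / maccesses_per_s(mb_per_s);
}

/* Print one line of the comparison. */
static void print_row(const char *label, double mb_per_s, double bus_mhz)
{
    printf("%-16s %5.1f MB/s  %6.2f M/s  %6.1f ns  %5.1f cycles @ %.0f MHz\n",
           label, mb_per_s, maccesses_per_s(mb_per_s),
           ns_per_access(mb_per_s), cycles_per_access(mb_per_s, bus_mhz),
           bus_mhz);
}
```

Calling print_row("Cyclone 5 read", 47.2, 64.0) and so on for the four measurements gives roughly 11.8 M reads/s (about 85 ns, 5.4 cycles at 64 MHz) and 18.3 M writes/s (about 55 ns, 3.5 cycles) on Cyclone 5, versus roughly 6.2 M reads/s (about 161 ns, 32 cycles at 200 MHz) and 4.7 M writes/s (about 213 ns, 43 cycles) on Agilex 5. These are averages over the whole path, not only the FPGA side.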
Kind Regards,
Eric Opitz