LWH2F Throughput

Question

Dear all,

I'm facing the issue that the throughput on the Agilex 5 LWH2F interface appears to be lower than on Cyclone 5.

- I have two custom boards: one is based on Cyclone 5, the other on Agilex 5.
- I use a Linux OS. The code runs in a Linux kernel driver. The code for the driver is the same for both devices.
- The memory is mapped using ioremap(), i.e. it is mapped as device memory.
- I measure performance by taking a timestamp with ktime_get_ns(), then reading/writing 10000 uint32 values, then taking another timestamp.
- I've measured the following values:
- Cyclone5 board read: 47.2 MB/s (million bytes per second, i.e. 11.8 million 4-byte words per second)
- Cyclone5 board write: 73.1 MB/s
- Agilex5 board read: 24.8 MB/s
- Agilex5 board write: 18.8 MB/s
- I've noticed that performance varies depending on the CPU that the process is running on. When measuring in userspace I can explicitly set the CPU affinity. For the kernel driver I've noticed that it is sometimes slower than above, presumably because it's running on a different CPU.
- There is a slight difference in the QSYS design:
    - The Cyclone5 based board:
        - drives the AXI bus with a 64 MHz clock.
        - uses the Avalon MM Slave Translator.
    - The Agilex5 based board:
        - drives the AXI bus with a 200 MHz clock.
        - uses the Avalon Memory Mapped Pipeline Bridge Intel FPGA IP.
- Our FPGA takes one 64/200 MHz cycle to process the read (readdatavalid). For a write our FPGA doesn't generate a writeresponsevalid; this is handled by the IP block.
- I'm using Quartus 25.1.1 for the Agilex 5 design.

I'm aware that the LWH2F interface is not intended for high throughput. Also, since the memory is mapped as "Device Memory", every load/store is processed separately and we're not taking advantage of AXI bursts, etc.

I'm aware that we could improve performance by using the H2F interface and mapping the memory as normal memory. That said, we have a proven design and are reluctant to change it unless absolutely necessary.

So I have the following questions:

- Is a higher latency expected on Agilex5? (E.g. due to a different architecture of the interconnect)
- Have you measured the performance of the LWH2F interface? Can you give me a number on how many transactions per second we can expect?

Kind Regards,
Eric Opitz
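Editor's note: the throughput figures in the question can be restated as time per single 32-bit access, which is the quantity the latency question is really about. The snippet below is only a rough sketch of that arithmetic. It assumes MB means 10^6 bytes (as stated in the post), that each access moves exactly one uint32, and that the measurement loop is dominated by the bus access itself with negligible loop overhead. The function names are made up for illustration.

```python
WORD_BYTES = 4  # each access transfers one uint32 (from the post)


def ns_per_access(mb_per_s):
    """Average time per 32-bit access in ns, taking MB as 10**6 bytes."""
    accesses_per_s = mb_per_s * 1e6 / WORD_BYTES
    return 1e9 / accesses_per_s


def fabric_cycles(mb_per_s, clock_mhz):
    """The same time expressed in cycles of the FPGA-side bus clock."""
    return ns_per_access(mb_per_s) * clock_mhz / 1e3


# (label, measured MB/s, FPGA-side bus clock in MHz), all taken from the post
MEASURED = [
    ("Cyclone5 read", 47.2, 64),
    ("Cyclone5 write", 73.1, 64),
    ("Agilex5 read", 24.8, 200),
    ("Agilex5 write", 18.8, 200),
]

for label, mbps, mhz in MEASURED:
    print(f"{label:15s} {ns_per_access(mbps):7.1f} ns/access "
          f"{fabric_cycles(mbps, mhz):6.1f} bus cycles")
```

Under these assumptions, the Cyclone 5 board spends about 85 ns per read and 55 ns per write, while the Agilex 5 board spends about 161 ns per read and 213 ns per write. Since the FPGA logic is stated to answer a read in one cycle, most of that time would be spent outside the user logic, but the split between interconnect, bridge IP and CPU cannot be derived from these numbers alone.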

tehjingy_altera · Answer

Hi EricOpitz,

For your reference, there is an errata that discusses potential HPS performance considerations related to the L3 cache, which you can review here: https://docs.altera.com/r/docs/825514/current/agilextm-5-es-device-errata-and-user-guidelines/degraded-hps-emif-performance-with-2mb-l3-cache

To better understand whether this may be applicable in your case, could you please confirm if your device part number is included in the list described? If possible, kindly share the exact device part number so we can help review it more accurately.

ericopitz · Answer

Hi tehjingy_Altera,

Yes, our device is affected by the errata. The device number is A5ED013BB32AE4S. The errata suggests the workaround "Reduce L3 cache from 2MB to 1MB." We have configured this workaround in QSYS ("Override default cache size", MPU L3 Cache Size = 1MB).

Should the workaround eliminate the performance issues entirely?

Kind Regards,
Eric Opitz
