Agilex 5E ES Memory Performance Issues
Setup
We observed significantly degraded performance for sequential memory reads on the HPS.
The target device is the A5ED065BB32AE6SR0 on the Premium Development Kit, running the GSRD example.
Sysbench was used to benchmark the memory performance.
Test Results
For comparison, we ran the same tests on an STM32MP157F system (dual Arm Cortex-A7 at 800 MHz) and on the host PC (AMD Ryzen 7).
| Test | Agilex 5E ES | STM32MP157F | Host PC, Ryzen 7 |
|---|---|---|---|
| T0 (sequential read) | 480 MiB/s | 290 MB/s | 78972 MiB/s |
| T1 (sequential write) | 4058 MiB/s | 190 MB/s | 44749 MiB/s |
| T2 (random read) | 67 MiB/s | 373 MB/s | 3461 MiB/s |
| T3 (random write) | 52 MiB/s | 372 MB/s | 3608 MiB/s |
The test cases were executed as follows:
T0: sysbench --num-threads=1 --time=10 memory --memory-block-size=4MiB --memory-total-size=64GiB --memory-access-mode="seq" --memory-hugetlb=off --memory-oper=read run
T1: sysbench --num-threads=1 --time=10 memory --memory-block-size=4MiB --memory-total-size=64GiB --memory-access-mode="seq" --memory-hugetlb=off --memory-oper=write run
T2: sysbench --num-threads=1 --time=10 memory --memory-block-size=4MiB --memory-total-size=64GiB --memory-access-mode="rnd" --memory-hugetlb=off --memory-oper=read run
T3: sysbench --num-threads=1 --time=10 memory --memory-block-size=4MiB --memory-total-size=64GiB --memory-access-mode="rnd" --memory-hugetlb=off --memory-oper=write run
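To rerun the four cases reproducibly (for example after each change to the L3 cache size), they can be scripted. The Python sketch below builds exactly the T0 to T3 command lines above and extracts the throughput figure from the output. It assumes sysbench 1.0.x is on `PATH` and that its summary line has the form `... MiB transferred (... MiB/sec)`; the helper names `build_command`, `parse_rate` and `run_all` are hypothetical.

```python
from __future__ import annotations

import re
import subprocess

# Test matrix matching T0 to T3: (label, access mode, operation).
CASES = [
    ("T0", "seq", "read"),
    ("T1", "seq", "write"),
    ("T2", "rnd", "read"),
    ("T3", "rnd", "write"),
]

# Assumption: sysbench prints a summary line such as
#   "65536.00 MiB transferred (6552.31 MiB/sec)"
# Some builds may label the unit "MB" instead of "MiB"; both are accepted.
_RATE_RE = re.compile(r"\(([\d.]+) (MiB|MB)/sec\)")


def build_command(mode: str, oper: str) -> list[str]:
    """Argument list for one sysbench memory run, same options as T0 to T3."""
    return [
        "sysbench", "--num-threads=1", "--time=10", "memory",
        "--memory-block-size=4MiB", "--memory-total-size=64GiB",
        f"--memory-access-mode={mode}", "--memory-hugetlb=off",
        f"--memory-oper={oper}", "run",
    ]


def parse_rate(output: str) -> tuple[float, str]:
    """Extract (throughput, unit) from sysbench output; raise if absent."""
    match = _RATE_RE.search(output)
    if match is None:
        raise ValueError("no throughput line found in sysbench output")
    return float(match.group(1)), match.group(2) + "/s"


def run_all() -> dict[str, tuple[float, str]]:
    """Run T0 to T3 one after another and return {label: (throughput, unit)}."""
    results = {}
    for label, mode, oper in CASES:
        proc = subprocess.run(build_command(mode, oper),
                              capture_output=True, text=True, check=True)
        results[label] = parse_rate(proc.stdout)
    return results
```

On the board, calling `run_all()` from an interactive Python session runs the four tests for about 40 seconds in total and returns one entry per test case, which keeps the comparison between configurations consistent.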
Observations
Sequential Read vs. Write: On the Agilex 5E ES, sequential read (480 MiB/s) is roughly 8.5x slower than sequential write (4058 MiB/s). On the other systems, sequential read is faster than sequential write: about 1.5x on the STM32MP157F (290 vs. 190 MB/s) and about 1.8x on the Ryzen host (78972 vs. 44749 MiB/s).
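The read/write ratios behind this comparison can be recomputed directly from the table. Because each ratio is taken within a single system, the different units (MB/s for the STM32MP157F, MiB/s elsewhere) cancel out. The names `RESULTS` and `read_write_ratio` are illustrative only.

```python
# Sequential throughput copied from the results table (units as reported per system).
RESULTS = {
    "Agilex 5E ES": {"seq_read": 480, "seq_write": 4058},
    "STM32MP157F": {"seq_read": 290, "seq_write": 190},
    "Host PC, Ryzen 7": {"seq_read": 78972, "seq_write": 44749},
}


def read_write_ratio(system: str) -> float:
    """Sequential read throughput divided by sequential write throughput."""
    r = RESULTS[system]
    return r["seq_read"] / r["seq_write"]


for name in RESULTS:
    print(f"{name}: read/write = {read_write_ratio(name):.2f}")
# Agilex 5E ES: read/write = 0.12
# STM32MP157F: read/write = 1.53
# Host PC, Ryzen 7: read/write = 1.76
```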
We found two entries in the ES device errata that could explain this:
- Degraded HPS EMIF performance with 2MB L3 Cache: https://docs.altera.com/r/docs/825514/current/agilextm-5-es-device-errata-and-user-guidelines/degraded-hps-emif-performance-with-2mb-l3-cache
- HPS EMIF read throughput less than target: https://docs.altera.com/r/docs/825514/current/agilextm-5-es-device-errata-and-user-guidelines/hps-emif-read-throughput-less-than-target
The workaround for the first entry is to configure the L3 cache to a size other than 2 MB. However, this did not improve performance at all.
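Since the workaround changes a build-time setting, it may be worth confirming on the running board that the new L3 size actually took effect. The Python sketch below reads the standard Linux sysfs cache descriptions under `/sys/devices/system/cpu/cpu0/cache/`. On arm64 these entries are derived from the firmware/device-tree cache description, so an L3 entry (or its `size` file) may be missing even if the hardware setting changed; the function name `cache_sizes` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Standard Linux sysfs location of per-CPU cache descriptions. On arm64 the
# contents come from the device tree, so an L3 entry may not be present.
SYSFS_CACHE = Path("/sys/devices/system/cpu/cpu0/cache")


def cache_sizes(base: Path = SYSFS_CACHE) -> dict[int, list[tuple[str, str]]]:
    """Return {level: [(type, size), ...]} for every indexN directory under base."""
    levels: dict[int, list[tuple[str, str]]] = {}
    for idx in sorted(base.glob("index*")):
        try:
            level = int((idx / "level").read_text().strip())
            ctype = (idx / "type").read_text().strip()
            size = (idx / "size").read_text().strip()
        except (OSError, ValueError):
            continue  # incomplete or unreadable entry: skip it
        levels.setdefault(level, []).append((ctype, size))
    return levels
```

On the HPS, `print(cache_sizes())` should list a level-3 entry whose size matches the configured value, provided the device tree describes the L3 cache.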
For the second errata entry, there is no workaround.
Question
Is the "HPS EMIF read throughput less than target" errata entry the primary cause of the degraded sequential read performance?
If so, is this issue resolved in production (non-ES) Agilex 5 devices, and what performance improvement can we expect?